HFST: Monish Guesser-Analyzer

This page demonstrates the HFST command line tools with an example taken from Beesley & Karttunen (Finite-State Morphology, pages 475-476). See the solution in the book for more information on the example. FORMAT is a shell variable that names the HFST implementation format passed to each tool with the -f option. The solution given here can also be executed with a single script.
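Before running the commands, FORMAT has to be set in the shell. The following is a minimal sketch of one way to do that. The default openfst-tropical is an assumption (sfst and foma are other format names HFST knows, depending on how it was built), and the check only reports which tools are missing.

```shell
# Pick the HFST implementation format; openfst-tropical is an assumed default.
FORMAT="${FORMAT:-openfst-tropical}"
export FORMAT

# Report which of the tools used on this page are missing from PATH.
missing=""
for tool in hfst-regexp2fst hfst-compose; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
    echo "missing HFST tools:$missing" >&2
fi
```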

Define front, back and morphophonemic vowels.

echo '[ i | e | é | ä ]' | hfst-regexp2fst -f $FORMAT > FrontV

echo '[ u | o | ó | a ]' | hfst-regexp2fst -f $FORMAT > BackV

echo '[ %^U | %^O | %^Ó | %^A ]' | hfst-regexp2fst -f $FORMAT > MorphoV

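Define verb roots as all strings that look like valid Monish roots: non-empty strings that contain at least one front vowel and no back or morphophonemic vowels, or at least one back vowel and no front or morphophonemic vowels.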

echo '[ [ ? - [ @"BackV" | @"MorphoV" ] ]+ & $[ @"FrontV" ] ] |' \
     '[ [ ? - [ @"FrontV" | @"MorphoV" ] ]+ & $[ @"BackV" ] ]' | \
     hfst-regexp2fst -f $FORMAT > Root
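To make the intersection in the Root expression more concrete, here is a toy check written in plain shell rather than HFST. It is only a sketch: the function name is_root is made up, the vowel classes are limited to i, e and u, o, a, and any string containing a caret is rejected as a rough stand-in for excluding the morphophonemic symbols.

```shell
# Toy check mirroring the Root regex; NOT part of the HFST solution.
# Assumption: only the ASCII vowels i, e (front) and u, o, a (back) are considered.
is_root() {
    case "$1" in
        '' | *'^'*) return 1 ;;   # empty, or contains a morphophoneme marker
    esac
    # Count front and back vowels; $(( )) strips padding that wc may add.
    front=$(( $(printf '%s' "$1" | tr -cd 'ie' | wc -c) ))
    back=$(( $(printf '%s' "$1" | tr -cd 'uoa' | wc -c) ))
    # Valid roots draw their vowels from exactly one class.
    if [ "$front" -gt 0 ] && [ "$back" -eq 0 ]; then return 0; fi
    if [ "$back" -gt 0 ] && [ "$front" -eq 0 ]; then return 0; fi
    return 1
}
```

Under these assumptions, is_root bik and is_root bok succeed, while is_root biko (mixed vowels) and is_root bkk (no vowel) fail.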

Define suffixes.

echo '[ %+Int .x. [ %^U %^U k ] ]' | hfst-regexp2fst -f $FORMAT > Suff1

echo '[ %+Perf .x. [ %^O n ] ] | ' \
     '[ %+Imperf .x. [ %^Ó m b ] ] | ' \
     '[ %+Opt .x. [ %^U d d ] ] ' | hfst-regexp2fst -f $FORMAT > Suff2

echo '[ %+True .x. [ %^A n k ] ] | ' \
     '[ %+Belief .x. [ %^A %^A v %^O  t ] ] | ' \
     '[ %+Doubt .x. [ %^U %^U z ] ] | ' \
     '[ %+False .x. [ %^Ó q ] ] ' | hfst-regexp2fst -f $FORMAT > Suff3

echo '[ %+1P %+Sg .x. %^A %^A b %^A ] |' \
     '[ %+2P %+Sg .x. %^Ó m %^A ] |' \
     '[ %+3P %+Sg .x. %^U v v %^U ] |' \
     '[ %+1P %+Pl %+Excl .x. %^A %^A b %^O r %^A ] |' \
     '[ %+1P %+Pl %+Incl .x. %^A %^A b %^U g %^A ] |' \
     '[ %+2P %+Pl .x. %^Ó m %^O r %^A ] |' \
     '[ %+3P %+Pl .x. %^U v v %^O r %^U ]' | hfst-regexp2fst -f $FORMAT > Suff4

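Define two replace rules. Rule1 realizes a morphophonemic vowel as a back vowel when a back vowel occurs anywhere to its left; Rule2 realizes every remaining morphophonemic vowel as a front vowel: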

echo '[ %^U -> u , %^O -> o , %^Ó -> ó , %^A -> a || @"BackV" ?* _ ]' | \
     hfst-regexp2fst -f $FORMAT > Rule1

echo '[ %^U -> i , %^O -> e , %^Ó -> é , %^A -> ä ]' | \
     hfst-regexp2fst -f $FORMAT > Rule2
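The interplay of the two rules can be tried out without HFST. The following is a toy stand-in written with sed, not the HFST rules themselves: the function name harmonize is made up, only the morphophonemes ^U and ^O are handled, and only u, o and a count as back vowels.

```shell
# Toy stand-in for Rule1 followed by Rule2, covering only ^U and ^O (assumption).
harmonize() {
    printf '%s\n' "$1" | sed -E '
        :back
        # Rule1: after a back vowel anywhere to the left, pick the back variant.
        s/([uoa].*)\^U/\1u/
        t back
        s/([uoa].*)\^O/\1o/
        t back
        # Rule2: whatever is still morphophonemic becomes a front vowel.
        s/\^U/i/g
        s/\^O/e/g
    '
}

harmonize 'bok^U^Uk^On'   # back-vowel root, prints bokuukon
harmonize 'bik^U^Uk^On'   # front-vowel root, prints bikiiken
```

Because the second rule has no context, it acts as the elsewhere case. This is why the order of composition matters in the final step.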

Create the possible combinations of morphemes and apply rules to them:

echo '[ @"Root" (@"Suff1") @"Suff2" (@"Suff3") @"Suff4" ]' | \
     hfst-regexp2fst -f $FORMAT | \
     hfst-compose -2 Rule1 | hfst-compose -2 Rule2 \
       > MonishGuesserAnalyzer
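The resulting transducer has the analysis strings (root plus tags) on its upper side and the surface forms on its lower side. A rough sketch of how it might be queried follows. It assumes that hfst-invert and hfst-lookup are installed alongside the tools used above; the function name analyze and the file name MonishGuesserAnalyzer.inv are made up for illustration.

```shell
# Hypothetical helper: analyze surface forms read from standard input.
analyze() {
    for tool in hfst-invert hfst-lookup; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "analyze: $tool not found" >&2
            return 127
        }
    done
    # Swap the two sides so that surface forms become the input side.
    hfst-invert -i MonishGuesserAnalyzer -o MonishGuesserAnalyzer.inv || return 1
    # Look up each input line in the inverted transducer.
    hfst-lookup MonishGuesserAnalyzer.inv
}

# Example call (needs the files built above):
#   echo 'bokonuvvu' | analyze
```

Since Root accepts any string that looks like a root, a single surface form can receive several analyses, as is expected from a guesser.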


-- ErikAxelson - 2011-10-20