HFST: Brazilian Portuguese

We exemplify the use of the HFST command-line tools with an example taken from Beesley & Karttunen (Finite-State Morphology, pages 470-473); see the solution in the book for more information on the example. Throughout, the shell variable $FORMAT holds the implementation format of the transducers (for example openfst-tropical, sfst or foma). The commands below can also be collected into a single script and run in one go.
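A minimal sketch of how the top of such a single script might set $FORMAT. The default openfst-tropical is an assumption, not part of the original solution, and which of the listed formats actually work depends on how HFST was built.

```shell
#!/bin/sh
# Assumption: fall back to openfst-tropical when the caller has not set FORMAT.
FORMAT="${FORMAT:-openfst-tropical}"

# Reject anything that is not one of the implementation formats
# accepted by the -f option of hfst-regexp2fst.
case "$FORMAT" in
  openfst-tropical|openfst-log|sfst|foma) ;;
  *) echo "unsupported FORMAT: $FORMAT" >&2; exit 1 ;;
esac

# Export so that every HFST command later in the script sees the same value.
export FORMAT
echo "Using FORMAT=$FORMAT"
```

The remaining commands of this page would follow below these lines in the same file.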

We first define the Vowel transducer that is used in the phonological rules.

echo "[ a | e | i | o | u " \
     "| á | é | í | ó | ú " \
     "| â | ê | ô          " \
     "| ã | õ              " \
     "| à                  " \
     "| ü                  " \
     "]" | hfst-regexp2fst -f "$FORMAT" > Vowel

We have a set of rules that we want to compile into transducers and then compose. We start with the identity transducer [?*], named Solution, as the initial value: for any transducer A, [?*] .o. A is equivalent to A.

echo "[?*]" | hfst-regexp2fst -f "$FORMAT" > Solution
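Writing $S_k$ for the value of Solution after the $k$-th pass of the loop and $R_k$ for the $k$-th of the 16 rule transducers, the computation is a left fold over composition. Here $\circ$ stands for xfst-style composition (.o.), in which the left operand applies first; composition is associative, so no brackets are needed. The last step holds only for the identity over all strings, [?*]; the single-symbol language [?] would restrict the result to one-symbol inputs.

```latex
\begin{aligned}
S_0    &= [?^*] \\
S_k    &= S_{k-1} \circ R_k \qquad (k = 1, \dots, 16) \\
S_{16} &= [?^*] \circ R_1 \circ R_2 \circ \cdots \circ R_{16}
        = R_1 \circ R_2 \circ \cdots \circ R_{16}
\end{aligned}
```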

We then compile each rule into a transducer, compose the current value of Solution with it, and store the result as the new value of Solution. The rules are thus applied in the order in which they are listed.

for i in \
  '[ s -> z || @"Vowel" _ @"Vowel" ]' \
  "[ cedille -> s ]" \
  "[ c h -> %$ ]" \
  "[ c -> s || _ [ e | i | é | í | ê ] ]" \
  "[ c -> k ]" \
  "[ s s -> s ]" \
  "[ n h -> N ]" \
  "[ l h -> L ]" \
  "[ h -> 0 ]" \
  "[ r r -> R ]" \
  "[ r -> R || .#. _ ]" \
  "[ e -> i || _ (s) .#. , .#. p _ r ]" \
  "[ o -> u || _ (s) .#. ]" \
  "[ d -> J || _ [ i | í ] ]" \
  "[ t -> C || _ [ i | í ] ]" \
  "[ z -> s || _ .#. ]";
do
  # Solution is the first operand; the compiled rule arrives on stdin as the second.
  echo "$i" | hfst-regexp2fst -f "$FORMAT" | hfst-compose -1 Solution > TMP
  mv TMP Solution
done
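Because each pass composes the next rule onto the result so far, the order of the list matters: the contextual rule c -> s has to come before the general c -> k. The self-contained sketch below, which assumes HFST is installed and uses openfst-tropical as an assumed default for FORMAT, composes an acceptor for the word cena with the two c rules in both orders and prints the resulting string pairs with hfst-fst2strings.

```shell
#!/bin/sh
# Assumption: HFST is installed; FORMAT defaults to openfst-tropical if unset.
FORMAT="${FORMAT:-openfst-tropical}"

# Order used in the loop: contextual c -> s first, then c -> k.
echo '[ c e n a ] .o. [ c -> s || _ [ e | i ] ] .o. [ c -> k ]' \
  | hfst-regexp2fst -f "$FORMAT" | hfst-fst2strings

# Reversed order: c -> k rewrites every c first, so c -> s never fires.
echo '[ c e n a ] .o. [ c -> k ] .o. [ c -> s || _ [ e | i ] ]' \
  | hfst-regexp2fst -f "$FORMAT" | hfst-fst2strings
```

With the loop's order, cena should be paired with sena; with the order reversed, every c has already become k before the contextual rule runs, giving kena.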

Finally we minimize the solution.

hfst-minimize -i Solution -o BrazilianPortuguese
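The finished transducer can be applied to words with hfst-lookup, which reads one word per line from standard input, for example echo casa | hfst-lookup BrazilianPortuguese. The self-contained sketch below does the same on a small hypothetical transducer, MiniCascade, built from just two of the rules above (with the vowel context written out instead of @"Vowel"); it assumes HFST is installed and that openfst-tropical is an acceptable default format.

```shell
#!/bin/sh
# Assumption: HFST is installed; FORMAT defaults to openfst-tropical if unset.
FORMAT="${FORMAT:-openfst-tropical}"

# Hypothetical two-rule cascade: intervocalic s -> z, then c -> k.
echo '[ s -> z || [ a | e | i | o | u ] _ [ a | e | i | o | u ] ] .o. [ c -> k ]' \
  | hfst-regexp2fst -f "$FORMAT" > MiniCascade

# Look up one word; hfst-lookup prints the input word with its output form(s).
echo casa | hfst-lookup MiniCascade
```

Here casa should come out as kaza: the s between two vowels becomes z, and c becomes k.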


-- ErikAxelson - 2011-10-20