HFST: Finnish Numerals from 1 to 99

In Beesley & Karttunen, the following task is given: Construct a transducer that maps the numbers 1-99 to numerals in some language other than English, and construct a transducer that translates from English numerals to numerals in your language, and vice versa.

We choose Finnish and show how to build the transducer with the HFST command-line tools. The shell variable FORMAT, passed to hfst-strings2fst with the -f option, selects the implementation type of the transducers. The solution given on this page can also be executed with a single script.
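The commands below expect FORMAT to be set before they run. A minimal sketch, assuming an HFST build with the OpenFst backend (the value openfst-tropical is an assumption here; sfst or foma may be the right choice for other builds):

```shell
# Assumed default; replace with the implementation type your HFST build supports.
FORMAT=${FORMAT:-openfst-tropical}
export FORMAT
```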

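First, we create a transducer that maps the numbers 2 ... 9 to the corresponding numerals. The -j option makes hfst-strings2fst disjunct all input lines into a single transducer instead of writing one transducer per line: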

echo "2:kaksi
3:kolme
4:neljä
5:viisi
6:kuusi
7:seitsemän
8:kahdeksan
9:yhdeksän" | hfst-strings2fst -f $FORMAT -j > 2to9.hfst;

Adding 1 gives the transducer for 1 ... 9:

echo "1:yksi" | hfst-strings2fst -f $FORMAT | hfst-disjunct 2to9.hfst > 1to9.hfst;

10 is handled as a separate case:

echo "10:kymmenen" | hfst-strings2fst -f $FORMAT > 10.hfst;

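From 11 to 19, i.e. [ 1:0 1to9 0:toista ]. Here 0 stands for the empty string, so the leading digit 1 is deleted, the second digit is mapped by 1to9, and the suffix toista is inserted: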

echo "1:" | hfst-strings2fst -f $FORMAT > 1toEps.hfst;
echo ":toista" | hfst-strings2fst -f $FORMAT > EpsToToista.hfst;
hfst-concatenate 1toEps.hfst 1to9.hfst | hfst-concatenate EpsToToista.hfst > 11to19.hfst;

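From 20 to 99, i.e. [ 2to9 0:kymmentä [ "0":0 | 1to9 ] ]. The tens digit is mapped by 2to9, kymmentä is inserted, and the units digit is either a 0 that is deleted or a digit mapped by 1to9: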

echo ":kymmentä" | hfst-strings2fst -f $FORMAT > EpsToKymmenta.hfst;
echo "0:" | hfst-strings2fst -f $FORMAT > 0toEps.hfst;
hfst-concatenate 2to9.hfst EpsToKymmenta.hfst > TMP;
hfst-disjunct 0toEps.hfst 1to9.hfst | hfst-concatenate -1 TMP > 20to99.hfst;

Finally, from 1 to 99:

hfst-disjunct 1to9.hfst 10.hfst | hfst-disjunct 11to19.hfst | hfst-disjunct 20to99.hfst > FinnishNumerals.hfst;
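As a sanity check on the case split used above (1-9, 10, 11-19, 20-99), the same mapping can be sketched as a plain shell function that works directly on the digit string. This is not HFST code: fi_unit and fi_numeral are hypothetical helper names introduced here for illustration, and their output is only meant for comparison with what FinnishNumerals.hfst produces.

```shell
# Plain-shell reference for the 1..99 case split; does not use HFST.
# fi_unit and fi_numeral are hypothetical helper names.

# Map a single digit 1..9 to its Finnish numeral (the role of 1to9.hfst).
fi_unit() {
  case $1 in
    1) echo yksi ;;
    2) echo kaksi ;;
    3) echo kolme ;;
    4) echo neljä ;;
    5) echo viisi ;;
    6) echo kuusi ;;
    7) echo seitsemän ;;
    8) echo kahdeksan ;;
    9) echo yhdeksän ;;
    *) return 1 ;;
  esac
}

# Map a digit string 1..99 to its Finnish numeral, one branch per case above.
fi_numeral() {
  case $1 in
    [1-9])      fi_unit "$1" ;;                                  # 1 to 9
    10)         echo kymmenen ;;                                 # separate case
    1[1-9])     echo "$(fi_unit "${1#1}")toista" ;;              # drop the 1, add toista
    [2-9]0)     echo "$(fi_unit "${1%0}")kymmentä" ;;            # drop the trailing 0
    [2-9][1-9]) echo "$(fi_unit "${1%?}")kymmentä$(fi_unit "${1#?}")" ;;
    *)          return 1 ;;                                      # outside 1..99
  esac
}
```

For example, fi_numeral 45 prints neljäkymmentäviisi and fi_numeral 17 prints seitsemäntoista, which agrees with the pairs in the test output further down this page.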

To get transducers that map Finnish to English numerals and vice versa, we use composition and inversion. We assume that the transducer EnglishNumbersToNumerals.hfst is already constructed as shown in HfstNumbersToNumerals.

hfst-invert FinnishNumerals.hfst | hfst-compose EnglishNumbersToNumerals.hfst > FinnishToEnglishNumerals.hfst;
hfst-invert FinnishToEnglishNumerals.hfst > EnglishToFinnishNumerals.hfst;
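To see why inversion followed by composition yields a numeral-to-numeral mapping, the two operations can be imitated on small tab-separated pair lists with standard Unix tools. This is only an analogy and does not call HFST: the three sample entries and the file names fi.tsv and en.tsv are made up for illustration. Joining on the shared number plays the role of composition, and swapping the columns plays the role of inversion.

```shell
# Sketch only: imitates inversion and composition on pair lists, without HFST.
tmp=$(mktemp -d)
TAB=$(printf '\t')

# number<TAB>numeral lists, sorted on the number field as join requires.
printf '1\tyksi\n10\tkymmenen\n45\tneljäkymmentäviisi\n' |
  LC_ALL=C sort -t "$TAB" -k1,1 > "$tmp/fi.tsv"
printf '1\tone\n10\tten\n45\tforty-five\n' |
  LC_ALL=C sort -t "$TAB" -k1,1 > "$tmp/en.tsv"

# "Composition": match on the shared number, keep Finnish then English.
fi_to_en=$(LC_ALL=C join -t "$TAB" -o 1.2,2.2 "$tmp/fi.tsv" "$tmp/en.tsv")

# "Inversion": swap the two sides of every pair.
en_to_fi=$(printf '%s\n' "$fi_to_en" | awk -F '\t' '{ print $2 "\t" $1 }')

printf '%s\n' "$en_to_fi"
rm -rf "$tmp"
```

The printed pairs have the English numeral on the left and the Finnish numeral on the right, the same orientation as EnglishToFinnishNumerals.hfst.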

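Now we can test the result. With -r 10, hfst-fst2strings prints at most ten randomly chosen paths, so the two runs below give different samples: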

$ hfst-fst2strings -r 10 EnglishToFinnishNumerals.hfst
seventeen:seitsemäntoista
eleven:yksitoista
fourteen:neljätoista
forty-five:neljäkymmentäviisi
twelve:kaksitoista
ten:kymmenen
eight:kahdeksan
four:neljä
two:kaksi
one:yksi
$
$ hfst-fst2strings -r 10 EnglishToFinnishNumerals.hfst
fifty:viisikymmentä
ninety-one:yhdeksänkymmentäyksi
eleven:yksitoista
fourteen:neljätoista
forty-one:neljäkymmentäyksi
ten:kymmenen
eight:kahdeksan
three:kolme
one:yksi


-- ErikAxelson - 2011-08-23