HFST: English Numerals

We illustrate the use of the HFST command line tools with an example taken from Beesley & Karttunen: building a transducer that recognizes the English numerals from "one" to "ninety-nine". $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.
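
A minimal sketch of setting $FORMAT before running the commands on this page. The value openfst-tropical is an assumption chosen for illustration; which format names -f accepts depends on how HFST was built, so check hfst-strings2fst --help for the list on your system.

```shell
# Assumption: openfst-tropical is available in this HFST build.
# Keep any value already set in the environment; otherwise use the default.
FORMAT="${FORMAT:-openfst-tropical}"
export FORMAT
echo "Building transducers in format: $FORMAT"
```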

From "one" to "nine":

echo "one
two
three
four
five
six
seven
eight
nine" | hfst-strings2fst -j -f $FORMAT > OneToNine.hfst

It is convenient to define a set of prefixes that can be followed either by "teen" or by "ty":

echo "thir
fif
six
seven
eigh
nine" | hfst-strings2fst -j -f $FORMAT > TeenTen.hfst

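From "ten" to "nineteen". The prefix "four" is added to the TeenTen prefixes here because it takes "teen" but not "ty" ("fourteen", but "forty"):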

echo "ten
eleven
twelve" | hfst-strings2fst -j -f $FORMAT > TenElevenTwelve.hfst;
echo "teen" | hfst-strings2fst -f $FORMAT > Teen.hfst;
echo "four" | hfst-strings2fst -f $FORMAT | hfst-disjunct TeenTen.hfst | hfst-concatenate -2 Teen.hfst | hfst-disjunct TenElevenTwelve.hfst > Teens.hfst;
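
As a cross-check that needs no HFST installation, the following plain shell sketch prints the ten strings that Teens.hfst is meant to accept, using the same decomposition as the pipeline above. The function name teens is hypothetical and used only for this illustration.

```shell
# Plain-shell reference list for Teens.hfst (no HFST tools involved).
teens() {
  # The three irregular forms, as in TenElevenTwelve.hfst.
  printf '%s\n' ten eleven twelve
  # "four" plus the TeenTen prefixes, each followed by "teen".
  for stem in thir four fif six seven eigh nine; do
    printf '%steen\n' "$stem"
  done
}

teens
```

If HFST is installed, this list should match the strings printed by hfst-fst2strings Teens.hfst once both are sorted.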

Let's define the set of prefixes that can be followed by "ty": the shared prefixes in TeenTen plus "twen" and "for".

echo "twen
for" | hfst-strings2fst -j -f $FORMAT | hfst-disjunct TeenTen.hfst > TenStem.hfst

TenStem is followed either by "ty" or by "ty-" and a number from OneToNine.

echo "ty" | hfst-strings2fst -f $FORMAT | hfst-concatenate -1 TenStem.hfst > TMP.hfst;
echo "" | hfst-strings2fst -f $FORMAT > Epsilon.hfst;
echo "-" | hfst-strings2fst -f $FORMAT | hfst-concatenate -2 OneToNine.hfst | hfst-disjunct Epsilon.hfst | hfst-concatenate -1 TMP.hfst > Tens.hfst
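
The disjunction with Epsilon.hfst makes the hyphenated unit optional. A plain shell sketch of the same idea, assuming nothing beyond a POSIX shell: each stem yields one bare form and nine hyphenated forms. The function name tens is hypothetical.

```shell
# Plain-shell reference list for Tens.hfst (no HFST tools involved).
tens() {
  for stem in twen thir for fif six seven eigh nine; do
    # Epsilon branch: the stem followed by "ty" alone.
    printf '%sty\n' "$stem"
    # Hyphen branch: "ty", then "-" and a number from one to nine.
    for unit in one two three four five six seven eight nine; do
      printf '%sty-%s\n' "$stem" "$unit"
    done
  done
}

tens
```

Eight stems with ten forms each give 80 strings, from "twenty" to "ninety-nine".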

hfst-disjunct OneToNine.hfst Teens.hfst | hfst-disjunct -2 Tens.hfst > OneToNinetyNine.hfst

Done. Let's print some examples.

hfst-fst2strings OneToNinetyNine.hfst --random 5
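
Random samples only spot-check the result. For a complete check, one option is to generate the 99 expected numerals independently and compare them with the full output of the transducer. The sketch below does the generating part in plain shell; the function names numeral and all_numerals are hypothetical, and the comparison step mentioned afterwards is a suggestion rather than part of the original solution.

```shell
# Spell out an integer from 1 to 99 (plain shell, no HFST tools involved).
numeral() {
  n=$1
  units="one two three four five six seven eight nine"
  teens="ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen"
  tens="twenty thirty forty fifty sixty seventy eighty ninety"
  if [ "$n" -lt 10 ]; then
    echo "$units" | cut -d' ' -f"$n"
  elif [ "$n" -lt 20 ]; then
    # 10 is the first entry of the teens list, 19 the tenth.
    echo "$teens" | cut -d' ' -f"$((n - 9))"
  else
    # 20..29 use the first entry of the tens list, 90..99 the eighth.
    t=$(echo "$tens" | cut -d' ' -f"$((n / 10 - 1))")
    u=$((n % 10))
    if [ "$u" -eq 0 ]; then
      echo "$t"
    else
      echo "$t-$(echo "$units" | cut -d' ' -f"$u")"
    fi
  fi
}

# Print all numerals from "one" to "ninety-nine", one per line.
all_numerals() {
  i=1
  while [ "$i" -le 99 ]; do
    numeral "$i"
    i=$((i + 1))
  done
}

all_numerals | sort
```

If HFST is installed, the sorted list can be compared with the sorted output of hfst-fst2strings OneToNinetyNine.hfst (run without --random), for example with diff.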


-- ErikAxelson - 2011-04-04