HFST: Command Line Tools Tutorial

Cat to Dog

We start with a simple example: creating a transducer that maps "cat" to "dog" with weight 1.5 and printing the string pairs recognized by that transducer. The following command line, when executed,

echo "{cat}:{dog}::1.5" | hfst-regexp2fst | hfst-fst2strings

will yield the following output

cat:dog

To see how the strings are tokenized and how the tokens are aligned, we can print the individual input:output pairs with a space between them. We can also print the total weight of the string pair. This is done by adding the following options to hfst-fst2strings

--xfst=print-space --xfst=print-pairs --print-weights

which will yield

c:d a:o t:g     1.5
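For reference, the whole pipeline with these options in place can be sketched as follows. The function name cat2dog_pairs and the PATH check are illustrative additions, not part of HFST; the sketch assumes the HFST command line tools are installed.

```shell
# Sketch: the example pipeline with the alignment and weight options added.
# cat2dog_pairs is a hypothetical helper name used only for this illustration.
cat2dog_pairs() {
    echo "{cat}:{dog}::1.5" \
        | hfst-regexp2fst \
        | hfst-fst2strings --xfst=print-space --xfst=print-pairs --print-weights
}

# Only run the pipeline when the HFST tools are actually available.
if command -v hfst-regexp2fst >/dev/null 2>&1; then
    cat2dog_pairs
else
    echo "HFST tools not found on PATH; skipping" >&2
fi
```

Wrapping the pipeline in a function makes it easy to rerun with other options while keeping the regular expression in one place.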

One feature of HFST tools is that, by default, they read from standard input and write to standard output, which makes it easy to pipeline a series of commands, as our example shows. We could also have written separate commands with intermediate files to get the same string pair (here without the weight):

echo "cat:dog" > cat2dog.txt
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst
hfst-fst2strings --input cat2dog.hfst --output result.txt
cat result.txt

Another feature is that we can choose the implementation from several back-end libraries (openfst-tropical, foma, sfst) with the option --format, or -f for short. If no back-end is specified, openfst-tropical is used. Some libraries can be more efficient than others for a particular task. In our example the differences between the libraries are negligible, so we could equally have used --format foma or --format sfst. Most of the time we do not need to worry about the back-end implementation and can leave this parameter out.
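As a rough sketch of the --format option, the same mapping can be compiled with two of the back-ends named above. The helper name cat2dog_with is hypothetical. The weight is left out on the assumption that only the openfst-tropical back-end carries weights, and a back-end that is missing from the local HFST build is simply reported.

```shell
# Sketch: compile the same unweighted mapping with a chosen back-end.
# cat2dog_with is a hypothetical helper; $1 is the name passed to --format.
cat2dog_with() {
    echo "{cat}:{dog}" | hfst-regexp2fst --format "$1" | hfst-fst2strings
}

if command -v hfst-regexp2fst >/dev/null 2>&1; then
    for fmt in openfst-tropical foma; do
        # Report a back-end that this build of HFST cannot use instead of aborting.
        cat2dog_with "$fmt" || echo "back-end $fmt unavailable in this build" >&2
    done
else
    echo "HFST tools not found on PATH; skipping" >&2
fi
```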

The parameters of a tool are listed by the option --help and on the tool-specific wiki page. In our example, the commands hfst-regexp2fst --help and hfst-fst2strings --help and the pages HfstRegexp2Fst and HfstFst2Strings describe how the tools are used and what their purpose is.

Animal Nouns and Verbs

The tool hfst-lexc is very useful for writing grammars. We have a simple lexicon defined in the lexc formalism in the file lexicon.lexc as follows

LEXICON Root
Noun ;
Verb ;

LEXICON Noun
cat #;
dog #;

LEXICON Verb
mew #;
bark #;

and can easily convert that into a transducer by executing

hfst-lexc lexicon.lexc -o lexicon.hfst

which will write the corresponding transducer to the file lexicon.hfst. We can print the strings recognized by the transducer as follows

hfst-fst2strings lexicon.hfst

which will yield

bark
mew
dog
cat
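The steps of this section can be collected into one script, sketched below. The scratch directory and the final sort are additions for illustration: sorting makes the listing independent of the order in which hfst-fst2strings happens to print the strings.

```shell
# Sketch: recreate lexicon.lexc in a scratch directory, compile it, list the strings.
workdir=$(mktemp -d)

# Write the lexicon from this section to a file.
cat > "$workdir/lexicon.lexc" <<'EOF'
LEXICON Root
Noun ;
Verb ;

LEXICON Noun
cat #;
dog #;

LEXICON Verb
mew #;
bark #;
EOF

if command -v hfst-lexc >/dev/null 2>&1; then
    # Compile the lexicon into a transducer, then print its strings in sorted order.
    hfst-lexc "$workdir/lexicon.lexc" -o "$workdir/lexicon.hfst"
    hfst-fst2strings "$workdir/lexicon.hfst" | sort
else
    echo "HFST tools not found on PATH; skipping" >&2
fi
```

Adding a word is then a matter of adding one line to the Noun or Verb lexicon and rerunning the script.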

What next


-- ErikAxelson - 2012-02-22