Convert transducers between binary formats.


The help message:

Usage: hfst-fst2fst [OPTIONS...] [INFILE]
Convert transducers between binary formats

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -i, --input=INFILE     Read input transducer from INFILE
  -o, --output=OUTFILE   Write output transducer to OUTFILE
Conversion options:
  -f, --format=FMT                  Write result in FMT format
  -b, --use-backend-format          Write result in implementation format, without any HFST wrappers
  -S, --sfst                        Write output in (HFST's) SFST implementation
  -F, --foma                        Write output in (HFST's) foma implementation
  -x, --xfsm                        Write output in native xfsm format
  -t, --openfst-tropical            Write output in (HFST's) tropical weight (OpenFST) implementation
  -l, --openfst-log                 Write output in (HFST's) log weight (OpenFST) implementation
  -O, --optimized-lookup-unweighted Write output in the HFST optimized-lookup implementation
  -w, --optimized-lookup-weighted   Write output in optimized-lookup (weighted) implementation
  -Q  --quick                       When converting to optimized-lookup, don't try hard to compress

If OUTFILE or INFILE is missing or -, standard streams will be used.
Format of result depends on format of INFILE
FMT must be name of a format usable by libhfst, i.e. one of the following:
{ foma, openfst-tropical, openfst-log, sfst, xfsm
  optimized-lookup-weighted, optimized-lookup-unweighted }.
Note that xfsm format is always written in native format without HFST wrappers.

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:


Converting to efficient lookup format

We have a transducer big_transducer.hfst, say in OpenFst format, and would like to look up hundreds of thousands of words in it. In that case we probably want to use the HFST optimized-lookup format, so we first have to convert the transducer:

hfst-fst2fst --optimized-lookup-weighted -i big_transducer.hfst -o big_transducer.hfst_olw

Then we can perform efficient lookup:

cat 100_000_words | hfst-lookup big_transducer.hfst_olw > results
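When the same transducer is looked up repeatedly, the conversion step only needs to be rerun after the source transducer changes. The following is a minimal sketch of that idea, not part of HFST: the function names needs_conversion and convert_if_stale are hypothetical, and the -nt file test is assumed to be available (it is in bash and dash, although it is not required by POSIX).

```shell
# Succeed (exit status 0) if DST is missing or older than SRC.
needs_conversion() {
    src=$1
    dst=$2
    [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]
}

# Convert SRC to the weighted optimized-lookup format only when needed.
# Uses the -i and -o options listed in the help message above.
convert_if_stale() {
    src=$1
    dst=$2
    if needs_conversion "$src" "$dst"; then
        hfst-fst2fst --optimized-lookup-weighted -i "$src" -o "$dst"
    fi
}

# Example call (requires the HFST tools to be installed):
#   convert_if_stale big_transducer.hfst big_transducer.hfst_olw
```

Comparing modification times keeps the check cheap, at the cost of missing changes that preserve the timestamp.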

Using other tools with HFST

If we are used to writing simple one-path transducers in the input:output formalism and would like to use the command-line tools of, say, SFST, we can convert HFST transducers into plain SFST format with the option --use-backend-format:

echo "cat:dog" | hfst-strings2fst -f sfst | hfst-fst2fst -f sfst --use-backend-format > cat2dog.sfst

and then use SFST's own tools:

fst-print cat2dog.sfst

Testing different backend implementations

We have a morphology morphology.sfstpl written in the SFST programming language and we want to test whether we get the same result when compiling it with the different backend implementations. We can use hfst-fst2fst to convert all results into a common format, say, foma, and then test whether they are equivalent:

hfst-sfstpl2fst morphology.sfstpl -f sfst | hfst-fst2fst -f foma > morphology.sfst
hfst-sfstpl2fst morphology.sfstpl -f openfst | hfst-fst2fst -f foma > morphology.openfst
hfst-sfstpl2fst morphology.sfstpl -f foma | hfst-fst2fst -f foma > morphology.foma

hfst-compare morphology.sfst morphology.openfst &&
hfst-compare morphology.openfst morphology.foma &&
echo "Morphologies are equivalent."
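With more than a few backends, the chain of comparisons can be wrapped in a small helper. The sketch below is an illustration only: all_equivalent is a hypothetical function name, and it assumes that the comparison command reports equivalence through exit status 0, which is what the && chain above relies on. Since equivalence is transitive, comparing every file against the first one is enough.

```shell
# Run COMPARE on the first file and each of the remaining files.
# Succeeds only if every comparison succeeds.
all_equivalent() {
    compare=$1
    reference=$2
    shift 2
    for candidate in "$@"; do
        "$compare" "$reference" "$candidate" > /dev/null || return 1
    done
    return 0
}

# Example call (requires the HFST tools to be installed):
#   all_equivalent hfst-compare morphology.sfst morphology.openfst morphology.foma &&
#   echo "Morphologies are equivalent."
```

Passing the comparison command as an argument keeps the helper independent of HFST, so it can be tried out with any command that takes two file names.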

An issue with foma

Foma writes its binary transducers in gzipped format using the gz tools. However, we experienced problems when trying to write to standard output or read from standard input with the gz tools (foma tools neither write to nor read from standard streams). We therefore chose to write, and accordingly read, foma transducers unzipped when writing or reading binary HFST transducers that use foma as the back-end format. As a result, when an HFST transducer is written in its plain backend format, the user must zip it before it can be used by foma tools. Similarly, a foma transducer must be unzipped before it can be read by HFST tools.

The following commands will create a file 'ab.foma.gz' that can be used by foma tools:

echo "a:b" | hfst-strings2fst -f foma > ab.hfst
hfst-fst2fst --use-backend-format -f foma -i ab.hfst > ab.foma
gzip ab.foma

Suppose we have a foma transducer transducer.foma and want to use it with HFST tools. A .gz extension must first be appended to the file name so that the program gunzip recognizes it as a zipped file:

mv transducer.foma transducer.foma.gz
gunzip transducer.foma.gz
hfst-sometool transducer.foma
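As an alternative to renaming, the gzip command-line program can read from a redirected standard input, in which case it does not inspect the file name at all. The following sketch wraps both directions in two helpers; the names foma_pack and foma_unpack are hypothetical and not part of HFST or foma, and the input files are left untouched.

```shell
# Gzip a plain foma-format file so that foma tools can read it.
# The input file is kept; the result is written to the second argument.
foma_pack() {
    gzip -c < "$1" > "$2"
}

# Unzip a gzipped foma file, whatever its file name extension is.
foma_unpack() {
    gzip -dc < "$1" > "$2"
}

# Example calls:
#   foma_pack ab.foma ab.foma.gz
#   foma_unpack transducer.foma transducer.unzipped.foma
```

Writing to a separate output file avoids the in-place replacement that plain gzip and gunzip perform.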

-- ErikAxelson - 09 Jul 2008