Convert AT&T tabular format into a binary transducer.


Usage: hfst-txt2fst [OPTIONS...] [INFILE]
Convert AT&T or prolog format into a binary transducer

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -i, --input=INFILE     Read input transducer from INFILE
  -o, --output=OUTFILE   Write output transducer to OUTFILE
Text and format options:
  -f, --format=FMT    Write result using FMT as backend format
  -e, --epsilon=EPS   Interpret string EPS as epsilon in att format
  -p, --prolog        Read prolog format instead of att
Other options:
  -C, --check-negative-epsilon-cycles  Issue a warning if there are epsilon cycles
                                       with a negative weight in the transducer

If OUTFILE or INFILE is missing or -, standard streams will be used.
If FMT is not given, OpenFst's tropical format will be used.
The possible values for FMT are { foma, openfst-tropical, openfst-log,
sfst, optimized-lookup-weighted, optimized-lookup-unweighted }.
If EPS is not given, @0@ will be used.

Space in transition symbols must be escaped as '@_SPACE_@' when using
att format.

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:


Compile cat.att to an hfst transducer using SFST as backend format:

  hfst-txt2fst -f sfst cat.att

The file cat.att:

0    1    c    c
1    2    a    a
2    3    t    t
3

The same with a prolog file:

  hfst-txt2fst -f sfst --prolog cat.prolog

The file cat.prolog:

network(CAT).
arc(CAT, 0, 1, "c").
arc(CAT, 1, 2, "a").
arc(CAT, 2, 3, "t").
final(CAT, 3).
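
The invocations above can also be assembled from a script. The following is a minimal Python sketch, not part of HFST: the helper name txt2fst_command and the output file name cat.hfst are hypothetical, and only options listed in the help text are used.

```python
# Backend names copied from the help text of hfst-txt2fst.
KNOWN_FORMATS = (
    "foma", "openfst-tropical", "openfst-log", "sfst",
    "optimized-lookup-weighted", "optimized-lookup-unweighted",
)


def txt2fst_command(infile="-", outfile="-", fmt=None, epsilon=None, prolog=False):
    """Return an argv list for hfst-txt2fst (hypothetical helper)."""
    if fmt is not None and fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown backend format: {fmt}")
    argv = ["hfst-txt2fst"]
    if fmt is not None:
        argv += ["-f", fmt]          # omitted: the tool falls back to its default
    if epsilon is not None:
        argv += ["-e", epsilon]      # omitted: the tool uses @0@
    if prolog:
        argv.append("--prolog")
    argv += ["-i", infile, "-o", outfile]  # "-" selects the standard streams
    return argv


print(txt2fst_command("cat.att", "cat.hfst", fmt="sfst"))
# ['hfst-txt2fst', '-f', 'sfst', '-i', 'cat.att', '-o', 'cat.hfst']
```

On a machine where the tool is installed, the returned list can be passed to subprocess.run(argv, check=True).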

Input formats

AT&T format

The AT&T input format for this tool is the same as in the OpenFst tools: tab-separated values, with one final state or one transition per line. A line of one or two fields represents a final state number and, optionally, its weight. A line of four or five fields represents a transition: source state number, target state number, input symbol, output symbol and, optionally, weight. State numbers are integers as parsed by strtoul(3). Symbols are strings. Weights are floats as parsed by strtod(3) and can be omitted.
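
To make the field layout concrete, here is a minimal Python sketch that reads such text into transitions and final states. It is an illustration, not the tool's parser: the function name parse_att is hypothetical, only tab separators are handled, Python's int() and float() stand in for strtoul(3) and strtod(3), and a missing weight is read as 0.0, which is an assumption of this sketch.

```python
def parse_att(text):
    """Parse tab-separated AT&T text into (transitions, finals).

    transitions: list of (source, target, input, output, weight)
    finals: dict mapping final state number to weight
    """
    transitions, finals = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) in (1, 2):
            # final state, optionally followed by its weight
            weight = float(fields[1]) if len(fields) == 2 else 0.0
            finals[int(fields[0])] = weight
        elif len(fields) in (4, 5):
            # transition, optionally followed by its weight
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            transitions.append(
                (int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
        else:
            raise ValueError(f"unexpected number of fields: {line!r}")
    return transitions, finals


sample = "0\t1\tc\tc\n1\t2\ta\ta\n2\t3\tt\tt\t0.5\n3\n"
transitions, finals = parse_att(sample)
print(transitions)  # [(0, 1, 'c', 'c', 0.0), (1, 2, 'a', 'a', 0.0), (2, 3, 't', 't', 0.5)]
print(finals)       # {3: 0.0}
```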

Prolog format

The prolog input format consists of an obligatory network compound, followed by any number of symbol compounds, any number of arc compounds and any number of final compounds, in that order. The compounds are separated by newlines.

A common and obligatory first argument of all compounds is the name of the network, called NAME in the following explanations. NAME cannot contain commas. The compounds arc and final take an optional last argument, WEIGHT, which specifies the weight of that arc or final state. A weight is represented as a float, i.e. an optional plus or minus sign followed by at least one digit, optionally followed by a decimal point and at least one digit.

The compound network just specifies that we are dealing with a network. Its format is the following:

network(NAME).


The compound symbol declares a symbol that is included in the network's alphabet but not present in any of the transitions. Its format is the following where argument SYMBOL is the name of the symbol:

symbol(NAME, "SYMBOL").

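The compound arc declares a transition, possibly having a weight. Its format is one of the following, where SOURCE and TARGET are the source and target state numbers, SYMBOL is the transition symbol (if input and output are the same), and INPUT_SYMBOL and OUTPUT_SYMBOL are the input and output symbols (if they are not the same):

arc(NAME, SOURCE, TARGET, "SYMBOL").
arc(NAME, SOURCE, TARGET, "SYMBOL", WEIGHT).
arc(NAME, SOURCE, TARGET, "INPUT_SYMBOL":"OUTPUT_SYMBOL").
arc(NAME, SOURCE, TARGET, "INPUT_SYMBOL":"OUTPUT_SYMBOL", WEIGHT).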


The compound final declares a final state, possibly with a final weight. Its format is one of the following where STATE is the final state number:

final(NAME, STATE).
final(NAME, STATE, WEIGHT).
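
As an illustration of the compound order described above, the following Python sketch writes a network in prolog format. The function name to_prolog is hypothetical and not part of HFST. The "INPUT":"OUTPUT" pair syntax for arcs is assumed from the pair expressions listed under Special symbols, weights are placed as the optional last argument, and symbols are written without any escaping.

```python
def to_prolog(name, transitions, finals, extra_symbols=()):
    """Render a network as prolog compounds (hypothetical helper).

    transitions: iterable of (source, target, input, output, weight or None)
    finals: iterable of (state, weight or None)
    extra_symbols: alphabet symbols that occur in no transition
    """
    lines = [f"network({name})."]
    for sym in extra_symbols:
        lines.append(f'symbol({name}, "{sym}").')
    for src, tgt, isym, osym, weight in transitions:
        # a single symbol when input and output agree, a pair otherwise
        label = f'"{isym}"' if isym == osym else f'"{isym}":"{osym}"'
        suffix = "" if weight is None else f", {weight}"
        lines.append(f"arc({name}, {src}, {tgt}, {label}{suffix}).")
    for state, weight in finals:
        suffix = "" if weight is None else f", {weight}"
        lines.append(f"final({name}, {state}{suffix}).")
    return "\n".join(lines) + "\n"


print(to_prolog("CAT", [(0, 1, "c", "c", None), (1, 2, "a", "b", 0.5)], [(2, None)]), end="")
# network(CAT).
# arc(CAT, 0, 1, "c").
# arc(CAT, 1, 2, "a":"b", 0.5).
# final(CAT, 2).
```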

Special symbols

The special symbols recognised by the tool when using AT&T format:

symbol     representations recognised    note
epsilon    "@0@", "@_EPSILON_SYMBOL_@"   option -e EPS allows the representation EPS
space      "@_SPACE_@"                   an unescaped space is used as a field separator
colon      ":", "@_COLON_@"              the escaped "@_COLON_@" is also supported, as it is
                                         sometimes used by the tools hfst-fst2strings and
                                         hfst-lookup
tabulator  "@_TAB_@"                     an unescaped tabulator is used as a field separator
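
When AT&T text is generated by a script, the separator characters have to be escaped before the fields are joined. A minimal Python sketch, assuming the caller represents epsilon as the empty string; the function names escape_att_symbol and att_line are hypothetical:

```python
def escape_att_symbol(symbol, epsilon="@0@"):
    """Escape one transition symbol for AT&T text (hypothetical helper)."""
    if symbol == "":
        # assumption of this sketch: the empty string stands for epsilon
        return epsilon
    # spaces and tabulators would otherwise be read as field separators
    return symbol.replace(" ", "@_SPACE_@").replace("\t", "@_TAB_@")


def att_line(source, target, isym, osym, weight=None):
    """Build one tab-separated transition line, weight optional."""
    fields = [str(source), str(target),
              escape_att_symbol(isym), escape_att_symbol(osym)]
    if weight is not None:
        fields.append(str(weight))
    return "\t".join(fields)


print(escape_att_symbol("a b"))  # a@_SPACE_@b
```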

The special symbols recognised by the tool when using prolog format:

symbol         representation recognised   note
epsilon        "0"
zero           "%0"                        zero as a part of a symbol does not need to be
                                           escaped, e.g. "100"
identity       "?"                         when used as the only symbol; see also unknown below
unknown        "?"                         when used in expressions of the form "?":"foo",
                                           "foo":"?" and "?":"?"
question mark  "%?"
double quote   "%""
percent sign   "%%"
tabulator      none                        not supported
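
The escapes for literal characters in prolog format can be applied mechanically. The sketch below is a hypothetical Python helper, not HFST code. It assumes that the percent sign, double quote and question mark escapes apply anywhere inside a symbol, and that only the symbol consisting of a single zero needs the "%0" form. It is meant for literal symbols only, so epsilon ("0") and identity or unknown ("?") must be written without it.

```python
def escape_prolog_symbol(symbol):
    """Escape a literal symbol for use inside prolog-format quotes."""
    if symbol == "0":
        return "%0"  # a bare "0" would be read as epsilon
    # escape the percent sign first so later escapes are not doubled
    escaped = symbol.replace("%", "%%")
    escaped = escaped.replace('"', '%"')
    escaped = escaped.replace("?", "%?")
    return escaped


for raw in ["0", "100", "?", "50%", '"']:
    print(raw, "->", escape_prolog_symbol(raw))
# 0 -> %0
# 100 -> 100
# ? -> %?
# 50% -> 50%%
# " -> %"
```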


-- ErikAxelson - 10 Jul 2008