Compile simple (weighted) regular expressions into transducer(s).


The help message:

Usage: hfst-regexp2fst [OPTIONS...] [INFILE]
Compile (weighted) regular expressions into transducer(s)
Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -i, --input=INFILE     Read input transducer from INFILE
  -o, --output=OUTFILE   Write output transducer to OUTFILE
String and format options:
  -f, --format=FMT          Write result in FMT format
  -j, --disjunct            Disjunct all regexps instead of transforming
                            each regexp into a separate transducer
  -l, --line                Input is line separated (default)
  -S, --semicolon           Input is semicolon separated
  -e, --epsilon=EPS         Map EPS as zero, i.e. epsilon.
  -x, --xerox-composition=VALUE Whether flag diacritics are treated as ordinary
                                symbols in composition (default is false).
  -X, --xfst=VARIABLE       Toggle xfst compatibility option VARIABLE.
  -H, --do-not-harmonize    Do not expand '?' symbols.
  -F, --harmonize-flags     Harmonize flag diacritics.
  -E, --encode-weights      Encode weights when minimizing (default is false).

If OUTFILE or INFILE is missing or -, standard streams will be used.
FMT must be one of the following: {foma, sfst, openfst-tropical, openfst-log}.
If EPS is not defined, the default representation of 0 is used.
VALUEs recognized are {true,ON,yes} and {false,OFF,no}.
Xfst variables are {flag-is-epsilon (default OFF)}.

  echo " {cat}:{dog} " | hfst-regexp2fst       create transducer {cat}:{dog}
  echo " {cat}:{dog}::3 " | hfst-regexp2fst    same but with weight 3
  echo " c:d a:o::3 t:g " | hfst-regexp2fst    same but with weight 3
                                               in the middle
  echo " cat ; dog ; "3" " | hfst-regexp2fst -S  create transducers
                                               "cat" and "dog" and "3"

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:

Input formatting

hfst-regexp2fst accepts the same regular expression format as hfst-xfst, with added support for weights. A weight is attached with a double colon, as in {cat}:{dog}::3. The original unweighted regular expression format for describing two-level automata is detailed in the book "Finite-State Morphology" by Kenneth R. Beesley and Lauri Karttunen.
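
As a minimal sketch of the input format, the following script writes the three expressions from the help message to a file, one per line, and compiles them. The file names regexps.txt, regexps.hfst and union.hfst are placeholders chosen for this illustration. The script assumes that a weighted output format such as openfst-tropical is needed for the ::3 weights to be kept, and it runs the compiler only if hfst-regexp2fst is found on the PATH.

```shell
# Write three regular expressions, one per line (line-separated input is the default).
cat > regexps.txt <<'EOF'
{cat}:{dog}
{cat}:{dog}::3
c:d a:o::3 t:g
EOF

# Compile only when the HFST command-line tools are installed.
if command -v hfst-regexp2fst >/dev/null 2>&1; then
    # One transducer per input line, written in a weighted format
    # (assumption: a weighted format is required to keep the weights).
    hfst-regexp2fst -f openfst-tropical -i regexps.txt -o regexps.hfst

    # With -j, all expressions are disjuncted into a single transducer.
    hfst-regexp2fst -j -f openfst-tropical -i regexps.txt -o union.hfst
else
    echo "hfst-regexp2fst not found; skipping compilation" >&2
fi
```

Without -j the output file holds one transducer per expression; with -j it holds a single transducer that is the disjunction of all of them.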