hfst-substitute

Purpose

Relabel transducer arcs or replace them with a copy of a transducer.

Usage

The help message:

Usage: hfst-substitute [OPTIONS...] [INFILE]
Relabel transducer arcs

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -i, --input=INFILE     Read input transducer from INFILE
  -o, --output=OUTFILE   Write output transducer to OUTFILE
Relabeling options:
  -f, --from-label=FLABEL      replace FLABEL
  -t, --to-label=TLABEL        replace with TLABEL
  -T, --to-transducer=TFILE    replace with transducer read from TFILE
  -F, --from-file=LABELFILE    read replacements from LABELFILE
  -R, --in-order               keep the order of the replacements
                               (with -F)
Input options:
  -C, --do-not-convert         require that transducers in TFILE and INFILE
                               have the same type
Transient optimisation schemes:
  -9, --compose                compose substitutions when possible

If OUTFILE or INFILE is missing or -, standard streams will be used.
Format of result depends on format of INFILE
LABEL must be a symbol name in single arc in transducer,
or colon separated pair defining an arc.
If TFILE is specified, FLABEL must be a pair.
LABELFILE is a 2 column tsv file where col 1 is FLABEL
and col 2 gives TLABEL specifications.

Examples:
  hfst-substitute -i tr.hfst -o tr_relabeled.hfst -f 'a' -t 'A'
      relabel all symbols 'a' with 'A'
  hfst-substitute -i tr.hfst -o tr_relabeled.hfst -f 'a:b' -t 'A:B'
      relabel all arcs 'a:b' with 'A:B'
  hfst-substitute -i tr.hfst -o tr_relabeled.hfst -f 'a:b' -T repl.hfst
      replace all arcs 'a:b' with transducer repl.hfst

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
<https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>


The substitution options

--from-label=FLABEL

Defines the label to be replaced in every transition of the input transducer. FLABEL must be a single symbol or a pair of symbols separated by a colon. The replacement label or transducer is given with --to-label or --to-transducer.

--to-label=TLABEL

Defines the label that replaces FLABEL (given with --from-label). If FLABEL is a single symbol, TLABEL must be a single symbol. If FLABEL is a pair of symbols separated by a colon, TLABEL must also be a pair.

--to-transducer=TFILE

Defines the transducer that replaces the label FLABEL given with --from-label. When this option is used, FLABEL must be a pair of symbols. Only one transducer may be given in TFILE.

--from-file=LABELFILE

Defines a set of substitutions that are carried out on the input transducer. Both labels and label pairs may be given in LABELFILE. All label-to-label substitutions and label pair-to-label pair substitutions are performed at the same time, which is faster than doing them separately.

By default, the label-to-label substitutions are done before the label pair-to-label pair substitutions, each group in the order given in LABELFILE. If the substitutions must be done in exactly the order in which they are listed in LABELFILE, use the option --in-order.

--in-order

Perform substitutions given with --from-file in the same order as they are listed in LABELFILE. This is slower than using the default order since substitutions are done one by one, but gives more control over the order of the substitutions.
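
The order can matter when the result of one substitution is the source of another. The following is only an analogy on plain strings using tr and sed, not HFST itself: it assumes two chained substitutions, a to b and b to c, and contrasts applying them all at once with applying them one by one.

```shell
input='ab'

# All at once: each character is mapped exactly once (a->b, b->c).
simultaneous="$(printf '%s\n' "$input" | tr 'ab' 'bc')"

# One by one: the b produced by the first rule is rewritten by the second.
sequential="$(printf '%s\n' "$input" | sed -e 's/a/b/g' -e 's/b/c/g')"

printf 'simultaneous: %s\n' "$simultaneous"   # prints "simultaneous: bc"
printf 'sequential:   %s\n' "$sequential"     # prints "sequential:   cc"
```

Whether a given LABELFILE behaves differently under the two modes depends on whether its substitutions chain in this way.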

Input formats

hfst-substitute supports rewriting a list of symbols or arcs with the --from-file LABELFILE option. The substitution file consists of tab-separated lines with two fields: the FLABEL field and the TLABEL field. With the option --in-order, the -F option in effect runs hfst-substitute once per line, in order, each time giving the first field as the -f parameter and the second field as the -t parameter. Without --in-order, the symbol-to-symbol substitutions are carried out before the symbol pair-to-symbol pair substitutions, each group in the order given in LABELFILE.

       hfst-substitute -o cg.hfst -F omor2cg.relabel omor.hfst
              transform omor tags to cg
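
A LABELFILE can be written by hand or generated. The following sketch builds a small one with printf and checks that every line has exactly two tab-separated fields before passing it to hfst-substitute. The tag names and the file name example.relabel are made up for illustration and are not taken from omor or cg. The hfst-substitute call only runs if the tool and omor.hfst are present.

```shell
# Hypothetical label file: the tag names below are illustrative only.
labelfile="example.relabel"

# One substitution per line: FLABEL <TAB> TLABEL.
# Single symbols map to single symbols, pairs map to pairs.
printf '%s\t%s\n' '+N'  'N'   >  "$labelfile"
printf '%s\t%s\n' '+V'  'V'   >> "$labelfile"
printf '%s\t%s\n' 'a:b' 'A:B' >> "$labelfile"

# Sanity check: every line must have exactly two tab-separated fields.
if awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad }' "$labelfile"; then
    labelfile_ok=yes
else
    labelfile_ok=no
fi

# Run the substitution only when the tool and the input transducer exist.
if [ "$labelfile_ok" = yes ] && command -v hfst-substitute >/dev/null 2>&1 \
        && [ -f omor.hfst ]; then
    hfst-substitute -o cg.hfst -F "$labelfile" omor.hfst
fi
```

Using printf with an explicit \t avoids the common problem of an editor silently replacing tabs with spaces, which would break the two-column format.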

Special symbols

The strings @0@ and @_EPSILON_SYMBOL_@ can be used to denote the epsilon symbol.

Examples

       hfst-substitute -f c -t f -i cat.hfst -o fat.hfst
              replace every c in cat.hfst with f
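
To try this example end to end, a small input transducer is needed. The sketch below assumes that the HFST tools hfst-strings2fst and hfst-fst2strings are installed alongside hfst-substitute. It uses them to build cat.hfst from the string cat and to print the strings of the result, and it does nothing if any of the tools is missing. The temporary directory is only for illustration.

```shell
workdir="$(mktemp -d)"
status=skipped

if command -v hfst-strings2fst >/dev/null 2>&1 \
        && command -v hfst-substitute >/dev/null 2>&1 \
        && command -v hfst-fst2strings >/dev/null 2>&1; then
    # Build a transducer for the single string "cat".
    printf 'cat\n' | hfst-strings2fst -o "$workdir/cat.hfst"

    # Relabel every c with f, as in the example above.
    hfst-substitute -f c -t f -i "$workdir/cat.hfst" -o "$workdir/fat.hfst"

    # Print the strings of the resulting transducer.
    hfst-fst2strings "$workdir/fat.hfst"
    status=done
fi
```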