Difference: HfstCommandLineTools (1 vs. 82)

Revision 82 - 2016-05-20 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 9 to 9
 

Downloading and Installation

Changed:
<
<
NOTE: HFST will be migrated to GitHub at the end of February 2016.

Tools can be fetched from the Sourceforge download page. We offer Debian packages for Linux, a Windows installer as well as a Macport installation. It is also possible to compile the tools from source.

>
>
Tools can be fetched from our download page. We offer Debian packages for Linux, a Windows installer as well as a Macport installation. It is also possible to compile the tools from source.
  For installing from scratch, see the instructions in INSTALL. Briefly, the usual ./configure && make && (sudo) make install should result in a local installation, and make uninstall should remove it. If you would rather install in, e.g., your home directory (or are not the system administrator), you can pass a prefix to ./configure: ./configure --prefix=${HOME}.

Revision 81 - 2016-05-18 - KristerLinden

Line: 323 to 323
  The source code archive contains a test suite, run with make check, which all distributed versions must pass unless they are clearly labeled as alpha test versions.
Deleted:
<
<

  -- TommiPirinen
>
>
-- TommiPirinen -->
 
<-- vim: set ft=twiki: -->

META TOPICMOVED by="TommiPirinen" date="1236767171" from="KitWiki.NotYetHfstToolCommandLines" to="KitWiki.HfstCommandLineTools"
Added:
>
>
META PREFERENCE name="VIEW_TEMPLATE" title="VIEW_TEMPLATE" type="Set" value="FinCLARIN.ViewFinClarinWideEngTemplate"

Revision 80 - 2016-02-23 - ErikAxelson

Line: 9 to 9
 

Downloading and Installation

Added:
>
>
NOTE: HFST will be migrated to GitHub at the end of February 2016.
 Tools can be fetched from the Sourceforge download page. We offer Debian packages for Linux, a Windows installer as well as a Macport installation. It is also possible to compile the tools from source.

For installing from scratch, see instructions in INSTALL. Briefly, the usual ./configure && make &&  (sudo) make install

Revision 79 - 2014-04-01 - ErikAxelson

Line: 53 to 53
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
Changed:
<
<
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Slightly more efficient version of hfst-lookup.
>
>
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. More efficient version of hfst-lookup.
hfst-ospell Spell check using HFST finite-state automata.
 
hfst-pair-test Test a Twol rule file using correspondences of strings.
hfst-pmatch Perform matching/transformation on text streams with a RTN system.
hfst-pmatch2fst Compile regular expressions into transducer(s) for use with hfst-pmatch.

Revision 78 - 2014-02-24 - ErikAxelson

Line: 48 to 48
 
hfst-head Take N first transducers in transducer input.
hfst-info Print known data of HFST library.
hfst-invert Invert each transducer in input.
Changed:
<
<
hfst-lexc A wrapper for foma's lexc, the native tool is hfst-lexc.
>
>
hfst-lexc-wrapper A wrapper for foma's lexc, the native tool is hfst-lexc.
 
hfst-lexc Compile lexicon files in Xerox Lexc formalism into an HFST transducer.
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.

Revision 77 - 2014-02-19 - ErikAxelson

Line: 48 to 48
 
hfst-head Take N first transducers in transducer input.
hfst-info Print known data of HFST library.
hfst-invert Invert each transducer in input.
Changed:
<
<
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
hfst-lexc2fst Legacy tool for compiling lexc files into transducer. See also hfst-lexc.
>
>
hfst-lexc A wrapper for foma's lexc, the native tool is hfst-lexc.
hfst-lexc Compile lexicon files in Xerox Lexc formalism into an HFST transducer.
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.

Revision 76 - 2014-02-10 - ErikAxelson

Line: 48 to 48
 
hfst-head Take N first transducers in transducer input.
hfst-info Print known data of HFST library.
hfst-invert Invert each transducer in input.
Changed:
<
<
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
hfst-lexc2fst Legacy tool for compiling lexc files into transducer. See also hfst-lexc.
>
>
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
hfst-lexc2fst Legacy tool for compiling lexc files into transducer. See also hfst-lexc.
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
Line: 177 to 177
  HFST tools that operate on multiple input parameters are:
Changed:
<
<
hfst-lexc
>
>
hfst-lexc
 

Defining the backend format

Revision 75 - 2013-12-04 - ErikAxelson

Line: 9 to 9
 

Downloading and Installation

Changed:
<
<
Tools can be fetched from the Sourceforge download page.
>
>
Tools can be fetched from the Sourceforge download page. We offer Debian packages for Linux, a Windows installer as well as a Macport installation. It is also possible to compile the tools from source.
 
Changed:
<
<
For installation instructions, see INSTALL. Briefly, the usual ./configure && make && (sudo) make install should result in a local installation, and make uninstall should remove it. If you would rather install in, e.g., your home directory (or are not the system administrator), you can pass a prefix to ./configure: ./configure --prefix=${HOME}
>
>
For installing from scratch, see the instructions in INSTALL. Briefly, the usual ./configure && make && (sudo) make install should result in a local installation, and make uninstall should remove it. If you would rather install in, e.g., your home directory (or are not the system administrator), you can pass a prefix to ./configure: ./configure --prefix=${HOME}.
 

Getting started

Revision 74 - 2013-10-16 - ErikAxelson

Line: 39 to 39
 
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
hfst-duplicate Use first transducer of an archive repeatedly.
hfst-edit-metadata Set values of properties in transducer headers.
Added:
>
>
hfst-foma Wrapper around foma. Native HFST tool is hfst-xfst.
 
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Convert between HFST, OpenFst, SFST and foma transducers.
hfst-fst2strings Display the strings recognized by a transducer.
Line: 80 to 81
 
hfst-traverse Walk through the transducer arc by arc.
hfst-twolc Compile a two-level grammar in Xerox Twolc formalism into an HFST transducer.
hfst-txt2fst Convert AT&T tabular format into a binary transducer.
Changed:
<
<
hfst-xfst Compile files in xfst language into HFST transducers.
>
>
hfst-xfst Compile XFST scripts or use XFST commands in interactive mode.
 

Usage

Revision 73 - 2013-01-16 - ErikAxelson

Line: 60 to 60
 
hfst-pmatch2fst Compile regular expressions into transducer(s) for use with hfst-pmatch.
hfst-proc Perform morphological analysis and generation with finite state transducers.
hfst-project Project a transducer towards input or output level.
Added:
>
>
hfst-prune-alphabet Remove symbols from the alphabet of a transducer that do not occur in any of the transitions.
 
hfst-push-weights Push weights of a transducer towards initial or final state(s).
hfst-regexp2fst Convert regular expression(s) into transducer output.
hfst-remove-epsilons Remove epsilons from transducer input.

Revision 72 - 2012-11-27 - ErikAxelson

Line: 37 to 37
 
hfst-conjunct Conjunct (intersect) two transducer inputs pairwise.
hfst-determinize Determinize transducer input.
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
Changed:
<
<
hfst-duplicate Use first transducer of an archive repeatedly.
>
>
hfst-duplicate Use first transducer of an archive repeatedly.
 
hfst-edit-metadata Set values of properties in transducer headers.
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Convert between HFST, OpenFst, SFST and foma transducers.

Revision 71 - 2012-11-27 - SamHardwick

Line: 56 to 56
 
hfst-name Name or print the name of each transducer in input.
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Slightly more efficient version of hfst-lookup.
hfst-pair-test Test a Twol rule file using correspondences of strings.
Changed:
<
<
hfst-pmatch Perform matching/lookup on text streams.
hfst-pmatch2fst Compile regular expressions into transducer(s).
>
>
hfst-pmatch Perform matching/transformation on text streams with a RTN system.
hfst-pmatch2fst Compile regular expressions into transducer(s) for use with hfst-pmatch.
 
hfst-proc Perform morphological analysis and generation with finite state transducers.
hfst-project Project a transducer towards input or output level.
hfst-push-weights Push weights of a transducer towards initial or final state(s).

Revision 70 - 2012-10-17 - ErikAxelson

Line: 54 to 54
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
Changed:
<
<
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Slightly more efficient version of hfst-lookup.
>
>
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Slightly more efficient version of hfst-lookup.
 
hfst-pair-test Test a Twol rule file using correspondences of strings.
hfst-pmatch Perform matching/lookup on text streams.
hfst-pmatch2fst Compile regular expressions into transducer(s).

Revision 69 - 2012-10-17 - ErikAxelson

Line: 73 to 73
 
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
hfst-summarize Print general information of a transducer.
Added:
>
>
hfst-tag Tag a text file using an hfst tagger.
 
hfst-tail Take N last transducers in the input.
Added:
>
>
hfst-train-tagger Compile training data file into an hfst part-of-speech tagger.
 
hfst-traverse Walk through the transducer arc by arc.
hfst-twolc Compile a two-level grammar in Xerox Twolc formalism into an HFST transducer.
hfst-txt2fst Convert AT&T tabular format into a binary transducer.

Revision 68 - 2012-10-16 - ErikAxelson

Line: 202 to 202
  Tools that take transducer(s) as input use the backend functions of the input transducers and write output in the same format. To use the functions
Changed:
<
<
of a different transducer library, the user must perform explicit conversion with =hfst-fst2fst=. For example,
>
>
of a different transducer library, the user must perform explicit conversion with hfst-fst2fst. For example,
 if transducer.sfst is a binary transducer in SFST format and the user wishes to use foma's inversion function and get the result in SFST format, the following commands are needed:

Revision 67 - 2012-10-15 - ErikAxelson

Line: 38 to 38
 
hfst-determinize Determinize transducer input.
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
hfst-duplicate Use first transducer of an archive repeatedly.
Changed:
<
<
hfst-edit-metadata Name a transducer. Same as hfst-name?
>
>
hfst-edit-metadata Set values of properties in transducer headers.
 
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Convert between HFST, OpenFst, SFST and foma transducers.
hfst-fst2strings Display the strings recognized by a transducer.
Line: 50 to 50
 
hfst-info Print known data of HFST library.
hfst-invert Invert each transducer in input.
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
Changed:
<
<
hfst-lexc2fst Compile lexc files into transducer. Same as hfst-lexc?
>
>
hfst-lexc2fst Legacy tool for compiling lexc files into transducer. See also hfst-lexc.
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
Changed:
<
<
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Same as hfst-lookup?
hfst-pair-test Test a Twol rule file using correspondences of strings. ??
>
>
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Slightly more efficient version of hfst-lookup.
hfst-pair-test Test a Twol rule file using correspondences of strings.
 
hfst-pmatch Perform matching/lookup on text streams.
hfst-pmatch2fst Compile regular expressions into transducer(s).
hfst-proc Perform morphological analysis and generation with finite state transducers.

Revision 66 - 2012-10-10 - ErikAxelson

Line: 58 to 58
 
hfst-pair-test Test a Twol rule file using correspondences of strings. ??
hfst-pmatch Perform matching/lookup on text streams.
hfst-pmatch2fst Compile regular expressions into transducer(s).
Deleted:
<
<
hfst-preprocess-for-optimized-lookup-format Remove epsilons from a transducer. Anything else?
 
hfst-proc Perform morphological analysis and generation with finite state transducers.
hfst-project Project a transducer towards input or output level.
hfst-push-weights Push weights of a transducer towards initial or final state(s).
Line: 71 to 70
 
hfst-shuffle Shuffle two transducers.
hfst-split Write each transducer in the input into a separate file.
hfst-strings2fst Compile string pairs and pair-strings into transducers.
Deleted:
<
<
hfst-strip-header Remove any HFST3 headers. hfst-fst2fst does the same?
 
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
hfst-summarize Print general information of a transducer.

Revision 65 - 2012-10-09 - ErikAxelson

Line: 28 to 28
 

Tool Purpose
Added:
>
>
hfst-affix-guessify Create weighted affix guesser from automaton.
hfst-calculate An alias for hfst-sfstpl2fst.
 
hfst-compare Compare two transducer inputs for equivalence.
hfst-compose Compose two transducer inputs pairwise.
hfst-compose-intersect Compute the intersecting composition of a lexicon transducer and rule transducers.
Line: 35 to 37
 
hfst-conjunct Conjunct (intersect) two transducer inputs pairwise.
hfst-determinize Determinize transducer input.
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
Added:
>
>
hfst-duplicate Use first transducer of an archive repeatedly.
hfst-edit-metadata Name a transducer. Same as hfst-name?
 
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Convert between HFST, OpenFst, SFST and foma transducers.
hfst-fst2strings Display the strings recognized by a transducer.
hfst-fst2txt Print transducer in AT&T tabular format.
Added:
>
>
hfst-grep Search for PATTERN in each FILE or standard input.
hfst-guess Use a guesser (and generator) to guess analyses or inflectional paradigms of unknown words.
hfst-guessify Compile a morphological analyzer into a guesser and generator.
 
hfst-head Take N first transducers in transducer input.
Added:
>
>
hfst-info Print known data of HFST library.
 
hfst-invert Invert each transducer in input.
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
Added:
>
>
hfst-lexc2fst Compile lexc files into transducer. Same as hfst-lexc?
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
Changed:
<
<
hfst-pair-test Test a Twol rule file using correspondences of strings.
>
>
hfst-optimized-lookup Run a transducer on standard input (one word per line) and print analyses. Same as hfst-lookup?
hfst-pair-test Test a Twol rule file using correspondences of strings. ??
hfst-pmatch Perform matching/lookup on text streams.
hfst-pmatch2fst Compile regular expressions into transducer(s).
hfst-preprocess-for-optimized-lookup-format Remove epsilons from a transducer. Anything else?
 
hfst-proc Perform morphological analysis and generation with finite state transducers.
hfst-project Project a transducer towards input or output level.
hfst-push-weights Push weights of a transducer towards initial or final state(s).
Line: 53 to 66
 
hfst-remove-epsilons Remove epsilons from transducer input.
hfst-repeat Repeat a transducer from N to M times.
hfst-reverse Reverse each transducer in input.
Added:
>
>
hfst-reweight Reweight transducer weights simply.
 
hfst-sfstpl2fst Compile files in SFST programming language into HFST transducers.
Added:
>
>
hfst-shuffle Shuffle two transducers.
 
hfst-split Write each transducer in the input into a separate file.
hfst-strings2fst Compile string pairs and pair-strings into transducers.
Added:
>
>
hfst-strip-header Remove any HFST3 headers. hfst-fst2fst does the same?
 
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
hfst-summarize Print general information of a transducer.
hfst-tail Take N last transducers in the input.
Added:
>
>
hfst-traverse Walk through the transducer arc by arc.
 
hfst-twolc Compile a two-level grammar in Xerox Twolc formalism into an HFST transducer.
hfst-txt2fst Convert AT&T tabular format into a binary transducer.
hfst-xfst Compile files in xfst language into HFST transducers.

Revision 64 - 2012-04-12 - ErikAxelson

Line: 41 to 41
 
hfst-fst2txt Print transducer in AT&T tabular format.
hfst-head Take N first transducers in transducer input.
hfst-invert Invert each transducer in input.
Changed:
<
<
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
>
>
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
 
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.

Revision 63 - 2012-04-12 - ErikAxelson

Line: 30 to 30
 
Tool Purpose
hfst-compare Compare two transducer inputs for equivalence.
hfst-compose Compose two transducer inputs pairwise.
Added:
>
>
hfst-compose-intersect Compute the intersecting composition of a lexicon transducer and rule transducers.
 
hfst-concatenate Concatenate two transducer inputs pairwise.
hfst-conjunct Conjunct (intersect) two transducer inputs pairwise.
hfst-determinize Determinize transducer input.
Line: 39 to 40
 
hfst-fst2strings Display the strings recognized by a transducer.
hfst-fst2txt Print transducer in AT&T tabular format.
hfst-head Take N first transducers in transducer input.
Deleted:
<
<
hfst-compose-intersect Compute the intersecting composition of a lexicon transducer and rule transducers.
 
hfst-invert Invert each transducer in input.
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
hfst-lookup Fast look-up of strings in a transducer.

Revision 62 - 2012-04-10 - ErikAxelson

Line: 19 to 19
 

Getting started

Changed:
<
<
  • Get familiar with the different functionalities offered by the HFST tools.
>
>
  • Get familiar with the different functionalities offered by the HFST tools.
 
  • Examples of HFST command line tools are given in tool-specific wiki pages.
Changed:
<
<
  • The rest of this page gives information on parameters and formats recognized by the HFST tools.
>
>
  • The rest of this page lists the HFST tools and their purposes and gives information on parameters and formats recognized by the tools.

Command Line Utilities

Tool Purpose
hfst-compare Compare two transducer inputs for equivalence.
hfst-compose Compose two transducer inputs pairwise.
hfst-concatenate Concatenate two transducer inputs pairwise.
hfst-conjunct Conjunct (intersect) two transducer inputs pairwise.
hfst-determinize Determinize transducer input.
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Convert between HFST, OpenFst, SFST and foma transducers.
hfst-fst2strings Display the strings recognized by a transducer.
hfst-fst2txt Print transducer in AT&T tabular format.
hfst-head Take N first transducers in transducer input.
hfst-compose-intersect Compute the intersecting composition of a lexicon transducer and rule transducers.
hfst-invert Invert each transducer in input.
hfst-lexc Compile lexicon files in Xerox Lexc. formalism into an HFST transducer.
hfst-lookup Fast look-up of strings in a transducer.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
hfst-pair-test Test a Twol rule file using correspondences of strings.
hfst-proc Perform morphological analysis and generation with finite state transducers.
hfst-project Project a transducer towards input or output level.
hfst-push-weights Push weights of a transducer towards initial or final state(s).
hfst-regexp2fst Convert regular expression(s) into transducer output.
hfst-remove-epsilons Remove epsilons from transducer input.
hfst-repeat Repeat a transducer from N to M times.
hfst-reverse Reverse each transducer in input.
hfst-sfstpl2fst Compile files in SFST programming language into HFST transducers.
hfst-split Write each transducer in the input into a separate file.
hfst-strings2fst Compile string pairs and pair-strings into transducers.
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
hfst-summarize Print general information of a transducer.
hfst-tail Take N last transducers in the input.
hfst-twolc Compile a two-level grammar in Xerox Twolc formalism into an HFST transducer.
hfst-txt2fst Convert AT&T tabular format into a binary transducer.
hfst-xfst Compile files in xfst language into HFST transducers.
 

Usage

Line: 168 to 210
 

Tool-specific parameters

Changed:
<
<
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed below.

Command Line Utilities

>
>
For more detailed info on the parameters of a tool, see its wiki page.
 
Deleted:
<
<
Tool Purpose
hfst-compare Test if two transducer inputs are equivalent pairwise.
hfst-compose Compose two transducer inputs pairwise.
hfst-concatenate Concatenate two transducer inputs pairwise.
hfst-conjunct Conjunct (intersect) two transducer inputs pairwise.
hfst-determinize Determinize transducer input.
hfst-disjunct Disjoin (calculate the union of) two transducer inputs pairwise.
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Change the implementation format of transducer input.
hfst-fst2strings Print string pairs recognized by transducer input.
hfst-fst2txt Convert transducer input into AT&T text format.
hfst-head Take N first transducers in transducer input.
hfst-compose-intersect Compose a lexicon transducer with the logical intersection of two-level rule transducers
hfst-invert Invert each transducer in input.
hfst-lexc ...
hfst-lookup Perform efficient lookup in transducer input.
hfst-minimize Minimize transducer input.
hfst-name Name or print the name of each transducer in input.
hfst-pair-test ...
hfst-proc ...
hfst-project Extract input or output level of transducer input.
hfst-push-weights Push weights towards initial or final state(s) for each transducer in input.
hfst-regexp2fst Convert regular expression(s) into transducer output.
hfst-remove-epsilons Remove epsilons from transducer input.
hfst-repeat ...
hfst-reverse Reverse each transducer in input.
hfst-sfstpl2fst Convert SFST programming language script(s?) into transducer output.
hfst-split Write each transducer in the input into a separate file.
hfst-strings2fst Convert pair strings and string pairs into transducer output.
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
hfst-summarize Print some general information about transducers.
hfst-tail Take N last transducers in the input.
hfst-twolc ...
hfst-txt2fst Convert AT&T text format into HFST format transducer output.
hfst-xfst Compile XFST scripts into transducer output.
 

Transducer and file formats

Revision 61 - 2012-04-05 - ErikAxelson

Line: 105 to 105
 hfst-compare, hfst-compose, hfst-concatenate, hfst-conjunct, hfst-disjunct, hfst-subtract
Added:
>
>
hfst-compose-intersect takes two transducer inputs as parameters in the same way as the rest of the binary tools, although it processes the inputs in a slightly different way.
 

Parameters for tools operating on arbitrary number of transducers

For tools that operate on an arbitrary number of input transducers, the list of filenames must be given as free parameters on the command line, e.g.:

Line: 115 to 118
  HFST tools that operate on multiple input parameters are:
Changed:
<
<
hfst-compose-intersect, hfst-lexc
>
>
hfst-lexc
 

Defining the backend format

Revision 60 - 2012-04-05 - ErikAxelson

Line: 237 to 237
 

Transducer input and output

The tools support transducer files (or pipelined transducer input/output) containing a sequence of transducers.

Added:
>
>
 If tools that take a single transducer input are used, the tools repeat the operation for each input transducer in the input as if they had been provided in separate invocations.
Changed:
<
<
For tools that require two transducer inputs, the operation depends on specific tool; currently composition composes pairwise, repeating the last transducer for the input that contains fewer transducers. All other tools operate pairwise and exit as soon as either stream is exhausted. TODO: check how this works
>
>
The tools that take two transducer inputs repeat the operation pairwise for each pair of transducers read from the inputs. The two inputs must contain the same number of transducers; otherwise the program prints an error message and exits. The exception is the case where the first input contains one or more transducers and the second contains exactly one. In that case the operation is applied to each transducer in the first input, and the second transducer stays the same throughout.

The tool hfst-compose-intersect allows both transducer inputs to contain one or more transducers, and the inputs can have a different number of transducers. The tool applies composing intersection to each transducer in the first input, and the set of transducers (i.e. all transducers read from the second input) remains the same throughout.
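
The pairing rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not HFST code: transducers are stood in for by arbitrary Python values, and the function names apply_unary, apply_binary and apply_compose_intersect are hypothetical names chosen for this example. How the real tools treat empty inputs is not stated above, so the sketch makes its own assumption there.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def apply_unary(op: Callable[[T], T], inputs: Sequence[T]) -> List[T]:
    """Single-input tools: repeat the operation for every transducer read."""
    return [op(t) for t in inputs]


def apply_binary(op: Callable[[T, T], T],
                 first: Sequence[T],
                 second: Sequence[T]) -> List[T]:
    """Two-input tools: pair the inputs up, or reuse a lone second transducer."""
    # Exception case: the first input has one or more transducers and the
    # second has exactly one, which is then reused for every pair.
    if len(first) >= 1 and len(second) == 1:
        return [op(a, second[0]) for a in first]
    # Otherwise both inputs must hold the same number of transducers.
    if len(first) != len(second):
        raise ValueError(
            f"inputs contain {len(first)} and {len(second)} transducers"
        )
    return [op(a, b) for a, b in zip(first, second)]


def apply_compose_intersect(op: Callable[[T, Sequence[T]], T],
                            lexicons: Sequence[T],
                            rules: Sequence[T]) -> List[T]:
    """hfst-compose-intersect model: the whole rule set is reused per lexicon."""
    # Assumption of this sketch: both inputs must be non-empty.
    if not lexicons or not rules:
        raise ValueError("both inputs must contain at least one transducer")
    return [op(lexicon, rules) for lexicon in lexicons]


if __name__ == "__main__":
    # Strings stand in for transducers; the "operation" just builds a label.
    compose = lambda a, b: f"({a} .o. {b})"
    print(apply_binary(compose, ["A1", "A2", "A3"], ["B"]))
    print(apply_binary(compose, ["A1", "A2"], ["B1", "B2"]))
```

Checking the single-transducer exception before the length comparison keeps the two rules separate: a second input of length one never triggers the mismatch error as long as the first input is non-empty.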

 To further operate on sequences of transducers, tools hfst-head, hfst-tail and hfst-split can be used.

Revision 59 - 2012-04-03 - KimmoKoskenniemi

Line: 181 to 181
 
hfst-format Give the implementation format of transducer input.
hfst-fst2fst Change the implementation format of transducer input.
hfst-fst2strings Print string pairs recognized by transducer input.
Changed:
<
<
hfst-fst2txt Print transducer input in AT&T text format.
>
>
hfst-fst2txt Convert transducer input into AT&T text format.
 
hfst-head Take N first transducers in transducer input.
Changed:
<
<
hfst-compose-intersect ...
>
>
hfst-compose-intersect Compose a lexicon transducer with the logical intersection of two-level rule transducers
 
hfst-invert Invert each transducer in input.
hfst-lexc ...
hfst-lookup Perform efficient lookup in transducer input.
Line: 202 to 202
 
hfst-strings2fst Convert pair strings and string pairs into transducer output.
hfst-subtract Subtract pairwise two transducer inputs.
hfst-substitute Substitute transition(s) in each input transducer with another transition(s) or a transducer.
Changed:
<
<
hfst-summarize Print information on each input transducer.
>
>
hfst-summarize Print some general information about transducers.
 
hfst-tail Take N last transducers in the input.
hfst-twolc ...
Changed:
<
<
hfst-txt2fst Convert AT&T text format into transducer output.
hfst-xfst Convert XFST scripts into transducer output?
>
>
hfst-txt2fst Convert AT&T text format into HFST format transducer output.
hfst-xfst Compile XFST scripts into transducer output.
 

Transducer and file formats

Revision 58 - 2012-04-02 - ErikAxelson

Line: 74 to 74
  Unary operations are:
Changed:
<
<
>
>
hfst-determinize, hfst-fst2strings, hfst-fst2txt, hfst-fst2fst, hfst-head, hfst-invert, hfst-minimize, hfst-name, hfst-project, hfst-push-weights, hfst-remove-epsilons, hfst-repeat, hfst-reverse, hfst-split, hfst-strings2fst, hfst-summarize, hfst-tail, hfst-txt2fst
 

Parameters for binary operator tools

Line: 115 to 102
  Binary operations are:
Changed:
<
<
>
>
hfst-compare, hfst-compose, hfst-concatenate, hfst-conjunct, hfst-disjunct, hfst-subtract
 

Parameters for tools operating on arbitrary number of transducers

Line: 132 to 115
  HFST tools that operate on multiple input parameters are:
Changed:
<
<
>
>
hfst-compose-intersect, hfst-lexc
 

Defining the backend format

Line: 189 to 171
 
Changed:
<
<
Utility Name
HfstCompare hfst-compare
HfstCompose hfst-compose
HfstConcatenate hfst-concatenate
HfstConjunct hfst-conjunct
HfstDeterminize hfst-determinize
HfstDisjunct hfst-disjunct
HfstFormat hfst-format
HfstFst2Fst hfst-fst2fst
HfstFst2Strings hfst-fst2strings
HfstFst2Txt hfst-fst2txt
HfstHead hfst-head
HfstComposeIntersect hfst-compose-intersect
HfstInvert hfst-invert
HfstLexc2Fst hfst-lexc − A Lexicon Compiler
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstName hfst-name
HfstPairTest hfst-pair-test
HfstProc hfst-proc
HfstProject hfst-project
HfstPushWeights hfst-push-weights
HfstRegexp2Fst hfst-regexp2fst
HfstRemoveEpsilons hfst-remove-epsilons
HfstRepeat hfst-repeat
HfstReverse hfst-reverse
HfstSfstPl2Fst hfst-sfstpl2fst − An SFST Programming Language Compiler
HfstSplit hfst-split
HfstStrings2Fst hfst-strings2fst
HfstSubtract hfst-subtract
HfstSubstitute hfst-substitute
HfstSummarize hfst-summarize
HfstTail hfst-tail
HfstTwolC hfst-twolc − A Two-Level Grammar Compiler
HfstTxt2Fst hfst-txt2fst
HfstXfst hfst-xfst − An Xfst Compiler
<-- | HfstFlagDiacritics | hfst-flag-diacritics | Does not yet work. | -->
>
>
| Tool | Purpose |
| hfst-compare | Test if two transducer inputs are equivalent pairwise. |
| hfst-compose | Compose two transducer inputs pairwise. |
| hfst-concatenate | Concatenate two transducer inputs pairwise. |
| hfst-conjunct | Conjunct (intersect) two transducer inputs pairwise. |
| hfst-determinize | Determinize transducer input. |
| hfst-disjunct | Disjoin (calculate the union of) two transducer inputs pairwise. |
| hfst-format | Give the implementation format of transducer input. |
| hfst-fst2fst | Change the implementation format of transducer input. |
| hfst-fst2strings | Print string pairs recognized by transducer input. |
| hfst-fst2txt | Print transducer input in AT&T text format. |
| hfst-head | Take N first transducers in transducer input. |
| hfst-compose-intersect | ... |
| hfst-invert | Invert each transducer in input. |
| hfst-lexc | ... |
| hfst-lookup | Perform efficient lookup in transducer input. |
| hfst-minimize | Minimize transducer input. |
| hfst-name | Name or print the name of each transducer in input. |
| hfst-pair-test | ... |
| hfst-proc | ... |
| hfst-project | Extract input or output level of transducer input. |
| hfst-push-weights | Push weights towards initial or final state(s) for each transducer in input. |
| hfst-regexp2fst | Convert regular expression(s) into transducer output. |
| hfst-remove-epsilons | Remove epsilons from transducer input. |
| hfst-repeat | ... |
| hfst-reverse | Reverse each transducer in input. |
| hfst-sfstpl2fst | Convert SFST programming language script(s?) into transducer output. |
| hfst-split | Write each transducer in the input into a separate file. |
| hfst-strings2fst | Convert pair strings and string pairs into transducer output. |
| hfst-subtract | Subtract pairwise two transducer inputs. |
| hfst-substitute | Substitute transition(s) in each input transducer with another transition(s) or a transducer. |
| hfst-summarize | Print information on each input transducer. |
| hfst-tail | Take N last transducers in the input. |
| hfst-twolc | ... |
| hfst-txt2fst | Convert AT&T text format into transducer output. |
| hfst-xfst | Convert XFST scripts into transducer output? |
 

Transducer and file formats

Changed:
<
<
The software is essentially created around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs.
>
>
The software is essentially created around the concept of synchronized transducers, i.e. the input and the output symbols of a transducer are synchronized symbol pairs.
 In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently Unicode with UTF-8 is used in all utilities, and all our demo lexicons are implemented in UTF-8 (even the English lexicon).
Line: 242 to 223
 

Transducer formats

Changed:
<
<
HFST 3 stores automata in HFST automata container format, which consists of HFST3\0 magic sequence, HFST 3 metadata header, and the backend's own automaton in original format.
>
>
HFST 3 stores automata in HFST automata container format, which consists of HFST3\0 magic sequence, HFST 3 metadata header, and the backend's own automaton in original format.
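
As a rough illustration of the container layout described above, the following sketch checks whether a file begins with the six-byte magic sequence (the characters HFST3 followed by a NUL byte). The function name has_hfst3_magic is made up for this example and is not part of HFST; the sketch also assumes a head that accepts -c, and it says nothing about the metadata header that follows the magic sequence.

```shell
# Hypothetical helper, not an HFST tool: succeed if FILE starts with "HFST3\0".
has_hfst3_magic() {
    # Dump the first six bytes as hex and strip all whitespace from od's output.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    # 48 46 53 54 33 00 are the bytes H F S T 3 NUL.
    [ "$magic" = "484653543300" ]
}
```

A call such as has_hfst3_magic transducer.hfst && echo "HFST 3 container" would then give a quick sanity check before a file is passed to the tools.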
 
Changed:
<
<
For operations that input and output transducers, the output is always of same type as input(s) and this cannot be overriden (except for hfst-fst2fst).
>
>
For operations that input and output transducers, the output is always of the same type as the input(s), and this cannot be overridden (except with hfst-fst2fst).
 The command line tools are file-format independent: they select the function based on the input data type.
Changed:
<
<
All input transducers are assumed to have the same binary format (which is concluded from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between different binary formats, the tool hfst-fst2fst is provided. Tools which operate on multiple transducers will issue error message if fed with different types of transducers.

Transducer archive files

The tools support transducer archives, that is, streams containing catenation of multiple transducers. If tools that take a single input transducer stream are used, the tools repeat the operation for each input as if they had been provided in separate invocations. For tools that require two input streams, the operation depends on specific tool; currently composition composes pairwise, repeating the last transducer for stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools hfst-head etc. can be used.

>
>
All input transducers are assumed to have the same binary format (which is inferred from the first transducer in the first transducer input). In the text-to-transducer tools, the format is selected by the user with options. To convert between different transducer binary formats, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if given transducers of different types.

Transducer input and output

The tools support transducer files (or pipelined transducer input/output) containing a sequence of transducers. A tool that takes a single transducer input repeats the operation for each transducer in the input, as if each had been provided in a separate invocation. For tools that require two transducer inputs, the behaviour depends on the specific tool; currently composition composes pairwise, repeating the last transducer of the input that contains fewer transducers. All other tools operate pairwise and exit as soon as either stream is exhausted. (TODO: check how this works.) To further operate on sequences of transducers, the tools hfst-head, hfst-tail and hfst-split can be used.
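
The pairing rule for two-input tools can be modelled without running any HFST tool. The sketch below is only a model of the behaviour stated in the paragraph above (which is itself marked as unverified): the function pair_indices and its mode names repeat-last and stop-early are invented for this example, and both inputs are assumed to contain at least one transducer.

```shell
# Hypothetical model: print which transducer of input 1 is paired with which of input 2.
# MODE is "repeat-last" (composition, as described above) or "stop-early" (other tools).
pair_indices() {
    mode=$1; n1=$2; n2=$3
    if [ "$mode" = repeat-last ]; then
        total=$(( n1 > n2 ? n1 : n2 ))   # run until the longer input is exhausted
    else
        total=$(( n1 < n2 ? n1 : n2 ))   # stop when the shorter input is exhausted
    fi
    i=1
    while [ "$i" -le "$total" ]; do
        a=$(( i < n1 ? i : n1 ))         # clamp to the last transducer of input 1
        b=$(( i < n2 ? i : n2 ))         # clamp to the last transducer of input 2
        printf '%s:%s\n' "$a" "$b"
        i=$(( i + 1 ))
    done
}

pair_indices repeat-last 3 1
pair_indices stop-early 3 2
```

With three transducers in the first input and one in the second, the repeat-last model pairs every transducer of the first input with the single transducer of the second; the stop-early model with three and two transducers produces only two pairs.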

 

Transition symbols

Changed:
<
<
For tools that take strings or AT&T text format as input or output, the following special symbols are reserved:
>
>
For tools that take strings or AT&T text format as input or print them as output, the following special symbols are reserved:
 
symbol meaning
"@_EPSILON_SYMBOL_@" The epsilon.

Revision 572012-04-02 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 305 to 305
 
-- TommiPirinen
<-- vim: set ft=twiki: -->

Revision 562012-03-26 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 135 to 135
 
Changed:
<
<

Parameters for tools creating automata

>
>

Defining the backend format

 
Changed:
<
<
The tools that create automata may specify details as command-line options.
>
>
The tools that create transducers from scratch (hfst-sfstpl2fst, hfst-regexp2fst, hfst-strings2fst) or AT&T format (hfst-txt2fst) or perform binary format conversion (hfst-fst2fst) may specify the backend format of the resulting binary transducer(s).
 
Changed:
<
<
-f, --format=FORMAT Use FORMAT backend for automata operations
-w, --weight=NUMBER Use NUMBER as default weight instead of semiring one
>
>
-f, --format=FORMAT Use backend FORMAT
  Legal values of FORMAT depend on which backend libraries were compiled into HFST.
Changed:
<
<
The list of names supported is in hfst-commandline.cc in function hfst_parse_format_name(const char*). At time of HFST 3, the following strings are mapped:
>
>
The default backend format, if available, is openfst-tropical. All supported backend formats can be listed with the command hfst-format --list.
<-- The list of names supported is in hfst-commandline.cc in function hfst_parse_format_name(const char*). -->
At time of HFST 3, the following strings are allowed:
 
Changed:
<
<
allowed strings used backend
>
>
allowed strings used backend note
 
sfst SFST backend
ofst-tropical, openfst-tropical, openfst, ofst OpenFST standard automata with tropical semiring weights (default)
ofst-log, openfst-log OpenFST with log weights
foma foma backend
Changed:
<
<
optimized-lookup-weighted, olw, optimized-lookup, ol HFST's lookup-optimized automata with weights
optimized-lookup-unweighted, olu HFST's lookup-optimized automata without weights
>
>
optimized-lookup-weighted, olw, optimized-lookup, ol HFST's lookup-optimized automata with weights Not supported by hfst-regexp2fst or hfst-sfstpl2fst
optimized-lookup-unweighted, olu HFST's lookup-optimized automata without weights Not supported by hfst-regexp2fst or hfst-sfstpl2fst

Tools that take transducer(s) as input use the backend functions of the input transducers and write output in the same format. To use the functions of a different transducer library, the user must perform explicit conversion with hfst-fst2fst. For example, if transducer.sfst is a binary transducer in SFST format and the user wishes to use foma's inversion function and get the result in SFST format, the following commands are needed:

cat transducer.sfst | hfst-fst2fst --format foma | hfst-invert | hfst-fst2fst --format sfst 
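
The same convert, operate, convert-back pattern applies to any unary tool and backend. As a small sketch, the helper below only prints the pipeline text rather than running it, so it works even where HFST is not installed; the function name via_backend_pipeline is invented for this example, and it uses only the hfst-fst2fst --format option shown above.

```shell
# Hypothetical helper: print a pipeline that runs TOOL with backend VIA
# and converts the result back to backend BACK.
via_backend_pipeline() {
    via=$1; back=$2; tool=$3
    printf 'hfst-fst2fst --format %s | %s | hfst-fst2fst --format %s\n' \
        "$via" "$tool" "$back"
}

via_backend_pipeline foma sfst hfst-invert
```

On a system with the tools installed, the printed text could be run on a transducer with something like cat transducer.sfst | eval "$(via_backend_pipeline foma sfst hfst-invert)".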

The optimized lookup backend format is not supported by most tools, as it is mainly intended for fast lookup. The tools that support it are hfst-lookup, hfst-fst2fst, hfst-txt2fst, hfst-fst2txt, hfst-strings2fst and hfst-format.

Parameters for tools that support weights given on the command line

The tools hfst-regexp2fst and hfst-strings2fst have the following option:

-w, --weight=NUMBER Use NUMBER as default weight instead of semiring one
 
Changed:
<
<
For operations that add weight, the weight NUMBER is parsed using
>
>
The weight NUMBER is parsed using
 standard library's strtod(3) implementation. The semantics for weights depends on selected backend.

Revision 552012-02-24 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 18 to 18
 

Getting started

Changed:
<
<
>
>
 
  • Get familiar with the different functionalities offered by the HFST tools.
  • Examples of HFST command line tools are given in tool-specific wiki pages.
  • The rest of this page gives information on parameters and formats recognized by the HFST tools.

Revision 542012-02-23 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 18 to 18
 

Getting started

Changed:
<
<
  • HfstCommandLineToolsTutorial a couple of hands-on examples
  • HfstOutline to get familiar with the different functionalities offered by the HFST tools
  • More examples of HFST command line tools are also given in tool-specific wiki pages listed here.
>
>
  • A tutorial with hands-on examples.
  • Get familiar with the different functionalities offered by the HFST tools.
  • Examples of HFST command line tools are given in tool-specific wiki pages.
 
  • The rest of this page gives information on parameters and formats recognized by the HFST tools.

Usage

Revision 532012-02-22 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Changed:
<
<
HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using the HFST interface.
>
>
HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface.
 The tools are licensed under GNU GPL version 3; the licence text can be found in the file COPYING. Other licences may be available on request from the authors listed in the file AUTHORS.
Changed:
<
<
HfstOutline lists most of the command line tools and their purpose.

Examples of command line tools are given in HfstOutline, tool-specific pages and HfstCommandLineToolExamples.

Downloading

>
>

Downloading and Installation

  Tools can be fetched from the Sourceforge download page.
Changed:
<
<

Installation

For installing instructions, see INSTALL. Briefly, the usual

        ./configure
        make
        (as root) make install

should result in a local installation and

        make uninstall

in its uninstallation. If you would rather install in eg. your home directory (or aren't the system administrator), you can tell ./configure:

        ./configure --prefix=${HOME}
<--  
-->
>
>
For installation instructions, see INSTALL. Briefly, the usual ./configure && make && (sudo) make install should result in a local installation, and make uninstall should remove it. If you would rather install in e.g. your home directory (or aren't the system administrator), you can pass a prefix to ./configure: ./configure --prefix=${HOME}

Getting started

  • HfstCommandLineToolsTutorial a couple of hands-on examples
  • HfstOutline to get familiar with the different functionalities offered by the HFST tools
  • More examples of HFST command line tools are also given in tool-specific wiki pages listed here.
  • The rest of this page gives information on parameters and formats recognized by the HFST tools.
 

Usage

Revision 522011-12-12 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Changed:
<
<
HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using the HFST interface. The tools are licenced under GNU GPL version 3 (other licences may be available at request). Licence text can be found from file COPYING. Other licences are possible, and can be given by authors found in AUTHORS file.
>
>
HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using the HFST interface. The tools are licenced under GNU GPL version 3 (other licences may be available at request). Licence text can be found from file COPYING. Other licences are possible, and can be given by authors found in AUTHORS file.
  HfstOutline lists most of the command line tools and their purpose.
Changed:
<
<
Examples of commandline tools are given in HfstOutline and HfstCommandLineToolExamples.
>
>
Examples of command line tools are given in HfstOutline, tool-specific pages and HfstCommandLineToolExamples.
 

Downloading

Line: 22 to 25
 hfst-toolname [OPTIONS] [FILE...]
Changed:
<
<
HFST tools contain number of different command line utilities, and their parameters vary on case by case basis. If in doubt, parameter --help will always tell the parameters of a tool. For further instructions and examples, see tool-specific wiki pages listed here.
>
>
HFST tools contain a number of different command line utilities, and their parameters vary on a case-by-case basis. If in doubt, the parameter --help will always show the parameters of a tool. For further instructions and examples, see the tool-specific wiki pages listed here.
 

Common parameters

Line: 43 to 48
 hfst-toolname > text.txt
Changed:
<
<
If the output transducer is written into the standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
>
>
If the resulting transducer is written into the standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
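
Because diagnostics go to standard error, the binary result and the messages can be captured separately with ordinary shell redirection. The sketch below uses a stand-in function, fake_tool, in place of a real HFST tool so that it runs anywhere; the function, the file names and the message text are all made up for the illustration.

```shell
# Stand-in for an HFST tool (hypothetical): payload on stdout, diagnostics on stderr.
fake_tool() {
    printf 'TRANSDUCER-BYTES'
    echo 'warning: something worth noting' >&2
}

workdir=$(mktemp -d)
# The transducer goes to one file, warnings and verbose output to another.
fake_tool > "$workdir/result.hfst" 2> "$workdir/messages.log"
cat "$workdir/messages.log"
```

The same redirections apply when a real tool is used in place of fake_tool, which keeps warnings from ending up inside a binary transducer file or a downstream pipe.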
 

Input parameters for unary operator tools

Changed:
<
<
The input filename may also be specified as free argument of command line, that is, the following are equivalent in terms of input file processing:
>
>
The input filename may also be specified as a free argument on the command line or given through standard input; that is, the following are equivalent in terms of input file processing:
 
hfst-toolname --input=transducer.hfst
Line: 96 to 102
 hfst-toolname --input1=first.hfst second.hfst
hfst-toolname --input2=second.hfst first.hfst
cat first.hfst | hfst-toolname --input2=second.hfst
Added:
>
>
cat first.hfst | hfst-toolname second.hfst
 cat second.hfst | hfst-toolname --input1=first.hfst
Deleted:
<
<
cat second.hfst | hfst-toolname first.hfst
 
Changed:
<
<
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s output level is matched against input2’s input level.
>
>
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s output level is matched against input2’s input level.
  Binary operations are:
Line: 131 to 138
 
-f, --format=FORMAT Use FORMAT backend for automata operations
-w, --weight=NUMBER Use NUMBER as default weight instead of semiring one
Changed:
<
<
Legal parameters of FORMAT depend on backend library supports compiled in HFST. The list of names supported is in hfst-commandline.cc function hfst_parse_format_name(const char*). At time of HFST 3.0_beta release the following strings were mapped:

*allowed strings* *used backend*
sfst SFST backend
ofst-tropical, openfst-tropical, openfst, ofst OpenFST standard automata with tropical semiring weights (default)
ofst-log, openfst-log OpenFST with log weights
foma foma backend
optimized-lookup-weighted, olw, optimized-lookup, ol HFST's lookup-optimized automata with weights
optimized-lookup-unweighted, olu HFST's lookup-optimized automata without weights
>
>
Legal parameters of FORMAT depend on backend library supports compiled in HFST. The list of names supported is in hfst-commandline.cc in function hfst_parse_format_name(const char*). At time of HFST 3, the following strings are mapped:

allowed strings used backend
sfst SFST backend
ofst-tropical, openfst-tropical, openfst, ofst OpenFST standard automata with tropical semiring weights (default)
ofst-log, openfst-log OpenFST with log weights
foma foma backend
optimized-lookup-weighted, olw, optimized-lookup, ol HFST's lookup-optimized automata with weights
optimized-lookup-unweighted, olu HFST's lookup-optimized automata without weights
  For operations that add weight, the weight NUMBER is parsed using
Changed:
<
<
standard library's strtod(3) implementation. The semantics for
>
>
standard library's strtod(3) implementation. The semantics for
 weights depends on selected backend.

Tool-specific parameters

Line: 153 to 162
 
Changed:
<
<
Utilities Comment
>
>
Utility Name
 
HfstCompare hfst-compare
HfstCompose hfst-compose
HfstConcatenate hfst-concatenate
Line: 166 to 175
 
HfstFst2Txt hfst-fst2txt
HfstHead hfst-head
HfstComposeIntersect hfst-compose-intersect
Deleted:
<
<
HfstFlagDiacritics hfst-flag-diacritics Does not yet work.
 
HfstInvert hfst-invert
HfstLexc2Fst hfst-lexc − A Lexicon Compiler
HfstLookUp hfst-lookup
Line: 190 to 198
 
HfstTwolC hfst-twolc − A Two-Level Grammar Compiler
HfstTxt2Fst hfst-txt2fst
HfstXfst hfst-xfst − An Xfst Compiler
Added:
>
>
<-- | HfstFlagDiacritics | hfst-flag-diacritics | Does not yet work. | -->
 

Transducer and file formats

Changed:
<
<
The software is essentially created around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently unicode with UTF8 is used in all utilities and all our demo lexicons are implemented in UTF8 (even the English lexicon).

In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer inbetween.

>
>
The software is essentially created around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently Unicode with UTF8 is used in all utilities and all our demo lexicons are implemented in UTF8 (even the English lexicon).

In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer in between.
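
Under the delimiter rule just described, the number of transducers in a text-format file is one more than the number of delimiter lines, since a trailing delimiter announces a final empty transducer. A minimal sketch of that count follows; the function name count_text_transducers is made up for this example and it is not an HFST tool.

```shell
# Hypothetical helper: count transducers in a text-format file,
# where a line consisting of exactly "--" separates two transducers.
count_text_transducers() {
    awk '$0 == "--" { n++ } END { print n + 1 }' "$1"
}
```

For a file holding one transducer, a delimiter line and a second transducer, the function prints 2; it also prints 2 for a file that ends in a delimiter line, matching the rule that an empty transducer follows.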

 

Transducer formats

HFST 3 stores automata in HFST automata container format, which consists of HFST3\0 magic sequence, HFST 3 metadata header, and the backend's own automaton in original format.

Changed:
<
<
For operations that input and output transducers, the output is always of same type as input(s) and this cannot be overriden (except for hfst-fst2fst). The command line tools are file-format independent, they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is concluded from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between different binary formats, the tool hfst-fst2fst is provided. Tools which operate on multiple transducers will issue error message if fed with different types of transducers.
>
>
For operations that input and output transducers, the output is always of same type as input(s) and this cannot be overriden (except for hfst-fst2fst). The command line tools are file-format independent, they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is concluded from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between different binary formats, the tool hfst-fst2fst is provided. Tools which operate on multiple transducers will issue error message if fed with different types of transducers.
 

Transducer archive files

Changed:
<
<
The tools support transducer archives, that is, streams containing catenation of multiple transducers. If tools that take a single input transducer stream are used, the tools repeat the operation for each input as if they had been provided in separate invocations. For tools that require two input streams, the operation depends on specific tool; currently composition composes pairwise, repeating the last transducer for stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools hfst-head etc. can be used.
>
>
The tools support transducer archives, that is, streams containing catenation of multiple transducers. If tools that take a single input transducer stream are used, the tools repeat the operation for each input as if they had been provided in separate invocations. For tools that require two input streams, the operation depends on specific tool; currently composition composes pairwise, repeating the last transducer for stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools hfst-head etc. can be used.
 

Transition symbols

Line: 213 to 238
 
symbol meaning
"@_EPSILON_SYMBOL_@" The epsilon.
Added:
>
>
"@0@" An alternative representation of the epsilon.
 
"@_UNKNOWN_SYMBOL_@" Any symbol not known to a transducer.
"@_IDENTITY_SYMBOL_@" Any identity symbol pair not known to a transducer.
Changed:
<
<
Some tools may take input or produce output that uses a different formalism. For instance, in SFST programming language the epsilon is always denoted as "<>".
>
>
Some tools may take input or produce output that uses a different formalism. For instance, in SFST programming language the epsilon is always denoted as "<>".
 However, the resulting transducers always use the above-mentioned special symbols:
Changed:
<
<
$ echo "<>:a" | hfst-calculate -f sfst | hfst-fst2txt @_EPSILON_SYMBOL_@ a 0 1
>
>
$ echo "<>:a" | hfst-sfstpl2fst -f sfst | hfst-fst2txt
@0@ a 0 1
 1

Changed:
<
<
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers is done internally. If the mappings differ between transducers, harmonization is carried out.
>
>
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers is done internally. If the mappings differ between transducers, harmonization is carried out.
 

Reporting bugs

Changed:
<
<
All bugs in command line tools shall be reported to sourceforge's HFST issue tracker It is good to include at least steps to reproduce the error (i.e. exact command(s) used), and first line of output of command hfst-tool --version. E.g. include the following in your message:
>
>
All bugs in command line tools should be reported to SourceForge's HFST issue tracker. It is good to include at least the steps to reproduce the error (i.e. the exact command(s) used) and the first line of the output of the command hfst-tool --version. E.g. include the following in your message:
 
$ hfst-tool --version

Revision 512011-08-22 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 6 to 6
  HfstOutline lists most of the command line tools and their purpose.
Added:
>
>
Examples of commandline tools are given in HfstOutline and HfstCommandLineToolExamples.
 

Downloading

Tools can be fetched from the Sourceforge download page.

Revision 502011-08-18 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 152 to 152
 

Utilities Comment
Deleted:
<
<
HfstSfstPl2Fst hfst-calculate − An SFST Programming Language Compiler
 
HfstCompare hfst-compare
HfstCompose hfst-compose
HfstConcatenate hfst-concatenate
Line: 179 to 178
 
HfstRemoveEpsilons hfst-remove-epsilons
HfstRepeat hfst-repeat
HfstReverse hfst-reverse
Added:
>
>
HfstSfstPl2Fst hfst-sfstpl2fst − An SFST Programming Language Compiler
 
HfstSplit hfst-split
HfstStrings2Fst hfst-strings2fst
HfstSubtract hfst-subtract

Revision 492011-08-18 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 152 to 152
 

Utilities Comment
Changed:
<
<
HfstCalculate hfst-calculate − An SFST Programming Language Compiler
>
>
HfstSfstPl2Fst hfst-calculate − An SFST Programming Language Compiler
 
HfstCompare hfst-compare
HfstCompose hfst-compose
HfstConcatenate hfst-concatenate
Line: 214 to 214
 
"@_UNKNOWN_SYMBOL_@" Any symbol not known to a transducer.
"@_IDENTITY_SYMBOL_@" Any identity symbol pair not known to a transducer.
Changed:
<
<
Some tools may take input or produce output that uses a different formalism. For instance, in SFST programming language the epsilon is always denoted as "<>".
>
>
Some tools may take input or produce output that uses a different formalism. For instance, in SFST programming language the epsilon is always denoted as "<>".
 However, the resulting transducers always use the above-mentioned special symbols:

Revision 482011-07-18 - TrondTrosterud

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 227 to 227
 

Reporting bugs

Changed:
<
<
All bugs in command line tools shall be reported to sourceforge's HFST issue tracker It is good to include at least steps to reproduce the error (i.e. exact command(s) used), and first line of output of command hfst-tool --version. E.g. include following in your message:
>
>
All bugs in command line tools shall be reported to sourceforge's HFST issue tracker It is good to include at least steps to reproduce the error (i.e. exact command(s) used), and first line of output of command hfst-tool --version. E.g. include the following in your message:
 
$ hfst-tool --version

Revision 472011-04-06 - MiikkaSilfverberg

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 171 to 171
 
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstName hfst-name
Added:
>
>
HfstPairTest hfst-pair-test
 
HfstProc hfst-proc
HfstProject hfst-project
HfstPushWeights hfst-push-weights

Revision 462011-03-18 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Deleted:
<
<
Warning, important This page already describes HFST 3 conventions that have been slightly changed from HFST 2 versions of tools.
 HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using the HFST interface. The tools are licenced under GNU GPL version 3 (other licences may be available at request). Licence text can be found from file COPYING. Other licences are possible, and can be given by authors found in AUTHORS file.

HfstOutline lists most of the command line tools and their purpose.

Revision 452011-02-13 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 160 to 160
 
HfstConcatenate hfst-concatenate
HfstConjunct hfst-conjunct
HfstDeterminize hfst-determinize
Deleted:
<
<
HfstDiffTest hfst-diff-test Does not yet work.
 
HfstDisjunct hfst-disjunct
HfstFormat hfst-format
HfstFst2Fst hfst-fst2fst
Line: 174 to 173
 
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstName hfst-name
Deleted:
<
<
HfstPairTest hfst-pair-test Does not yet work.
 
HfstProc hfst-proc
HfstProject hfst-project
HfstPushWeights hfst-push-weights

Revision 442011-02-04 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 71 to 71
 
Added:
>
>
 
Line: 172 to 173
 
HfstLexc2Fst hfst-lexc − A Lexicon Compiler
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
Added:
>
>
HfstName hfst-name
 
HfstPairTest hfst-pair-test Does not yet work.
HfstProc hfst-proc
HfstProject hfst-project

Revision 432011-01-11 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 188 to 188
 
HfstTail hfst-tail
HfstTwolC hfst-twolc − A Two-Level Grammar Compiler
HfstTxt2Fst hfst-txt2fst
Changed:
<
<
HfstXfstCompiler hfst-xfst-compiler
>
>
HfstXfst hfst-xfst − An Xfst Compiler
 

Transducer and file formats

Revision 422010-12-31 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 121 to 121
 HFST tools that operate on multiple input parameters are:

Changed:
<
<
>
>
 

Parameters for tools creating automata

Line: 169 to 169
 
HfstComposeIntersect hfst-compose-intersect
HfstFlagDiacritics hfst-flag-diacritics Does not yet work.
HfstInvert hfst-invert
Changed:
<
<
HfstLexcCompiler hfst-lexc − A Lexicon Compiler
>
>
HfstLexc2Fst hfst-lexc − A Lexicon Compiler
 
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstPairTest hfst-pair-test Does not yet work.

Revision 412010-12-21 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 121 to 121
 HFST tools that operate on multiple input parameters are:

Changed:
<
<
>
>
 

Parameters for tools creating automata

Line: 169 to 169
 
HfstComposeIntersect hfst-compose-intersect
HfstFlagDiacritics hfst-flag-diacritics Does not yet work.
HfstInvert hfst-invert
Changed:
<
<
HfstLexC hfst-lexc − A Lexicon Compiler
>
>
HfstLexcCompiler hfst-lexc − A Lexicon Compiler
 
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstPairTest hfst-pair-test Does not yet work.

Revision 402010-12-21 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Revision 392010-12-21 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Added:
>
>
Warning, important This page already describes HFST 3 conventions that have been slightly changed from HFST 2 versions of tools.
 HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface. The tools are licensed under GNU GPL version 3; the licence text can be found in the file COPYING. Other licences may be available on request from the authors listed in the AUTHORS file.

HfstOutline lists most of the commandline tools and their purpose.

Line: 44 to 46
 If the output transducer is written into the standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
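The stream separation described above can be illustrated with a stand-in shell function. Note that `demo_tool` is hypothetical and not an HFST program; it only imitates a tool that writes its result to standard output and a warning to standard error, so the two can be redirected independently.

```shell
#!/bin/sh
# Hypothetical stand-in for an HFST tool: the result goes to standard output,
# the warning goes to standard error.
demo_tool() {
    printf 'RESULT\n'
    printf 'warning: something looks odd\n' >&2
}

# Keep only the result; discard diagnostics.
result_only() {
    demo_tool 2>/dev/null
}

# Keep only the diagnostics; discard the result.
# The order matters: stderr is first pointed at the current stdout,
# and only then is stdout sent to /dev/null.
warnings_only() {
    demo_tool 2>&1 >/dev/null
}
```

With a real tool the same redirections should keep warnings out of a pipeline that passes a transducer from one tool to the next.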
Deleted:
<
<
 

Input parameters for unary operator tools

Line: 127 to 123
 
Added:
>
>

Parameters for tools creating automata

The tools that create automata may specify details as command-line options.

-f, --format=FORMAT Use FORMAT backend for automata operations
-w, --weight=NUMBER Use NUMBER as default weight instead of semiring one

Legal values of FORMAT depend on which backend libraries were compiled into HFST. The supported names are listed in the function hfst_parse_format_name(const char*) in hfst-commandline.cc. At the time of the HFST 3.0_beta release, the following strings were mapped:

| *allowed strings* | *used backend* |
| sfst | SFST backend |
| ofst-tropical, openfst-tropical, openfst, ofst | OpenFST standard automata with tropical semiring weights (default) |
| ofst-log, openfst-log | OpenFST with log weights |
| foma | foma backend |
| optimized-lookup-weighted, olw, optimized-lookup, ol | HFST's lookup-optimized automata with weights |
| optimized-lookup-unweighted, olu | HFST's lookup-optimized automata without weights |
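As a quick reference, the mapping in the table above can be written as a shell `case` statement. This is only a sketch that mirrors the table; the function name `backend_for_format` and the short backend labels are made up for illustration, and the code is not the actual hfst_parse_format_name() implementation.

```shell
#!/bin/sh
# Map a FORMAT name to the backend listed in the table above.
# Illustrative only: this mirrors the table, not the real C++ parser.
backend_for_format() {
    case "$1" in
        sfst)
            echo "SFST" ;;
        ofst-tropical|openfst-tropical|openfst|ofst)
            echo "OpenFST tropical (default)" ;;
        ofst-log|openfst-log)
            echo "OpenFST log" ;;
        foma)
            echo "foma" ;;
        optimized-lookup-weighted|olw|optimized-lookup|ol)
            echo "optimized lookup, weighted" ;;
        optimized-lookup-unweighted|olu)
            echo "optimized lookup, unweighted" ;;
        *)
            # Unknown names are reported on standard error.
            echo "unknown format: $1" >&2
            return 1 ;;
    esac
}
```

For example, `backend_for_format ol` and `backend_for_format olw` both resolve to the weighted lookup-optimized backend, as the table states.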

For operations that add weight, the weight NUMBER is parsed using the standard library's strtod(3) implementation. The semantics of the weights depend on the selected backend.

 

Tool-specific parameters

For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed below.

Line: 181 to 198
 

Transducer formats

Added:
>
>
HFST 3 stores automata in the HFST automata container format, which consists of the HFST3\0 magic sequence, an HFST 3 metadata header, and the backend's own automaton in its original format.
 For operations that input and output transducers, the output is always of the same type as the input(s), and this cannot be overridden (except with hfst-fst2fst). The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between different binary formats, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
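Since an HFST 3 file starts with the HFST3\0 magic sequence, a file can be given a rough sanity check from the shell before it is passed to a tool. The helper below is a sketch: the name `has_hfst3_magic` is made up, and it only compares the first six bytes; it does not parse the metadata header or the backend's automaton.

```shell
#!/bin/sh
# Succeed if the file named by $1 begins with the six bytes "HFST3" + NUL.
# Only the magic sequence is checked; nothing after it is inspected.
has_hfst3_magic() {
    # Dump the first six bytes as hex and strip od's spacing.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    # 48 46 53 54 33 00 = 'H' 'F' 'S' 'T' '3' NUL
    [ "$magic" = "484653543300" ]
}
```

A call such as `has_hfst3_magic first.hfst && echo "looks like HFST 3"` distinguishes container files from, say, a backend's native file that was never wrapped.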

Transducer archive files

Line: 213 to 232
 
$ hfst-tool --version
Changed:
<
<
HFST Toolname [internal ALPHA-$Revision: 1.7 $]
>
>
HFST Toolname 0.1 (hfst 3.0)
 $ hfst-toolname [PARAMETERS that fail] Failure output
Line: 222 to 241
 

Development and distribution

Changed:
<
<
The source code archive contains the test suite make check, which every distributed version must pass unless it is clearly labeled as an alpha test version.
<-- 
Statically linked test versions can be made with make static.
-->
>
>
The source code archive contains the test suite make check, which every distributed version must pass unless it is clearly labeled as an alpha test version.
 
-- TommiPirinen
<-- vim: set ft=twiki: -->
>
>
--> -- TommiPirinen
<-- vim: set ft=twiki: -->
 
META TOPICMOVED by="TommiPirinen" date="1236767171" from="KitWiki.NotYetHfstToolCommandLines" to="KitWiki.HfstCommandLineTools"

Revision 382010-12-05 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 156 to 156
 
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstPairTest hfst-pair-test Does not yet work.
Added:
>
>
HfstProc hfst-proc
 
HfstProject hfst-project
HfstPushWeights hfst-push-weights
HfstRegexp2Fst hfst-regexp2fst

Revision 372010-12-04 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface. The tools are licensed under GNU GPL version 3; the licence text can be found in the file COPYING. Other licences may be available on request from the authors listed in the AUTHORS file.

Added:
>
>
HfstOutline lists most of the commandline tools and their purpose.
 

Downloading

Tools can be fetched from the Sourceforge download page.

Revision 362010-12-02 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 6 to 6
 

Downloading

Changed:
<
<
Tools can be fetched from the Sourceforge download page.
>
>
Tools can be fetched from the Sourceforge download page.
 

Installation

Line: 18 to 18
 hfst-toolname [OPTIONS] [FILE...]
Changed:
<
<
The HFST tools comprise a number of different command line utilities, and their parameters vary on a case-by-case basis. If in doubt, the parameter --help will always list the parameters of a tool. For further instructions and examples, see the tool-specific wiki pages listed at HfstHome#Utilities.
>
>
The HFST tools comprise a number of different command line utilities, and their parameters vary on a case-by-case basis. If in doubt, the parameter --help will always list the parameters of a tool. For further instructions and examples, see the tool-specific wiki pages listed here.
 

Common parameters

Line: 81 to 81
 
Deleted:
<
<
 
Line: 102 to 101
 cat second.hfst | hfst-toolname first.hfst
Changed:
<
<
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s lower level is matched against input2’s upper level.
>
>
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s output level is matched against input2’s input level.
  Binary operations are:
Line: 128 to 127
 

Tool-specific parameters

Changed:
<
<
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
>
>
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed below.
 

Command Line Utilities

Added:
>
>
 
Utilities Comment
HfstCalculate hfst-calculate − An SFST Programming Language Compiler
HfstCompare hfst-compare
Line: 173 to 174
  The software is essentially built around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently, Unicode encoded as UTF-8 is used in all utilities, and all our demo lexicons are in UTF-8 (even the English lexicon).
Changed:
<
<
In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer in between.
>
>
In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer in between.
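The delimiter rule can be checked with a short awk one-liner wrapped in a shell function. This is a sketch under the reading that every line consisting solely of "--" is a delimiter, so N delimiters separate N + 1 transducers, some of which may be empty. The function name `count_text_transducers` is made up, and treating a completely empty stream as one empty transducer is an assumption.

```shell
#!/bin/sh
# Count the transducers in a text-format stream read from standard input.
# Each line that is exactly "--" is a delimiter; N delimiters separate
# N + 1 transducers (a trailing delimiter is followed by an empty one).
count_text_transducers() {
    awk '$0 == "--" { n++ } END { print n + 1 }'
}
```

For instance, a stream that ends in a delimiter counts as two transducers, the second of them empty, and two consecutive delimiters contribute an empty transducer between their neighbours.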
 

Transducer formats

Changed:
<
<
For transducer operations that input and output transducers, the output is always of the same type as the inputs, and this cannot be overridden. The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
>
>
For operations that input and output transducers, the output is always of the same type as the input(s), and this cannot be overridden (except with hfst-fst2fst). The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between different binary formats, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
 

Transducer archive files

Changed:
<
<
The support for transducer archives, that is, streams containing a concatenation of multiple transducers, is experimental and tool-dependent. Tools that take a single input transducer stream repeat the operation for each transducer in the stream, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool: currently, composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To operate further on transducer archives, tools such as hfst-head can be used.
>
>
The tools support transducer archives, that is, streams containing a concatenation of multiple transducers. Tools that take a single input transducer stream repeat the operation for each transducer in the stream, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool: currently, composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To operate further on transducer archives, tools such as hfst-head can be used.
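The pairing rules for two-stream tools can be made concrete with a small simulation. The function below is purely illustrative: `archive_pairs` is a made-up name, it works on member counts rather than on real archives, and it assumes both archives contain at least one transducer.

```shell
#!/bin/sh
# Print which members are paired when two archives with $2 and $3
# transducers are combined, one "left:right" index pair per line.
# $1 is "compose" (the shorter archive repeats its last member) or
# anything else (processing stops when either archive is exhausted).
# Assumes both counts are at least 1.
archive_pairs() {
    mode=$1
    left=$2
    right=$3
    if [ "$mode" = "compose" ]; then
        total=$(( left > right ? left : right ))
    else
        total=$(( left < right ? left : right ))
    fi
    i=1
    while [ "$i" -le "$total" ]; do
        # Clamp each index so the shorter archive repeats its last member.
        l=$(( i < left ? i : left ))
        r=$(( i < right ? i : right ))
        printf '%s:%s\n' "$l" "$r"
        i=$(( i + 1 ))
    done
}
```

With three transducers on the left and two on the right, `archive_pairs compose 3 2` yields a third pair that reuses the second right-hand transducer, whereas any other mode stops after two pairs.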

Transition symbols

 
Changed:
<
<

Symbol tables

>
>
For tools that take strings or AT&T text format as input or output, the following special symbols are reserved:
 
Changed:
<
<
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table. By default, all tools write a symbol table with the output transducer, if (1) all input transducers have a symbol table or (2) a separate symbol table file has been defined with option --read-symbols.
>
>
| *symbol* | *meaning* |
| "@_EPSILON_SYMBOL_@" | The epsilon. |
| "@_UNKNOWN_SYMBOL_@" | Any symbol not known to a transducer. |
| "@_IDENTITY_SYMBOL_@" | Any identity symbol pair not known to a transducer. |
 
Changed:
<
<
The unary tools that take one input transducer stream read the input transducers as such. If an input transducer has a symbol table, the same table is written with the corresponding output transducer.
>
>
Some tools may take input or produce output that uses a different formalism. For instance, in the SFST programming language, the epsilon is always denoted as "<>". However, the resulting transducers always use the above-mentioned special symbols:
 
Changed:
<
<
The tools that take more than one input transducer stream assume either that (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The output transducer does not have a symbol table.
>
>
$ echo "<>:a" | hfst-calculate -f sfst | hfst-fst2txt
@_EPSILON_SYMBOL_@  a    0    1
1
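Because the reserved symbols appear literally in text output, ordinary text tools can find them. The sketch below lists which reserved symbols occur in a text-format transducer read from standard input; `reserved_symbols` is a made-up helper name, and the pattern simply matches anything of the form @_NAME_SYMBOL_@ rather than a fixed list.

```shell
#!/bin/sh
# List the distinct reserved symbols that occur in text read from
# standard input, one per line, sorted.
# grep -o prints each match on its own line; sort -u removes duplicates.
reserved_symbols() {
    grep -o '@_[A-Z]*_SYMBOL_@' | sort -u
}
```

It could be placed at the end of a pipeline such as the one above, after hfst-fst2txt, to see at a glance whether a transducer uses epsilon, unknown or identity transitions.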
 
Changed:
<
<
hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.
>
>
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers is done internally. If the mappings differ between transducers, harmonization is carried out.
 

Reporting bugs

Revision 352010-12-01 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 130 to 130
  For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
Added:
>
>

Command Line Utilities

Utilities Comment
HfstCalculate hfst-calculate − An SFST Programming Language Compiler
HfstCompare hfst-compare
HfstCompose hfst-compose
HfstConcatenate hfst-concatenate
HfstConjunct hfst-conjunct
HfstDeterminize hfst-determinize
HfstDiffTest hfst-diff-test Does not yet work.
HfstDisjunct hfst-disjunct
HfstFormat hfst-format
HfstFst2Fst hfst-fst2fst
HfstFst2Strings hfst-fst2strings
HfstFst2Txt hfst-fst2txt
HfstHead hfst-head
HfstComposeIntersect hfst-compose-intersect
HfstFlagDiacritics hfst-flag-diacritics Does not yet work.
HfstInvert hfst-invert
HfstLexC hfst-lexc − A Lexicon Compiler
HfstLookUp hfst-lookup
HfstMinimize hfst-minimize
HfstPairTest hfst-pair-test Does not yet work.
HfstProject hfst-project
HfstPushWeights hfst-push-weights
HfstRegexp2Fst hfst-regexp2fst
HfstRemoveEpsilons hfst-remove-epsilons
HfstRepeat hfst-repeat
HfstReverse hfst-reverse
HfstSplit hfst-split
HfstStrings2Fst hfst-strings2fst
HfstSubtract hfst-subtract
HfstSubstitute hfst-substitute
HfstSummarize hfst-summarize
HfstTail hfst-tail
HfstTwolC hfst-twolc − A Two-Level Grammar Compiler
HfstTxt2Fst hfst-txt2fst
HfstXfstCompiler hfst-xfst-compiler
 

Transducer and file formats

The software is essentially built around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently, Unicode encoded as UTF-8 is used in all utilities, and all our demo lexicons are in UTF-8 (even the English lexicon).

Revision 342010-11-29 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 26 to 26
  Parameters common for all commandline programs taking one input stream and writing transducers or text as output.

-i, --input=FILENAME Read input from FILENAME
-o, --output=FILENAME Write output to FILENAME

Deleted:
<
<
Parameters common for all commandline programs taking one input stream and writing text as output.

-i, --input=FILENAME Read input transducer from FILENAME
-o, --output=FILENAME Write output to text-file FILENAME
-R, --read-symbols=FILENAME Read symbol table from FILENAME

 
<-- The correct way to include these option lists on a KitWiki documentation page for a utility is to include the topic HfstCommonProgramOptions and one of the topics HfstCommonUnaryProgramOptions or HfstCommonUnaryStringProgramOptions.-->
Line: 42 to 41
  If the output transducer is written into the standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
Deleted:
<
<

Weight parameters

-w, --weighted Use weighted transducers for operation
 

Revision 332010-11-29 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 85 to 85
 
Changed:
<
<
>
>
 

Revision 322010-03-14 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 8 to 8
  Tools can be fetched from the Sourceforge download page.
Deleted:
<
<

Dependencies

Installation requires that HFST library package is installed.

 

Installation

For installation instructions, see INSTALL. Briefly, the usual

        ./configure
        make
        (as root) make install

should result in a local installation and

        make uninstall

in its uninstallation. If you would rather install in e.g. your home directory (or aren't the system administrator), you can tell ./configure:

        ./configure --prefix=${HOME}
Line: 54 to 50
  For operations that add weight, the weight switch takes required argument defining weight as number. The NUMBER is parsed using
Changed:
<
<
standard library's strtod(3) implementation. The semantics for
>
>
standard library's strtod(3) implementation. The semantics for
 weights currently used for transducers is based on tropical semiring implementation of OpenFst library. -->
Line: 133 to 129
 HFST tools that operate on multiple input parameters are:

Added:
>
>
 

Tool-specific parameters

Line: 177 to 174
 

Development and distribution

Changed:
<
<
The source code archive contains the test suite make check, which every distributed version must pass unless it is clearly labeled as an alpha test version. Statically linked test versions can be made with make static.
>
>
The source code archive contains the test suite make check, which every distributed version must pass unless it is clearly labeled as an alpha test version.
<-- 
Statically linked test versions can be made with make static.
-->
 

Revision 312009-11-03 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 132 to 132
  HFST tools that operate on multiple input parameters are:
Changed:
<
<
>
>
 

Tool-specific parameters

Revision 302009-10-13 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 6 to 6
 

Downloading

Changed:
<
<
Tools can be fetched from the Sourceforge svn repository.
>
>
Tools can be fetched from the Sourceforge download page.
 

Dependencies

Revision 292009-10-12 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 6 to 6
 

Downloading

Changed:
<
<
Tools can be fetched from the HFST research downloads page.
>
>
Tools can be fetched from the Sourceforge svn repository.
 

Dependencies

Revision 282009-10-12 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command Line Tools

Line: 164 to 164
 

Reporting bugs

Changed:
<
<
All bugs in command line tools shall be reported to HFST team. It is good to include at least steps to reproduce the error (i.e. exact command(s) used), and first line of output of command hfst-tool --version. E.g. include following in your message:
>
>
All bugs in the command line tools should be reported to SourceForge's HFST issue tracker. It is good to include at least the steps to reproduce the error (i.e. the exact command(s) used) and the first line of the output of the command hfst-tool --version. E.g. include the following in your message:
 
$ hfst-tool --version
Line: 173 to 173
 Failure output
Changed:
<
<
It may also be possible to push bug reports to somewhere in KitWiki.
>
>
You may also direct email to HFST team.
 

Development and distribution

Revision 272009-10-05 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"
Changed:
<
<

HFST: Command line tools

>
>

HFST: Command Line Tools

  HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface. The tools are licensed under GNU GPL version 3; the licence text can be found in the file COPYING. Other licences may be available on request from the authors listed in the AUTHORS file.

Revision 262009-10-05 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: Command line tools

Line: 32 to 32
  Parameters common for all commandline programs taking one input stream and writing text as output.

-i, --input=FILENAME Read input transducer from FILENAME
-o, --output=FILENAME Write output to text-file FILENAME
-R, --read-symbols=FILENAME Read symbol table from FILENAME

Changed:
<
<
The correct way to include these option lists on a KitWiki documentation page for a utility is to include the topic HfstCommonProgramOptions and one of the topics HfstCommonUnaryProgramOptions or HfstCommonUnaryStringProgramOptions.
>
>
<-- The correct way to include these option lists on a KitWiki documentation page for a utility is to include the topic HfstCommonProgramOptions and one of the topics HfstCommonUnaryProgramOptions or HfstCommonUnaryStringProgramOptions.-->
  If output parameter is not given, the transducer or text output will be written to standard output stream. That is, following are equivalent in terms of output processing:
Line: 54 to 54
  For operations that add weight, the weight switch takes required argument defining weight as number. The NUMBER is parsed using
Changed:
<
<
standard library's strtod(3) implementation. The semantics for
>
>
standard library's strtod(3) implementation. The semantics for
 weights currently used for transducers is based on tropical semiring implementation of OpenFst library. -->
Line: 148 to 146
 

Transducer formats

Changed:
<
<
For transducer operations that input and output transducers, the output is always of the same type as the inputs, and this cannot be overridden. The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
>
>
For transducer operations that input and output transducers, the output is always of the same type as the inputs, and this cannot be overridden. The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
 

Transducer archive files

Changed:
<
<
The support for transducer archives, that is, streams containing a concatenation of multiple transducers, is experimental and tool-dependent. Tools that take a single input transducer stream repeat the operation for each transducer in the stream, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool: currently, composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To operate further on transducer archives, tools such as hfst-select can be used.
>
>
The support for transducer archives, that is, streams containing a concatenation of multiple transducers, is experimental and tool-dependent. Tools that take a single input transducer stream repeat the operation for each transducer in the stream, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool: currently, composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To operate further on transducer archives, tools such as hfst-head can be used.
 

Symbol tables

Line: 162 to 160
  The tools that take more than one input transducer stream assume either that (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The output transducer does not have a symbol table.
Changed:
<
<
hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.
>
>
hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.
 

Reporting bugs

Revision 252009-10-02 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"
Changed:
<
<

HFST command line tools

>
>

HFST: Command line tools

  HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface. The tools are licensed under GNU GPL version 3; the licence text can be found in the file COPYING. Other licences may be available on request from the authors listed in the AUTHORS file.

Revision 242009-10-02 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 14 to 14
 

Installation

Changed:
<
<
...
>
>
For installation instructions, see INSTALL. Briefly, the usual

        ./configure
        make
        (as root) make install

should result in a local installation and

        make uninstall

in its uninstallation. If you would rather install in e.g. your home directory (or aren't the system administrator), you can tell ./configure:

        ./configure --prefix=${HOME}
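After installing under a custom prefix, the shell still has to find the tools. Assuming the package follows the standard Autotools layout, in which programs are installed to the bin directory under the prefix (an assumption, not something stated on this page), the sketch below prepends that directory to PATH without duplicating it. The helper name `path_with` is made up.

```shell
#!/bin/sh
# Print a PATH-style string with directory $1 prepended to $2,
# unless $1 is already one of the entries in $2.
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;      # already present: unchanged
        *)        printf '%s\n' "$1:$2" ;;   # otherwise prepend
    esac
}
```

A line such as `PATH=$(path_with "${HOME}/bin" "$PATH"); export PATH` in a shell start-up file would then make a home-directory installation available in new sessions.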
 

Usage

Revision 232009-10-02 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 122 to 122
 
Added:
>
>
 

Parameters for tools operating on arbitrary number of transducers

Revision 222009-10-01 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 76 to 72
 cat text.txt | hfst-toolname
Changed:
<
<
Unary operations are hfst-:
>
>
Unary operations are:

 
Deleted:
<
<
  • determinize
  • fst2strings
  • fst2txt
  • fst2fst
  • head
  • invert
  • minimize
  • project
  • push-weights
  • remove-epsilons
  • repeat
  • reverse
  • split
  • strings2fst
  • summarize
  • symbols
  • tail
  • txt2fst
  • unweighted2weighted
  • weighted2unweighted
 

Parameters for binary operator tools

Line: 119 to 115
  If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s lower level is matched against input2’s upper level.
Changed:
<
<
Binary operations are hfst-:
>
>
Binary operations are:
 
Changed:
<
<
  • compare
  • compose
  • concatenate
  • conjunct
  • disjunct
>
>
 

Parameters for tools operating on arbitrary number of transducers

Line: 137 to 133
  HFST tools that operate on multiple input parameters are:
Changed:
<
<
  • hfst-compose-intersect
>
>
 

Tool-specific parameters

Line: 145 to 141
 

Transducer and file formats

Changed:
<
<
The software is essentially built around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, we need to settle on one character encoding convention for the input and output text formats. Currently, Unicode encoded as UTF-8 is used in all utilities, and all our demo lexicons are in UTF-8 (even the English lexicon).
>
>
The software is essentially built around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, one character encoding convention must be used for the input and output text formats. Currently, Unicode encoded as UTF-8 is used in all utilities, and all our demo lexicons are in UTF-8 (even the English lexicon).
  In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer in between.
Deleted:
<
<
Unweighted transducers are slightly smaller than their weighted counterparts.
<-- and also slightly more efficient to process? -->
To make the utilities file format independent, they need to recognize whether a transducer is weighted or unweighted at run-time, but the conversion between the formats must be done off-line. In practice, this means that
<-- hfst.h has two namespaces HFST and HWFST (with the same functions but different data types) and that -->
the command line tools select the function based on the input data type. To implement this, we use a format stamp in the binary files. All input transducers are assumed to have the same binary format (which is concluded from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options.
 

Transducer formats

Changed:
<
<
For transducer operations that input and output transducers, the output is always of the same type as the inputs, and this cannot be overridden. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
>
>
For transducer operations that input and output transducers, the output is always of the same type as the inputs, and this cannot be overridden. The command line tools are file-format independent: they select the function based on the input data type. All input transducers are assumed to have the same binary format (which is determined from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools that operate on multiple transducers will issue an error message if fed with different types of transducers.
 

Transducer archive files

Line: 184 to 178
 

Development and distribution

Changed:
<
<
Source code archive contains test suite make check, which must be passed for all distributed versions, unless clearly labeled as alpha test versions. Statically linked test versions can be made with make static. New utilities can be developed using hfst-skeleton.cc as source.
>
>
Source code archive contains test suite make check, which must be passed for all distributed versions, unless clearly labeled as alpha test versions. Statically linked test versions can be made with make static.
 

Revision 212009-10-01 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Changed:
<
<
HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using the HFST interface. The tools are licenced under GNU GPL version 3 (other licences may be available at request). Licence text can be found from file COPYING. Other licences are possible, and can be given by authors found in AUTHORS file.
>
>
HFST tools is a collection of HFST-based command line utilities that can create, operate on and print transducers using the HFST interface. The tools are licenced under GNU GPL version 3 (other licences may be available on request). The licence text can be found in the file COPYING. Other licences can be granted by the authors listed in the AUTHORS file.
 

Downloading

Line: 26 to 22
 hfst-toolname [OPTIONS] [FILE...]
Changed:
<
<
HFST tools contain number of different command line utilities, and their parameters vary on case by case basis. If in doubt, parameter --help will always tell the parameters of a tool. For further instructions and examples, see tool-specific wiki pages listed at HfstHome#Utilities.
>
>
HFST tools contain a number of different command line utilities, and their parameters vary on a case-by-case basis. If in doubt, the parameter --help will always list the parameters of a tool. For further instructions and examples, see the tool-specific wiki pages listed at HfstHome#Utilities.
 

Common parameters

Line: 41 to 34
  The correct way to include these option lists on a KitWiki documentation page for a utility is to include the topic HfstCommonProgramOptions and one of the topics HfstCommonUnaryProgramOptions or HfstCommonUnaryStringProgramOptions.
Changed:
<
<
If output parameter is not given, the transducer or text output will be written to standard output stream. That is, following are equivalent in terms of output processing:
>
>
If the output parameter is not given, the transducer or text output is written to the standard output stream. That is, the following are equivalent in terms of output processing:
 
hfst-toolname --output=transducer.hfst
hfst-toolname > transducer.hfst
Deleted:
<
<
 
Changed:
<
<
>
>
 hfst-toolname --output=text.txt
 hfst-toolname > text.txt
Changed:
<
<
If the output transducer is written into the standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
>
>
If the output transducer is written to the standard output stream, warnings and verbose output are printed to the standard error stream instead of standard output. Error messages are always printed to the standard error stream.
 

Weight parameters

Line: 67 to 56
  For operations that add weight, the weight switch takes required argument defining weight as number. The NUMBER is parsed using
Changed:
<
<
standard library’s strtod(3) implementation. The semantics for
>
>
standard library’s strtod(3) implementation. The semantics for
 weights currently used for transducers is based on tropical semiring implementation of OpenFst library. -->
Line: 71 to 60
 weights currently used for transducers is based on tropical semiring implementation of OpenFst library. -->
Deleted:
<
<
 

Input parameters for unary operator tools

Changed:
<
<
The input filename may also be specified as free argument of command line, that is, the following are equivalent in terms of input file processing:
>
>
The input filename may also be specified as free argument of command line, that is, the following are equivalent in terms of input file processing:
 
hfst-toolname --input=transducer.hfst
hfst-toolname transducer.hfst
cat transducer.hfst | hfst-toolname
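The three equivalent ways of naming the input can be sketched as follows. This is a minimal illustration in Python, not the tools' actual option parsing; the function name resolve_input is hypothetical, and the behaviour when both forms are given at once is an assumption.

```python
import argparse


def resolve_input(argv):
    """Return the input filename, or None when standard input should be read."""
    parser = argparse.ArgumentParser(prog="hfst-toolname")
    # The documented option form: -i / --input=FILENAME.
    parser.add_argument("-i", "--input", metavar="FILENAME")
    # The free-argument form: hfst-toolname FILE.
    parser.add_argument("file", nargs="?", metavar="FILE")
    args = parser.parse_args(argv)
    # Assumption: an explicit --input wins if both forms are given.
    # With neither form present, the caller falls back to standard input.
    return args.input or args.file
```

A caller would open the returned filename, or read from standard input when the result is None.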
Deleted:
<
<
 
Changed:
<
<
>
>
 hfst-toolname --input=text.txt
 hfst-toolname text.txt
 cat text.txt | hfst-toolname
Line: 118 to 105
 
-2, --input2=FILENAME Read second input transducer from FILENAME
-n, --number Do not harmonize transducers
Changed:
<
<
It is also possible to give one or both of the filenames as free arguments on commandline, that is, all following are equivalent in terms of processing:
>
>
It is also possible to give one or both of the filenames as free arguments on commandline, that is, all following are equivalent in terms of processing:
 
hfst-toolname --input1=first.hfst --input2=second.hfst
Line: 131 to 117
 cat second.hfst | hfst-toolname first.hfst
Changed:
<
<
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s lower level is matched against input2’s upper level.
>
>
If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s lower level is matched against input2’s upper level.
  Binary operations are hfst-:
Line: 145 to 129
 

Parameters for tools operating on arbitrary number of transducers

Changed:
<
<
For tools that operate on arbitrary number of input transducers, the list of filenames must be given as free parameters of command line, e.g.:
>
>
For tools that operate on arbitrary number of input transducers, the list of filenames must be given as free parameters of command line, e.g.:
 
hfst-toolname input1.hfst input2.hfst ... input-n.hfst
Line: 158 to 141
 

Tool-specific parameters

Changed:
<
<
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
>
>
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
 

Transducer and file formats

Added:
>
>
The software is essentially created around the concept of synchronized transducers, i.e. the input and the output symbols are synchronized symbol pairs. In order to reduce the number of different versions of our tools, we need to settle for one character encoding convention for the input and output text formats. Currently Unicode with UTF-8 is used in all utilities and all our demo lexicons are implemented in UTF-8 (even the English lexicon).

In order to allow different input modes for various functionalities, it was found most convenient to separate the conversions between string, text and binary formats into separate modules. Unless otherwise specified on the command line, we assume that the input is read from the standard input and the output is directed to the standard output. The input and output may specify or contain several transducers. Transducers in text format are separated by a transducer delimiter ("--" plus a newline). A delimiter at the end of a file indicates that an empty transducer follows. A sequence of two delimiters indicates an empty transducer in between.
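The delimiter rules can be illustrated with a small Python sketch. It is not part of the HFST tools; the function name split_transducers is hypothetical and the transducer lines are placeholders, since only the "--" delimiter convention is taken from the text above.

```python
def split_transducers(text, delimiter="--"):
    """Split a text-format stream into one text block per transducer.

    A line consisting only of the delimiter ends the current block.
    An empty block stands for an empty transducer.
    """
    blocks = []
    current = []
    for line in text.splitlines():
        if line == delimiter:
            blocks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Whatever follows the last delimiter is a transducer too,
    # so a trailing delimiter yields a final empty block.
    blocks.append("\n".join(current))
    return blocks
```

Under this reading, a stream ending in a delimiter produces a trailing empty block, and two consecutive delimiters produce an empty block between their neighbours.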

Unweighted transducers are slightly smaller than their weighted counterparts.

<-- and also slightly more efficient to process? -->
To make the utilities file format independent, they need to recognize whether a transducer is weighted or unweighted at run-time, but the conversion between the formats must be done off-line. In practice, this means that
<-- hfst.h has two namespaces HFST and HWFST (with the same functions but different data types) and that -->
the command line tools select the function based on the input data type. To implement this, we use a format stamp in the binary files. All input transducers are assumed to have the same binary format (which is concluded from the first transducer in the input). In the text-to-transducer modules, the format is selected by the user with tool options.
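The idea of taking the format from the first transducer and expecting the rest to match can be sketched in Python. This is an illustration only: the stamps are modelled as plain strings and the function name common_format is hypothetical, since the binary layout of the real format stamp is not described here.

```python
def common_format(stamps):
    """Return the format shared by all inputs, taken from the first one.

    Raises ValueError if a later transducer carries a different stamp.
    """
    stamps = list(stamps)
    if not stamps:
        raise ValueError("no input transducers")
    expected = stamps[0]
    for position, stamp in enumerate(stamps[1:], start=2):
        if stamp != expected:
            raise ValueError(
                f"transducer {position} is {stamp}, expected {expected}"
            )
    return expected
```

A tool would then dispatch to the weighted or unweighted implementation based on the returned value.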
 

Transducer formats

Changed:
<
<
For transducer operations that input and output transducers, the output is always of the same type as the inputs and this cannot be overridden. To convert between weighted and unweighted variants, tools hfst-unweighted2weighted and hfst-weighted2unweighted are provided. Tools which operate on multiple transducers will issue an error message if fed with different types of transducers.
>
>
For transducer operations that input and output transducers, the output is always of the same type as the inputs and this cannot be overridden. To convert between weighted and unweighted variants, the tool hfst-fst2fst is provided. Tools which operate on multiple transducers will issue an error message if fed with different types of transducers.
 

Transducer archive files

Changed:
<
<
The support for transducer archives, that is, streams containing catenation of multiple transducers, is experimental and tool-dependent. If streams that have single input transducer stream are used, the tools repeat the operation for each input as if they had been provided in separate invocations. For tools that require two input streams, the operation depends on specific tool; currently composition composes pairwise, repeating the last transducer for stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools hfst-select etc. can be used.
>
>
The support for transducer archives, that is, streams containing a concatenation of multiple transducers, is experimental and tool-dependent. Tools that take a single input transducer stream repeat the operation for each transducer in the stream, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool; currently composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools hfst-select etc. can be used.
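The two pairing strategies for two-stream tools can be sketched in Python. The function names are hypothetical and transducers are represented by plain labels; the handling of an entirely empty stream is an assumption, as the text does not cover it.

```python
def pad_with_last(items, length):
    """Extend a list to the given length by repeating its last element."""
    return items + [items[-1]] * (length - len(items))


def pairs_for_composition(first, second):
    """Pair two streams, repeating the last item of the shorter one."""
    first, second = list(first), list(second)
    if not first or not second:
        # Assumption: nothing can be paired if either stream is empty.
        return []
    length = max(len(first), len(second))
    return list(zip(pad_with_last(first, length), pad_with_last(second, length)))


def pairs_for_other_tools(first, second):
    """Pair two streams and stop as soon as either one is exhausted."""
    return list(zip(first, second))
```

With three transducers in the first stream and one in the second, the composition strategy yields three pairs, while the other strategy yields a single pair.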
 

Symbol tables

Changed:
<
<
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table. By default, all tools write a symbol table with the output transducer, if (1) all input transducers have a symbol table or (2) a separate symbol table file has been defined with option --read-symbols.

The unary tools that take one input transducer stream read the input transducers as such. If an input transducer has a symbol table, the same table is written with the corresponding output transducer.

The tools that take more than one input transducer stream assume either that (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The output transducer does not have a symbol table.

hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.

>
>
The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table. By default, all tools write a symbol table with the output transducer, if (1) all input transducers have a symbol table or (2) a separate symbol table file has been defined with option --read-symbols.
 
Added:
>
>
The unary tools that take one input transducer stream read the input transducers as such. If an input transducer has a symbol table, the same table is written with the corresponding output transducer.

The tools that take more than one input transducer stream assume either that (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The output transducer does not have a symbol table.
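Case (1) can be illustrated with a simplified Python sketch. It is not the HFST implementation: symbol tables are modelled as dictionaries from symbol to number, transitions as tuples (source, target, input number, output number), and the function name harmonize is hypothetical.

```python
def harmonize(first_table, other_table, other_transitions):
    """Renumber a transducer's labels to follow the first symbol table.

    Returns the merged symbol table and the renumbered transitions.
    """
    merged = dict(first_table)
    next_number = max(merged.values(), default=-1) + 1
    renumber = {}
    for symbol, number in other_table.items():
        if symbol not in merged:
            # Symbols unknown to the first table get fresh numbers.
            merged[symbol] = next_number
            next_number += 1
        renumber[number] = merged[symbol]
    harmonized = [
        (source, target, renumber[inp], renumber[out])
        for source, target, inp, out in other_transitions
    ]
    return merged, harmonized
```

The first transducer keeps its numbering untouched, and the merged table contains every symbol used in either input, matching the description of the output symbol table.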

hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.

 

Reporting bugs

Changed:
<
<
All bugs in command line tools shall be reported to HFST team. It is good to include at least steps to reproduce the error (i.e. exact command(s) used), and first line of output of command hfst-tool --version. E.g. include following in your message:
>
>
All bugs in the command line tools should be reported to the HFST team. Please include at least the steps to reproduce the error (i.e. the exact command(s) used) and the first line of the output of the command hfst-tool --version. E.g. include the following in your message:
 
$ hfst-tool --version
Line: 235 to 184
 

Development and distribution

Changed:
<
<
Source code archive contains test suite make check, which must be passed for all distributed versions, unless clearly labeled as alpha test versions. Statically linked test versions can be made with make static. New utilities can be developed using hfst-skeleton.cc as source.
>
>
Source code archive contains test suite make check, which must be passed for all distributed versions, unless clearly labeled as alpha test versions. Statically linked test versions can be made with make static. New utilities can be developed using hfst-skeleton.cc as source.
 
Changed:
<
<

>
>

  -- TommiPirinen
<-- vim: set ft=twiki: -->
>
>
--> -- TommiPirinen
<-- vim: set ft=twiki: -->
 
META TOPICMOVED by="TommiPirinen" date="1236767171" from="KitWiki.NotYetHfstToolCommandLines" to="KitWiki.HfstCommandLineTools"

Revision 202009-09-29 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 62 to 62
 

Weight parameters

-w, --weighted Use weighted transducers for operation
Changed:
<
<
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight (currently not in use)
>
>
 

Input parameters for unary operator tools

Line: 90 to 91
  Unary operations are hfst-:
Deleted:
<
<
  • compatible
 
  • determinize
  • fst2strings
  • fst2txt
Added:
>
>
  • fst2fst
 
  • head
  • invert
  • minimize
Line: 105 to 106
 
  • split
  • strings2fst
  • summarize
Added:
>
>
  • symbols
 
  • tail
  • txt2fst
  • unweighted2weighted
Line: 159 to 161
 For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
Deleted:
<
<
hfst-fst2strings uses:

-n, --nbest=INT The maximum number of strings printed
-u, --unique Print each string at most once
-w, --with-spaces Print spaces between transitions
-r, --random Select the string randomly, not according to weight
-p, --pairstrings Print strings in pairstring format

hfst-strings2fst uses:

-e, --epsilon=EPS The symbol denoting the epsilon in input strings
-j, --disjunct_strings Disjunct all strings instead of transforming each string into a separate transducer
-p, --pairstrings The input is in pairstring format

hfst-fst2txt uses:

-n, --number Print numbers instead of symbols in transitions

hfst-project uses:

-p, --project=LEVEL Project towards LEVEL level

  Where LEVEL is one of {upper,input,analysis} or {lower,output,generation}.

hfst-push-weights uses:

-p, --push=DIRECTION Push towards DIRECTION

  Where DIRECTION is {start, initial} or {end, final}.

<--
hfst-regexp2fst uses:
-->

hfst-repeat uses:

-f, --from=NUMBER Repeat at least NUMBER times
-t, --to=NUMBER Repeat at most NUMBER times

Where NUMBERs must be given in a format that strtod(3) supports.
If the --from parameter is omitted, it defaults to 0;
if the --to parameter is omitted, it defaults to infinity.

hfst-txt2fst uses:

-n, --number If numbers are used instead of symbol names in transitions
-e, --epsilon=EPS If no symbol table is given, map EPS as zero.
 

Transducer and file formats

Line: 222 to 168
  For transducer operations that input and output transducers, the output is always of the same type as the inputs and this cannot be overridden. To convert between
Changed:
<
<
weighted and unweighted variants, a tool hfst-convert is provided. Tools which operate on multiple transducers will issue error message if fed with different types of transducers.
>
>
weighted and unweighted variants, tools hfst-unweighted2weighted and hfst-weighted2unweighted are provided. Tools which operate on multiple transducers will issue error message if fed with different types of transducers.
 

Transducer archive files

Line: 236 to 182
 composition composes pairwise, repeating the last transducer for stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To further operate on transducer archives, tools
Changed:
<
<
hfst-select etc.
>
>
hfst-select etc. can be used.
 

Symbol tables

The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table.

Changed:
<
<
By default, all tools write a symbol table with the resulting output transducer if all input transducers have a symbol table or a separate symbol table file
>
>
By default, all tools write a symbol table with the output transducer, if (1) all input transducers have a symbol table or (2) a separate symbol table file
  has been defined with option --read-symbols.
Changed:
<
<
The unary tools that take one input transducer read the input transducer as such. If the input transducer has a symbol table, the same table is written with the resulting output transducer.
>
>
The unary tools that take one input transducer stream read the input transducers as such. If an input transducer has a symbol table, the same table is written with the corresponding output transducer.
 
Changed:
<
<
The tools that take more than one input transducer assume either that
>
>
The tools that take more than one input transducer stream assume either that
 (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized
Changed:
<
<
according to the symbol table of the first transducer. The resulting output transducer has
>
>
according to the symbol table of the first transducer. The output transducer has
 a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done.
Changed:
<
<
The resulting output transducer does not have a symbol table.
>
>
The output transducer does not have a symbol table.
 
Changed:
<
<
The tools hfst-strings2fst and hfst-fst2strings require a complete symbol table that contains all symbols that occur in the input/output strings and their corresponding number values. hfst-txt2fst requires that at least the symbol for epsilon is defined, as does h(w)fst-calculate that has epsilon defined as <>. All other symbols are freely assigned a number value and added to the symbol table. hfst-fst2txt requires a symbol table unless option --number is used.
>
>
hfst-fst2strings requires a complete symbol table that contains all symbols that occur in the input or output strings. So does hfst-fst2txt, unless option --number is used. The tools hfst-strings2fst and hfst-txt2fst require that at least the symbol for epsilon is defined. In h(w)fst-calculate the epsilon is always defined as <>. In all three programs, all other symbols are freely assigned a number value and added to the symbol table, if no symbol table is given or a symbol is not found in the symbol table.
 

Reporting bugs

Revision 192009-09-29 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

HFST tools is a collection of HFST based command line utilities that can

Changed:
<
<
create, operate and print transducers using HFST interface. The tools
>
>
create, operate and print transducers using the HFST interface. The tools
 are licenced under GNU GPL version 3 (other licences may be available at request). Licence text can be found from file COPYING. Other licences are possible, and can be given by authors found in AUTHORS file.
Line: 14 to 14
 

Dependencies

Changed:
<
<
Installation requires:

  • HFST
    • Compiling needs HFST header files installed
    • Linking and dynamic loading requires libhfst.so installed

The header files and libhfst.so are found in the HFST library package.

>
>
Installation requires that HFST library package is installed.
 

Installation

Changed:
<
<
Source code can be compiled to programs by command make. Make variable LIBPATH tells where HFST header files and libhfst.so are located. Change this variable so it points to the directory where you have installed the HFST library package

The programs can be installed to the system’s binary directory by make install. On Linux systems the binary target defaults to /usr/local/bin, but this may be changed with the make variable prefix.

>
>
...
 

Usage

Line: 61 to 46
 
hfst-toolname --output=transducer.hfst
Changed:
<
<
hfst-toolname > transducer.hfst
>
>
hfst-toolname > transducer.hfst
 

hfst-toolname --output=text.txt
Changed:
<
<
hfst-toolname > text.txt
>
>
hfst-toolname > text.txt
 

If the output transducer is written into the

Revision 182009-05-31 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 168 to 168
 HFST tools that operate on multiple input parameters are:

  • hfst-compose-intersect
Deleted:
<
<
  • hfst-lexc
 

Tool-specific parameters

Revision 172009-04-21 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 76 to 76
 

Weight parameters

Deleted:
<
<
-u, --unweighted Use unweighted transducers for operation DEFAULT?
 
-w, --weighted Use weighted transducers for operation
Changed:
<
<
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight
>
>
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight (currently not in use)
  For operations that add weight, the weight switch takes required argument defining weight as number. The NUMBER is parsed using

Revision 162009-04-09 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 20 to 20
 
    • Compiling needs HFST header files installed
    • Linking and dynamic loading requires libhfst.so installed
Changed:
<
<
The header files and libhfst.so are found in the HFST library package.
>
>
The header files and libhfst.so are found in the HFST library package.
 

Installation

Source code can be compiled to programs by command make. Make variable LIBPATH tells where HFST header files and libhfst.so are located. Change this variable so it points to the directory where you have installed the

Changed:
<
<
HFST library package
>
>
HFST library package
  The programs can be installed to system’s binary directory by make install.

Revision 152009-04-08 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 17 to 17
 Installation requires:

  • HFST
Changed:
<
<
    • Compiling needs hfst.h installed
>
>
    • Compiling needs HFST header files installed
 
    • Linking and dynamic loading requires libhfst.so installed
Added:
>
>
The header files and libhfst.so are found in the HFST library package.
 

Installation

Changed:
<
<
Source code can be compiled to programs by command make. The programs can
>
>
Source code can be compiled to programs by command make. Make variable LIBPATH tells where HFST header files and libhfst.so are located. Change this variable so it points to the directory where you have installed the HFST library package

The programs can

 be installed to system’s binary directory by make install. The default binary target defaults on linux systems to /usr/local/bin,

Revision 142009-04-07 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 37 to 37
  HFST tools contain number of different command line utilities, and their parameters vary on case by case basis. If in doubt, parameter --help
Changed:
<
<
will always tell further instructions.
>
>
will always tell the parameters of a tool. For further instructions and examples, see tool-specific wiki pages listed at HfstHome#Utilities.
 

Common parameters

Line: 166 to 167
 

Tool-specific parameters

Changed:
<
<
For specific information do refer to specific wiki pages or man pages of specific tools.
>
>
For more detailed info on the parameters of a tool, see the tool-specific wiki pages listed at HfstHome#Utilities.
  hfst-fst2strings uses:

Revision 132009-04-01 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 111 to 111
 
  • remove-epsilons
  • repeat
  • reverse
Added:
>
>
  • split
 
  • strings2fst
  • summarize
  • tail
Line: 158 to 159
 hfst-toolname input1.hfst input2.hfst ... input-n.hfst
Changed:
<
<
HFST tools that operate on multiple parameters are:
>
>
HFST tools that operate on multiple input parameters are:
 
  • hfst-compose-intersect
  • hfst-lexc

Revision 122009-04-01 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 53 to 53
 output stream. That is, following are equivalent in terms of output processing:
Changed:
<
<
hfst-toolname --output=output.hfst
hfst-toolname > output.hfst
>
>
hfst-toolname --output=transducer.hfst
hfst-toolname > transducer.hfst

hfst-toolname --output=text.txt
hfst-toolname > text.txt
 

If the output transducer is written into the

Line: 75 to 80
 implementation of OpenFst library.
Changed:
<
<

Parameters for unary operator tools

>
>

Input parameters for unary operator tools

  The input filename may also be specified as free argument of command line, that is, the following are equivalent in terms of input file processing:
Line: 87 to 92
 
Changed:
<
<
hfst-toolname --input=strings.txt
hfst-toolname strings.txt
cat strings.txt | hfst-toolname
>
>
hfst-toolname --input=text.txt
hfst-toolname text.txt
cat text.txt | hfst-toolname
 

Unary operations are hfst-:

Added:
>
>
  • compatible
 
  • determinize
  • fst2strings
  • fst2txt
Added:
>
>
  • head
 
  • invert
  • minimize
  • project
Line: 105 to 112
 
  • repeat
  • reverse
  • strings2fst
Added:
>
>
  • summarize
  • tail
 
  • txt2fst
Changed:
<
<
  • (compatible, summarize, head, tail)
>
>
  • unweighted2weighted
  • weighted2unweighted
 

Parameters for binary operator tools

Revision 112009-04-01 - MiikkaSilfverberg

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 41 to 41
 

Common parameters

Changed:
<
<
These parameters should work with every hfst command line tool.
>
>
Parameters common for all commandline programs.

-h, --help Print help message
-V, --version Print version info
-v, --verbose Print verbosely while processing
-q, --quiet Do not print output
-s, --silent Alias of --quiet
<--  
-->
 
Changed:
<
<
-h, --help Print help message
-V, --version Print version info
-v, --verbose Print verbosely while processing
-q, --quiet Do not print output
-s, --silent Alias of --quiet
-o, --output=FILENAME Write output to FILENAME
-R, --symbols=FILENAME Read symbol table from FILENAME
-D, --do-not-write-symbols Do not write symbol table with the output transducer
-W, --write-symbols-to=FILENAME Write symbol table to file FILENAME

Output parameter may not apply on tools that do not create transducers or output data, such as hfst-summarize or hfst-split that creates several output files.

>
>
Parameters common for all commandline programs taking one input stream and writing transducers or text as output.

-i, --input=FILENAME Read input from FILENAME
-o, --output=FILENAME Write output to FILENAME

<--  
-->
 
Changed:
<
<
If output parameter is not given, the transducer will be written to standard
>
>
Parameters common for all commandline programs taking one input stream and writing text as output.

-i, --input=FILENAME Read input transducer from FILENAME
-o, --output=FILENAME Write output to text-file FILENAME
-R, --read-symbols=FILENAME Read symbol table from FILENAME

<--  
-->

The correct way to include these option lists on a KitWiki documentation page for a utility is to include the topic HfstCommonProgramOptions and one of the topics HfstCommonUnaryProgramOptions or HfstCommonUnaryStringProgramOptions.

If output parameter is not given, the transducer or text output will be written to standard

 output stream. That is, following are equivalent in terms of output processing:
Line: 65 to 57
 hfst-toolname > output.hfst
Changed:
<
<
If transducer is written into the
>
>
If the output transducer is written into the
 standard output stream, warnings and verbose output are printed to standard error stream instead of standard output. Error messages are always printed to standard error stream.
Line: 85 to 77
 

Parameters for unary operator tools

Deleted:
<
<
-i, --input=FILENAME Read transducer (or text) from FILENAME
 The input filename may also be specified as free argument of command line, that
Changed:
<
<
is, following are equivalent in terms of input file processing:
>
>
is, the following are equivalent in terms of input file processing:
 
hfst-toolname --input=transducer.hfst

Revision 102009-03-25 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 54 to 54
 
-W, --write-symbols-to=FILENAME Write symbol table to file FILENAME

Output parameter may not apply on tools that do not create transducers or

Changed:
<
<
output data, such as hfst-summarize.
>
>
output data, such as hfst-summarize or hfst-split that creates several output files.
  If output parameter is not given, the transducer will be written to standard output stream. That is, following are equivalent in terms of output processing:
Line: 71 to 72
 

Weight parameters

Changed:
<
<
-u, --unweighted Use unweighted transducers for operation
>
>
-u, --unweighted Use unweighted transducers for operation DEFAULT?
 
-w, --weighted Use weighted transducers for operation
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight
Line: 121 to 122
 
-1, --input1=FILENAME Read first input transducer from FILENAME
-2, --input2=FILENAME Read second input transducer from FILENAME
Added:
>
>
-n, --number Do not harmonize transducers
  It is also possible to give one or both of the filenames as free arguments on commandline, that is, all following are equivalent in terms of processing:
Line: 249 to 251
  The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table.
Changed:
<
<
By default, all tools write a symbol table with the resulting output transducer if all the input transducers have a symbol table or a separate symbol table file has been defined with option --symbol_table.

The unary tools that take one input transducer read the input transducer as such. If the input transducer has a symbol table, the same table is written with the resulting output transducer.

The tools that take more than one input transducer assume either that (1) all input transducer have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The resulting output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The resulting output transducer does not have a symbol table.

>
>
By default, all tools write a symbol table with the resulting output transducer if all input transducers have a symbol table or a separate symbol table file has been defined with option --read-symbols.

The unary tools that take one input transducer read the input transducer as such. If the input transducer has a symbol table, the same table is written with the resulting output transducer.

The tools that take more than one input transducer assume either that (1) all input transducers have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and the rest of the transducers are harmonized according to the symbol table of the first transducer. The resulting output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The resulting output transducer does not have a symbol table.
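The harmonization in case (1) can be pictured with a small Python sketch. It is not the HFST implementation: symbol tables are modelled as plain dictionaries from symbol strings to numbers, a transducer is reduced to a list of label numbers, and the function name harmonize is hypothetical.

```python
def harmonize(first_table, second_table, second_labels):
    """Renumber the labels of a second transducer to follow the first table.

    Tables map symbol strings to numbers. Returns the merged table and the
    renumbered labels of the second transducer.
    """
    merged = dict(first_table)  # the first transducer is read as such
    next_free = max(merged.values(), default=-1) + 1
    number_to_symbol = {number: symbol for symbol, number in second_table.items()}
    renumbered = []
    for label in second_labels:
        symbol = number_to_symbol[label]
        if symbol not in merged:
            # A symbol unknown to the first table gets the next free number.
            merged[symbol] = next_free
            next_free += 1
        renumbered.append(merged[symbol])
    return merged, renumbered


first = {"<>": 0, "a": 1, "b": 2}
second = {"<>": 0, "b": 1, "c": 2}
merged_table, new_labels = harmonize(first, second, [1, 2, 0])
```

Here the labels b, c and <> of the second transducer are mapped to the numbers that the first table uses, and c, which the first table lacks, is appended to the merged table.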

 

The tools hfst-strings2fst and hfst-fst2strings require a complete symbol table that contains

Revision 92009-03-20 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 49 to 49
 
-q, --quiet Do not print output
-s, --silent Alias of --quiet
-o, --output=FILENAME Write output to FILENAME
Changed:
<
<
-S, --symbols=FILENAME Read symbol table from FILENAME
>
>
-R, --symbols=FILENAME Read symbol table from FILENAME
 
-D, --do-not-write-symbols Do not write symbol table with the output transducer
-W, --write-symbols-to=FILENAME Write symbol table to file FILENAME

Revision 82009-03-17 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 249 to 249
  The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table.
Changed:
<
<
By default, all tools write a symbol table with the resulting output transducer if at least one of the input transducers has a symbol table or a separate symbol table file has been defined with option --symbol_table. The tools that take more than one input transducer read the first transducer as such and harmonize the rest of the transducers according to the symbol table of the first transducer. The resulting output transducer has a symbol table that contains all symbols used in the input transducers.
>
>
By default, all tools write a symbol table with the resulting output transducer if all the input transducers have a symbol table or a separate symbol table file has been defined with option --symbol_table.

The unary tools that take one input transducer read the input transducer as such. If the input transducer has a symbol table, the same table is written with the resulting output transducer.

The tools that take more than one input transducer assume either that (1) all input transducer have a symbol table or that (2) option --numbers is used. In case (1), the first transducer is read as such and rest of the transducers are harmonized according to the symbol table of the first transducer. The resulting output transducer has a symbol table that contains all symbols used in the input transducers. In case (2), all transducers are read as such and no harmonization is done. The resulting output transducer does not have a symbol table.

  The tools hfst-strings2fst and hfst-fst2strings require a complete symbol table that contains all symbols that occur in the input/output strings and their corresponding number values.

Revision 72009-03-17 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 141 to 141
  Binary operations are hfst-:
Added:
>
>
  • compare
 
  • compose
  • concatenate
  • conjunct
Line: 157 to 158
  HFST tools that operate on multiple parameters are:
Added:
>
>
  • hfst-compose-intersect
 
  • hfst-lexc

Tool-specific parameters

Line: 170 to 172
 
-u, --unique Print each string at most once
-w, --with-spaces Print spaces between transitions
-r, --random Select the string randomly, not according to weight
Added:
>
>
-p, --pairstrings Print strings in pairstring format
  hfst-strings2fst uses:
Line: 256 to 260
 all symbols that occur in the input/output strings and their corresponding number values. hfst-txt2fst requires that at least the symbol for epsilon is defined, as does h(w)fst-calculate that has epsilon defined as <>. All other symbols are freely assigned a number value and
Changed:
<
<
added to the symbol table. hfst-fst2txt requires a symbol table unless option -number is used.
>
>
added to the symbol table. hfst-fst2txt requires a symbol table unless option --number is used.
 

Reporting bugs

Revision 62009-03-16 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 48 to 48
 
-v, --verbose Print verbosely while processing
-q, --quiet Do not print output
-s, --silent Alias of --quiet
Changed:
<
<
-o, --output=FILENAME Print output to FILENAME
>
>
-o, --output=FILENAME Write output to FILENAME
 
-S, --symbols=FILENAME Read symbol table from FILENAME
Added:
>
>
-D, --do-not-write-symbols Do not write symbol table with the output transducer
 
-W, --write-symbols-to=FILENAME Write symbol table to file FILENAME

Output parameter may not apply on tools that do not create transducers or

Line: 83 to 84
 

Parameters for unary operator tools

Changed:
<
<
-i, --input=FILENAME Read transducer from FILENAME
>
>
-i, --input=FILENAME Read transducer (or text) from FILENAME
  The input filename may also be specified as free argument of command line, that is, following are equivalent in terms of input file processing:
Line: 94 to 95
 cat transducer.hfst | hfst-toolname
Changed:
<
<
Unary operations are e.g. hfst-:
>
>
hfst-toolname --input=strings.txt
hfst-toolname strings.txt
cat strings.txt | hfst-toolname

Unary operations are hfst-:

 
  • determinize
Added:
>
>
  • fst2strings
  • fst2txt
 
  • invert
  • minimize
  • project
Line: 104 to 113
 
  • remove-epsilons
  • repeat
  • reverse
Added:
>
>
  • strings2fst
  • txt2fst
  • (compatible, summarize, head, tail)
 

Parameters for binary operator tools

Line: 202 to 214
  hfst-txt2fst uses:
Deleted:
<
<
-a, --append_symbols Append a symbol table with each transducer
 
-n, --number If numbers are used instead of symbol names in transitions
Deleted:
<
<
-w, --weight Write transducer in weighted format
 
-e, --epsilon=EPS If no symbol table is given, map EPS as zero.
Line: 231 to 241
 stream is exhausted. To further operate on transducer archives, tools hfst-select etc.
Added:
>
>

Symbol tables

The internal representation of a transition label in a transducer is a number. The mapping from symbols (strings) to numbers can be provided with a symbol table. By default, all tools write a symbol table with the resulting output transducer if at least one of the input transducers has a symbol table or a separate symbol table file has been defined with option --symbol_table. The tools that take more than one input transducer read the first transducer as such and harmonize the rest of the transducers according to the symbol table of the first transducer. The resulting output transducer has a symbol table that contains all symbols used in the input transducers.

The tools hfst-strings2fst and hfst-fst2strings require a complete symbol table that contains all symbols that occur in the input/output strings and their corresponding number values. hfst-txt2fst requires that at least the symbol for epsilon is defined, as does h(w)fst-calculate that has epsilon defined as <>. All other symbols are freely assigned a number value and added to the symbol table. hfst-fst2txt requires a symbol table unless option -number is used.

 

Reporting bugs

All bugs in command line tools shall be reported to

Revision 52009-03-16 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 156 to 156
 
-n, --nbest=INT The maximum number of strings printed
-u, --unique Print each string at most once
Changed:
<
<
-w, --with-space Print spaces between transitions
>
>
-w, --with-spaces Print spaces between transitions
 
-r, --random Select the string randomly, not according to weight
Added:
>
>
hfst-strings2fst uses:

-e, --epsilon=EPS The symbol denoting the epsilon in input strings
-j, --disjunct_strings Disjunct all strings instead of transforming each string into a separate transducer
-p, --pairstrings The input is in pairstring format
 hfst-fst2txt uses:

-n, --number Print numbers instead of symbols in transitions
Line: 196 to 202
  hfst-txt2fst uses:
Changed:
<
<
-a, --store_symbols Write symbol table with each transducer
-A, --store_symbols_to=FILE Store the symbol table to FILE
>
>
-a, --append_symbols Append a symbol table with each transducer
 
-n, --number If numbers are used instead of symbol names in transitions
-w, --weight Write transducer in weighted format
-e, --epsilon=EPS If no symbol table is given, map EPS as zero.

Revision 42009-03-15 - MiikkaSilfverberg

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 49 to 49
 
-q, --quiet Do not print output
-s, --silent Alias of --quiet
-o, --output=FILENAME Print output to FILENAME
Changed:
<
<
-S, --symbols=FILENAME Print symbol table to FILENAME or read symbol table from FILENAME
>
>
-S, --symbols=FILENAME Read symbol table from FILENAME
-W, --write-symbols-to=FILENAME Write symbol table to file FILENAME
  Output parameter may not apply on tools that do not create transducers or output data, such as hfst-summarize.

Revision 32009-03-11 - MiikkaSilfverberg

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST command line tools

Line: 155 to 155
 
-n, --nbest=INT The maximum number of strings printed
-u, --unique Print each string at most once
Changed:
<
<
-s, --spaces Print spaces between transitions
>
>
-w, --with-space Print spaces between transitions
 
-r, --random Select the string randomly, not according to weight

hfst-fst2txt uses:

Revision 22009-03-11 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"
Changed:
<
<

HFST: Command line tools best practices

>
>

HFST command line tools

  HFST tools is a collection of HFST based command line utilities that can create, operate and print transducers using HFST interface. The tools
Line: 11 to 10
 

Downloading

Changed:
<
<
Tools can be fetched from HFST research downloads page.
>
>
Tools can be fetched from the HFST research downloads page.
 

Dependencies

Installation requires:

  • HFST
Changed:
<
<
    • Compiling needs hfst.h in /usr/include/hfst/ or ???/hfst/
    • Linking and dynamic loading requires libhfst.so in /usr/lib/ or LD_LIBRARY_PATH
>
>
    • Compiling needs hfst.h installed
    • Linking and dynamic loading requires libhfst.so installed
 

Installation

Source code can be compiled to programs by command make. The programs can

Changed:
<
<
be installed to system’s binary directory by make install, the binary target defaults on linux systems to /usr/bin, but may be changed with make variable prefix.
>
>
be installed to the system’s binary directory with make install. On Linux systems the binary target defaults to /usr/local/bin, but this may be changed with the make variable prefix.
 

Usage

Line: 34 to 35
 hfst-toolname [OPTIONS] [FILE...]
Changed:
<
<
HFST tools contains number of different command line utilities, and their
>
>
HFST tools contain a number of different command line utilities, and their
 parameters vary on case by case basis. If in doubt, parameter --help
Changed:
<
<
will always tell further instructions
>
>
will always tell further instructions.
 

Common parameters

Added:
>
>
These parameters should work with every hfst command line tool.
 
-h, --help Print help message
-V, --version Print version info
-v, --verbose Print verbosely while processing
-q, --quiet Do not print output
-s, --silent Alias of --quiet
-o, --output=FILENAME Print output to FILENAME
Added:
>
>
-S, --symbols=FILENAME Print symbol table to FILENAME or read symbol table from FILENAME
  Output parameter may not apply on tools that do not create transducers or
Changed:
<
<
data, such as hfst-transducer-info.
>
>
output data, such as hfst-summarize.
  If output parameter is not given, the transducer will be written to standard output stream. That is, following are equivalent in terms of output processing:
Line: 58 to 62
 hfst-toolname > output.hfst
Changed:
<
<
If transducer is written in standard output stream, error messages and verbose output are printed to standard error stream instead of standard output.
>
>
If the transducer is written to the standard output stream, warnings and verbose output are printed to the standard error stream instead of standard output. Error messages are always printed to the standard error stream.
 

Weight parameters

Line: 69 to 74
 
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight

For operations that add weight, the weight switch takes required

Changed:
<
<
argument defining weight as number. The number is parsed using standard library’s strtod(3) implementation.
>
>
argument defining the weight as a number. The NUMBER is parsed using the standard library’s strtod(3) implementation. The weight semantics currently used for transducers are based on the tropical semiring implementation of the OpenFst library.
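As a rough illustration of tropical semiring weights, the following Python sketch shows the two operations: weights along one path are added, and alternative paths are combined by taking the minimum. The function names are hypothetical and nothing here calls HFST or OpenFst. Python's float() stands in for strtod(3); it accepts a similar but not identical syntax (for example, it rejects hexadecimal floats).

```python
import math

ZERO = math.inf  # identity of plus: the weight of an impossible path
ONE = 0.0        # identity of times: the weight of a free transition


def plus(a, b):
    """Combine two alternative paths: the cheaper one wins."""
    return min(a, b)


def times(a, b):
    """Extend a path: the weights are added."""
    return a + b


def path_weight(weights):
    """Total weight of a single path through the transducer."""
    total = ONE
    for weight in weights:
        total = times(total, weight)
    return total


# NUMBER as it might arrive from --weight=NUMBER on the command line.
extra = float("0.25")
best = plus(path_weight([0.5, 1.0, extra]), path_weight([3.0]))
```

Under this reading a lower weight means a better path, which is why the first path is chosen over the single transition of weight 3.0.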
 
Deleted:
<
<
For transducer operations that input and output transducers, the output is always of same type as inputs and this cannot be overriden. To convert between weighted and unweighted variants, a tool hfst-convert is provided.
 

Parameters for unary operator tools

Line: 129 to 133
 
  • conjunct
  • disjunct
Added:
>
>

Parameters for tools operating on arbitrary number of transducers

For tools that operate on an arbitrary number of input transducers, the list of filenames must be given as free parameters on the command line, e.g.:

hfst-toolname input1.hfst input2.hfst ... input-n.hfst

HFST tools that operate on multiple parameters are:

  • hfst-lexc

Tool-specific parameters

For details, refer to the wiki pages or man pages of the specific tools.

hfst-fst2strings uses:

-n, --nbest=INT The maximum number of strings printed
-u, --unique Print each string at most once
-s, --spaces Print spaces between transitions
-r, --random Select the string randomly, not according to weight

hfst-fst2txt uses:

-n, --number Print numbers instead of symbols in transitions

hfst-project uses:

-p, --project=LEVEL Project towards LEVEL level

  Where LEVEL is one of {upper,input,analysis} or {lower,output,generation}.
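To make the LEVEL aliases concrete, here is a minimal Python sketch that models a transducer as a finite set of (upper, lower) string pairs and projects it onto one level. This is an illustration only; the set-of-pairs model and the function name project are assumptions, not the HFST API.

```python
UPPER_ALIASES = {"upper", "input", "analysis"}
LOWER_ALIASES = {"lower", "output", "generation"}


def project(pairs, level):
    """Keep only one side of every (upper, lower) pair."""
    if level in UPPER_ALIASES:
        return {upper for upper, _ in pairs}
    if level in LOWER_ALIASES:
        return {lower for _, lower in pairs}
    raise ValueError(f"unknown projection level: {level}")


relation = {("cat+N+Pl", "cats"), ("dog+N+Sg", "dog")}
analyses = project(relation, "analysis")
surface_forms = project(relation, "output")
```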

hfst-push-weights uses:

-p, --push=DIRECTION Push towards DIRECTION

  Where DIRECTION is {start, initial} or {end, final}.

<--
hfst-regexp2fst uses:
-->

hfst-repeat uses:

-f, --from=NUMBER Repeat at least NUMBER times
-t, --to=NUMBER Repeat at most NUMBER times

Where NUMBERs must be given in a format that strtod(3) supports.
If the --from parameter is omitted, it defaults to 0;
if the --to parameter is omitted, it defaults to infinity.
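The effect of the two bounds and their defaults can be sketched in Python for the simplest case, a transducer that accepts a single string. The function below is hypothetical and only checks membership in the repeated language; it does not build a transducer.

```python
import math


def is_repetition(word, unit, at_least=0, at_most=math.inf):
    """Return True if word is unit repeated between at_least and at_most times."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    count, remainder = divmod(len(word), len(unit))
    return remainder == 0 and unit * count == word and at_least <= count <= at_most
```

With the defaults, the empty string is accepted (zero repetitions) and there is no upper limit, which mirrors omitting both --from and --to.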

hfst-txt2fst uses:

-a, --store_symbols Write symbol table with each transducer
-A, --store_symbols_to=FILE Store the symbol table to FILE
-n, --number If numbers are used instead of symbol names in transitions
-w, --weight Write transducer in weighted format
-e, --epsilon=EPS If no symbol table is given, map EPS as zero.

Transducer and file formats

Transducer formats

For transducer operations that input and output transducers, the output is always of the same type as the inputs and this cannot be overridden. To convert between weighted and unweighted variants, the tool hfst-convert is provided. Tools which operate on multiple transducers will issue an error message if fed with different types of transducers.

 

Transducer archive files

The support for transducer archives, that is, streams containing catenation of

Line: 143 to 227
 

Reporting bugs

Changed:
<
<
All bugs in command line tools shall be reported to HFST team. It is good
>
>
All bugs in command line tools shall be reported to HFST team. It is good
 to include at least steps to reproduce the error (i.e. exact command(s) used),
Changed:
<
<
and first line of output of command hfst-tool --version.
>
>
and first line of output of command hfst-tool --version. E.g. include following in your message:

$ hfst-tool --version
HFST Toolname [internal ALPHA-$Revision: 1.7 $]
$ hfst-toolname [PARAMETERS that fail]
Failure output

It may also be possible to push bug reports to somewhere in KitWiki.

 

Development and distribution

Line: 155 to 250
 make static. New utilities can be developed using hfst-skeleton.cc as source.
Deleted:
<
<

This document (fragment) was automatically generated from README.rst on 2008-11-17T14:39:10+02:00

 


Line: 165 to 257
  --> -- TommiPirinen \ No newline at end of file
Added:
>
>
<-- vim: set ft=twiki: -->

META TOPICMOVED by="TommiPirinen" date="1236767171" from="KitWiki.NotYetHfstToolCommandLines" to="KitWiki.HfstCommandLineTools"

Revision 12008-11-17 - TommiPirinen

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HfstHome"

HFST: Command line tools best practices

HFST tools is a collection of HFST-based command line utilities that create, operate on and print transducers using the HFST interface. The tools are licensed under the GNU GPL version 3; the licence text is in the file COPYING. Other licences may be available on request from the authors listed in the AUTHORS file.

Downloading

Tools can be fetched from HFST research downloads page.

Dependencies

Installation requires:

  • HFST
    • Compiling needs hfst.h in /usr/include/hfst/ or ???/hfst/
    • Linking and dynamic loading requires libhfst.so in /usr/lib/ or LD_LIBRARY_PATH

Installation

The source code is compiled into programs with the command make. The programs are installed into the system’s binary directory with make install; on Linux systems the binary target defaults to /usr/bin, but it can be changed with the make variable prefix.

Usage

hfst-toolname [OPTIONS] [FILE...]

HFST tools contains a number of different command line utilities, and their parameters vary on a case-by-case basis. If in doubt, the parameter --help always gives further instructions.

Common parameters

-h, --help Print help message
-V, --version Print version info
-v, --verbose Print verbosely while processing
-q, --quiet Do not print output
-s, --silent Alias of --quiet
-o, --output=FILENAME Print output to FILENAME

The output parameter may not apply to tools that do not create transducers or data, such as hfst-transducer-info.

If the output parameter is not given, the transducer is written to the standard output stream. That is, the following are equivalent in terms of output processing:

hfst-toolname --output=output.hfst
hfst-toolname > output.hfst

If the transducer is written to the standard output stream, error messages and verbose output are printed to the standard error stream instead of standard output.
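The reason for this separation is that anything mixed into standard output would corrupt a transducer that is piped to the next tool. A minimal Python sketch of the convention follows; the function emit and its parameters are hypothetical, and the in-memory streams only stand in for the real standard streams.

```python
import io
import sys


def emit(result: bytes, output_path=None, verbose=False, out=None, err=None):
    """Write result to output_path, or to standard output if no path is given."""
    out = out if out is not None else sys.stdout.buffer
    err = err if err is not None else sys.stderr
    if verbose:
        # Diagnostics never share a stream with the transducer data.
        print(f"writing {len(result)} bytes", file=err)
    if output_path is None:
        out.write(result)
    else:
        with open(output_path, "wb") as handle:
            handle.write(result)


# In-memory stand-ins for the standard output and standard error streams.
buffer, log = io.BytesIO(), io.StringIO()
emit(b"transducer bytes", verbose=True, out=buffer, err=log)
```

After the call, buffer holds only the data and log holds only the verbose message.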

Weight parameters

-u, --unweighted Use unweighted transducers for operation
-w, --weighted Use weighted transducers for operation
-w, --weight=NUMBER Use weighted transducers and NUMBER as weight

For operations that add weight, the weight switch takes a required argument that defines the weight as a number. The number is parsed using the standard library’s strtod(3) implementation.

For transducer operations that input and output transducers, the output is always of the same type as the inputs and this cannot be overridden. To convert between weighted and unweighted variants, the tool hfst-convert is provided.

Parameters for unary operator tools

-i, --input=FILENAME Read transducer from FILENAME

The input filename may also be specified as a free argument on the command line; that is, the following are equivalent in terms of input file processing:

hfst-toolname --input=transducer.hfst
hfst-toolname transducer.hfst
cat transducer.hfst | hfst-toolname
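One way to picture how the three forms above resolve to the same input is the following Python sketch built on argparse. It is not how the HFST tools parse their arguments; the function name parse_input and the use of "-" to denote standard input are assumptions made for the illustration.

```python
import argparse


def parse_input(argv):
    """Return the input filename, or "-" when the input comes from stdin."""
    parser = argparse.ArgumentParser(prog="hfst-toolname")
    parser.add_argument("-i", "--input", dest="input_option", metavar="FILENAME")
    parser.add_argument("free", nargs="?", metavar="FILE")
    args = parser.parse_args(argv)
    if args.input_option and args.free:
        parser.error("input file given both as an option and as a free argument")
    return args.input_option or args.free or "-"
```

For example, parse_input(["--input=transducer.hfst"]) and parse_input(["transducer.hfst"]) name the same file, while an empty argument list falls back to standard input.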

Unary operations are e.g. hfst-:

  • determinize
  • invert
  • minimize
  • project
  • push-weights
  • remove-epsilons
  • repeat
  • reverse

Parameters for binary operator tools

-1, --input1=FILENAME Read first input transducer from FILENAME
-2, --input2=FILENAME Read second input transducer from FILENAME

It is also possible to give one or both of the filenames as free arguments on the command line; that is, all of the following are equivalent in terms of processing:

hfst-toolname --input1=first.hfst --input2=second.hfst
hfst-toolname first.hfst second.hfst
hfst-toolname --input1=first.hfst second.hfst
hfst-toolname --input2=second.hfst first.hfst
cat first.hfst | hfst-toolname --input2=second.hfst
cat second.hfst | hfst-toolname --input1=first.hfst
cat second.hfst | hfst-toolname first.hfst
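The seven invocations above are consistent with a simple rule: named options claim their slot, free arguments fill the remaining slots from left to right, and a slot that is still empty is read from standard input. The Python sketch below encodes that rule as an assumption drawn from the examples, not from the tools' source; resolve_inputs is a hypothetical name and "-" denotes standard input.

```python
def resolve_inputs(input1=None, input2=None, free_args=()):
    """Return the (first, second) input sources of a binary tool."""
    slots = [input1, input2]
    remaining = list(free_args)
    for index in range(2):
        # Free arguments fill whichever slots the options left empty.
        if slots[index] is None and remaining:
            slots[index] = remaining.pop(0)
    if remaining:
        raise ValueError("too many input files")
    if slots.count(None) > 1:
        raise ValueError("only one input can come from standard input")
    return tuple("-" if slot is None else slot for slot in slots)
```

Applied to the last example, resolve_inputs(free_args=["first.hfst"]) gives ("first.hfst", "-"), so the piped second.hfst becomes the second operand.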

If the binary operator is not commutative, the input1 or first transducer is the first or leftmost operand. E.g. for composition input1’s lower level is matched against input2’s upper level.
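Why operand order matters for composition can be seen in a small Python sketch that models each transducer as a finite set of (upper, lower) string pairs. The set-of-pairs model and the function name compose are simplifications for illustration, not the HFST implementation.

```python
def compose(first, second):
    """Match the lower level of first against the upper level of second."""
    return {
        (upper, lower)
        for upper, middle in first
        for second_upper, lower in second
        if middle == second_upper
    }


analyser = {("cat+N+Pl", "cat^s")}
speller = {("cat^s", "cats")}
forward = compose(analyser, speller)
backward = compose(speller, analyser)
```

In this example forward contains the pair ("cat+N+Pl", "cats"), while backward is empty because the lower level of speller never matches the upper level of analyser.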

Binary operations are hfst-:

  • compose
  • concatenate
  • conjunct
  • disjunct

Transducer archive files

The support for transducer archives, that is, streams containing a catenation of multiple transducers, is experimental and tool-dependent. Tools that take a single input stream repeat the operation for each transducer in it, as if each had been provided in a separate invocation. For tools that require two input streams, the behaviour depends on the specific tool; currently composition composes pairwise, repeating the last transducer of the stream that contains fewer. All other tools operate pairwise and exit as soon as either stream is exhausted. To operate further on transducer archives, tools such as hfst-select can be used.
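The two pairing strategies described above can be sketched in Python, with each archive modelled as a list of transducer names. Both function names are hypothetical, and the handling of an empty stream is an assumption the text does not cover.

```python
def pair_until_exhausted(stream1, stream2):
    """Pair transducers and stop as soon as either stream runs out."""
    return list(zip(stream1, stream2))


def pair_repeating_last(stream1, stream2):
    """Pair transducers, reusing the last one of the shorter stream."""
    first, second = list(stream1), list(stream2)
    if not first or not second:
        return []  # assumption: nothing to pair if either stream is empty
    length = max(len(first), len(second))
    first += [first[-1]] * (length - len(first))
    second += [second[-1]] * (length - len(second))
    return list(zip(first, second))


lexicons = ["lex1", "lex2", "lex3"]
rules = ["rules1"]
composition_pairs = pair_repeating_last(lexicons, rules)
other_pairs = pair_until_exhausted(lexicons, rules)
```

With three lexicons and one rule transducer, the composition strategy yields three pairs that all reuse rules1, whereas the default strategy yields a single pair.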

Reporting bugs

All bugs in the command line tools should be reported to the HFST team. Include at least the steps to reproduce the error (i.e. the exact command(s) used) and the first line of the output of the command hfst-tool --version.

Development and distribution

The source code archive contains a test suite, run with make check, which every distributed version must pass unless it is clearly labelled as an alpha test version. Statically linked test versions can be built with make static. New utilities can be developed using hfst-skeleton.cc as a starting point.


This document (fragment) was automatically generated from README.rst on 2008-11-17T14:39:10+02:00


<--  
-->
-- TommiPirinen
 
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.