Difference: HfstCommandLineToolsTutorial (1 vs. 9)

Revision 9 - 2016-05-20 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstCommandLineTools"

HFST: Command Line Tools Tutorial

Line: 110 to 110
  --> -- ErikAxelson - 2012-02-22 \ No newline at end of file
Added:
>
>
META PREFERENCE name="VIEW_TEMPLATE" title="VIEW_TEMPLATE" type="Set" value="FinCLARIN.ViewFinClarinWideEngTemplate"

Revision 8 - 2014-02-24 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstCommandLineTools"

HFST: Command Line Tools Tutorial

Cat to Dog

We give a simple example of creating a transducer that maps "cat" to "dog"

Changed:
<
<
and printing the string pairs recognized by that transducer. The following commands,
>
>
with weight 1.5 and printing the string pairs recognized by that transducer. The following commands,
 when executed on the command line
Changed:
<
<
echo "cat:dog" | hfst-strings2fst --format sfst | hfst-fst2strings
>
>
echo "{cat}:{dog}::1.5" | hfst-regexp2fst | hfst-fst2strings
 

will yield the following output

Line: 18 to 18
 cat:dog
Added:
>
>
To see how the strings are tokenized and how the tokens are aligned, we can print the input-output pairs with a space between consecutive pairs. We can also print the total weight of each string pair. This is done by adding the following options to hfst-fst2strings:

--xfst=print-space --xfst=print-pairs --print-weights

which will yield

c:d a:o t:g     1.5
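To make the correspondence between the aligned pairs and the original string pair explicit, here is a small POSIX shell sketch that reads one such line back into the form input:output followed by the weight. It is not part of HFST. The function name pairs_to_stringpair is hypothetical, and the sketch assumes that pairs are separated by whitespace, that no symbol contains a colon, that a final numeric field is the weight, and (as a precaution) that a field without a colon is an identity pair.

```shell
#!/bin/sh
# Hypothetical helper, not an HFST tool: read lines such as
#   c:d a:o t:g<TAB>1.5
# and print them back as input:output<TAB>weight.
# Assumptions: no symbol contains ':' and a final numeric field is the weight.
pairs_to_stringpair() {
  awk '{
    input_side = ""; output_side = ""; weight = ""
    for (i = 1; i <= NF; i++) {
      n = index($i, ":")
      if (n == 0) {
        if (i == NF && $i ~ /^-?[0-9.]+$/) { weight = $i; continue }  # trailing weight
        input_side = input_side $i; output_side = output_side $i      # bare symbol: identity pair
        continue
      }
      input_side  = input_side  substr($i, 1, n - 1)   # input side of the pair
      output_side = output_side substr($i, n + 1)      # output side of the pair
    }
    if (weight == "") print input_side ":" output_side
    else              print input_side ":" output_side "\t" weight
  }'
}

# Usage: feed it the line shown above.
printf 'c:d a:o t:g\t1.5\n' | pairs_to_stringpair
```

Concatenating the input sides c, a, t and the output sides d, o, g gives back cat:dog, which is why the three aligned pairs describe the same mapping as the original string pair.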
 One feature of the HFST tools is that, by default, they read from standard input and write to standard output, which makes it easy to pipeline a series of commands, as our example shows. We could also have written separate commands to get the same result:

echo "cat:dog" > cat2dog.txt
Changed:
<
<
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst --format sfst
>
>
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst
 hfst-fst2strings --input cat2dog.hfst --output result.txt
cat result.txt

Another feature is that we are able to choose the implementation from several back-end

Changed:
<
<
libraries with the option --format or in short just -f.
>
>
libraries (openfst-tropical, foma, sfst) with the option --format, or -f for short. If no back-end is specified, openfst-tropical is used.
 Sometimes it can be more efficient to use a certain library for a task. In our example the differences between libraries are negligible, so we could have used --format foma
Changed:
<
<
or --format openfst-tropical as well. Most of the time we don't need to worry about the back-end
>
>
or --format sfst as well. Most of the time we don't need to worry about the back-end
 implementation and we can leave this parameter out, using the default openfst-tropical.

The parameters of a tool can be seen using the option --help or on the tool-specific wiki page.

Changed:
<
<
In our example, the commands hfst-strings2fst --help and hfst-fst2strings --help and the pages HfstStrings2Fst and HfstFst2Strings explain how the tools are used and what their purpose is. For instance, the help message of hfst-strings2fst tells us that the input can be given in various formats using different parameters:

  echo "cat:dog" | hfst-strings2fst            create cat:dog fst
  echo "c:da:ot:g" | hfst-strings2fst -p       same as pairstring
  echo "c:d a:o t:g" | hfst-strings2fst -p -S  same as pairstring with spaces
  echo "c a t:d o g" | hfst-strings2fst -S     same with spaces

So we could have written as well

echo "c:d a:o t:g" | hfst-strings2fst -S -p --format sfst | hfst-fst2strings

if we prefer writing strings that way.

>
>
In our example, the commands hfst-regexp2fst --help and hfst-fst2strings --help and the pages HfstRegexp2Fst and HfstFst2Strings explain how the tools are used and what their purpose is.
 

Animal Nouns and Verbs

Line: 77 to 75
 bark #;
Changed:
<
<
and can easily convert that into a transducer, say in SFST format, by executing
>
>
and can easily convert that into a transducer by executing
 
Changed:
<
<
hfst-lexc lexicon.lexc -f sfst -o lexicon.hfst
>
>
hfst-lexc lexicon.lexc -o lexicon.hfst
 

which will write a corresponding transducer in the file lexicon.hfst.

Revision 7 - 2014-02-19 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstCommandLineTools"

HFST: Command Line Tools Tutorial

Line: 24 to 24
 
echo "cat:dog" > cat2dog.txt
Changed:
<
<
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst --f format sfst
>
>
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst --format sfst
 hfst-fst2strings --input cat2dog.hfst --output result.txt
cat result.txt
Line: 60 to 60
 

Animal Nouns and Verbs

Changed:
<
<
The tool hfst-lexc is very useful for writing grammars.
>
>
The tool hfst-lexc is very useful for writing grammars.
 We have a simple lexicon defined in the lexc formalism in the file lexicon.lexc as follows:
Line: 80 to 80
 and can easily convert that into a transducer, say in SFST format, by executing
Changed:
<
<
./hfst-lexc lexicon.lexc -f sfst -o lexicon.hfst
>
>
hfst-lexc lexicon.lexc -f sfst -o lexicon.hfst
 

which will write a corresponding transducer in the file lexicon.hfst.

Revision 6 - 2014-02-10 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstCommandLineTools"

HFST: Command Line Tools Tutorial

Line: 60 to 60
 

Animal Nouns and Verbs

Changed:
<
<
The tool hfst-lexc is very useful for writing grammars.
>
>
The tool hfst-lexc is very useful for writing grammars.
 We have a simple lexicon defined in the lexc formalism in the file lexicon.lexc as follows:

Revision 5 - 2012-04-12 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstCommandLineTools"

HFST: Command Line Tools Tutorial

Line: 110 to 110
 
-- ErikAxelson - 2012-02-22

Revision 4 - 2012-04-12 - ErikAxelson

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="HfstAllPages"
>
>
META TOPICPARENT name="HfstCommandLineTools"
 

HFST: Command Line Tools Tutorial

Cat to Dog

Line: 104 to 104
 
Changed:
<
<
  • More examples of HFST command line tools are given in tool-specific wiki pages.
>
>
  • More examples of HFST command line tools are given in tool-specific wiki pages.
 


Revision 3 - 2012-04-10 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Command Line Tools Tutorial

Line: 102 to 102
 

What next

Changed:
<
<
  • We recommend reading the page HfstOutline to get familiar with the different functionalities offered by the HFST tools.
>
>
 

Revision 2 - 2012-02-23 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Command Line Tools Tutorial

Changed:
<
<

A simple example

>
>

Cat to Dog

  We give a simple example of creating a transducer that maps "cat" to "dog" and printing the string pairs recognized by that transducer. The following commands,
Line: 58 to 58
 if we prefer writing strings that way.
Changed:
<
<

TODO: A more complex example

>
>

Animal Nouns and Verbs

 
Changed:
<
<
Here we create a lexicon, combine it with a couple of rules and print the strings recognized by the resulting transducer.

An unweighted lexicon transducer lexicon.hfst is defined with hfst-lexc. The format of lexicon.hfst is, say, SFST. We process a list of input words stored in $WORDS by composing them with the lexicon transducer and see what the output is:

We first convert each word (a string on a line) into a transducer:

>
>
The tool hfst-lexc is very useful for writing grammars. We have a simple lexicon defined in the lexc formalism in the file lexicon.lexc as follows:
 
Changed:
<
<
echo $WORDS | hfst-strings2fst --format=sfst |
>
>
LEXICON Root
Noun ;
Verb ;
 
Changed:
<
<
We then compose each transducer with the lexicon
>
>
LEXICON Noun
cat #;
dog #;
 
Changed:
<
<
   hfst-compose lexicon.hfst | 
>
>
LEXICON Verb
mew #;
bark #;
 
Changed:
<
<
and extract the output level:
>
>
and can easily convert that into a transducer, say in SFST format, by executing
 
Changed:
<
<
hfst-project --output |
>
>
./hfst-lexc lexicon.lexc -f sfst -o lexicon.hfst
 
Changed:
<
<
Finally, we convert each transducer into strings. As one input string can yield several output strings, we also limit the number of strings to 5.
>
>
which will write a corresponding transducer in the file lexicon.hfst. We can print the strings recognized by the transducer as follows
 
Changed:
<
<
hfst-fst2strings --max-strings=5
>
>
hfst-fst2strings lexicon.hfst
 
Changed:
<
<
In addition to lexicon.hfst, a couple of two-level rules R1.hfst and R2.hfst are needed. Now the run-time becomes:
>
>
which will yield
 
Changed:
<
<
echo $WORDS | hfst-strings2fst --format=sfst | hfst-compose lexicon.hfst | hfst-compose-intersect R1.hfst R2.hfst | hfst-project --output | hfst-fst2strings --max-strings=5
>
>
bark
mew
dog
cat
 
Deleted:
<
<
Suppose we have a weighted transducer P.hfst (whose format is tropical OpenFst) for giving priorities to the output. If we only want the 10 best results for the transducers lexicon.hfst, R1.hfst and R2.hfst, the run-time becomes:

   echo $WORDS | hfst-strings2fst |
   hfst-compose lexicon.hfst |
   hfst-compose-intersect R1.hfst R2.hfst |
   hfst-fst2fst --format=openfst-tropical |   # type conversion
   hfst-compose P.hfst |
   hfst-project --output | 
   hfst-fst2strings --nbest=10 --print-weights=true
 

What next

  • We recommend reading the page HfstOutline to get familiar with the different functionalities offered by the HFST tools.
  • HfstCommandLineTools gives more thorough information on parameters and formats recognized by the HFST tools.
Changed:
<
<
  • More examples of HFST command line tools are also given in tool-specific wiki pages listed here.
>
>
  • More examples of HFST command line tools are given in tool-specific wiki pages.
 


Revision 1 - 2012-02-22 - ErikAxelson

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HfstAllPages"

HFST: Command Line Tools Tutorial

A simple example

We give a simple example of creating a transducer that maps "cat" to "dog" and printing the string pairs recognized by that transducer. The following commands, when executed on the command line

echo "cat:dog" | hfst-strings2fst --format sfst | hfst-fst2strings

will yield the following output

cat:dog

One feature of the HFST tools is that, by default, they read from standard input and write to standard output, which makes it easy to pipeline a series of commands, as our example shows. We could also have written separate commands to get the same result:

echo "cat:dog" > cat2dog.txt
hfst-strings2fst --input cat2dog.txt --output cat2dog.hfst --f format sfst
hfst-fst2strings --input cat2dog.hfst --output result.txt
cat result.txt

Another feature is that we can choose the implementation from several back-end libraries with the option --format, or -f for short. Sometimes it can be more efficient to use a certain library for a task. In our example the differences between libraries are negligible, so we could have used --format foma or --format openfst-tropical as well. Most of the time we don't need to worry about the back-end implementation and can leave this parameter out, using the default openfst-tropical.

The parameters of a tool can be seen using the option --help or on the tool-specific wiki page. In our example, the commands hfst-strings2fst --help and hfst-fst2strings --help and the pages HfstStrings2Fst and HfstFst2Strings explain how the tools are used and what their purpose is. For instance, the help message of hfst-strings2fst tells us that the input can be given in various formats using different parameters:

  echo "cat:dog" | hfst-strings2fst            create cat:dog fst
  echo "c:da:ot:g" | hfst-strings2fst -p       same as pairstring
  echo "c:d a:o t:g" | hfst-strings2fst -p -S  same as pairstring with spaces
  echo "c a t:d o g" | hfst-strings2fst -S     same with spaces

So we could have written as well

echo "c:d a:o t:g" | hfst-strings2fst -S -p --format sfst | hfst-fst2strings

if we prefer writing strings that way.
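The compact pair string accepted with -p and the spaced form accepted with -p -S differ only in the spaces between pairs. The following POSIX shell sketch, which is not an HFST tool (the function name space_pairstring is hypothetical), inserts those spaces. It assumes every symbol is a single character; with multi-character symbols the compact form cannot be split this way.

```shell
#!/bin/sh
# Hypothetical helper, not an HFST tool: turn a compact pair string such as
# "c:da:ot:g" into the spaced form "c:d a:o t:g".
# Assumption: every symbol is a single character, so each pair is exactly x:y.
space_pairstring() {
  # Append a space after each x:y pair, then drop the trailing space.
  sed -e 's/\(.:.\)/\1 /g' -e 's/ $//'
}

# Usage: the spaced result is the form that hfst-strings2fst -p -S accepts.
echo "c:da:ot:g" | space_pairstring
```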

TODO: A more complex example

Here we create a lexicon, combine it with a couple of rules and print the strings recognized by the resulting transducer.

An unweighted lexicon transducer lexicon.hfst is defined with hfst-lexc. The format of lexicon.hfst is, say, SFST. We process a list of input words stored in $WORDS by composing them with the lexicon transducer and see what the output is:

We first convert each word (a string on a line) into a transducer:

   echo $WORDS | hfst-strings2fst --format=sfst |

We then compose each transducer with the lexicon

   hfst-compose lexicon.hfst | 

and extract the output level:

   hfst-project --output |

Finally, we convert each transducer into strings. As one input string can yield several output strings, we also limit the number of strings to 5.

   hfst-fst2strings --max-strings=5

In addition to lexicon.hfst, a couple of two-level rules R1.hfst and R2.hfst are needed. Now the run-time becomes:

   echo $WORDS | hfst-strings2fst --format=sfst |
   hfst-compose lexicon.hfst |
   hfst-compose-intersect R1.hfst R2.hfst |
   hfst-project --output | 
   hfst-fst2strings --max-strings=5

Suppose we have a weighted transducer P.hfst (whose format is tropical OpenFst) for giving priorities to the output. If we only want the 10 best results for the transducers lexicon.hfst, R1.hfst and R2.hfst, the run-time becomes:

   echo $WORDS | hfst-strings2fst |
   hfst-compose lexicon.hfst |
   hfst-compose-intersect R1.hfst R2.hfst |
   hfst-fst2fst --format=openfst-tropical |   # type conversion
   hfst-compose P.hfst |
   hfst-project --output | 
   hfst-fst2strings --nbest=10 --print-weights=true

What next

  • We recommend reading the page HfstOutline to get familiar with the different functionalities offered by the HFST tools.
  • HfstCommandLineTools gives more thorough information on parameters and formats recognized by the HFST tools.
  • More examples of HFST command line tools are also given in tool-specific wiki pages listed here.
  • More complex scripts are given in HfstCommandLineToolExamples.


<--  
-->
-- ErikAxelson - 2012-02-22
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.