Tool User Quick Start

Download and compile a lexicon

If you have installed hfst, download a Finnish lexicon text file from:

http://svn.gna.org/viewcvs/omorfi/branches/finntreebank/finntreebank.lexc?view=log

and run the commands listed at the beginning of the file:

hfst-lexc -v -f foma finntreebank.lexc -o finntreebank.inverted.hfst
hfst-invert -v finntreebank.inverted.hfst -o finntreebank.debug.hfst
hfst-fst2fst -v finntreebank.debug.hfst -f olw -o finntreebank.hfst
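
The commands above assume that the HFST command-line tools are on your PATH. As a minimal sketch, the helper below prints any of the named commands that the shell cannot find, so you can check the installation before compiling. The name missing_tools is hypothetical and not part of HFST:

```shell
# missing_tools is a hypothetical helper, not part of HFST.
# It prints each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Usage: list any of the compile tools from this section that are missing.
# missing_tools hfst-lexc hfst-invert hfst-fst2fst
```

If the usage line prints nothing, all three tools were found.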

You can also download precompiled lexicons for various languages from:

http://sourceforge.net/projects/hfst/files/morphological-transducers/

Use the lexicon

You can try out the Finnish lexicon with some word, e.g. "testi":

echo "testi" | hfst-lookup finntreebank.hfst

and you should get the line:

testi testi<N><sg><nom> 0.000000
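
Each output line has three columns: the input word, its analysis, and a weight. In the analysis shown above, the base form is followed by tags in angle brackets. Assuming that tag format (other lexicons may mark tags differently), a small shell sketch can separate the base form from the tags. The name split_analysis is hypothetical:

```shell
# split_analysis is a hypothetical helper, not part of HFST.
# It assumes an analysis of the form base<tag1><tag2>... as in the example above.
split_analysis() {
    analysis=$1
    base=${analysis%%"<"*}      # everything before the first '<'
    tags=${analysis#"$base"}    # the remaining tag string
    printf '%s\t%s\n' "$base" "$tags"
}

# Usage:
# split_analysis 'testi<N><sg><nom>'
```

The helper prints the base form and the tag string separated by a tab. If the analysis contains no tags, the second field is empty.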

Now try a non-word:

echo "xtesti" | hfst-lookup finntreebank.hfst

and you should get:

xtesti xtesti+? inf
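
The "+?" suffix and the weight inf mark a word that the lexicon does not recognize. As a sketch, the filter below collects such words from lookup output. The name unknown_words is hypothetical, and the filter assumes that columns are separated by whitespace and that analyses contain no spaces:

```shell
# unknown_words is a hypothetical helper, not part of HFST.
# It reads hfst-lookup output on stdin and prints each input word whose
# analysis ends in "+?", the marker shown above for an unrecognized word.
# Assumption: columns are separated by whitespace and analyses contain no spaces.
unknown_words() {
    awk 'NF >= 2 && $2 ~ /\+\?$/ { print $1 }'
}

# Usage:
# echo "xtesti" | hfst-lookup finntreebank.hfst | unknown_words
```

Blank lines in the lookup output are skipped because they have fewer than two fields.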

Other lookup tools

The hfst-proc tool handles capital letters in useful ways, though it may be slightly slower. You can feed it running text, not only single words:

cat your-text | hfst-proc finntreebank.hfst

On the other hand, if you need speed, e.g. when you have millions of words to analyze, you may wish to feed your list of words to the optimized lookup command:

cat your-list | hfst-optimized-lookup finntreebank.hfst
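
If you start from running text rather than a word list, one way to build the list is to split the text on whitespace and drop duplicates. This is only a sketch: the name make_word_list is hypothetical, and punctuation stays attached to the tokens:

```shell
# make_word_list is a hypothetical helper, not part of HFST.
# It reads text on stdin and prints one unique whitespace-separated token per line.
# Punctuation is left attached to tokens; strip it separately if your text needs that.
make_word_list() {
    tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort -u
}

# Usage:
# make_word_list < your-text > your-list
```

Removing duplicates means each distinct word is analyzed only once, which helps most on large texts.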

All commands accept parameters that change the formatting of the output. The --help option lists them, e.g.:

hfst-lookup --help

-- KristerLinden - 2012-03-23


Topic revision: r5 - 2012-12-04 - KristerLinden
 