hfst-guess

hfst-guess is part of the HFST (University of Helsinki Finite State Transducer interface) finite-state toolkit distribution. It guesses morphological analyses for unknown words and generates paradigms of model forms for the guessed analyses. The tool is licensed under the GNU GPL version 3 (other licences may be available on request). The licence text can be found in the file COPYING.

Usage

Usage: hfst-guess [OPTIONS...] [INFILE]
Use a guesser (and generator) to guess analyses or inflectional
paradigms of unknown words.

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -i, --input=INFILE     Read input transducer from INFILE
  -o, --output=OUTFILE   Write output transducer to OUTFILE
Guesser options:
  -f, --model-form-filename       Inflectional information for
                                  generated model forms is read
                                  from this file.
  -n, --max-number-of-guesses     Maximum number of analyses
                                  per word form (5 by default).
  -m, --max-number-of-forms       Maximum number of generated model
                                  forms per guess (2 by default).
  -g, --generate-threshold        Generate only forms whose weight
                                  is better than the weight of the
                                  best form plus this threshold
                                  (50 by default).
The guesser and generator should be constructed using the tool
hfst-guessify, which can compile a guesser and generator from a
morphological analyzer. hfst-guessify packages the guesser and
generator in the same fst-file.

If option -f is used but no generator was compiled together
with the guesser, hfst-guess compiles a generator itself,
which increases load time.


If OUTFILE or INFILE is missing or -, standard streams will be used.

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
<https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>
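
The interaction of -n, -m and -g can be pictured with a small Python sketch. It is not taken from the hfst-guess sources: it assumes HFST's usual tropical weights (lower is better), it treats the -g boundary inclusively so that the best form is always kept, and the function names select_guesses and select_forms are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch, not the hfst-guess implementation.
# Assumption: tropical weights, i.e. a lower weight is a better one.
Weighted = Tuple[str, float]  # (string, weight)


def select_guesses(guesses: List[Weighted], max_guesses: int = 5) -> List[Weighted]:
    """Keep the max_guesses best-weighted analyses (cf. -n, 5 by default)."""
    return sorted(guesses, key=lambda g: g[1])[:max_guesses]


def select_forms(forms: List[Weighted], max_forms: int = 2,
                 threshold: float = 50.0) -> List[Weighted]:
    """Keep forms within best weight + threshold (cf. -g), at most max_forms (cf. -m)."""
    if not forms:
        return []
    ranked = sorted(forms, key=lambda f: f[1])
    best = ranked[0][1]
    # Assumption: the boundary is inclusive, so the best form always survives.
    within = [f for f in ranked if f[1] <= best + threshold]
    return within[:max_forms]


if __name__ == "__main__":
    # Made-up weights, for illustration only.
    forms = [("coats", 3.0), ("coat", 1.0), ("coatz", 80.0)]
    print(select_forms(forms))  # [('coat', 1.0), ('coats', 3.0)]
```

Here "coatz" is dropped by the threshold (80.0 is more than 1.0 + 50 away from the best weight), and the remaining two forms fit within the default limit of two forms.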

Example

Suppose the guesser and paradigm generator were compiled from a morphological analyzer that contains the following paths:

c a t 0:<N> 0:[GUESS_CATEGORY=1] 0:<SG>
c a t 0:<N> 0:[GUESS_CATEGORY=1] s:<PL>
d o g 0:<N> 0:[GUESS_CATEGORY=1] 0:<SG>
d o g 0:<N> 0:[GUESS_CATEGORY=1] s:<PL>
d o g 0:<V> 0:[GUESS_CATEGORY=2] 0:<INF>
d o g 0:<V> 0:[GUESS_CATEGORY=2] s:<3SG> 0:<PRES>
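
In the path notation above, a pair such as s:<PL> appears to link a word-form symbol (left) to an analysis symbol (right), 0 stands for the empty string, and a bare symbol such as c occurs on both sides. The following Python sketch reads a path under that interpretation and returns the word form and the analysis it encodes; the helper parse_path is hypothetical. For instance, the second path yields cats and cat<N>[GUESS_CATEGORY=1]<PL>.

```python
from typing import List, Tuple

# Assumption (read off the example paths): "0" marks the empty string.
EPSILON = "0"


def parse_path(path: str) -> Tuple[str, str]:
    """Hypothetical helper: split a space-separated path into (word form, analysis)."""
    surface: List[str] = []
    analysis: List[str] = []
    for sym in path.split():
        # "x:y" pairs word-form symbol x with analysis symbol y;
        # a bare symbol such as "c" belongs to both sides.
        left, _, right = sym.partition(":")
        if not right:
            right = left
        if left != EPSILON:
            surface.append(left)
        if right != EPSILON:
            analysis.append(right)
    return "".join(surface), "".join(analysis)


if __name__ == "__main__":
    for line in ["c a t 0:<N> 0:[GUESS_CATEGORY=1] s:<PL>",
                 "d o g 0:<V> 0:[GUESS_CATEGORY=2] s:<3SG> 0:<PRES>"]:
        print(parse_path(line))
```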

With this small English guesser, hfst-guess returns the following guesses for the word "coats":
$ echo "coats" | hfst-guess little_en_guesser.hfst
coats   coat<N>[GUESS_CATEGORY=1]<PL>
coats   coat<V>[GUESS_CATEGORY=2]<3SG><PRES>
coats   coats<V>[GUESS_CATEGORY=2]<INF>
coats   coats<N>[GUESS_CATEGORY=1]<SG>

We can also generate a paradigm of word forms for each of the guesses. The forms that make up a paradigm are listed in a model form file. Each line of the model form file gives the portion of an analysis that follows the declension class marker symbol (here [GUESS_CATEGORY=X]). A possible model form file for this example is:

<PL>
<3SG><PRES>
<INF>
<SG>
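
Conceptually, each guess can be cut after its class marker and every model form line appended to it, giving the analyses for which the generator is asked to produce word forms. A minimal Python sketch under that assumption; the regular expression MARKER and the helper model_form_analyses are hypothetical and not part of HFST.

```python
import re
from typing import List

# Assumption: the declension class marker looks like [GUESS_CATEGORY=N].
MARKER = re.compile(r"\[GUESS_CATEGORY=[^\]]*\]")


def model_form_analyses(guess: str, model_forms: List[str]) -> List[str]:
    """Hypothetical helper: replace what follows the class marker with each model form."""
    match = MARKER.search(guess)
    if match is None:
        raise ValueError(f"no class marker in {guess!r}")
    prefix = guess[: match.end()]  # e.g. "coat<N>[GUESS_CATEGORY=1]"
    return [prefix + form for form in model_forms]


if __name__ == "__main__":
    model_forms = ["<PL>", "<3SG><PRES>", "<INF>", "<SG>"]
    for analysis in model_form_analyses("coat<N>[GUESS_CATEGORY=1]<PL>", model_forms):
        print(analysis)
```

Under this reading, analyses that the lexicon does not license, such as coat<N>[GUESS_CATEGORY=1]<3SG><PRES>, would produce no word forms.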

Using these model forms, we can generate the following paradigms for the four guesses given for the word "coats":
$ echo "coats" | hfst-guess little_en_guesser.hfst -f en_model_forms
coats   coat<N>[GUESS_CATEGORY=1]<PL>   coats   <no word forms> <no word forms> coat
coats   coat<V>[GUESS_CATEGORY=2]<3SG><PRES>    <no word forms> coats   coat    <no word forms>
coats   coats<V>[GUESS_CATEGORY=2]<INF> <no word forms> <no word forms> <no word forms> <no word forms>
coats   coats<N>[GUESS_CATEGORY=1]<SG>  <no word forms> <no word forms> <no word forms> <no word forms>
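
The example output suggests that each line holds the input word, the guess, and then one column per model form line, in file order, with <no word forms> wherever nothing was generated. Assuming tab-separated columns (the spacing shown above may not reflect the actual separator), a hypothetical parser parse_paradigm_line could look like this:

```python
from typing import Dict, List, Optional, Tuple

NO_FORMS = "<no word forms>"


def parse_paradigm_line(line: str, model_forms: List[str]
                        ) -> Tuple[str, str, Dict[str, Optional[str]]]:
    """Hypothetical helper: split one output line into (word, guess, {model form: form})."""
    fields = line.rstrip("\n").split("\t")  # assumption: columns are tab-separated
    word, guess, generated = fields[0], fields[1], fields[2:]
    if len(generated) != len(model_forms):
        raise ValueError("expected one column per model form")
    # None marks a model form for which no word form was generated.
    paradigm = {form: (None if value == NO_FORMS else value)
                for form, value in zip(model_forms, generated)}
    return word, guess, paradigm


if __name__ == "__main__":
    forms = ["<PL>", "<3SG><PRES>", "<INF>", "<SG>"]
    line = "\t".join(["coats", "coat<N>[GUESS_CATEGORY=1]<PL>",
                      "coats", NO_FORMS, NO_FORMS, "coat"])
    print(parse_paradigm_line(line, forms))
```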

-- MiikkaSilfverberg - 2012-08-24