HFST: Terminology

Morphological Descriptions

Below we list some of the key concepts used in morphological descriptions. Some of them are used differently in SFST, which is one of the underlying libraries of HFST. When creating finite-state morphologies, we draw on ideas from several different domains, and the concepts we import from these domains bring some baggage with them. We need to be aware of this baggage in order to speak with all of the communities we have borrowed from.

Generative Phonology

In Generative Phonology, the idea is that there is an underlying deep level from which all forms on the surface level are generated. The deep level itself may never be observed; it manifests itself only as surface forms. In two-level morphology, the deep level is represented by what Koskenniemi (1983) called the lexical level.
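As a rough illustration of the two levels, a lexical form and its surface form can be written as an aligned sequence of symbol pairs. The sketch below is a toy under stated assumptions: the example word (`spy+s` realised as `spies`), the `0` placeholder for an empty position, and the helper names are chosen for illustration only and are not taken from HFST.

```python
# Toy illustration of the lexical level vs. the surface level as aligned symbol pairs.
# The alignment, the "0" empty symbol and all names are illustrative assumptions.

EMPTY = "0"  # placeholder for "no symbol on this level"

# Each pair is (lexical symbol, surface symbol).
ALIGNMENT = [("s", "s"), ("p", "p"), ("y", "i"), ("+", "e"), ("s", "s")]

def lexical_form(pairs):
    """Concatenate the lexical side of the pairs, skipping empty positions."""
    return "".join(lex for lex, _ in pairs if lex != EMPTY)

def surface_form(pairs):
    """Concatenate the surface side of the pairs, skipping empty positions."""
    return "".join(surf for _, surf in pairs if surf != EMPTY)

print(lexical_form(ALIGNMENT))
print(surface_form(ALIGNMENT))
```

Only the surface side corresponds to something a speaker would actually write; the lexical side, including the boundary symbol `+`, exists only in the description.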


Lexicography

In Lexicography, the lemma (lexicon form, dictionary form, look-up form) represents all the forms of a word within a paradigm. The word forms in a paradigm are called inflected forms. The lemma is often chosen to be one of the forms in the inflectional paradigm of a word, in which case the lemma is also called the base form. In some languages, the lemma is a root, which is different from any of the inflected forms.

Note. The lexicon form (in Lexicography) is different from the lexical level (in Two-level Morphology).
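The relation between a lemma and its paradigm can be sketched as a simple look-up table. This is a minimal sketch, assuming a hypothetical mini-lexicon; the entries, the dictionary name `PARADIGMS` and both helper functions are illustrative and are not part of HFST.

```python
# Hypothetical mini-lexicon: each lemma maps to inflected forms of its paradigm.
# The paradigms are deliberately incomplete; they only illustrate the terminology.
PARADIGMS = {
    "walk": ["walk", "walks", "walked", "walking"],  # lemma is a base form
    "ktb": ["kataba", "yaktubu"],                    # lemma is a root
}

def lemma_of(word_form):
    """Return the lemma whose paradigm contains word_form, or None."""
    for lemma, forms in PARADIGMS.items():
        if word_form in forms:
            return lemma
    return None

def is_base_form(lemma):
    """A lemma is a base form if it is itself one of the inflected forms."""
    return lemma in PARADIGMS.get(lemma, [])

print(lemma_of("walked"), is_base_form("walk"))
print(lemma_of("kataba"), is_base_form("ktb"))
```

The second entry shows the case where the lemma is a root: it identifies the paradigm but never appears among the inflected forms.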

Transducer Technology

A transducer has an input tape and an output tape. The input tape is always the tape from which the data to be processed is originally read, and the output tape is the tape to which the final result is written. Intermediate results may be stored on either tape.

Graphical representation: upper side and lower side (or upper tape and lower tape)

Writing symbol pairs: input symbol and output symbol
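The input and output tapes can be illustrated with a very small deterministic transducer whose transitions are labelled with symbol pairs. This is a sketch under stated assumptions, not the HFST implementation: the class `ToyTransducer`, its state numbering and the example pairs `a:b` and `b:b` are all hypothetical.

```python
# Minimal sketch of a deterministic finite-state transducer.
# Transitions map (state, input symbol) -> (output symbol, next state).
# The class, the states and the example pairs are illustrative assumptions.

class ToyTransducer:
    def __init__(self, transitions, start, finals):
        self.transitions = transitions
        self.start = start
        self.finals = finals

    def apply(self, input_tape):
        """Read the input tape symbol by symbol and return the output tape.

        Returns None if a symbol has no transition or the run ends in a
        non-final state.
        """
        state = self.start
        output_tape = []
        for symbol in input_tape:
            key = (state, symbol)
            if key not in self.transitions:
                return None
            out_symbol, state = self.transitions[key]
            output_tape.append(out_symbol)
        return "".join(output_tape) if state in self.finals else None

# Two symbol pairs, written input:output: a:b and b:b.
fst = ToyTransducer(
    transitions={(0, "a"): ("b", 0), (0, "b"): ("b", 0)},
    start=0,
    finals={0},
)
print(fst.apply("abba"))
```

Each transition consumes one symbol from the input tape and writes one symbol to the output tape, which is why a pair is naturally written as an input symbol followed by an output symbol.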

FST Descriptions

This section lists terms that are common in the FST world and that are adopted in the HFST documentation in particular. In principle, the terminology could be stored in the Terms web in KitWiki, but the relevant terms are listed under the current topic.

  • identity pair: a pair of identical elements; used in [Kaplan and Kay 1994].
  • auxiliary marker symbol: a symbol that is added to strings to indicate positions and is removed later; may originate from [Kaplan and Kay 1994] (please check).
    • diamond: a kind of auxiliary marker; the name was introduced in [Yli-Jyrä and Koskenniemi 2005].
    • joiner: a kind of auxiliary marker, introduced in the HfstLexC documentation.
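The terms above can be made concrete with a few helper functions. This is a minimal sketch, assuming strings are represented as lists of symbols; the marker spelling `<>` and all function names are hypothetical and do not reflect how HFST or the cited papers actually encode markers.

```python
# Illustrative helpers for identity pairs and auxiliary marker symbols.
# MARKER and every function name here are assumptions made for this sketch.

MARKER = "<>"  # hypothetical auxiliary marker symbol

def identity_pairs(symbols):
    """Pair every symbol with itself, e.g. 'a' becomes ('a', 'a')."""
    return [(s, s) for s in symbols]

def is_identity_pair(pair):
    """True if both elements of the pair are the same symbol."""
    return pair[0] == pair[1]

def mark_position(symbols, index, marker=MARKER):
    """Return a copy of symbols with the marker inserted before position index."""
    return symbols[:index] + [marker] + symbols[index:]

def remove_markers(symbols, marker=MARKER):
    """Return a copy of symbols with every occurrence of the marker removed."""
    return [s for s in symbols if s != marker]

word = list("talo")
marked = mark_position(word, 2)
print(marked)
print(remove_markers(marked) == word)
```

The marker is present only during intermediate processing: once the position it indicates has been used, removing it restores the original string.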

-- AnssiYliJyra - 13 Aug 2008 - KristerLinden - 12 Dec 2008
Topic revision: r2 - 2008-12-10 - KristerLinden