HFST: Terminology

Morphological Descriptions

Below we list some of the key concepts used in morphological descriptions. Some of them are used differently in SFST, which is one of the underlying libraries of HFST. When creating finite-state morphologies, we draw on ideas from several different domains. The concepts we import from these domains bring some baggage with them, and we need to be aware of it in order to communicate with all of the communities we have borrowed from.

Generative Phonology

In Generative Phonology, the idea is that there is an underlying deep level from which all forms on the surface level are generated. The deep level may never be observed directly; it manifests itself only as surface forms.

Note. In two-level morphology, the deep level is represented by what Koskenniemi (1983) named the lexical level. He chose this term because the lexical level is not as abstract as the deep level that some proponents of generative phonology might advocate.


Conventionally, Generative Phonology writes the deep level on top and the surface level at the bottom. Today this may seem somewhat counterintuitive, but at the time it was probably dictated by typewriter conventions and the lack of modern graphics tools. As most cultures fill the writing surface from the top, the order of the levels can also be seen to reflect the dogma that the hypothetical deep level is primary and that the surface level is always generated from it.

deep level      kauppa+Nom (grocery store)    kauppa+Gen (of the grocery store)
lexical level   kaupPA 0                      kaupPA n
surface level   kauppa                        kaupan
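The relation between the lexical and the surface rows above can be sketched as a sequence of aligned symbol pairs, in which the symbol 0 occupies a slot but is not written out. The following Python sketch is only an illustration: the alignments, the constant names and the helper functions are assumptions made for this example, not output of HFST or rules taken from a real Finnish description.

```python
# Toy illustration only: the alignments below are assumed for this example;
# they are not produced by HFST or by any real two-level rule set.

ZERO = "0"  # zero symbol: fills a slot in the alignment but is not written out

# Each entry is a (lexical symbol, surface symbol) pair.
KAUPPA_NOM = [("k", "k"), ("a", "a"), ("u", "u"), ("p", "p"),
              ("P", "p"), ("A", "a"), ("0", "0")]
KAUPPA_GEN = [("k", "k"), ("a", "a"), ("u", "u"), ("p", "p"),
              ("P", "0"), ("A", "a"), ("n", "n")]


def lexical_side(pairs):
    """Join the lexical symbols of an alignment, skipping zeros."""
    return "".join(lex for lex, _ in pairs if lex != ZERO)


def surface_side(pairs):
    """Join the surface symbols of an alignment, skipping zeros."""
    return "".join(surf for _, surf in pairs if surf != ZERO)
```

Under these assumed alignments, surface_side(KAUPPA_NOM) gives kauppa and surface_side(KAUPPA_GEN) gives kaupan, while both lexical sides keep the abstract symbols P and A.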


Lexicography

In Lexicography, the lemma (lexicon form, dictionary form or look-up form) represents all the forms of a word within the paradigm of a lexeme. The word forms in a paradigm are called inflected forms. The lemma is often chosen to be one of the forms in the inflectional paradigm of a word, in which case the lemma is also called the base form. In some languages, the lemma is a root, which is different from any of the inflected forms.

Note 1. To refer to the position of an inflected form in a paradigm, it is often useful to give the base form or the root together with the corresponding morphological features, e.g. geese is goose+Plural+Nom. We refer to this base form with features as a grammatical word.

Note 2. The lexicon form (in Lexicography) is different from the lexical level (in Two-level Morphology).


grammatical words   kauppa+Nom (grocery store)   kauppa+Gen (of the grocery store)   ...
lexeme              kauppa (base form)           kaupan (inflected form)             ...

Note. The base form and the inflected forms together constitute the full paradigm of the word forms of the lexeme.
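As a minimal sketch of these notions, a grammatical word written in the notation used above (base form, then features separated by +) can be split into its parts, and a paradigm can be stored as a mapping from grammatical words to inflected forms. The class and function names below are hypothetical and chosen only for this illustration; the paradigm holds just the two forms listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammaticalWord:
    """A base form (or root) together with its morphological features."""
    lemma: str
    features: tuple


def parse_grammatical_word(text):
    """Split a string such as 'goose+Plural+Nom' at the '+' separators."""
    lemma, *features = text.split("+")
    return GrammaticalWord(lemma, tuple(features))


# A fragment of the paradigm of the lexeme kauppa: each grammatical word
# is mapped to the inflected form that occupies that position.
KAUPPA_PARADIGM = {
    parse_grammatical_word("kauppa+Nom"): "kauppa",
    parse_grammatical_word("kauppa+Gen"): "kaupan",
}
```

Looking up parse_grammatical_word("kauppa+Gen") in KAUPPA_PARADIGM then yields the inflected form kaupan.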

Transducer Technology

A transducer has an input tape and an output tape. The input tape is always the tape from which the data to be processed is originally read, and the output tape is the tape to which the final result is written. Intermediary results may be stored on either tape.


When adapting a morphological lexicon to the world of transducers, we have programs that analyze word forms into grammatical words and programs that generate inflected forms from grammatical words. In this case, we still keep the vertical graphical order introduced by generative phonology, but we may switch which tape is seen as the input and which as the output, as needed.

generator input    kauppa+Nom (grocery store)   kauppa+Gen (of the grocery store)   ...
generator output   kauppa                       kaupan                              ...

analyzer output    kauppa+Nom (grocery store)   kauppa+Gen (of the grocery store)   ...
analyzer input     kauppa                       kaupan                              ...

However, if we represent the transducers horizontally, we are free from the conventions of Generative Phonology and may write the input to the left and the output to the right:

generator input                      generator output
kauppa+Nom (grocery store)           kauppa
kauppa+Gen (of the grocery store)    kaupan
...                                  ...

analyzer input    analyzer output
kauppa            kauppa+Nom (grocery store)
kaupan            kauppa+Gen (of the grocery store)
...               ...
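The switch between generator and analyzer can be sketched by treating the lexicon as a set of (input, output) string pairs and swapping the two sides. This is a plain Python illustration under that simplifying assumption; it does not use the HFST library, and the names PAIRS, invert and lookup are hypothetical.

```python
# Simplified stand-in for a transducer: a finite list of (input, output) pairs.
# A real transducer encodes such a relation as a finite-state network.
PAIRS = [
    ("kauppa+Nom", "kauppa"),
    ("kauppa+Gen", "kaupan"),
]


def invert(pairs):
    """Swap the input and output side of every pair."""
    return [(output, input_) for input_, output in pairs]


def lookup(pairs, text):
    """Return all outputs paired with the given input, in sorted order.

    A list is returned because one input may have several outputs,
    for example an ambiguous word form with more than one analysis.
    """
    return sorted(output for input_, output in pairs if input_ == text)


generator = PAIRS          # grammatical word -> inflected form
analyzer = invert(PAIRS)   # inflected form -> grammatical word
```

With these definitions, lookup(generator, "kauppa+Gen") returns the inflected form kaupan, and lookup(analyzer, "kaupan") returns the grammatical word kauppa+Gen. Both directions are served by the same relation; only the choice of input side differs.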

-- KristerLinden - 12 Dec 2008

FST Descriptions

Here we list some terms that are often used in the FST world and that are adopted in the HFST documentation in particular. In principle, the terminology could be stored in the Terms web in KitWiki, but we list the relevant terms under the current topic.

  • identity pair: a pair of identical elements; used in [Kaplan and Kay 1994].
  • auxiliary marker symbol: a symbol that is added to strings in order to indicate positions and that is removed later; may originate from [Kaplan and Kay 1994] (please check).
    • diamond: a kind of auxiliary marker; the name was introduced in [Yli-Jyrä and Koskenniemi 2005].
    • joiner: a kind of auxiliary marker, introduced in the HfstLexc documentation.
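The terms above can be illustrated with a small Python sketch that works on plain strings rather than on transducers. The marker spelling "<D>" and all function names are assumptions made for this example; they are not symbols or calls defined by HFST.

```python
MARKER = "<D>"  # hypothetical spelling of an auxiliary marker symbol


def is_identity_pair(pair):
    """A symbol pair is an identity pair if both elements are the same."""
    upper, lower = pair
    return upper == lower


def mark_positions(text, positions, marker=MARKER):
    """Insert the marker before each given character index of text."""
    wanted = set(positions)
    pieces = []
    for index, char in enumerate(text):
        if index in wanted:
            pieces.append(marker)
        pieces.append(char)
    if len(text) in wanted:  # a position at the very end of the string
        pieces.append(marker)
    return "".join(pieces)


def remove_markers(text, marker=MARKER):
    """Delete every occurrence of the marker, restoring the plain string."""
    return text.replace(marker, "")
```

For instance, mark_positions("kaupan", [4]) places the marker between kaup and an, and remove_markers applied to the result gives back kaupan. The sketch assumes that the marker spelling never occurs in the unmarked strings.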

-- AnssiYliJyra - 13 Aug 2008

Topic revision: r6 - 2014-02-10 - ErikAxelson