HFST: API for Generalized Restriction (GR) and Predicate Logic in FST Calculus



Purpose of the layer

This extension to the basic HFST API provides support for new letter-FST operations that are useful for compilation of
  • rewriting, replace and markup rules, constraint rules, contextual rules and grammars in general
  • Two-Level Phonology, Generative Phonology, Optimality Theoretic Phonology, and Autosegmental Phonology, and
  • lexicon formalisms and surface syntactic grammars.

Prior references

The current definitions are based on the convergence of two streams of work:
  • the Generalized Restriction framework (Yli-Jyrä & Koskenniemi 2004, 2006, 2007; Yli-Jyrä 2007a, 2007b)
  • the Regular Expressions and Predicate Logic (Hulden 2008).
The more general discussion on FST related arithmetics goes back to the work of Buchi (1960), Elgot (1961), etc. See Section References below.

Basic Notions

Auxiliary Marker Symbols

Overt FSTs keep track of the set of the known auxiliary markers (or their keys).

  1. The basic and common notion in both streams is the notion of an auxiliary symbol that is used to indicate a boundary of a substring. This symbol has two names: a diamond and variable symbol (). More such symbols can be numbered or named according to letters using different notational conventions: 1, ◊2,..., <x>, <y>, ... (x), (y),..., x, y, .... These auxiliary symbols occur in strings of marker-augmented FSTs that we call overt FSTs (interfaced through OvertFstHandle). The data type of an uncompiled generalized restriction is OvertFstPair.
  2. The size of th auxiliary symbol alphabet may vary dynamically.
    1. Most applications use only one or two kinds of auxiliary symbols, but they may need a few more (a few dozens of them).
    2. In addition to these auxiliary symbols that are employed by an application programmer, we need to prepare for hundreds of such auxiliary symbols that are created by special algorithms that speed up the compilation of the GR operation. These symbols are added in certain paths in order to make FSTs look simpler. It is not yet certain whether these symbols are internalized to an algorithm or whether they need to be stored to the marker alphabet of the overt FSTs.
    3. The number may be even larger in the future when we have a weighted GR operation. Each rule and non-zero weight in the representation of the rules may need a separate auxiliary symbol. For example, a nondeterministic FST can often be transformed to a deterministic one by adding such auxiliary symbols.
  3. Unknown auxiliary symbols have to added by harmonization in some situations:
    1. When an FST is combined under intersection with other FSTs: loops for unknown auxiliaries are simply added to all FSTs that are being combined under intersection.
    2. The complement or relative complement of an overt FST is computed w.r.t. the universal language over all pairs of known and unknown normal symbols and all the identity pairs of known auxiliary symbols.

All auxiliary symbol are neither used to mark things in similar ways and they are not used to mark similar things.

  1. The number of auxiliary symbols in the strings may vary since there are many ways ("usage modes") to use auxiliary markers. All these should be supported:
    1. It occurs in specified positions and is used as a joiner rather than a substring marker.
    2. It occurs a bounded number of times (typically 2) and mark substrings.
    3. It occurs freely in specified positions that may be finite of infinit.
  2. All auxiliary symbols used in the current layer have are interfaced through a VarHandle.
    1. Each VarHandle contains a Key. In transitions, these Keys are used. The auxiliary symbols can be removed from FSTs one by one or once for all. Accordingly, auxiliary symbols are quite similar to keys of normal symbols.
    2. The VarHandle that is stored to the marker alphabet of over FSTs containts additional information on usage mode of the symbol.
  3. The functions in the GR layer may test that the relevant VarHandles have a correct usage mode.
    1. The current layer supports only one kind of auxiliary symbols: those that are used in pairs to mark substrings.
    2. Yli-Jyrä (2007a, 2007b) and Yli-Jyrä and Koskenniemi (2004) already employ auxiliary symbols that do not occur in pairs but as a joiners or freely in specified positions.

The total set of possible pairs have to be defined somehow when a complement of an overt FST is computed.

  1. Access to related =other=-symbol (or key) conventions would make the complementation more compact and efficient. Therefore, the layer supporting unknown keys and pairs would be below the current layer.
  2. We do not need an =other=-symbol that would account for unknown auxiary markers, since unknown auxiliary symbols are treated as epsilons except in intersection and relative complementation.

Nondeterministic Letter Transducers

Often letter transducers are kept as deterministic and minimal. However, in the case of overt FSTs, we may want to avoid determinization of intermediate results.

Overview of this Layer

Excepting the harmonization and carrying the auxiliary symbol alphabet, most of the functionalities of this layer is based on normal FST operations provided by the TransducerLayer. Accordingly, the large number of operations in the current layer does not imply much programming.

Type Comment hfst-lexc hfst-twolc
VarHandle a variable denoting positions in strings and used in different ways according to the variable type (on boundaries, sprinkled, members)    
VarSetHandle a set of variables    
OvertFstHandle an FST using variables that are not yet bound to any value    
OvertFstPair a pair of FSTs using variables    
TransducerHandle an HFST transducer without variable symbols    

The current layer contains a few sets of functions:

  1. Those that need to assume a particular usage of auxiliary symbols that are given as VarHandle parameters:
    • functions for joiner variables (minimun level of support)
    • functions for substring variables
    • functions for auxiliary symbols that occur freely in specified positions
  2. Those that operate overt FSTs and their pairs without paying attention to the usage modes of the known auxiliary symbols.

Functions Supporting Joiner Variables

Creating New Joiner Variables

Type Function Comment hfst-lexc hfst-twolc
VarHandle create_var_type1 () Create a FO variable used for a boundary between two substrings    
VarSetHandle create_var_set () Create an empty set of FO variables    
VarSetHandle insert_var (VarSetHandle s, VarHandle x) Insert an FO variable to set s    

Functions That Relate Joiner Variables to FSTs

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle var_create_joiner_fst (VarHandle x, TransducerHandle t) Asserts that position x is a boundary between empty prefix and empty suffix.    

Functions That Implement Logical Connectors and Quantifiers

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle convert_to_overt_fst (TransducerHande e) Rises the type of a transducer to an overt FST.    
OvertFstHandle connect_with_concatenation (OvertFstHande e, OvertFstHande f) Combines two overt FSTs under concatenation.    
OvertFstHandle connect_with_and (OvertFstHande e, OvertFstHande f) Combines two overt FSTs under conjunction.    
OvertFstHandle connect_with_relative_complement (OvertFstHande e, OvertFstHande f) Combine two overt FSTs as e ∪ ¬f.    
OvertFstHandle quantify_existentially (VarHandle m, OvertFstHandle φ) Restricts φ to those strings that contain an m and hides then m in φ by removing them like epsilons,
i.e. computes (∃ x)φ(x).
TransducerHandle quantify_existentially_all (OvertFstHande e) Removes all auxiliary symbols.    
OvertFstHandle quantify_universally (VarHandle m, OvertFstHandle φ) Restricts x to all substrings in φ by evaluating
    Σ* <x> Σ* <x> Σ*  =x>  φ
i.e. computes ¬ ((∃ x) ¬φ(x)) that is the same as (∀ x)φ(m).
TransducerHandle quantify_universally_all (OvertFstHandle φ) The essential part of the ordinary GR operation that removes all variables at once like repeatly applying quantify_universally for all of them.    

Function Supporting Substring Variables

These functions are implemented in a latter stage. These are currently moved to HfstAPIGeneralizedRestrictionLayer2OLD in order to prioritize the work.


  • C. Elgot: Decision problems of finite automata and related arithmetics. Trans. Amer. Math. Soc. 98: 21-51 (1961)
  • R. Cohen and J. Brzozowski: Dot-depth of star-free events. J. Comput. Syst. Sci. 5:1-16 (1971)
  • W. Thomas: Classifying Regular Events in Symbolic Logic. J. C. S. Sci. 25(3): 360-376 (1982)
  • N. Vaillette: Logical specification of regular relations for NLP. Natural Language Engineering 9(1): 65-85. (2003)
  • A. Yli-Jyrä: Describing syntax with star-free expressions. In EACL 2003. 379-386. (2003)
  • --: Contributions to the Theory of Finite-State Based Grammars. Dissertation. (2005a)
  • --: Linguistic Grammars with Very Low Complexity. In Arppe et al., Inquiries into Words, Constraints and Contexts. CSLI Publications. (2005b)
  • --: Applications of Diamonded Double Negation. Invited talk (original title: Hidden Jewels of Double Negation). FSMNLP 2007 postproceedings. (2007a)
  • --: Transducers from Parallel Replace Rules and Modes with Generalized Lenient Composition. FSMNLP 2007 postproceedings. (2007b)
  • -- and K. Koskenniemi: Compiling contextual restrictions on strings into finite-state automata. FASTAR Days. Eindhoven. (2004)
  • -- and --: Compiling Generalized Two-Level Rules and Grammars. FinTAL 2006: 174-185. (2006)
  • -- and --: A New Method for Compiling Parallel Replacement Rules. CIAA 2007: 320-321 (2007c)
  • M. Hulden. Regular expressions and predicate logic in finite-state language processing. FSMNLP 2008. (2008)

-- AnssiYliJyra - 06 Oct 2008
Topic revision: r11 - 2009-09-30 - ErikAxelson
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback