HFST: API Development Outline

For a full doxygen documentation of the currently implemented API layers, see Hfst API Documentation.

HFST – The Helsinki Finite-State Transducer technology is intended for creating and manipulating weighted or unweighted synchronic transducers implementing regular relations. The synchronic transducer t is a finite-state automaton representing a set of strings of symbol pairs. A symbol pair a:b consists of an input symbol a and an output symbol b , where a and b belong to an alphabet of symbols, Σ . The symbol pair belongs to an alphabet of pairs, Π. The print name of a symbol may consist of multiple characters. If a is equal to b , the symbol pair a:a may for convenience be written simply as a, aka an identity pair. E.g., a string of symbol pairs ab:cd consists of the symbol pairs a , b:c and d and denotes the synchronic transducer from abd to acd .

Version API Layer Covered Aspects
    labels operators graphs unknown labels edge packing markers rules label names label properties
                       
HFST 4.0 Character Properties API                 property
symbols
HFST 2.0 Symbol API symbol
tables
Rule Operator API constraints,
replace and
markup rules
                       
HFST 3.0 Overt Transducer API joiners,
diamonds,
diacritics,
variables
Packed Transducer API failure,
default,
equivalence
classes
Open Transducer API known alphabet
other
                       
HFST 2.0 Transducer Core Extension API   arcs,
nodes
paths
Transducer API operators
Transducer Key API keys,
key pairs,
key pair sets

The transducer layer of HFST (the most primitive layer):

The layers for extended transducers (based on the transducer layer):

  • the Open Transducer API for dealing with FSTs with alphabets with known and unknown symbols (requirements analysis: HfstOtherSymbol).
  • the Generalized Restriction API for dealing with generalized restrictions and predicate logic with substring variables
  • the Rule Operator API for dealing with specialized rules and related functions based on the previous layers

The symbol layers of HFST:

  • The equivalence class layer of HFST: in HFST 2.0 this is represented only by the KeyTable data structure in the symbol layer, where several symbols may be associated with one key. This layer can be cascaded.
  • the Symbol API for dealing with symbols and their interpretations in various applications.

Key Concepts

In reality, transducers are only concerned with states and transitions - not symbols, but it is often convenient to speak about the symbols that the transitions refer to when manipulating transducers. The relation between symbols and transition labels has been specified differently in different FST packages. We use the following naming conventions for these key concepts in HFST and indicate their most similar counterparts in some of the existing finite-state transducer packages.

The Symbol Layer API deals with symbols and their interpretations in various applications. Below is a comparison with the related layer in other implementations:

HFST Explanation OpenFST Karttunen 1990
Beesley & Karttunen 2003 (XFST)
SFST
Symbol A symbol is identified through
1. a string,
2. an session-wide index of the symbol in a controlled, global map of symbols.
String 1. a string
2. N/A
3. ordinal index to the symbol table of a network
1. a string
2. session wide Character indexing a string in CharMap/SymbolMap of TheAlphabet used by the SFST-PL compiler
3. transducer-specific Character indexing a string in CharMap/SymbolMap of the Alphabet of a transducer
SymbolPair 1. A pair of symbols.
2. The symbol pair is used by a network.
N/A 1. a pair of Symbol indexes
2. label as stored in the label table of an FST
3. ordinal index to the label table of an FST
1. an SFST Label structure with two Characters known to the alphabet of the tranducer
2. a member of the LabelSet of the Alphabet of a transducer.
3. an index of the member of LabelSet if LabelSet was defined as hash_map rather than as a set
1:N The 1:N relation btw. Key and Symbol is useful for dealing with equivalence classes. 1:N 1:1 1:1
KeyTable Table containing the mapping btw each key and its corresponding symbol(s). The OpenFST SymbolTable implements the mappings of OpenFST Labels to Strings and reverse. SymbolTables are used for describing the alphabet of the input and output labels for arcs in a Finite State Transducer.    

The Transducer Layer API deals with the core transducer calculus. Below is a comparison with the related layer in other implementations:

HFST Explanation OpenFST Karttunen 1990
Beesley & Karttunen 2003 (XFST)
SFST
Key A code for a symbol or a set of symbols
1. The code may be specifc to the input or output tape of a network.
Label See Symbol (Label when FSA) See Symbol 3., i.e. an input or output Character of an SFST Label
KeyPair A pair of keys.
1. A pair of symbol codes
(2. A network-specific code for such a key pair).
Arc See SymbolPair (Label when FST) See SymbolPair 1., i.e. an SFST Label structure

HFST 2.0 - closed alphabets without equivalence classes (current version)

Each symbol name represents a symbol. The symbol names are stored in a global symbol table. An alphabet of symbols is a SymbolSet, Σ. An alphabet of symbol pairs is a SymbolPairSet, Π. Only basic operations are provided on sets of symbols and symbol pairs. For more complex manipulations, see Transducer operations.

In a transducer, a key is an unsigned number representing an input (or output) label of a transition. One key may be associated with several symbols, but each symbol may be associated with only one key. A key table lists the symbols associated with each key. Input and output key tables can be stored and loaded separately in order to preserve the key numbering and symbol associations from one session to another. The epsilon transition label has the key 0. For convenience, we also define an alphabet of keys as a KeySet and an alphabet of key pairs is a KeyPairSet.

HFST 3.0 - open alphabets with equivalence classes (planned version)

Each transducer has an input and an output alphabet of symbols and an alphabet of symbol pairs. An alphabet of symbols is a SymbolSet, Σ. An alphabet of symbol pairs is a SymbolPairSet, Π. Symbols have equivalence classes with regard to an alphabet. The epsilon symbol has class number 0. The other symbol represents the class of symbols unknown to a transducer and has class number 1.

When combining two alphabets, a symbol numbering is created which guarantees that each symbol name is associated with only one class number in the combined alphabet. Trivially this is true if each symbol name has its own number, i.e. creating the most specific numbering representing singleton classes, but the implementation may provide a less specific numbering for the combined alphabet in view of a particular transducer. An alphabet and its symbol class numbering can be stored and loaded separately. A transducer may be anonymized by storing the transducer without symbol names in which case the symbol class numbering is used for indexing the transitions.

NOTE. For a proposal by AYJ for extending or modifying the HFST API concerning datatypes for symbols and alphabets, see HfstAPIProposalOLD.


-- KristerLinden - 01 Jul 2008
Topic revision: r80 - 2009-09-30 - ErikAxelson
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback