HFST: Interpretations and Requirements for the other Symbols

This page is for discussing the useful interpretations of an other symbol when used in transitions. In FSAs, the other symbol refers to any other input which is not explicitly mentioned as a label in that automaton. In FSTs there may be either pairs with other on input or on output side, or on both. In the following, we use the question mark ? for other (in accordance with the Xerox conventions). Note that the any symbol of regular expressions represents a different concept (although often denoted also with a question mark).

Ways to describe the semantics of "other"-symbols

There are three possible ways to express formal semantics of programming languages: denotational semantics, operational semantics, axiomatic semantics and algebraic semantics. In our discussion, Kimmo's proposal was operational, Krister's denotational or axiomatic, and Anssi seemed to work on multiple semantics. It is possible that although the implementation of operational semantics is most practical, the other approaches serve well when explaning and validating the oparational semantics. The different interpretations can be seen as follows:
  • operational - how we change the transitions of FSTs when we harmonize their alphabets
  • denotational - what changes harmonizations need to do in the relations implemented by the transducers and their aphabets
  • axiomatic - the state of the program is defined using assertions: logical statements: predicates where the variables define the state of the program
  • algebraic - how the elements and their operations constitute algebraic structures.

The operational semantics works on transitions and their splitting operations, while the same effect could be described without any reference to source and target states by talking about the morphisms of the labels. When two FSTs are combined, we could compute two morphisms that are used to extend the labels of each FST.

A Template for New Proposals on Other-symbols

  1. The used "other"-symbols are listed and their run-time interpretation is given by denotational semantics
  2. Reference to a common definition of denotational semantics of rational operations is made.
  3. Operational semantics of the following operations on a flower-transducer (one state) is explored with examples and then generalized to label morphisms:
    1. projection and pair lookup (these are closest to the semantics of run-time lookup)
    2. intersection
    3. composition
  4. The soundness and completeness of the operational semantics is proven with respect to the denotational semantics.

Comparison of different "others" in different implementations.

"other" in networks:

interpretation type of FST
parallel harmonization mixed sequential harmonization
id.pair members
relative to listed AFST Kimmo's
Måns' Anssi's Anssi's v.2 XFST Miikka's Kimmo's
fixed non-id. pairs 1 no both   a:b goes into Π
goes to T_i
goes to T_o
a:b a:b a b a:b a:b a:b a:b a:b
half open non-id. pairs 2a no one a × \T_o a goes into T_i'
goes to T_i
a:<.> a:?   a:? a:? a:? a:≠ a:?
2b no one but
(\T_i'×\T_o)     <?>:? N/A   N/A
2c no one \T_i × a a goes into T_o'
goes to T_o
<.>:a ?:a   ?:a ?:a ?:a ≠:a ?:a
2d no one but
(\T_i×\T_o')     ?:<?> N/A   N/A
open non-id. pairs 3a no both but
(T_i×T_o)-Π   <.> ?:?   ?:? <?>:<?> N/A   N/A
3b no none \(T_i ∪ T_o) × \(T_i ∪ T_o)     ?:? ?:? ≠:≠ N/A
open id. pairs 4a yes both but
(T_i×T_o)-Π       =:= 1   <=>:<=> 1 N/A N/A N/A
4b yes none \(T_i ∪ T_o) × \(T_i ∪ T_o)   @ @ =:= ? =:= ?:?
fixed id. pairs 5a yes both   a:a goes into Π
goes to T_i
a:a a:a a a a:a a:a a a:a a:a
variables/diamonds 6a yes both   a goes into T_m @1:@1 N/A (x) (x) N/A <x>:<x> N/A  N/A N/A
6b yes none Γ - T_m     N/A   N/A <@> N/A N/A N/A

Note 1. If the pair alphabet contains all possible identity pairs constructed from the known input or output symbols, the the current pair does not represent any of these identity pairs.
Note 2. Hulden, Måns. 2008. Regular Expressions and Predicate Logic in Finite-State Language Processing. In pre-proceedings of Finite-State Methods and Natural Language Processing. FSMNLP 2008. 7th International Workshop, Ispra, Italy, September 11-12, 2008.

Practical Observations

  • It might be a good idea to allow two-level rules to have an open input and output alphabet (< Kimmo Koskenniemi)
  • The notion of identity pairs assumes that the input and output alphabets have compatible keys for common and unseen symbols. (< Anssi Yli-Jyrä)
  • Processing unknown identity pairs under composition is quite similar to processing of all unknown pairs in intersection (< Kimmo Koskenniemi)
  • Unknown non-identity pairs have wider use than what we thought initially (< Anssi Yli-Jyrä)
  • "Any" symbol or "any pair" as isolated regular expression correspond to "other" or "other pair" an an isolated transducers. When both expressions and transducers are combined under their respective operations, these evolve into incompatible notions (< Anssi Yli-Jyrä).
  • If we need to make difference between unknown pairs with identity constraint and unknown pairs without it, like between ? and ?:? in XFST, then the most portable approach is to refer to them with pairs that use different symbols (rather than ? in two different ways). For example: =:= and ≠:≠. (< Anssi Yli-Jyrä)
  • Kimmo's pair harmonization has prior implementation in AFST, but it was integrated to a customized intersection algorithm. It was easy to implement but extra work, and the main obstacle for full support of replace rules was the unimplemented support for concatenation, union and complementation. When alphabet-processing is separated to a separate "harmonization" layer, fuller support is quick to add. (integrated pair harmonization < Anssi Yli-Jyrä, separate pair harmonization < Kimmo Koskenniemi)
  • It might be possible to use AFST/Kimmo-twol style pair harmonizations inside two-level grammars and the representation is then changed in composition context to use XFST/Kimmo-replace/Anssi-v2 representations that have more subtle classes. (< al.)

Some of these observations have been gathered to the page HfstOtherSymbolsInUse

Shared (?) Standpoints

Although there are many approaches, we can perhaps already agree upon the following matters:

  1. Pair harmonization should be used in those sub-systems where it works, because it is most efficient.
    • Need for improvement: pair harmonization should be extended with =:= in order to widen its applications (although this widening does not make it as general as symbol harmonizations); this may not be possible without turning pair harmonization to a symbol harmonization, but we do not know for sure.
  2. Two harmonizations are needed in more general settings where e.g. composition is present.
    • More work on an ideal symbol harmonization scheme is needed.
    • Conversions between the two representations should be elaborated.
    • Computing the complement of alphabets including a symbol for unknown identity pairs causes Cartesian explosion that may be a problem in some approaches.

Currently Open Proposals

  • HfstOtherAsSimpleSymbol ("Kimmo's FSTs")
    • (a proposed observation, truth not sure:) Kimmo's symbol harmonization is closely related to the use of ? in XFST. The main differences are (i) that Kimmo's proposal does not store the pair alphabet in separation, but it is computed from the arc labels - this means that all known symbols must be in use, and (ii) that mixed unknown symbol like ?:? in XFST are not supported in Kimmo's approach.
    • critique:
      1. The approach assumes that replace rules like ? -> ? do not occur
      2. The approach assumes that markup rules like ?* -> ( .. ) are not reduced to GTWOL rules although an applicable method is known (Yli-Jyrä 2008: Transducers from Parallel Replace Rules and Modes with Generalized Lenient Composition, FSMNLP 2007)
      3. The approach assumes all identity pairs (e.g. for Unicode symbols) in twol rules are mentioned explicitly even if they are defaults.
  • HfstOtherSymbolExamples ("Miikka's FSTs")
    • This is based on the representation of Kimmo's replace FSTs, but introduces the use of non-identity-other, becoming very similar to the representation of XFST FSTs.
  • HfstOtherAsSimpleSymbols ("Anssi's FSTs")
    • This proposal extends the use of non-idenitity-other to those unknown pairs that contain known symbols.
    • critique:
      1. This is more complex than Kimmo's approach
      2. It is not feasible to list the Cartesian product of known symbols if we do not want to include them to ?:? (this applies to AFST/Kimmo's twol approaches as well)
    • some benefits in comparison to Kimmo' approach
      1. The approach assumes that identity pairs are important also in twol rules
      2. The approach assumes that rules relate two sets of Unicode strings mostly by copying rather than by listing all symbol pairs
      3. The approach assumes that it is important to be able to take the complement of the set of identity or nonidentity pairs without closing the alphabet.
  • Anssi's forthcoming proposal v.2 introduces the largest number of other symbols among the proposals, but reduces the number of arcs which helps when computing complements.

Closed Discussions

  • NoHfstOtherSymbol contains miscellaneus and inconsistent set of prior comments in our discussion
  • HfstOtherDefinition contains some definitions that may not be consistent with the current proposals

Topic attachments
I Attachment Action Size Date Who Comment
PDFpdf othersymbolspace.pdf manage 5.2 K 2008-09-25 - 12:01 KristerLinden  
Topic revision: r44 - 2009-06-08 - AnssiYliJyra
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback