HFST: Interpretations and Requirements for the other
Symbols
This page is for discussing the useful interpretations of an other
symbol when used in transitions. In FSAs, the other
symbol refers to any other input which is not explicitly mentioned as a label in that automaton. In FSTs there may be either pairs with other on input or on output side, or on both. In the following, we use the question mark ?
for other
(in accordance with the Xerox conventions). Note that the any
symbol of regular expressions represents a different concept (although often denoted also with a question mark).
Ways to describe the semantics of "other"symbols
There are three possible ways to express formal semantics of programming languages:
denotational semantics,
operational semantics,
axiomatic semantics and
algebraic semantics. In our discussion, Kimmo's proposal was operational, Krister's denotational or axiomatic, and Anssi seemed to work on multiple semantics. It is possible that although the implementation of operational semantics is most practical, the other approaches serve well when explaning and validating the oparational semantics. The different interpretations can be seen as follows:
 operational  how we change the transitions of FSTs when we harmonize their alphabets
 denotational  what changes harmonizations need to do in the relations implemented by the transducers and their aphabets
 axiomatic  the state of the program is defined using assertions: logical statements: predicates where the variables define the state of the program
 algebraic  how the elements and their operations constitute algebraic structures.
The operational semantics works on transitions and their splitting operations, while the same effect could be described without any reference to source and target states by talking about the morphisms of the labels. When two FSTs are combined, we could compute two morphisms that are used to extend the labels of each FST.
A Template for New Proposals on Othersymbols
 The used "other"symbols are listed and their runtime interpretation is given by denotational semantics
 Reference to a common definition of denotational semantics of rational operations is made.
 Operational semantics of the following operations on a flowertransducer (one state) is explored with examples and then generalized to label morphisms:
 projection and pair lookup (these are closest to the semantics of runtime lookup)
 intersection
 composition
 The soundness and completeness of the operational semantics is proven with respect to the denotational semantics.
Comparison of different "others" in different implementations.
"other" in networks:


interpretation 
type of FST 



parallel harmonization 
mixed 
sequential harmonization 


id.pair 
members known 
relative to 
listed 
AFST 
Kimmo's twol 
Måns' 
Anssi's 
Anssi's v.2 
XFST 
Miikka's 
Kimmo's replace 
fixed nonid. pairs 
1 
no 
both 

a:b goes into Π a goes to T_i b goes to T_o 
a:b 
a:b 
a b 
a:b 
a:b 
a:b 
a:b 
a:b 














half open nonid. pairs 
2a 
no 
one 
a × \T_o 
a goes into T_i' a goes to T_i 
a:<.> 
a:? 

a:? 
a:? 
a:? 
a:≠ 
a:? 
2b 
no 
one but unspecific 
(\T_i'×\T_o) 


<?>:? 
N/A 

N/A 
2c 
no 
one 
\T_i × a 
a goes into T_o' a goes to T_o 
<.>:a 
?:a 

?:a 
?:a 
?:a 
≠:a 
?:a 
2d 
no 
one but unspecific 
(\T_i×\T_o') 


?:<?> 
N/A 

N/A 














open nonid. pairs 
3a 
no 
both but unspecific 
(T_i×T_o)Π 

<.> 
?:? 

?:? 
<?>:<?> 
N/A 

N/A 
3b 
no 
none 
\(T_i ∪ T_o) × \(T_i ∪ T_o) 


?:? 
?:? 
≠:≠ 
N/A 












open id. pairs 
4a 
yes 
both but unspecific 
(T_i×T_o)Π 


=:= ^{1} 
<=>:<=> ^{1} 
N/A 
N/A 
N/A 
4b 
yes 
none 
\(T_i ∪ T_o) × \(T_i ∪ T_o) 

@ @ 
=:= 
? 
=:= 
?:? 














fixed id. pairs 
5a 
yes 
both 

a:a goes into Π a goes to T_i 
a:a 
a:a 
a a 
a:a 
a:a 
a 
a:a 
a:a 














variables/diamonds 
6a 
yes 
both 

a goes into T_m 
@1:@1 
N/A 
(x) (x) 
N/A 
<x>:<x> 
N/A 
N/A 
N/A 
6b 
yes 
none 
Γ  T_m 


N/A 

N/A 
<@> 
N/A 
N/A 
N/A 
Note 1. If the pair alphabet contains all possible identity pairs constructed from the known input or output symbols, the the current pair does not represent any of these identity pairs.
Note 2. Hulden, Måns. 2008. Regular Expressions and Predicate Logic in FiniteState Language Processing. In preproceedings of FiniteState Methods and Natural Language Processing. FSMNLP 2008. 7th International Workshop, Ispra, Italy, September 1112, 2008.
Practical Observations
 It might be a good idea to allow twolevel rules to have an open input and output alphabet (< Kimmo Koskenniemi)
 The notion of identity pairs assumes that the input and output alphabets have compatible keys for common and unseen symbols. (< Anssi YliJyrä)
 Processing unknown identity pairs under composition is quite similar to processing of all unknown pairs in intersection (< Kimmo Koskenniemi)
 Unknown nonidentity pairs have wider use than what we thought initially (< Anssi YliJyrä)
 "Any" symbol or "any pair" as isolated regular expression correspond to "other" or "other pair" an an isolated transducers. When both expressions and transducers are combined under their respective operations, these evolve into incompatible notions (< Anssi YliJyrä).
 If we need to make difference between unknown pairs with identity constraint and unknown pairs without it, like between
?
and ?:?
in XFST, then the most portable approach is to refer to them with pairs that use different symbols (rather than ?
in two different ways). For example: =:=
and ≠:≠
. (< Anssi YliJyrä)
 Kimmo's pair harmonization has prior implementation in AFST, but it was integrated to a customized intersection algorithm. It was easy to implement but extra work, and the main obstacle for full support of replace rules was the unimplemented support for concatenation, union and complementation. When alphabetprocessing is separated to a separate "harmonization" layer, fuller support is quick to add. (integrated pair harmonization < Anssi YliJyrä, separate pair harmonization < Kimmo Koskenniemi)
 It might be possible to use AFST/Kimmotwol style pair harmonizations inside twolevel grammars and the representation is then changed in composition context to use XFST/Kimmoreplace/Anssiv2 representations that have more subtle classes. (< al.)
Some of these observations have been gathered to the page HfstOtherSymbolsInUse
Shared (?) Standpoints
Although there are many approaches, we can perhaps already agree upon the following matters:
 Pair harmonization should be used in those subsystems where it works, because it is most efficient.
 Need for improvement: pair harmonization should be extended with
=:=
in order to widen its applications (although this widening does not make it as general as symbol harmonizations); this may not be possible without turning pair harmonization to a symbol harmonization, but we do not know for sure.
 Two harmonizations are needed in more general settings where e.g. composition is present.
 More work on an ideal symbol harmonization scheme is needed.
 Conversions between the two representations should be elaborated.
 Computing the complement of alphabets including a symbol for unknown identity pairs causes Cartesian explosion that may be a problem in some approaches.
Currently Open Proposals
 HfstOtherAsSimpleSymbol ("Kimmo's FSTs")
 (a proposed observation, truth not sure:) Kimmo's symbol harmonization is closely related to the use of
?
in XFST. The main differences are (i) that Kimmo's proposal does not store the pair alphabet in separation, but it is computed from the arc labels  this means that all known symbols must be in use, and (ii) that mixed unknown symbol like ?:?
in XFST are not supported in Kimmo's approach.
 critique:
 The approach assumes that replace rules like
? > ?
do not occur
 The approach assumes that markup rules like
?* > ( .. )
are not reduced to GTWOL rules although an applicable method is known (YliJyrä 2008: Transducers from Parallel Replace Rules and Modes with Generalized Lenient Composition, FSMNLP 2007)
 The approach assumes all identity pairs (e.g. for Unicode symbols) in twol rules are mentioned explicitly even if they are defaults.
 HfstOtherSymbolExamples ("Miikka's FSTs")
 This is based on the representation of Kimmo's replace FSTs, but introduces the use of
nonidentityother
, becoming very similar to the representation of XFST FSTs.
 HfstOtherAsSimpleSymbols ("Anssi's FSTs")
 This proposal extends the use of
nonidenitityother
to those unknown pairs that contain known symbols.
 critique:
 This is more complex than Kimmo's approach
 It is not feasible to list the Cartesian product of known symbols if we do not want to include them to
?:?
(this applies to AFST/Kimmo's twol approaches as well)
 some benefits in comparison to Kimmo' approach
 The approach assumes that identity pairs are important also in twol rules
 The approach assumes that rules relate two sets of Unicode strings mostly by copying rather than by listing all symbol pairs
 The approach assumes that it is important to be able to take the complement of the set of identity or nonidentity pairs without closing the alphabet.
 Anssi's forthcoming proposal v.2 introduces the largest number of other symbols among the proposals, but reduces the number of arcs which helps when computing complements.
Closed Discussions
 NoHfstOtherSymbol contains miscellaneus and inconsistent set of prior comments in our discussion
 HfstOtherDefinition contains some definitions that may not be consistent with the current proposals