other
symbol
The discussion below was removed from HfstOtherSymbol because they are no more in the focus of our discussion and they could induce (unless updated) some inconsistencies with the latest account.
Especially, other
and any
symbols may have different interpretations in different accounts. You may read the material below if you ensure that you can put the comments to
their meant contexts.
Note 6.1. other
in transducers and any
symbols do have a common origin in regular expressions
We should not draw so clear opposition between identity-other
in FSAs/FSTs and identity-any
in regular expressions. The fact is that if we use identity-any
as an isolated subexpression of regular expression, it corresponds to a one-transition automaton s0: identity-other->fs1
. If the containing regular expressions is then evaluated in the bottom-up fashion through closure properties of automata, then the elementary automaton for identity-other
and its regexp counterpart identity-any
evolve to two different symbols. The regexp symbol is no more present in complex automata, while the set of symbols denoted by idenitity-other
evolves into more particular set of symbols as the listed set of symbols in subexpressions becomes larger during the bottom-up evaluation. The same holds for irreflexive-other
and irreflexive-any
.
Note 6.2. Also any-pair
or <.>
and its FST-level counterpart have a common origin in regular expressions.
In AFST-20080214, the common origin of <.>
in regular expressions and in transducers involves pair harmonization: AFST provides a meta symbol <.>
(in addition to .
in SFST). This stands for the unknown pair alphabet (the alphavet o SFST-PL programs) in regular expressions, and a symbol in transition labels. Its semantics is illustrated by the examples: <.> & ( a:b | c:d | <d>) = a:b | c:d <d> and a:<.> & <.>:b = a:b. Note that the semantics of <.>
was implemented in regular expressions and transducers with no difference in the naming: symbol anysymb
was used in all cases. In our current terminology in this article, the symbol any-pair
became symbol other-pair
when the transducer was constructed. The availability of any-pair
i.e. =<.> in AFST allowed for the following kinds of programs:
$X$ = a:b <.> c d:<.> ALPHABET = a:b c d:e d:f $X$:knownThe operator:known
assigned the contents of the define pair alphabet.
for>.>
, thus closing the alphabet. The above AFST script resulted to the following transducer whose pair alphabet is closed:s1: a:b -> s2 s2: a:b -> 3, c -> 3, d:e -> 3, d:f -> 3 s3: c -> s4 s4: d:e -> fs5, d:f -> fs5 fs5.
-- AnssiYliJyra - 26 Sep 2008
NOTE6.2 In this discussion, we have not always differentiated between any symbol i.e. <.>
and any pair <.>:<.>
. This is a source for confusion. In fact, we should also consider the possibility of any-identity-pair
i.e. ?
in XFST regular expression.
-- AnssiYliJyra - 27 Sep 2008
NOTE6.3 There are many ways to convert any
to other
. Some of these ways postpone the expansion and use any
or in transducers up to a point when we need to get rid of it (e.g. under relative complementation e.g. <.>:<.> -
that results into a list of symbols).
There is also a way to see the any
symbol as a metasymbol in automata. This approach tries to avoid the expensive route (i): compiling any
in regular expressions with an other
automaton and harmonizing the other
transition in this automaton each time when its matrix automaton gains more characters, or another expensive route (ii): collecting the known symbols and compile any
in regular expressions immediately in a way that is in harmony with the whole. The approach takes advantage of the wrapper's ability to open a third route: (i) compiling any
in regexps to any
in transducers, collecting the known symbol from the automaton and then harmonizing any
using the other
symbol and a list of known symbols. Accordingly, a part of the regexp compilation is implemented through harmonization that takes place at a point when the semantics of any
have to be explicated. We could say, that any
could be stay unharmonized under normal regular expression operations (union, concatenation, star), shuffle and composition while the explication and harmonization of this metasymbol becomes necessary under complementation and most rule operation.
-- AnssiYliJyra - 26 Sep 2008
NOTE7. The availability of metasymbols over known pairs and symbols is an important improvement to the size of intermediate results.
The availability of any
in intermediate transducers is an important optimization. For example, XFST compiles [a | b | c | d | e | f | g | h] ?:?
with a 56 transitions because the representation replaces any
with identity-other
, irreflexive-other
, and the list of known pairs. With the any
metasymbol, the result would have needed only 9 transitions.
s0: a -> s1, b -> s1, c -> s1, d -> s1, e -> s1, f -> s1. s1: ? -> fs2, a -> fs2, b -> fs2, c -> fs2, d -> fs2, e -> fs2, f -> fs2, <a:b> -> fs2, <a:c> -> fs2, <a:d> -> fs2, <a:e > -> fs2, <a:f> -> fs2, <a:?> -> fs2, <b:a> -> fs2, <b:c> -> fs2, <b:d> -> fs2, <b:e > -> fs2, <b:f> -> fs2, <b:?> -> fs2, <c:a> -> fs2, <c:b> -> fs2, <c:d> -> fs2, <c:e > -> fs2, <c:f> -> fs2, <c:?> -> fs2, <d:a> -> fs2, <d:b> -> fs2, <d:c> -> fs2, <d:e > -> fs2, <d:f> -> fs2, <d:?> -> fs2, <e :a> -> fs2, <e :b> -> fs2, <e :c> -> fs2, <e :d> -> fs2, <e :f> -> fs2, <e :?> -> fs2, <f:a> -> fs2, <f:b> -> fs2, <f:c> -> fs2, <f:d> -> fs2, <f:e > -> fs2, <f:?> -> fs2, <?:a> -> fs2, <?:b> -> fs2, <?:c> -> fs2, <?:d> -> fs2, <?:e > -> fs2, <?:f> -> fs2, <?:?> -> fs2. fs0:
My conclusion is that identity-other
and irreflexive-other
are less problematic with OpenFST etc. because they indicate disjoint sets of symbol pairs. However, as we already have a harmonization wrapper over OpenFST, it would be easy to allow the use of any
as long as it does not cause problems with the naive semantics of the openFST.
-- AnssiYliJyra - 26 Sep 2008
Expansion or extension:
a
from state 1
with a transition labeled with an extended pair (1,a)
(Roche 1995, Yli-Jyrä 1995)
d
in bracketed strings with an extended bracket that indicates the depth (d,<)
(Yli-Jyrä 2004, 2005)
?
can become {a,b,c,?}
Harmonization:
The final commitment to the term choices needs to be done.
-- AnssiYliJyra - 26 Sep 2008
Key harmonization takes always place if the combined keys are not from the same key table. Key harmonization is a low-level harmonization that does not affect the semantics. Equivalence class harmonization is too complex to be discussed in this comment. I assumed above that each key is for one symbol. Pair harmonization is sufficient for most computations. It is also often efficient.
Symbol harmonization is needed when we want to convert between our FST representation and XFST representation. In XFST representation, the pairs of the transducer are
not listed. Instead, only the symbols are listed and then the set of pairs consists of (i) pairs in transitions and (ii) identity pairs. Such a pair set conform the condition:
if a:b
is a known pairs then a:a
and b:b
are known pairs.
In sum, there is only a trade-off between
other-pair
(<.>
of AFST and (??? check!) =:= of TWOL)
idenitity-other
and irreflexive-other
(?
and ?:?
in XFST)
Mathematically XFST-notation is more elegant because the pairset is closed under projection. We do not need to introduce missing identity pairs during projection. However, we could also consider a combination, because the three metacharacters could be interpreted by the preemptive harmonization.
-- AnssiYliJyra - 26 Sep 2008
It appears to me that it is slightly easier to consider first individual pairs and their combinations and then to go into the generalizations that iterate over all pairs in two transducers. I pursue toward such calculus of pais in this section. I call the calculus "symbol pair calculus".
Definition. Let E
be the union of alphabets E_{sym}
and E_{meta}
.
Let E_{meta}
be {other-pairs
, idenitity-other-relation
, irreflexive-other-relation}
.
The set of all abstract symbol pairs over E
, P(E), is defined to contain the following:
not-available:not-available
(used when an operation fails, e.g. if we complement <.>
we should get nothing, i.e. the empty set)
other-pairs:other-pairs
(used to refer to the complement of the known set of pairs)
identitity-other:identitity-other
irreflexive-other:irreflexive-other
a:a
, a:other-pairs
and other-pairs:a
, a:irreflexive-other
and irreflexive-other:a
, where a∈ E_{sym}
a:b
where a,b∈ E_{sym}
.
a:b
where a,b∈ E
is an abstract symbol pair.
Definition. Let P_{a}
, P_{b}
, and P_{c}
be sets of listed abstract symbol pairs over E
. Let a:a'∈P_{a}
, b:b'∈P_{b}
and c:c'∈P_{c}
. The following operations are called abstract symbol-pair operations. The result of the operations are sometimes set-valued.
C(a:a')
where a:a'∈P_{a}
(close the alphabet by removing =other=-like symbol)
N_{Pb}(a:a')
where a:a'∈P_{a}
(used to compute complement)
K_{Pa}(a:a')
where a:a'∈P_{a}
(applied when the set of possible pairs that need to be listed is fixed)
H_{Pa,Pb}(a:a')
where a:a'∈P_{a}
(needed in harmonization)
a:a' ∩ b:b'
(needed in intersection and determinization)
a:a' ∪ b:b'
(needed when computing equivalent symbol pairs or harmonized alphabets)
a:a' - b:b'
(needed when computing difference)
a:a' xor b:b'
(needed when computing exclusive or of languages)
a:a' o b:b'
(needed in composition)
a:a' o b:b' o c:c'
(needed in three-way composition that is more efficient than two compositions)
I(a:a')
and O(a:a')
(needed in projection)
I^{-1}(a')
and O^{-1}(a)
(needed in oriented contexts and surface coercion of GTWOL, useful also for computing crossproduct)
V(a:a')
(needed in inversion)
-- AnssiYliJyra - 26 Sep 2008
-- AnssiYliJyra - 25 Sep 2008
Many FST libraries implement symbol pairs as pairs without the identity constraint. Accordingly, the identity constraint is not supported by the libraries.
Nevertheless, we can still find the solution through harmonization. The solution would be not to strive at two kinds of ?:?
but to two kinds of ?
.
Instead of talking about how we obtain the symbol pair sets like other .x. other
and Identity(other), we could talk about relations what they represent. Then we could denote such relation with a suitable symbol x
and extend it into symbol pairs using the convention x=x:x.
Some standard definitions:
E
is small enough and U(..)=E, then relation over E
represented by label ?:?
in XFST can be intransitive (a:b
, b:a
∈ ?:?
, but a:a
not in ?:?
). For a slight bigger E
, ?:?
is non-transitive (e.g. a:b
, b:c
, a:c
∈ ?:?
). If we assume that E
is always big enough, we known that ?:?
is a non-transitive relation. In addition, it is symmetric and irreflexive. Under the XFST interpretation, relations represented by ?
and ?:?
are disjoint but their union contains all pairs. Therefore, we can term ?:?
as the complement of ?
w.r.t. the set of all pairs of unknown symbols.
known notation | interpretation | metasymbol | symbol pair needed in most implementations |
---|---|---|---|
? (XFST) or =:= (TWOL?) |
the largest identity relation over other symbols | identity-other |
identity-other:identity-other |
?:? (XFST) |
the largest irreflexive relation over other symbols | irreflexive-other |
irreflexive-other:irreflexive-other |
-- AnssiYliJyra - 25 Sep 2008
"any" in regular expressions:
type of pairs | XFST regexps | AFST regexps | SFST regexps | TWOL regexps | |
---|---|---|---|---|---|
non-identity pairs (when the set of pairs is not defined) | ?:? |
<.> |
N/A | ||
identity pairs (when the set of pairs is not defined) | ? |
N/A | |||
pairs of a known symbol and any other symbol (when the set of pairs is not defined) | a:? ?:a |
a:<.> <.>:a |
N/A | ||
defined pairs containing a specified symbol (when the set of pairs is defined) | a:. .:a |
a:. .:a |
|||
non-identity pairs (when the set of pairs is defined) | ?:? |
. |
. |
||
identity pairs (when the set of pairs is defined) | ? |
TWiki Reference