This page is devoted to one particular and simple interpretation for other
(alias ?
) and the corresponding implementation and consequences. The interpretation is in short:
?
symbol is like any other symbol in their alphabet.
any
and other
, there is only the ?
whose interpretation is external to the machine and thus not coded in the transition or elsewhere in the machine.
?
are identical in two FSAs, then the normal FSA operations apply (without any special treatment of ?
.
?
.
?
) and their output alphabets are identical, then the normal operations of union, concatenation (and intersection if applicable) apply normally, without special consideration of the ?
.
?
.
?
is thus deferred, partly to possible later operations where harmonization is needed, and partly to the execution of a FSA as a recognizer or FST as a transducer.
?
transition).
t
from Qi to Qj in FSA1 with label ?
, add a transition from Qi to Qj with each label in S12.
?
transitions in FSA1, nothing will be added.
In replace rules, the obvious default is the identity correspondence, because the rule only says about the center and the relevant context. Anything beyond the center must remain as it was. On the other hand, the identity correspondence is not useful in two-level rules. Once you enter identity correspondences, two rules will be in fatal conflict.
Thus we will introduce two kinds of harmonizations, one called identity harmonization for replace rules, and the other called pair harmonization e.g. for two-level rules. Both harmonize by adding transitions to accompany the original ?
transitions, but slightly in a different way.
For identity harmonization, we expand ?
transitions roughly as follows:
(Qi ?:? -> Qj)
IN FSA1, add transitions (Qi ?:? -> Qj)
FOR EACH x
IN the alphabet of FST2 NOT IN the alphabet of FST1.
For the pair harmonization, the expansion differs from the above so that we add pairs x:y
(instead of identity pairs x:x
) from the difference of pair alphabets of the two FSTs.
See the end of HfstOtherSymbol for additional details.
other
in FSAs
Suppose we treat the other
i.e. ?
as an ordinary symbol. If two automata A
and B
have an identical alphabet of known symbols. Suppose also that both A
and B
use the ?
for other symbols which do not have explicit transitions in them, e.g. A
which accepts anything which does not start with an a
, and B
which accepts anything which does not start with b
or bb
.
A: 1. ? -> 2 2: a -> 2, ? -> 2 B: 1: ? -> 2 2: ? -> 3 3: b -> 3, ? -> 3
Suppose we would make an intersection of A
and B
. The symbol b
becomes known in A
and the symbol a
in B
. We need to expand the ?
with these:
A: 1: ? -> 2, b -> 2 2: a -> 2, ? -> 2, b -> 2 B: 1: ? -> 2, a -> 2 2: ? -> 3, a -> 3 3: b -> 3, ? -> 3, a -> 3
Intersecting these two augmented automata needs no special treatment for ?
:
A: 1: ? -> 2, b -> 2 2: a -> 2, ? -> 2, b -> 2 B: 1: ? -> 2, a -> 2 2: ? -> 3, a -> 3 3: b -> 3, ? -> 3, a -> 3 A&B: 11: ? -> 22 22: a -> 23, ? -> 23 23: a -> 23, b -> 23, ? -> 23
Thus, it looks like the only thing is needed for FSAs with ?
is that we harmonize the alphabets of the two FSAs by adding some transitions. All labels of the other automata which are not part of this automaton, need to be added as alternatives for each ?
transition. After harmonizing we should be able to concatenate, union and intersect automata with ?
symbols with no special handling for the ?
.
other
with FSTs
It seems likely that the same technique of expanding and harmonizing of FST alphabets prior to normal unions, concatenations and compositions would succeed just like it does with FSAs. The only problem appears to be how to interpret the other
symbol (or ?
) in FST contexts.
Replace rules need only to characterize the changing part and the context which is explicitly mentioned. By definition, replace rules leave everything else intact. Thus all unmentioned, i.e. unknown symbols are well tolerated and they correspond to the very same symbol (no matter what it was). With replace rules we hardly need a distinction between the input and output alphabets (both are open). We might expect the following concepts for other
and sample representations using ?
:
?
is X:X
where X
is any symbol of the (open-ended input) alphabet not mentioned elsewhere in this FST.
?:a
is X:a
where X
is any symbol of the alphabet not mentioned elsewhere in this FST. Any such symbol is mapped into a
.
a:?
similarly, but a
is mapped to all such symbols. (Less useful but possible use.)
?:?
is less useful or possibly not needed at all. (Comments please!) If the cross-product interpretation of X:Y
(where X
and Y
are any and possibly distinct symbols not mentioned) is not needed, ?:?
in transitions could be used to denote ?
as X:X
(the above case 1).
A test case:
1. a:? -> 2 2. ?:b -> 3 3:What would this FST do with input "ab" and what with "ac"? Would it accept a relation "a:c c:b" (accept), "a:a b:b" (accept?) and/or "a:b a:b" (accept??)? (Comments please!) Note that
a:?
can only be in the center of a replace rule, not elsewhere. (Center is the part which is rewritten by the replace rule and the only part which might change.)
other
with the identity constraint
If the only way we need to combine replace-rules is by composition, perhaps the interpretion X:X
of ?:?
is sufficient. Indeed this might render the other interpretation X:Y
meaningless. If we want to use conjunction, the interpretation X:Y
might become sensible, i.e. allow any replacements, that other rules may make, which don't effect characters mentioned in this rule. In this case composition and conjunction seem equivalent to me, though.
-- MiikkaSilfverberg - 24 Sep 2008
It seems that both other
(other
with identity) and other:other
are both needed at least in replace and markup rules. The first use is obvious, but the second is motivated by the interaction between contexts that contain other
and centers that contain other
, but on different sides. In addition, both are needed in generalized two-level rules. However, this does not cancell other good ideas on this page.
other
in FSTs which relate representations
Two-level rule grammars typically relate two representations. These representations may have different and even disjoint alphabets. It is less clear whether really unknown symbols are relevant. Often the input and output alphabets are fixed and known in advance. There is, still, a certain need for the concept of other
symbol. The other
symbol is more relative in such a context:
?
and ?:?
could likely mean the same thing, i.e. pairs X:Y
which belong to the allowed set of pairs of input and output symbols but which have not been mentioned explicitly in this FST. Harmonizing could mean the introduction of transitions with those pairs known in the other FST which are unknown in this FST.
?:a
could mean the pairs X:a
in the allowed set of pairs not explicitly mentioned in this FST.
a:?
could mean the pairs a:X
in the allowed set of pairs not explicitly mentioned in this FST.
-- KimmoKoskenniemi - 11 Sep 2008
I do not understand why two-level grammars should be considered as those used in morphology only. Such fixed notion of two-level grammars may be the common perception, but if we do not change this then no one else will do that. We need to look for algebraic generalizations such as our generalized two-level grammars (GTWOL) and bracketed GTWOL. When they are mentioned there is no more place for such arguments that "replace rules are more flexible" or that "they are superior to two-level rules". The fact is that replace rules cannot emulate GTWOL rules, but all replace rules can be implemented through BGTWOL rules. Thus, which on is more general and which reduces to which.
Said this, I also see the point in what you say: replace rules specify typically one contextual edit in the whole distortion of the input that happens when a string is fed to a cascade of e.g. replace rules implementing rules of generative phonology. Because the rule is responsible for only one alternation only, the unaltered parts are simply copied. The same does not hold for the whole cascade as the cascaded topology is only an alternative for the parallel topology of phonological grammars. As to the use of the other symbol, a single two-level rule assumes less on those portions in the string pair where the rule does not apply.
In particular, a two-level rule allows arbitrary changes in those parts where the rule does not apply, while a replace rule does the contrary.
-- AnssiYliJyra - 25 Sep 2008
If a rule mentions a:b
, do all pairs a:X
(where X
hasn't been mentioned in the rule) belong to ?:?
? Consider the following two-level rule
a:b <= _ c ;If the pairs
a:X
don't belong to ?:?
then I think you could compile the rule into a transducer
1: ? -> 1 a:? -> 2 ?:b -> 1 ?:c -> 1 c:? -> 1 a:b -> 1 c -> 1 2: ? -> 1 a:? -> 2 ?:b -> 1 b:? -> 1 c:? -> 1 ?:c -> 1 a:b -> 1Is it possible to compile this rule without referring to each pair
a:X
individually, unless ?:?
excludes a:X
(perhaps this is a minor consideration)? Can this effect harmonization?
Actually, one can compile the rule simply
1: ? -> 1 a:b -> 1 a:? -> 2 c -> 1 2: a:b -> 1 a:? -> 2 ? -> 1since
?:?
will match exactly those pairs, that haven't been referred to in the fst, i.e. all pairs except a:b
, c
and a:?
(among others all b:?
and ?:b
pairs [ except a:b
] ).
-- MiikkaSilfverberg - 24 Sep 2008
The other
symbol perhaps could be treated just like any normal symbol in FST operations, if we adopt an alphabet harmonization as a necessary preliminary step. There are two kinds of harmonization:
Both types of harmonization split the various kinds ?:?
, a:?
and ?:a
transitions by adding concrete transitions (starting and ending in the same states as these other
transitions). Suppose we have FSTs A
and B
and their sets of label pairs are P(A)
and P(B)
respectively. Furthermore, let I(A)
, I(B)
and O(A)
, O(B)
stand for the input and output alphabets of A
and B
, and S(A)
and S(B)
stand for the common part of the input and output alphabets, i.e. I(A) ∩ O(A)
(but usually the input and output alphabets are the same). In these alphabets, we treat ?
as an ordinary symbol in the input or output alphabet. If the alphabet does not contain ?
, then it is closed, and harmonization will not expand transitions of that FST.
The pairs to be added to A in symbol harmonization are as follows (Please, comment and check!):
?
transition, add: x:x
FOR ALL x
IN S(B) - (S(A) ∪ ?)
a:?
transition, add a:x
FOR ALL x
IN O(B) - (O(A) ∪ ?)
?:a
transition, add x:a
FOR ALL x
IN I(B) - (I(A) ∪ ?)
In pair harmonization, we add to A (Please, comment and check!):
?:?
transition, add x:y
FOR ALL x:y
IN P(B) - (P(A) ∪ ?:?)
a:?
transition, add a:x
FOR ALL a:x
IN P(B) - (P(A) ∪ ?:?)
?:a
transition, add x:a
FOR ALL x:a
IN P(B) - (P(A) ∪ ?:?)
For the purposes of composition A .o. B
, one might need and define some further harmonizations. Harmonization for composition could assert that O(A)
and I(B)
would be mutually extended according to symbol harmonization in the above manner.
-- KimmoKoskenniemi - 11 Sep 2008
I support Kimmos proposal on two harmonizations (symbol and pair harmonization), but in the sense that they are actually restrictions of a more general harmonization that copes with both. Krister and Miikka have also made great work in definitions and commenting!
Kimmo's proposal might lead to inquiring what metasymbols are involved in each compilation formula (for replace, twol rules etc.). -- AnssiYliJyra - 26 Sep 2008
TWiki Reference