other
and Similar Metasymbols in Symbol (Pair) Calculus This page is withdrawn from further processing. I started again with HfstOtherAsSimpleSymbols  AnssiYliJyra  29 Sep 2008.
To implement a restriction from the set of possible symbols to the sets of symbols used in a language L
, we need alphabets.
Definition. An alphabet is a structure A=(DomainOfDefinition,Blocks,ExplicitBlocks,NullableBlocks,DisallowedBlocks)
whose members are defined as follows:
DomainOfDefinition
: the set of possible symbols (e.g. the set of strings),
Partition
: a partition of DomainOfDefinition
(corresponds closely to a key table),
ExplicitBlocks
: blocks whose content is easily listed (the keys in a key table),
NullableBlocks
: blocks containing symbols whose free occurrences in strings carry no difference to the meaning of the strings: If L
is a language over A
and 0∈∪_{B∈NullableBlocks}B
, then the following statements are equivalent: xy∈L
and x0y∈L
.
DisallowedBlocks
: blocks that cover symbols whose occurrences in strings indicate that the string does not belong to the language L
(compare: a key table can contain a key that is not used in transitions).
Why NullableBlocks
and DisallowedBlocks
are sets of blocks? In the case of onetape alphabets they are actually empty or singletons. For pair alphabets, it is necessary to define at least two implicit blocks (for identity pairs and for irreflexive pairs). In closed alphabets, these implicit blocks are among the disallowed blocks.
Alternatively, it is possible to mention symbols that are disallowed, while the openclass symbols are used in strings. Thus, it is possible that 2 blocks are disallowed.
The same holds for the empty string: in addition to the block that constains the usual zero symbol, an openclass of symbols can be treated as invisible symbols.
An alternative to the current structure is one that assumes that openclass symbols share their block with some listed symbols. Such a represention has some advantages and disadvantages.
Definition. A simple alphabet is an alphabet Q_{A1×A2}=(DomainOfDefinition,Partition,OtherBlocks,NullableBlocks,DisallowedBlocks)
where
ExplicitBlocks
, NullableBlocks
, and DisallowedBlocks
is 1.
Definition. A pair alphabet is an alphabet P_{A1×A2}=(DomainOfDefinition,Partition,ExplicitBlocks,NullableBlocks,DisallowedBlocks)
where A1
and A2
are alphabets, and DomainOfDefinition&61;DomainOfDefinition_{A1}×DomainOfDefinition_{A2}
.
Example 1. The language a b* c
use the alphabet ({0,a,b,...}, {{0},{a},{b},{c},{d,e,...}}, {{0},{a},{b},{c}}, {{0}}, {{d,e,...}})
.
Example 2. The language ? a ?:?
use the alphabet ({0,0:a,...,0:z,a:0,a,a:b,...a:z,b:0,b:a,b,b:c,...,z:y,z}, {{0},{a}, a:?∪?:a∪?∪?:? }, {{0},{a}}, {{0}}, {})
.
In every alphabet A
,
KnownAlphabet
of A
is defined as ∪_{B∈ExplicitBlocks}B
UselessAlphabet
of A
is defined as ∪_{B∈DisallowedBlocks}B
,
NullableAlphabet
of A
is defined as ∪_{B∈NullableBlocks}B
, and
UnknownAlphabet
of A
is defined as DomainOfDefinition  KnownAlphabet
,
UsefulAlphabet
of A
is defined as DomainOfDefinition  UselessAlphabet
,
LetterAlphabet
of A
is defined as DomainOfDefinition  NullableAlphabet
,
Blocks
of A
is the set of blocks in the Partition
,
ImplicitBlocks
of A
is defined as Blocks  ExplicitBlocks
,
AllowedBlocks
of A
is defined as Blocks  DisallowedBlocks
,
LetterBlocks
of A
is defined as Blocks  NullableBlocks
,
DomainOfDefinition→Blocks
defined by A
is denoted by f_{A}
.
In every pair alphabet P
, set ImplicitBlocks
is covered by the following disjoint sets of blocks:
IdentityOtherBlocks
= {{a:b a=b ∧ a:b∈UnknownAlphabet_{A1} ∧ b∈UnknownAlphabet_{A2}}}  {{}}
,
IrreflexiveOtherBlocks
= {{a:b a≠b ∧ a∈UnknownAlphabet_{A1} ∧ b∈UnknownAlphabet_{A2}}}  {{}}
,
OtherIndentityPairBlocks
= {{a:b a=b ∧ a:b∈UnknownAlphabet ∧ a:b∉IdentityOtherBlocks }}  {{}}
,
OtherIrreflexivePairBlocks
= {{a:b a≠b ∧ a:b∈UnknownAlphabet ∧ a:b∉IrreflexiveOtherBlocks }}  {{}}
.
Why there are two or more othersymbol blocks?
idenitityother
metasymbol and IdentityOtherBlock
block) in separation from other unknown pairs with knwon or unknown symbols
Definition. An alphabet A1
is a refinement of an alphabet A2
, i.e. A1≤A2
, if the alphabets share DomainOfDefinition
and there is a mapping r:Partition_{A1}→Partition_{A2}
such that f_{A1}(a) = f_{A1}(b)
implies r(f_{A1}(a))=r(f_{A1}(b))
i.e. every block in A1
is contained to a block in A2
.
Lemma 1. Given two alphabets A1
and A2
with the same DomainOfDefinition
, we can always compute alphabet A1∧A2
such that
A1∧A2 ≤ A1
and A1∧A2 ≤ A2
.
Lemma 2. Given two alphabets A1
and A2
with shared DomainOfDefinition
, we can always compute alphabet A1∨A2
such that
A1 ≤ A1∨A2
and A2 ≤ A1∧A2
.
Corollary. If we fix the DomainOfDefinition
in alphabets, there is a lattice (DomainOfDefinition,≤)
. Element A1∨A2
is called the meet (aka greatest lower bound or infimum) of A1
and A2
, and element A1∧A2
is called the join (aka least upper bound or supremum) of A1
and A2
. The lattice ({1,2,3,4},≤)
of alphabets can be illustrated by the following Hasse diagram:
Currently, we do not know any algorithms that would use the meet of alphabets. Such could be useful, however, in regular language approximations using covers.
Term Definition. Computing the join of two alphabets is called harmonization of alphabets.
Application of the principle of divideandconquer: Mutual harmonization of pair alphabets_ P_{a}
, P_{b}
can be seen as an operation where every member of both alphabets is harmonized. Harmonization of transducers with alphabets P_{a}
, P_{b}
can be seen as an operation where both alphabets are harmonized and then the labels of transitions in both transducers are harmonized
Definition. Let Alphabets_{DomainOfDefinition}
be the set of alphabets definable over some common DomainOfDefinition
. A function f: Alphabets_{DomainOfDefinition}→2^{DomainOfDefinition}
is called a metasymbol.
f
is joindistributive if if it satisfies the condition f(A1∧A2) = f(A1)∩f(A2)
.
f
is meetdistributive if if it satisfies the condition f(A1∨A2) = f(A1)∪f(A2)
.
Definition. Denote the set of metasymbols over the alphabet of DomainOfDefinition
by Meta_{DomainOfDefinition}
.
Definition. Let M_{1}⊆Meta_{DomainOfDefinition}
and M_{2}⊆M_1
.
An elimination of metasymbols &Delta=M_{1}  M_{2}
is a function elim_{Δ,A}:2^{MetaDomainOfDefinition}→2^{MetaDomainOfDefinition}
that returns for any metasymbol m∈M_{1}
and alphabet A
a set of metasymbols M⊆M_{2}
. There are metasymbols that generally cannot be eliminated by listing the covered concrete pairs.
The following important eliminations use the following equivalences that assume a fixed simple alphabet A
:
meta symbol  regexp  network  the semantics in terms of simpler meta symbols 

knownsymbol 
ExplicitBlocks 

unknownsymbol 
? 
ImplicitBlocks 

epsilonsymbol 
0 
0 
NullableBlocks 
lettersymbol 
LetterBlocks 

uselesssymbol 
DisallowedBlocks 

usefulsymbol 
AllowedBlocks 
The following important eliminations use the following equivalences that assume a fixed alphabet P
:
support  meta symbol  regexp  network  the semantics in terms of simpler meta symbols 

blocks  knownpair 
. 
ExplicitBlocks 

unknownsymbol 
? 
ImplicitBlocks 

identitypairofothersymbols 
? 
IdentityOtherBlock 

irreflexivepairofothersymbols 
?:? 
IrreflexiveOtherBlock 

otheridentitypair 
<?> 
OtherIdentityPairBlock 

otherirreflexivepair 
<?:?> 
OtherIrreflexivePairBlock 

epsilonpair 
0:0 <> 
NullableBlocks 

letterpair 
LetterBlocks 

uselesspair 
DisallowedBlock 

usefulpair 
AllowedBlocks 

lists  knownidentitypair 
{ a:b  a=b ∧ a:b∈KnownAlphabet } 

knownirreflexivepair 
{ a:b  a≠b ∧ a:b∈KnownAlphabet } 

a:knownpair 
a:. 
{ a:b  a:b ∈ KnownAlphabet } 

knownpair:a 
.:a 
{ b:a  b:a ∈ KnownAlphabet } 

a:irreflexivepairofknownsymbols 
<?>:a 
{ a:b  a≠b ∧ b ∈ A2.!KnownAlphabet } 

irreflexivepairofknownsymbols:a 
<?>:a 
{ b:a  a≠b ∧ b ∈ A1.!KnownAlphabet } 

a:irreflexivepairofothersymbols 
a:? 
{ a:b  a≠b ∧ a:b ∈ A2.!UnknownAlphabet } 

irreflexivepairofothersymbols:a 
?:a 
{ b:a  a≠b ∧ b:a ∈ A1.!UnknownAlphabet } 

metasyms  otherpair 
{otheridentitypair,otherirreflexivepair} 

pairofothersymbols 
?∪?:? 
{identitypairofothersymbols,irreflexivepairofothersymbols} 

mixedotherpair 
<.> 
{otherpair,pairofothersymbols} 

anyidentitypair 
? 
{knownidentitypair, identitypairofothersymbols, otheridentitypair} 

anyirreflexivepair 
?:?  ? 
{knownirreflexivepair, irreflexivepairofothersymbols, otherirreflexivepair} 

anypair 
?:? , <.> 
{anyidentitypair,anyirreflexivepair} 

a:anypair 
a:? 
{a:a, a:irreflexivepairofknownsymbols, a:irreflexivepairofothersymbols} 

anypair:a 
?:a 
{a:a, irreflexivepairofknownsymbols:a, irreflexivepairofothersymbols:a} 
The metasymbols that do not look like pairs can be converted to pairs using the convention x=x:x
.
It appears to me that it is slightly easier to consider first individual pairs and their combinations and then to go into the generalizations that iterate over all pairs in two transducers. I pursue toward such calculus of pais in this section. I call the calculus "symbol pair calculus".
Definition. Let E
be the union of alphabets E_{sym}
and E_{meta}
.
Let E_{meta}
be {otherpairs
, idenitityotherrelation
, irreflexiveotherrelation}
.
The set of all abstract symbol pairs over E
, P(E), is defined to contain the following:
notavailable:notavailable
(used when an operation fails, e.g. if we complement <.>
we should get nothing, i.e. the empty set)
otherpairs:otherpairs
(used to refer to the complement of the known set of pairs)
identitityother:identitityother
irreflexiveother:irreflexiveother
a:a
, a:otherpairs
and otherpairs:a
, a:irreflexiveother
and irreflexiveother:a
, where a∈ E_{sym}
a:b
where a,b∈ E_{sym}
.
a:b
where a,b∈ E
is an abstract symbol pair.
Definition. Let P_{a}
, P_{b}
, and P_{c}
be sets of listed abstract symbol pairs over E
. Let a:a'∈P_{a}
, b:b'∈P_{b}
and c:c'∈P_{c}
. The following operations are called abstract symbolpair operations. The result of the operations are sometimes setvalued.
C(a:a')
where a:a'∈P_{a}
(close the alphabet by removing =other=like symbol)
N_{Pb}(a:a')
where a:a'∈P_{a}
(used to compute complement)
K_{Pa}(a:a')
where a:a'∈P_{a}
(applied when the set of possible pairs that need to be listed is fixed)
H_{Pa,Pb}(a:a')
where a:a'∈P_{a}
(needed in harmonization)
a:a' ∩ b:b'
(needed in intersection and determinization)
a:a' ∪ b:b'
(needed when computing equivalent symbol pairs or harmonized alphabets)
a:a'  b:b'
(needed when computing difference)
a:a' xor b:b'
(needed when computing exclusive or of languages)
a:a' o b:b'
(needed in composition)
a:a' o b:b' o c:c'
(needed in threeway composition that is more efficient than two compositions)
I(a:a')
and O(a:a')
(needed in projection)
I^{1}(a')
and O^{1}(a)
(needed in oriented contexts and surface coercion of GTWOL, useful also for computing crossproduct)
V(a:a')
(needed in inversion)
 AnssiYliJyra  26 Sep 2008
Lemma 1. Given two alphabets A1
and A2
with the same DomainOfDefinition
, we can always compute alphabet A1∧A2
such that
A1∧A2 ≤ A1
and A1∧A2 ≤ A2
.
Proof of Lemma 1. Construct A1∧A2=(DomainOfDefinition, Partition_{A1∧A2}, ExplicitBlocks_{A1∧A2}, NullableBlocks_{A1∧A2}, DisallowedBlocks_{A1∧A2})
as follows:
Partition_{A1∧A2}= ∪_{B1∈PartitionA1}∪_{B2∈PartitionA2} { B1∩B2 }  { {} }
ExplicitBlocks_{A1∧A2}= (∪_{B1∈ExplicitBlocksA1}∪_{B2∈PartitionA2} { B1∩B2) }) ∪ ∪_{B1∈PartitionA1}∪_{B2∈ExplicitBlocksA2} { B1∩B2) }  { {} }
NullableBlocks_{A1∧A2}= ∪_{B1∈NullableBlocksA1}∪_{B2∈NullableBlocksA2} { B1∩B2 }  { {} }
DisallowedBlocks_{A1∧A2}= ∪_{B1∈DisallowedBlocksA1}∪_{B2∈DisallowedBlocksA2} { B1∩B2) }  { {} }
.
Lemma 2. Given two alphabets A1
and A2
with shared DomainOfDefinition
, we can always compute alphabet A1∨A2
such that
A1 ≤ A1∨A2
and A2 ≤ A1∧A2
.
Proof of Lemma 2. The alphabet of universal language induces the coarsest possible alphabet (DomainOfDefinition,{DomainOfDefinition},{DomainOfDefinition},{},{}}
. The existence of this alphabet suffices. Q.E.D.
TWiki Reference