HFST: Examples of how other-symbols should be handled in transducer operations.

This page poses questions and gives examples about the way other-symbols should be handled in transducer-operations. Please, don't edit the first section or its sub-sections. Please, do edit all other sections by adding your own subsections.

Definitions concerning the two existing other-symbols

We define two kinds of other-symbols, i.e. the identity other-symbols and the non-identity other-symbol. We also define two kinds of harmonizations, parallel and serial harmonization.

Two transducers with other-symbols have to be harmonized before one combine them using ordinary transducer operations. Parallel harmonization should be used with conjunction, disjunction, difference and concatenation of transducers. Serial harmonization should be used with composition of transducers.

The harmonizations presented here may be defected, which means that the result of an ordinary transducer operation applied on two harmonized transducers may be incorrect in some way. This is most easily seen through an example. Please give such examples (as sub-sections of the section dealing with the relevant operation on this page).

Definition of the two other symbols

Let T be a transducer with input-alphabet Tau_i and output-alphabet Tau_o. Transitions in T may be labeled with pairs, whose input or ouput symbol is one of two kinds of other-symbols = and . Both other-symbols belong to the input- and output-alphabet of T. Given the symbol a in the input-alphabet of T and the symbol b in the ouput-alphabet of T. The valid kinds of other-pairs are =:=, ≠:≠, a:≠ and ≠:a.

If S is another transducer, with other-symbols = and then these are the same symbols as the other-symbols in T.

Definition of parallel harmonization of alphabets and transitions of transducers having other-symbols

Parallel harmonization is performed on two transducers S and T before a variety of binary transducer operation excluding composition.

Let S be a transducer with other-symbols, input-alphabet Sigma_i and output-alphabet Sigma_o. Let T be a transducer with other-symbols, with input-alphabet Tau_i and output-alphabet Tau_o. Harmonization of S and T creates the new alphabets Theta_i and Theta_o with the following properties

  • Theta_i = Sigma_i ∪ Tau_i
  • Theta_o = Sigma_o ∪ Tau_o

Harmonization of S and T creates the new transducers S' and T'. Both of the new transducers have input-alphabet Theta_i and output-alphabet Theta_o.

The transducers T and T' have the same initial states, final states and other states. Each transition in T is a transition in T'.

The transducer T' additionally has every transition from state state_i to state_j with pair x:y, where x and y are different symbols and

  1. x belongs to Theta_i and y belongs to Theta_o,
  2. x doesn't belong to Tau_i and y doesn't belong to Tau_o and
  3. T has a transition from state_i to state_j with pair ≠:≠.

The transducer T' additionally has every transition from state_i to state_j with pair x:x, where

  1. x belongs to Theta_i and Theta_o,
  2. x doesn't belong to Tau_i and x doesn't belong to Tau_o (perhaps?)
  3. T has a transition from state_i to state_j with pair =:=.

The transducer T' additionally has every transition from state_i to state_j with pair a:x (x:b), where

  1. x belongs to Theta_o or Theta_i,
  2. x doesn't belong to Tau_o or Tau_i and
  3. T has a transition from state_i to state_j with pair a:≠ (≠:b).

The relation between the transducers S and S' is analogous to the relation between T and T'.

Definition of serial harmonization of alphabets and transitions of transducers having other-symbols

Serial harmonization is performed on two transducers T and S with other symbols before ordinary composition.

Let S be a transducer with other-symbols, input-alphabet Sigma_i and output-alphabet Sigma_o. Let T be a transducer with other-symbols, input alphabet Tau_i and output alphabet Tau_o.

Serial harmonization of S and T creates a new alphabet
Theta = Sigma_o ∪ Tau_i

Harmonization of S and T creates new transducers S' and T'. The transducer S' has the same intitial states, final states and other states as S. The input-alphabet of S' is Sigma_i and its output-alphabet is Theta.

All transitions in S are transitions in S'. The transducer S' additionally has the transitions from state_i to state_j with pair ≠:x, when

  1. S has a transition with the pair ≠:≠ from state_i to state_j,
  2. x is not a symbol in Sigma_i and not a symbol in Sigma_o and
  3. x is a symbol in Theta.

The transducer S' additionally has the transitions from state_i to state_j with pair x:x, when

  1. S has a transition with the pair =:= from state_i to state_j,
  2. x is not a symbol in Sigma_i and not a symbol in Sigma_o (perhaps?) and
  3. x is a symbol in Theta.

The transducer S' additionally has the transitions from state_i to state_j with pair a:x, when

  1. S has a transition with the pair a:≠ from state_i to state_j,
  2. x is not a symbol in Sigma_i and not a symbol in Sigma_o and
  3. x is a symbol in Theta.

The transducer T' has the same intitial states, final states and other states as T. The input-alphabet of T' is Theta and its output-alphabet is Tau_o.

All transitions in T are transitions in T'. The transducer T' additionally has the transitions from state_i to state_j with pair x:≠, when

  1. T has a transition with the pair ≠:≠ from state_i to state_j,
  2. x is not a symbol in Tau_i and not a symbol in Tau_o and
  3. x is a symbol in Theta.

The transducer T' additionally has the transitions from state_i to state_j with pair x:x, when

  1. T has a transition with the pair =:= from state_i to state_j,
  2. x is not a symbol in Tau_i and not a symbol in Tau_o and
  3. x is a symbol in Theta.

The transducer S' additionally has the transitions from state_i to state_j with pair x:a, when

  1. S has a transition with the pair ≠:a from state_i to state_j,
  2. x is not a symbol in Tau_i and not a symbol in Tau_o and
  3. x is a symbol in Theta.

Other-symbols in disjunction

Other-symbols in conjunction

Other-symbols in difference

Other-symbols in concatenation

Other-symbols in composition

Example showing that serial harmonization isn't sufficient.

Let the input alphabet of the transducer with other-symbols S be Sigma_i = { a, b, =, ≠ } and, output-alphabet be Sigma_o = { c, =, ≠ } and its transitions diagram be
1: a:c -> 1
   b:c -> 1
   =:= -> 1
This makes S a transducer which writes known input-symbols to c and lets everything else (except c) through.

Let the input alphabet of T be Tau_i = {A, B, =, ≠} and output alphabet be Tau_o = {A, B, C, =, ≠ } and its transition diagram be

1: A:A -> 1
   B:B -> 1
   C:C -> 1
   ≠:C -> 1
This makes T a transducer, which writes known input-symbols to themselves and everything else to C.

Serial harmonization of these transducers leads to the transducer S' and T'.

S' has input alphabet Sigma_i and output-alphabet Theta = { c, A, B, =, ≠ }. It has the transition diagram

1: a:c -> 1
   b:c -> 1
   A:A -> 1
   B:B -> 1
   =:= -> 1

T' has input alphabet Theta and output-alphabet Tau_o. It has the transition diagram

1: A:A -> 1
   B:B -> 1
   C:C -> 1
   c:C -> 1
   ≠:C -> 1

The ordinary composition of S' and T' has transition diagram

1: A:A -> 1
   B:B -> 1
   a:C -> 1
   b:C -> 1
Hence the result of the composition would give no output-string for the input-string Ea. Still the transducers S and T applied as a cascade would give an output-string CC (with an intermediate result Ec).

Info Is it possible to define serial harmonization in such a way, that there is no need to re-write existing composition algortihms? Perhaps one could add the pair ≠:≠ to the transition in S. Does this cause problems?

How to fix serial harmonization (KristerLinden - 30 Sep 2008, in personal communication to Anssi Yli-Jyrä)

It is possible to fix serial harmonization by adding a temporary harmonization symbol -:≠, which serves no other purpose than to remember that there is a =:= in S and a on Tau_i. We would use a different temporary harmonization symbol ≠:- for the case, when there is on Sigma_o and a =:= in T. When the composition is over, we simply replace all - with in the new transducer.

Why is this OK? The crucial point is that the fix is needed only under a very precise circumstance, i.e. when copying an unseen symbol from one tape to another for replacement by the next transducer, or for introducing a new and unseen symbol which is then subsequently copied by the next transducer.

NOTE. If we do not wish to introduce a new temporary symbol, we can also use ≠:≠ as suggested by Miikka (in Info ) and save us the trouble of cleaning up afterwards. This temporarily changes the denotational semantics of the transducer. However, the consequences may in fact be either harmless or beneficial in the context of the cascade that caused the need for harmonization.

More formally we need to augment the serial harmonization as follows:

The transducer S' additionally has the transitions from state_i to state_j with pair -:≠, when

  1. S has a transition with the pair =:= from state_i to state_j and
  2. is a symbol in Tau_i.

The transducer T' additionally has the transitions from state_i to state_j with pair ≠:-, when

  1. T has a transition with the pair =:= from state_i to state_j and
  2. is a symbol in Sigma_o.

After composition - is replaced with .

An example to demonstrate the principle:

Serial harmonization of the transducers S and T leads to the transducer S' and T'.

S' has input alphabet Sigma_i and output-alphabet Theta = { c, A, B, =, ≠ }. It has the transition diagram

1: a:c -> 1
   b:c -> 1
   A:A -> 1
   B:B -> 1
   =:= -> 1
   -:≠ -> 1

T' has input alphabet Theta and output-alphabet Tau_o. It has the transition diagram

1: A:A -> 1
   B:B -> 1
   C:C -> 1
   c:C -> 1
   ≠:C -> 1

The ordinary composition of S' and T' has transition diagram

1: A:A -> 1
   B:B -> 1
   a:C -> 1
   b:C -> 1
   -:C; -> 1

After composition we replace - with

1: A:A -> 1
   B:B -> 1
   a:C -> 1
   b:C -> 1
   ≠:C; -> 1
which gives us the desired result CC for Ea both in the cascade and in the final composition.

Anssi's Comment to the Sufficiency of Serial Harmonization

Label ≠:≠, represents only non-identity pairs that have not been otherwise mentioned. As a concrete pair, it is not represented by itself, because it is itself of form x:x. Under serial harmonization, =:= could be seen as a representative for all concrete identity pairs x:x

  1. that are not represented by other identity pairs used as labels of S, and
  2. whose members are drawn from Tau_i.

Due to this, ≠:≠ and a:a do not have any difference when considered as outcome of serial harmonization of =:=.

The transducer S' additionally has the transitions from state_i to state_j with pair x:x, when
  1. S has a transition with the pair =:= from state_i to state_j,
  2. concrete pair x:x is not represented by any other pairs in the labels of S,
  3. x is a symbol in Tau_i.

One thing that might still go wrong is the timing. If we do harmonization in several steps we may end up with wrong ordering. Possibly this could be avoided by seeing the harmonization as a two step process, where we first compute the label morphims for both transducers and then apply them. -- AnssiYliJyra - 04 Oct 2008


-- MiikkaSilfverberg - 02 Oct 2008
Topic revision: r5 - 2008-10-04 - AnssiYliJyra
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback