Equivalence testing

Results from SFST and unweighted HFST are equivalent. Unweighted and weighted HFST do not yield equivalent results, but input and output projections of the results are equivalent. This suggests that the results are functionally the same, but differ in their alignment.

The equivalence of input and output projections does not necessarily mean that each input produces the same output in weighted and unweighted results. Because the results are cyclic, it is impossible to check all possible input/output relations.

Subtracting unweighted and weighted results shows that the weighted result is a subset of the unweighted one. It seems that the unweighted result accepts more alignments for some input/output relations than the weighted one.

We can test this by subtracting the weighted result from the unweighted one and taking the input projection. Then we can test the weighted and unweighted results with these strings. It seems that the weighted result accepts only one alignment for each input/output relation but the unweighted one accepts several alignments. For example the input ärähtää<53><34> produces in weighted result only one output:

verbatim0�

but in unweighted one three same outputs with different alignments:

verbatim1�

The reason for this is that implementations of composition differ in SFST and OpenFst. SFST implements a naive composition algorithm where labels with output epsilons (in the first transducer) and labels with input epsilons (in the second transducer) can appear in any order. In unweighted transducers this produces redundant paths but the result is still correct. However, in weighted transducers a naive composition will produce wrong results. That is why OpenFst's composition algorithm allows only one alignment. By default this is the alignment where labels with output epsilons (in the first transducer) come before labels with input epsilons (in the second transducer).

Pushing labels does not produce equivalent results. It is not possible to push labels in a minimal transducer. AN EXAMPLE why this is not possible.

Roark, Sproat: Computational Approaches to Morphology and Syntax, pages 14-15 Pereira, Riley: Finite-State Language Processing, pages 439-442 Mohri, Pereira, Riley: SPEECH RECOGNITION WITH WEIGHTED FINITE-STATE TRANSDUCERS, pages 13-14

Could OpenFst's algorithm be implemented in SFST? At least for testing purposes. How about filtering afterwards. If the naive composition is intersected with the following filter:

verbatim2�

This would clearly remove the redundant paths where input epsilons come before output epsilons. However, the filter does not know where these paths have originated; such paths that are present in one of the transducers before composition are also removed. If single epsilons (epsilons that occur only either in the input or output of a transition) in the argument transducers are substituted with an unused marker before composition and then substituted back to epsilons after composition and filtering? How much time does it take to do the filtering afterwards?

-->

                              -
                              -
                          (gap)
                              -
                           -  -
                           -  -
                          (gap)
                           -  -
                   +       -  -
                   +  +    -  -
         -         +  +    -  -
    +    -         + -+    -+ -
   -+    -        -+ -+    -+ -
-  -+    -        -+ -+    -+ -+
p  i  s  s  f  p  o  o  e  c  o
h  n  t  t  i  l  m  m  x  o  m
o  f  e  u  n  u  o  o  c  m  o
n  l  m  b  d  r  r  r  e  p  r
o  e  f  i  -  a  f  f  p  o  f
l  c  i  f  g  l  i  i  t  u  i
o  t  l  y  r  e  1  2  i  n
g  i  l     a  -        o  d
y  o        d  t        n  s
   n                    s 

Hypothesis that calls for extra tests: on large transducers the efficiency of a minimising algorithm depends on the cyclicity of the transducer. Cyclic transducers are faster to minimise with Hopcroft's algorithm, but acyclic with Brzozowski's.

Replace rules can be ambiguous. For example the rule

ab | b | ba | aba -> x  ||  _

yields four different outputs for the input "aba". That is because expression A in the rule can be matched in four ways in the string "aba"

input:    a b a    a b a    a b a    a b a
match:    ---        -        ---    -----
output:    x  a    a x a    a  x       x 

To avoid this ambiguity, Karttunen has implemented a directed replace operator. It allows the expression A to be matched from left to right or from right to left. It is also possible to define whether the shortest or longest match is searched. Different types of directed replace operators are listed in the table below.

  longest match shortest match
left-to-right @-> @>
right-to-left ->@ >@

If we use directed replace operators in the previous rule, the ambiguity is dissolved. Below are listed the outputs that the rule yields for input "aba" depending on the type of the directed replace operator:

operator output explanation
@-> xa left-to-right, longest match
@> axa left-to-right, shortest match
->@ x right-to-left, longest match
>@ ax right-to-left, shortest match

-- ErikAxelson - 2009-12-03

Topic revision: r3 - 2010-02-02 - ErikAxelson
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback