HFST: Test Cases for Runtime Transducer

To test transducer functionality properly, we need a systematic way to run tests with different transducers and different input and output strings. A transducer test case file describes such test cases.

A test case file is a simple text file which contains input strings, and for every input string a set of output strings which the transducer should produce.

A test case starts with the keyword INPUT, followed by a line feed and the input string written as a sequence of symbols separated by spaces. Every INPUT must be followed by the keyword OUTPUT and a list of output strings, one per line, with their symbols likewise separated by spaces. One file can contain more than one input-output relation. In the examples on this page, each test case ends with a !COMPLETE or !INCOMPLETE line.

INPUT
   f i r s t i n p u t t e s t

OUTPUT
   a a a a
   a a b b
   a a c c
   a a d d
!COMPLETE

INPUT
   s e c o n d i n p u t t e s t

OUTPUT
   b b a a
   b b c c
!INCOMPLETE
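The layout described above can be read with a short parser. The following Python sketch is illustrative only and is not part of HFST; the names IOCase and parse_cases are hypothetical. It assumes, based on the examples, that each test case is terminated by a !COMPLETE or !INCOMPLETE line, and it skips any line outside an INPUT/OUTPUT block (such as a TESTCASE header).

```python
from dataclasses import dataclass, field


@dataclass
class IOCase:
    """One INPUT/OUTPUT relation (hypothetical model, not an HFST class)."""
    input_symbols: list = field(default_factory=list)
    outputs: list = field(default_factory=list)  # list of symbol lists
    complete: bool = True  # set from the !COMPLETE / !INCOMPLETE terminator


def parse_cases(text):
    """Split the text of a test case file into IOCase objects."""
    cases = []
    section = None  # "input", "output", or None between test cases
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no meaning
        if line == "INPUT":
            current, section = IOCase(), "input"
        elif line == "OUTPUT":
            section = "output"
        elif line in ("!COMPLETE", "!INCOMPLETE"):
            current.complete = line == "!COMPLETE"
            cases.append(current)
            current, section = None, None
        elif section == "input":
            current.input_symbols.extend(line.split())  # space-separated symbols
        elif section == "output":
            current.outputs.append(line.split())  # one output string per line
        # Any other line (e.g. a TESTCASE header) is ignored by this sketch.
    return cases
```

Parsing the two test cases above this way yields two IOCase objects; an OUTPUT keyword followed directly by the terminator gives an empty output list.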

Note that the output list can contain the same string two or more times. In that case, the transducer must produce that output string as many times as it is listed. This can happen when the same output is reachable via different paths and duplicates are not eliminated.
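Because duplicates matter, a checker has to compare output lists as multisets rather than as sets. A minimal Python sketch using collections.Counter; the function name same_outputs is hypothetical:

```python
from collections import Counter


def same_outputs(produced, expected):
    """Return True if both lists hold the same strings, counting duplicates."""
    # Symbol lists are unhashable, so convert each one to a tuple first.
    return Counter(map(tuple, produced)) == Counter(map(tuple, expected))


expected = [["b", "b"], ["b", "b"]]                      # "b b" listed twice
print(same_outputs([["b", "b"]], expected))              # False: one copy missing
print(same_outputs([["b", "b"], ["b", "b"]], expected))  # True
```

A plain set comparison would wrongly report the first pair as equal, since both sides contain only the string "b b".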

Test Case Header

TESTCASE: [ analyse | generate ] [ unordered | ordered ] [ complete | incomplete ] [ unweighted | weighted ]

Typically, a file begins with a line that starts with the keyword TESTCASE. This line specifies what to expect from the output strings. Its settings apply to all following input-output relations until the end of the file or until a new TESTCASE line overrides them.

If the TESTCASE line is missing or some parameters are not specified, defaults are used.
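A TESTCASE line can be turned into a full set of settings by starting from the defaults and overriding whatever the line mentions. In this Python sketch the setting names (direction, order, completeness, weighting) and the function parse_header are hypothetical. It accepts the keyword with or without a trailing colon, because both forms appear on this page, and it assumes that each new TESTCASE line starts again from the defaults rather than inheriting the previous line's settings.

```python
DEFAULTS = {
    "direction": "analyse",
    "order": "unordered",
    "completeness": "complete",
    "weighting": "unweighted",
}

# Each keyword sets exactly one of the four (hypothetical) setting names.
KEYWORDS = {
    "analyse": ("direction", "analyse"),
    "generate": ("direction", "generate"),
    "unordered": ("order", "unordered"),
    "ordered": ("order", "ordered"),
    "complete": ("completeness", "complete"),
    "incomplete": ("completeness", "incomplete"),
    "unweighted": ("weighting", "unweighted"),
    "weighted": ("weighting", "weighted"),
}


def parse_header(line):
    """Return the settings selected by a TESTCASE line, filling in defaults."""
    words = line.split()
    if not words or words[0].rstrip(":") != "TESTCASE":
        raise ValueError(f"not a TESTCASE line: {line!r}")
    settings = dict(DEFAULTS)  # unspecified parameters keep their defaults
    for word in words[1:]:
        key, value = KEYWORDS[word]  # raises KeyError on an unknown keyword
        settings[key] = value
    return settings
```

For example, parse_header("TESTCASE analyse ordered weighted") keeps completeness at its default, "complete".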

analyse (default) | generate

If the analyse parameter is given, the transducer should analyse the input string. If the generate parameter is given, the transducer should use the input string to generate output. The output strings are interpreted correspondingly.

unordered (default) | ordered

The ordered parameter specifies that the output strings must be produced in a given order. This is typically needed for weighted transducers, where the output strings should be generated in weight order.

The order is given by an integer index in front of each output symbol string.

Example:

TESTCASE analyse ordered

INPUT
    a a 
OUTPUT
   1 b b
   2 b b b b
   3 b b b b b b
!COMPLETE

There may be cases where the order is only partially defined, e.g. when more than one output string has the same weight. Such output strings can be assigned the same index. This means that strings with the same index can be generated in any order relative to each other, but all of them must be generated.

TESTCASE analyse ordered

INPUT
    a a 
OUTPUT
   1 b b
   2 b b b b
   2 b b a a
   3 b b b b b b
!COMPLETE
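Output strings with the same index form a group whose members may appear in any order, while the groups themselves must appear in ascending index order. The following Python sketch checks a produced output list against such an ordered, complete test case; parse_ordered and in_expected_order are hypothetical names.

```python
from collections import Counter
from itertools import groupby


def parse_ordered(line):
    """Split an ordered output line like "2 b b a a" into (2, ("b", "b", "a", "a"))."""
    index, *symbols = line.split()
    return int(index), tuple(symbols)


def in_expected_order(produced, expected_lines):
    """Check produced symbol lists against index-ordered expectations."""
    expected = sorted((parse_ordered(line) for line in expected_lines),
                      key=lambda entry: entry[0])
    pos = 0
    for _, group in groupby(expected, key=lambda entry: entry[0]):
        block = [symbols for _, symbols in group]
        # Members of one index group may come out in any order, so compare the
        # next len(block) produced strings with the group as a multiset.
        chunk = [tuple(p) for p in produced[pos:pos + len(block)]]
        if Counter(chunk) != Counter(block):
            return False
        pos += len(block)
    return pos == len(produced)  # complete case: nothing extra may follow


expected = ["1 b b", "2 b b b b", "2 b b a a", "3 b b b b b b"]
ok = [["b", "b"], ["b", "b", "a", "a"], ["b", "b", "b", "b"], ["b"] * 6]
print(in_expected_order(ok, expected))               # True: index-2 tie swapped
print(in_expected_order(ok[1:] + ok[:1], expected))  # False: "b b" not first
```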

complete (default) | incomplete

The complete parameter means that for the given input string, the transducer produces exactly the listed output strings, no more and no less. If the transducer produces more output strings than listed, or fewer, the test case fails. In addition, the iterator must report at the end that no outputs remain (i.e. return false). The output list can be empty, which means that the transducer does not accept the given input string at all.
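The complete semantics can be sketched with an iterator that offers a hasMore-style check, as mentioned in the text. The Python class ListIterator below is a hypothetical stand-in for a transducer's output iterator (its has_more and next_output methods are assumptions, not the HFST API), and check_complete is likewise hypothetical. Reading stops as soon as more outputs arrive than are listed, so the check also terminates on a transducer with infinitely many outputs.

```python
from collections import Counter


class ListIterator:
    """Hypothetical output iterator backed by a fixed list of outputs."""

    def __init__(self, outputs):
        self._outputs = list(outputs)
        self._pos = 0

    def has_more(self):
        return self._pos < len(self._outputs)

    def next_output(self):
        value = self._outputs[self._pos]
        self._pos += 1
        return value


def check_complete(it, expected):
    """Pass only if the iterator yields exactly the expected multiset, then ends."""
    produced = []
    while it.has_more():
        if len(produced) == len(expected):
            return False  # more outputs than listed: the test case fails
        produced.append(tuple(it.next_output()))
    # has_more() returned False, so the iterator is exhausted.
    return Counter(produced) == Counter(map(tuple, expected))
```

An empty expected list passes only if the iterator reports no outputs at all, which matches a transducer that rejects the input.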

With the incomplete parameter, we can test transducers for which the given input string has an unbounded set of output strings. This can happen when the transducer contains a cycle whose transitions have epsilon on the input side and some symbol on the output side.

An example of this kind of transducer:


input   output
a       b
        b b
        b b b
        b b b b
        ...

Example test file:

TESTCASE: analyse incomplete unordered

INPUT
    u n a c c e p t a b l e

OUTPUT
!COMPLETE

INPUT
   a
   
OUTPUT
   b
   b b
   b b b
   b b b b
!INCOMPLETE

NOTE: For an incomplete test case to pass, the iterator's hasMore function must still return true after the last listed output string.
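An incomplete check cannot drain the iterator, because the output set may be infinite. This Python sketch reads outputs until every listed string has been seen, counting duplicates, and then requires that more outputs remain, as the note above states. CyclicIterator imitates the transducer shown earlier that maps a to b, b b, b b b, and so on; it, check_incomplete and the max_steps bound are hypothetical, and the sketch assumes that extra, unlisted outputs are allowed in an incomplete test case.

```python
from collections import Counter


class CyclicIterator:
    """Hypothetical iterator over the outputs b, b b, b b b, ... for input a."""

    def __init__(self):
        self._n = 0

    def has_more(self):
        return True  # the cycle makes the output set infinite

    def next_output(self):
        self._n += 1
        return ["b"] * self._n


def check_incomplete(it, expected, max_steps=1000):
    """Pass if every listed output appears and the iterator still has more."""
    missing = Counter(map(tuple, expected))
    steps = 0
    while sum(missing.values()) > 0:  # some listed output not seen yet
        if steps == max_steps or not it.has_more():
            return False
        out = tuple(it.next_output())
        if missing[out] > 0:
            missing[out] -= 1  # extra, unlisted outputs are simply skipped
        steps += 1
    return it.has_more()


listed = [["b"], ["b", "b"], ["b", "b", "b"], ["b", "b", "b", "b"]]
print(check_incomplete(CyclicIterator(), listed))  # True
```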

unweighted (default) | weighted

The weighted parameter indicates that the output strings are weighted, so for each output string the transducer must also produce an identical total weight. The weight of each output string is given as a floating-point number in front of its symbol string.

When a test case is both ordered and weighted, the order index is given before the weight.

TESTCASE analyse ordered weighted

INPUT
    a a 
OUTPUT
   1 0.90 b b
   2 0.04 b b b b
   2 0.04 b b a a
   3 0.02 b b b b b b
!COMPLETE
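Parsing a weighted output line follows the rule above: first the order index if the test case is ordered, then the weight, then the symbols. In this Python sketch, parse_weighted and weight_matches are hypothetical names, and comparing weights with a small absolute tolerance instead of exact equality is an assumption made because the weights are floating-point numbers.

```python
import math


def parse_weighted(line, ordered):
    """Split an output line into (index or None, weight, symbols)."""
    fields = line.split()
    index = None
    if ordered:
        index = int(fields.pop(0))  # the order index comes first
    weight = float(fields.pop(0))   # then the weight
    return index, weight, tuple(fields)


def weight_matches(expected, produced, tol=1e-9):
    """Compare two weights, allowing for floating-point rounding (assumed tolerance)."""
    return math.isclose(expected, produced, rel_tol=0.0, abs_tol=tol)


print(parse_weighted("2 0.04 b b a a", ordered=True))  # (2, 0.04, ('b', 'b', 'a', 'a'))
```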


-- PetriUusitalo - 2008-12-12

Topic revision: r4 - 2009-03-27 - PetriUusitalo
 