hfst-compare

Purpose

Compare two transducers for equivalence. Two transducers are equivalent if they map the same input strings to the same output strings with the same alignments and weights.

Usage

The help message:

Usage: hfst-compare [OPTIONS...] [INFILE1 [INFILE2]]
Compare two transducers

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Input/Output options:
  -1, --input1=INFILE1   Read first input transducer from INFILE1
  -2, --input2=INFILE2   Read second input transducer from INFILE2
  -C, --do-not-convert   Do not allow transducers to be converted into the same type
  -o, --output=OUTFILE   Write results to OUTFILE
Harmonization:
  -H, --do-not-harmonize Do not harmonize symbols.
  -e, --eliminate-flags  Eliminate flag diacritics.

If OUTFILE, or either INFILE1 or INFILE2 is missing or -,
standard streams will be used.
INFILE1, INFILE2, or both, must be specified.
Format of result depends on format of INFILE1 and INFILE2;
both should have the same format.

The operation is applied pairwise for INFILE1 and INFILE2
that must have the same number of transducers.
If INFILE2 has only one transducer, the operation is applied for
each transducer in INFILE1 keeping the second transducer constant.


Examples:
  $ hfst-compare cat.hfst dog.hfst
  cat.hfst[1] != dog.hfst[1]
  $ hfst-compare cat.hfst cat.hfst
  cat.hfst[1] == cat.hfst[1]

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
<https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>
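The pairing rule stated in the help message (pairwise comparison, or one constant second transducer) can be modelled with a small shell function. This is only a sketch of the rule as described: `pairs` is a hypothetical helper, not part of HFST, and the error on mismatched counts is an assumption, since the help message does not say how the tool reacts in that case.

```shell
# Hypothetical helper: list which transducer pairs would be compared,
# given the number of transducers in INFILE1 (n1) and INFILE2 (n2).
pairs() {
    n1=$1
    n2=$2
    # Assumption: counts must match unless INFILE2 holds exactly one transducer.
    if [ "$n2" -ne 1 ] && [ "$n1" -ne "$n2" ]; then
        echo "error: transducer counts differ ($n1 vs $n2)" >&2
        return 1
    fi
    i=1
    while [ "$i" -le "$n1" ]; do
        # With a single second transducer, it is reused for every comparison.
        if [ "$n2" -eq 1 ]; then j=1; else j=$i; fi
        echo "$i $j"
        i=$((i + 1))
    done
}

pairs 3 3   # pairwise: 1 1, 2 2, 3 3
pairs 3 1   # constant second transducer: 1 1, 2 1, 3 1
```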

Output

The tool reads one transducer at a time from each stream, compares the two for equivalence, and prints one message per pair to the output. The message consists of:

  • the name of the file from which the first stream is read
  • the number of the transducer read from the first stream, between angle brackets
  • between spaces, the operator == if the transducers are equivalent, or != if they are not
  • the name of the file from which the second stream is read
  • the number of the transducer read from the second stream, between angle brackets.

If either stream is read from standard input, <stdin> is given as the filename. For the first transducer read from a stream, no number is printed.
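Because each result line contains either == or != between spaces, a script can reduce the output to a single exit status. The following is a minimal sketch; `all_equivalent` is a hypothetical helper, and it assumes that file and transducer names do not themselves contain " == " or " != ".

```shell
# Hypothetical helper: read hfst-compare result lines from standard input
# and succeed only if every line reports "==".
all_equivalent() {
    status=0
    while IFS= read -r line; do
        case $line in
            *" != "*) echo "differs: $line" >&2; status=1 ;;
            *" == "*) ;;
            *) echo "unrecognized: $line" >&2; status=1 ;;
        esac
    done
    return $status
}

# Sample line taken from the help message above.
printf '%s\n' 'cat.hfst[1] == cat.hfst[1]' | all_equivalent && echo "all equivalent"
```

With the tools installed, the same function could be placed after the real command, for example `hfst-compare a.hfst b.hfst | all_equivalent`, where `a.hfst` and `b.hfst` are placeholder filenames.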

Examples

We first create three transducers that all map the string "cat" to "chat". The first and second transducers place the epsilon in the second transition, and the third places it in the fourth transition. The first and third transducers perform the mapping with weight 0.5, and the second with weight 0.3.

$ echo -e "c:c 0:h a:a t:t\t0.5" | hfst-strings2fst --has-spaces --pairstrings -e "0" -f openfst-tropical > cat2chat1.hfst
$ echo -e "c:c 0:h a:a t:t\t0.3" | hfst-strings2fst --has-spaces --pairstrings -e "0" -f openfst-tropical > cat2chat2.hfst
$ echo -e "c:c a:h t:a 0:t\t0.5" | hfst-strings2fst --has-spaces --pairstrings -e "0" -f openfst-tropical > cat2chat3.hfst

When we compare the transducers, we get the following results:

$ hfst-compare cat2chat1.hfst cat2chat2.hfst
 hfst-strings2fst c:c 0:h a:a t:t != hfst-strings2fst c:c 0:h a:a t:t
$ hfst-compare cat2chat1.hfst cat2chat3.hfst
 hfst-strings2fst c:c 0:h a:a t:t != hfst-strings2fst c:c a:h t:a 0:t
$ hfst-compare cat2chat2.hfst cat2chat3.hfst
 hfst-strings2fst c:c 0:h a:a t:t != hfst-strings2fst c:c a:h t:a 0:t

No two of the transducers are equivalent: the first and second differ in weight, the first and third in alignment, and the second and third in both.

We next create two transducers that both map "cat" to "chat" with the same alignment and the same total weight 2.5, but distribute that weight differently over the transitions and the final state:

$ echo "0 1 c c 0.2
1 2 0 h 0.3
2 3 a a 1.0
3 4 t t 0.5
4 0.5" | hfst-txt2fst -e "0" -f openfst-tropical > cat2chat_4weights.hfst
$
$ echo "0 1 c c 0
1 2 0 h 0.5
2 3 a a 0
3 4 t t 0
4 2.0" | hfst-txt2fst -e "0" -f openfst-tropical > cat2chat_2weights.hfst

Now the transducers are equivalent:

$ hfst-compare cat2chat_4weights.hfst cat2chat_2weights.hfst
hfst-txt2fst <stdin> == hfst-txt2fst <stdin>
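The total weight of 2.5 can be checked by hand: in the tropical semiring the weight of a path is the sum of its transition weights plus the final weight. A minimal sketch of that arithmetic with awk follows; `path_weight` is a hypothetical helper that only works for single-path definitions like the two above, where every line ends in an explicit weight.

```shell
# Hypothetical helper: sum the last field of each line, i.e. the
# transition weights plus the final-state weight of a single path.
path_weight() {
    awk '{ sum += $NF } END { printf "%.1f\n", sum }'
}

echo "0 1 c c 0.2
1 2 0 h 0.3
2 3 a a 1.0
3 4 t t 0.5
4 0.5" | path_weight   # 0.2 + 0.3 + 1.0 + 0.5 + 0.5

echo "0 1 c c 0
1 2 0 h 0.5
2 3 a a 0
3 4 t t 0
4 2.0" | path_weight   # 0 + 0.5 + 0 + 0 + 2.0
```

Both invocations print 2.5, which is why the two transducers compare as equivalent even though the weights sit on different transitions.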

Shortcomings

The tool can produce false negatives with weighted transducers due to floating-point precision issues: transducers whose weights should be equal may be reported as not equivalent.
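The general effect can be illustrated outside the tool. The sketch below shows exact floating-point comparison failing in awk, together with a hypothetical `weights_close` helper that compares two numbers within a tolerance. It illustrates the arithmetic only; it does not describe how hfst-compare compares weights internally, and the default tolerance is an arbitrary assumption.

```shell
# Exact comparison of floating-point sums can fail even when the
# values are mathematically equal.
awk 'BEGIN { if (0.1 + 0.2 == 0.3) print "equal"; else print "not equal" }'

# Hypothetical helper: succeed if two weights differ by less than eps
# (third argument, default 0.000001).
weights_close() {
    awk -v a="$1" -v b="$2" -v eps="${3:-0.000001}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        exit !(d < eps)
    }'
}

weights_close 2.5 2.5000001 && echo "close enough"
```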

More information

-- ErikAxelson - 10 Jul 2008