HFST: Transducer Properties

Notes on how properties of an OpenFst transducer could be calculated in SFST. (# means 'number'. ) An 'x' in the column W indicates the algorithm for calculating the property needs some work.

A SFST Transducer has only two property variables, i.e. deterministic and minimised. Most properties that an OpenFst transducer has can be calculated for a SFST transducer. To make use of transducer properties would require rewriting the SFST operations so they could (1) make optimizations depending on the properties of their argument transducers and (2) easily conclude the properties of the resulting transducer.

By numbering the nodes in a transducer and going through them all and their arcs (algorithm1), the following properties can be calculated:

  • # of states (NodeNumbering)
  • # of arcs (ArcsIter)
  • # of final states (is_final())
  • # of input/output epsilons
  • # of input epsilons
  • # of output epsilons
  • acceptor

At the same time, some preparations can be made for next algorithm that finds what states are accessible or coaccessible: For all transitions from state s1 to s2, a marker transition from s2 to s1 is added. Transducer can thus be traversed in reverse order from final states to initial state.

An algorithm that goes through the entire transducer from initial state to final states and vice versa (algorithm2) can calculate the following properties:

  • # of accessible states (actually the set of accessible states)
  • # of coaccessible states (actually the set of coaccessible states)
  • # of connected states (the size of intersection of sets of accessible and coaccessible states )
  • cyclic (is_cyclic() will do this, too)
  • cyclic at initial state

W property explanation SFST implementation
fst type In HWFST, equal to vector not needed
arc type In HWFST, equal to standard not needed
input symbol table - alphabet contains input and output symbols
output symbol table -
# of states - algorithm1
# of arcs - algorithm1
initial state - not needed, stored first
# of final states - algorithm1
# of input/output epsilons Transitions whose input and output labels are epsilons. algorithm1
# of input epsilons Transitions whose input label is epsilon. algorithm1
# of output epsilons Transitions whose output label is epsilon. algorithm1
x # of accessible states A state is accessible iff there is a path from initial state to that state. algorithm2
x # of coaccessible states A state is coaccessible iff there is a path from that state to a final state. algorithm2
x # of connected states A state is connected iff it is accessible and coaccessible. algorithm2
X # of strongly connected components http://en.wikipedia.org/wiki/Strongly_connected_component -
expanded - not needed
mutable In HWFST, equals true not needed
acceptor Every transition has equal input and output labels. algorithm1
input deterministic No two transitions leaving a state have the same input label. algorithm1
output deterministic No two transitions leaving a state have the same output label. algorithm1
input/output epsilons   whether # of input/output epsilons is greater than zero
input epsilons - whether # of input epsilons is greater than zero
output epsilons - whether # of output epsilons is greater than zero
input label sorted If transitions are sorted according to their input labels.  
output label sorted If transitions are sorted according to their output labels.  
weighted - always false
cyclic - algorithm2 or is_cyclic()
cyclic at initial state - algorithm2
X top[ologically] sorted http://en.wikipedia.org/wiki/Topological_sorting -
accessible - whether # of accessible states equals # of states
x coaccessible - whether # of coaccessible states equals # of states

SFST transducer properties originally include only deterministic and minimised, but more properties can easily be added to transducer class. acyclic is one property that could be useful and easily updated by some functions, so it is added to HFST unweighted transducer properties.

Below are listed the bool properties that are written to a header (two bytes) when a transducer is written to a stream or to a file. Most values are always false because they are not (presently) included as flags in transducer class and calculating them every time a transducer is written to a stream would be too slow. bit means the bit number (the leftmost bit is number zero) in the two-byte header (0 indicates a false value and 1 a true value, the four extra bits from 12 to 15 are always zero).

bit property default value
0 acceptor false
1 input deterministic false
2 output deterministic false
3 input/output epsilons false
4 input epsilons false
5 output epsilons false
6 acyclic  
7 cyclic at initial state false
8 accessible false
9 coaccessible false
10 deterministic  
11 minimised  


-- ErikAxelson - 2008-12-11
Topic revision: r10 - 2009-01-12 - ErikAxelson
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback