HFST: HfstTransducer Header Format

The header structure

An HFST version >3.0 transducer in binary format consists of an HFST header followed by the transducer in the format of the backend implementation. The HFST header has the following structure:

  • the first four bytes identify an HFST header: "HFST"
  • the fifth byte is a separator: "\0"
  • the sixth and seventh bytes give the length in bytes of the rest of the header (the part beginning after the eighth byte), most significant byte first
  • the eighth byte is a separator and is not counted in the header length: "\0"
  • the rest of the header consists of pairs of attributes and their values that are each separated by a "\0"
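
The layout above can be sketched as a small reader. This is only an illustration, not part of the HFST API: the function name parse_hfst_header is hypothetical, and the sketch assumes that the length field is read most significant byte first, that every attribute and every value is terminated by "\0", and that both are UTF-8 text.

```python
import struct


def parse_hfst_header(data: bytes) -> list[tuple[str, str]]:
    """Return the (attribute, value) pairs of the HFST header at the start of data."""
    if len(data) < 8:
        raise ValueError("too short to hold an HFST header")
    if data[0:4] != b"HFST":
        raise ValueError("missing 'HFST' identifier")
    if data[4] != 0 or data[7] != 0:
        raise ValueError("missing '\\0' separator")
    # Assumption: the two length bytes are read most significant byte first.
    (length,) = struct.unpack(">H", data[5:7])
    body = data[8:8 + length]
    if len(body) != length:
        raise ValueError("header is shorter than its length field claims")
    if not body:
        return []
    if body[-1] != 0:
        raise ValueError("header body must end with '\\0'")
    # Dropping the final terminator leaves attribute, value, attribute, value, ...
    fields = body[:-1].split(b"\x00")
    if len(fields) % 2 != 0:
        raise ValueError("attribute without a value")
    # Assumption: attribute names and values are UTF-8 text.
    texts = [field.decode("utf-8") for field in fields]
    return list(zip(texts[0::2], texts[1::2]))
```

Everything in data after the first 8 + length bytes belongs to the transducer of the backend implementation and is left untouched by this sketch.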

An HFST version 3.0 header must contain at least the attributes 'version', 'type' and 'name' (in that order) and their values. Currently the accepted values are:

attribute   accepted values
version     3.0
type        SFST_TYPE, FOMA_TYPE, TROPICAL_OPENFST_TYPE, LOG_OFST_TYPE, HFST_OL_TYPE, HFST_OLW_TYPE
name        any string, including the empty one

An HFST header can contain more attributes after these obligatory ones, but they are ignored by HfstInputStream functions unless explicitly handled in the backend implementation.
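
A reader that follows these rules could separate the obligatory attributes from the optional ones roughly as follows. This is a sketch under assumptions: the name split_attributes is hypothetical, the header is taken to be already decoded into a list of (attribute, value) pairs, and only the presence and order of the obligatory attributes are checked, not their values.

```python
REQUIRED = ("version", "type", "name")


def split_attributes(pairs):
    """Split header pairs into the obligatory attributes and any extra ones.

    Raises ValueError unless the first three attributes are
    'version', 'type' and 'name', in that order.
    """
    names = tuple(name for name, _ in pairs[:3])
    if names != REQUIRED:
        raise ValueError(f"expected attributes {REQUIRED}, found {names}")
    required = dict(pairs[:3])
    # Extra attributes are kept, so a backend may handle them or ignore them.
    extras = list(pairs[3:])
    return required, extras
```

The extras are returned rather than discarded so that a backend implementation can decide whether to handle them.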

An example:

"HFST\0"
"\0\x1c\0"
"version\0"  "3.0\0"
"type\0"     "FOMA\0"
"name\0"     "\0"

This is the header of a version 3.0 HFST transducer whose implementation type is FOMA_TYPE (written in the header as "FOMA") and whose name is not defined, i.e. is the empty string "". The two bytes "\0\x1c" that form the length field indicate that the rest of the header (i.e. the sequence of bytes "version\03.0\0type\0FOMA\0name\0\0") is 0 * 256 + 28 * 1 = 28 bytes long.
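
The length arithmetic can be checked by assembling the same header byte by byte. This is a minimal sketch: the function name build_hfst_header is hypothetical and not part of HFST, and UTF-8 encoding of the attribute strings is an assumption.

```python
def build_hfst_header(pairs):
    """Assemble an HFST header from (attribute, value) string pairs."""
    # Each attribute and each value is followed by a "\0" separator.
    body = b"".join(
        name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
        for name, value in pairs
    )
    if len(body) > 0xFFFF:
        raise ValueError("header body does not fit in a two-byte length field")
    # Two length bytes, most significant first, then the uncounted separator.
    return b"HFST\x00" + len(body).to_bytes(2, "big") + b"\x00" + body


header = build_hfst_header([("version", "3.0"), ("type", "FOMA"), ("name", "")])
high, low = header[5], header[6]
print(high * 256 + low)  # length of the part after the eighth byte
```

For the example header the printed value is 28: 12 bytes for "version\0" and "3.0\0", 10 bytes for "type\0" and "FOMA\0", and 6 bytes for "name\0" and the empty value's "\0".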


-- ErikAxelson - 2011-02-03