HFST: File Extension Guidelines

A uniform naming policy is needed for the files used at different stages of transducer development. This page contains some suggestions; feedback and modifications are welcome.

I have also inserted rudimentary icons for some of the file extensions. Icons may be needed in the future for graphical editors or user interfaces. The icons are 16x16 PNG files.

Runtime Binary Format

File extension : .hfst

Output from: write_test

Input to: runtime transducer (HfstOptimizedLookupFormat)

Examples: see HfstBinaryFormatUnweightedExamples

Icon proposals:

HFST Lexicon File

File extension proposals : .hlex ( .hlx | .hlexc | .hlc )

Input to: hfst-lexc (HfstLexc)

Examples: see HfstLexcAndTwolcTutorial

Icon proposals:

HFST Two-Level Grammar File

File extension proposals : .htl (.htw | .htwl | .htwol | .twol | .twolc)

Input to: hfst-twolc (HfstTwolC)

Examples: see HfstLexcAndTwolcTutorial

Icon proposals:

SFST Grammar file

File extension : .sfst

Input to: ?

Examples: ?

Icon proposals: none

HFST Intermediate Transducer File

In the tutorial this file has the extension .hfst, but if that extension is used for the runtime binary format, a different one is needed here.

File extension proposals: .htr (from Helsinki transducer)

Output from: hfst-lexc | hfst-twolc | hfst-compose-intersect |

Input to: hfst-fst2pairstrings | hfst-fst2txt

Examples: unavailable

Icon proposal: none
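
The proposals above can be summarised as a lookup table from extension to file type. The following is only a minimal sketch in Python, not part of HFST: the names PROPOSED_EXTENSIONS and classify are hypothetical, and only the first extension proposed in each section is listed, so the alternatives in parentheses are left out. It follows the proposal on this page, where .hfst means the runtime binary format and .htr the intermediate transducer, rather than the tutorial's use of .hfst.

```python
from pathlib import Path

# First-listed extension proposed on this page for each file type.
# Hypothetical table for illustration; nothing here is an HFST API.
PROPOSED_EXTENSIONS = {
    ".hfst": "runtime binary format",
    ".hlex": "HFST lexicon file",
    ".htl": "HFST two-level grammar file",
    ".sfst": "SFST grammar file",
    ".htr": "HFST intermediate transducer file",
}


def classify(filename):
    """Return the proposed file type for filename, or None if the extension is not listed."""
    # Compare case-insensitively so that RULES.HTL and rules.htl are treated alike.
    return PROPOSED_EXTENSIONS.get(Path(filename).suffix.lower())
```

For example, classify("finnish.hlex") looks up the suffix .hlex and returns the lexicon entry, while a file with an unlisted suffix such as notes.txt yields None. Keeping the mapping in one table makes it easy to swap in whichever extensions are finally agreed on.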


  • Intermediate formats are mostly unchanged from the underlying libraries; the suffixes .openfst and .sfst should be sufficient for these.
  • The different text formats should be marked, for example .sfsttext for the legacy SFST text format and .openfsttext for the format that originated in AT&T systems.
  • Concatenations of binaries that are handled as archives should be recognisable by a suffix such as .hfstarchive.
  • There should be no need to conform to the legacy 8.3 naming scheme; HFST is unlikely to run on any system old enough to have that restriction.
  • A media type should be considered for systems that do not use filename-based file typing.
  • ...

-- PetriUusitalo - 2008-11-25

