HFST: File Extension Guidelines
A uniform naming policy is needed for the files used at the different stages of transducer development. This page contains some suggestions; feedback and modifications are welcome.
I have also inserted some rudimentary icons for some of the file extensions. Icons may be needed in the future for graphical editors or user interfaces. The icons are 16x16 PNG files.
Runtime Binary Format
File extension: .hfst
Output from: write_test
Input to: runtime transducer (HfstOptimizedLookupFormat)
Examples: see HfstBinaryFormatUnweightedExamples
Icon proposals:
HFST Lexicon File
File extension proposals: .hlex (.hlx | .hlexc | .hlc)
Input to: hfst-lexc (HfstLexc)
Examples: see HfstLexcAndTwolcTutorial
Icon proposals:
HFST Two-Level Grammar File
File extension proposals: .htl (.htw | .htwl | .htwol | .twol | .twolc)
Input to: hfst-twolc (HfstTwolC)
Examples: see HfstLexcAndTwolcTutorial
Icon proposals:
SFST Grammar File
File extension: .sfst
Input to: ?
Examples: ?
Icon proposals: none
HFST Intermediate Transducer File
In the tutorial this file is named .hfst, but if we use that extension for the binary format, some other extension should be used here.
File extension proposals: .htr (from Helsinki transducer)
Output from: hfst-lexc | hfst-twolc | hfst-compose-intersect |
Input to: hfst-fst2pairstrings | hfst-fst2txt
Examples: unavailable
Icon proposal: none
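To make the proposals above concrete, a graphical editor or build script could map extensions to file kinds roughly as follows. This is only a sketch: it assumes the first-choice extension from each section is adopted, which has not been decided, and the names PROPOSED_EXTENSIONS and classify are made up for this illustration.

```python
from pathlib import Path

# First-choice extensions proposed on this page; none of them is final.
PROPOSED_EXTENSIONS = {
    ".hfst": "runtime binary format",
    ".hlex": "HFST lexicon file",
    ".htl": "HFST two-level grammar file",
    ".sfst": "SFST grammar file",
    ".htr": "HFST intermediate transducer file",
}


def classify(filename):
    """Return the proposed file kind for filename, or None if unknown."""
    # Path.suffix yields the last extension including the dot, e.g. ".hlex".
    suffix = Path(filename).suffix.lower()
    return PROPOSED_EXTENSIONS.get(suffix)
```

With this table, classify("finnish.hlex") gives "HFST lexicon file", and a name without a known extension gives None, so a tool can fall back to asking the user.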
Comments
- Intermediate formats are mostly unchanged from the underlying libraries; the suffixes .openfst and .sfst should be sufficient for these.
- Different text formats should be marked, for example .sfsttext for the legacy SFST text format and .openfsttext for the format originating from AT&T systems.
- Concatenations of binaries handled as archives should be recognisable by an extension of their own, such as .hfstarchive.
- There should be no need to conform to the legacy 8.3 naming scheme; it is rather unlikely that HFST will work on any system old enough to have that restriction.
- A media type should be considered for systems that do not use filename-based file typing.
- ...
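As an illustration of the media-type comment, the sketch below registers a media type for the runtime binary extension with Python's standard mimetypes module. The type name application/x-hfst is purely hypothetical; no media type has been chosen or registered for HFST, and media_type_for is a helper invented for this example.

```python
import mimetypes

# Hypothetical media type; nothing of the kind is registered for HFST.
HFST_BINARY_TYPE = "application/x-hfst"

# Teach the process-wide mimetypes table about the .hfst extension.
mimetypes.add_type(HFST_BINARY_TYPE, ".hfst")


def media_type_for(filename):
    """Return the media type guessed from the file name, or None."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type
```

Systems that type files by content rather than by name would need a different mechanism, such as a magic number in the file header; that is outside this sketch.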
--
PetriUusitalo - 2008-11-25