Using HFST as spell checker

This page shows how to set up an HFST-based spell-checking system. The first component is hfst-ospell (download link). Install it as follows (the dollar sign marks a non-root Unix command prompt and should not be typed; replace 0.2.4 with the latest available version):

$ tar zxvf hfst-ospell-0.2.4.tar.gz
$ cd hfst-ospell-0.2.4
$ ./configure
$ make
$ sudo make install

(Replace sudo with something appropriate to perform root or admin installation in your system.)

You'll need at least libxml++ 2.6 and libarchive, which on Debian-based systems should be installable as "libxml++2.6-dev" and "libarchive-dev" respectively. For details about the current version, including requirements, see HfstOspellReadme.

Next, install libvoikko (download link), remembering to configure it with the experimental HFST backend (replace 3.5 with the latest version number):

$ tar zxvf libvoikko-3.5.tar.gz
$ cd libvoikko-3.5
$ ./configure --enable-hfst
$ make
$ sudo make install

Note: You may have to make sure all the new libraries are loadable (e.g. by running `ldconfig` as root on Debian). If they are not, `voikkospell` fails with a shared library error:

$ voikkospell -l
voikkospell: error while loading shared libraries: cannot open shared object file: No such file or directory
$ sudo ldconfig

Now we can add the spellers to the right directories. For example, we can use the Finnish speller based on omorfi, downloadable from the hfst repo:

$ mkdir -p ~/.voikko/3/
$ cp speller-fi.zhfst ~/.voikko/3/
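
The copy step above can be wrapped in a small shell function that refuses to install a file that does not exist. This is only a sketch: `install_speller` is a hypothetical helper name, not part of voikko or HFST, and the destination defaults to the `~/.voikko/3/` directory used on this page.

```shell
# Hypothetical helper: copy a .zhfst speller archive into the per-user
# voikko dictionary directory used on this page (~/.voikko/3 by default).
install_speller() {
    src=$1
    dest=${2:-"$HOME/.voikko/3"}
    if [ ! -f "$src" ]; then
        echo "install_speller: no such file: $src" >&2
        return 1
    fi
    # Create the target directory if needed, then copy the archive into it.
    mkdir -p "$dest" && cp "$src" "$dest/"
}
```

With this defined, `install_speller speller-fi.zhfst` has the same effect as the two commands above.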

Now you should be able to test the spell-checker using `voikkospell` (where a line says [CTRL-D], hold the Control key and press D; do not type [CTRL-D]):

$ voikkospell -l
fi-x-standard: Suomen kielen oikaisuluin (omorfi 20120401)
$ voikkospell -d fi-x-standard
C: talo
W: taloq

$ voikkospell -d fi-x-standard -s
C: talo
W: taloq
S: talo
S: taloa
S: talot
S: talon
S: Jalon

Now everything works.
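
For scripted use it can be handy to keep only the rejected words. The following sketch assumes the output format shown above (lines prefixed with `C:`, `W:` and `S:`) and that `voikkospell` reads one word per line from standard input; `misspelled_words` is a hypothetical helper name.

```shell
# Hypothetical filter: read voikkospell output on stdin and print only
# the words it marked as wrong (lines starting with "W: ").
misspelled_words() {
    sed -n 's/^W: //p'
}

# Example, assuming the fi-x-standard speller installed above:
#   printf 'talo\ntaloq\n' | voikkospell -d fi-x-standard | misspelled_words
```

Because the filter only looks at the line prefix, it should work with or without the `-s` option, since suggestion lines start with `S:` and are dropped.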

You can continue by installing enchant (your distribution should already have it, possibly even with voikko support), which provides this spelling functionality to most open source software, such as everything that uses the GtkSpell, GnomeSpell or SexySpell widgets. For this to work, your LANG environment variable must be set to the language you are correcting, unless the software has its own dictionary selection widget. Large applications such as Firefox (spell checker extension) or LibreOffice (spell checker extension; 64-bit version only, open the extension file with LibreOffice) need their own specific extensions, which you have to fetch separately.

To build such a spell-checking dictionary for your language, please refer to my tutorial on FST spell-checker building at FSMNLP 2012 in Donostia, or study the finite-state morphology and language resource repository by Divvun/UiT.

Topic revision: r8 - 2013-08-09 - TommiPirinen