HFST: How to compile HFST for Windows

Tools needed

See here for instructions on how to install both Windows SDK 7.1 and Visual C++ 2010 Express:

If you have Visual C++ 2010 Express SP1 installed already, but not the Windows SDK 7.1:

  • 1. Uninstall the Visual C++ 2010 Redistributable packages, both x64 and x86 versions. This can be done from the Control Panel > Uninstall Programs menu.
  • 2. Install the SDK 7.1. During installation, make sure to uncheck 'Visual C++ Compilers' and 'Microsoft Visual C++ 2010' in the installation options.
  • 3. Apply the patch from Microsoft below to the SDK 7.1 installation.
  • 4. Reinstall the Visual C++ 2010 Redistributable packages, both x86 and x64 versions.

If you get the warning "This program might not have installed correctly!" in step 4 of these instructions, allow "repair with recommended settings for this computer" to run.

See here for instructions on how to target 64-bit binaries:

  • Download and install the Windows Software Development Kit version 7.1. Visual C++ 2010 Express does not include a 64-bit compiler, but the SDK does. A link to the SDK: http://msdn.microsoft.com/en-us/windowsserver/bb980924.aspx
  • Change your project configuration. Go to the Properties of your project. At the top of the dialog box there is a "Configuration" drop-down menu; make sure that it is set to "All Configurations." There is also a "Platform" drop-down that reads "Win32." Next, on the right there is a "Configuration Manager" button; press it. In the dialog that comes up, find your project, open the Platform drop-down, select New, then select x64. Now change the "Active solution platform" drop-down menu to "x64." When you return to the Properties dialog box, the "Platform" drop-down should now read "x64."
  • Finally, change your toolset. In the Properties menu of your project, under Configuration Properties | General, change Platform Toolset from "v100" to "Windows7.1SDK".

See here for how to enable a 64-Bit Visual C++ Toolset on the Command Line.

Bug: ammintrin.h is missing; get it from HfstAmmintrinHeader. The header files OAIdl.h and PropIdl.h contain variables named bool; rename them, e.g. to bool_.

A simple example of creating a dll and using it

File basic.h:

namespace basic_library 
{
  unsigned int __declspec(dllexport) __stdcall add(unsigned int i, unsigned int j);
  void __declspec(dllexport) __stdcall say_hello();
}

File basic.cpp:

#include <iostream>
namespace basic_library {
  unsigned int __declspec(dllexport) __stdcall add(unsigned int i, unsigned int j) 
  {
    return i + j;
  }
  void __declspec(dllexport) __stdcall say_hello()
  {
    std::cout << "Hello world!" << std::endl;
  }
}

(You can use #define macros for __declspec etc.)
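One way to do this is a single header whose export/import macro depends on a define passed only when building the DLL, so that basic.cpp exports the functions and program.cpp imports them. A minimal sketch under that assumption; BASIC_EXPORTS, BASIC_API and BASIC_CALL are hypothetical names not used elsewhere on this page, and on non-Windows compilers the macros expand to nothing. The header and the basic.cpp definitions are shown in one listing for brevity.

```cpp
// ---- basic.h (sketch) ----
#ifndef BASIC_H
#define BASIC_H

#if defined(_WIN32)
#  if defined(BASIC_EXPORTS)           // hypothetical: pass /DBASIC_EXPORTS only when building basic.dll
#    define BASIC_API __declspec(dllexport)
#  else                                // clients such as program.cpp import the symbols
#    define BASIC_API __declspec(dllimport)
#  endif
#  define BASIC_CALL __stdcall
#else                                  // other platforms: plain declarations
#  define BASIC_API
#  define BASIC_CALL
#endif

namespace basic_library
{
  BASIC_API unsigned int BASIC_CALL add(unsigned int i, unsigned int j);
  BASIC_API void BASIC_CALL say_hello();
}

#endif // BASIC_H

// ---- basic.cpp (sketch; the real file would start with #include "basic.h") ----
#include <iostream>

namespace basic_library
{
  unsigned int BASIC_CALL add(unsigned int i, unsigned int j)
  {
    return i + j;
  }

  void BASIC_CALL say_hello()
  {
    std::cout << "Hello world!" << std::endl;
  }
}
```

With such a header, basic.cpp would be built with cl /EHsc /LD /DBASIC_EXPORTS basic.cpp, while program.cpp includes basic.h unchanged and gets the dllimport declarations.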

Create the dll with command:

cl /EHsc /LD basic.cpp

The import library basic.lib is created automatically alongside basic.dll.

File program.cpp:

#include "basic.h"
#include <iostream>
int main()
{
  basic_library::say_hello();
  std::cout << "2 + 5 is " << basic_library::add(2,5) << std::endl;
  return 0;
}

Compile the program:

cl program.cpp /EHsc /link basic.lib

Run program.exe:

Hello world!
2 + 5 is 7

Example Python interface

File libbasic.i:

%module libbasic

%{
#include "basic.h"
%}

%include <windows.h>
namespace basic_library
{
  void say_hello();
  unsigned int add(unsigned int i, unsigned int j);
}

c:\test_sdk>C:\swigwin-3.0.5\swig.exe -python -c++ -I"C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include" -Wall -o libbasic_wrap.cpp libbasic.i
c:\test_sdk>copy c:\python33\libs\python33.lib .
c:\test_sdk>cl /EHsc /LD /IC:\Python33\include\ libbasic_wrap.cpp /link basic.lib

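Rename the libbasic_wrap files to _libbasic and change the extension dll to pyd, i.e. libbasic_wrap.dll becomes _libbasic.pyd (the libbasic.py module generated by SWIG imports the extension module as _libbasic).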

Compiling the back-ends

Openfst

Some modifications were needed (committed to svn):

properties.h:
  extern const char * PropertyNames[];  -->  prepend OPENFSTDLL

compat.h:
  #pragma etc  -->  prepend #ifdef PRAGMA condition
  #include <dlfcn.h>  --> prepend #ifdef DLFCN condition
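The OPENFSTDLL prefix matters more for data than for functions: Microsoft's documentation states that __declspec(dllimport) is optional for imported functions but required for accessing data symbols exported from a DLL, such as PropertyNames. A sketch of how the macro and the DLFCN guard could look; these definitions are assumptions, not the code committed to svn, and only OPENFSTEXPORT is taken from the build command below.

```cpp
// Assumed definition of an OPENFSTDLL-style macro. OPENFSTEXPORT matches the
// /DOPENFSTEXPORT flag used when building openfst.dll.
#if defined(_MSC_VER)
#  if defined(OPENFSTEXPORT)
#    define OPENFSTDLL __declspec(dllexport)   // building openfst.dll
#  else
#    define OPENFSTDLL __declspec(dllimport)   // clients: required for data symbols
#  endif
#else
#  define OPENFSTDLL                           // non-MSVC compilers: no decoration
#endif

// compat.h-style guard: dlfcn.h is a POSIX header that MSVC does not provide,
// so it is only included when DLFCN is defined.
#ifdef DLFCN
#  include <dlfcn.h>
#endif

namespace fst {
// properties.h: the array itself is defined in properties.cc inside the DLL.
OPENFSTDLL extern const char* PropertyNames[];
}
```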

A test file openfst_test.cpp:

#include "fst/fstlib.h"
#include <iostream>

int main()
{
  using namespace fst;
  StdVectorFst fsm;
  fsm.AddState();
  return 0;
}

Compiling openfst backend and testing it:

c:\test_sdk\openfstwin\src\lib>cl /EHsc /LD /DOPENFSTEXPORT /D_MSC_VER /Feopenfst.dll /I ..\include\ compat.cc flags.cc fst.cc properties.cc symbol-table-ops.cc symbol-table.cc util.cc
c:\test_sdk\openfstwin\src\lib>cl /I ..\include\ openfst_test.cpp /D_MSC_VER /EHsc /link openfst.lib

foma

Some modifications were made and committed to svn. They are mostly handled with #ifdef _MSC_VER conditions.

A test file test_foma.cpp:

#include <iostream>
#include <string>

extern "C" {
  char * fsm_get_library_version_string();
}

int main()
{
  std::cerr << fsm_get_library_version_string() << std::endl;
  return 0;
}

Compiling the foma backend and testing it. Note that you can exclude the C files foma.c, stack.c, iface.c and lex.interface.c. If native lexc compilation is not needed, lex.lexc.c and lexread.c can also be excluded.

You need to fetch the inttypes.h and stdint.h headers from here.

cl /EHsc /LD /D_MSC_VER /I . /Felibfoma.dll *.c
cl /I . test_foma.cpp /D_MSC_VER /EHsc /link libfoma.lib

There are some warnings that still need to be handled:

lex.cmatrix.c(1220) : warning C4003: not enough actual parameters for macro 'cmatrixwrap'
sigma.c(388) : warning C4113: 'int (__cdecl *)()' differs in parameter lists from 'int (__cdecl *)(const void *,const void *)'
spelling.c
spelling.c(37) : warning C4005: 'min' : macro redefinition
        c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdlib.h(855) : see previous definition of 'min'
structures.c(58) : warning C4113: 'int (__cdecl *)()' differs in parameter lists from 'int (__cdecl *)(const void *,const void *)'
structures.c(60) : warning C4113: 'int (__cdecl *)()' differs in parameter lists from 'int (__cdecl *)(const void *,const void *)'
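The C4113 and C4005 warnings have standard fixes. C4113 typically appears when a comparator passed to qsort is declared in C without a parameter list (for example int cmp();), which MSVC shows as 'int (__cdecl *)()'; C4005 appears when a source file defines min although a header already did. A sketch of both fixes, assuming an int comparator; sort_cmp and sort_ints are hypothetical names, not the foma functions at the reported lines.

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::qsort

// C4113 fix: give the comparator exactly the prototype qsort expects.
// (In C the same prototype is used, with plain casts instead of static_cast.)
static int sort_cmp(const void* a, const void* b)
{
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // -1, 0 or 1 without the overflow risk of x - y
}

void sort_ints(int* values, std::size_t count)
{
    std::qsort(values, count, sizeof *values, sort_cmp);
}

// C4005 fix: only define min if no earlier header has defined it.
#ifndef min
#define min(a, b) ((a) < (b) ? (a) : (b))
#endif
```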

Installations for a new computer

The Microsoft Windows SDK for Windows 7 and .NET Framework 4 requires the RTM version of the full, extended .NET Framework 4 Redistributable Components http://go.microsoft.com/fwlink/?LinkID=187668 (version 4.0 from http://www.microsoft.com/en-us/download/details.aspx?id=17851: dotNetFx40_Full_setup.exe)

Install the Microsoft Windows SDK for Windows 7 and .NET Framework 4 (the new computer runs Windows 7 on a 64-bit architecture): http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=8279 (winsdk_web.exe)

Installation location: C:\Program Files\Microsoft SDKs\Windows\v7.1

Installing everything

Tools needed for compilation

  • MinGW with mingw-developer-toolkit, mingw32-gcc-g++ and msys basic package
  • readline package?
  • Python is needed for the Python bindings and tagger tools
  • Swig is needed for creating the Python bindings
  • NSIS is needed for creating the installation package
  • Dependency Walker is a useful tool to track down dll dependencies

See http://msdn.microsoft.com/en-us/library/vstudio/9yb4317s%28v=vs.100%29.aspx for compiling 64-bit binaries (winsdk_web.exe).

Setup

32-bit MinGW

Fetch mingw-get-setup.exe here and run it. Choose the installation directory C:\hyapp\MinGW and install the packages mingw-developer-toolkit, mingw32-gcc-g++ and msys basic package. Copy the file fstab.sample to fstab in the directory C:\hyapp\MinGW\msys\1.0\etc\ and add the line

C:\hyapp\MinGW  /usr/local

so that utilities in C:\hyapp\MinGW are available to msys.

Comment out line 79

_TR1_hashtable_define_trivial_hash(unsigned long long);

in /usr/lib/gcc/i686-w64-mingw32/4.8.1/include/c++/tr1/functional_hash.h (or similar) to avoid the compilation error

template-id 'operator()<>' for 'std::size_t std::tr1::hash<long long unsigned int>::
  operator()(long long unsigned int) const' does not match any template declaration

Make a shortcut to C:\hyapp\MinGW\msys\1.0\msys.bat. Open the shortcut and fetch the wget tool with

mingw-get install msys-wget

64-bit MinGW

Fetch 64-bit mingw here and extract it to C:\hyapp\mingw64.

Comment out the same line as in 32-bit MinGW.

Get msys here.

Copy the file fstab.sample to fstab in the directory path\to\msys\etc\ and add the line

C:\hyapp\mingw64  /mingw

so that utilities in C:\hyapp\mingw64 are available to msys.

Install autotools locally in msys as described here:

# Assume we want to install them below $HOME/local.
myprefix=$HOME/local

# Ensure the tools are accessible from PATH.
# It is advisable to set this also in ~/.profile, for development.
PATH=$myprefix/bin:$PATH
export PATH

# Do the following in a scratch directory.
# wget http://ftp.gnu.org/gnu/m4/m4-1.4.14.tar.gz
wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.64.tar.gz
wget http://ftp.gnu.org/gnu/automake/automake-1.11.1.tar.gz
wget http://ftp.gnu.org/gnu/libtool/libtool-2.4.tar.gz
# gzip -dc m4-1.4.14.tar.gz | tar xvf -
gzip -dc autoconf-2.64.tar.gz | tar xvf -
gzip -dc automake-1.11.1.tar.gz | tar xvf -
gzip -dc libtool-2.4.tar.gz | tar xvf -
# cd m4-1.4.14
# ./configure -C --prefix=$myprefix && make && make install
cd autoconf-2.64
./configure -C --prefix=$myprefix && make && make install
cd ../automake-1.11.1
./configure -C --prefix=$myprefix && make && make install
cd ../libtool-2.4
./configure -C --prefix=$myprefix --disable-shared && make && make install

# If everything succeeded, you can delete the scratch directory again.

The option --host=x86_64-w64-mingw32 may be needed for the configure commands.

You may have to add subdir-objects to AM_INIT_AUTOMAKE in configure.ac.

hfst-ospell

There are files windows-configure.ac and windows-Makefile.am in the hfst-ospell svn repository that are meant to be used when compiling on Windows. Some pkg macros don't work on MinGW, so they are omitted from the configure file. Just copy windows-configure.ac to configure.ac and windows-Makefile.am to Makefile.am.

The Makefile expects the tinyxml2 files and the libarchive directory to be present (see below).

It is easiest to use tinyxml2 as the XML library in hfst-ospell, as it has the fewest dependencies. The files tinyxml2.h and tinyxml2.cc can be fetched here and copied to the hfst-ospell directory.

hfst-ospell also depends on libarchive (version 3), which can be downloaded here (you may need to run autoreconf --force --install and mingw-get.exe --install pthreads). Just copy libarchive-x.x.x/libarchive/* into hfst-ospell/libarchive/.

Get liblzma here if you don't have it (7zip program can be used to extract it).

Build hfst-ospell with

autoreconf -i
./configure --enable-zhfst --enable-xml=tinyxml2
make CFLAGS="-I/path/to/hfstospell-v.v.v/libarchive/ -fpermissive -DHAVE_WCSCPY -DHAVE_WCSLEN" \
  CXXFLAGS="-I/path/to/hfstospell-v.v.v/libarchive/ -DWINDOWS" \
  LDFLAGS="-lz -llzma -lbz2 -liconv"

The executable hfst-ospell.exe depends on the following DLLs:

  • libbz2-2.dll
  • libgcc_s_dw2-1.dll (or libgcc_s_seh-1.dll)
  • libiconv-2.dll
  • liblzma-5.dll
  • libstdc++-6.dll
  • zlib1.dll

They must be included in the NSIS installer.

Python

You need both 32-bit and 64-bit versions of Python 2 and Python 3.

You may have to remove all -mno-cygwin flags from c:/Python2x/Lib/distutils/cygwinccompiler.py(c), because gcc version 4 no longer supports this option and it causes a compilation error in swig.

For 64-bit Python, you need to generate the definitions file and the import library by running in msys:

gendef.exe /c/Windows/System32/python2x.dll
dlltool.exe --dllname python2x.dll --def python2x.def --output-lib libpython2x.a
mv libpython2x.a /C/hyapp/Python2x/libs

and patch the file /C/hyapp/Python27/include/pyconfig.h by cutting out the following piece of code (starting from line 141)

#ifdef _WIN64
#define MS_WIN64
#endif

and pasting it above the following line (line 107)

#ifdef _MSC_VER
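Why the move helps: MinGW's GCC does not define _MSC_VER, so a MS_WIN64 definition inside the _MSC_VER section is never seen when compiling with MinGW, while _WIN64 is predefined by both 64-bit MSVC and 64-bit MinGW-w64. The intended order after the patch, sketched with a hypothetical compile-time flag that is not part of pyconfig.h:

```cpp
// Patched pyconfig.h order (sketch): MS_WIN64 now depends only on _WIN64.
#ifdef _WIN64
#define MS_WIN64
#endif

#ifdef _MSC_VER
// ... MSVC-only configuration, unchanged ...
#endif

// Hypothetical flag recording whether MS_WIN64 ended up defined.
constexpr bool ms_win64_defined =
#ifdef MS_WIN64
    true;
#else
    false;
#endif
```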

Swig

Swig is here.

NSIS

Get it here; version 2.4.6 is currently installed. You probably need to add the nsh files StrRep.nsh and ReplaceInFile.nsh to the directory NSISDIR/Include/. You might need to change StrReplace to StrRep in the file ReplaceInFile.nsh.

Dependency Walker

Get it here.

Compilation

The bash shell of 64-bit MinGW complains, for some *.hfst.script files, that it cannot execute a binary file. So you may have to modify test.sh so that it returns the skip value 77 (the exit status that Automake's test harness reports as a skipped test).

hfst-parser.exe may be classified as a virus by antivirus software, so first try running make check in tools/src/parsers and allow the program if needed.

Get newest HFST tarball here, extract it and run

autoreconf -i && \
./configure --enable-all-tools --disable-foma-wrapper --without-readline \
--enable-fsmbook-tests --enable-mingw --with-openfst-log=no && \
make && \
make check

There were 'File too big' errors from as.exe when creating TropicalWeightTransducer.o; this is now taken care of by splitting that file into four parts. The same has not been done for LogWeightTransducer.cc, but as it is rarely used, it is enough to configure with --with-openfst-log=no.

We disable the foma (and lexc?) wrapper tools and the readline utilities to avoid dependencies on the zlib and readline libraries.

Note/todo: make sure that standard streams are 'opened' in binary mode!

Go to the directory swig and edit setup.py by changing the line

data_files = []

into

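data_files = ["libhfst-31.dll", "libgcc_s_seh-1.dll"]

(If your MinGW uses libgcc_s_dw2-1.dll instead, typically with 32-bit MinGW, list "libgcc_s_dw2-1.dll" instead of "libgcc_s_seh-1.dll".) Copy the files listed in data_files to the current directory so that they are found. Make sure that the location of swig.exe is in PATH and run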

PYTHON setup.py build --compiler=mingw32 bdist_wininst

Here PYTHON stands for the Python interpreter you are building the bindings for. Note that the argument --compiler=mingw32 works for both 32-bit and 64-bit versions.

Todo: where should the dependency dlls be copied from?

Todo: Python links to msvcr90.dll or msvcr100.dll, while MinGW links to msvcrt.dll. When creating Python bindings, you need to compile HFST against exactly the same msvcr DLL as the version of Python you are creating the bindings for. If Python and the HFST DLL link to different msvcr DLLs, you may get the error "ImportError: DLL load failed: Invalid access to memory location." when importing libhfst. This probably also explains why the C++ standard streams could not be controlled via Python. Problem: libmoldname90 is available only for MinGW-32, not for MinGW-64.

This can be achieved with specs files:

Ensure that the default specs are defined externally, (i.e. ensure that they are defined in the <mingw-root>/lib/gcc/mingw32/<gcc-version>/specs file), rather than built-in; if they are not, use the command

    gcc -dumpspecs > <mingw-root>/lib/gcc/mingw32/<gcc-version>/specs

(with appropriate substitutions for <mingw-root> and <gcc-version>), to create a suitable external specs file. Open the default external specs file in your favourite editor, and add the new specs:

*msvcrt:
msvcrt

*msvcrt_version:


*moldname:
moldname

Still editing the external default specs file, locate the definition for '*cpp:'; in a default specs file, it should look like:

*cpp:
%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT}

to that definition, add the substitution macro '%(msvcrt_version)', (note parentheses, not braces here, but do leave the braces as they are, in the original conditional macros), so that it now looks like:

*cpp:
%(msvcrt_version) %{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT}

Similarly, locate the spec for '*libgcc:'; it should look something like:

*libgcc:
%{mthreads:-lmingwthrd} -lmingw32 -lgcc -lmoldname -lmingwex -lmsvcrt

Within this spec string, locate the specific reference for '-lmsvcrt', and change it to use the substitution macro '-l%(msvcrt)'; do likewise for the '-lmoldname' reference, so that the entire spec now looks like:

*libgcc:
%{mthreads:-lmingwthrd} -lmingw32 -lgcc -l%(moldname) -lmingwex -l%(msvcrt)

Save this modified specs file, as <mingw-root>/lib/gcc/mingw32/<gcc-version>/specs

Again using your favourite editor, create an additional specs file, in the same directory, calling it say 'msvcr90'; i.e. <mingw-root>/lib/gcc/mingw32/<gcc-version>/msvcr90. This additional specs file will augment the original one, when specified with -specs option. This should contain:

*msvcrt:
msvcr90

*msvcrt_version:
-D__MSVCRT_VERSION__=0x0900

*moldname:
moldname90

This file need contain no more than this; save it.

To compile applications, such that they use the default version of MSVCRT.DLL, you may continue to invoke GCC just as you usually would, e.g.

gcc -o foo foo.c

To compile applications, such that they use an alternative version of MSVCRT.DLL, simply add the appropriate -specs= option, e.g. for MSVCR90.DLL

gcc -specs=msvcr90 -o foo foo.c

(For the 64-bit MinGW used here, put the specs files in /mingw/lib/gcc/x86_64-w64-mingw32/4.8.2.)
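To check that the -specs option actually reached the compiler, one can look at __MSVCRT_VERSION__, which the msvcr90 specs file above sets to 0x0900 through the *msvcrt_version spec. A minimal sketch; msvcrt_version_macro and report_msvcrt_version are hypothetical helpers. MinGW headers may supply their own default value when no -specs option is given, so compare the number rather than only testing whether the macro is defined.

```cpp
#include <cstdio>   // on MinGW this also pulls in the CRT configuration headers

// Returns the CRT version the compiler was told to target, or 0 if the
// macro is not defined at all (e.g. with non-Windows compilers).
unsigned msvcrt_version_macro()
{
#ifdef __MSVCRT_VERSION__
    return static_cast<unsigned>(__MSVCRT_VERSION__);
#else
    return 0u;
#endif
}

// With -specs=msvcr90 the reported value is expected to be 0x0900.
void report_msvcrt_version()
{
    std::printf("__MSVCRT_VERSION__ = 0x%04x\n", msvcrt_version_macro());
}
```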

You also need to link libgcc and libstdc++ statically (use g++ rather than gcc for C++ sources, as the gcc driver does not link libstdc++ by itself)

g++ -specs=msvcr90 -static-libgcc -static-libstdc++ -o foo.exe foo.cc

and place a manifest file named foo.exe.manifest next to the exe:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    <security>
      <requestedPrivileges>
        <requestedExecutionLevel level='asInvoker' uiAccess='false' />
      </requestedPrivileges>
    </security>
  </trustInfo>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type='win32' name='Microsoft.VC90.CRT' version='9.0.21022.8' processorArchitecture='amd64' publicKeyToken='1fc8b3b9a1e18e3b' />
    </dependentAssembly>
  </dependency>
</assembly>

In directory libhfst/src/, run:

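make CXXFLAGS="-specs=msvcr100" LDFLAGS="-static-libgcc -static-libstdc++"

to get an HFST library that can be used as a Python 3 extension. This assumes a specs file named msvcr100, created in the same way as msvcr90 above but for msvcr100.dll, which is the runtime Python 3.3 and 3.4 link to (Python 2.7 uses msvcr90.dll).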

Go to directory NSIS, edit the values of HFST_DLL and LIB_STD_CPP_DLL in files copy_files.sh and hfst_installer.nsi, and run

./copy-files.sh
makensis hfst_installer.nsi

Todo: how to open the command prompt in utf-8 mode?

Issues on Windows

Printing to console

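#include <windows.h>
#include <string>
#include <vector>

// Print the UTF-8 string pstr to the console as UTF-16 with WriteConsoleW.
void print_to_console(const std::string& pstr)
{
  const HANDLE stdOut = GetStdHandle(STD_OUTPUT_HANDLE);
  DWORD numWritten = 0;
  // With length -1 the returned count includes the terminating null.
  int wchars_num =
    MultiByteToWideChar(CP_UTF8, 0, pstr.c_str(), -1, NULL, 0);
  if (wchars_num <= 1)
    return;  // conversion failed or nothing to print
  std::vector<wchar_t> wstr(wchars_num);
  MultiByteToWideChar(CP_UTF8, 0, pstr.c_str(), -1, &wstr[0], wchars_num);
  WriteConsoleW(stdOut, &wstr[0], wchars_num - 1, &numWritten, NULL);
}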
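WriteConsoleW only works when stdout is an actual console; when the output of a tool is redirected to a file or a pipe, the call fails. A sketch of a wrapper that first checks for a console with GetConsoleMode and otherwise writes the UTF-8 bytes unchanged; write_utf8_stdout is a hypothetical name, and the non-Windows branch exists only so the same code compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>
#ifdef _WIN32
#  include <windows.h>
#endif

// Writes a UTF-8 string to stdout; returns true on success.
bool write_utf8_stdout(const std::string& utf8)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only for a real console, not for files or pipes.
    if (out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode))
    {
        if (utf8.empty())
            return true;
        const int len = static_cast<int>(utf8.size());
        // Explicit length: the result has no terminating null to subtract.
        const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, NULL, 0);
        if (n <= 0)
            return false;
        std::vector<wchar_t> wide(static_cast<std::size_t>(n));
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &wide[0], n);
        std::fflush(stdout);  // keep ordering with earlier stdio output
        DWORD written = 0;
        return WriteConsoleW(out, &wide[0], static_cast<DWORD>(n), &written, NULL)
               && written == static_cast<DWORD>(n);
    }
#endif
    // Redirected output or non-Windows platform: pass the bytes through.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```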

Reading from console

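#include <windows.h>
#include <string>

// Append console input, converted from UTF-16 to UTF-8, to expression.
void read_from_console(std::string& expression)
{
  SetConsoleCP(65001);
  const HANDLE stdIn = GetStdHandle(STD_INPUT_HANDLE);
  WCHAR buffer[0x1000];
  DWORD numRead = 0;
  // The size argument counts WCHARs, not bytes, and ReadConsoleW does not
  // null-terminate the buffer, so only numRead characters are valid.
  while (ReadConsoleW(stdIn, buffer, sizeof buffer / sizeof buffer[0], &numRead, NULL)
         && numRead > 0)
    {
      std::wstring wstr(buffer, numRead);

      int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.data(), (int)wstr.size(), NULL, 0, NULL, NULL);
      std::string line(size_needed, 0);
      WideCharToMultiByte(CP_UTF8, 0, wstr.data(), (int)wstr.size(), &line[0], size_needed, NULL, NULL);

      expression += line;
    }
}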

fileno

lex.cmatrix.c: In function ‘cmatrix_init_buffer’:
lex.cmatrix.c:1651:9: warning: implicit declaration of function ‘fileno’ 
  [-Wimplicit-function-declaration]
         b->yy_is_interactive = file ? (isatty( fileno(file) ) > 0) : 0;
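The warning means that fileno was not declared when lex.cmatrix.c was compiled, so C falls back to an implicit declaration. A plausible cause, not verified here, is that MinGW's stdio.h hides the old POSIX names such as fileno in strict ANSI mode (for example with -std=c99); compiling with -std=gnu99 would likely avoid that, and another option is to use the underscore-prefixed Microsoft names. A sketch of the second approach; PORTABLE_FILENO, PORTABLE_ISATTY and is_interactive are hypothetical names.

```cpp
#include <cstdio>
#ifdef _WIN32
#  include <io.h>        // _isatty; _fileno is declared in stdio.h
#  define PORTABLE_FILENO(f)  _fileno(f)
#  define PORTABLE_ISATTY(fd) _isatty(fd)
#else
#  include <unistd.h>    // isatty
#  define PORTABLE_FILENO(f)  fileno(f)
#  define PORTABLE_ISATTY(fd) isatty(fd)
#endif

// Same test as the flex-generated line above, without relying on the
// POSIX name fileno being declared.
int is_interactive(FILE* file)
{
    return file ? (PORTABLE_ISATTY(PORTABLE_FILENO(file)) > 0) : 0;
}
```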

bash scripts

hfst-twolc is a shell script; on Windows we use the corresponding batch file hfst-twolc.bat. The same goes for hfst-train-tagger, which is replaced by hfst-train-tagger.bat on Windows.

Streams in Windows

On MinGW, all transducer streams must be processed in binary mode. Otherwise, carriage return and line feed characters are not handled properly. Streams to files are opened in binary mode by the HFST API, so the user doesn't need to take care of this. However, the standard input and output streams cannot be wrapped by the HFST API, as they are global to the whole program. This is why all HFST command line tools set binary mode with the following lines in the header file tools/src/inc/globals-common.h:

#include <fcntl.h>
int _CRT_fmode = _O_BINARY;

This definition should also be included in all programs that use the HFST API. It cannot be included in HfstTransducer.h, as that would yield multiple definitions of _CRT_fmode in the HFST API. Maybe hfst.h would be the right place?
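A possible alternative to _CRT_fmode, not tried with HFST here, is to switch the standard streams at run time with the Microsoft CRT function _setmode. Since it is an ordinary function call rather than a global definition, it could live inside a library without causing multiple definitions. It only changes the streams of the C runtime that the calling module is linked against, which ties in with the msvcr DLL mismatch described in the Todo under Compilation. set_std_streams_binary is a hypothetical name.

```cpp
#include <cstdio>
#ifdef _WIN32
#  include <fcntl.h>   // _O_BINARY
#  include <io.h>      // _setmode
#endif

// Switches stdin and stdout to binary mode; call it before any transducer
// data is read or written. Returns false if either call fails.
bool set_std_streams_binary()
{
#ifdef _WIN32
    std::fflush(stdout);  // flush anything already written in text mode
    const bool in_ok  = _setmode(_fileno(stdin),  _O_BINARY) != -1;
    const bool out_ok = _setmode(_fileno(stdout), _O_BINARY) != -1;
    return in_ok && out_ok;
#else
    return true;  // POSIX systems make no text/binary distinction
#endif
}
```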

With Swig, it seems impossible to process standard streams in binary mode. Adding

#include <fcntl.h>
int _CRT_fmode = _O_BINARY;

inside the section

%{
#define SWIG_FILE_WITH_INIT
#include "HfstTransducer.h"
#include "HfstInputStream.h"
#include "HfstOutputStream.h"
#include "HfstDataTypes.h"
#include "HfstFlagDiacritics.h"
#include "hfst_swig_extensions.h"
#include "HfstExceptionDefs.h"
%}

has no effect. Also, adding the following function to the file hfst_swig_extensions.h

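#include <cassert>
#include <cstdio>
void set_binary_mode() {
   // Call freopen() outside assert(): if NDEBUG is defined, assert() discards its argument.
   FILE * in = freopen(0, "rb", stdin);
   FILE * out = freopen(0, "wb", stdout);
   assert(in == stdin && out == stdout);
}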

and calling it at the beginning of the Python program seems to do nothing. Also, executing the following lines at the beginning of the Python program

import os, sys
sys.stdin = os.fdopen(sys.stdin.fileno(), 'rb', 0)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'wb', 0)

and/or

import os, sys
if sys.platform == "win32":
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)

have no effect. No combination of the above works either.

NOTE/TODO: one of the above tricks will probably work if Python and HFST are linked to the same msvcr DLL!

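The Swig/Python interface

If the DLLs that libhfst depends on are not found when importing the module, prepend the directory that contains them to PATH before the import ('my-app-dir' is a placeholder):

import os
os.environ['PATH'] = 'my-app-dir' + ';' + os.environ['PATH']
import libhfst

Topic revision: r129 - 2015-05-15 - ErikAxelson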