Omorfi performance testing

This page describes how I tested the performance of the omorfi rulesets with several implementations of finite-state operations (SFST, FSM and OpenFST). Times are given for compiling the rulesets with composition and related operations such as epsilon removal, determinization and minimization.

Methods

The attached Makefile contains all the needed scripts; the rulesets come from the omorfi package.

  • time in SFST: time fst-compiler-utf8 .sfst /dev/null
  • time in OpenFST composition: time fstcompose 1.openfst 2.openfst | time fstrmepsilon | time fstencode --encode_weights --encode_labels - 3.openfstkey | time fstdeterminize | time fstminimize | time fstencode - 3.openfstkey > /dev/null
  • time in FSM composition: time fsmcompose 1.fsm 2.fsm | time fsmrmepsilon | time fsmencode -cl - 3.fsmkey | time fsmdeterminize | time fsmminimize | time fsmencode - 3.fsmkey > /dev/null
  • time for other OpenFST or FSM operations: simply time fstxxx 1.openfst 2.openfst or time fsmxxx 1.fsm 2.fsm
  • number of vertices |V| in SFST: fst-print .sfsta | tail -n 1 | cut -f 1
  • number of edges |E| in SFST: fst-print .sfsta | wc -l
  • number of vertices |V| in OpenFST: fstinfo .openfst | grep '# of states'
  • number of edges |E| in OpenFST: fstinfo .openfst | grep '# of arcs'
  • number of vertices |V| in FSM: fsminfo -v .fsm | grep '# of states'
  • number of edges |E| in FSM: fsminfo -v .fsm | grep '# of arcs'
  • user time is taken from the "user" field reported by the time utility
  • real time is measured with the date utility at the highest precision it supports (on GNU/Linux, date can print nanoseconds, although I suspect its actual resolution is coarser)
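A minimal sketch of the real-time measurement described in the last item, written as a shell function (the name real_seconds is my own and hypothetical; it is not taken from the attached Makefile). It assumes GNU date, whose %N format prints nanoseconds; %N is a GNU extension and may be missing from other date implementations, such as the one on Solaris. awk does the floating-point subtraction, and its double-precision arithmetic keeps only roughly microsecond precision for current epoch values.

```shell
#!/bin/sh
# Hypothetical helper: run a command and report its wall-clock ("real")
# time in seconds on stderr, so the command's own stdout can still be piped.
# ASSUMPTION: GNU date, where +%s.%N prints seconds.nanoseconds.
real_seconds() {
    start=$(date +%s.%N)
    rc=0
    "$@" || rc=$?   # keep the command's exit status
    end=$(date +%s.%N)
    # sh has no floating point, so let awk subtract the two timestamps.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }' >&2
    return $rc
}
```

For example, real_seconds gmake time-fsm would print the elapsed seconds after the target finishes, while passing on gmake's exit status.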

Machines

Low-end laptop (1):

  • Linux 2.6.22.1 #1 Sun Sep 9 16:18:52 EEST 2007 i686 Intel(R) Celeron(R) M CPU 430 @ 1.73GHz GenuineIntel GNU/Linux

/proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Intel(R) Celeron(R) M CPU        430  @ 1.73GHz
stepping        : 8
cpu MHz         : 1729.183
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx constant_tsc pni monitor tm2 xtpr
bogomips        : 3461.06
clflush size    : 64

/proc/meminfo:

MemTotal:       505704 kB

Software:

  • SFST Version 1.2
  • FSM Version 4.0, 2003
  • OpenFST 20080422
  • GNU time 1.7

Corona (2): one processor at a time, an 8 GB memory cap, and Solaris versions of the software above, except:

  • bash 3.0 time instead of GNU time.

Note: on corona, jobs must be run through the resource queues. Either the following Grid Engine job script or qmake with suitable parameters can be used for this:

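#!/bin/sh
# Grid Engine reads the "#$" lines below as job options:
#   -S shell to run the job with, -N job name, -o output file,
#   -j y merge stderr into the output file, -m beas send mail on
#   begin/end/abort/suspend to the -M address, -cwd run in the
#   submission directory, -l h_rt/h_vmem hard limits on wall-clock
#   time (2 h) and virtual memory (8000 MB), -V export the environment.
#$ -S /usr/bin/bash
#$ -N fsmomorfitime
#$ -o $HOME/fsm-times.txt
#$ -j y
#$ -m beas
#$ -M example@example.com
#$ -cwd
#$ -l h_rt=2:00:00
#$ -l h_vmem=8000M
#$ -V
# Run the timing targets of the attached Makefile one after another.
gmake time-sfst
gmake time-openfst
gmake time-fsm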

Times

All times are in seconds. User time is the CPU time the process spent in user mode doing its own work. Real time is the total elapsed (wall-clock) time as observed by the end user. Roughly, real time equals user time plus system time (time spent in the kernel, e.g. on memory management) plus time spent waiting or idle. The numbers in the header row refer to the machines described above: 1 is my laptop and 2 is corona.

| Module | 1) SFST user | 1) SFST real | 1) FSM user | 1) FSM real | 1) OpenFST user | 1) OpenFST real |
| kotus-sanalista | 0 | 0.02 | 0 | 0.014 | 0 | 0.014 |
| plurale-tantum | 2.16 | 3.5 | 2.6 | 2.88 | 5.09 | 5.50 |
| gradation | 37.7 | 42.3 | 48.76 | 54.62 | 85.11 | 102.33 |
| stubify | 2.63 | 3.1 | 3.46 | 3.7 | 5.22 | 5.74 |
| stemfill | 4.78 | 5.3 | 4.46 | 6.1 | 7.83 | 9.94 |
| TOTAL | 47 | 54 | 53.3 | 66.6 | 102 | 113.69 |
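By the relation in the paragraph above, real time minus user time is roughly the time spent in the kernel or waiting. A small sketch that computes this gap per implementation for one row of the times table; the helper name real_minus_user is hypothetical, and it assumes the row is given as whitespace-separated fields in the table's column order (i.e. with the pipes removed).

```shell
#!/bin/sh
# Hypothetical helper: read rows of the form
#   module sfst_user sfst_real fsm_user fsm_real openfst_user openfst_real
# on stdin and print real - user (seconds) for each implementation.
real_minus_user() {
    awk '{
        printf "%s SFST=%.2f FSM=%.2f OpenFST=%.2f\n",
               $1, $3 - $2, $5 - $4, $7 - $6
    }'
}
```

For example, echo 'gradation 37.7 42.3 48.76 54.62 85.11 102.33' | real_minus_user prints gradation SFST=4.60 FSM=5.86 OpenFST=17.22, so on the gradation step OpenFST spends noticeably more time outside user mode than the other two.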

Sizes

| Module | SFST V | SFST E | FSM V | FSM E | OpenFST V | OpenFST E |
| kotus-sanalista | 76030 | 155212 | 76031 | 155200 | 76031 | 155200 |
| plurale-tantum | 76015 | 155197 | 76016 | 155185 | 76016 | 155185 |
| gradation | 75627 | 154775 | 75628 | 154773 | 75628 | 154773 |
| stubify | 74591 | 154086 | 74592 | 154084 | 74592 | 154084 |
| fillstems | 26391 | 57182 | 26392 | 57181 | 26392 | 57181 |
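The OpenFST and FSM columns can be collected from the info tools' output. A sketch, assuming (as the grep patterns in the methods suggest) that each count is the last field of its "# of states" or "# of arcs" line; I have not checked that fsminfo -v lays out its lines exactly this way, and the helper name size_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read fstinfo / fsminfo -v style output on stdin and
# print the state (V) and arc (E) counts.
# ASSUMPTION: the count is the last whitespace-separated field of the
# "# of states" and "# of arcs" lines. "# of final states" does not match
# the first pattern because "of" and "states" are not adjacent there.
size_counts() {
    awk '/# of states/ { v = $NF }
         /# of arcs/   { e = $NF }
         END { printf "V=%s E=%s\n", v, e }'
}
```

Usage would look like fstinfo gradation.openfst | size_counts, with a hypothetical file name.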

Equivalence

The transducers were briefly tested for equivalence using yet another set of conversion scripts (see attachments) and fst-compare from SFST.

The rulesets were all equivalent, and so were the transducers.

(I assume the differences in sizes come from differences in alphabet, symbol pair and/or weight handling.)
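The attached equivalence-check.sh does a pairwise check, but its contents are not reproduced on this page, so the following is only a sketch of the idea using a hypothetical helper pairwise_check: it runs a given comparison command on every unordered pair of files. It assumes the command exits with status 0 when its two inputs are equivalent; that holds for cmp -s, but I do not assume it for fst-compare, whose exit status should be checked before plugging it in.

```shell
#!/bin/sh
# Hypothetical helper: pairwise_check CMD FILE...
# Runs CMD on every unordered pair of FILEs and prints "same" or
# "DIFFERENT" for each pair; returns 1 if any pair differs.
# ASSUMPTION: CMD exits 0 when its two arguments are equivalent.
# CMD is left unquoted on purpose so that e.g. "cmp -s" splits into words.
pairwise_check() {
    cmd=$1
    shift
    status=0
    while [ $# -gt 1 ]; do
        first=$1
        shift
        for other in "$@"; do
            if $cmd "$first" "$other" >/dev/null 2>&1; then
                echo "same: $first $other"
            else
                echo "DIFFERENT: $first $other"
                status=1
            fi
        done
    done
    return $status
}
```

For example, pairwise_check "cmp -s" a.txt b.txt c.txt (hypothetical file names) compares a/b, a/c and b/c.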


-- TommiPirinen 2008
Topic attachments:

  • Makefile (22.3 K, 2008-05-05): Makefile with timing and translation scripts
  • equivalence-check.sh (0.6 K, 2008-05-07): pairwise equivalence check shell script
  • fsmprint2sfst.py.txt (2.0 K, 2008-05-06): script to convert OpenFST print output to SFST print format
  • sfst-print2fsm.py.txt (2.7 K, 2008-05-06): script to convert SFST print format to OpenFST compile format
  • unirewrite.py.txt (1.8 K, 2008-05-07): simple Unicode/compatibility/bug/alphabet rewriter
Topic revision: r7 - 2008-05-07 - TommiPirinen
 