HFST: Ideas for a new Python API

Feature requests

  • DONE, SPEED IS ABOUT THE SAME: Support lookup for tropical OpenFst transducers. Previously:
    >>> import libhfst
    >>> f = libhfst.fst('foo')
    >>> f.lookup('foo')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/libhfst.py", line 8001, in lookup
        retval=self.lookup_fd_string(input, max_number, time_cutoff)
      File "/usr/local/lib/python2.7/dist-packages/libhfst.py", line 7897, in lookup_fd_string
        return _libhfst.HfstTransducer_lookup_fd_string(self, *args)
    libhfst.FunctionNotImplementedException
    
    This is a problem because users expect simple operations like this to work. One option is to implement lookup for OpenFst transducers directly instead of converting them to basic transducers.
    
    Sam Hardwick says: one way to get this is to implicitly turn the input string into an automaton, compose it with the transducer and call extract_strings(). This is faster than doing a conversion. Miikka Silfverberg says: This sounds like it could be good enough, at least for the time being. According to the following experiment, the speed is probably acceptable (1000 lookups in about 0.28 s, i.e. > 3500 words/s on my computer):
    >>> import libhfst
    >>> nums = libhfst.fst([str(i) for i in range(100000)])
    >>> nums.minimize()
    <libhfst.HfstTransducer; proxy of <Swig Object of type 'std::vector< hfst::HfstTransducer >::value_type *' at 0x7f23d94bf9f0> >
    >>> def lookup():
    ...     w = libhfst.fst('29786')
    ...     w.compose(nums)
    ...     return w.extract_paths()
    ... 
    >>> from timeit import timeit
    >>> timeit(lookup, number=1000)
    0.2846348285675049
    
  • It would be nice if everything, including the hfst3 C++ library, could be installed with pip.
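
The lookup-by-composition idea in the first feature request can be illustrated without libhfst. The following is a minimal pure-Python sketch, not HFST code: the Transducer class and the functions string_to_acceptor, compose, extract_paths and lookup are hypothetical names, each input character is assumed to be one symbol, and the transducer is assumed to have no input epsilons, so the composed machine is acyclic. Weights follow the tropical semiring: they are added along a path, and the lowest weight is kept when several paths give the same output.

```python
from typing import Dict, Hashable, List, Tuple

# One transition: (input symbol, output symbol, weight, target state).
# An empty output symbol "" means epsilon on the output side.
Arc = Tuple[str, str, float, Hashable]


class Transducer:
    """Toy weighted transducer; assumed to have no input epsilons."""

    def __init__(self, start: Hashable, finals: Dict[Hashable, float],
                 arcs: Dict[Hashable, List[Arc]]) -> None:
        self.start = start
        self.finals = finals  # final state -> final weight
        self.arcs = arcs      # state -> outgoing transitions


def string_to_acceptor(word: str) -> Transducer:
    """Linear automaton that accepts exactly `word`, one character per arc."""
    arcs = {i: [(ch, ch, 0.0, i + 1)] for i, ch in enumerate(word)}
    return Transducer(start=0, finals={len(word): 0.0}, arcs=arcs)


def compose(a: Transducer, t: Transducer) -> Transducer:
    """Product construction: a's output symbols must match t's input symbols."""
    start = (a.start, t.start)
    finals: Dict[Hashable, float] = {}
    arcs: Dict[Hashable, List[Arc]] = {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        p, q = state
        arcs[state] = []
        if p in a.finals and q in t.finals:
            finals[state] = a.finals[p] + t.finals[q]  # tropical: add weights
        for a_in, a_out, a_w, a_dst in a.arcs.get(p, []):
            for t_in, t_out, t_w, t_dst in t.arcs.get(q, []):
                if a_out == t_in:
                    dst = (a_dst, t_dst)
                    arcs[state].append((a_in, t_out, a_w + t_w, dst))
                    if dst not in seen:
                        seen.add(dst)
                        stack.append(dst)
    return Transducer(start, finals, arcs)


def extract_paths(t: Transducer) -> List[Tuple[str, float]]:
    """Collect (output string, weight) pairs, keeping the lowest weight per string.

    Assumes `t` is acyclic, which holds for a linear acceptor composed with a
    transducer that has no input epsilons.
    """
    best: Dict[str, float] = {}

    def walk(state: Hashable, out: List[str], weight: float) -> None:
        if state in t.finals:
            total = weight + t.finals[state]
            key = "".join(out)
            if key not in best or total < best[key]:
                best[key] = total
        for _, osym, w, dst in t.arcs.get(state, []):
            walk(dst, out + [osym], weight + w)

    walk(t.start, [], 0.0)
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


def lookup(t: Transducer, word: str) -> List[Tuple[str, float]]:
    """Lookup by composition, without converting `t` to another format."""
    return extract_paths(compose(string_to_acceptor(word), t))


# Toy transducer mapping "foo" to itself (weight 0.0) and to "bar" (weight 1.5).
foo_bar = Transducer(
    start=0,
    finals={3: 0.0, 6: 0.0},
    arcs={
        0: [("f", "f", 0.0, 1), ("f", "b", 1.0, 4)],
        1: [("o", "o", 0.0, 2)],
        2: [("o", "o", 0.0, 3)],
        4: [("o", "a", 0.0, 5)],
        5: [("o", "r", 0.5, 6)],
    },
)
print(lookup(foo_bar, "foo"))  # [('foo', 0.0), ('bar', 1.5)]
```

Because the input automaton is linear, the composition only explores the part of the transducer that the input actually reaches, which may be one reason this approach can beat converting the whole transducer first.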

Design Ideas

  • DONE: Make HfstInputStream behave like a Python file object, so that it can be iterated over:
    >>> f = libhfst.HfstInputStream('foo.hfst')
    >>> for fst in f: 
    ...    print(fst)
    
  • Because SWIG seems to have problems with member functions that return references, implement a C++ wrapper for HfstTransducer in which member functions such as minimize return void. The Python interface would then have two functions:
    1. DONE: Member function minimize, which modifies the transducer in place and returns None.
    2. libhfst.minimize, which takes a transducer and returns a new, minimized transducer. (IS THIS NEEDED, or is it enough to take a copy and then call the modifying member function?)
  • There should be a libhfst.exceptions submodule for exceptions.
  • Maybe hide all the symbol vector classes, because they aren't needed: native Python data structures should be used instead. (??)
  • Gather everything related to transducer types in libhfst.format, but implement a lookup_optimize member function that returns a separate transducer supporting only lookup. (??)
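
The open question in the minimize item (is a separate libhfst.minimize needed, or is copy-then-modify enough?) can be sketched with a stand-in class. ToyAcceptor is hypothetical and much simpler than HfstTransducer: it is a trim, acyclic, deterministic acceptor, minimized by merging states that have the same finality and the same outgoing transitions, working bottom-up. The member function modifies the object and returns None, and the module-level minimize is only copy.deepcopy followed by that member call; a real binding would presumably use the HfstTransducer copy constructor instead of deepcopy, which is an assumption.

```python
import copy
from typing import Dict, Set, Tuple


class ToyAcceptor:
    """Stand-in for a transducer: a trim, acyclic, deterministic acceptor."""

    def __init__(self, start: int, finals: Set[int],
                 arcs: Dict[int, Dict[str, int]]) -> None:
        self.start = start
        self.finals = set(finals)
        self.arcs = arcs  # state -> {symbol: target state}

    def accepts(self, word: str) -> bool:
        state = self.start
        for ch in word:
            state = self.arcs.get(state, {}).get(ch)
            if state is None:
                return False
        return state in self.finals

    def num_states(self) -> int:
        """Count the states reachable from the start state."""
        seen, stack = {self.start}, [self.start]
        while stack:
            for dst in self.arcs.get(stack.pop(), {}).values():
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return len(seen)

    def minimize(self) -> None:
        """Merge equivalent states in place and return None (like list.sort())."""
        canonical: Dict[Tuple, int] = {}  # state signature -> representative
        rep: Dict[int, int] = {}          # state -> its representative

        def visit(state: int) -> int:
            # Two states are equivalent if they agree on finality and their
            # transitions lead to equivalent states; in an acyclic automaton
            # this can be decided bottom-up from the leaves.
            if state not in rep:
                children = tuple(sorted(
                    (sym, visit(dst)) for sym, dst in self.arcs.get(state, {}).items()))
                signature = (state in self.finals, children)
                rep[state] = canonical.setdefault(signature, state)
            return rep[state]

        visit(self.start)
        self.arcs = {q: {sym: rep[dst] for sym, dst in self.arcs.get(q, {}).items()}
                     for q, r in rep.items() if q == r}
        self.finals = {q for q in self.finals if rep.get(q) == q}
        self.start = rep[self.start]


def minimize(fst: ToyAcceptor) -> ToyAcceptor:
    """Hypothetical libhfst.minimize: copy, then call the in-place member function."""
    result = copy.deepcopy(fst)
    result.minimize()
    return result


# A trie for {"cat", "bat"}: the two branches become one after minimization.
trie = ToyAcceptor(0, {3, 6}, {0: {"c": 1, "b": 4}, 1: {"a": 2}, 2: {"t": 3},
                               4: {"a": 5}, 5: {"t": 6}})
small = minimize(trie)                         # trie itself is left unchanged
print(trie.num_states(), small.num_states())   # 7 4
result = trie.minimize()                       # modifies trie in place
print(result, trie.num_states())               # None 4
```

This mirrors Python's own list.sort() and sorted() pair, which suggests the functional version could be a few lines of Python on top of the in-place member function rather than a separate C++ wrapper.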

Bugs

  • FIXED: Opening a file that doesn't exist crashes the interpreter:
    >>> import libhfst
    >>> libhfst.HfstInputStream('nonexistent.hfst')
    terminate called after throwing an instance of 'HfstException'
    Aborted
    
  • FIXED by wrapping member functions so that they return void instead: member functions that return references behave strangely:
    >>> import libhfst
    >>> s = libhfst.fst('s')
    >>> s = s.minimize()
    >>> print(s)
    terminate called after throwing an instance of 'FunctionNotImplementedException'
    Aborted
    
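  • FIXED: Default-constructing libhfst.HfstTransducer leads to strange behavior, because HfstTransducer::~HfstTransducer throws an exception when the transducer type is undefined. In the session below, the crash happens when fst is rebound and the default-constructed transducer is destroyed: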
    >>> fst = libhfst.HfstTransducer()
    >>> fst = libhfst.fst('foo')
    terminate called after throwing an instance of 'FunctionNotImplementedException'
    Aborted
    
  • FIXED: Lookup doesn't support unicode strings (Python 2):
    >>> import libhfst
    >>> foo = libhfst.fst('foo')
    >>> foo.convert(libhfst.HFST_OLW_TYPE)
    <libhfst.HfstTransducer; proxy of <Swig Object of type 'std::vector< hfst::HfstTransducer >::value_type *' at 0x104a46840> >
    >>> foo.lookup('foo')
    (('foo', 0.0),)
    >>> foo.lookup(unicode('foo'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mpsilfve/.virtualenvs/magicwords2/lib/python2.7/site-packages/libhfst.py", line 8242, in lookup
        raise RuntimeError('Input argument must be string or tuple.')
    RuntimeError: Input argument must be string or tuple.
    
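The unicode bug came from lookup rejecting anything that was not a byte string or a tuple. A sketch of the kind of input normalization that avoids it follows, written for Python 3, where the analogous split is between str and bytes. normalize_lookup_input is a hypothetical helper, not the actual fix in libhfst; it assumes byte strings are UTF-8 encoded and that a tuple (or, as an extension, a list) means an already tokenized symbol sequence.

```python
from typing import Sequence, Tuple, Union


def normalize_lookup_input(
        value: Union[str, bytes, Sequence[str]]) -> Union[str, Tuple[str, ...]]:
    """Return a text string or a tuple of symbols, rejecting anything else."""
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8 encoded.
        return value.decode("utf-8")
    if isinstance(value, str):
        return value
    if isinstance(value, (tuple, list)) and all(isinstance(s, str) for s in value):
        # Already tokenized into symbols, e.g. multicharacter symbols.
        return tuple(value)
    raise TypeError("lookup input must be str, bytes, or a tuple of str, not "
                    + type(value).__name__)


print(normalize_lookup_input(b"foo"))          # foo
print(normalize_lookup_input(["f", "oo"]))     # ('f', 'oo')
```

Raising TypeError rather than the RuntimeError seen in the traceback follows the Python convention for arguments of the wrong type.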

-- ErikAxelson - 2016-02-23

Topic revision: r11 - 2017-11-21 - ErikAxelson
 
Copyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.