HFST: (name of this topic page)

12:45 < meriponi> For everyone using hfst-xfst, the latest commit to git repository: "Swap implementations of 'apply up' and 'apply down' in hfst-xfst, so that these commands work in the same way as in foma and xfst."
12:58 < Flammie> cool
13:00 < Unhammer> this'll break lots of scripts?
13:00 < Unhammer> though maybe people have been using foma to apply u/d anyway?
13:01 < Flammie> possibly, though people who use xfst scripting language deserve broken scripts :-)
13:05 < meriponi> yes, users have to update their scripts
13:24 < TinoDidriksen> Is there any way to detect what a version does?
13:25 < Unhammer> could probably write a small script at least …
13:26 < sjnomos> what is most problematic is that hfst-xfst now works as foma/xfst, whereas the rest of the hfst tools presumably work as before
13:26 < sjnomos> ie the behavior is internally inconsistent
13:26 < sjnomos> internally in hfst, that is
13:27 < sjnomos> whatever one thinks of the xerox convention, a consistent behavior among all tools would be the best
13:27 < Unhammer> example?
13:28 < Unhammer> (I thought apply up/down was particular to (hfst-)xfst)
13:28 < sjnomos> hfst-lookup does only lookup
13:29 < sjnomos> which applies the lookup in the direction the fst source code was written (ie for abc+N:abc, hfst-lookup takes abc+N as input; to do abc -> abc+N you have to invert the fst first - for foma and xerox’ lookup it is the opposite)
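The difference described in this message can be modelled with a toy transducer. This is a minimal sketch, assuming the relation is stored as a plain set of (upper, lower) string pairs. The names `lookup`, `apply_up`, `apply_down` and `invert` are illustrative only and are not the HFST, foma or xfst API.

```python
# Toy model: a transducer as a finite set of (upper, lower) string pairs.
# The lexc entry "abc+N:abc" has upper side "abc+N" and lower side "abc".
FST = {("abc+N", "abc")}


def invert(fst):
    """Swap the two sides of every pair."""
    return {(lower, upper) for upper, lower in fst}


def lookup(fst, s):
    """hfst-lookup style: match s on the upper (input) side, return the other side."""
    return sorted(lower for upper, lower in fst if upper == s)


def apply_down(fst, s):
    """xfst/foma style 'apply down': upper side in, lower side out."""
    return lookup(fst, s)


def apply_up(fst, s):
    """xfst/foma style 'apply up': lower side in, upper side out."""
    return lookup(invert(fst), s)


print(lookup(FST, "abc+N"))        # matches on the side written first in lexc
print(lookup(invert(FST), "abc"))  # the word form is accepted only after inverting
print(apply_up(FST, "abc"))        # 'apply up' needs no explicit inversion
```

In this model `apply_up` is just `lookup` on the inverted relation, which is the sense in which the tools differ only in lookup direction and not in the transducer itself.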
13:31 < sjnomos> this is a major change for the giella infrastructure and all its users, meaning a change in hfst should be followed by a new release of hfst coupled with a similar change in the giella infra requiring that version of
13:32 < sjnomos> in addition to hfst-xfst and hfst-lookup, there is also the tool hfst-optimised-lookup
13:33 < Unhammer> isn't that inconsistency rather in lexc tool?
13:33 < Unhammer> I mean, lookup takes a "unidirectional" fst
13:33 < Unhammer> lexc is the one that gives the bidirectional source a direction
13:33 < Unhammer> (what's the xerox equiv of lexc?)
13:34 < sjnomos> lexc@
13:34 < sjnomos> lexc*
13:34 < Unhammer> ok :)
13:34 < Unhammer> lexc©
13:34 < sjnomos> and no, it is not an inconsistency in lexc
13:34 < Unhammer> ok, so xerox lexc creates a generator
13:34 < Unhammer> just like hfst-lexc?
13:34 < TinoDidriksen> The error message is pretty clear, so should be trivial to script a test that tries one direction and sees if that works, then swaps if not.
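The detection idea suggested here could be sketched as a probe on a known pair. This is a hedged illustration only: `detect_apply_up` and the two stand-in functions are hypothetical, and the step of actually calling hfst-xfst is left out because the exact invocation is not given in this log.

```python
def detect_apply_up(apply_up):
    """Classify an 'apply up' implementation using the probe pair a:b.

    `apply_up` is a hypothetical callable that takes a string and returns
    the list of results for a transducer compiled from the pair a:b.
    """
    if apply_up("b") == ["a"]:
        return "xerox"    # lower side in, upper side out, as in xfst and foma
    if apply_up("a") == ["b"]:
        return "swapped"  # upper side in, lower side out
    return "unknown"


# Stand-ins for the two behaviours, both over the single pair ("a", "b").
def xerox_style(s):
    return ["a"] if s == "b" else []


def swapped_style(s):
    return ["b"] if s == "a" else []


print(detect_apply_up(xerox_style))
print(detect_apply_up(swapped_style))
```

A wrapper script would replace the stand-ins with a call to the installed tool and then pick the matching command for the rest of the script.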
13:35 < sjnomos> all tools, xerox’, foma and all hfst tools produce fst’s with the same «directionality», as seen when doing compositions etc
13:35 < sjnomos> the only difference is in doing lookups (using whatever tool suitable)
13:36 < sjnomos> where xerox and foma «look up» from word form to analysis (as typically seen in linguistic text books)
13:36 < sjnomos> and hfst «looks down» for the same input
13:37 < sjnomos> the consequence is that lexc as traditionally written gives the same reading linearity in lexc as in the input to the lookup tool, whereas xerox and foma give the opposite direction
13:38 < sjnomos> I find the hfst logic simpler and more readable, but in the end it all boils down to habits
13:38 < Unhammer> so if the compiled FST file is bidirectional, then it should be possible to just have a switch to hfst-lookup to reverse it?
13:39 < sjnomos> consistency is the most important thing, and if one starts to change one hfst tool, one has to change all
13:39 < sjnomos> yes
13:39 < Unhammer> weird. I would've thought they'd be optimised for one direction
13:39 < sjnomos> bidirectional = transducer (I assume)
13:40 < sjnomos> well, the optimised lookup format is optimised for one direction, but that is just a speed thing
13:40 < sjnomos> and has nothing to do with the fst itself
13:40 < Unhammer> bidirectional as in the binary can be used for form→analysis and analysis→form without running hfst-reverse first
13:40 < Unhammer> so hfst-optimised-lookup does not need a change, but hfst-lookup does?
13:41 < sjnomos> we have been talking about different things
13:41 < sjnomos> I was thinking «bidirectional as having two sides (ie transducers)»
13:42 < Unhammer> I meant as in "there is no preferred direction"
13:43 < Unhammer> e.g. when you run hfst-summarise on a compiled lexc, it uses the terms "input" and "output"
13:44 < Unhammer> $ printf "LEXICON Root\na:b #;"|hfst-lexc |hfst-summarise -v
13:44 < Unhammer> says "a" is an input symbol
13:44 < Unhammer> $ printf "LEXICON Root\na:b #;"|hfst-lexc|hfst-invert |hfst-summarise -v
13:44 < Unhammer> says "b" is an input symbol
13:45 < Unhammer> so it seems to me that the "up" vs "down" distinction is thrown away at compilation, and turned into an input vs output distinction
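The observation about the two hfst-summarise runs can be mirrored in a toy model. This is a minimal sketch, assuming a transducer is a set of (input, output) string pairs and that every character is one symbol (so multi-character symbols such as +N are not modelled). `symbols` and `invert` are illustrative names, not HFST functions.

```python
def invert(fst):
    """Swap the input and output side of every pair."""
    return {(out, inp) for inp, out in fst}


def symbols(fst):
    """Return (input symbols, output symbols) of a set of string pairs."""
    inputs = {ch for inp, _ in fst for ch in inp}
    outputs = {ch for _, out in fst for ch in out}
    return inputs, outputs


# Corresponds to the lexc entry "a:b" in the commands above.
fst = {("a", "b")}

print(symbols(fst))          # "a" is on the input side
print(symbols(invert(fst)))  # after inversion, "b" is on the input side
```

Nothing in the pair set records an «up» or a «down»: only the order within each pair matters, and inversion swaps it.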
13:47 < sjnomos> up vs down has no meaning in compilation either
13:54 < sjnomos> the only instance where «up» and «down» has meaning is when doing lookup (using xfst, the lookup tool or similar)
13:57 < sjnomos> and up and down only makes sense given knowledge about a certain linguistic tradition where one goes from a surface form and «up» to the underlying analysis (cf that syntax trees typically are printed above the
                 sentence in linguistic papers - there is no inherent up or down direction in the sentence, just a printing convention)
14:02 < meriponi> we talked about this for some time in hfst meeting and finally the consensus was that apply up/down should work as they do in xfst and foma
14:03 < meriponi> but it is a big change i agree
14:05 < Unhammer> hm, the only time I've had to think about what's "up" vs "down" is when using xfst; with hfst-lookup I just think "is this an analyser? then the form is the input". So for me, it's very much how the binary was compiled
                  that defines what I expect to be input/output of hfst-lookup.
14:06 < Unhammer> but I don't understand how, when a:b|lexc gives an fst from a→b, composition would work the same, cf
14:06 < Unhammer> <sjnomos> all tools, xerox’, foma and all hfst tools produce fst’s with the same
14:06 < Unhammer>         «directionality», as seen when doing compositions etc
14:09 < meriponi> ok, maybe i'll revert the latest changes
14:09 < meriponi> we clearly need to discuss this with all users before any changes are done :)
14:11 < Unhammer> well, don't base it off my confusion at least; I don't use xerox tools anyway :)
14:14 < meriponi> i think sjnomos had some reservations as well :)
14:17 < Unhammer> mm
14:28 < meriponi> reverted
15:08 < sjnomos> Unhammer: from the fsm-book: «When we enter a word (as a string of symbols) into a network, to see if it is contained in the language of the network, we often talk of LOOKING UP the word. From this point of view, the
                 network is a kind of dictionary.»
15:08 < sjnomos> The default is to look up word forms, thus that is the unmarked case. To do the same for generation, one has to invert the net (or, in xfst/foma/hfst-xfst, do a lookdown, ie apply down)
15:08 < sjnomos> that is, for the specialised lookup tools, one always has to invert the net

-- ErikAxelson - 2016-04-05

Topic revision: r1 - 2016-04-05 - ErikAxelson