HFST:Palindromes

Under construction...

We examplify the use of HFST command line tools with an example taken from Beesley & Karttunen that creates a transducer that extracts palindromes from a given file. $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script .

The solution given here differs from Beesley & Karttunen's one, because HFST has no compile-replace operator. Instead, we splice a transducer into a stream of transducers with a pair of commands HfstFst2Strings and HfstStrings2Fst and apply the needed operations separately for each transducer.

We assume that /usr/dict/words, a 23K English word list, is available on the machine.

We first construct BidirEnglish that contains all the words whose reverse is also a word of English, for example, "madam" and "dog". We wish to keep "madam" (reversed "madam") and eliminate "dog" (reversed "god").

cat /usr/dict/words | hfst-strings2fst -f $FORMAT -j > English

We intersect English with its reverse. We only take into account words that contain at least two characters. (Words like "a" and "I" are not interesting here.)

echo '[? ?+]' | hfst-regexp2fst -f $FORMAT > MinTwoChars
hfst-reverse English | hfst-conjunct English | hfst-conjunct MinTwoChars > BidirEnglish

Next we print all strings recognized by BidirPaths and construct a stream of transducers where each transducer contains one string recognized by BidirPaths. Then we create a corresponding stream of reversed transducers and conjunct the streams. The resulting stream will contain transducers that represent a palindrome (e.g. "madam") and empty transducers (e.g. the transducers resulting from the intersection of "dog" and "god" and vice versa).

hfst-fst2strings BidirEnglish | hfst-strings2fst -f $FORMAT > BidirPaths
hfst-reverse BidirPaths > BidirPathsReversed
hfst-conjunct BidirPaths BidirPathsReversed > Palindromes

We finally print all palindroms. Empty conjunctions are printed as empty strings so we only need to filter out the lines that separate strings coming from separate transducers, i.e. lines --.

hfst-fst2strings Palindromes | grep -v "\-\-"


-- ErikAxelson - 2011-09-19