FIN-CLARIN Annual Report for the Year 2010

FIN-CLARIN had two main tasks in 2010: (1) supporting the completion of the preparatory phase project of the EU-wide Common Language Resource and Technology Infrastructure (CLARIN), and (2) preparing and beginning to build the Finnish national language resource infrastructure.

In 2010, FIN-CLARIN received 500,000 euros in funding from the Ministry of Education and Culture of Finland as part of the ordinary budget of the University of Helsinki.

Licensing conventions and standard license templates were established to serve the needs of both the European CLARIN and FIN-CLARIN.

FinnWordNet was checked for obvious typos and inconsistencies. It was then converted into a format that retains the correspondence between the Finnish and the English synonym sets (synsets), so that the relations of the English synsets are also available for the Finnish synsets. The initial version of FinnWordNet was published with a web interface in the summer of 2010.
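The idea of reusing English relations for Finnish synsets can be illustrated with a small data structure. The sketch below is not FinnWordNet's actual format. It assumes a one-to-one alignment in which each Finnish synset is stored under the identifier of the English synset it translates. The class names, the synset identifiers and the sample lemmas are all hypothetical and serve only as illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    lemmas: list[str]


@dataclass
class AlignedWordNet:
    # Hypothetical layout: English synsets and the typed relations between them.
    english: dict[str, Synset] = field(default_factory=dict)
    relations: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    # Assumption: each Finnish synset is keyed by its English counterpart's id.
    finnish: dict[str, Synset] = field(default_factory=dict)

    def finnish_relations(self, fi_id: str) -> list[tuple[str, list[str]]]:
        """Return (relation, Finnish lemmas of the target) for a Finnish synset
        by following the relations recorded for its aligned English synset."""
        result = []
        for rel, target in self.relations.get(fi_id, []):
            target_fi = self.finnish.get(target)
            if target_fi is not None:  # skip targets with no Finnish synset
                result.append((rel, target_fi.lemmas))
        return result


# Illustrative usage with made-up sample data.
wn = AlignedWordNet()
wn.english["dog.n.01"] = Synset("dog.n.01", ["dog", "domestic dog"])
wn.english["canine.n.02"] = Synset("canine.n.02", ["canine", "canid"])
wn.relations["dog.n.01"] = [("hypernym", "canine.n.02")]
wn.finnish["dog.n.01"] = Synset("dog.n.01", ["koira"])
wn.finnish["canine.n.02"] = Synset("canine.n.02", ["koiraeläin"])
print(wn.finnish_relations("dog.n.01"))  # [('hypernym', ['koiraeläin'])]
```

Because relations are stored only once, on the English side, the Finnish synsets inherit them through the alignment without duplicating the relation graph.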

The initial 17,000-sentence grammar definition corpus of the Finnish Treebank was manually annotated by the end of 2010. A call for proposals was sent to several language technology companies, inviting offers to produce syntactic annotation for some 70 million words of Finnish text (Europarl and JRC-Acquis). The parsing is scheduled to be completed in the spring of 2011.

The AAI tool was completed in the summer of 2010 and will be put into active service as part of CSC's new Scientist's Interface.

The materials in the language bank at CSC (Kielipankki) and various corpora from the University of Helsinki were provided with CLARIN metadata so that they could be included in CLARIN-compatible services. At the same time, FIN-CLARIN took on overall responsibility for developing the language bank into a CLARIN-compatible national service for wider user groups.

The Helsinki Finite-State Transducer Technology (HFST) project prepared version 3 of its finite-state software tools, which now include Foma, a program that implements a replace rule formalism similar to that of the Xerox XFST system. HFST version 3 was released in early spring 2011.

