Finnish-Swedish Machine Translation Challenge 2006
T-61.6090 Special course in Language Technology V P

This is the KitWiki home page of the course. See also the official course page at the Helsinki University of Technology, which has the course description and other information, and the course's entry in the KIT Network course listing: KitT616090Course.

People at the course

The supervisors are KimmoKoskenniemi, TimoHonkela, LauriCarlson, MathiasCreutz, KristerLinden, VilleKononen (and the corresponding TWiki group is MtChallenge2006SupervisorGroup).

The participants were organized into three groups:

The supervisors and the three groups together make up the TWiki group MtChallenge2006Group.

There is also a separate page listing the names and email addresses of the participants. That page is visible only to the supervisors and the participants, and it can be used for copying and pasting mail addresses when sending messages to individual groups, to the supervisors, or to everybody on the course.


The course is over. Take a look at this brief presentation of the returned systems and the evaluation results.

Materials and tools available

In principle, there are no restrictions on the methods, materials and tools used to produce a machine translation prototype, except that one must obey the terms of the relevant licenses and agreements. Because not all materials have been published or completed yet, more detailed information is kept on a separate page. That page has now been updated with information about the training and test data of the challenge, and it also describes the text filtering that will be used in the final evaluation. (-- MathiasCreutz - 20 Oct 2006)

Potentially useful background information can be found on another page that collects MT-related texts.

You can run some linguistic tools (software) on

  • Connexor's Functional Dependency Grammar Parsers for both Finnish and Swedish (fi-fdg, sv-fdg)
  • Lingsoft's morphological analyzer FINTWOL (fintwol). SWETWOL is not available, though.
  • Lingsoft's constraint grammar for Swedish (swecg).

If you want to use the purely data-driven morpheme segmenter Morfessor, contact MathiasCreutz. Good descriptions of other statistical tools (GIZA++, SRI language modeling toolkit, etc.) can be found on the page of the third group.

Software can be installed in locations other than your CSC home directories. This saves disk space for you and gives the other groups access to the tools. Anssi Yli-Jyrä has written some instructions on where to install software and corpora and which disk space you can use.


The videos from the opening seminar of the course are available on the course page at

Events and milestones

  • Schedule
  • Past:
    • Monday 25.9.2006: Initial version of the KitWiki pages for the course released. -- KimmoKoskenniemi - 25 Sep 2006 - 10:11
    • Friday 22.9.2006: Kick-off event at HUT.
    • Monday 27.11.2006: The test data was released, i.e., the Finnish sentences that your system was expected to translate into Swedish. Correct Swedish answers were not provided. Read more on the materials page.
    • Sunday 7.1.2007: Deadline for the teams to return their Swedish translations of the Finnish test data. Each team could submit up to three different solutions if it had several variants of its system and could not decide on a single best one.
    • Wednesday 10.1.2007 (afternoon): The final meeting of the course. The participating teams presented their systems and handed in their reports. The winning team was announced, and prizes were handed out by representatives of the companies.
  • Page for information and links on MT related texts created. -- TimoHonkela - 25 Sep 2006 - 17:36

Questions and answers

  • Question about CSC resources: We got the user accounts for, which is great. How and where can we run computationally expensive processes? Running GIZA++ requires quite a lot of memory and cpu time. -- JaakkoVayrynen - 28 Sep 2006 - 11:13
  • Question about evaluation: Can we assume that both the source text and target text are in lowercase? Are there other considerations (e.g. coding) that we should take into account? -- JaakkoVayrynen - 27 Sep 2006 - 21:07


-- KimmoKoskenniemi - 24 Sep 2006

