TODO

Planning and task assignment scratchpad

Things to do first:

  1. Establish this MoltoWeb on the CSC TWiki, analogous to Clarin, to hold everything local to the UHEL side of the project
    1. List of UHEL people with contact info, working days, hours, room and tasks on page HyMoltoRoomInfo
      1. Seppo Nyrkkö "sub-project manager"
      2. Lauri E. Alanko (started)
      3. Inari Listenmaa (started)
      4. Mirka "Hissu" Hyvärinen (started)
    2. Set up a project calendar with known deadlines and other dates
    3. Decide on internal comm: mailing lists or what? Mailman?
  2. Update our info on the MOLTO website www2.molto-project.eu
    1. staff and roles (who to contact on what)
    2. work package descriptions
  3. Create a working environment on hippu.csc.fi (or other?) shareable to people on the project:
    1. subversion or other versioning (Bazaar?)
    2. Grammatical Framework
    3. OWLIM/Sesame
    4. Stanford Parser
    5. anything else that is needed
    6. molto group on hippu
  4. Create comm lines to dev and user sites (liaison persons in each site)

More things TODO 26.04.2010

Evaluation platform on Hippu (planning assigned to Inari 26.04.2010)

When Hippu is up and running, we can start building a local evaluation site. It will be populated with generic and project-specific tools for evaluating translation quality. There are two classes of tools to consider:

  1. Statistical evaluation packages like BLEU
  2. Language technology based evaluation tools

The aim is to concoct a mixture that is useful for evaluating MOLTO translations.
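
As a reminder of what a statistical metric of the first class computes, here is a minimal sketch of corpus-level BLEU in Python. It assumes one reference per segment, pre-tokenised input and no smoothing. It is not the off-the-shelf BLEU script, and its scores may differ from that script because of tokenisation and smoothing details.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing.

    Both arguments are lists of token lists of equal length.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(corpus_bleu([hyp], [ref]))
```

Because the unsmoothed score drops to zero whenever one n-gram order has no matches, it is only meaningful over a whole test corpus, not for single short sentences.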

Planning

The aim is to be able to run standard machine translation evaluation tools, such as BLEU and the tools made by the Barcelona group, on MOLTO translations without effort. Later on, the plan is to modify them so that they give results on MOLTO translations that are as illuminating as possible. Among other things, this may require adding suitable complementary language technology to the evaluation procedure, for example for Finnish morphology.

- First set up a suitably named subsite under the evaluation work package site in the MOLTO UHEL Wiki, and collect there available articles, addresses of evaluation tools, links to user guides, tips etc.

- Also make a page where, based on the above survey, you describe a directory structure on Hippu for the available tools and test corpora.

Implementation

Steps:

  1. Create a directory structure on Hippu to hold evaluation tools and corpora
  2. Download off-the-shelf tools such as the BLEU script
  3. Plan directory structure for working on corpora in language pairs to be evaluated in MOLTO
  4. Provide test corpora for calibration of tools for the language pairs
  5. Calibrate the off-the-shelf tools (make sure we get reference results on reference test corpora)
  6. Calibrate tools for MOLTO corpora
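
Steps 1 and 3 above could be sketched roughly as follows. The directory names (`tools`, `corpora`, `results`, `reference`, `molto`) and the language pairs are placeholders chosen for illustration, not decisions made by the project.

```python
from pathlib import Path

# Hypothetical names; adjust once the real layout has been agreed on.
TOOLS = ["bleu", "iqmt"]
CORPUS_KINDS = ["reference", "molto"]


def create_layout(root, language_pairs):
    """Create the evaluation tree under root and return the directories.

    language_pairs is a list of (source, target) language codes.
    """
    root = Path(root)
    dirs = [root / "tools" / name for name in TOOLS]
    for source, target in language_pairs:
        pair = f"{source}-{target}"
        for kind in CORPUS_KINDS:
            dirs.append(root / "corpora" / kind / pair)
        dirs.append(root / "results" / pair)
    for directory in dirs:
        # exist_ok makes the script safe to re-run when pairs are added.
        directory.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    for directory in create_layout("evaluation", [("en", "fi"), ("en", "sv")]):
        print(directory)
```

Keeping reference corpora apart from MOLTO corpora mirrors steps 5 and 6: the tools are first checked against reference data and only then applied to MOLTO data.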

References:

TODO 25.05.2010

MOLTO tools, including third-party evaluation tools, should be in place. While waiting for MOLTO-specific corpora, one way to get to know the tools and calibrate them for MOLTO use is to do a trial evaluation of Maarit's data. So let us look into that now.

  1. Get the sources, the MT and the human translations
  2. Convert to format suitable for processing
  3. Run off-the-shelf tests
  4. Calibrate against human evaluation
  5. Locate difficulties from MOLTO point of view
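
One possible reading of step 4 is to check how well the automatic scores agree with the human judgements on the same segments. Below is a minimal sketch using Pearson and Spearman correlation in plain Python. The pairing of one metric score with one human score per segment is an assumption about how the data will look.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Spearman is the safer choice when human scores are on an ordinal scale (for example 1 to 5), since it only depends on the ordering of the segments.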

-- LauriCarlson - 2010-04-26

TODO 29.06.2010

Summer plans:

Evaluation WP

  • Hissu to get museum data from Gothenburg
    • check if it is good enough for doing evaluation testing over summer
    • do statistics on it (for statistics course and for September report)

  • Inari, Hissu, Maarit: think creatively about evaluating MOLTO, specifically:
    • can we use MOLTO to evaluate its own output, viz:

  • Combine GF grammars and SMT tools
    • general idea to use simpler more reliable parts to evaluate the whole
    • for instance:
      • back translate
      • simplify and translate == translate and simplify

  • Instead of, or in addition to, comparing output with human model translations, evaluate
    • coverage (with lexicon editing and out-of-vocabulary guessing)
    • throughput (with pre- and postediting) on the forthcoming platform
      • structured post-editing (menus for lexical choice, fix concordances after lexical substitution)
    • fluency (assume correctness) by doing SMT language modeling

  • Collect and summarise previous literature on these topics!

  • Find out from Jyrki how to use the English-Finnish WordNet (IQMT METEOR)
    • How does IQMT METEOR use WordNet?
    • IQMT METEOR is not working at the moment.

  • Stanford parser / Turku parser to IQMT?

  • OMORPH usability
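
The back-translation idea in the list above can be sketched as follows: translate a sentence forward and back, then measure how much of the original survives. The translators here are toy word-by-word lookups with a two-word lexicon, purely for illustration. In practice `forward` and `backward` would wrap a GF grammar or an SMT system, and token F1 is only one possible similarity measure.

```python
from collections import Counter


def token_f1(original, round_trip):
    """Bag-of-words F1 overlap between two token lists."""
    if not original or not round_trip:
        return 0.0
    overlap = sum((Counter(original) & Counter(round_trip)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(round_trip)
    recall = overlap / len(original)
    return 2 * precision * recall / (precision + recall)


def round_trip_score(sentence, forward, backward):
    """Score how well a sentence survives source -> target -> source."""
    back = backward(forward(sentence))
    return token_f1(sentence.split(), back.split())


def make_word_translator(lexicon):
    """Hypothetical stand-in for a real system: word-by-word lookup."""
    def translate(sentence):
        return " ".join(lexicon.get(word, "<unk>") for word in sentence.split())
    return translate


if __name__ == "__main__":
    en_fi = {"red": "punainen", "wine": "viini"}
    fi_en = {target: source for source, target in en_fi.items()}
    forward = make_word_translator(en_fi)
    backward = make_word_translator(fi_en)
    for sentence in ["red wine", "red cheese"]:
        print(sentence, round_trip_score(sentence, forward, backward))
```

A low round-trip score shows that something was lost, but not in which direction. A high score does not prove the intermediate translation is good either, so this can only complement comparison with human translations.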

Translator's tools WP

  • Collect to Wiki and summarise literature on constrained language editing

  • Survey web-based translation editing platforms for facilities to copy in MOLTO
    • can we find an existing platform that can be adapted for MOLTO?
    • should have translation memory

  • Find out from Aarne's gang where we are with the MOLTO editing tools API
    • what to expect in the API
    • when available

Ontologies and lexicon

  • Seppo to find out Ramona's and the Bulgarians' plans concerning ontologies

  • lealanko to connect up with the TF guys (Junyou, Niklas) about setting up a TF test community
    • for the summer, perhaps enough to work from cs over VPN within the university net

TranslationTermEditor

TFS server tfs.ling.helsinki.fi

tfs.ling.helsinki.fi description for admins

Topic attachments
  • sumo_sources.zip (23.2 K, 2010-08-07 20:05, UnknownUser): main files for the SUO to GF conversion
Topic revision: r7 - 2011-06-16 - SeppoNyrkko
 