Info on the MOLTO website

WP descriptions

UHEL leads two work packages: WP3 and WP9.

WP3: Translator's Tools

Conversions from external data sources into GF format

UHEL will provide means to import terminology and structure data from the Content Factory project.

The interfaces from the Translator's tools to GF will be defined. Many aspects are to be considered:

  • describing grammar by examples
  • defining new syntactic structures

The Translator's tools will be developed on a web platform. User-friendliness will be a major aspect of the design. We must keep the tools comfortable for experienced translators.

Writing MT-friendly text in a controlled language such as Simple English (as used on simple.wikipedia.org) can also be seen as a use case for GF-based multilingual authoring.

WP9: User Requirements and Evaluation

My comments and questions are in bold font. I'm in a bit of a simple wikipedia mode, and I don't think that's a bad thing. --Inari

Requirements for work

The work will start with collecting user requirements for the grammar development IDE (WP2), translation tools (WP3), and the use cases (WP6-8).

  • In which month will it start? Who are the users and where do we find them? --Inari
  • This is assumed to start immediately and to be an ongoing process. It is related to deliverable D9.1 --Seppo

We will define the evaluation criteria and schedule in synchrony with the WP plans (D9.1). We will define and collect corpora, including diagnostic and evaluation sets: the former to improve translation quality along the way, the latter to evaluate the final results.

  • What does "define and collect" actually mean? I understood that those who work on the use cases will themselves acquire the corpora they need: UPC will get the math corpora, Ontotext will get the patents and so on. Does this mean that we are the ones who decide which parts of the corpora are used for development, diagnostics and evaluation? --Inari

Corpus definitions

Each corpus available for MOLTO will be described by the providing project members.

The corpora will be split for development, diagnostic and evaluation use.

  • Who will split them? Is it a task for WP9 to split them all, or does each organization decide themselves? --Inari
  • Each organization should be consulted about the splitting --Seppo
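One way to make a three-way corpus split reproducible, whoever ends up doing it, is to assign each segment to a split from a hash of its identifier. The following is only a minimal sketch: the 80/10/10 proportions, the `SPLITS` table, the function names and the segment identifiers are all hypothetical, since this page does not fix any of them.

```python
import hashlib

# Hypothetical proportions; the plan does not specify how large each split is.
SPLITS = (("development", 0.8), ("diagnostic", 0.1), ("evaluation", 0.1))


def assign_split(segment_id: str) -> str:
    """Map a segment identifier to a split name, deterministically."""
    digest = hashlib.sha256(segment_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, share in SPLITS:
        cumulative += share
        if fraction < cumulative:
            return name
    return SPLITS[-1][0]  # guard against floating-point rounding


def split_corpus(segment_ids):
    """Group segment identifiers by the split they are assigned to."""
    result = {name: [] for name, _ in SPLITS}
    for segment_id in segment_ids:
        result[assign_split(segment_id)].append(segment_id)
    return result


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    parts = split_corpus(f"patent-{i}" for i in range(1000))
    for name, members in parts.items():
        print(name, len(members))
```

Because the assignment depends only on the identifier, a segment stays in the same split when the corpus grows, so evaluation data does not leak into development data between runs.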

Contact persons will be named for questions and requests for each corpus.

  • I suppose the contact persons are from the project members that will do the tests: for math corpus someone from UPC, etc? --Inari
  • I suppose there are also contact persons to be named at Matrixware (related to patents) and UGOT (related to cultural heritage) --Seppo

Storage locations and access protocols for this corpus data will be defined.

Description of end-user workflow

The translator's new role (parallel to WP3: Translator's tools) will be designed and described (when?). Most current translator's workbench software treats the original text as a read-only source. The tools to be developed within WP3 (and WP2) will lead towards a more mutable role for the source text. The translation process will resemble structured document editing or multilingual authoring more than transformation from a fixed source to a number of target languages.

  • Sounds good: now we don't really have to design and describe anything, just say that we will at some point. What's the deadline, the first WP9 deliverable or what? --Inari
  • Yes, definitely D9.1 at Month 6 --Seppo

We will only provide a basic infrastructure API for external translation workbenches and keep an eye on the "new multilingual translator's workflow".

Introduction of WP liaison persons and other contacts

For each work package, the liaison contact information and work progress will be kept up-to-date on the MOLTO web site.

  • Good, you make us sound important. In theory, I guess it should be the responsibility of anyone who is doing something to report the progress, but in practice, nobody will do it anyway. So this is Hissu's job, to ask people "what are you doing? when will it be ready?"? We could mention that we have a contact person, Mirka Hyvärinen, who everybody should be in contact with. --Inari
  • Yes, we should put Hissu on the MOLTO site. --Seppo

Access to UHEL's internal working wiki, "MOLTO kitwiki", will also be granted to other project members upon request.

Evaluation of results

Evaluation aims at both quality and usability aspects. UHEL will develop usability tests for the end-user human translator. The MOLTO-based translation workflow may differ from the traditional translator's workflow. This will be discussed in the D9.1 evaluation plan.

To measure the quality of MOLTO translations, we will compare them to (i) statistical and symbolic machine translation (Google, SYSTRAN) and (ii) professional human translation. We will use both automatic metrics (IQmt and BLEU; see section 1.2.8 for details) and the quality criteria of TAUS (Translation Automation User Society). As MOLTO focuses on information-faithful, grammatically correct translation in special domains, the TAUS results will probably be more important.
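To give a rough idea of what an automatic metric like BLEU measures, here is a simplified sentence-level sketch. It assumes a single reference translation, whitespace tokenization and no smoothing, and the function names are made up for this illustration; it is not the IQmt toolkit or any official BLEU implementation, and real evaluations would use an established tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # The brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on the mat", reference))
    print(bleu("the cat sat on the mat today", reference))
```

The score rewards n-gram overlap with the reference only, which is one reason the human-judged TAUS criteria may matter more for grammar-based translation in special domains.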

Given MOLTO's symbolic, grammar-based interlingual approach, scalability, portability and usability are important quality criteria for the translation results. For the translator's tools, user-friendliness will be a major aspect of the evaluation. These criteria will be quantified in D9.1 and reported in the final evaluation (D9.2).

In addition to the WP deliverables, there will be continuous evaluation and monitoring with internal status reports according to the schedule defined in D9.1.

-- InariListenmaa - 2010-03-17

Topic revision: r8 - 2010-04-16 - InariListenmaa
 