CKT276 (Ikk323 (Kpk392)) Konekäännös / Machine translation

Käsitteitä ja termejä / Concepts and terms

Wikipedia, s.v. Text corpus

Wikipedia, s.v. Parallel text

EAGLES 1996

Monikielisten korpusten hankinta / Multilingual corpus acquisition

Resnik 2003: The Web as a parallel corpus

Varga 2005: A Rapid Development Toolchain for Building Large Parallel Corpora

Monikielisten korpusten sovellukset / Applications of multilingual corpora

Jörg Tiedemann, Uppsala: Recycling Translations

Dyvik: From Parallel Corpus to Wordnet

Jennifer Spenader, Groningen: Parallel Corpora (WSD)

Knight/Koehn 2003 What is new in SMT (Tutorial)

Salkie 2003: Using parallel corpora in translation

Korpuslinkkejä / Corpus links

Wikipedia: some notable text corpora

Richard Zhonghua Xiao: Corpus survey

Michael Barlow: Text corpora and corpus linguistics

David Lee: Bookmarks for Corpus-based Linguists

Manuel Barbera: Multilingual and Parallel Corpora

Yvonne Breyer: Gateway to Corpus Linguistics

Europarl corpus

JRC-Acquis corpus

EUROVOC thesaurus

University of Maryland Parallel Corpus Project: The Bible

Mona Baker: Translational English Corpus (TEC)

OPUS - an open source parallel corpus

Corpus links for the MTChallenge course (Säde Seppälä)

Tomas Erjavec

Formaatteja / Formats

TEI (Text Encoding Initiative) http://www.tei-c.org/

EAGLES (X)CES http://www.cs.vassar.edu/XCES/

AG (Annotation Graph) http://arxiv.org/abs/cs.CL/9907003 http://agtk.sourceforge.net/ http://www.ldc.upenn.edu/AG/doc/xml/alignment.xml

TIGER http://www.ims.uni-stuttgart.de/projekte/TIGER/

TUSNELDA, Tübingen 2001 http://www.sfb441.uni-tuebingen.de/c1/tusnelda-guidelines.html

LISA TMX http://www.lisa.org/standards/tmx/tmx.html

Tiedemann diss. http://stp.ling.uu.se/~joerg/phd/html/node7.html

MAF/SYNAF (ISO proposals, Lirics project) http://lirics.loria.fr/doc_pub/MAF_SynAF.pdf http://tc37sc4.org/new_doc/jeju/SynAF,%20ISO%20NWI.ppt

BLEU SGML format

TIPSTER http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/

Korpustyökaluja / Corpus tools

TEI software http://www.tei-c.org/Software/
o TEI Wiki (including XSLT scripts) http://www.tei-c.org.uk/wiki/index.php/Main_Page
o Relax NG resources http://www.tei-c.org/release/xml/tei/schema/relaxng/

AGTK: Annotation Graph Toolkit 2004 http://agtk.sourceforge.net/

jATLAS 2003 http://jatlas.sourceforge.net/

TIGERSearch 2003 http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/

OmegaT open source translation memory tool

MMAX http://www.eml-research.de/english/research/nlp/download/mmax.php

WebCon (W3C) command-line tool http://www.w3.org/ComLine/

GNU wget http://www.gnu.org/software/wget/wget.html

Tilastollinen konekäännös / Statistical machine translation

http://www.statmt.org/

Kohdistus, linjaus / Alignment

Wikipedia http://en.wikipedia.org/wiki/Sequence_alignment

Rada Mihalcea: Sentence Alignment and Word Alignment: Projects, Papers, Evaluation, etc.

TEI P5 on alignment http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html

Multext CESALIGN 1996 http://aune.lpl.univ-aix.fr/projects/multext/CORP/MUL4.alig.html

Vanilla http://nl.ijs.si/telri/Vanilla/

hunalign http://mokk.bme.hu/resources/hunalign

Dekoodaus / Decoding (i.e. generating the translation)

ISI rewrite decoder http://www.isi.edu/licensed-sw/rewrite-decoder/

Pharaoh beam search decoder http://www.isi.edu/licensed-sw/pharaoh/

Moses stack decoder http://www.statmt.org/moses/

Papereita / Papers

Jurafsky/Martin Machine Translation chapter www.cs.colorado.edu/~martin/SLP/Updates/24.pdf

Och, F.J. 2003. Minimum error rate training in statistical machine translation. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, July 2003, pp. 160-167. http://acl.ldc.upenn.edu/acl2003/main/pdfs/Och.pdf

Och, F. J. and Ney, H. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist. 29, 1 (Mar. 2003), 19-51. DOI= http://dx.doi.org/10.1162/089120103321337421 http://www.mitpressjournals.org/doi/pdf/10.1162/089120103321337421

Germann, U., Jahr, M., Knight, K., Marcu, D., and Yamada, K. 2001. Fast Decoding and Optimal Decoding for Machine Translation. Proceedings of ACL-01. Toulouse, France. http://www.isi.edu/natural-language/projects/rewrite/decoder.pdf

Germann, U. 2003. Greedy Decoding for Statistical Machine Translation in Almost Linear Time. Proceedings of HLT-NAACL-2003. Edmonton, AB, Canada. http://www.isi.edu/natural-language/projects/rewrite/germann-hlt-naacl-03.pdf

Barzilay/Lee 2002: Bootstrapping Lexical Choice via Multiple-Sequence Alignment. EMNLP, ACL http://www.cs.cornell.edu/home/llee/papers/gen-msa.pdf

Berger, A. L., Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., Gillett, J. R., Lafferty, J. D., Mercer, R. L., Printz, H., and Ureš, L. 1994. The Candide system for machine translation. In Proceedings of the Workshop on Human Language Technology (Plainsboro, NJ, March 08 - 11, 1994). Human Language Technology Conference. Association for Computational Linguistics, Morristown, NJ, 157-162. http://portal.acm.org/ft_gateway.cfm?id=1075844&type=pdf&coll=GUIDE&dl=GUIDE,&CFID=11627168&CFTOKEN=74689349

Bilmes, Jeff. 1998. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. ICSI, Berkeley.

Och, Franz Josef. 2003. Minimum error rate training in statistical machine translation. ACL 41, 2003.

Deng, Yonggang, Byrne, William. 2005. HMM word and phrase alignment for statistical machine translation. HLT/EMNLP, Vancouver 2005, 169-176.

Long, Philip, Servedio, Rocco, Simon, Hans Ulrich. 2007. Discriminative learning can succeed where generative learning fails. Preprint.

Moore, R. 2005. A discriminative framework for bilingual word alignment. HLT/EMNLP, Vancouver 2005.

Työkalupakkeja / Toolkits

CMU-Cambridge toolkit 2.5 1999 http://mi.eng.cam.ac.uk/%7Eprc14/toolkit.html

EGYPT Johns Hopkins 1999 http://www.clsp.jhu.edu/ws99/projects/mt/toolkit/

GIZA++ 2003 http://www.fjoch.com/GIZA++.html

SRI LM toolkit http://www.speech.sri.com/projects/srilm/download.html

ISI rewrite decoder http://www.isi.edu/licensed-sw/rewrite-decoder/

Moses stack decoder http://www.statmt.org/moses/

Melamed 2005: SMT by parsing http://nlp.cs.nyu.edu/GenPar/

Oppaita / Guides

GIZA manual http://mt.schtuff.com/giza_manual

Open Source Toolkit for Statistical Machine Translation JHU Summer Workshop 2006 http://www.statmt.org/jhuws/?n=Moses.Documentation

Getting started with Moses software http://www.statmt.org/jhuws/?n=Moses.GettingStarted

Moses SMT training overview http://www.statmt.org/moses/?n=FactoredTraining.Overview

Moses stack decoder tutorial http://www.statmt.org/moses/?n=Moses.Tutorial

SMT Quick Run http://ufal.mff.cuni.cz/~curin/SMT_QuickRun/

Guardiani Moses Howto v2 http://www.guardiani.us/index.php/Moses_Language_Model_Howto_v2

Projekteja / Projects

MULTEXT http://www.lpl.univ-aix.fr/projects/multext/

MULTEXT-East http://www.tei-c.org/Applications/mu04.xml

Konferensseja / Conferences

EU Enlargement Workshop 2005 http://www.jrc.cec.eu.int/langtech/0509_EU-Enlargement-Workshop.html

ACL 2005 workshop http://www.statmt.org:8080/wpt05/

HLT/NAACL Workshop 2003 http://www.cs.unt.edu/~rada/wpt/

Bibliografioita / Bibliographies

Michael Barlow http://www.athel.com/corpus_bibliography.html

Jean Veronis Parallel text processing bibliography http://www.up.univ-mrs.fr/~veronis/biblios/ptp.htm

-- TeroAalto - 16 Jan 2007

Topic revision: r24 - 2010-03-17 - LauriCarlson
 