Minimal Computing Environment in Teaching of Language Technology

Introduction

Purpose: This page collaboratively describes the minimal computing requirements of a learning environment for teaching language technology, at least at the University of Helsinki. It is distinct from a "nice to have" wish list.

Editing: Collaborative editing is encouraged.

Acknowledgements and sources: JussiPiitulainen, MiettaLennes, DonKillian, KimmoKoskenniemi; http://www.ling.helsinki.fi/atk/perusohj/index.shtml.

Strategies for Software Administration

Software per Course

| Need | Software | Courses |
| connectivity | ssh/PuTTY, scp/WinSCP, NoMachine/WinAxe | many |
| viewing files | AcroRead, Preview, Firefox | many |
| file archiving | gzip, tar, 7-Zip, xz |  |
| editing text files | emacs | many |
| publishing | TeX, OpenOffice | many |
| processing text files | UNIX tools | many |
| statistics | R, (SPSS) | clt255 |
| speech processing | Praat, Elan | ... |
| morphological analyzers | twol, omorfi, fdg, etc. | clt261 |
| syntactic parsers | VISL CG-3 |  |
| FST tools | HFST, SFST, XFST, foma, fsmlibrary, Graphviz | clt271 |
| translation memories | SDL Trados, Swordfish | clt... |
| Java tools | JDK, JEdit, GATE | clt... |
| Lisp tools | LKB | clt... |

Software needed in Sali 25 (Windows 7), laptops or at CSC

The following table attempts to list the software that is currently used in teaching LT in Finland. The list is still partial: it covers neither all universities nor all courses.

| Software | Machine for running | Administration | Teachers | Good! | Notes |
| iTerm2 | Mac | Owner | AY |  |  |
| ssh | Linux/ workstation | standard | * |  | Colors and fonts really bad on projected screen |
| PuTTY | Windows workstation | AD | * |  | Colors and fonts really bad on projected screen |
| emacs | hippu.csc.fi | AD | * |  | Must add instructions for disabling colors on the screen |
| TeX | Linux/Mac/Windows workstation | student | * |  | Combine with a PDF viewing method |
| TeX | hippu.csc.fi | CSC | AY, JP |  | Combine with a PDF viewing method |
| scp | Linux/Mac workstation | standard | * |  |  |
| WinSCP | Windows workstation | AD | * |  |  |
| AcroRead | hippu.csc.fi | CSC | * |  | Not available |
| AcroRead | Linux workstation | standard? | * |  | Reopen if PDF is overwritten |
| AcroRead | Windows workstation | AD | * |  | Reopen if PDF is overwritten |
| Preview | Mac workstation | standard | * |  | May crash if file changes |
| Firefox | hippu.csc.fi | CSC | JP |  | Requires NoMachine |
| HFST | hippu.csc.fi | CSC & HFST | AY, TP |  | Check! Not yet in the list of application software supported by CSC |
| xz (file compressor) | hippu.csc.fi | CSC | AY, TP |  | Check! Recommend GZIP to HFST to avoid this. |
| foma | hippu.csc.fi | CSC | AY |  | Requires Graphviz and a viewer via NoMachine - not yet solved |
| foma | Windows workstation | AD | AY |  | Problems with Graphviz output if it is not installed |
| VISL CG-3 | ? | ? | AV |  |  |
| R | hippu.csc.fi | CSC | AY |  | Plots? |
| R | Windows workstation | AD | AY |  | May require moving data from hippu.csc.fi |
| RStudio | Windows workstations | UEF | SW |  | Structured layout helps in teaching R |
| RStudio | Linux workstations | UEF | SW |  | Structured layout helps in teaching R |
| Graphviz | hippu.csc.fi | CSC | AY |  | Requires NoMachine |
| OpenOffice | Windows workstations | AD | * |  |  |
| SDL Trados | Windows workstations | AD | LC |  | Licenses |
| SwordFish | Windows workstations | AD | LC |  | Licenses |
| Praat | Windows workstations | AD | ML |  | Audio support needed |
| Praat | Linux workstations | AD | ML |  | Audio support needed |
| Elan | Linux workstations | AD | ML |  | Audio support needed |
| Elan | Windows workstations | AD | ML |  | Audio support needed |
| NoMachine | Windows workstations | AD | AY |  | Redundant with WinAxe?? |
| WinAxe X-sessions | Windows workstations | AD | GW |  | Redundant with NoMachine?? |
| 7-Zip | Windows workstations | AD | GW |  |  |
| Java SE Development Kit (JDK) v6 | Windows workstations | AD | GW |  |  |
| JEdit (Java editor) | Windows workstations | AD | GW |  |  |
| GATE (Java NLP) | Windows workstations | AD | GW |  |  |
| LKB (Lisp NLP) | Windows workstations | AD | GW |  | What about LISP? |
| Python v2.7 (not 3.x!) | Windows workstations | AD | GW |  |  |
| xsltproc | hippu.csc.fi | CSC |  |  |  |

Note: Due to problems with 64-bit and 32-bit software compatibility on Windows 7, I strongly recommend using *only 32-bit software* for now.

Python software

Python version 2.7 (*NOT version 3.x*) http://www.python.org/download/releases/2.7.3/

NLTK version 2.0 (instructions at http://nltk.org/install.html) http://pypi.python.org/pypi/nltk

NLTK Corpora (instructions at http://nltk.org/data.html) Download the "book" collection of packages. Shared central installation at C:\nltk_data.

PyYAML http://pyyaml.org/wiki/PyYAML

NumPy http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/numpy-1.6.2-win32-superpack-python2.7.exe

Matplotlib matplotlib-1.2.0.win32-py2.7.exe from https://github.com/matplotlib/matplotlib/downloads/

Django 1.3 Django-1.3.4.tar.gz from https://www.djangoproject.com/download/

wxPython wxPython2.8-win32-unicode-py27 from http://www.wxpython.org/download.php

RUR-PLE rurple1.0rc3 from http://sourceforge.net/projects/rur-ple/files/rur-ple/
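
After installation, a short check script can confirm that a workstation matches the list above. The following is only a sketch: the import names in `REQUIRED_MODULES` are an assumption (they are the names these packages are usually imported under, not something this page specifies), and RUR-PLE is left out because it is an application rather than a library.

```python
import importlib

# Assumed import names for NLTK, PyYAML, NumPy, Matplotlib, Django, wxPython.
REQUIRED_MODULES = ["nltk", "yaml", "numpy", "matplotlib", "django", "wx"]


def is_required_python(version_info):
    """Return True for any Python 2.7.x, False otherwise (including 3.x)."""
    return tuple(version_info[:2]) == (2, 7)


def missing_modules(names):
    """Return the names that cannot be imported, in the order given."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def report(version_info, names):
    """Return a list of warning lines; an empty list means nothing was found."""
    lines = []
    if not is_required_python(version_info):
        lines.append("expected Python 2.7, found %d.%d" % tuple(version_info[:2]))
    for name in missing_modules(names):
        lines.append("missing module: %s" % name)
    return lines
```

Calling `report(sys.version_info, REQUIRED_MODULES)` (after `import sys`) on a workstation returns one line per problem, which makes it easy to compare machines before a course starts.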

Other software

WinAxe X-Session (X-Windows for remote graphics applications)

WinSCP (for remote file transfers)

PuTTY (for remote interactive logins)

7-Zip (for file compression/decompression)

Java SE Development Kit (JDK) version 6

JEdit (Java editor)

GATE (Java NLP software)

LKB (Lisp NLP software)

Hardware Requirements

Use of Internet Connections from Students' Laptops or Tablets

For students who want to use their own laptops (or tablets or the like), it would be very useful to have free Ethernet cables in the classroom, since the Wi-Fi connections do not always work. Of course, network security issues may prevent direct cable connections. However, as I understand it, HUPnet does not provide a reasonably secure connection (without VPN), and I have also found that eduroam coverage at the university is generally not very good at the moment. (ML)

Headphones

We need headphones (or, preferably, headphone and microphone combinations) for all workstations before the end of January, since we are going to use audio and video files (and speech corpora) in class. Only a few headphones were available, and they were not even locked away. (ML)

Computing Tricks and Best Practices

CSC Accounts

Professors and permanent teachers can use temporary "course" accounts in their courses, while master's students should apply for accounts of their own. Open Question: which rights does one need to apply for in different courses?

Connecting a Laptop to the Internet

Editing LaTeX, compiling and previewing

With integrated environments:
Open Question: We need up-to-date experiences on these.

Without integrated environments:
On Windows workstations, we log in to hippu with PuTTY, edit documents with emacs, compile with pdflatex, transfer the PDF to the workstation using WinSCP and then open the PDF with AcroRead. (AY)
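
The compile and transfer steps of this workflow can also be scripted from a workstation that has command-line `ssh` and `scp`. The sketch below only builds the command lines and does not run them. It assumes that the `.tex` file has a simple name (no spaces) and lives in the remote home directory; the function name and the example user name are made up for illustration.

```python
import os.path


def latex_roundtrip_commands(user, host, tex_file):
    """Build the commands for compiling on the server and fetching the PDF.

    Nothing is executed here; the caller decides how to run the commands.
    """
    remote = "%s@%s" % (user, host)
    pdf_file = os.path.splitext(tex_file)[0] + ".pdf"
    # ssh passes the trailing words to the remote shell as one command line.
    compile_cmd = ["ssh", remote, "pdflatex", "-interaction=nonstopmode", tex_file]
    # Copy the resulting PDF into the current local directory.
    fetch_cmd = ["scp", "%s:%s" % (remote, pdf_file), "."]
    return [compile_cmd, fetch_cmd]


if __name__ == "__main__":
    # "student1" and "report.tex" are hypothetical example values.
    for cmd in latex_roundtrip_commands("student1", "hippu.csc.fi", "report.tex"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.call`. On Windows, PuTTY's command-line companions `plink` and `pscp` could probably take the place of `ssh` and `scp`, but that has not been tried here.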

Running Firefox on hippu.csc.fi

JP has used NoMachine to run Firefox on hippu remotely without problems. When is this useful?

Installation and Configuration Instructions

Installing TeX on a Student's Machine

There are many TeX systems: some are interactive, and some are bare-bones command-line systems. Interested students are encouraged to install one. Here are the systems used by some teachers or students:

Shell and Locales

When getting course accounts from CSC, it is important for us to have a suitable environment (bash, UTF-8 encoding etc.) for the linguistic tools used in processing text. It is good if the teacher has some control over this, since the user manager at CSC may not automatically provide the ideal configuration (there are many options), and differing environments and setups tend to cause a lot of extra hassle at the beginning of a course.
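
A small script can help a teacher spot accounts that deviate from this setup. The following is a sketch under some assumptions: it treats the `SHELL` variable as the login shell (which is not necessarily the shell currently running), and it uses the usual POSIX precedence of `LC_ALL` over `LC_CTYPE` over `LANG` to decide which locale governs character encoding. The function names are made up for illustration.

```python
import os


def effective_ctype_locale(env):
    """Return the locale that decides character encoding, or "" if unset.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return ""


def environment_problems(env):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    shell = env.get("SHELL", "")
    if os.path.basename(shell) != "bash":
        problems.append("login shell is %r, not bash" % shell)
    locale_name = effective_ctype_locale(env)
    # Accept both spellings, e.g. fi_FI.UTF-8 and fi_FI.utf8.
    if "utf8" not in locale_name.lower().replace("-", ""):
        problems.append("locale %r is not a UTF-8 locale" % locale_name)
    return problems


if __name__ == "__main__":
    for problem in environment_problems(os.environ):
        print(problem)
```

Students can run this once after their first login; no output means the two checks passed.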

Five Approaches to Run Named Versions of Software

There is sometimes a need to run a specific version of a piece of software. There are various ways to ensure a particular version:
  1. Several versions of the program are installed under distinct names: they share the same library, use a named version of the library, or are statically linked. For example, there can be several Python versions on the same path: python2.6, python3, python3.3. Binary versions of transducers run regardless of the path configuration, and thus without modules.
  2. Binary and library paths are configured so that they find the specified version. This is the purpose of the module system.
  3. The users copy the software to their own directories to ensure that the software does not change during the project.
  4. The users retrieve the specified version from the repository and compile the software from it.
  5. I guess you know one more.

Normally at most one version selection method is supported for each program. I consider the use of modules more complex than the use of distinct binary file names, but distinct names are not possible if the software provider does not support them. It is hard to justify the complexity of modules unless the library dependencies are complex.
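
To make the second approach concrete, the sketch below shows roughly what path configuration amounts to: the directories of one version are placed in front of the search paths before a program is started. This is a simplification and not how any particular module system is implemented. `LD_LIBRARY_PATH` is the Linux convention for library lookup, and the directory names in the usage note are hypothetical.

```python
import os


def _prepend(directory, existing):
    """Put directory in front of an existing search path (which may be empty)."""
    if existing:
        return directory + os.pathsep + existing
    return directory


def env_for_version(base_env, bin_dir, lib_dir=None):
    """Return a copy of base_env in which one version's directories come first."""
    env = dict(base_env)  # leave the caller's environment untouched
    env["PATH"] = _prepend(bin_dir, env.get("PATH", ""))
    if lib_dir is not None:
        env["LD_LIBRARY_PATH"] = _prepend(lib_dir, env.get("LD_LIBRARY_PATH", ""))
    return env
```

For example, `env_for_version(os.environ, "/opt/sometool-1.2/bin", "/opt/sometool-1.2/lib")` gives an environment that can be passed as `env=` to `subprocess.call`, so that only the child process sees the selected version.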

Finding the Right Analyzers on hippu.csc.fi

Currently, there are only heuristic approaches to finding analyzer software on hippu.csc.fi:
  1. Look at the list of software maintained by CSC: http://www.csc.fi/english/research/software/index_html#field6. Problem: The list is heavy to maintain even within CSC since the publishing system is complex (I know from experience).
  2. Use the bash shell, type the name of the language and try completion: e.g. type finnish- and then press Tab. Problems: Is my shell bash (in fact, everybody used to have tcsh by default)? How do I know whether I should type just f, fi, fin, or more? What if my paths are not right? Do I have access to the program? How do I know which module I have to load? (I doubt that this would work on hippu, where /usr/bin already contains so many irrelevant programs.)
  3. Ask a friend or teacher (usually the fastest way, provided that you are not the teacher).
  4. Query the CLARIN resource databases? (Do you know them?)
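
The Tab completion idea in item 2 can be imitated in a way that does not depend on which shell is in use. The sketch below lists the executables on the search path whose names start with a given prefix. It shares the other limitations mentioned above: it only sees directories that are already on the path, and it knows nothing about modules. The function name and the file names in the usage note are hypothetical.

```python
import os


def commands_with_prefix(prefix, path=None):
    """List executables on the search path whose names start with prefix."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = set()
    for directory in path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # directory exists but cannot be read
        for name in names:
            full = os.path.join(directory, name)
            if (name.startswith(prefix) and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.add(name)
    return sorted(found)
```

Calling `commands_with_prefix("finnish-")` would return names such as `finnish-analyze` if such a program were installed and on the path. Trying progressively shorter prefixes (`fin`, `fi`) answers the question of how much to type.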

If you do not find the information with these methods (especially if you are too busy to try), here is a simple index of the analyzers (it is still empty):

| Language | Commands | What do I need to do to get sufficient privileges? | How do I configure the paths for this? |
| Finnish |  |  |  |
| German |  |  |  |
| Swedish |  |  |  |
| Swahili |  |  |  |
| Russian |  |  |  |
| French |  |  |  |
| Sami |  |  |  |

NoMachine and CSC

CSC supports NoMachine via nxlogin.csc.fi, which lets you open graphical programs running at CSC on your own computer. It does not need X11. The user installs a NoMachine client (Windows/Mac/Linux) on the local machine, starts the client, logs in with a CSC user account and then starts the desired graphical tools on the server side. The client software is not very pretty, but at least it seemed to work smoothly on my Mac (after following the instructions for installation and setup). However, I do not yet know whether FreeNX would actually work for the students in class, and I have no idea whether we can play audio content with this client. We should probably test it.

  1. Download the NoMachine client (Windows, Mac, Linux).
  2. Install and configure your NoMachine client. Step-by-step instructions are available in the Scientist's User Interface.
  3. Start the NoMachine client and log on to nxlogin.csc.fi.
  4. Right-click on the Desktop and choose the CSC server you want to use (opens an ssh terminal).
  5. Start your scientific application normally
  6. When you're done, suspend or terminate the connection from the Desktop top right corner [X]

-- AnssiYliJyra - 2012-11-14

Topic revision: r13 - 2014-03-24 - ErikAxelson
 