
The Corpus Server

Introduction

The corpus server of the Language Bank of Finland is a UNIX machine equipped for linguistic research purposes.

Connecting to the Corpus Server

The corpus server is accessed using Secure Shell (ssh) tools. See the connection instructions for details.

Using the Tools on the Corpus Server

The server provides the basic software that typically comes with a Linux/Unix server, including the GNU text utilities and other basic shell commands and file utilities, as well as common text editors and text formatting programs. Language technology tools include free or proprietary parsers, linguistic knowledge bases and language technology software modules.

The list of useful software items provides (or will provide) links to more detailed information on using each tool.

Main Parts of the Corpus Server

  • The system
    • The Linux distribution and additional RPMs
    • CSC's configuration files and additional environment
  • The Language Bank directories
    • program and documentation directories for CSC
      • /l/bin and /l/man
    • data directories for research material:
      • /l/kielipankki/
    • a directory area, /c/appl/ling/, for contributors
      • /c/bin and /c/man for approved symbolic links
  • Directory areas for virtual language corpus servers
    • /l/venus and /corp/ are reserved for this purpose
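For orientation after logging in, the following minimal Python sketch reports which of the directory areas listed above exist on the current host. It assumes a Python 3.6 or later interpreter is available (the stock Red Hat Enterprise Linux 4 system ships a much older Python). The path descriptions are paraphrased from the list, and the name `check_dirs` is hypothetical.

```python
import os

# Directory areas named in the list of main parts of the corpus server.
# Descriptions are paraphrased from that list.
LANGUAGE_BANK_DIRS = {
    "/l/bin": "programs for CSC",
    "/l/man": "documentation for CSC",
    "/l/kielipankki/": "research material",
    "/c/appl/ling/": "contributor area",
    "/c/bin": "approved symbolic links (programs)",
    "/c/man": "approved symbolic links (manuals)",
    "/l/venus": "reserved for virtual corpus servers",
    "/corp/": "reserved for virtual corpus servers",
}


def check_dirs(dirs=LANGUAGE_BANK_DIRS):
    """Map each path to True if it exists as a directory on this host."""
    return {path: os.path.isdir(path) for path in dirs}


if __name__ == "__main__":
    # Print one line per directory: path, presence, and its purpose.
    for path, present in check_dirs().items():
        status = "present" if present else "missing"
        print(f"{path:20} {status:8} {LANGUAGE_BANK_DIRS[path]}")
```

On a machine other than the corpus server, every entry is expected to be reported as missing.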

Further information:

About the System

The corpus server currently runs a 32-bit GNU/Linux system on a virtual machine with i686 hardware. The operating system distribution is "Red Hat Enterprise Linux 4 Update 4", or more specifically "Linux RedHat 4 (Nahant Update 4 2.6.9-42.0.3.EL i686)". It has 3.6 GB of RAM and 0.5 GB of swap space.
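To check the system details described above from a shell session, a short Python sketch can read the machine type and kernel release with the standard `platform` module and the RAM and swap totals from `/proc/meminfo`. It assumes Python 3.6 or later, which is newer than the Python shipped with RHEL 4. `parse_meminfo` and `to_gib` are hypothetical helpers, and the sizes are printed in GiB, so they may differ slightly from the rounded figures given above.

```python
import os
import platform


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key:" prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def to_gib(kb):
    """Convert a size in kB (KiB) to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    print("machine:", platform.machine())  # e.g. 'i686' on a 32-bit x86 system
    print("kernel: ", platform.release())
    if os.path.exists("/proc/meminfo"):  # Linux only
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        print(f"RAM:  {to_gib(mem['MemTotal']):.1f} GiB")
        print(f"swap: {to_gib(mem['SwapTotal']):.1f} GiB")
```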
