Parsed Corpus of Early English Correspondence (PCEEC)

Department: Department of English
Contact person: Arja Nurmi

1. Linguistic research resource

l. Official or identificatory name and acronym Parsed Corpus of Early English Correspondence (PCEEC)
m. Short description of content Linguistically annotated corpus of English correspondence from the years 1410?-1681, compiled for historical sociolinguistic research.
n. Originality status primary/original location: Research Unit for Variation, Contacts and Change in English (VARIENG), Department of English, University of Helsinki
copy: Distributed worldwide through the Oxford Text Archive; an agreement also exists with ICAME.
o. Description of size and extent 2.2 million words
100% annotated for parts of speech and syntax (Penn Treebank format); the annotation was produced automatically and then checked manually.
p. Storage format Electronic (plain text)
q. (Estimated) time invested in the collection and processing of the resource 240 person months
r. Contact person(s) and their contact information (E-mail & telephone); may be the same for all points (a), (b) and (c). a) person who in practice administers the resource and grants (possibly required) usage permits: Oxford Text Archive; ICAME (in the foreseeable future)
b) person who has physical possession of the contracts concerning the resource, by which the resource has been acquired for use at the department: Arja Nurmi (tel. 09-19123531)
c) person(s) who has/have originally contracted, acquired, collected, compiled and/or annotated the resource, and who thus has copyright to the material and whose permission is (possibly) required to access the resource. Corpus compilers have jointly agreed to the distribution of the corpus via the two channels mentioned. Corpus compilers hold copyright for the selection and textual annotation of texts. Distribution rights for the texts themselves have been cleared by the copyright holders; some texts are out of copyright because the editors died more than 70 years ago. Corpus compilers are: Jukka Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin and Helena Raumolin-Brunberg. The annotators hold copyright for the linguistic annotation. The corpus annotators are Arja Nurmi, Terttu Nevalainen, Ann Taylor, Susan Pintzuk and Anthony Warner.
s. (Main) references to published articles or other written works describing the resource itself or research based on its use. Manual: Ann Taylor and Beatrice Santorini (2006)
t. Link(s) to more extensive/thorough descriptions of the resource in the Internet (which may be in any language)
u. Physical location of resource (server and directory path or Internet address, or room/person in the case of non-electronic materials) Several copies on computers at VARIENG. Copies available from Oxford Text Archive (free of charge) and ICAME.
v. Miscellaneous other notes  

Resource Name Parsed Corpus of Early English Correspondence (PCEEC)
Resource Type Written Corpus
Languages English
Languages (other)

Description Linguistically annotated corpus of English correspondence from the years 1410?-1681, compiled for historical sociolinguistic research.

Institute Department of English, University of Helsinki
Contact Person

Begin year of resource creation

Finalization year

Format Electronic (plain text)
Metadata Link


Reference Link

Collection Working Languages

Collection Long term preservation by

Collection Location

Collection Content Type

Collection Format Detailed

Collection Quality

Collection Applications

Collection Project

Collection Size 2.2 million words
Collection Distribution Form

Collection Access

Collection Source

IPR Ethical Reference

IPR Legal Reference

IPR License Type

IPR Description

IPR Contact Person

Topic revision: r3 - 2011-11-14 - KimmoKoskenniemi