A Short Outline of Work Package 7

This work package deals with the legal issues of CLARIN, including licensing, authorization and authentication, which are necessary for the proper handling and use of language resources. If language materials and resources were free of copyrights and other restrictions, sharing and using them would be much simpler. In reality, most language resources and technologies are governed by licenses and copyrights that impose various restrictions on their copying, their public display and their use for specific purposes.

Authors of written texts and speakers of oral works have certain rights, and if the works are published, the publisher has acquired some of these rights from them. Language resources are often collected by scholars, and the collectors need permission from the original copyright holders in order to use the resources and to let other scholars use them. This permission is called a license. Usually each scholar and the publisher or author draw up a contract that they consider appropriate. As a result, the licenses are mostly unique, or at least slightly different from one another.

The problem of existing licenses is sizable because there are so many language materials and resources, created by many authors, published by many publishing companies and collected by many scholars. There may be thousands of licenses that are similar but still somewhat different from each other. One of the tasks of this work package is to study the stock of existing licenses and create a set of model licenses to be used in future agreements between collectors and authors/publishers, between collectors and computing centers, and between end users and collectors. It is also a task of this work package to find ways of migrating existing materials into the framework of new standard CLARIN licenses.

Not only is the number of materials huge, but there is an even larger number of potential users of those materials. These large numbers are, of course, the opportunity for CLARIN and the justification for the investment needed for the infrastructure. At the same time, one can imagine the problems that arise. The collector of materials or the compiler of a corpus is usually granted the right not only to use the materials but also to grant permission to other scholars to use them. Colleagues in nearby universities, or those with whom there is already cooperation, are known in advance and pose no serious problems. But the continent is large, and there are far more scholars and students than any collector is familiar with. It is the task of this work package to find ways of managing licenses and permissions that place an excessive burden neither on the users nor on the collectors of the materials, but still satisfy the reasonable interests of the copyright holders.

The collector is responsible to the authors/publishers and must retain their confidence in order to acquire further materials. Thus, he/she must guarantee that the users are who they claim to be and are worthy of trust. Fortunately, techniques are available for this kind of remote authentication, in which a trusted remote organization mediates the necessary authentication, electronic signatures, etc. This work package is not going to implement those systems; instead, its task is to set up rules and model contracts that the players in the infrastructure should obey.

It is not sufficient just to set up requirements for the various parties; one must also consider how to assess and control the reliability of the trust system in practice. Organizations that are trusted to correctly authenticate their personnel and students need ways to prove that they are worthy of that trust. Otherwise, publishers and other owners of materials will not trust CLARIN enough to release their materials to CLARIN users. It is a task of the work package to study the problems related to maintaining the credibility of the trust scheme.

Organizations such as ELRA (European Language Resources Association) and ELDA (Evaluation & Language resources Distribution Agency) identify, classify, collect, validate and produce language resources and make them available. Their mode of operation is mostly to supply copies of such resources to developers of human language technologies or similar users. The task of this work package is to define the relationship between ELRA/ELDA and CLARIN, and the form of cooperation with them.

Traditionally, research use has been free of charge. Some services are commercial, however: e.g. the Russian Integrum has almost all published newspapers and periodicals in a commercial online system that is also used by researchers. The use is not free, which is clearly an obstacle to wider use and full utilization of the resources. Despite these problems, this work package will study the option of including commercial resources in the CLARIN scheme.

In addition to texts, speech recordings and multimedia materials, CLARIN needs human language technologies in order to provide adequate services for a wide array of languages. The software for normalizing, parsing and processing the materials should ideally be multilingual, in the sense that the same programs work for many languages provided that lexicons and rules are available for those languages. Searching, annotating and organizing materials for all languages may become a nightmare for the implementer if every language needs its own specially coded programs. The programs and software modules need to be compatible, i.e. the various parts must be combinable to form the services. Software licenses may make this impossible in some cases. Legacy programs, again, have individual licenses that differ from one another, and some programs cannot be combined with others because of (often unintended) contradictory clauses in their licenses. Thus, certain policies and recommendations are needed for software licenses. Many open source licenses would be useful for CLARIN software, as they guarantee the freedom to enhance, tune or modify the software when needed. Open source programs can usually (but not always) be combined to create larger systems. CLARIN will follow open source principles in its own software where possible. It is a task of this work package to study existing software licenses suitable for CLARIN, to produce recommendations for open source licenses, and to draft model contracts for commercial software to be included in CLARIN services.

It is also a task of this work package to consider the ethical rules and recommendations related to the language resources to be included in the CLARIN framework.

-- KimmoKoskenniemi - 03 Dec 2007

