Resource Manager

The term Resource Manager is used for compatibility with the CLARIN Language Resource and Technology Federation document, whose topic 5 (Requirements) leaves resource management to the centers. The Resource Manager is the authorization component that automatically allows or denies access to files according to user attributes. A linguistic resource or corpus can consist of one or several files. Owner-controlled resource management is also defined in RequirementsSpecification.


CSC has created the CRAS system (CSC Resource Accounting System), which can store stat and hash data of files in a relational database. In the Demo, the same database was used to control access. The URL of the demo is

The source code of the demo is attached:

  • dl: Python program to download files
  • list: Python program to show the allowed files

The current database structure includes a table called resurssi:

describe resurssi;
| Field      | Type                  | Null | Key | Default | Extra |
| path_hash  | varchar(32)           | NO   | PRI |         |       | 
| path       | text                  | YES  |     | NULL    |       | 
| path_utf8  | text                  | YES  |     | NULL    |       | 
| owner      | varchar(64)           | YES  |     | NULL    |       | 
| right_type | mediumint(8) unsigned | YES  |     | NULL    |       | 
| rights     | varchar(255)          | YES  |     | NULL    |       | 

Each record in the resurssi table describes one file; a resource can consist of one or several files.
The path_hash is the primary key and ensures the security of the demo system. It is generated with a Python md5 object by the command, where realname is realpath(join(root, name)).
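A minimal sketch of the hash computation, assuming that path_hash is the hexadecimal MD5 digest of realname encoded as UTF-8 (the exact command used in the demo is not given). The 32 hex characters fit the varchar(32) column. The demo used a Python md5 object; this sketch uses the hashlib module, and the function name path_hash is illustrative.

```python
import hashlib
import os


def path_hash(root: str, name: str) -> str:
    """Return the 32-character hex MD5 digest of the resolved file path.

    Assumption: the demo hashes realpath(join(root, name)) as UTF-8 bytes.
    """
    realname = os.path.realpath(os.path.join(root, name))
    return hashlib.md5(realname.encode("utf-8")).hexdigest()
```

Because realpath resolves symbolic links and ".." components, different spellings of the same file path map to the same key.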
The owner field holds the owner's Shibboleth EPPN (eduPersonPrincipalName). In the future, owners will be able to set the rights themselves.
Only the path information is shown to the user.
Only right_type 0 is used.
The rights field contains a Shibboleth attribute key-value string. It can contain one of the following sample strings:
The list program shows the user only the files that he or she has the right to access. The file list links to the dl program, which sends the requested file to the user if the rights allow it.
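The access check shared by list and dl could look like the following minimal sketch. It assumes rights strings of the hypothetical form attribute=value and user attributes given as a dictionary of value lists (Shibboleth attributes may be multi-valued). The Resource tuple mirrors the resurssi columns; has_right, allowed_files and may_download are illustrative names, not the demo's code.

```python
from typing import Dict, Iterable, List, NamedTuple


class Resource(NamedTuple):
    """One row of the resurssi table (only the fields used here)."""
    path_hash: str
    path: str
    owner: str
    right_type: int
    rights: str


def has_right(rights: str, attributes: Dict[str, List[str]]) -> bool:
    """Check a right_type 0 rights string against the user's attributes.

    Assumption: rights has the form "attribute=value".
    """
    key, sep, value = rights.partition("=")
    if not sep:
        return False  # malformed rights string: deny by default
    return value.strip() in attributes.get(key.strip(), [])


def allowed_files(records: Iterable[Resource],
                  attributes: Dict[str, List[str]]) -> List[Resource]:
    """What the list program shows: only files the user may access."""
    return [r for r in records
            if r.right_type == 0 and has_right(r.rights, attributes)]


def may_download(record: Resource, attributes: Dict[str, List[str]]) -> bool:
    """What the dl program re-checks before sending a file."""
    return record.right_type == 0 and has_right(record.rights, attributes)
```

Running the same check in dl, and not only in list, means a user cannot fetch an unlisted file simply by sending its path_hash.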

The implementation of the demo took less than one week's work.

Required features for production

We recommend choosing the Resource Manager model implemented in the Demo for production, to grant automatic access to selected resources. In addition to the features of the Demo, production requires the following:

  • Using the CRAS database instead of the current demo database.
  • Adding AND and OR operations to the rights; these may be implemented as a new right_type 1 or simply by extending the parsing of right_type 0.
  • Planning the database structure very carefully.
  • An owner's page for setting the rights. It should include a limited set of prescribed usage right types (e.g. 1) for all purposes; 2) free for research and education; 3) restricted, consent to specific terms required). Moreover, this page should allow the owner to deposit the specific terms of use, which the applicant may sign electronically.
  • Recursive views and functionality per resource, covering all subdirectories and the files under them, like Unix chmod -R.
  • Showing the owner a list of all of his/her files/resources.
  • Showing the user the file sizes, and adding the size information to the database.
  • An interface for adding resources to the database (planning and implementation); it may be a command-line program, because linguistic resources are static.
  • Usage statistics (these are already available from the httpd server log, published with Analog, but is that enough?).
  • Groups. A group is functionally equivalent to an OR operation over a list of users, but long lists are more efficient and user-friendly to store in their own tables.
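The AND/OR operations and groups listed above could be handled by one evaluator. This is a minimal sketch assuming a hypothetical rights syntax for an extended right_type 0: terms joined by AND and OR (AND binds tighter, no parentheses), where a term is either attribute=value or group:NAME. The groups dictionary stands in for a separate groups table; none of these names come from the demo.

```python
from typing import Dict, List, Set


def eval_rights(expr: str, eppn: str, attributes: Dict[str, List[str]],
                groups: Dict[str, Set[str]]) -> bool:
    """Evaluate e.g. "schacHomeOrganizationType=fi:university AND group:phonetics".

    Hypothetical syntax: an OR of AND-clauses; a term is "group:NAME"
    (EPPN listed in the group) or "attribute=value" (attribute match).
    """
    def term_true(term: str) -> bool:
        if term.startswith("group:"):
            return eppn in groups.get(term[len("group:"):], set())
        key, sep, value = term.partition("=")
        return bool(sep) and value in attributes.get(key, [])

    # Any clause whose terms are all true grants access; an empty
    # expression yields one empty term and is therefore denied.
    for clause in expr.split(" OR "):
        terms = [t.strip() for t in clause.split(" AND ")]
        if all(t and term_true(t) for t in terms):
            return True
    return False
```

Splitting on the literal words keeps the parser trivial, but attribute values then cannot contain " AND " or " OR "; supporting parentheses would need a real tokenizer.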

Doing everything above will take about a month.
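The command-line interface for adding resources, together with recursive rights and file sizes, could be sketched as follows. This minimal sketch uses Python's built-in sqlite3 as a stand-in for the CRAS database, adds a hypothetical size column, and assumes the demo's path_hash scheme (MD5 of the real path). SCHEMA, add_resource and main are illustrative names.

```python
import argparse
import hashlib
import os
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS resurssi (
    path_hash  TEXT PRIMARY KEY,  -- 32-char hex MD5 of the real path
    path       TEXT,
    owner      TEXT,              -- owner's Shibboleth EPPN
    right_type INTEGER,
    rights     TEXT,
    size       INTEGER            -- hypothetical new column: bytes
)"""


def add_resource(conn, root_dir, owner, rights, right_type=0):
    """Store every file under root_dir with the same rights (like chmod -R).

    Returns the number of files stored.
    """
    count = 0
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            realname = os.path.realpath(os.path.join(root, name))
            key = hashlib.md5(realname.encode("utf-8")).hexdigest()
            conn.execute(
                "INSERT OR REPLACE INTO resurssi "
                "(path_hash, path, owner, right_type, rights, size) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (key, realname, owner, right_type, rights,
                 os.path.getsize(realname)))
            count += 1
    conn.commit()
    return count


def main(argv=None):
    """Command-line entry point: database, directory, owner EPPN, rights."""
    parser = argparse.ArgumentParser(description="Add a resource to resurssi")
    parser.add_argument("database", help="SQLite file (stand-in for CRAS)")
    parser.add_argument("directory", help="root directory of the resource")
    parser.add_argument("owner", help="owner's EPPN")
    parser.add_argument("rights", help='rights string, e.g. "attribute=value"')
    args = parser.parse_args(argv)
    conn = sqlite3.connect(args.database)
    try:
        conn.execute(SCHEMA)
        count = add_resource(conn, args.directory, args.owner, args.rights)
    finally:
        conn.close()
    print(f"{count} files added")
```

A script would call main() from an `if __name__ == "__main__":` guard. Since linguistic resources are static, a one-shot loader like this may be enough instead of a web form.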

Shibboleth Service Provider (SP)

Shibboleth SP also provides Resource Manager functionality, which is controlled by XML settings:

<?xml version="1.0" encoding="UTF-8"?>
<AccessControl xmlns="urn:mace:shibboleth:target:config:1.0">
   <Rule require="schacHomeOrganizationType">fi:university</Rule>
</AccessControl>

The rules can be combined with And and Or tags. The rule file is referenced from the Shibboleth configuration file shibboleth.xml:

<Path name="shib/appl/ling/kielipankki/amph" authType="shibboleth" requireSession="true">
    <AccessControlProvider uri="/v/net/" type="edu.internet2.middleware.shibboleth.sp.provider.XMLAccessControl"/>
</Path>
This example does not work for an unknown reason, although a similar example has worked on the test machine. The server uses Shibboleth 1.3, and the example may work better with Shibboleth 2. In addition, every change requires restarting the Shibboleth SP, which is not acceptable in production use. Administering the Shibboleth SP would be very difficult, and it makes no sense to rely on insecure authorization.

If Shibboleth access control can be made to work at all, it will require at least a week of work.

External authorization

Users sign license agreements, codes of conduct, etc. The information about which agreements each user has signed can be stored in a centralized CLARIN database accessible via Web Services. If access to a resource requires that the user has signed a certain agreement, the Resource Manager (see Demo) can make a Web Service query to the centralized CLARIN database and ask whether the user identified by his or her eduPersonPrincipalName has signed the agreement.

A simplified use case:
Add right_type 1 to our resurssi table. The value of the rights field can then be, for example, CLARINlicense1. The list program shows a file to the SAML2-authenticated user if the Web Service query to the centralized CLARIN database reports that the user has signed CLARINlicense1. The Web Service query results are cached locally so that only one query is made per listing. The centralized CLARIN database has to be extremely reliable, because a fault would stop or badly delay the services of all CLARIN centers.
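A minimal sketch of this use case, assuming a hypothetical Web Service client fetch_signed(eppn) that returns the set of license names the user has signed. Fetching that set once, lazily, gives exactly one query per listing, and no query at all when the listing contains no right_type 1 files. Records are simplified to (path, right_type, rights) tuples.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical Web Service client: returns the license names the EPPN has signed.
FetchSigned = Callable[[str], Set[str]]


def visible_files(records: Iterable[Tuple[str, int, str]], eppn: str,
                  fetch_signed: FetchSigned) -> List[str]:
    """Paths of right_type 1 records whose license (the rights field,
    e.g. "CLARINlicense1") the user has signed.  Other right types are
    left to the demo's existing check and are not returned here."""
    signed: Optional[Set[str]] = None  # fetched lazily, at most once
    visible: List[str] = []
    for path, right_type, rights in records:
        if right_type != 1:
            continue
        if signed is None:
            signed = fetch_signed(eppn)  # the single Web Service query
        if rights in signed:
            visible.append(path)
    return visible
```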

The structure of the centralized CLARIN database can be very simple:

| Field   | Type         | Null | Key     |
| EPPN    | varchar(128) | NO   | PRIMARY |
| License | int          | NO   |         |
| Valid   | date         | NO   |         |

Of course, the database can have more fields, e.g. the location where the signed license agreements are stored. All resource-specific terms of use and codes of conduct should probably be stored in the resource database in some form (e.g. as text or PDF files), so that an applicant can consent to the terms within the envisioned system. If the terms of use require the submission of a research plan, the system should be able to accept the research plan and other documents (as attachments?), which can be forwarded to the resource owner for review.

An example of values stored in the database:

| EPPN | License | Valid      |
|      | 1       | 2010-10-31 |
|      | 2       | 2010-10-31 |
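To illustrate how a query against this structure might look, here is a minimal sketch using sqlite3 and the example values. The table name signed_license is hypothetical, Valid is assumed to be an expiry date, and the primary key is (EPPN, License) instead of EPPN alone, an assumption that lets one user hold several licenses.

```python
import sqlite3
from datetime import date
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS signed_license (
    EPPN    VARCHAR(128) NOT NULL,
    License INTEGER      NOT NULL,
    Valid   DATE         NOT NULL,  -- stored as ISO text YYYY-MM-DD
    PRIMARY KEY (EPPN, License)     -- assumption: several licenses per user
)"""


def has_valid_license(conn: sqlite3.Connection, eppn: str, license_id: int,
                      today: Optional[date] = None) -> bool:
    """True if eppn has signed license_id and it has not expired.

    ISO date strings compare correctly as text, so Valid >= today works.
    """
    if today is None:
        today = date.today()
    row = conn.execute(
        "SELECT 1 FROM signed_license "
        "WHERE EPPN = ? AND License = ? AND Valid >= ?",
        (eppn, license_id, today.isoformat())).fetchone()
    return row is not None
```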

An example of a SOAP (Simple Object Access Protocol) query:

<env:Envelope xmlns:env="" >
And the response:
<env:Envelope xmlns:env="" >
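Since the operation name and namespaces of the CLARIN service are not given here, the following sketch of building such a query and parsing its response uses hypothetical names (HasSignedLicense, Signed, the urn:example:clarin:licenses namespace) and assumes SOAP 1.1. It relies only on the standard library xml.etree.ElementTree; sending the request over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 (assumed)
SVC_NS = "urn:example:clarin:licenses"  # hypothetical service namespace

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("lic", SVC_NS)


def build_request(eppn: str, license_name: str) -> bytes:
    """Build a hypothetical HasSignedLicense query envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}HasSignedLicense")
    ET.SubElement(op, f"{{{SVC_NS}}}EPPN").text = eppn
    ET.SubElement(op, f"{{{SVC_NS}}}License").text = license_name
    return ET.tostring(envelope, encoding="utf-8")


def parse_response(xml_bytes: bytes) -> bool:
    """Return True only if the response says the license is signed."""
    root = ET.fromstring(xml_bytes)
    if root.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault") is not None:
        raise RuntimeError("SOAP fault from license service")
    result = root.find(f"{{{SOAP_ENV}}}Body/{{{SVC_NS}}}HasSignedLicenseResponse"
                       f"/{{{SVC_NS}}}Signed")
    return result is not None and (result.text or "").strip() == "true"
```

Treating a missing or unexpected answer as "not signed" keeps the check fail-closed.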

The external authorization described above is illustrated in the presentation by Daan Broeder and Dieter van Uytvanck at the TF-EMC2 meeting on Dec. 3, 2008: Licences & Code of Conducts 3

Topic attachments
| Attachment            | Size   | Date             | Comment                              |
| CLARIN_CCdatabase.png | 49.9 K | 2008-12-11 16:12 |                                      |
| dl                    | 6.2 K  | 2008-10-20 10:06 | Python program to download           |
| list                  | 1.4 K  | 2008-10-20 10:07 | Python program to show allowed files |
Topic revision: r23 - 2008-12-11 - SatuTorikka
Copyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.