Requirements Documentation

This document contains the requirements for the project developing an AAI (Authentication and Authorization Infrastructure) for Finnish language resources.


Definitions

CO Copyright owner
CP Content provider, who acquires linguistic resources and sufficient rights to use them from the Copyright owner.
Database The MySQL Language Bank database. In this document, "the database" always refers to this MySQL database.
HAKA Identity federation of the Finnish universities, polytechnics and research institutions
IdF Identity federation
IdM Identity Manager
IdP Identity provider
LRT Language resources and technology
SP Service provider
SUI The new Scientist's Interface

Common features

Shibboleth authentication

Here, Shibboleth authentication means HAKA authentication for users of Finnish universities, polytechnics and research institutions. It is a common step preceding both Automatic and Controlled authorization.

  • Available linguistic research resources (current www location)
  • HAKA as Identity Provider Federation (Haka pages)
  • HAKA login (WAYF Service, later to be replaced by Shibboleth2 Discovery Service)
  • Provided attributes (funetEduPerson schema)
  • In the CLARIN community, the ePPN attribute is currently seen as the minimum necessary attribute. Anything beyond that depends on how well attribute sets and their semantics can be harmonized, which we hope will happen through the eduGAIN 3.0 project.
  • CSC will implement an architecture that will support the addition of other national Identity Federations in the future in a relatively easy manner.

Resource categories

Each linguistic resource (corpus) must be equipped with access information that places it in one of three categories:

(1) LRT that anyone can use freely (including resources with open licenses such as Open Access). Whether any resources will fall into this category must be studied.

(2) LRT to which the CP can grant access automatically once the user accepts the Terms and Conditions attached to the corpus/resource. If the user does not accept the Terms and Conditions, the access process cannot continue. One-sided: the user commits to predetermined CP terms.

(3) LRT that can only be accessed through an individual application by the user, followed by whatever individual consideration the CP requires. Two-sided: the user commits to the terms and the CP grants permission. See Controlled authorization.

According to CLARIN policy, general metadata, including knowledge of the existence of a resource, should be publicly available for all prospective users.
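The three categories, together with the CLARIN rule on public metadata, amount to a small access decision. A minimal sketch, assuming hypothetical names (`AccessCategory`, `can_access`, `can_see_metadata`) and reducing the CP's individual consideration to a single boolean:

```python
from enum import Enum


class AccessCategory(Enum):
    """Access categories for LRT (hypothetical names)."""
    OPEN = 1            # (1) freely usable by anyone
    TERMS_ACCEPTED = 2  # (2) automatic access once Terms and Conditions are accepted
    CONTROLLED = 3      # (3) individual application and consideration by the CP


def can_see_metadata(category: AccessCategory) -> bool:
    """CLARIN policy: general metadata is public whatever the category."""
    return True


def can_access(category: AccessCategory, accepted_terms: bool, cp_granted: bool) -> bool:
    """Return True if the user may access the resource itself."""
    if category is AccessCategory.OPEN:
        return True
    if category is AccessCategory.TERMS_ACCEPTED:
        # One-sided: the user's commitment to the CP's terms is enough.
        return accepted_terms
    # Two-sided: the user commits to the terms and the CP grants permission.
    return accepted_terms and cp_granted
```

In this reading only category (3) needs a decision from the CP; categories (1) and (2) can be resolved entirely by the system.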

Terms and Conditions

(a) Terms of Access

A description of all the requirements that the applicant has to satisfy in order to gain access.

(b) Terms of Use: Code-of-Conduct/License Agreement

There are two alternatives: one of a very small number of general research-purpose EULAs that the applicant may already have signed and that also applies to the resource in question, or a resource-specific license agreement provided by the CP.

This must be specified later.

Language selection: Finnish/English

  • There are three different electronic application forms, each available in English and Finnish.
  • Emails will be bilingual (English and Finnish).
  • CSC will implement an architecture that will support the addition of more languages in the future in a relatively easy manner.

Loading linguistic resources

The process by which the Language Bank administrator adds resources to a server will be specified later. Whether the CP or other people can upload resources must be studied, as there may be security and copyright considerations.

Monitoring and statistics

CSC will monitor and gather usage statistics.

Controlled authorization

Figure: User process for linguistics with Shibboleth authentication, electronic applications and referees

Electronic application form processing

There are two different electronic application forms, each in English and Finnish. There are also forms for CSC's internal use to track the status of the application workflow. Each form requires a program to handle it, and the email responses also have to be handled.

Commercial users need to contact CSC sales and sign a contract to access the resources. In the Language Bank the following types of licenses are currently available: A License (Academic License) and B License (Extended Commercial License).

After the Shibboleth authentication:

  • Electronic Application Form (Shibboleth-authenticated and prefilled)
    • Required attributes
  • Available linguistic research resources (with limitations defined by the CPs). How will the user view a list of resources before being authorized?
  • Acceptance by the user of Terms and Conditions attached to the resource is required.
  • Send
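A sketch of how the prefilled form might be built from the attributes released by the IdP. The attribute ids below (`eppn`, `givenName`, `sn`, `mail`, `o`) assume a typical Shibboleth SP attribute map and are not taken from this document; the target field names come from the proposed application table, and keeping the ePPN with the application is an assumption. Only the ePPN is treated as mandatory, following the CLARIN minimum noted under Shibboleth authentication.

```python
from typing import Dict

# Hypothetical mapping from Shibboleth SP attribute ids to fields of the
# proposed application table; the real ids depend on the SP configuration.
ATTRIBUTE_TO_FIELD = {
    "givenName": "display name",
    "sn": "familyname",
    "mail": "email",
    "o": "organization",
}


def prefill_application(attributes: Dict[str, str]) -> Dict[str, str]:
    """Build a prefilled application form from released Shibboleth attributes."""
    eppn = attributes.get("eppn", "").strip()
    if not eppn:
        # ePPN is the minimum attribute; without it the application cannot be
        # tied to the authenticated user.
        raise ValueError("ePPN attribute missing")
    form = {"eppn": eppn}
    for attribute, field in ATTRIBUTE_TO_FIELD.items():
        value = attributes.get(attribute, "").strip()
        if value:
            form[field] = value  # the user can still edit prefilled values
    return form
```

Attributes that the IdP does not release simply leave the corresponding field empty for the user to fill in.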

If the user already has a CSC user account, after logging onto the CSC Scientist's Interface (https://hotpage.csc.fi/):

  • Electronic Application Form for CSC users
  • Personal and project information update (if needed)
  • Available linguistic research resources (with limitations defined by the CPs)
  • Acceptance by the user of Terms and Conditions attached to the resource is required.
  • Send

If the user cannot be authenticated with Shibboleth or a CSC user account:

  • Current Application form for the Language Bank (as non-registered)
  • Available linguistic research resources (with limitations defined by CPs)
  • Acceptance by the user of Terms and Conditions attached to the resource is required.
  • Continue current process (emails, send paper form with signature)
  • The following sections do not describe this path.
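The three entry paths can be summarized as a routing decision. In this sketch (names hypothetical), a user who could use both Shibboleth and a CSC user account is sent to the Shibboleth form; the requirements do not state a preference, so that ordering is an assumption.

```python
from enum import Enum


class ApplicationPath(Enum):
    """The three entry paths (hypothetical names)."""
    SHIBBOLETH_FORM = "Electronic Application Form (Shibboleth-authenticated, prefilled)"
    CSC_USER_FORM = "Electronic Application Form for CSC users"
    NON_REGISTERED = "Current Language Bank application form (paper process)"


def choose_path(shibboleth_authenticated: bool, has_csc_account: bool) -> ApplicationPath:
    """Pick the application path; Shibboleth is assumed to take precedence."""
    if shibboleth_authenticated:
        return ApplicationPath.SHIBBOLETH_FORM
    if has_csc_account:
        return ApplicationPath.CSC_USER_FORM
    return ApplicationPath.NON_REGISTERED
```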

After Send

The electronic application form can also be used to collect information on new referees. Each electronic application form can contain a checkbox with which a referee candidate expresses his or her willingness to act as a referee. The web page will clearly indicate whether user access or referee promotion is being applied for.

Referee's authorization

An applicant becomes trusted when a referee approves him or her. A new electronic form with a referee list is needed in English and Finnish, and the form requires a program to handle it. The referee responds to the email sent by the system through a web form, not by replying to the email. When the IdM system is running, it will be used for the referee's authorization.

The procedure by which a referee authorizes an applicant could be as follows:

  1. The user is forwarded to the Referees List Form containing a list of referees ordered by country (ref. Referees table). Some applications may skip the referee procedure.
  2. If the user expects a referee to know him or her, the user selects that referee. The referee is notified of the application by email, with links for recommending and denying it. A timer process has to be started when the email is sent to the referee.
    • Referee candidates also select a referee.
  3. If the referee recommends that the application be accepted (ref. Recommend and Deny), the application is forwarded to the CP and the Language Bank administrator for acceptance.
    • If the user does not know any referee, the application will be forwarded straight to the CP (or contact person) of the corpus and the Language Bank administrator.
    • If the referee selects the deny link, a rejection message will be sent to the CP and the administrator.
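The branching in steps 1 to 3 can be written as a single function that, given whether a referee was selected and the referee's response so far, returns the next step. This is a sketch only; the step labels and the response strings `"recommend"` and `"deny"` are hypothetical.

```python
from typing import Optional


def next_referee_step(referee_selected: bool, response: Optional[str]) -> str:
    """Return the next step for one application in the referee procedure.

    response is None while the referee has not answered, otherwise
    "recommend" or "deny" (hypothetical values).
    """
    if not referee_selected:
        # The user knows no referee: go straight to the CP and administrator.
        return "forward to CP and administrator"
    if response is None:
        return "wait for referee (timer running)"
    if response == "recommend":
        return "forward to CP and administrator for acceptance"
    if response == "deny":
        return "send rejection message to CP and administrator"
    raise ValueError(f"unknown referee response: {response!r}")
```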

In this model, if a referee loses his or her status, the users he or she recommended are affected as well. Loss of status for natural reasons (e.g. retirement, transition) does not have this effect.

Recommend and Deny

For referee recommendations, the system holds a secret passphrase, which is hashed with SHA together with the applid value. There are two web programs, Recommend and Deny. The email message generated for the referee contains links to both of them, together with the application data. To recommend that the application be accepted, the referee clicks the Recommend link, which is parametrized with the applid and the SHA hash value, and the program checks the hash.

When the hash matches, the recommend program increments the CAC field value by 32.

If the referee does not reply within e.g. one week, he or she receives a reminder. If the referee still does not reply, the application is forwarded to the CP and the administrator after a predefined delay (e.g. one week).

If the hash does not match, the programs do nothing or warn the staff about abuse.
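The text only says that the passphrase is SHA-hashed together with the applid. The sketch below substitutes HMAC-SHA256 from the Python standard library, a common way to build such a keyed hash, and also includes the action name in the hashed message so that a Deny link cannot be reused as a Recommend link; both choices are assumptions, as are the secret value and the base URL. It sets the referee value with a bitwise OR instead of adding 32, so that clicking the link twice does not change the CAC value again.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET = b"replace-with-a-long-random-passphrase"  # hypothetical; kept server-side
BASE_URL = "https://example.invalid/referee"        # hypothetical program location
REFEREE_RECOMMENDATION = 32                          # CAC value for a referee recommendation


def link_hash(applid: int, action: str) -> str:
    """Keyed SHA-256 hash of the applid and the action ("recommend" or "deny")."""
    message = f"{applid}:{action}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(applid: int, action: str) -> str:
    """Link placed in the referee's email message."""
    query = urlencode({"applid": applid, "hash": link_hash(applid, action)})
    return f"{BASE_URL}/{action}?{query}"


def hash_matches(applid: int, action: str, received: str) -> bool:
    # compare_digest avoids leaking, through timing, how much of the hash matched.
    return hmac.compare_digest(link_hash(applid, action).encode(), received.encode())


def recommend(applid: int, received: str, cac: int) -> int:
    """Return the new CAC value; unchanged if the hash does not match."""
    if not hash_matches(applid, "recommend", received):
        return cac  # the requirements allow doing nothing or warning staff here
    return cac | REFEREE_RECOMMENDATION
```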

Emails of the referee procedure

  1. Referee Form sends an email to the referee for recommending or denying.
  2. Reminder email to the referee, if (s)he fails to reply (automatically after a delay).
  3. Referee's Recommend email to CP.
  4. Referee's Recommend email to administrator.
  5. Referee's Deny email to CP.
  6. Referee's Deny email to administrator.
  7. Referee's No reply email to CP (automatically after a delay).
  8. Referee's No reply email to administrator (automatically after a delay).

In the case of a referee candidate, emails to the CP will be replaced by emails to the nominator from the Helsinki University Department of General Linguistics.
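The eight emails map onto five events in the referee procedure. A sketch as a lookup table (event and template names hypothetical), including the substitution of the nominator for the CP when the applicant is a referee candidate:

```python
from typing import Dict, List, Tuple

# Event -> list of (email template, recipient), covering the eight emails.
REFEREE_EMAILS: Dict[str, List[Tuple[str, str]]] = {
    "request": [("recommend or deny request", "referee")],
    "reminder": [("reminder", "referee")],
    "recommend": [("recommend", "CP"), ("recommend", "administrator")],
    "deny": [("deny", "CP"), ("deny", "administrator")],
    "no_reply": [("no reply", "CP"), ("no reply", "administrator")],
}


def emails_for(event: str, referee_candidate: bool = False) -> List[Tuple[str, str]]:
    """Return the emails to send for an event in the referee procedure."""
    emails = []
    for template, recipient in REFEREE_EMAILS[event]:
        if referee_candidate and recipient == "CP":
            # For referee candidates the nominator replaces the CP.
            recipient = "nominator"
        emails.append((template, recipient))
    return emails
```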

Timer-process

A timer process has to be started when the email is sent to the referee. The time limits can be adjusted as desired.

  • Reminder: if the referee has not answered within 8 days, a reminder is sent.
  • Expiry: the timer expires after 15 days, and an email is sent to the CP.
  • Cancellation: the timer is cancelled if the referee sends Recommend or Deny.
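The timer can be evaluated by a periodic job rather than a long-running process. A minimal sketch, assuming that both limits are measured from the moment the timer is started and that the caller records whether the reminder was already sent; the limits are parameters, so the same function could also serve the CP and administrator timer.

```python
from datetime import datetime, timedelta
from typing import Optional


def timer_action(started: datetime, now: datetime, answered: bool, reminder_sent: bool,
                 remind_after: timedelta = timedelta(days=8),
                 expire_after: timedelta = timedelta(days=15)) -> Optional[str]:
    """Return "remind", "expire" or None for a timer checked at time `now`.

    Both limits are measured from `started` (an assumption). After "expire",
    the caller sends the no-reply emails and stops checking this timer.
    """
    if answered:
        return None  # a response (e.g. Recommend or Deny) cancels the timer
    elapsed = now - started
    if elapsed >= expire_after:
        return "expire"
    if elapsed >= remind_after and not reminder_sent:
        return "remind"
    return None
```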

Web forms (AA work flow)

The web application can process web forms, i.e. webform submissions. These webforms could include a text box field, in case the referee needs to provide comments, and a checkbox field (size to be decided later), in case the referee needs to flag that the candidate's application should not continue to be processed automatically but should be left to a further administrator decision. (If no email addresses are used in the application, this should eliminate the need to deal with spam email.) (A CAPTCHA test could also be used on the web form.)
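A referee's web form submission could be represented and routed as follows. The field names and routing labels are hypothetical; the sketch only shows that the checkbox takes the application out of automatic processing, while the optional comment does not affect routing.

```python
from dataclasses import dataclass


@dataclass
class RefereeSubmission:
    """A referee's web form submission (hypothetical field names)."""
    applid: int
    decision: str                 # "recommend" or "deny"
    comment: str = ""             # optional text box for the referee's comments
    hold_for_admin: bool = False  # checkbox: stop automatic processing


def route_submission(submission: RefereeSubmission) -> str:
    """Decide whether the application continues automatically."""
    if submission.decision not in ("recommend", "deny"):
        raise ValueError(f"unknown decision: {submission.decision!r}")
    if submission.hold_for_admin:
        return "administrator decision"
    return "automatic processing"
```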

CP's and administrator's acceptance

If both the CP and the administrator accept the application (ref. Accept and Reject), the user receives access with the required permissions. Even if the referee has denied the application, the administrator may still accept it, provided the CP agrees. When the IdM system is running, it will be used for the CP's and administrator's acceptance.

After the CP's and administrator's acceptance, all information is automatically copied to the database tables User, Address, etc. The CSC user manager process creates a new CSC user account with the appropriate rights and associates the new customer with a new or existing project. CSC's current UNIX/Linux-based environment uses Unix groups for user management (e.g. Lemmie and DMA). The CSC user account allows command-line access to a server, and a normal CSC user account would also offer tools for monitoring. When the IdM system is running, it can create the account.

If the user's home organization is a member of Haka, (s)he can log in to the Scientist's Interface with the username and password issued by the home organization. On the first visit, the user is also asked for his or her CSC user account, so that the user's ePPN can be linked to it. On later visits, the CSC user account is no longer needed to log in to the CSC Scientist's Interface.
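The first-visit linking of an ePPN to a CSC user account can be sketched with an in-memory dictionary standing in for a database table. The class and the `ask_csc_account` callback, which is assumed to authenticate the user with the CSC user account and return the username or `None`, are hypothetical.

```python
from typing import Callable, Dict, Optional


class EppnLinks:
    """In-memory stand-in for a table linking ePPNs to CSC user accounts."""

    def __init__(self) -> None:
        self._links: Dict[str, str] = {}

    def login(self, eppn: str, ask_csc_account: Callable[[], Optional[str]]) -> Optional[str]:
        """Return the CSC username for a Haka login.

        ask_csc_account is called only on the first visit; it is assumed to
        authenticate the user with the CSC user account and return the
        username, or None if that fails.
        """
        if eppn in self._links:
            return self._links[eppn]  # later visits: no CSC credentials needed
        username = ask_csc_account()
        if username is not None:
            self._links[eppn] = username
        return username
```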

The referees will be nominated by the Helsinki University Department of General Linguistics and the Administrator.

Accept and Reject

The program then sends the application by email to the CP (or contact person) of the corpus and the Language Bank administrator to be accepted. If both accept, the Accept program copies the application data into the database tables kayttajat (users), osoitteet (address) etc., and sends an acceptance email to the user.

What else does the Reject program do besides sending a rejection email to the user? Will the application be deleted?
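A sketch of the Accept program's database step. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL Language Bank database, and the column names of `kayttajat` and `osoitteet` are hypothetical. Whether the application row is deleted afterwards is still the open question above, so the sketch leaves it in place. Both inserts run in one transaction, so a failure leaves neither table half-updated.

```python
import sqlite3

# Hypothetical minimal schema; the real tables have many more columns.
SCHEMA = """
CREATE TABLE application (applid INTEGER PRIMARY KEY, familyname TEXT,
                          email TEXT, organization TEXT, cac INTEGER);
CREATE TABLE kayttajat (id INTEGER PRIMARY KEY, familyname TEXT, cac INTEGER);
CREATE TABLE osoitteet (user_id INTEGER REFERENCES kayttajat(id),
                        email TEXT, organization TEXT);
"""


def accept_application(conn: sqlite3.Connection, applid: int) -> int:
    """Copy an accepted application into kayttajat and osoitteet; return the user id."""
    row = conn.execute(
        "SELECT familyname, email, organization, cac FROM application WHERE applid = ?",
        (applid,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no application {applid}")
    familyname, email, organization, cac = row
    with conn:  # commit both inserts together, or roll both back on error
        cursor = conn.execute(
            "INSERT INTO kayttajat (familyname, cac) VALUES (?, ?)", (familyname, cac))
        user_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO osoitteet (user_id, email, organization) VALUES (?, ?, ?)",
            (user_id, email, organization))
    return user_id
```

Copying the CAC value into `kayttajat` follows the requirement that authentication information also be stored with the user.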

Emails of the CP's and administrator's procedure

  1. CP's Accept email to administrator.
  2. Administrator's Accept email to usermgr@csc.fi (save the user's data in the database).
  3. Accept email to user.
  4. CP's Reject email to administrator.
  5. Administrator's Reject email to user.

In the case of referee candidates, CP's emails will be replaced by emails of the nominator from the Helsinki University Department of General Linguistics, who accepts new referees.

Timer-process

A timer process for the CP and administrator has to be started after the referee's response. The time limits can be adjusted as desired.

  • Reminder: if the CP or administrator has not answered within 8 days, a reminder is sent.
  • Expiry: the timer expires after 15 days, and a Reject email is sent to the administrator.
  • Cancellation: the timer is cancelled if the CP or administrator sends Accept or Reject.

Database changes

These changes will be made in the MySQL Language Bank database.

Email confirmation field

The application table has an email confirmation field containing at least 128 bits of random data, generated when the application form is stored. When a non-registered user submits the application form, the random value is emailed to the user together with a link to the confirmation form, where he or she confirms the email address by entering the value (refer to the KITWIKI registration). Submitting the confirmation form increments the CAC field value of the application table by 1 (user-verified email) or 2 (organization-verified email), depending on the email address.

If the user's email address is invalid, the application is never confirmed; unconfirmed applications are dropped from the database once a day.
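Generating and checking the confirmation value could look like the following sketch. `secrets.token_hex(16)` yields 128 bits as 32 hexadecimal characters, which is longer than the 20 characters given for `emailconfirmation` in the application table, so that size would need to be revisited. The set of well-known organization domains and the subdomain matching rule are assumptions, and the CAC value is updated with OR so that a repeated submission does not add the value twice.

```python
import hmac
import secrets
from typing import Set

USER_VERIFIED_EMAIL = 1          # CAC value 1
ORGANIZATION_VERIFIED_EMAIL = 2  # CAC value 2


def new_confirmation_value() -> str:
    """128 bits of random data, hex-encoded (32 characters)."""
    return secrets.token_hex(16)


def confirm_email(entered: str, stored: str, email: str, cac: int,
                  known_domains: Set[str]) -> int:
    """Return the CAC value after a confirmation attempt."""
    if not hmac.compare_digest(entered.strip().encode(), stored.encode()):
        return cac  # wrong value: the application stays unconfirmed
    domain = email.rsplit("@", 1)[-1].lower()
    # Hypothetical rule: the domain or any subdomain of a well-known organization counts.
    well_known = any(domain == d or domain.endswith("." + d) for d in known_domains)
    return cac | (ORGANIZATION_VERIFIED_EMAIL if well_known else USER_VERIFIED_EMAIL)
```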

CSC Authentication Classes (CAC field)

The CAC field in the application table describes how the user's identity is verified. Information on how each user is authenticated needs to be stored in the database, because stronger authentication than the currently used personal signature may be required. It should be added to the database table kayttajat (users).

  • The minimum level of trust for authentication (expressed by CAC values) is 32.
  • Required CAC values per resource have to be defined by the CP at the time of deposition.

The CAC field can take one or several of the values listed below. If several values apply, they are summed; because each value is a distinct power of two, the sum identifies exactly which methods were used.

  • 0. Not authenticated (data stored from web form).
  • 1. User-verified email. Authentication by an email confirmation from any address.
  • 2. Organization-verified email. Authentication by an email confirmation from a well-known CSC customer organization. HAKA members and state institutions can be considered as well-known CSC customer organizations.
  • 4. Authentication using a credit card or a good certificate issued by a well-known CA.
  • 8. Scanned signature in a PDF document.
  • 16. Personal signature (default value for current CSC customers).
  • 32. Referee recommendation: a known professor or research director recommends that the application be accepted. In addition, official identification (photo ID) can be verified by a referee.
  • 64. Strong authentication using SAML2/Shibboleth or grid certificates (in the USA: urn:mace:incommon:iap:bronze).
  • 128. Official identification verified by a bank account (tupas) or more secure certificates (in the USA: urn:mace:incommon:iap:silver).
  • 256. CSC-checked official identification card or passport.
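Because every CAC value is a distinct power of two, the field behaves as a set of bit flags, which Python's `enum.IntFlag` models directly. The member names below are hypothetical abbreviations of the list above. Since 1 + 2 + 4 + 8 + 16 = 31, a CAC value of at least 32 holds exactly when at least one method from 32 upward is present; the sketch treats the minimum trust level (and any per-resource requirement) as such a numeric threshold, which is one possible reading of the requirement.

```python
from enum import IntFlag


class CAC(IntFlag):
    """CSC Authentication Classes (hypothetical member names)."""
    NOT_AUTHENTICATED = 0
    USER_VERIFIED_EMAIL = 1
    ORGANIZATION_VERIFIED_EMAIL = 2
    CREDIT_CARD_OR_CA_CERTIFICATE = 4
    SCANNED_SIGNATURE = 8
    PERSONAL_SIGNATURE = 16
    REFEREE_RECOMMENDATION = 32
    SAML2_OR_GRID_CERTIFICATE = 64
    BANK_OR_SECURE_CERTIFICATE = 128
    CSC_CHECKED_ID = 256


MINIMUM_TRUST = 32


def meets_requirement(value: int, required: int = MINIMUM_TRUST) -> bool:
    """Numeric-threshold reading of the minimum (or per-resource) CAC requirement."""
    return value >= required
```

For example, a current CSC customer recommended by a referee has `CAC.PERSONAL_SIGNATURE | CAC.REFEREE_RECOMMENDATION`, stored as 48.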

Application table

When the user sends the application, how should the application data be kept while it is not yet accepted? It can be stored in the existing tables with new status fields, or new table(s) can be created. We recommend creating a new application table.

| field | type | size | null | comment |
| applid | int | | no | |
| arrivaldate | date | | no | |
| usernamecantidate | varchar | 8 | yes | |
| CAC | smallint | | no | |
| display name | varchar | 20 | no | |
| familyname | varchar | 25 | no | |
| nationality | smallint | | no | phone code or TLD |
| position | varchar | 40 | no | |
| organization | varchar | 40 | no | |
| faculty | varchar | 40 | yes | |
| phone | varchar | 20 | yes | |
| gsm | varchar | 20 | yes | |
| email | varchar | 60 | no | |
| emailconfirmation | varchar | 20 | yes | only needed during confirmation process |
| referee | smallint | | yes | |
| datetime | datetime | | no | |
| projectname | varchar | | yes | |
| projectdescription | text | | yes | |
| newreferee | char | 1 | yes | |

  • A postal address is required for sending the password, magazines and Christmas cards. Fields must be rechecked.
  • Will the applied resources be stored here?

Referees table

CSC has to add a new table, referees, to the database. The table must have the ID and status fields. The ID field is a number that links the table to the henkilo (person) table (which includes e.g. first and last name) and to the osoitteet (address) table (which includes e.g. email and phone). The status field can have the values 0 (no longer trusted), 1 (active) and 2 (retired).

It is necessary to record who acted as the referee for each user. The referees table is therefore connected to the kayttajat (users) table by adding the ID field of the referees table to the kayttajat table.

| field | type | size | null |
| ID | smallint | | no |
| status | char | 1 | no |
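How a referee's status change propagates to users is not specified beyond the rule that only loss of trust (status 0), not retirement, affects them. A sketch under two assumptions: the referee ID stored in kayttajat is represented as a mapping from username to referee ID, and losing trust is handled by clearing the referee value (32) from each affected user's CAC field. All names are hypothetical.

```python
from typing import Dict, List

NO_LONGER_TRUSTED, ACTIVE, RETIRED = "0", "1", "2"  # referees.status is char(1)
REFEREE_RECOMMENDATION = 32                         # CAC value contributed by a referee


def affected_users(referee_id: int, new_status: str,
                   user_referee: Dict[str, int]) -> List[str]:
    """Users whose trust must be reviewed after a referee's status change.

    user_referee maps a kayttajat username to the ID of the referee who
    recommended that user (hypothetical representation of the new field).
    """
    if new_status != NO_LONGER_TRUSTED:
        return []  # retirement and other natural reasons do not affect users
    return sorted(user for user, ref in user_referee.items() if ref == referee_id)


def revoke_referee_trust(cac: int) -> int:
    """Clear the referee-recommendation value, keeping all other methods."""
    return cac & ~REFEREE_RECOMMENDATION
```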

Topic revision: r34 - 2010-02-08 - SatuTorikka
 