$Id: data-model,v 1.5 2008-02-07 13:37:15 mike Exp $


Entity Relationship Diagram
---------------------------

Here is a diagram of the eleven main classes in the Keystone Resolver
data model, represented in the initial implementation by MySQL
tables.  (In the relational database implementation, additional tables
are used to represent the many-to-many links genre:service-type
and server:serial, and to provide each serial's multiple aliases and
each credential's multiple key=value pairs.)

       +---------+                            +----------+
       | MFORMAT |                            | PROVIDER |
       |         |                            |          |
       | name    |                            | name     |
       | uri     |                            | priority |
       |         |                            | contact  |
       +---------+                            +----------+
         \  |  /                                   /|\
          \ | /                                   / | \
           \|/                                   /  |  \
        +-------+      +--------------+      +-------------+      +--------+
        | GENRE |      | SERVICE TYPE |      | SERVICE     |      | SERIAL |
        |       |\_  _/|              | ____/|             |\_  _/|        |
        | tag   |__\/__| tag          |/_____| name        |__\/__| name   |
        | name  | _/\_ | name         |\____ | priority    | _/\_ | alias* |
        |       |/    \| plugin       |     \| url recipe  |/    \| issn   |
        |       |      | priority     |      | auth recipe |      |        |
        |       |      | [CODEREF]    |      | disabled    |      |        |
        +-------+      +--------------+      +-------------+      +--------+
                                                   /|\
                                                  / | \
                                                 /  |  \
                                             +-------------+
                                             | CREDENTIALS |
                                             |             |
                                             | key=value*  |
    +--------+      +----------+             +-------------+
    | SID    |      | SOURCE   |                 \  |  /
    |        | ____/|          |                  \ | /
    | sid    |/_____| name     |                   \|/
    | tag    |\____ | url      |             +-------------+
    | recipe |     \| encoding |             | IDENTITY    |
    +--------+      +----------+             |             |
                                             | name        |
                                             | level       |
					     | parent ---> |
					     +-------------+

	+--------+      +--------+
	| CONFIG |      | DOMAIN |
	|        |      |        |
	| name   |      | domain |
	| value  |      | status |
	+--------+      +--------+


Description
-----------

Genres are the kinds of things that you might find a reference for in
a resource discovery service such as an abstracts-and-indexing service
or a metasearch site.  The most common and complex case is journal
articles, but books are also important.  The complete list (from the
widely-implemented v0.1 of the OpenURL standard) is:
	journal, book, conference, article, preprint, proceeding,
	bookitem
And the list in v1.0 (Part 2, page 39) is:
	journal, issue, article, conference, proceeding, preprint,
	unknown
It may appear that v1.0 of the standard has dropped support for
representing books, but this is not so: in OpenURL 1.0, books are
represented by a completely different metadata format (refered to here
as an mformat).  The standard's specification of the San Antonio
Profile (Part 2, page 77) lists five KEV metadata formats:
	book, dissertation, journal, patent, sch_svc
of which only journal is described in the standard (Part 2, page 35).
Each mformat has a default genre, which is assumed for entities that
do not explicitly specify one; multiple mformats may share the same
default genre.

(The only use of the mformat table is to map a metadata-format URI onto a
default genre in the case of v1.0 OpenURLs that specify the former but
not the latter.)

Service types are the kinds of things that an OpenURL might resolve
to.  The most common and important case is the full text of the
article (typically but not always as a PDF), as provided by a content
aggregator.  Other important cases include abstracts, web-searches,
on-line book-stores, local library catalogue's holdings, ILL requests
and citations in some specific format.  Each service-type is
implemented by a plugin module whose name is stated in the "plugin"
field, or in the "tag" if that is empty.

There is a complex many-to-many relationship between genres and
service types.  For example, the genre "book" can be resolved by an
online-bookstore or web-search service, but not by content aggregators
as they deal in serials.  Conversely, the genre "article" can be
resolved by an aggregator or a web-search service, but not by an
online book-store.

	### David observes that some of the content aggregators
	(e.g. Ovid) deal in books and other publications as well as
	serial articles.  We can't tell how to represent this in the
	RDB until we get sample data from TDNet.

Service Providers, or just Providers for short, are organisations that
provide services that a link might resolve to.  For example, Elsevier
and Gale are both providers.  Each provider provides one or more
services.

Services are specific implementations of a given service type, and
each is provided by a particular provider.  For example, Elsevier
provides a service of type "aggregator", and Google and Alltheweb are
both services of type "web search".  Each service has exactly one
service type and one provider; but there may be any number of services
of each type, and any number of services may be provided by a single
provider.  Services may be disabled.

Serials (including journals, proceedings, etc.) are places where
individual articles, papers, etc. are published.  A service may
provide access to many serials, and many services may provide access
to a serial, so this is a many-to-many relationship.

An identity is anyone or anything that might be trying to resolve an
OpenURL: an individual, a group, a department, an organisation, a
consortium or anything in between.  In all cases, the identity
consists of a name, a level (person, group, etc.) and a parent pointer
indicating the containing identity if any: for example, Index Data is
the parent of Mike Taylor, and ILCSO is the parent of Champaign
Library; but ILCSO itself has no parent.

Credentials are sets of context=value pairs, such as the set
user=mike, password=fruit.  Each such credential set is associated
with a particular identity and particular service (typically but not
necessarily an aggregator service).  A given set of credentials is
what a particular identity needs to use in order to access a
particular service.  For example, Index Data might need to use
user=index pass=apple to access Elsevier, but user=id pass=pear to
access Science Direct.

	### One deficiency in this model is that it does not appear to
	handle the situation where an identity has a time-limited
	subscription for a service, e.g. Index Data has access to all
	of Elsevier's articles from the start of 1998 until six months
	ago.

Some OpenURLs contain non-standard opaque identifiers such as local
catalogue numbers.  According to the standard, OpenURLs that do this
must also include an indication of the vocabulary from which the
identifier is drawn, e.g "this is a Library of Croyden call number".
Such an identifier is called a SID.

A Source is a site that can generate OpenURLs.  Several sites may use
the same non-standard identifier, so there is a one-to-many
relationship between SIDs and sources.

Configuration information is held as a set of name=value pairs called
Config objects, which can be thought of as broadly analogous to
environment variables.

And, finally, domains associate specific Internet domains with a
status drawn from a small set of allowable values, indicating what
approach we should take to fetching ContextObjects, Descriptors and
Identifier results from them.  The "status" field may contains the
values:
	0 (always) -- fetch data from this domain when an OpenURL says
		to do so, and treat failure as fatal.
	1 (never) -- never try to fetch data from this domain, but
		continue with the resolution process.
	2 (ignore) -- try to fetch data from this domain when
		requested, but ignore failure and continue resolving.


Examples
--------

Example objects of various types:

GENRE: book, journal article, online document
SERVICE TYPE: full text, abstract, web search, on-line book store, citation
PROVIDER: Elsevier, Gale, EBSCO
SERVICE:
	Full Text: Elsevier, Gale, EBSCO
	Abstract: Elsevier, Gale, EBSCO, GAIA
	Web Search: Google, Alltheweb, Inktomi
	Book Store: Amazon, Barnes & Noble
	Citation: JVP format, BMJ format
SERIAL: JVP, Acta Palaeontologica Polonica, BMJ, Paleobios, Ariadne
IDENTITY: Mike Taylor, Index Data, Champaign Library, ICLSO
CREDENTIALS: user=mike pass=chickens, user=index pass=kebab
SID: Ovid:Medline, ERL:BX4, EBSCO:MFA
SOURCE: Library of Texas, LC Voyager
CONFIG: logfile=/tmp/kr.log, verbosity=2
DOMAIN: ldap.caltech.edu with status=2 (this domain is used in req_ref
	for some sample v1.0 OpenURLs, but doesn't exist)


Admin UI
--------

The adminstrative web-site for the resolver's RDB uses some further
tables for its own purposes.  These are:

       +---------------+      +---------+
       | SITE          |      | SESSION |
       |               | ____/|         |
       | tag           |/_____| cookie  |
       | name          |\____ | dest    |
       | bg_colour     |     \|         |
       | email_address |      |         |
       +---------------+      +---------+
              /|\               \  |  /
             / | \               \ | /
            /  |  \               \|/
       +---------------+           |
       | USER          |           |
       |               |           |
       | admin         |__________/
       | name          |
       | email_address |
       | password      |
       +---------------+

The admin web-site setup actually hosts an arbitrary number of similar
sites which are cosmetically branded and may have different rights and
responsibilities, though implemented with the same core functionality:
each such site is represented by a Site record.

Users of the admin site are allocated a session on their first page
impression, implemented using a cookie.  That cookie is used as the
key to the Session records, which must be stored in the database
rather than in server memory because subsequent page requests may be
directed to different server instances.

Users may register and log in, at which point their Session object is
updated to point to the relevant User record.  Users are allocated on
a per-site basis, so that multiple users with the same email address
may exist so long as each is associated with a different site.
