Library Portal Roles in a Shibboleth Federation

Don Gourley
Washington Research Library Consortium
October 30, 2003

Library Portal and Gateway Systems

Libraries provide valuable services to the academic community, selecting and organizing information resources that support research and education. As the use of electronic resources has grown, libraries have deployed online portal or gateway systems to provide some of these selection and organization services in the digital realm. Library portals enhance the value of electronic resources with functions such as resource description and discovery, combined searching of multiple resources, context-sensitive linking, and personalized services [COX2002]. These gateways can be as simple as a set of HTML pages with links to electronic resources or as complex and fully featured as commercial portal systems such as MetaLib (ExLibris), ENCompass (Endeavor) and ZPORTAL (Fretwell-Downing). While library portals contain gateway functionality, they also provide tools for organized knowledge discovery. (Library portals are a kind of application-specific vertical portal, as distinguished from a campus portal that provides an enterprise interface to a broader set of resources or channels.)

Gateway functionality typically includes patron authentication and resource access management. Content vendors use a variety of authentication mechanisms to control access to their products, often (but not always) including authentication by browser IP address. The security vulnerabilities and variety of these mechanisms make it difficult for a library portal to provide uniform and reliable access to the library's electronic subscriptions.

Figure 1: Library portal functionality (pre-Shibboleth)

Shibboleth is a federated system of authentication and authorization that can address many of these issues. When institutions and vendors join together in a Shibboleth federation, the library gateway no longer needs to mediate the authentication interaction between library users and resources, and it can use attributes available through Shibboleth to offer an improved set of services tailored to the patron. The gateway may also provide a means to resolve some of the issues that arise when resource access does not fit neatly into Shibboleth's origin-target model. But to take advantage of the Shibboleth trust infrastructure, the portal must be "Shibboleth-enabled", that is, able to interoperate with the Shibboleth software. This note explores what it means for a library portal to be Shibboleth-enabled by presenting some scenarios illustrating the roles the portal might play in a Shibboleth federation.

Shibboleth Origins and Targets

The Shibboleth model defines three roles that are involved in access management for electronic resources [CARMODY2001]:

  1. The browser user (for example, the library patron accessing electronic resources)
  2. The origin site (the institution with which the user is affiliated; responsible for authenticating the user and providing attributes)
  3. The target site (the resource provider; responsible for granting access based on the attribute information about the user provided by the origin)

Since a library portal system sits between the patron and the electronic resource, how does it fit into this model? In current practice, the library gateway system may play each role. When proxying access to IP-authenticated resources, or performing a combined search for a patron, it appears to the resource as the browser user. When authorizing a patron to access a resource it is performing an origin function. And, when providing favorite databases, saved searches or other personalized services the library portal is the resource target.

At a campus that is set up as a Shibboleth origin, a library gateway can provide access to a Shibboleth target resource by simply staying out of the authentication and authorization process. Links on the portal menus can go directly to the resource just as they would for unlicensed (free) resources such as PubMed or the library OPAC. But in many other scenarios a portal will still have to play the role of the user or target in order to provide library services and uniform access to heterogeneous distributed resources.

The Library Portal as Shibboleth Target

Perhaps the most obvious role for a library portal is as a Shibboleth target. In particular, if a campus has a Shibboleth origin that allows a user to log in once to access all resources, the library portal should take advantage of that for authenticating users for its services. If the portal serves a single institution, it can make this process more transparent for its users by redirecting unauthenticated users to the institution's handle service (HS) to log in, rather than sending them through an intermediate WAYF ("where are you from") service.

In this scenario the library portal has the same issues that other targets have, including how to define the attributes and entitlements necessary to determine what resources and services to offer the user. But the portal must also implement access policy for a wide range of resources in order to present the appropriate links to the user. These rules may make use of entitlement attributes from the origin or be based on group membership attributes that are associated in the portal with authorization for certain resources. For many resources authorization is the same as authentication (i.e., if you can log in you must be allowed to access this resource), but in some cases only some members of the community are authorized, and the portal must know something about the user's identity.
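The rule evaluation described above might be sketched as follows. This is only an illustration in Python: the resource names, entitlement URNs, and group names are hypothetical and are not defined by Shibboleth or by any particular portal product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """Access policy for one resource (all values here are hypothetical)."""
    resource: str
    any_authenticated: bool = False        # authorization equals authentication
    entitlements: frozenset = frozenset()  # entitlement values sent by the origin
    groups: frozenset = frozenset()        # group memberships mapped in the portal


def visible_resources(rules, user_entitlements, user_groups):
    """Return the resources whose links the portal should present to the user."""
    ents, grps = set(user_entitlements), set(user_groups)
    shown = []
    for rule in rules:
        if rule.any_authenticated or rule.entitlements & ents or rule.groups & grps:
            shown.append(rule.resource)
    return shown


RULES = [
    AccessRule("general-index", any_authenticated=True),
    AccessRule("law-reports", entitlements=frozenset({"urn:example:entitlement:law"})),
    AccessRule("clinical-db", groups=frozenset({"med-faculty"})),
]
```

A user who presents no entitlements or group memberships would see only the first resource; the other two require a matching attribute.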

Authentication establishes an identity, but that identity may be anonymous (used indistinguishably by multiple users) or pseudonymous (used by a specific user in an unidentifiable way) [LYNCH1998]. Directories can offer a GUEST account with minimal privileges to give anonymous identities to walk-in or on-campus users. Such an account would not provide membership attributes, but could have entitlement values allowing access to those resources that permit access by walk-ins. When personalized options are offered, the origin must provide some identification of the patron. To keep identities secret while enabling targets to offer personalized services, Shibboleth supports a persistent unique opaque identifier in the TargetedID attribute. Some library services (e.g., online renewal) depend on specific identification where an opaque pseudonym would not work, but Shibboleth provides mechanisms for releasing additional information to more trusted targets, such as those in the same authentication domain or within one of the institutions in a consortium that are collaborating closely on academic services. Library portals typically meet the criteria for local trusted targets, and can provide a mechanism for the anonymous user to supply an identified login for those services.
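To illustrate the properties of a persistent, unique, opaque identifier, the sketch below derives a pseudonym from a keyed hash. This is an assumption made for illustration, not a description of how Shibboleth computes TargetedID; the function name, the secret, and the message layout are all hypothetical.

```python
import hashlib
import hmac


def opaque_id(origin_secret: bytes, user: str, target: str) -> str:
    """Derive a pseudonym that is stable per (user, target) pair.

    Illustrative only: the keyed-hash construction is an assumption,
    not the Shibboleth implementation.
    """
    message = f"{target}!{user}".encode("utf-8")
    return hmac.new(origin_secret, message, hashlib.sha256).hexdigest()


SECRET = b"hypothetical-origin-secret"
portal_id = opaque_id(SECRET, "jdoe", "https://portal.example.edu/")
vendor_id = opaque_id(SECRET, "jdoe", "https://vendor.example.com/")
```

The same user receives the same value at a given target on every visit, which is enough to key saved searches or favorites, while two targets cannot correlate the user by comparing the values they receive.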

Figure 2: Portal access management in a Shibboleth environment

The library portal can help solve some other general resource access problems in a Shibboleth environment if it is a Shibboleth target. In particular, solutions that have already been developed in portals for creating and resolving context-sensitive links through OpenURLs can be used to provide persistent URLs to Shibboleth target resources and embed those links in other applications. A portal link resolver can take an OpenURL that contains metadata about a manuscript, journal or article and, with the user attributes available from the Shibboleth origin, make more sophisticated decisions about resources (e.g. full-text) and services (e.g. inter-library loan) to offer the patron. Assuming the resources and services offered are Shibboleth targets then the user can access them without logging in again.

Some OpenURL link resolvers, such as SFX from ExLibris, can be set up to offer a service that captures a portable citation for the user. This citation might include a description of the item and its OpenURL. The citation can be shared with other members of the university to provide access through the link resolver, which would provide an appropriate list of resources and services for each citation recipient. For example, an instructor might grab the citation and put it in the reading list for her course in a learning management system (LMS). If the LMS is also a Shibboleth target then the student's access of the reading list items requires no additional manual authentication step. To broaden the portability of citations across a federation, the portal could use a link resolver BASE-URL attribute (that would be provided by Shibboleth) to redirect OpenURL requests.
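The redirection idea can be sketched as follows, assuming the portal receives the recipient's link resolver address as an attribute value. The function name and the resolver hosts are hypothetical; the query keys follow common OpenURL usage (genre, issn, volume, spage) but are given only as examples.

```python
from urllib.parse import urlencode


def portable_openurl(base_url: str, citation: dict) -> str:
    """Attach citation metadata to the recipient's own link resolver address."""
    query = urlencode(sorted(citation.items()))
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


CITATION = {"genre": "article", "issn": "1234-5678", "volume": "12", "spage": "34"}

# The same citation resolved for members of two different institutions.
link_a = portable_openurl("http://resolver.campus-a.example.edu/sfx", CITATION)
link_b = portable_openurl("http://resolver.campus-b.example.edu/sfx", CITATION)
```

Because only the base address changes, the citation metadata stays portable while each recipient is sent to the resolver that knows their institution's subscriptions.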

As a practical matter it will be a very long time, if ever, before all of the thousands of content providers implement Shibboleth as an access mechanism. There will continue to be multiple technical implementations of the access policies and contracts between institutions and content providers for the foreseeable future, to accommodate heterogeneous resources and different classes of institutional users [LYNCH1998]. To simplify access for the patrons, and support requirements for the library staff, the library portal can provide uniform Shibboleth-based access to all the resources, whether Shibboleth targets or not. The library portal, if itself a Shibboleth target, can act as an intermediate resource manager (RM) for electronic resources that are not Shibboleth targets.

During the Shibboleth pilot deployment project, WRLC and Georgetown University developed a prototype RM to sit between Shibboleth authentication and the non-Shibboleth-enabled resources. How this works is best seen by contrasting it with how Shibboleth targets were accessed during pilot testing. When a member of the Georgetown testing team selected a Shibboleth-enabled resource for the first time during a browser session, the user's identity was provided to Shibboleth by the campus authentication system, and Shibboleth set the session credential attributes and redirected the patron to the resource. The resource, being Shibboleth-enabled, accepted the credentials and allowed access. In this scenario the library gateway had no role (other than providing direct links to the resources on its menus).

By contrast, when providing access to non-Shibboleth-enabled resources the gateway would present URLs that point to the intermediate RM. When the patron requests such a resource, the intermediate RM checks the Shibboleth-provided attributes and, if the user is authorized, redirects the patron either to a proxy server (for IP-authenticated resources) or to the vendor site while automatically and invisibly submitting the login credentials (for resources whose access is controlled that way). For example, if a publisher controls access by a login name and password, the intermediate RM allows the patron to use the Shibboleth login session without knowing or seeing the publisher's login credentials.

The Library Portal as User

At the same time, a portal often acts as a proxy for the user to access information resources. The intermediate RM, while acting as a target with respect to the user, appears as a user when mediating interaction with IP-authenticated (non-Shibboleth) resources for off-campus patrons. A more detailed look at how the intermediate RM works demonstrates some of the ways a library portal can proxy user actions. (It also provides a brief overview of the Shibboleth origin and target interaction during the authentication/authorization process.)

Figure 3: Intermediate resource manager authentication and authorization

Figure 3 shows how patrons use their Shibboleth sign-on session to access electronic resources that are controlled, either through IP-authentication or login credentials, but not Shibboleth-enabled. Access involves the following steps:

  1. A request is made to the intermediate RM, specifying the resource to access.
  2. The RM checks for the existence of a Shibboleth session (and an associated "handle" which can be used to retrieve user attributes). If no session exists then the intermediate RM redirects the user to the origin handle service (HS) to establish one. There the user logs in (2') to the Georgetown single sign-on page and gets assigned a temporary handle that is associated with any resource requests for the current browser session. The HS redirects the user back to the target that directed the user to the HS (in this case, the intermediate RM).
  3. The RM uses the handle to query the Georgetown attribute authority (AA) about the user. The AA responds with the appropriate attributes (e.g. "[email protected]").
  4. The RM looks up the requested resource in the configuration database to find out where it is (URL) and what mechanism is used to authorize the user (e.g. IP authentication, login credentials, etc.).
  5. The RM redirects the user to the resource according to the configured access method:
    GET (5'):
    Redirect the user directly to the requested resource. Used for a Shibboleth target, or for an IP-authenticated resource for an on-campus user.
    PROXY (5"):
    Redirect the user to EZproxy, a rewriting pass-through proxy server that retrieves web pages for the user and rewrites site URLs to go through EZproxy before it passes the page through to the user. This is used for IP-authenticated resources for off-campus users.
    POST (5"'):
    Redirect the user to log in to the resource, using JavaScript to automatically submit the login credentials.
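The steps above can be sketched in Python as follows. This is a simplified illustration rather than the prototype's code: the resource table, entitlement value, session structure, and proxy host are hypothetical, and the "login?url=" proxy address form is an assumption about how the proxy server is addressed.

```python
from dataclasses import dataclass

# Hypothetical proxy host; the "login?url=" form is assumed for illustration.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="


@dataclass(frozen=True)
class Resource:
    url: str
    method: str        # "shibboleth", "ip", or "credentials"
    entitlement: str   # attribute value required for access


# Stand-in for the configuration database consulted in step 4.
RESOURCES = {
    "index": Resource("http://index.example.com/", "ip",
                      "urn:example:entitlement:library"),
    "journal": Resource("http://journal.example.com/login", "credentials",
                        "urn:example:entitlement:library"),
}


def handle_request(resource_id, session, on_campus):
    """Return an (action, location) pair for one request to the RM."""
    if session is None:                                   # step 2: no session yet
        return ("REDIRECT_TO_HS", None)
    resource = RESOURCES[resource_id]                     # step 4: look up resource
    if resource.entitlement not in session["entitlements"]:  # step 3: AA attributes
        return ("DENY", None)
    if resource.method == "credentials":                  # step 5: POST
        return ("POST", resource.url)
    if resource.method == "ip" and not on_campus:         # step 5: PROXY
        return ("PROXY", PROXY_PREFIX + resource.url)
    return ("GET", resource.url)                          # step 5: GET
```

Here `session["entitlements"]` stands for the attribute values already retrieved from the AA; a real RM would also have to generate the auto-submitting login form for the POST case.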

The portal may also act as the user in order to integrate information from multiple sources (whether all Shibboleth-enabled or a combination). A common example is combined searching of multiple resources. The patron enters search criteria and selects various resources to search. The portal sends the query to each database, receives the results, performs some processing on the result set (such as ranking and de-duplication) and presents the results to the patron.

Figure 4: Library portal acting as a Shibboleth user to perform combined search
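The result processing in a combined search (de-duplication and ranking) might look like the following sketch. The record fields (title, year, score) and the matching rule are assumptions chosen for illustration; real portals use richer metadata and matching heuristics.

```python
def combine_results(result_sets):
    """Merge per-database result lists, de-duplicate, and rank by score."""
    best = {}
    for source, records in result_sets.items():
        for record in records:
            # Treat records with the same normalized title and year as duplicates.
            key = (record["title"].strip().lower(), record["year"])
            kept = best.get(key)
            if kept is None or record["score"] > kept["score"]:
                best[key] = {**record, "source": source}
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


RESULTS = {
    "db_a": [
        {"title": "Federated Access", "year": 2002, "score": 0.9},
        {"title": "Portals", "year": 2001, "score": 0.4},
    ],
    "db_b": [
        {"title": "federated access ", "year": 2002, "score": 0.7},
        {"title": "Link Resolvers", "year": 2003, "score": 0.8},
    ],
}
combined = combine_results(RESULTS)
```

Of the two matching records, only the higher-scoring copy from db_a survives, and the merged list is ordered by score.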

A portal's combined searching capability often uses non-Web based protocols such as Z39.50. This presents a problem in Shibboleth as the current target implementation is integrated into a Web server and can only handle access via HTTP. Consequently, content providers are not able to easily provide Shibboleth-authenticated access through these alternate protocols. In order to provide this functionality in a general way, Shibboleth target components need to provide a programming interface that can be accessed by applications other than Web servers.

More issues arise if a portal needs to access information that is user-specific (e.g., a patron's saved searches at a remote database). In this case it must convince the resource that it is acting on behalf of the real user. This masquerade can involve serious security and privacy risks. If the resource is a Shibboleth target, then it is possible (in theory, at least) for the portal to get the user's Shibboleth handle and present it to the resource for use in obtaining attributes describing the real user. This process is complicated by the various trust issues to consider when proxying security credentials [WASLEY2002]. Fortunately, in most cases a library portal can simply provide links to the services and resources rather than proxy users' requests. Examples such as combined searching, where the portal can enhance the service by submitting the requests to the resources itself, do not require patron-specific information. In these scenarios the portal system can use its own identity (i.e., a special user-agent account) that is authorized to access the resources.

Library Patron Directory

Some library portals include their own directory of patrons to manage authentication and access. This duplication of information, with the security and synchronization problems that come with it, is not needed for authentication if the library portal is a Shibboleth target. But specialized library services, such as online renewal, user-initiated interlibrary loan, and intra-consortium reciprocal borrowing, require that the library system still maintain its own directory of patrons with a rich set of user attributes. Moreover, these patron directories are the best sources of some information about users that is needed for fine-grained access management. This includes data such as overdue fines or books that might trigger a library block, and perhaps finely tuned patron group associations that are referenced in resource contracts. For example, medical school faculty may be allowed to view a particular journal article online, regular faculty can get a copy sent to them, and others have to visit the library to see the hard copy.

Figure 5: Generating entitlement attribute values from multiple directories

At the same time, Shibboleth origins are trying to figure out how they are going to manage the new attributes that may be required in a Shibboleth federation. It may not be practical to add all the attributes necessary for very granular access control to the campus directory. In some cases, such as the library block and journal article examples above, the attributes are already stored in the patron directory. One approach that a few sites are trying is to implement plug-ins for the AA so it is able to consult multiple backend directories and attribute values to produce a single Shibboleth entitlement attribute. For example, the attributes in a patron's LDAP user object may indicate she is eligible to access a resource, but the patron directory contains a library block (due, say, to excess fines) so the AA does not provide the entitlement. This allows the library staff to maintain their specific directory information with the same integrated library system tools currently used to maintain patron accounts, while making it available to Shibboleth targets in the form of entitlement values.
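A plug-in of this kind could be sketched as below. The directory layouts, field names, and entitlement values are hypothetical; this does not reflect the actual AA plug-in interface, only the decision logic of combining a campus directory with the library patron directory.

```python
def entitlements_for(user_id, campus_directory, patron_directory, licensed):
    """Combine two directories into a list of entitlement values for one user."""
    person = campus_directory.get(user_id, {})
    patron = patron_directory.get(user_id, {})
    if patron.get("blocked", False):
        # A library block (for example, excess fines) withholds all entitlements.
        return []
    affiliations = set(person.get("affiliations", []))
    return sorted(
        entitlement
        for entitlement, allowed in licensed.items()
        if affiliations & set(allowed)
    )


CAMPUS = {"jdoe": {"affiliations": ["faculty"]},
          "asmith": {"affiliations": ["student"]}}
PATRONS = {"jdoe": {"blocked": False}, "asmith": {"blocked": True}}
LICENSED = {
    "urn:example:entitlement:ejournals": ["faculty", "student"],
    "urn:example:entitlement:lawdb": ["faculty"],
}
```

In this example the student is eligible by affiliation but receives no entitlements because of the block recorded in the patron directory.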

Consortia and Federations

Multi-campus systems and library consortia represent certain kinds of institutional federations, and it is useful to examine the relationships between these federations and Shibboleth federations. In the simplest consortium scenario, a single origin (e.g., for multiple campuses of one institution) provides the authentication and AA for the group. Unauthenticated users can be redirected by the library portal to the origin HS to log in, without going through a WAYF service. The origin's AA can use campus affiliation to determine the appropriate entitlements for resources that aren't licensed by all campuses.

In a more general scenario, a consortium such as a state university system may have a separate origin for each campus. If their libraries share a portal system then they could also share a WAYF site for patrons to identify their campus when accessing resources through the portal. If desired, the attribute acceptance policies of the library portal could be configured to only accept attributes from the consortium institutions. But in some cases it would be useful for the consortium WAYF to provide an option for "Other Institution" (which would redirect the patron to another WAYF), and have the library portal accept some attributes from origins that are part of the larger Shibboleth federation. The BASE-URL attribute, described above for sharing article citations, is one example.
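An attribute acceptance policy of this kind might be expressed as in the sketch below. The origin names and the entitlement attribute name are hypothetical, and the policy format is an illustration rather than Shibboleth's actual configuration syntax.

```python
# Hypothetical origins belonging to the consortium.
CONSORTIUM_ORIGINS = {"campus-a.example.edu", "campus-b.example.edu"}

# Attributes the portal accepts from any origin in the wider federation.
FEDERATION_WIDE_ATTRIBUTES = {"BASE-URL"}


def accept_attributes(origin, attributes):
    """Filter the attributes asserted by an origin according to portal policy."""
    if origin in CONSORTIUM_ORIGINS:
        return dict(attributes)
    return {name: value for name, value in attributes.items()
            if name in FEDERATION_WIDE_ATTRIBUTES}


asserted = {"BASE-URL": "http://resolver.other.example.org/",
            "entitlement": "urn:example:entitlement:ejournals"}
```

A patron from outside the consortium would keep only the link resolver address, so citation sharing still works while entitlements from unknown origins are ignored.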

Consortium scenarios are complicated by the fact that individuals may have multiple identities among the member institutions. For example, a staff member at one university may be a graduate student at another. Or, a faculty member of one school may visit (walk-in) another library. In these cases it would be ideal if the entitlements from all the identities could be merged, so users could access resources to which any of their identities are eligible. Few, if any, library portal systems support this now; multiple identities are generally handled by providing a mechanism for patrons to specify a particular institutional affiliation before accessing resources through the portal. Such a mechanism should be easy to implement in Shibboleth between the library portal and the consortium WAYF. In the future I expect link resolvers to be able to send OpenURLs to each other and receive back links that can be combined on a service menu for the patron. If Shibboleth could provide a BASE-URL for each identity, then this kind of meta-link resolver would know which other link resolvers to send the OpenURL to, perhaps providing a way to combine entitlements from multiple identities.

Another complication occurs if not all members of a library consortium are members of a Shibboleth federation. In order to join a Shibboleth federation an institution must set up an origin site, which is a significant undertaking: PKI and campus directory infrastructure must be in place. Within a consortium that shares library resources it is possible that some member institution campuses have Shibboleth origin sites set up and others don't. In order to provide uniform access to library resources through Shibboleth, the consortium would need to provide Shibboleth origin sites (based on the library patron directory) for those members that don't have their own.

Acknowledgements

Thanks to Oren Beit-Arie, Steven Carmody and Michael Neuman for reviewing a preliminary draft of this note and providing constructive comments.

References

[CARMODY2001] Carmody, Steven, "Shibboleth Overview and Requirements", Shibboleth Working Group Document, February 20, 2001.
http://shibboleth.internet2.edu/docs/draft-internet2-shibboleth-requirements-01.html

[COX2002] Cox, Andrew and Robin Yeates, "Library Oriented Portals Solutions", JISC Techwatch report TSW 02-03, August 2002.
http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0203

[LYNCH1998] Lynch, Clifford (editor), "A White Paper on Authentication and Access Management Issues in Cross-organizational Use of Networked Information Resources", Coalition for Networked Information, Revised Discussion Draft of April 14, 1998.
http://www.cni.org/projects/authentication/authentication-wp.html

[WASLEY2002] Wasley, David L., et al., "Shibboleth & Portals", May 12, 2002.
http://archives.internet2.edu/guest/archives/mace-webiso/log200206/pdf00000.pdf