CAUSE/EFFECT

This article was published in CAUSE/EFFECT journal, Volume 22 Number 2 1999. The copyright is shared by EDUCAUSE and the author. See http://www.educause.edu/copyright for additional copyright information.

Authenticated Off-Net Access to Commercial Library Resources
by Mark Sheehan and Allen Porter

Most college and university libraries make commercial information databases available to their patrons. Examples include Lexis/Nexis Academic Universe and Britannica Online; there are dozens of others. The agreements under which such resources are distributed typically require that access be limited to institutionally affiliated patrons: faculty, staff, and students are allowed access; the general public is not.

Database vendors use a number of means to limit access. The most common are IP address filtering and access control lists. In IP-address filtered access, the vendor accepts only Web requests that originate from registered customer-affiliated domains. The campus supplies the vendor with the range of IP addresses it has issued to its network users.
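
To make the filtering idea concrete, here is a minimal Python sketch of the test a vendor's server might apply to each incoming request. The address blocks are hypothetical RFC 5737 documentation ranges, not MSU's or any vendor's actual configuration, and real vendors may implement the check quite differently (for example, as a firewall or Web server rule).

```python
import ipaddress

# Hypothetical address blocks a campus might register with a vendor.
# These are RFC 5737 documentation ranges, not MSU's real assignments.
REGISTERED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_registered(client_ip: str) -> bool:
    """Return True if the request's source address lies in a registered block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in REGISTERED_NETWORKS)


if __name__ == "__main__":
    # One on-campus address and one standing in for an outside ISP.
    for ip in ("192.0.2.17", "203.0.113.5"):
        verdict = "accept" if is_registered(ip) else "reject"
        print(ip, verdict)
```

With these assumed blocks, a request from 192.0.2.17 passes and one from 203.0.113.5 is refused; the vendor has no way to tell that the second request may come from an affiliated patron using an outside ISP.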

In the access control list model, the campus gives the database provider a list of user names and passwords for its eligible patrons, who log in with them each time they want to use the database. Libraries and service providers generally find this method more expensive to administer than IP address filtering.

To libraries, the drawback of IP address filtering is the limit it places on off-campus access. If a patron connects to the Internet through an Internet service provider (ISP) not affiliated with the university and requests access to a commercial database, the IP address by which the patron is identified to the database provider will not be among those the vendor's IP address filter accepts, and the request will be rejected.

At Montana State University-Bozeman, we found ourselves with exactly this problem. We have outsourced our remote access modem pool to Cable & Wireless USA (C&W). While C&W has assigned us our own block of IP addresses, which we have registered with our database providers, not all of our faculty, staff, and student modem users are C&W subscribers. Some use other local ISPs; others use national ISPs such as America Online. The IP addresses associated with the network activity of these people, and of C&W subscribers "on the road" who use C&W's 800-number service, are not drawn from a pool limited to MSU-affiliated users and thus are unacceptable to our database providers.

To serve this small but growing population of library patrons (distance education is driving the growth), we needed a way to provide access to the commercial databases our library offers. The solution we found works from anywhere, was relatively easy to set up, satisfies our vendors' security requirements, and lets us avoid wholesale conversion to an overhead-intensive access control list model of authentication.

The Options

We reviewed three options for providing out-of-domain access to IP-filtered databases: HTTP referrer checking, CGI script-based surrogate browser, and proxy server.

HTTP Referrer Checking

Tim Kambitsch of the Dayton and Montgomery County (Ohio) Public Library describes a technique for HTTP referrer checking.1 The technique takes advantage of the fact that a request generated by clicking a link normally carries with it the URL of the referrer page--the page on which the link appeared--in the HTTP header field spelled "Referer" in the HTTP specification. Thus any click on the Britannica Online link on the MSU Libraries Electronic Publications page will send a request to Britannica Online's server that includes the URL of the MSU Libraries page in the referrer field. If a library were to tightly control access to its electronic publications page so that only authorized individuals could use it--with user name and password, for example--a database provider might agree to have its server honor all HTTP requests originating from (that is, referred by) that page. To allow subsequent, deeper access into the provider's database, the provider's server would then need to be set up to honor all requests, regardless of the IP addresses from which they originated, that listed the URLs of the vendor's own pages in the referrer field. At present, few major commercial database providers seem comfortable with this approach.
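
The vendor-side rule just described can be sketched in a few lines of Python. The gateway URL and vendor host name are hypothetical placeholders, and this illustrates the idea only; it is not Kambitsch's code or any vendor's implementation.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values: a password-protected library gateway page and the
# vendor's own host. Neither comes from the article.
LIBRARY_GATEWAY_PAGE = "http://library.example.edu/epubs/index.html"
VENDOR_HOST = "db.vendor.example.com"


def accept_request(referer: Optional[str]) -> bool:
    """Vendor-side referrer check, sketched from the description in the text.

    Entry is allowed when the request was referred by the library's protected
    gateway page; deeper navigation is allowed when the referrer is one of the
    vendor's own pages. Requests without a Referer header are refused.
    """
    if not referer:
        return False
    if referer == LIBRARY_GATEWAY_PAGE:
        return True
    return urlsplit(referer).hostname == VENDOR_HOST
```

One plausible source of vendor reluctance is visible in the sketch: the Referer value is supplied by the client, so nothing in the check itself prevents a request from carrying a forged header.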

CGI Script Surrogate Browser

Kerry Bouchard of Texas Christian University has reported on a technique for circumventing IP address filtering.2 When a patron requests a commercial database, a common gateway interface (CGI) script is invoked on the library's server. The script has the server send the vendor an HTTP request that bears the IP address of the server rather than that of the patron's PC. The vendor's server accepts the request and returns the HTML code for the requested document to the server where the CGI script is running, and the script passes that HTML code on to the user's browser, where the requested page appears. The primary drawback to this technique is that the CGI script must convert all relative URLs in the HTML document to absolute URLs; otherwise the browser would resolve the page's links and images against the library's server instead of the vendor's. This and other minor drawbacks cause Bouchard to conclude, "This is not a very good way to solve the problem."
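
The URL-conversion step identified as the main burden can be sketched with Python's standard library. This is an illustration under simplifying assumptions, not Bouchard's rewriter: it handles only quoted href, src, and action attributes through a regular expression, whereas a production rewriter would need a real HTML parser. The page URL is hypothetical.

```python
import re
from urllib.parse import urljoin

# Quoted href/src/action attribute values only; a simplifying assumption,
# since a regular expression is not a full HTML parser.
_ATTR_RE = re.compile(r"""(\b(?:href|src|action)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def absolutize(html: str, page_url: str) -> str:
    """Rewrite URLs in href/src/action attributes as absolute URLs based on page_url.

    urljoin returns already-absolute URLs unchanged, so only relative ones change.
    """
    def fix(match):
        prefix, quote, value = match.groups()
        return prefix + quote + urljoin(page_url, value) + quote

    return _ATTR_RE.sub(fix, html)


if __name__ == "__main__":
    page = "http://db.vendor.example.com/docs/page.html"  # hypothetical vendor page
    print(absolutize('<a href="next.html">Next</a> <img src="/logo.gif">', page))
```

A surrogate that needs later clicks to keep flowing through the library's server would presumably also have to point each rewritten link back at its own script; the sketch stops at the relative-to-absolute conversion the article describes.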

Proxy Server

At first glance, a proxy server would seem the obvious answer. Here's how it works. Among your browser's setup options, you can specify the IP address of a proxy server.3 Once you have done so, each time the browser requests a page, the request goes first to the proxy server, which then sends out the request for you. To the commercial database provider, the request appears to originate from the proxy server, and the IP address associated with it is the proxy server's. The requested page is sent to the proxy server's IP address, and the proxy server then passes it on to your browser. There are two problems with this approach, however. First, some form of authentication is needed: to meet the database provider's security requirements, the proxy server should accept requests not from just anyone but only from authorized individuals. Second, once you have set your browser up to use a proxy server, all your requests for Web pages go through it, whether they are going to the protected Britannica Online or to the wide-open Pepsi.com. This is transparent to the browser user, but not to the proxy server administrator! The volume of traffic needlessly going through that one machine can make it a serious network bottleneck.
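
What changes when a browser is configured to use a proxy shows up in the HTTP request itself: the browser opens its connection to the proxy rather than to the vendor, and the request line carries the full URL so the proxy knows which server to contact. The Python sketch below models only that difference; the host names are hypothetical and the function is an illustration, not part of any browser.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag, urlsplit


def plan_request(url: str, proxy: Optional[str] = None) -> Tuple[str, str]:
    """Return (where the browser connects, HTTP/1.0 request line) for a GET of url.

    Without a proxy the browser connects to the origin server and names only
    the path; with a proxy it connects to the proxy and names the full URL,
    so the vendor sees the connection coming from the proxy's address.
    """
    url = urldefrag(url).url  # browsers never send the #fragment part
    if proxy:
        return proxy, f"GET {url} HTTP/1.0"
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, f"GET {target} HTTP/1.0"


if __name__ == "__main__":
    url = "http://db.vendor.example.com/search?q=bison"  # hypothetical vendor URL
    print(plan_request(url))
    print(plan_request(url, proxy="proxy.library.example.edu:8080"))
```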

Our Solution

At MSU-Bozeman we solved the authentication problem using open-source software: the popular Apache Web server, compiled with the proxy module, running under NetBSD on an Intel Pentium processor. Apache can easily be configured to require basic HTTP authentication (with an htpasswd file) for proxy access. The htpasswd file can be one that regulates commercial database access alone, or one that has already been set up to protect other Web pages on an institution's Web servers. At MSU-Bozeman we maintain a separate htpasswd file on our proxy server machine, adding to it only when patrons specifically ask for access to commercial databases from off campus.
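
Apache performs this check internally, so the following Python sketch only illustrates what basic proxy authentication involves: the browser sends its user name and password base64-encoded in a Proxy-Authorization header, and the proxy compares them against the htpasswd entries, answering with status 407 (Proxy Authentication Required) when they do not match. As a simplifying assumption, the sketch understands only the unsalted {SHA} entry format that `htpasswd -s` writes; the article does not say which format MSU used, and Apache also accepts others, such as crypt and its own MD5 variant.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def sha_htpasswd_hash(password: str) -> str:
    """Hash in the '{SHA}' format written by `htpasswd -s` (unsalted SHA-1, base64)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def load_htpasswd(text: str) -> Dict[str, str]:
    """Parse 'user:hash' lines into a dict, skipping blank and comment lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            user, hashed = line.split(":", 1)
            entries[user] = hashed
    return entries


def check_proxy_authorization(header: Optional[str], entries: Dict[str, str]) -> bool:
    """Validate a Proxy-Authorization header value such as 'Basic YWxpY2U6...'.

    A False result is where the proxy would answer 407 Proxy Authentication Required.
    """
    if not header:
        return False
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    stored = entries.get(user)
    if not sep or stored is None or not stored.startswith("{SHA}"):
        return False  # this sketch understands only {SHA} entries
    expected = sha_htpasswd_hash(password)
    return hmac.compare_digest(stored.encode("utf-8"), expected.encode("utf-8"))
```

Note that basic authentication encodes rather than encrypts the credentials, so they cross the network in readily recoverable form with every proxied request.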

We dealt with the second problem, that of the proxy server becoming a network bottleneck once a user's browser has been set up to use it, by using a little-known feature of the Netscape Navigator and Microsoft Internet Explorer Web browsers. Their automatic proxy configuration option lets the server administrator write a JavaScript function, served as a .pac file with the MIME type application/x-ns-proxy-autoconfig, that the browser downloads each time it is launched. The browser then executes the function for every URL it is about to fetch, whether the user clicked a link or typed the address. Our function checks whether the client machine belongs to a domain that has been registered with the database providers and whether the host named in the URL belongs to a database provider. Only if the client domain is not a registered one and the requested URL belongs to a database provider is the browser instructed to use the proxy server; otherwise the browser makes a direct connection. To make use of this feature we must program into the .pac file the domains that have been registered and the hosts associated with each of the commercial database vendors, so the JavaScript code requires periodic maintenance.
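
A real automatic configuration file must be written in JavaScript and define a function named FindProxyForURL(url, host) that returns a string such as "DIRECT" or "PROXY host:port". The decision logic is modeled below in Python for illustration only; the networks, vendor domains, and proxy address are hypothetical. One modeling assumption to note: the standard .pac helper functions identify the client by IP address (myIpAddress(), tested with isInNet()) rather than by domain name, so this sketch checks the client's address against registered networks.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical values for illustration; not MSU's real networks, vendors, or proxy.
REGISTERED_CLIENT_NETS = [ipaddress.ip_network("192.0.2.0/24")]
VENDOR_DOMAINS = [".vendor-a.example.com", ".vendor-b.example.net"]
PROXY_DIRECTIVE = "PROXY proxy.library.example.edu:8080"


def host_in_domain(host: str, domain: str) -> bool:
    """Suffix test in the spirit of the PAC helper dnsDomainIs(), plus the bare domain."""
    host = host.lower()
    return host.endswith(domain) or host == domain.lstrip(".")


def find_proxy_for_url(url: str, client_ip: str) -> str:
    """Model of the choice a FindProxyForURL() function in a .pac file makes.

    Use the proxy only when the client is NOT on a registered network AND the
    URL's host belongs to a database vendor; everything else goes DIRECT.
    """
    target = (urlsplit(url).hostname or "").lower()
    client = ipaddress.ip_address(client_ip)
    client_registered = any(client in net for net in REGISTERED_CLIENT_NETS)
    target_is_vendor = any(host_in_domain(target, d) for d in VENDOR_DOMAINS)
    if target_is_vendor and not client_registered:
        return PROXY_DIRECTIVE
    return "DIRECT"
```

Under these assumptions, an off-campus client requesting a vendor page is sent through the proxy, while the same request from a registered network, or any request for an unrelated site, goes direct, so the proxy carries only the traffic that needs it.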

An additional issue was that small pockets of IP addresses are permitted access to our commercial databases but cannot practically be registered with the database providers. We require that a browser with one of these IP addresses use the proxy, but we have configured the Apache server to recognize such client addresses as privileged and not request authentication.
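
The resulting access rule (a privileged client address or valid credentials, either one sufficing) is sketched below in Python. In Apache of that era it could be expressed by combining host-based `Allow from` rules with `Require valid-user` under `Satisfy any`, but the article does not show MSU's actual configuration, and the address pockets here are hypothetical documentation ranges.

```python
import ipaddress
from typing import Callable

# Hypothetical privileged pockets (documentation ranges): allowed to use the
# proxy without a password even though they are not registered with vendors.
PRIVILEGED_NETS = [
    ipaddress.ip_network("198.51.100.64/28"),
    ipaddress.ip_network("203.0.113.200/32"),
]


def may_forward(client_ip: str, credentials_ok: Callable[[], bool]) -> bool:
    """Decide whether the proxy forwards a request ("Satisfy any" style).

    A privileged client address suffices on its own, so credentials_ok is never
    called and the user is never challenged; any other client must pass the
    credential check, and a False result would become a 407 challenge.
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in PRIVILEGED_NETS):
        return True
    return credentials_ok()
```

Passing the credential check as a callable keeps the sketch faithful to the behavior described: for a privileged address, no password is ever requested.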

Pros and Cons

So far our authenticated, automatically configured proxy server works beautifully! We find the labor of maintaining the htpasswd file and the automatic configuration file to be minimal.

If we eventually need to maintain an authentication list with thousands of entries, we will probably want to derive our htpasswd file from the institution's file of faculty, staff, and student user names and passwords and personal identification numbers (PINs), or else use one of the alternate authentication methods Apache provides. That will require cooperation with the managers of our administrative software system and either a direct link to that database or frequent periodic updates (several times a day) of our separate proxy server authentication file.
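
A minimal Python sketch of such a periodic rebuild follows. The export format (a CSV with user, pin, and affiliation columns), the eligibility values, and the output path are all hypothetical. The {SHA} entry format is chosen for simplicity; as unsalted SHA-1 it is weak, and a real deployment would prefer a stronger format that Apache supports.

```python
import base64
import csv
import hashlib
import io
import os
import tempfile

# Hypothetical affiliation values taken to mark eligible patrons.
ELIGIBLE = {"faculty", "staff", "student"}


def sha_entry(user: str, secret: str) -> str:
    """One htpasswd line in the unsalted '{SHA}' format (as `htpasswd -s` writes)."""
    digest = hashlib.sha1(secret.encode("utf-8")).digest()
    return f"{user}:{{SHA}}" + base64.b64encode(digest).decode("ascii")


def build_htpasswd(export_csv: str) -> str:
    """Turn a hypothetical 'user,pin,affiliation' export into htpasswd text."""
    lines = []
    for row in csv.DictReader(io.StringIO(export_csv)):
        if row["affiliation"].strip().lower() in ELIGIBLE:
            lines.append(sha_entry(row["user"].strip(), row["pin"].strip()))
    return "".join(line + "\n" for line in lines)


def install_atomically(text: str, path: str) -> None:
    """Write to a temp file in the same directory, then rename it over the live file.

    Note: mkstemp creates the file with mode 0600; the Web server's user must be
    able to read the installed file, so adjust ownership or mode as needed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".htpasswd-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file and renaming it over the live one means the proxy server never reads a half-written password file, which matters if the rebuild runs several times a day.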

The user is challenged for a user name and password only the first time the proxy server is used during a browser session. Because the browser retains these credentials as long as it is running, a browser used in a public setting should be closed when the user's session is over.

A more disturbing drawback, but one over which we have little or no control, is that anyone on campus whose desktop computer can run the Apache Web server could create a proxy service similar to ours. If the service were appropriately authenticated, there would be little danger in this. But if it were made available to all, the university could find itself in legal difficulties: anyone on campus running such an open proxy server could effectively open up access to our commercial database providers for anyone (or everyone!) in the world. The commercial services could even be resold. While such entrepreneurism would violate our network's acceptable use policy, it's not clear that our existing policy prevents campus computer sophisticates from giving away the commercial database services the university has contracted for. We'll be working on that. In the meantime, if anyone has discovered how to detect the presence of unauthorized proxy servers on a network, we'd like to hear about it!

Acknowledgments:
In addition to the authors, Michael Hitch of MSU-Bozeman's Information Technology Center and Michael Rammer of the MSU Libraries made significant, much-appreciated contributions to this project.

Endnotes

1 Tim Kambitsch, "Patron Authentication for Remote Access to Premium Data Services." [http://www.dayton.lib.oh.us/~kambitsch/referral.html]

2 Kerry Bouchard, "Remote Access Authorization for Commercial Databases Requiring IP Validation and Username/Password Validation--Lynx URL Rewriter in Use at TCU." [http://lib.tcu.edu/www/staff/bouchard/CGI_logon/CGI_logon#CGI]

3 Path with Netscape Communicator 4.5: Edit > Preferences > Advanced > Proxies. Path with Microsoft Internet Explorer 4.0: View > Internet Options > Connection > Proxy Server.

Mark Sheehan is director of the Information Technology Center and Allen Porter is a senior systems analyst at Montana State University-Bozeman.