
EDUCAUSE North East Regional Computing Program Conference 2007. Summary: Universal Access to Human Knowledge

General Session
Wednesday, March 21, 2007
Worcester, Massachusetts
Universal Access to Human Knowledge (or Public Access to Digital Materials)
Brewster Kahle, Director and Cofounder, Digital Librarian, and Chairman of the Board, Internet Archive
The goal of universal access to our cultural heritage is within our grasp. With current digital technology we can build comprehensive collections, and with digital networks we can make these available to students and scholars all over the world. The current challenge is establishing the roles, rights, and responsibilities of our libraries and archives in providing public access to this information. With these roles defined, our institutions will help fulfill this epic opportunity of our digital age.
Brewster Kahle’s primary mission in life is to take the published works of humankind and make them universally accessible to everyone in the world. He argues that this is the opportunity of our generation, and that if we don’t make these works available now, we will have lost the chance to do so.
Kahle noted that many kids today think everything is already on the Internet and available to them, but they do not understand that the best we have to offer is not. He asked, “What are the roles going to be as this digital information revolution rolls out?” Is there a role for universities, or will the non-profits rise to the occasion?
The goal is to move information from shelves to servers. Kahle noted that “Free to all” is inscribed over the door of the Boston Public Library. Libraries, and access to information generally, need to be free so that each individual can do what is needed to create new ideas out of the old.
Kahle suggests that universal access is within our grasp, and that two key elements in providing it are the roles and the politics that will undergird the work.
Digitization of text:
Kahle said that the digitization of all books and printed materials is within our grasp. A rough calculation is that it would take $60K to house the 26-28 million volumes in the Library of Congress on a Linux machine. If you actually like the book format, we might be looking at $750 million to scan each book and make it available, using a scanner developed specifically for this purpose. The equipment is about $100K and the cost is about 10 cents per page, which is cheaper than building a library with the space needed to house the physical books.
They’ve also created a bookmobile, which is actually a printmobile in that it can print a book on demand for about a penny a page. This allows you to print and bind your own books if you want paper instead of a screen. The basic idea is that kids can make their own books at a cost lower than what it costs a library to loan them, since the print-on-demand system is based on toner ink instead of oil-based ink. India, Egypt, and Uganda have systems operating now.
He discussed the Open Content Alliance and the Million Book Project, which is funded by NSF and spearheaded by Carnegie Mellon University in the US but has focused primarily on India, where the cost of scanning materials is a third of the cost in the US. China and Egypt are also involved. Kahle also brought a beta $100 laptop that displays 200 dots per inch and wondered if the next major laptop company could be a non-profit using open work. The Google Books project was also discussed.
For the higher education environment, Kahle suggested that since textbooks are quite expensive, often costing more than $100, we should modernize, cut out the middlemen, and reduce the cost of distribution to our students.
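The scanning estimate can be roughly reproduced from the per-page figure. The sketch below is only a back-of-the-envelope check: the 10 cents per page and the 26-28 million volumes come from the talk, while the average of 300 pages per book is an assumption that does not appear in the summary, and the function name is purely illustrative.

```python
# Back-of-the-envelope scanning cost for a Library of Congress-sized collection.
COST_PER_PAGE_CENTS = 10   # from the talk: about 10 cents per page
PAGES_PER_BOOK = 300       # ASSUMPTION: average book length, not stated in the talk


def scanning_cost_usd(volumes: int,
                      pages_per_book: int = PAGES_PER_BOOK,
                      cost_per_page_cents: int = COST_PER_PAGE_CENTS) -> float:
    """Return the estimated cost in US dollars of scanning every page once."""
    total_cents = volumes * pages_per_book * cost_per_page_cents
    return total_cents / 100


if __name__ == "__main__":
    for volumes in (26_000_000, 28_000_000):
        millions = scanning_cost_usd(volumes) / 1_000_000
        print(f"{volumes:,} volumes: about ${millions:,.0f} million")
```

Under the 300-page assumption the estimate comes out somewhat above the $750 million quoted; that figure corresponds to an average of roughly 270 to 290 pages per volume at 10 cents per page.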
Digitization of audio:
There are 2-3 million discs in press and it costs about $10 to digitize a single disc. We have the technology but this is an area of high litigation. They are currently working with musicians to give them storage and unlimited bandwidth for their work. This has been the biggest win particularly with the rock-n-rollers who let people trade their music as long as they don’t make money from the trade. They currently have archived 2000 bands with 30,000 recordings, including everything from the Grateful Dead. Kahle said that the “pat on back” and the “tax exemption” for giving things away are benefits as well. He suggests that anything ‘common commons’ should be archived forever.
Again, Kahle demonstrated that there are no technological obstacles to digitizing content in the audio format.
Digitization of moving images:
Digitizing moving images is a larger task, but there is less to digitize. There are only about 150-200K “feature films” but, as we know, there are multiple legal issues for most of these. They have digitized about 600 “B” movies that aren’t under copyright, along with old government and advertising films, and now have about 30K films archived. Students are re-contextualizing their world by cut/paste, and there is significant use by researchers.
Kahle mentioned, however, that this work is dwarfed by YouTube and by cool new genres like the Lego movies. Hosting doesn’t cost much and provides a storage venue for creative people. Videos and lectures can all be added to the open content at a cost of about $10 per hour of video.
Moving images also come from TV, and they are recording 20 channels 24/7. He talked about the importance of seeing the news from different perspectives. For example, a CNN story is different from a Palestinian story on the same event, so we can now reference news stories from different perspectives. They have about a petabyte archived now.
Kahle also mentioned the archiving of about 50K software packages and others from download sites so that we can play old things on an emulated device in the future.
Capturing the Web
Every two months since 1996 the Internet Archive has taken a snapshot of the Web. Each snapshot is about 100 terabytes at this point. To avoid a disaster like the burning of the ancient Library of Alexandria, they need to have multiple copies of the archive housed around the world. Large-scale swaps are needed to build an international library, and we need 5-6 copies around the world in order to sleep well at night.
Kahle showed a photo of the Internet Archive’s Petabox housed in the new Bibliotheca Alexandrina. Kahle’s “Wayback Machine” allows us to access the archives. It gets 300-500 hits per second, which is more than 100K a day. Most people are using it to look at their own stuff, their old websites.
Kahle continued that preservation and access must be inexpensive. At this time the total quantity to be preserved is not bad and the cost is not bad, but there is the question of what we do with it. How do we preserve it? Make it accessible, and people will use it and demand that it be kept. A European archive has started, and they are moving along without us. He asked if we in the US will rise to the opportunity. It’s within our grasp but split between public and private right now, and questions of who will control it are political.
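As a rough sanity check on the traffic figures, the per-second rate can be converted to a per-day total. This is only a unit conversion under the assumption that the rate is sustained around the clock; the function name is illustrative, and the summary does not say whether the “100K a day” figure counts hits or something else, such as visitors.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def hits_per_day(hits_per_second: int) -> int:
    """Convert a sustained hits-per-second rate into hits per day."""
    return hits_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (300, 500):
        print(f"{rate} hits/s is about {hits_per_day(rate):,} hits/day")
```

At a sustained 300-500 hits per second the daily total is in the tens of millions, so “more than 100K a day” is a very loose lower bound if it refers to hits.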
Questions to consider include: Public or private? Open or proprietary? Role of libraries, companies, or public institutions acting like companies?
Kahle closed by noting that “Free to the People” is carved in stone over the entry to the Carnegie Library in Pittsburgh, and asked us all to join the Open Content Alliance, where universal access to all knowledge can be one of our greatest achievements.
What should be archived: Everything – or moderated everything?
We are in the demonstration stage.
Good to have technology in the curatorial process.
What are your institutions going to take online and bring up the chain?
You’re working within the current business model but everything today starts out electronic?
By going with the model we can work backwards and try to get full access legally.
We are working where we can re: lectures etc.
