Toward Universal Access to All Knowledge - NOTES from a talk by Brewster Kahle
Created by Lida L. Larsen (EDUCAUSE) on April 29, 2009
This presentation was recorded for podcast and is available at http://www.educause.edu/blog/gbayne/PodcastTowardUniversalAccessto/170701

Speaker: Brewster Kahle, Director and Cofounder, Digital Librarian, and Chairman of the Board, Internet Archive

Notes

Kahle gave a fabulous talk at the Western Regional Conference on the topic of universal access to all knowledge. He prefaced his remarks by saying that the line between IT and libraries is blurring as their content layers begin to merge. He looked at what people have carved in stone: at the Boston Public Library it is "Free to All." His presentation covered the scope of, and the issues around, universal access to all knowledge.

Content

The Library of Congress (LOC) holds 26 million books. At roughly 1 MB per book, that is about 26 terabytes for all the works in the LOC. At current storage costs we can afford 26 TB, so making this quantity of information available is within our grasp. He said that we need to be able to search inside books (with search tabs), a very "booky" exercise, to make the information accessible.

In his international work, the First Lady of Egypt, who "loves her books" and wants to share them widely, prompted a discussion of "how hard would it be to make books on demand?" [It turns out to cost $3 to loan a book from the Harvard Library.] He looked into creating a bookmaking machine; there are now mobile bookmaking machines in Alexandria (Egypt), Uganda, and elsewhere. The print-on-demand/binding machine costs $100K. Now netbooks, One Laptop Per Child (OLPC), electronic/digital readers, computers, and specialized devices are arriving and making digital materials more accessible.

Kahle talked about the Million Book Project: in India 600K books have been scanned to date; in China more than 1 million are scanned now, and they are headed for 3 million, including 50K Arabic texts.
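The 26 TB figure in the Content notes is simple back-of-envelope arithmetic. A minimal Python sketch of that calculation follows, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and the talk's rough 1 MB per book; the function name `collection_size_tb` is hypothetical and only for illustration.

```python
# Back-of-envelope check of the storage estimate quoted in the talk:
# ~26 million books in the Library of Congress at ~1 MB of text per book.
# Assumption: decimal (SI) units, 1 MB = 10**6 bytes and 1 TB = 10**12 bytes.

BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12


def collection_size_tb(num_books: int, mb_per_book: float) -> float:
    """Return the total size of a book collection in terabytes (hypothetical helper)."""
    total_bytes = num_books * mb_per_book * BYTES_PER_MB
    return total_bytes / BYTES_PER_TB


if __name__ == "__main__":
    size = collection_size_tb(26_000_000, 1.0)
    print(f"{size:.0f} TB")  # prints "26 TB"
```

The same function can be reused with other counts or per-book sizes; at 2 MB per book the LOC estimate doubles to 52 TB, which is still a modest amount of storage.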
They designed a new scanner, have established 18 scanning centers in 5 countries, and are scanning 1K books a day. The cost is down to 10 cents per page, and that includes maintaining the text online forever. [1.3 million now, and the cost is $30 million to get a million books online, the same $3 per book as it costs to loan a book from the Harvard Library.] For microfiche and microfilm collections, there are now mechanisms to get them online as well, so newspapers and magazines can be added soon.

What they have: 8 collections, from children's books to Arabic texts.

Audio

There are 2-3 million recordings, which is "just not that big" (though heavily litigated). So, he asks: how do we do this without going to jail? They collect on the edges, from those not interested in money. For example, the Grateful Dead encouraged their fans to record their concerts, and these are all available digitally now. They offer unlimited storage and unlimited bandwidth, forever, for free, to musicians; they get 2-3 bands a day and 40 songs a day, and have 30K concerts and more. Kahle suggests that it shouldn't cost you to give something away, noting that anyplace else you get a reward for giving something away, but not on the Internet. Open audio essentially costs $10 per disk, or $10 per hour. They now have 200K items in over 100 collections.

Moving Images

When we think of movies we think of Hollywood films, of which there are only 150-200K. However, there are about 1,000 in the public domain. The more popular ones are the training and propaganda films from the past; they are rabidly popular as "kitsch." YouTube is now doing 99.9% of these [Lego film community, political debates, etc.].

Television (400 channels of original programming): they began simply recording onto hard drives 24x7, and now we can quote and critique it. They work with the LOC and others to prioritize; for example, September 11th materials are now saved and available so we can see the different points of view from around the world.
Video costs $15 per video. It is possible to do all of these.

The ultimate issue is saving software. Many think of this as saving software boxes instead of the software, but it is the software itself that needs to be saved. Kahle said "Good thinking starts with good passion…" and it is clear that he and his organization are passionate about what they do.

Web

They are capturing 4 billion pages every two months, and doing more archiving now, all to build collections. Their mantra is "If you don't grab it, it might go away." The Internet Archive and the Wayback Machine both have a rapid take-down process if inappropriate materials are captured. They have 4.5 petabytes of materials and serve 500 hits per second; they are the largest single database.

Preservation

In 1997 they had 2 TB, which were moved in 1999 to tape robots and later to disks. He noted that there is loss and risk with these "moves." He also noted that in ancient Alexandria the great library burned down, and only a handful of important ancient texts survive today. Today they make digital copies: in 2002 the Bibliotheca Alexandrina began keeping a copy of the Internet Archive. Our history is now in digital form. Another of their data centers is in the heart of San Francisco. They had 2 PB in 2008. The next-generation machine is a 20x8x8 box, a Wayback Machine holding 3 PB of materials, and it sits outside: just plug in cold water, power, and computers. It has low capital costs. [At this point Kahle noted that human error drives data loss.] Tomorrow the technology may look like the Jedi library in Star Wars. An independent group in Amsterdam is starting another digital repository similar to the Internet Archive.

Digital collections:
Examples:
[We are indexing in new ways – but to do so we must have data in bulk]
Open Library: one web page for every book. We need to use a Wikipedia-style approach to make a new universal library catalog (Open Library has 22 million records); the LOC has 26 million books, but only 12 million are cataloged. All of it is "Wikipediable." Kahle made a pre-announcement of an Open Repository for the Commons.
Building the library
Our library’s digital transition so far has moved
Kahle suggests that we build the library system we've dreamed of. He mentioned the Open Content Alliance and said "Universal access to all knowledge…can be one of our greatest achievements." Carved in stone at the Carnegie Library in Pittsburgh: "Free to the People."

Q&A

Images: please help. They are working with NASA (without $$) but have permission to add the images to the collection.

Info swap with the Bibliotheca Alexandrina runs at 150 MB/second; if you are on Internet2, you can keep up. (8mb connection to the Internet, mostly outbound.) [20% of the Abilene backbone]

Stay away from commercially viable works for now.

Periodical literature needs to be saved, especially in biology.

Currently they have published cultural data. Services are needed on top of the data; the bulk back-end data sets are currently the core of the Internet Archive. For example, rating systems (reviews, stars, downloads) are standard, but they are now looking more at "if you liked this one you'll like that one."

The user community is their biggest asset in finding and fixing problems.

Moving images need work on searchability and accessibility.

Kahle graciously invited attendees to visit their downtown San Francisco location the day after the last session. About 30 people attended the tour, had a glass of wine, and took home an Internet Archive glass as a souvenir.
Unless otherwise noted, EDUCAUSE holds the copyright on all materials published by the association, whether in print or electronic form. In certain cases the work remains the intellectual property of the individual author(s) (see Special Circumstances). Content from conference speeches, presentations, blogs, wikis and feeds reflect the opinions of the author, and not necessarily those of EDUCAUSE or its members.