EDUCAUSE Quarterly
If You Build It, They Will Scan: Oxford University’s Exploration of Community Collections
In 2009 the University of Oxford ran a groundbreaking digitization project focused on getting members of the public to digitally capture, submit, catalogue, and assign usage rights to material they personally held relating to the First World War. The results demonstrated the potential of this approach to save money compared with traditional digitization projects. They also revealed that community collections could capture a wealth of hitherto undiscovered material held in private hands.

Mass Amateur Digitization and Mobilizing the Public

In 2008 the NPD Group’s Household Penetration Study: Ownership Landscape 2008 reported that nearly 75 percent of all U.S. households owned at least one digital camera. These ranged from compact point-and-shoot cameras to full digital single lens reflex (DSLR) cameras. Add to this figure the number of mobile phones with cameras and the public availability of flat-bed scanners or combination scanners/photocopiers/printers, and it would not be a wild claim to say that in North America, Western Europe, and other developed countries the ability to digitize visual material is almost ubiquitous. Or, to put it another way, an extraordinary resource is just waiting to be exploited - namely, mass amateur digitization. The question is how to tap into this resource for the benefit of research and teaching.

The concept of mobilizing large cohorts of volunteers to assist in public projects is not new:
So are institutions looking to create digital archives missing a trick here? Could they build on the potential for voluntary projects and the clear willingness of the public to assist in projects in which they feel some form of investment, and take advantage of the widespread availability of domestic digitization equipment? Or, to put it another way, could one create a "community collection" whereby members of the public generate the digital content? More importantly, can individual institutions take on such initiatives?

Traditional Digitization Projects Versus Community Collections

Before discussing the Oxford project that shows how mass amateur digitization might be achieved, it is worth asking why one might want a community collection in the first place. The Internet is awash with digital objects (there are, for example, 29.5 million images of cats listed under the Google Image search), and it is perhaps a responsibility of higher education to add only objects of true value to that virtual mountain.

In 2001 one of us (Lee) posed the question as to whether the cultural heritage sector had spent wisely during the 1990s on the major digitization projects undertaken.[1] In particular, could the costs for these large-scale digitization projects be justified when placed against competing demands (notably the clamor for online subscriptions to journals and other data sets)? Ten years later, in the current global financial predicament, it would seem apt to resurrect the question and ask whether the model usually adopted for a digitization project is sustainable.

In most cases a digitization project is led by a large central unit (library, museum, or university) that has acquired funding to concentrate on a major collection it holds or has access to.
The material is captured (for the most part as digital images) at professional standards either in-house or by a third party (with all the transport and insurance costs this might involve), returned, quality assured, post-processed, quality assured again, catalogued, archived, migrated to a delivery system, and so on. There is no doubt that the end product is of exceptionally high quality, and in the U.K. the wealth of material now made available under the recent digitization programs run by the Joint Information Systems Committee offers immense value to researchers and teachers. However, things have changed, and one would be foolish to ignore these changes.
More importantly, perhaps it’s time to stop and think about the approaches to large digital content creation projects and ask three questions:
It is important to maintain a sense of balance, however. We believe the digitization initiatives undertaken by the major cultural centers in the past 20 years have been extremely worthwhile and that scholars of the future will look back on these decades as the start of a golden age where access to resources opened up, helping researchers, teachers, and the plain curious. Furthermore, the approach of prioritizing the capture of material people want access to, and basing that on historical demand, is sensible. At the same time, many digitization projects have been focused on other needs, such as preservation, or releasing to researchers content that up to now has remained unnoticed or needs its profile raised. Nonetheless, we suggest that another approach could be used, based upon the mass amateur digitization movement and the general willingness of the public to participate in initiatives they consider worthwhile. A project on which we worked did exactly that.

Oxford University’s Great War Archive Initiative

The First World War Poetry Digital Archive, a project run at Oxford University, launched on November 11, 2008. The archive released over 12,000 digital objects drawn from collections in the U.K. and U.S., for free worldwide educational use via the web. The project has a particular focus on the major British poets of the Western Front,[2] but the archive also includes a wealth of historical material to provide context to the poetry (including audio and video) drawn from collections held at the Imperial War Museum in London and the U.K.’s National Archives. In this sense, then, it is a standard digitization project funded by a national agency (the JISC) and thus no different from many others that preceded it, or will follow it, with the possible exception that it makes a point of surrounding the collection with a series of educational resources and tools targeted specifically for teaching.
What makes the poetry archive relevant to this discussion is the extra project undertaken as part of the funding - the Great War Archive (GWA) initiative, which is an example of a community collection initiative. Originally intended as a small adjunct, the GWA rapidly became a major project in its own right and has attracted considerable attention worldwide.

The GWA focused entirely on what the public owned and not what was in the major collections. We issued a call to arms (or rather to attics, garages, and bottom drawers) through the main media channels in the U.K., asking members of the public to submit, via the web, digital surrogates of material they personally held relating to the First World War and to which they controlled the rights (family photographs, diaries, letters, artifacts owned or collected from the war). We also asked them to record the stories that had been passed down to them over the years about their family’s experiences.

Over a period of 16 weeks we made available a website that was a front end to a simple piece of software the project developed called CoCoCo. The software allowed anyone to upload objects by following a set of simple steps that guided them through the provision of some basic metadata and required them to agree to the license conditions. Behind the scenes CoCoCo also provided some administrative controls for further cataloguing and quality assurance.

With the metadata the trick was to get the most useful information from the people submitting without making the process so laborious as to dissuade them from participating. In short, we asked them to provide:
In conjunction with this the project team ran a series of "submission road shows" around the country where we would base ourselves in a local museum or library and invite people to bring the objects along on a particular day (Figure 1). We would then talk to them about the item, get them to fill in a form with further information about themselves and what they had brought (the basic metadata again), and then we would photograph or scan the item or items. To get the word out, we targeted local newspapers and radio shows and produced a series of small, simple cards that we left in pubs, libraries, trains, and other public places (Figure 2). We also provided a "Submission Day Pack" for libraries we could not visit, which guided them through running their own submission days.

Figure 1. Submission Road Shows for the GWA Project

Figure 2. Cards Inviting Submissions to the GWA Project

To a certain extent this was a risky venture. What if nobody submitted anything? What if nobody turned up on the submission days? Was the system just going to get spammed by the world’s pornographers? Would we be inundated with material that was fake or irrelevant?

The results were the exact opposite. In the space of 16 weeks we "collected" over 6,500 items. These were all quality assured by two subject experts, and a technical imaging expert where appropriate, and only one submission was rejected, because it was from the Boer War. The online administration system permitted adding additional information (for example, when a contributor was uncertain about the soldier’s regiment, subject experts could add the data based on the information in the photograph). Submission days were packed, with people bringing in the items their families had treasured over the years (Figure 1). Most importantly, these were items that had never seen the light of day until now.
For the most part, the items were catalogued by the public, scanned by the public, and the rights for distribution agreed to by the public. The submission website was available for a limited time, as the project was very much an experiment to see if the approach would work. Afterward, though, people contacted us wishing to add yet more material. To accommodate them, we opened a Flickr group, which now has a further 1,600 items (this time under the individual’s choice of Creative Commons license).

An interesting observation in all of this was the blurring between the amateur and the professional. Although the digitization standards and the physical environments the public used (based on guidelines posted on the GWA submission site) were not comparable with professional work practices, and one would not want to rely on this process for archiving extremely rare items, they did the job, providing thousands of usable digital surrogates. Moreover, the wealth of information in the collective public knowledge base is astounding and demonstrated that many so-called amateurs have a lot to contribute to the academy. The comments and discussions on the Flickr site alone demonstrate the depth of public knowledge that can be tapped.

Although 6,500 items might sound like a lot, numbers are not everything. For example, if the archive consisted of several thousand blank field postcards (a template card issued to soldiers, where they could only select basic choices such as "I am quite well"; see Figure 3) or numerous other duplications, then our understanding of the war would not have advanced much. Thankfully, this was not the case. The project received 42 unique, unpublished diaries by soldiers from a range of battlefields, 63 memoirs, 255 unpublished letters, over 700 photographs, various pamphlets, local recruiting posters, images of rare objects (such as the original designs for the tomb of the unknown soldier), and so on.

Figure 3. Field Postcard

A particularly fine example relates to the scrapbook of the Reverend L. T. Pearson, a chaplain attached to the Royal Army Medical Corps. Pearson had brought a camera with him and recorded, through his photographs and the ephemera he collected from the battlefields, his journey throughout the war up to the British occupation of Cologne in 1918–1919. The last picture in the album is his first view of the white cliffs of Dover as he returned to England. (See the case study, "Reverend Leonard Thomas Pearson’s Scrapbook.")

Individual items, which might not add much to our knowledge of the events of the First World War, do give insight into what the people endured:
Throughout the collection process, a GWA blog recorded other stories/items of interest (extremely useful when it came to promoting the site through the national media).

The impact that the GWA has had since its launch is attested to by the many military historians and national museums in the U.K., Canada, and Australia who contacted us throughout the project and afterwards. It has been reported in press publications worldwide and has generated widespread interest. Early usage statistics show that the GWA has drawn more users to the First World War Poetry Archive than the material that constitutes the more scholarly collections of the War Poets. Teachers regularly download material to illustrate their lessons on the First World War across a range of topics, to bring the subject alive and to captivate learners. For researchers - those attached to academic institutions, genealogists, local historians, and those simply pursuing their own interest in the subject - the hitherto unseen material is providing new avenues of research. The project is regularly contacted by members of the public who have been able to trace new histories within their families and communities as a result.

Cost Savings

On analyzing the costs of the GWA digitization project, we calculated that each item cost around £3.50 ($5.70) to collect, catalogue, QA, and distribute - mainly because the costs were shifted from the project to the public contributors. In comparison, the complete capture and distribution process for each image of the main Poetry Archive project (the rare items held in museums and libraries) came in at around £40.00 per image ($65.00). Moreover, the cost per item under the GWA (not just images, as we also received audio and text files) was derived by simply dividing the total cost of the project by the number of submissions. The total included, therefore, the initial set-up costs and development of the submission software CoCoCo.
The latter was performed by a contracted developer and cost £3,000 ($4,900). Had the project continued, the unit cost would have fallen further as the number of items scaled up (the collection system was never put under much load). Moreover, if another initiative were to use CoCoCo, it would not have to cover the initial investment in development.

Again, though, this is not comparing like with like, as the quality of the material coming from the professional reprographic studios as part of the Poetry Archive was much higher than that delivered by the public. On the other hand, the public shared items never before seen, many of considerable historical value, along with the stories associated with the items, which outlined their provenance and history. Submissions from across the U.K. reflected key regional interests, and once the Flickr site opened, submissions came from a global audience. Moreover, the quality of the scanning was good - certainly of "workable" quality. Similarly, the cataloguing provided by the public achieved an acceptable standard, so most of our time was spent adding extra information, not correcting errors.

Conclusions

The GWA presents a model for others to follow. Not only was it extremely cost effective, it also:
It is true that the subject matter, the First World War, attracts great interest in the U.K., and no doubt the project benefited from the widespread interest in genealogy and tracing family roots. However, the workflows and systems developed could be reused for other subjects, and all are freely available (the collection software - CoCoCo - is open source and is available on request). Regardless of continued funding of large digitization projects, the GWA illustrates another area that could be explored - mobilization of the public to contribute digitized items to national archives. These community collections could provide a cost-effective means of expanding research resources.

In the U.K. there is already considerable interest in taking this model further, and one could envisage a central national service that would allow researchers and research projects to quickly and easily set up their own community collection sites.

In building community collections, however, we are also building communities themselves. If such initiatives source input from the public, then serious consideration needs to be given to how such communities can be fostered and maintained, and how queries and questions can be answered if projects only run for a limited time. Funding for digitization projects is often for a set period only, to capture, catalogue, and deliver material over a period of months or years. Yet a community collection requires resourcing beyond supporting the public during the submission stage and making the material available - it also, we would argue, requires sustaining the community into the future by answering questions, providing further information, and assisting teachers and researchers. Perhaps current funding models do not address the level of sustainability required for engaging the general public and need to be rethought to support longer term activity.
Nevertheless, public collaboration projects within the field of digitization look to become increasingly popular, and the GWA initiative has provided the foundations for a critical understanding as to precisely what the benefits of such a collaboration may be, what challenges these projects will encounter, and how future efforts can benefit from such experience.

Endnotes
© 2009 Stuart D. Lee and Kate Lindsay. The text of this article is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license.
Unless otherwise noted, EDUCAUSE holds the copyright on all materials published by the association, whether in print or electronic form. In certain cases the work remains the intellectual property of the individual author(s) (see Special Circumstances). Content from conference speeches, presentations, blogs, wikis and feeds reflects the opinions of the author, and not necessarily those of EDUCAUSE or its members.