
E-Research and E-Scholarship: Enabling the Scholarly Communication of the Future
Discussion Session, WRC09, San Francisco, April 2009

Notes from a discussion session facilitated by Robert McDonald, Associate Dean for Library Technologies, Indiana University

Session wiki - includes discussion points and additional resources

Why people attended:

CIO: Ideas for working with faculty – looking for low-overhead models that give faculty access to data sets and other resources without requiring much staff support. Hopeful that scalable Web 2.0/3.0 commodity resources and "cloud computing" will find their way into his realm so he can best provide the infrastructure and support needed.

IT Sustainability Director: There's so much more data coming – how do we handle it in an environmentally appropriate way?

Marine Lab CIO: Wanted to know whether e-research requires an open data policy, and what that means. They are working with oceanographic data. What are the consequences, and what policies are needed?

Manager, Digital Collections Production Center for a regional consortium: Digitization and its user community – connections, appropriate use, etc.


Mentioned the Missouri system-wide digital library – a key issue for them is getting information back to the contributors.

Melissa Woo led a similar discussion at the Midwest Regional that covered the open-policy aspects of sharing and re-use of data. There was strong librarian participation in that session.


"The client-server model killed the conservative ethic (of computing). Profligacy replaced frugality as defining characteristics of business computing" – p. 56 of The Big Switch by Nicholas Carr (who looks at how the power grid in the US was formed and how the same consolidation is now happening for computing).

We have lots of capacity that goes unused now, and we are virtualizing everything.

Another quote: "…a modern corporate data center can use up to 100 times more energy per square foot than a typical office building. The study found that a company can spend upwards of $1 million per month on the electricity required to run a single big data center…" – from a 2005 Lawrence Berkeley National Lab study.

Q - Where do we build these new data centers for maximum energy savings?

Build it to scale – collaborate with Amazon? 

Stanford is building a research data center in the "hills," where it will be air-cooled – getting it off campus and keeping it affordable. Mostly computer science, engineering, and bioscience/bioengineering data.

Health sciences data drove the Indianapolis campus to move in this direction, under an associate dean for research computing. McDonald referenced DuraSpace – Fedora and DSpace working together with Mellon funding.

John Unsworth, asked to "propose a patron saint for cyberinfrastructure," nominated Ben Franklin. Unsworth is Dean of the iSchool and Director of the Informatics Institute at the University of Illinois – 7.15.08 Bamboo Project meeting.

Q: How do we get the details worked out for the storage and compute cycles needed? We need a trust factor for vetting the cloud. (Do we need the primary copy there, or a secondary one? The key is that, at minimum, you can't lose the transactions on the servers.)

Cost of storage – re Brewster Kahle's earlier "out-of-the-box thinking" comments at the conference's opening session. Using containers takes a lot of power. Where you need high-performance storage, and where you do not, makes a difference. We need to right-size the resources: when we build for an institution it needs to be big, but we can get what we want from the "cloud." We must think of these as shared resources; however, governance issues are still very political. McDonald: "must think more globally than higher education … and more globally than higher education in the US."

If you are talking in gigabytes, it's much too small to scale – you need to be talking in terabytes.

HathiTrust: not everything in the libraries will be digitized by Google (they are doing 200 volumes per month).

Burning question: there is lots of interest in legacy data (e.g., 1890s weather data). Who's looking at it? That depends on who is curating it, and much is un-curated. Waiting to see what happens with the NSF DataNet track. Two projects have been funded so far, and they have a year to get something in place ($4 million each for 5 years), then another 5 years to look at sustainability. A big piece is discoverability among research colleagues – the social networking aspects are important.

Need to have a data/cyberinfrastructure/collaboration coordinator on your campus. 

CSU East Bay: Start by educating faculty so they can educate us on what they need us to do. The next stage is to work with data clusters on campus – a shared-resources approach. On the data side, they are working with librarians: what does it mean for graduate students and faculty? Also, how do we build curriculum and push research into the undergraduate experience? They've had three meetings so far, with ad hoc reporting out.

Indiana is attaching people to labs who can report back (funded by the associate dean for research/pervasive technology); they are also looking for a shared solution for humanities research.

How do you find people with the skill sets needed for this work? McDonald is looking at iSchool curricula; domain knowledge is also needed.

Points from the Midwest Regional Discussion

  • Arts and Humanities Faculty are increasingly being involved in CI needs due to expanded use of digital content in their work
  • Social science areas have similar issues but are more concerned with statistical analysis and data mining (need computational support)
  • Increasing expectations of partnerships across disciplines and across institutional boundaries