White House Initiative Highlights "Big Data"

On March 29, the White House Office of Science and Technology Policy (OSTP) held an event to announce the launch of a "Big Data Research and Development Initiative." Largely building on existing federal agency programs and projects, the initiative as described in a White House press release seeks to coordinate and enhance efforts across the federal government to:

  • "Advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data;
  • Harness these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning; and
  • Expand the workforce needed to develop and use Big Data technologies."

To highlight current and planned activities, six federal agencies (National Science Foundation, National Institutes of Health, U.S. Geological Survey, U.S. Department of Energy Office of Science, U.S. Department of Defense Office of Research and Engineering, and Defense Advanced Research Projects Agency) participated in the launch and discussed newly made grant awards or new inter-agency collaborations relevant to research analytics. Of particular note:

  • NSF and NIH announced a joint solicitation for grant proposals entitled "Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)." Subject to the availability of funds, the agencies anticipate making 15 to 20 awards of a) up to $250,000 per year for three years, or b) between $250,000 and $1,000,000 per year for up to five years, for a total funding commitment of up to $25 million. (NSF grants under the solicitation will draw from program funding under the agency's existing "Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21)," while NIH grants will draw funding from "participating Institutes, Centers, and Offices.") The NSF press release on its Big Data efforts indicates that the program will target "research to develop and evaluate new algorithms, statistical methods, technologies, and tools for improved data collection and management, data analytics and e-science collaboration environments." For more details on the specific science, engineering, and medical/health domains the two agencies intend to address, please see pp. 5-8 of the solicitation description at the link provided above.
  • The DOD Office of Research and Engineering noted that the DOD currently spends $250 million annually on Big Data projects, with solicitations for approximately $60 million in new projects currently underway. As described in the office's press release, the major target areas for these solicitations are a) autonomous battlefield systems that "will maneuver and understand their environment, ...make decisions by themselves, and also know when to call upon a human"; and b) "a new generation of systems that understand and interpret the real world with computer speed, computer precision, and human agility" that will help "our commanders and analysts make sense of the huge volumes of data our military sensors collect at speeds 100x faster than today."
  • Under the DOD umbrella, DARPA announced the launch of its XDATA program, which will devote $25 million per year for four years to developing "computational techniques and software tools for processing and analyzing the vast amount of mission-oriented information for Defense activities." As part of the program, DARPA anticipates releasing "open-source software toolkits to enable collaboration among the applied mathematics, computer science and data visualization communities"; the agency is also planning a proposers' workshop in April to "introduce the research community to the effort, explain the mechanics of a DARPA research program, and encourage collaborative arrangements among potential performers."
  • NIH highlighted its joint announcement with Amazon Web Services that the data set from the international, NIH-supported 1000 Genomes Project would now be available online through AWS. Encompassing 200 terabytes of data--"the equivalent of 16 million file cabinets filled with text"--the data set represents the largest collection of data on human genetic variation in the world. By hosting 1000 Genomes Project data in the cloud, NIH and its international collaborators hope to further expand global research access to the data and thereby speed progress on understanding the genetic roots of disease and developing potential new forms of treatment. Access to the data is free; researchers pay only for any additional AWS data processing or analysis services they choose to use. As the joint announcement notes, "The public-private collaboration to store the data in the AWS cloud allows any researcher to access and analyze the data at a fraction of the cost it would take for their institution to acquire the needed internet bandwidth, data storage and analytical computing capacity."

A comprehensive website for the overall Big Data initiative has yet to be established, and no plans for such a site have been announced. For continuing information about the effort as it progresses, interested parties can consult the OSTP site, which also houses the report by the President's Council of Advisors on Science and Technology (PCAST) that spurred the creation of this initiative.