IT Best Practices
A focus on funding and self-sustainment can ensure continued research. Building and hosting local machine learning workstations can require substantial up-front investments, in terms of both cost and labor. Higher education IT units cannot afford to take that kind of investment lightly, but there are several ways to lighten the burden in both the short and the long term.
IT managers reported that building relationships with industry partners may help connect researchers to industry resources and help them apply for and receive workstation grants through programs such as the HP Grant Support Program. Staff facilitators can also help researchers apply for cloud computing credits through the NSF or directly from Google or Amazon Web Services (AWS). When researchers receive cloud grants for their labs or projects, some of the financial burden on IT can be reduced or removed.
Other interviewees reflected on the potential long-term benefits of creating and hosting an on-premises cluster of workstations for research purposes. IT can then run a service center that lets researchers rent hardware and other computing capabilities for a specified period of time. Campus initiatives and grant applications can help fund the creation, maintenance, and expansion of such a service center, but ideally the fees generated by demand for its resources will help cover its operating costs as well as future hardware and infrastructure upgrades.
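As a rough illustration of how a service center might reason about cost recovery, the sketch below computes a break-even hourly rental rate from amortized hardware cost, annual operating cost, and any offsetting grant or campus funds. The class name, the function name, and every figure in the usage note are hypothetical and are not drawn from the interviews; a real rate model would follow the institution's own accounting rules.

```python
from dataclasses import dataclass


@dataclass
class ServiceCenterCosts:
    """Hypothetical annual cost inputs for a workstation service center."""
    hardware_cost: float            # up-front purchase price of the hardware
    hardware_lifetime_years: float  # years over which the hardware is amortized
    annual_operating_cost: float    # staffing, power, and maintenance per year
    annual_subsidy: float = 0.0     # grant or campus funds offsetting costs per year


def break_even_hourly_rate(costs: ServiceCenterCosts,
                           billable_hours_per_year: float) -> float:
    """Return the hourly fee at which rental income covers net annual cost."""
    if billable_hours_per_year <= 0:
        raise ValueError("billable_hours_per_year must be positive")
    if costs.hardware_lifetime_years <= 0:
        raise ValueError("hardware_lifetime_years must be positive")

    # Spread the purchase price evenly over the expected hardware lifetime.
    annual_hardware = costs.hardware_cost / costs.hardware_lifetime_years
    net_annual_cost = (annual_hardware
                       + costs.annual_operating_cost
                       - costs.annual_subsidy)

    # A subsidy larger than the costs means no fee is needed to break even.
    return max(net_annual_cost, 0.0) / billable_hours_per_year
```

With illustrative inputs of a 20,000 hardware purchase amortized over 4 years, 7,000 in annual operating cost, a 2,000 annual subsidy, and 2,000 billable hours per year, the function returns 5.0 per hour. Lower demand raises the rate, which is why sustained researcher demand matters to the model described above.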
Case Study Example
At the Massachusetts Institute of Technology, Brian Anthony is the Associate Director of MIT.nano, a laboratory with more than 200,000 square feet of advanced research space, including a new Immersion Lab that focuses on immersive experiences and on the display of, and interaction with, 3D, 4D, and other large datasets. For example, researchers might employ machine learning to facilitate human interaction with data in real time, allowing for innovative new work in fields such as neuroscience and biomechanics. Among other tools, researchers at the Immersion Lab have access to an HP Z8 workstation with dual Intel Xeon processors and dual NVIDIA RTX 8000 GPUs to accelerate machine learning applications in projects including photogrammetry, motion capture, ultrasound tomography imaging, and processing large datasets at the edge for smart-home ambient sensors and cryo-microscopy. In the last two examples, having a high-performance workstation at the data source to extract actionable insights offers a tremendous advantage over cloud computing because it removes the need to transfer large amounts of data, some of which may be confidential, over the Internet.
For funding, MIT used a combination of internal and grant funds to acquire the variety of technologies needed to get the lab up and running in 2020. Since then, they've worked to build out a variety of technical capabilities in the lab and are operating the lab on a cost-recovery model in which researchers pay a fee for using the lab, its toolsets, and computing capabilities for a short period of time. Since part of the mandate of the lab is to "raise the water level" of machine learning access, the administrators also work with students and researchers to help them obtain any funding or internal grants they need to access the lab and technology.
The Immersion Lab is open to any professor, student, or researcher from the MIT campus, as well as unaffiliated individuals or companies interested in the lab’s capabilities. These companies must go through a simple application process to gain access. External users expand the range of projects being carried out in the Immersion Lab and provide an extra source of income to help maintain the lab's equipment and pay for staffing.
IT units can find proactive approaches to understanding researcher needs and experiences. Because technology needs vary so widely across the disciplines where machine learning is being incorporated, IT needs to be prepared to offer a variety of solutions based on unique or discipline-specific researcher needs. IT managers suggested putting together a communication plan that encourages researchers to open a dialogue with IT, and ensuring that needs assessment processes, maintenance processes, and regular communications are in place so that IT stays up to date on the needs of users.
Over the past year of the pandemic, the relationships between researchers and IT have evolved in important ways. With social distancing and remote work now widespread across campuses, IT staff may need to be even more proactive in keeping researchers up to date on new services and in ensuring researchers are satisfied with IT processes and operations. Erik Engquist, Director of the Center for Research Computing at Rice University, highlighted how their IT staff have had to put extra effort into engaging with faculty and graduate student researchers because "you can't run into them in the building anymore."
Case Study Examples
At Red Rocks Community College (RRCC), Adam Forland, a mathematics professor, worked with Bill Cherrington, Director of IT, to start and grow their machine learning curriculum and technology by showcasing student interest and success over time. They started with limited technology, building their first machine learning system by borrowing the GPUs of several virtual reality systems at their institution. Since then, Forland has worked closely with students and faculty, in conjunction with the IT department, to apply for grants, which led to the purchase of a Z8 workstation from HP that they have used to "develop a computational platform to introduce Python to as many students as possible" through project-based learning courses.
The projects that RRCC students developed with their professor spurred the creation of innovative proofs of concept, with high levels of collaboration between instruction and infrastructure/support teams. Since its launch, the system has proven successful, with students showcasing their projects and the skills they have gained in competitions such as the NSF Community College Innovation Challenge. This system of IT and faculty collaboration at RRCC continues to mature and expand, with the IT team applying production systems controls and industry best practices such as system backups, vulnerability scanning, and data classification as they continue to grow with their users.
At the University of California, Berkeley, David Brookes, a PhD candidate in Jennifer Listgarten's lab, has been using the Z8 workstation to explore new ways to use machine learning in protein engineering to improve on or create new protein functions. Machine learning allows for more testing and experimentation on protein functions, building on the past decade of advancements in the field. Brookes has access to a national lab for his research, with plenty of CPUs available, but his style of testing and research on protein engineering requires GPUs.
Before acquiring the local workstation, he had access to the Berkeley lab with plenty of GPU support, but because it was shared with many research groups, he faced queue times of up to a week. Now that his lab has acquired the Z8 workstation, Brookes reported that it is powerful enough to be shared among himself and four other researchers, and he is using about 100 GB of data per project. For now, the data can be stored and secured locally, but as projects get bigger and more numerous, IT may need to think about distributed or cloud storage.
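To make the storage concern concrete, the following minimal sketch estimates how many projects fit on a workstation's local disk before other storage has to be considered. Only the figure of about 100 GB per project comes from the interview. The function name, the disk capacity, and the reserved space in the usage note are assumptions chosen purely for illustration.

```python
def projects_that_fit_locally(capacity_gb: float,
                              per_project_gb: float,
                              reserved_gb: float = 0.0) -> int:
    """Estimate how many projects fit on local storage.

    capacity_gb:    total local disk capacity (hypothetical input)
    per_project_gb: average data volume per project
    reserved_gb:    space held back for the OS, software, and scratch files
    """
    if per_project_gb <= 0:
        raise ValueError("per_project_gb must be positive")

    usable_gb = capacity_gb - reserved_gb
    if usable_gb <= 0:
        return 0

    # Floor division: a project that only partially fits does not count.
    return int(usable_gb // per_project_gb)
```

For example, assuming a 4,000 GB local disk with 500 GB reserved and 100 GB per project, the estimate is 35 projects. If average project size grew to 500 GB on the same disk, only 7 would fit, which is the kind of growth that would push IT toward distributed or cloud storage.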
Meanwhile, in the civil engineering department at UC Berkeley, a different set of challenges was reported by Kenichi Soga, a professor working with some of his graduate students to further incorporate machine learning into their field. These researchers must work closely with machine learning software development teams as they run traffic simulations of millions of cars trying to escape a natural disaster or track the flow of construction equipment on a tunnel construction project. During the interview, Soga and his graduate students reported that they struggle to understand the documentation of the open-source software they are using, especially how to determine whether their data satisfy the assumptions made by some of the models. Researchers like these require extra time and extra support from IT to help bring machine learning into their work.