Machine Learning’s Growing Role in Research

by Sean Burns Monday, March 29, 2021

Faculty and Researcher Needs and Challenges

More students are taking classes in machine learning. Every interviewee in this project reported rising student interest in undergraduate and graduate machine learning courses. Interviewees said that undergraduate machine learning courses regularly fill up and carry waitlists. Many reported that their departments had added courses in recent years to meet demand. The students enrolling in these courses also increasingly come from disciplines outside computer science. At the graduate level, interviewees reported much higher numbers of applicants to both master's and PhD programs.

Higher education researchers struggle to compete with industry research. Regardless of institution type, budget constraints are an issue for researchers who want to incorporate machine learning into their work. As one graduate student put it, being a researcher in higher education "isn't an internship at Google"; it's "always going to be impossible to compete with research groups with nearly unlimited computing resources." Rich Baraniuk, professor at Rice University and founder of OpenStax, highlighted the challenge of competing with industry-grade research technology and processes: "Not only does the industry have better computational resources, but also they have more manpower and prebuilt APIs and project pipelines, which makes testing of new ideas much more streamlined."

This imbalance in resources is a real challenge for higher education researchers, who need to publish and present at conferences. The challenge is not insurmountable, however. Interviewees noted that industry research groups often aim to showcase their computing capacity. Higher education researchers, by contrast, can rely on domain knowledge, creativity, and repeated experimentation to improve their results and processes. Several interviewees went further and suggested that the ingenuity and dedication of higher education researchers are far more valuable than any amount of computing power. Well-chosen local hardware can still narrow the gap. For example, one of Baraniuk's PhD candidates used the large-memory NVIDIA RTX 8000 GPUs in an HP Z8 workstation to accelerate his research. Compared with the legacy platform, this let him work with 10 times as many word embeddings in his natural language processing (NLP) workflow.

Privacy is a growing concern in machine learning and AI research. As more researchers in more disciplines gain access to more data, researchers and IT staff alike are raising privacy concerns that need to be addressed. Interviewees were particularly concerned about machine learning research in education and medicine, given the data security and confidentiality requirements in those fields. Some researchers are responding by developing new approaches, including privacy-preserving machine learning (PPML) and privacy-preserving AI. These practices are still nascent. Even so, researchers who work with confidential data, or who want to improve privacy-enhancing techniques, should design their model training with privacy preservation in mind from the start.

Researchers and IT staff can more easily work together to protect the security and privacy of their machine learning research by following established processes for data storage and access. One advantage of local workstations or computing clusters is that IT can control these processes more closely. IT staff can work directly with the technology and create documentation or training to help researchers. When researchers can work with IT and ask questions, institutions can reduce the risk of data breaches and other security incidents.

Researchers examine movement data at the MIT Immersion Lab. *(Image credit: Tom Gearty/MIT.nano)*