Chan develops protein prediction model through MolSSI fellowship

7/19/2021

The Molecular Sciences Software Institute (MolSSI) selected Matthew Chan, a chemical and biomolecular engineering graduate student in professor Diwakar Shukla’s lab, as a 2021-A Seed Software Fellow. Through the fellowship, Chan set out to create a model that uses transfer learning to decode the sequence, structure, and function of any protein.

Transfer learning is a machine learning method that takes knowledge learned from known data and applies it to similar but unfamiliar data. Chan’s goal is to build a machine learning model that understands how a protein’s specific amino acid sequence affects its structure and function.

Transfer learning allows a machine learning model to first learn about a known protein from experimental data and then transfer the knowledge to predict the effect of an unknown variant. Graphic created by Matthew Chan

The model draws on “all the proteins available to us”: a database of proteins whose sequences and functions are known from experimental data. Through transfer learning, the model can extend this knowledge to determine the form and function of an unknown protein, making it easier to predict the effects of mutations.

The researchers will use this model to understand how small mutations, each changing a single amino acid within the protein, can affect the protein’s overall function. The potential applications are broad; one timely example is predicting how mutations affect the viability of the coronavirus.

Through this fellowship, Chan attended a week-long training camp to learn best practices for software development. He also received six months of financial support, along with weekly guidance, training, and mentorship from two of MolSSI’s Software Scientists, Doaa Altarawy and Sina Mostafanejad. “I greatly benefited from having these scientists, who had real experience in machine learning, to advise me and plan out the next steps in this project,” Chan said.

For Chan, this fellowship was a perfect fit. His mentors helped him master entirely new skill sets, namely machine learning applications and software development. The short, six-month timeline also allowed him to pursue this project even as he neared the end of his graduate studies.

“Before, it was just an idea on paper, but MolSSI saw the value in it and helped us kick-start this effort to create a tangible product,” Chan said. “I am quite surprised that by the end of the six months, we actually have something that is working. I’m even more surprised by the confidence that I’ve gained.” 

While the model’s preliminary results are promising, Chan said, “there’s always room for improvement as more experimental datasets become available for the model to learn.” He is also collaborating with fellow graduate student Jesse Horne, who will continue to enhance the model after Chan graduates. 

“Matthew sought a computational approach to realize results that would be impossible through experimental methods,” Shukla said. “I am extremely proud of all that he has accomplished in this project, with much thanks to this unique fellowship. We are all excited about the application of this approach in engineering protein structure and function.”

Written by Claire Benjamin, associate director of communications


This story was published July 19, 2021.