Deep-learning algorithm aims to accelerate protein engineering


Alisa King-Kemplerer

Proteins are the molecular machines of all living cells and have been exploited for use in many applications, including therapeutics and industrial catalysts. To overcome the limitations of naturally occurring proteins, protein engineering is used to improve protein characteristics such as stability and functionality. In a new study, researchers demonstrate a machine learning algorithm that accelerates the protein engineering process. The study is reported in the journal Nature Communications.

Left: Huimin Zhao, Right: Jian Peng

Machine learning algorithms assist in protein engineering by reducing the experimental burden of methods such as directed evolution, which involves multiple rounds of mutagenesis and high-throughput screening. After being trained on protein sequence databases, they predict the fitness of possible variants of the target protein computationally, so fewer variants need to be made and screened in the lab.
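To give a sense of why computational pre-screening helps, the sketch below counts how quickly the number of possible variants grows with the number of simultaneous mutations, then ranks all single mutants of a short sequence with a scoring function. It is a minimal illustration, not ECNet: the function names and the `toy_predictor` scorer are hypothetical placeholders, and a real workflow would substitute a trained fitness model.

```python
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def count_variants(length, order):
    """Number of variants with exactly `order` substituted positions."""
    # Choose which positions change, then one of 19 alternatives at each.
    return comb(length, order) * 19 ** order


def single_mutants(seq):
    """Yield (name, sequence) for every single substitution of `seq`."""
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                # Conventional naming: wild type, 1-based position, mutant.
                yield f"{wild_type}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]


def rank_variants(seq, predict_fitness, top_k=5):
    """Score every single mutant and return the top_k by predicted fitness."""
    scored = [(name, predict_fitness(var)) for name, var in single_mutants(seq)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_predictor(variant):
    """Placeholder scorer (assumption): counts hydrophobic residues.

    This has no biological meaning; it only stands in for a trained model.
    """
    return sum(variant.count(aa) for aa in "AILMFVW")


if __name__ == "__main__":
    # For a hypothetical 100-residue protein:
    print(count_variants(100, 1))  # single mutants
    print(count_variants(100, 2))  # double mutants
    print(rank_variants("MKT", toy_predictor, top_k=3))
```

Even for a 100-residue protein, double mutants already number in the millions, which is why ranking candidates in software before screening can save substantial lab work.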

Although many machine learning algorithms exist, few of them incorporate the evolutionary history of the target protein. This is where ECNet (evolutionary context-integrated neural network), a deep-learning algorithm, comes in.

“With ECNet, we are able to look at the target protein and all its homologs to see which residues are coupled together and are therefore important for that particular protein,” said Steven L. Miller Chair Professor of Chemical and Biomolecular Engineering Huimin Zhao, also Director of the National Science Foundation (NSF)-funded Molecule Maker Lab Institute. “We then combine that information and use the deep learning framework to figure out what kind of mutations are important for the target protein function.”
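The idea of residues being "coupled together" across homologs can be illustrated with a small co-variation calculation. The sketch below scores pairs of alignment columns by mutual information on a made-up four-column alignment. This is a simplified stand-in chosen for illustration and is not the coupling model used in ECNet; the toy alignment and function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def top_coupled_pairs(msa, k=3):
    """Rank all column pairs by mutual information, highest first."""
    length = len(msa[0])
    scores = [
        ((i, j), mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy alignment (made up): columns 1 and 3 always change together
# (K pairs with E, R pairs with D); column 0 is fully conserved.
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "AKIE", "ARID"]

if __name__ == "__main__":
    for pair, score in top_coupled_pairs(toy_msa):
        print(pair, round(score, 3))
```

In this toy alignment, positions 1 and 3 receive the highest score because they always vary together, while the conserved position 0 carries no coupling signal. Methods applied to real protein families must also correct for effects such as shared ancestry and small sample sizes, which this sketch ignores.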

In a benchmark study, the researchers showed that ECNet outperforms current methods on several deep mutagenesis datasets. As a follow-up, ECNet was used to engineer TEM-1 β-lactamase, an enzyme that confers resistance to β-lactam antibiotics, and to identify variants that had improved fitness and were therefore more resistant to ampicillin.

Furthermore, ECNet prioritized higher-order mutants (variants carrying multiple mutations) and novel mutants in the analysis. Having a computational tool that can successfully predict higher-order interactions can reduce experimental effort, Zhao said.

“We are combining all the proteins in the database with the specific evolutionary history of the target protein to improve the prediction efficiency,” said Zhao. “We can then use the mutants that we generate from our experiments to further improve and train the model. This algorithm is still a work in progress, but it’s an overall improvement on what’s already known in the literature.”

Zhao said researchers are currently using ECNet to develop enzyme catalysts with improved selectivities.

This study was a joint effort with professor of computer science Jian Peng (CABBI). Other authors of the study include Yunan Luo, Guangde Jiang, Tianhao Yu, Yang Liu, Lam Vo, Hantian Ding, Yufeng Su, and Wesley Wei Qian.

This work was supported by the U.S. Department of Energy and the NSF.