Learning the language of lasso peptides to improve peptide engineering

10/8/2025 Katie Brady

Left image: First author of the study Xuenan Mi received an award in 2024 for her work on LassoESM. Right image: Professor Doug Mitchell, Chemical and Biomolecular Engineering Professor Diwakar Shukla, and Susanna Barrett / Isaac Mitchell

In the hunt for new therapeutics for cancer and infectious diseases, lasso peptides prove to be a catch. Their knot-like structures give these molecules high stability and diverse biological activities, making them a promising avenue for new therapeutics. To better unleash their clinical potential, a team from the Carl R. Woese Institute for Genomic Biology developed LassoESM, a new protein language model for predicting lasso peptide properties.

The collaborative study was recently published in Nature Communications.

Lasso peptides are natural products made by bacteria. To produce these peptides, bacteria use ribosomes to build chains of amino acids that biosynthetic enzymes then fold into a unique slipknot-like structure. Through this process, bacteria generate thousands of different lasso peptides, many of which have demonstrated antibacterial, antiviral, and anticancer properties.

“There are striking opportunities to use lasso peptides in drug discovery, from targeting receptors to developing stable oral therapeutics,” said Doug Mitchell, director of the Vanderbilt Institute for Chemical Biology and co-leader of the study. “By building a dedicated language model for these molecules, we’ve created a tool that helps us unlock these possibilities far more efficiently.”

Machine learning models have become essential tools for researchers, particularly for recognizing patterns in large data sets. This lets scientists find new connections while saving months of time and effort. Protein structure and property prediction especially benefits from this technology, which helps uncover new insights into complex protein interactions and accelerates the discovery of new therapeutics. But commonly used AI platforms for protein prediction, such as AlphaFold, fall short when tasked with lasso peptides.

“Because of the unique structure of the lasso peptide, none of the current AI programs actually work in terms of doing a structure prediction,” said project co-leader Diwakar Shukla (BSD/CAMBERS/MMG), a professor of chemical and biomolecular engineering and James W. Westwater Professorial Scholar at the University of Illinois Urbana-Champaign. 

Similar to the large language models powering AI chatbots, protein language models are trained to learn and apply the language of proteins: their amino acid sequences, three-dimensional structures, and interactions with surrounding environments. But without lasso peptide-specific training data, these models struggle to capture the features that set these molecules apart.

“Predicting lasso peptide properties has been challenging due to the scarcity of experimentally labeled data and the complexity of enzyme–peptide substrate interactions,” said Xuenan Mi, who recently earned her PhD in Shukla’s research group. “We developed LassoESM, a lasso peptide-tailored protein language model, to capture peptide-specific features that are often missed by generic protein language models.”

Mitchell’s group first used bioinformatics methods to find thousands of lasso peptide sequences that different microorganisms produce. To improve the quality of the data, the team also manually validated any new lasso peptide sequences they discovered. 

“Then, we learned the language of those lasso peptides using masked language modeling, which is where you hide part of the peptide, and then you try to predict the other half,” Shukla said. “Once you have learned the language of how the lasso structure is formed in nature, then you can train efficient property prediction models based on these language model parameters.”
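Shukla's description compresses the idea: in standard masked language modeling, a fraction of the residues is hidden and the model learns to predict those hidden residues from the visible ones around them. The data-preparation step can be sketched in Python as below, assuming the BERT-style recipe that ESM protein language models follow (select about 15% of positions; replace 80% of those with a mask token, 10% with a random amino acid, and leave 10% unchanged). Whether LassoESM uses exactly these rates is an assumption, and `mask_sequence` is a hypothetical helper for illustration, not part of the LassoESM code.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"  # hypothetical mask token name


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Build one masked-language-modeling training example.

    Returns (tokens, labels):
      tokens: the input residues, with the selected positions corrupted
      labels: the original residue at each selected position and None
              elsewhere (None positions are ignored by the training loss)
    """
    rng = rng or random.Random()
    tokens = list(seq)
    labels = [None] * len(seq)
    # Hide at least one residue so every example has a prediction target.
    n_mask = max(1, round(mask_rate * len(seq)))
    for i in rng.sample(range(len(seq)), n_mask):
        labels[i] = seq[i]  # the model must recover this residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue as input
    return tokens, labels


# Illustrative 21-residue sequence; round(0.15 * 21) = 3 positions are selected.
tokens, labels = mask_sequence("GGAGHVPEYFVGIGTPISFYG", rng=random.Random(0))
```

During training, the model sees `tokens` and is scored only on how well it predicts the residues stored in `labels`; the learned parameters can then be reused as features for smaller, task-specific property predictors.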

By combining the Shukla group’s machine learning expertise with experimental data collected by Mitchell’s group, the team applied LassoESM to a range of prediction tasks. One area of focus is identifying compatible pairs of lasso peptides and lasso cyclases to expand the clinical potential of these molecules. Lasso cyclases are the enzymes responsible for the knot-forming step of lasso peptide biosynthesis. Just as different locks require unique keys, different peptides require specific lasso cyclases to tie the characteristic knot.

“We built the models to predict which lasso cyclase could actually form a lasso peptide using only the sequence of amino acids in a peptide. If we can understand the substrate scope or we can engineer lasso cyclases, then we can potentially make any peptide into a lasso,” Shukla said. Without LassoESM, these enzyme-substrate interactions are difficult to predict, highlighting the utility of this artificial intelligence tool.
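Predicting whether a cyclase can process a given peptide is, in machine learning terms, a binary classification task built on top of a per-peptide representation. The Python sketch below is a hedged illustration, not the study's pipeline. LassoESM would supply a learned embedding for each peptide; here a hypothetical `composition_features` function (amino acid frequencies) stands in so the example runs without the model, and a plain logistic regression fitted by gradient descent stands in for whatever downstream classifier the team used. It also assumes one classifier per cyclase; a model that scores peptide-cyclase pairs would encode the enzyme as well.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(seq):
    """Hypothetical stand-in for a LassoESM embedding: the fraction of each
    standard amino acid in the peptide (a 20-dimensional vector)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on log-loss.
    X: list of feature vectors; y: 1 = cyclase accepts the peptide, 0 = not."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: sigmoid(w·x + b) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over the data set and take one step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_compatible(seq, w, b, threshold=0.5):
    """Return (probability, call) that the cyclase can process the peptide."""
    x = composition_features(seq)
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p, p >= threshold


# Toy, made-up sequences and labels, only to show the calling pattern.
train_seqs = ["GGGGGGGGGA", "GGGGGGGGAA", "KKKKKKKKKA", "KKKKKKKKAA"]
train_labels = [1, 1, 0, 0]
w, b = train_logistic([composition_features(s) for s in train_seqs], train_labels)
prob, accepted = predict_compatible("GGGGGGGKKA", w, b)
```

The labels above are invented purely for illustration; a real model would be trained on experimentally validated peptide and cyclase pairs, and swapping the composition features for learned embeddings is the step where a tailored language model would add its value.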

Mi said, “We demonstrated that LassoESM enables accurate prediction of various lasso peptide properties, even with limited training data. This work provides a powerful AI-driven tool to accelerate the rational design of functional lasso peptides for biomedical and industrial applications.”

Moving forward, the team also aims to expand their model to accommodate new prediction capabilities, such as building tailor-made language models for other peptide natural products and engineering lasso peptides to target specific proteins.

Shukla credited access to powerful computing resources on campus and the interdisciplinary collaboration opportunities provided by the MMG theme at the Carl R. Woese Institute for Genomic Biology. “I am grateful to Xuenan Mi and Susanna Barrett for leading the computational and experimental aspects of this study, and Professor Douglas Mitchell for providing experimental support and guidance during this investigation,” he said.

The publication, “LassoESM a tailored language model for enhanced lasso peptide property prediction,” can be found at https://doi.org/10.1038/s41467-025-63412-3. The work was supported by the National Institutes of Health.

