Learning the molecular grammar of protein condensates from sequence determinants and embeddings

Significance

The tendency of many cellular proteins to form protein-rich biomolecular condensates underlies the formation of subcellular compartments and has been linked to various physiological functions. Understanding the molecular basis of this fundamental process and predicting protein phase behavior have therefore become important objectives. To develop a global understanding of how a protein's sequence determines its phase behavior, we constructed bespoke datasets of proteins of varying phase separation propensity and identified explicit biophysical and sequence-specific features common to phase-separating proteins. Moreover, by combining this insight with neural-network-based sequence embeddings, we trained machine-learning classifiers that identified phase-separating sequences with high accuracy, including on independent external test data.
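
Several of the sequence-level quantities discussed here (hydrophobicity, Shannon entropy, and polar versus hydrophobic residue content) can be computed directly from a protein's amino-acid string. The following is a minimal sketch for illustration only. It assumes the Kyte-Doolittle hydropathy scale for hydrophobicity and composition-based Shannon entropy in bits, and it uses hypothetical residue groupings (`POLAR`, `HYDROPHOBIC`). The exact definitions used to build DeePhase may differ. Disorder is omitted because it requires a dedicated predictor.

```python
import math
from collections import Counter
from typing import Dict

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical residue groupings, chosen for illustration only.
POLAR = set("STNQ")
HYDROPHOBIC = set("AVILMFWC")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the sequence's amino-acid composition."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy per residue (a GRAVY-style score)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def group_fraction(seq: str, group: set) -> float:
    """Fraction of residues in seq that belong to the given group."""
    return sum(aa in group for aa in seq) / len(seq)


def sequence_features(seq: str) -> Dict[str, float]:
    """Compute a small set of knowledge-based features for one sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"expected a non-empty sequence of standard residues, got {sorted(unknown)}")
    return {
        "entropy_bits": shannon_entropy(seq),
        "mean_hydropathy": mean_hydropathy(seq),
        "polar_fraction": group_fraction(seq, POLAR),
        "hydrophobic_fraction": group_fraction(seq, HYDROPHOBIC),
    }
```

For example, `sequence_features("SSGG")` gives an entropy of 1.0 bit, a mean hydropathy of about -0.6, a polar fraction of 0.5, and a hydrophobic fraction of 0.0. The same feature distributions can then be compared across sets such as LLPS-prone proteins, PDB sequences, and Swiss-Prot entries.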

Abstract

Intracellular phase separation of proteins into biomolecular condensates is increasingly recognized as a process with a key role in cellular compartmentalization and regulation. Several hypotheses have been proposed about the parameters that determine the tendency of proteins to form condensates, and some of them have been probed experimentally using constructs with altered sequences. To broaden the scope of these observations, we established an in silico strategy for understanding, on a global level, the associations between protein sequence and phase behavior, and we further constructed machine-learning models for predicting protein liquid–liquid phase separation (LLPS). Our analysis highlighted that LLPS-prone proteins are more disordered, less hydrophobic, and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database, and that they show a fine balance in their relative content of polar and hydrophobic residues. To learn the sequence features underpinning LLPS in a hypothesis-free manner, we trained a neural-network-based language model and found that a classifier built on its embeddings learned the underlying principles of phase behavior with accuracy comparable to that of a classifier using knowledge-based features. By combining knowledge-based features with unsupervised embeddings, we generated an integrated model that distinguished LLPS-prone sequences both from structured proteins and from unstructured proteins with a lower LLPS propensity, and that identified such sequences in the human proteome with high accuracy. These results provide a platform rooted in molecular principles for understanding protein phase behavior. The predictor, termed DeePhase, is accessible from https://deephase.ch.cam.ac.uk/.
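
The abstract states that knowledge-based features and unsupervised embeddings were combined into one integrated model, but it does not say how. The following is a minimal sketch under stated assumptions, not the DeePhase implementation. It assumes simple concatenation of a feature vector with an averaged k-mer embedding (a ProtVec-style assumption) and a logistic-regression classifier trained by full-batch gradient descent. The k-mer lookup table, the helper names, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_kmer_embedding(seq: str, kmer_vectors: Dict[str, Vector], k: int = 3) -> Vector:
    """Average the vectors of all overlapping k-mers present in a (hypothetical) lookup table.

    Unseen k-mers are skipped; a zero vector is returned if none are found.
    """
    dim = len(next(iter(kmer_vectors.values())))
    total = [0.0] * dim
    hits = 0
    for i in range(len(seq) - k + 1):
        vec = kmer_vectors.get(seq[i:i + k])
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
        hits += 1
    return [t / hits for t in total] if hits else total


def combine(features: Sequence[float], embedding: Sequence[float]) -> Vector:
    """Integrated representation: knowledge-based features followed by the embedding."""
    return list(features) + list(embedding)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Vector, b: float, x: Sequence[float]) -> float:
    """Probability that x is LLPS-prone (label 1) under the logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5,
                   epochs: int = 500) -> Tuple[Vector, float]:
    """Fit logistic-regression weights by full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

For example, with a toy table mapping `"GSG"` and `"SGS"` to `[1.0, 0.0]`, `mean_kmer_embedding("GSGSG", table)` returns `[1.0, 0.0]`. In practice, the k-mer vectors would come from a language model trained on protein sequences. Features would also be standardized before concatenation, and accuracy would be judged on held-out data such as an independent test set.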

Molecular Medicine Israel
