Advancing generative large language models toward discriminative performance in protein function prediction.
Tool / method
Multitask generative LLM for sequence-to-function prediction via natural language generation
Summary
OPUS-PLLM is a multitask generative LLM that predicts protein function directly from amino acid sequence, expressing its prediction as natural language (a sequence-to-function paradigm). Earlier studies benchmarked generalist LLMs (ChatGPT-4o, DeepSeek-v3) on this task, but those models did not match the performance of specialized models. OPUS-PLLM, by contrast, performs competitively with top discriminative models (ESM2, ProtT5) on function prediction. The model combines modality encoding, modality refinement, and instruction tuning on datasets constructed specifically for this study.
Synthesis written by Geno'X. For the full original abstract, please refer to the source publication.
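To make the sequence-to-function idea concrete, here is a minimal Python sketch of how a single instruction-tuning record might be assembled: an instruction, an amino acid sequence, and a free-text function description as the target. This is an illustration only. The summary above does not describe the prompt format or dataset schema used for OPUS-PLLM, so the template wording, the `InstructionExample` class, and the helper names are all assumptions, and the sequence and function text in the usage lines are made-up toy values.

```python
from dataclasses import dataclass

# Hypothetical instruction wording; not taken from the OPUS-PLLM paper.
INSTRUCTION = "Describe the function of the protein with the following amino acid sequence."

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class InstructionExample:
    """One (instruction, sequence, response) record for instruction tuning."""
    instruction: str
    sequence: str
    response: str


def build_example(sequence: str, function_text: str) -> InstructionExample:
    """Normalize the sequence, validate it, and pair it with its target text."""
    seq = sequence.strip().upper()
    unknown = set(seq) - VALID_RESIDUES
    if not seq or unknown:
        raise ValueError(f"invalid sequence, unexpected residues: {sorted(unknown)}")
    return InstructionExample(INSTRUCTION, seq, function_text.strip())


def to_prompt(example: InstructionExample) -> str:
    """Render the model input; the response is what the model learns to generate."""
    return f"{example.instruction}\nSequence: {example.sequence}\nAnswer:"


# Toy usage with an invented sequence and invented function text.
example = build_example(" mktayiak ", "Binds ATP. ")
print(to_prompt(example))
print(example.response)
```

In a generative setup of this kind, the function label is produced as text rather than as a score over fixed classes, which is the main contrast with discriminative models such as ESM2 or ProtT5.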
Analysis
Predicting protein function from sequence remains a fundamental challenge for interpreting variants of uncertain significance in clinical genomics. OPUS-PLLM shows that generative LLMs can rival specialized discriminative models, paving the way for unified sequence-to-function tools that can be integrated into variant annotation pipelines. Published in Genome Biology, this work illustrates how quickly LLMs for molecular biology are maturing in genomics applications.
Why this score?
Clinical impact: 1/3 · Evidence strength: 2/3 · Novelty: 2/2 · Sample size: 1/1 · Journal quality: 1/1 → Total: 7/10