
Benchmarking reveals the superiority of nucleic acid foundation models in predicting lncRNA coding potential.

Yang Y, Ren L, Feng J et al. · Genome Biol 2026 · June 2026
Relevance score: 9/10
Disease / domain: lncRNA coding potential prediction
Source: PubMed · PMID 42243956

Tool / method

Benchmark of 16 tools including 4 nucleic acid foundation models for coding lncRNA (codlncRNA) prediction

Summary

Long noncoding RNAs harboring short open reading frames (coding lncRNAs) encode functional micropeptides, but identifying them remains challenging for classical bioinformatics tools. This study establishes the first comparative benchmark stratified by experimental evidence quality, comparing 12 classical tools and 4 nucleic acid foundation models (including DNABERT-2 and RNA-FM). Foundation models consistently outperform classical tools, and their generalization across multiple species is validated. An open-source framework and web server are provided to standardize future evaluations.

Synthesis written by Geno'X. For the full original abstract, please refer to the source publication.

Analysis

The result is consistent with what is observed elsewhere: large models pre-trained on broad genomic corpora learn sequence representations that hand-crafted classical bioinformatics features do not capture. Stratifying the benchmark by experimental evidence quality is an important methodological contribution, and one that comparative studies often overlook.
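
To make the idea of stratification concrete, here is a minimal sketch of scoring a predictor separately per evidence tier instead of only on the pooled set. It is not the paper's framework: the tier names ("high", "low"), the function names, and the toy records are all hypothetical and chosen only to exercise the code.

```python
from collections import defaultdict

def accuracy_by_tier(records):
    """Return accuracy per evidence tier.

    records: iterable of (tier, true_label, predicted_label) tuples.
    Tier names are hypothetical; the paper's actual strata may differ.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for tier, truth, pred in records:
        total[tier] += 1
        if truth == pred:
            correct[tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total}

def pooled_accuracy(records):
    """Return accuracy over all records, ignoring tiers."""
    records = list(records)
    hits = sum(1 for _, truth, pred in records if truth == pred)
    return hits / len(records)

# Made-up toy records (tier, true label, predicted label); not real results.
toy_records = [
    ("high", 1, 1),
    ("high", 0, 0),
    ("high", 1, 0),
    ("high", 1, 1),
    ("low", 1, 1),
    ("low", 0, 1),
]

per_tier = accuracy_by_tier(toy_records)
pooled = pooled_accuracy(toy_records)
print(per_tier)
print(pooled)
```

On this toy input the pooled figure sits between the two per-tier figures, which is the point of stratifying: a single pooled number can hide that a tool behaves differently on well-supported and weakly supported examples.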

Why this score?

Clinical impact: 2/3 · Evidence strength: 3/3 · Novelty: 2/2 · Sample size: 1/1 · Publication status: 1/1 → Total: 9/10
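
The total can be checked by summing the five criteria. The short sketch below does that; the `rubric` dictionary is just an illustrative container for the values listed above, not part of any published scoring tool.

```python
# Each criterion maps to (points awarded, maximum points), as listed above.
rubric = {
    "Clinical impact": (2, 3),
    "Evidence strength": (3, 3),
    "Novelty": (2, 2),
    "Sample size": (1, 1),
    "Publication status": (1, 1),
}

total = sum(awarded for awarded, _ in rubric.values())
maximum = sum(max_points for _, max_points in rubric.values())
print(f"Total: {total}/{maximum}")  # prints "Total: 9/10"
```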

Keywords

lncRNA · foundation models · coding potential prediction · DNABERT · benchmark · genomic LLM