bioRxiv · New tool · Pathogenicity prediction · LLM applied

A fine-tuned genomic language model captures nucleotide-level information overlooked by missense variant impact predictors.

Su Y, Lin YJ · bioRxiv 2026 · June 2026
Relevance score
9/10
Disease / domain
Missense variant interpretation / pathogenicity prediction
Source
bioRxiv
DOI 10.64898/2026.05.06.723362

Tool / method

Fine-tuned genomic language model capturing nucleotide-level information overlooked by standard missense variant impact predictors

Summary

This bioRxiv preprint presents a fine-tuned genomic language model (LLM) that captures nucleotide-level information overlooked by standard missense variant impact predictors. Existing predictors focus on protein-level consequences and share overlapping annotation priors. This creates blind spots, particularly for variants that act through their nucleotide sequence context (for example, through splicing or regulation). According to the authors, the model significantly improves pathogenicity prediction on independent benchmarks.
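To make the protein-level blind spot concrete, the sketch below (our own illustration, not the preprint's method) enumerates the single-nucleotide variants of one codon and groups the missense ones by the amino-acid change they cause. A predictor whose only input is the protein-level change (position, reference and alternate amino acid) must assign every variant in the same group the same score, even though the nucleotide changes differ. The codon `TTA` and the helper name `missense_snvs` are hypothetical choices for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def missense_snvs(codon):
    """Group the missense SNVs of a codon by the amino-acid change they cause.

    Hypothetical helper for illustration: synonymous and nonsense changes
    are skipped, so only missense variants are returned.
    """
    ref_aa = CODON_TABLE[codon]
    groups = {}
    for pos in range(3):
        for alt in BASES:
            if alt == codon[pos]:
                continue  # not a variant
            mutant = codon[:pos] + alt + codon[pos + 1:]
            alt_aa = CODON_TABLE[mutant]
            if alt_aa in (ref_aa, "*"):
                continue  # skip synonymous and nonsense changes
            groups.setdefault(f"{ref_aa}>{alt_aa}", []).append(f"{codon}>{mutant}")
    return groups


if __name__ == "__main__":
    # Print amino-acid changes reachable by more than one distinct SNV.
    for change, variants in missense_snvs("TTA").items():
        if len(variants) > 1:
            print(change, variants)
    # prints: L>F ['TTA>TTT', 'TTA>TTC']
```

Both `TTA>TTT` and `TTA>TTC` produce the same Leu-to-Phe substitution, so a purely protein-level score cannot separate them. A model that reads the surrounding nucleotide sequence at least has the information needed to do so.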

Synthesis written by Geno'X. For the full original abstract, please refer to the source publication.

Analysis

Missense pathogenicity predictors (CADD, REVEL, AlphaMissense) are standard in genomic diagnostics but share common biases. A genomic LLM that captures complementary nucleotide-level information would be a genuine advance, pending validation on independent diagnostic cohorts after peer review.

Why this score?

Clinical impact: 3/3 · Evidence strength: 3/3 · Novelty: 2/2 · Sample size: 1/1 · Publication status: 0/1 → Total: 9/10

Keywords

genomic LLM · missense variants · pathogenicity prediction · genomic diagnostics · deep learning