bioRxiv · New tool · Pathogenicity prediction · LLM applied

A fine-tuned genomic language model captures nucleotide-level information overlooked by missense variant impact predictors.

Su Y, Lin YJ — bioRxiv 2026 · June 2026

Relevance score

9/10

Disease / domain

Missense variant interpretation / pathogenicity prediction

Source

bioRxiv

DOI 10.64898/2026.05.06.723362

Tool / method

Fine-tuned genomic language model capturing nucleotide-level information overlooked by standard missense variant impact predictors

Summary

This bioRxiv preprint presents a fine-tuned genomic language model (genomic LLM) that captures nucleotide-level information overlooked by standard missense variant impact predictors. Existing predictors focus on protein-level consequences and share overlapping annotation priors, which creates blind spots, particularly for variants that act through nucleotide sequence context (splicing, regulation). According to the authors, the model significantly improves pathogenicity prediction on independent benchmarks.

Synthesis written by Geno'X. For the full original abstract, please refer to the source publication.

Analysis

Missense pathogenicity predictors (CADD, REVEL, AlphaMissense) are standard in genomic diagnostics but share common biases. A genomic LLM that captures complementary nucleotide-level information is a genuine advance, pending peer review and validation on independent diagnostic cohorts.

Analysis by Dr Thibaut Benquey

Why this score?

Clinical impact: 3/3 · Evidence strength: 3/3 · Novelty: 2/2 · Sample size: 1/1 · Publication status: 0/1 → Total: 9/10

Keywords

genomic LLM · missense variants · pathogenicity prediction · genomic diagnostics · deep learning
