The impact of transcript alignment on variant interpretation

In the world of genomic analysis and variant interpretation, the alignment of transcripts to the reference genome plays a crucial role. However, even with the latest reference genome, hg38, discrepancies persist, leading to what is known as ‘mapping gaps.’ These gaps between transcript sequences and the reference genome can significantly impact variant interpretation, posing challenges for diagnostics and clinical decisions.

Roberta Trunzo
Limbus News

--

Understanding the impact of mapping gaps

While hg38 improves on its predecessor, hg19, transcript alignment issues persist. Approximately 0.2% of aligned transcripts in hg38 exhibit mapping gaps, affecting around 54 clinically relevant genes (1228 transcripts with gaps in hg19 vs. 191 transcripts in hg38; see Figure 1).

Figure 1. Number of transcripts with mapping gaps in hg19 (RefSeq alignment 2022-03-07) and hg38 (RefSeq alignment 2023-03-21).

The problematic gene list can be narrowed to 71 genes for hg19 and 15 genes for hg38 by selecting those with clinical relevance, based on coding transcripts (those with an NM_* accession) and an association with phenotypes in the Human Phenotype Ontology (HPO). Analyzing clinical data from allexes®, the reference data network connecting varvis® software setups, reveals that around 1.8% of clinical reports refer to genes with mapping gaps, impacting diagnostics across a significant number of diseases and patient records. Interestingly, these cases are not uniformly distributed but concentrated on a few transcripts, notably those of ALMS1 and SHANK3, which represent about 45% of the affected cases in hg38 (Figure 2).

A summary of RefSeq transcript alignment issues with the human reference genomes (hg19, hg38) is provided by the varvis® team. This repository can be extremely valuable for troubleshooting discrepancies in variant calling, saving considerable time in identifying the root cause of inconsistencies.

Furthermore, expert groups like MANE provide transcript recommendations, highlighting instances where transcripts with alignment challenges offer more extensive protein sequences, influencing clinical relevance, as seen in ALMS1 and SHANK3.

Figure 2. Most relevant transcripts with mapping gaps in allexes®

Identifying the root cause

The root cause of these mapping gaps lies in sequences that cannot be aligned perfectly with the reference genome. While efforts are made to align potentially clinically relevant transcripts, discrepancies persist due to inherent differences between DNA and RNA sequences. Ongoing advancements in alignment algorithms and the discovery of new transcripts lead to regular updates by RefSeq, impacting variant annotation.
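To make this concrete, here is a minimal sketch of how a mapping gap could be detected from a transcript-to-genome alignment. It assumes the alignment is described with a GFF3-style Gap attribute (M for aligned bases, I for bases present only in the transcript, D for bases present only in the genome), which is one common way such alignments are distributed. The function names and the example strings are illustrative assumptions and do not describe how the varvis® software works.

```python
import re

# One operation of a GFF3-style Gap attribute: a letter followed by a length,
# e.g. "M120" (120 aligned bases), "I2" (2 extra bases in the transcript),
# "D1" (1 genomic base missing from the transcript).
_GAP_OP = re.compile(r"^([MIDFR])(\d+)$")


def parse_gap(gap_attr):
    """Split a Gap attribute string into (operation, length) tuples."""
    ops = []
    for token in gap_attr.split():
        match = _GAP_OP.match(token)
        if match is None:
            raise ValueError(f"unrecognised Gap operation: {token!r}")
        ops.append((match.group(1), int(match.group(2))))
    return ops


def gap_summary(gap_attr):
    """Count bases inserted or deleted in the transcript relative to the genome."""
    summary = {"inserted": 0, "deleted": 0}
    for op, length in parse_gap(gap_attr):
        if op == "I":
            summary["inserted"] += length
        elif op == "D":
            summary["deleted"] += length
    return summary


def has_mapping_gap(gap_attr):
    """True if the exon alignment contains any insertion or deletion."""
    summary = gap_summary(gap_attr)
    return summary["inserted"] > 0 or summary["deleted"] > 0


# Hypothetical exon alignments, for illustration only.
print(has_mapping_gap("M350"))           # False
print(has_mapping_gap("M120 I2 M75"))    # True
print(gap_summary("M40 D1 M10 I3 M5"))   # {'inserted': 3, 'deleted': 1}
```

Note that this only flags insertions and deletions. Single-base mismatches between transcript and genome are not encoded in a Gap string and would need a sequence-level comparison.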

Navigating the challenges: practical recommendations and the role of the varvis® platform

Addressing these challenges involves different approaches that our varvis® platform supports:

  1. Using Databases: Utilize multiple databases and links to other websites that the varvis® software provides to cross-validate transcripts and seek consensus on challenging sequences. You can look up any mapping issues documented by the RefSeq team in the gene management view.
  2. Documenting Alignments: Document the chosen alignments to provide clarity and transparency in analyses and interpretations. You can see which alignment version was used for each analysis in the varvis® software, and can update variant calls to the newest alignment with the click of a button.
  3. Checking the Provided Repository: We also provide lists of RefSeq transcripts that have mapping issues for specific transcript-to-reference alignment versions. As a first troubleshooting step, this allows you to quickly check whether tools may disagree about a variant because of this issue.
  4. Reassessing Expert Opinions: Periodically revisit expert opinions, considering RefSeq Select or MANE recommendations when uncertain. Our varvis® support team helps you to configure the most suitable ‘reference transcripts’ for your assays.
  5. Enhancing Communication: the varvis® software also includes many features to enhance intra-team communication, such as gene and variant comments. Additionally, the varvis® Academy provides a dedicated course on transcripts, for training. All this helps to raise awareness of mapping gaps and alignment quality among team members.
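The repository check in point 3 can be automated as a first troubleshooting step. The sketch below is a minimal illustration in Python. The one-accession-per-line list format, the function names, and the accessions shown are assumptions made for the example; they are placeholders and do not describe the actual layout or content of the repository.

```python
def load_gapped_transcripts(lines):
    """Read accessions (one per line) into a set, skipping blanks and '#' comments."""
    accessions = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            accessions.add(entry)
    return accessions


def split_accession(accession):
    """Split 'NM_X.2' into ('NM_X', '2'); the version is None if absent."""
    base, _, version = accession.partition(".")
    return base, (version or None)


def check_transcript(accession, gapped):
    """Classify a transcript against the list of known mapping issues.

    Returns 'exact' if this accession and version is listed,
    'other_version' if only a different version is listed, else 'not_listed'.
    """
    if accession in gapped:
        return "exact"
    base, _ = split_accession(accession)
    if any(split_accession(item)[0] == base for item in gapped):
        return "other_version"
    return "not_listed"


# Placeholder accessions for one alignment version, for illustration only.
example_list = ["# hypothetical list", "NM_PLACEHOLDER1.2", "", "NM_PLACEHOLDER2.5"]
gapped = load_gapped_transcripts(example_list)

# The transcript is the part of an HGVS name before the first colon.
hgvs = "NM_PLACEHOLDER1.2:c.100A>G"
transcript = hgvs.split(":", 1)[0]
print(check_transcript(transcript, gapped))             # exact
print(check_transcript("NM_PLACEHOLDER2.4", gapped))    # other_version
print(check_transcript("NM_PLACEHOLDER3.1", gapped))    # not_listed
```

Reporting 'other_version' separately is a deliberate choice: a mapping issue documented for one transcript version may or may not apply to another, so such a hit is a prompt to look closer rather than a definitive answer.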

Conclusion

Mapping gaps in transcript alignment continue to pose significant challenges in variant interpretation. While advancements in alignment algorithms and regular updates to reference genomes help mitigate these issues, it remains crucial for bioinformatics teams and laboratories to remain vigilant. Through documentation, collaboration, and a commitment to staying updated with expert recommendations, the impact of these gaps can be minimized, ensuring accurate variant annotations and diagnoses in clinical settings.

About varvis®

The varvis® software is a clinical decision support system designed by Limbus Medical Technologies GmbH, a medical device manufacturer and software development company. The cloud-based genomics platform is tailored to support the entire NGS workflow, from raw data processing, to genomics data management and variant interpretation. Automated CNV and SNV analysis are completely integrated into the NGS workflow and clinically validated for panels of all sizes including WES. Our services comprise first class support, training, automated quality control and validation compliant with relevant international guidelines. The varvis® software is a registered CE-IVD device and specifically made to aid in the diagnosis of patients.

See for yourself

If you want to learn more about transcripts, check out the recordings of our ESHG2022 Corporate Satellite (in English) and AGHM2024 symposium (in French) on our YouTube channel, or take the varvis® Academy course on transcripts. Get in touch with us to schedule your personal varvis® software demo.

