Known issues with genome build hg38

Back in 2020 we recommended hg38 as the better reference genome and highlighted potential challenges of the transition from hg19 to hg38. More recently, the work of the T2T consortium has revealed previously unknown shortcomings of hg38, which we discuss here.

Ben Liesfeld
Limbus News

--

A tweet from Steven Salzberg drew our attention to a blog post from the Genome Reference Consortium. It appears that hg38 contains not only Chinese hamster sequence but also falsely duplicated regions on chr21.

Issues with impact on clinical diagnostics

While the hamster sequence is not a problem for clinical diagnostics, the duplicated regions are when short-read sequencing is used. When short reads are aligned to the reference genome, reads that cannot be aligned unambiguously to the reference sequence are omitted from further analysis. As a result, these regions will typically show a depth of coverage of 0 in short-read NGS assays in the varvis® software, and the genes in these regions will simply not be accessible for diagnostics. This is transparent to the user, though.
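To illustrate why ambiguous alignments lead to zero coverage, here is a minimal Python sketch. It is not how the varvis® software works internally; it assumes a simplified pipeline in which each read carries a mapping quality (MAPQ), reads placed equally well in two copies of a duplicated region get MAPQ 0, and everything below a threshold is dropped. The function name `depth_per_position`, the `min_mapq` parameter and the toy reads are made up for this example.

```python
from typing import List, Tuple

# A read is modelled as (start, end, mapq) with 0-based, half-open coordinates.
Read = Tuple[int, int, int]


def depth_per_position(reads: List[Read], region_length: int, min_mapq: int = 1) -> List[int]:
    """Count usable reads per position, skipping reads below min_mapq."""
    depth = [0] * region_length
    for start, end, mapq in reads:
        if mapq < min_mapq:
            # Ambiguously placed read: omitted from further analysis.
            continue
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


# Toy data (assumption): positions 10-16 lie in a falsely duplicated region,
# so the reads covering them were assigned MAPQ 0.
reads = [(0, 5, 60), (2, 7, 60), (10, 15, 0), (12, 17, 0)]

depth = depth_per_position(reads, region_length=20)
depth_unfiltered = depth_per_position(reads, region_length=20, min_mapq=0)

print(depth[10:17])             # coverage inside the duplicated region after filtering
print(depth_unfiltered[10:17])  # coverage if ambiguous reads were kept
```

In this toy setting the reads over the duplicated region exist in the raw data, but none of them survives the filter, which is why the region ends up without usable coverage.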

Fig. 1 “Summary of the complete T2T-CHM13 human genome assembly.” from https://doi.org/10.1126/science.abj6987

Potential solutions to the problem

Unfortunately, properly correcting (removing) the falsely duplicated regions mentioned above would require a new genome release (“hg39”), which is not in sight. Meanwhile, yet another version of the reference genome is already around the corner: T2T-CHM13.

There are generally two different approaches:

“Fixing” hg38

In 2021 the GiaB consortium published a pre-print (now published here) in which they propose a short-term fix for NGS assays: creating a custom version of hg38 in which the falsely duplicated regions are masked with the letter “N”. While the GiaB consortium provides convincing evidence that this improves the analytical performance of variant calling compared to the unmodified hg38 reference, the pre-print has still not been peer-reviewed.
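The masking idea itself is simple and can be sketched in a few lines of Python. This is only an illustration of the principle, not the GiaB consortium's actual procedure or file set: the function name `mask_regions`, the toy sequence and the interval are assumptions, and a real workflow would operate on the full hg38 FASTA with the published coordinates of the falsely duplicated regions.

```python
from typing import Iterable, Tuple


def mask_regions(sequence: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Replace every base inside the given 0-based, half-open intervals with 'N'."""
    masked = list(sequence)
    for start, end in regions:
        # Clamp the interval to the sequence boundaries before masking.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = "N"
    return "".join(masked)


# Toy example (assumption): positions 4-7 stand in for a falsely duplicated copy.
toy_sequence = "ACGTACGTACGT"
masked_sequence = mask_regions(toy_sequence, [(4, 8)])
print(masked_sequence)  # ACGTNNNNACGT
```

Because aligners cannot place reads on a stretch of “N”, reads from the affected genes can only align to the remaining copy, which is what removes the ambiguity.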

Using the new T2T-CHM13 reference genome

While the T2T reference has been widely published and has already been shown to be a significant improvement over hg38, it is still quite new. The most important annotation sources, such as gene annotations (RefSeq) and disease-related or common variants (ClinVar, gnomAD), are not yet available for the T2T genome, and T2T GiaB reference data sets are also missing. We should be prepared, but we have to wait before T2T can be put to use in a diagnostic setting.

By the way, there is another genome reference in the making: the Human Pangenome Reference.

About varvis®

The varvis® software is a clinical decision support system designed by Limbus Medical Technologies GmbH, a medical device manufacturer and software development company. The cloud-based genomics platform is tailored to support the entire NGS workflow, from raw data processing to genomics data management and variant interpretation. Automated CNV and SNV analyses are completely integrated into the NGS workflow and clinically validated for panels of all sizes, including WES. Our services comprise first-class support, training, automated quality control and validation compliant with relevant international guidelines. The varvis® software is a registered CE-IVD device and specifically made to aid in the diagnosis of patients.

See for yourself

If you want to learn more about the varvis® software and services, please get in touch with us to schedule your personal varvis® software demo.

--

Excited about the impact of genetic diagnostics on patients’ lives. Founder of a genomics software company.