RefSeq Select — a standardized algorithm to select your preferred RefSeq transcript

Which transcript should I use to report a genomic variant? This is a frequent question from clinical geneticists now that Whole Exome Sequencing, which targets all known genes, has become widely available in clinical diagnostics.

Ben Liesfeld
Limbus News

--

In clinical diagnostics, genomic variants are reported based on their impact on the function of genes. In order to determine their impact, variants are mapped onto transcripts. The relevant source of transcripts for clinical diagnostics is the NCBI’s RefSeq project, which provides a collection of curated transcript sequences that are based on evidence.

For more than 35% of genes, alternative splicing leads to several RefSeq transcripts being available for a specific gene. How should you select the transcript to interpret and report a clinically relevant variant? This has been an ongoing discussion, in particular since this question cannot be answered by expert panels for every single RefSeq gene. Our knowledge of the role of specific transcripts in different tissues will also grow in the future, spearheaded by recent research efforts like the GTEx project.

What is “RefSeq Select”?

RefSeq Select addresses the issue of multiple transcripts per gene and introduces an automated workflow that identifies a single curated RefSeq transcript for every protein-coding gene. A “RefSeq Select” tag has been added to the RefSeq flat files; see, for example, NM_012435.3.

The RefSeq flat file for NM_012435.3, highlighting the addition of the “RefSeq Select” tag.
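As a rough illustration of how the tag could be detected programmatically, the sketch below scans a GenBank-style flat file for the tag. It assumes that the tag appears as an entry in the KEYWORDS field, which should be verified against a current record. The function names and the abbreviated sample record (including its accession) are hypothetical and are not taken from a real RefSeq entry.

```python
def keywords_of(flat_file_text):
    """Return the entries of the KEYWORDS field of a GenBank-style flat file."""
    collected = []
    in_keywords = False
    for line in flat_file_text.splitlines():
        if line.startswith("KEYWORDS"):
            in_keywords = True
            collected.append(line[len("KEYWORDS"):].strip())
        elif in_keywords and line.startswith(" "):
            # Continuation lines of a field are indented.
            collected.append(line.strip())
        elif in_keywords:
            # The next field starts in column 0, so KEYWORDS has ended.
            break
    joined = " ".join(collected).rstrip(".")
    return [entry.strip() for entry in joined.split(";") if entry.strip()]


def is_refseq_select(flat_file_text):
    """True if the record carries the tag (assumption: listed under KEYWORDS)."""
    return "RefSeq Select" in keywords_of(flat_file_text)


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """\
LOCUS       NM_900001               1234 bp    mRNA    linear   PRI
KEYWORDS    RefSeq; RefSeq Select.
SOURCE      Homo sapiens (human)
"""

print(is_refseq_select(SAMPLE))
```

Parsing only the KEYWORDS field, rather than searching the whole file for the string, avoids false positives from free-text comments that merely mention RefSeq Select.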

The selection algorithm has been documented on the RefSeq Select website, which is the only source of information about this process at the time of writing this article (to the best of our knowledge). Unfortunately, changes to the process or the website are not transparently versioned.

This flowchart shows the first steps in the selection process of transcripts according to “RefSeq Select”. The complete process is shown on the NCBI website.

How does RefSeq Select relate to other standardization projects like LRG, MANE and CCDS?

  • The RefSeq Select algorithm prefers transcripts that are identified as LRG transcripts. However, if more than one transcript is linked to an LRG record, only one of them will be identified as RefSeq Select.
  • The RefSeq Select transcripts are also NCBI’s input to the MANE project, which in turn means that the RefSeq Select set is a superset of the MANE transcripts.
  • Not all RefSeq Select transcripts are contained in the Consensus CDS (CCDS). Not all CCDS transcripts are RefSeq Select transcripts.

We recommend reviewing and becoming familiar with the HGVS recommendations regarding coding DNA reference sequences.

If you are using the UCSC genome browser, please also review the UCSC genome browser FAQ, which contains important information about the differences between the “NCBI RefSeq” and “UCSC RefSeq” tracks.

Will RefSeq Select solve all my problems with transcripts?

Short answer: no. Some issues are beyond the scope of the RefSeq Select project, for example:

  • In specific clinical cases, it may be necessary to consider alternative transcripts that are not identified as “RefSeq Select”.
  • RefSeq Select does not solve issues mapping transcripts onto the genomic reference sequence. These issues will persist.
  • If you are using GRCh37 or hg19, mappings for some RefSeq Select transcripts may not be available at all.
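The limitations above suggest that a reporting pipeline should not depend on RefSeq Select blindly. The following is a minimal sketch of one possible selection policy, not a description of any existing tool: an explicit per-case override wins, otherwise the RefSeq Select transcript is used if it has a mapping on the chosen genome build, and otherwise no automatic choice is made. All class names, function names and accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    accession: str
    refseq_select: bool
    builds: frozenset  # genome builds for which a mapping is available


def pick_reporting_transcript(transcripts, build, override=None):
    """Pick a transcript for reporting, or None if a human should decide."""
    # Only transcripts with a mapping on the requested build are usable.
    mapped = [t for t in transcripts if build in t.builds]
    # A case-specific override takes precedence over RefSeq Select.
    if override is not None:
        for t in mapped:
            if t.accession == override:
                return t
    # Otherwise prefer the RefSeq Select transcript, if it is mapped.
    for t in mapped:
        if t.refseq_select:
            return t
    return None


# Hypothetical gene with two transcripts; the Select one lacks a GRCh37 mapping.
gene = [
    Transcript("NM_900001.1", True, frozenset({"GRCh38"})),
    Transcript("NM_900002.1", False, frozenset({"GRCh37", "GRCh38"})),
]

print(pick_reporting_transcript(gene, "GRCh38"))
print(pick_reporting_transcript(gene, "GRCh37"))
print(pick_reporting_transcript(gene, "GRCh37", override="NM_900002.1"))
```

Returning None instead of silently falling back to an arbitrary transcript keeps the decision visible when no RefSeq Select mapping exists for the build in use.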

Will RefSeq Select transcripts be stable?

The RefSeq project team hopes that RefSeq Select will be stable “with no updates unless there are compelling reasons”. Given the current rate of changes to gene annotations in general, we advise being prepared for changes to RefSeq Select, just to be on the safe side.

How can I utilize RefSeq Select in varvis®?

Users of varvis® may identify any RefSeq transcript that is available for their genome build as their preferred transcript. When reviewing and filtering genomic variants, it is always possible to consider mappings onto alternative transcripts.

If you would like to identify RefSeq Select transcripts as your preferred transcript globally for all genes where this is available, please get in touch with the varvis® support team.

--

Excited about the impact of genetic diagnostics on patients’ lives. Founder of a genomics software company.