Provider FAQs: Technology and quality
We have a robust system in place for identifying which variants require confirmation. Our confirmation rules for SNVs and indels (single base changes and small insertions and deletions) are as follows:
We confirm a variant if:
1) It does not meet stringent NGS quality metrics, and
2) It has been interpreted as pathogenic or likely pathogenic (disease causing).
We do not confirm a variant if:
1) It meets stringent quality metrics which have been shown to indicate high-accuracy NGS results, or
2) It has been interpreted as a variant of uncertain significance, or
3) It was previously confirmed in a first-degree relative.
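Read together, the two rule sets above work as a small decision function. The sketch below is illustrative only. It assumes each call can be reduced to a classification string, a single pass/fail flag standing in for the stringent NGS quality metrics (the actual metrics are not listed here), and a flag for prior confirmation in a first-degree relative. Where the rules overlap (a pathogenic call that fails the metrics but was already confirmed in a relative), the sketch assumes the "do not confirm" rule wins. All names are hypothetical.

```python
from dataclasses import dataclass

# Classifications that trigger confirmation when quality metrics are not met.
PATHOGENIC_CLASSES = {"Pathogenic", "Likely pathogenic"}


@dataclass
class VariantCall:
    """Hypothetical summary of an SNV/indel call."""
    classification: str                        # e.g. "Pathogenic", "Likely pathogenic", "VUS"
    meets_quality_metrics: bool                # stand-in for the stringent NGS quality metrics
    confirmed_in_first_degree_relative: bool = False


def needs_confirmation(call: VariantCall) -> bool:
    """Apply the SNV/indel confirmation rules described above."""
    # "Do not confirm" rules: a high-quality call, or a variant already
    # confirmed in a first-degree relative (assumed to take precedence).
    if call.meets_quality_metrics or call.confirmed_in_first_degree_relative:
        return False
    # Otherwise confirm only pathogenic / likely pathogenic calls; a VUS is not confirmed.
    return call.classification in PATHOGENIC_CLASSES


if __name__ == "__main__":
    print(needs_confirmation(VariantCall("Likely pathogenic", meets_quality_metrics=False)))  # True
    print(needs_confirmation(VariantCall("VUS", meets_quality_metrics=False)))  # False
```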
Invitae confirms any reported CNV event by performing array comparative genomic hybridization (aCGH) with a custom-designed, exon-focused microarray. This is the industry-standard technique for confirming these events.
Exceptions to our current CNV confirmation policy are:
PMS2: MLPA (multiplex ligation-dependent probe amplification), not aCGH, is used for exons 11-15 of PMS2.
PMP22 full gene duplication: Detection of this relatively common event by NGS alone has been validated to have high accuracy.
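The CNV policy and its two exceptions can be summarized as a lookup from an event to its confirmation method(s). This is a minimal sketch with hypothetical names; in particular, it assumes that a PMS2 event spanning exons both inside and outside exons 11-15 would use MLPA for the former and aCGH for the latter, which the text does not state explicitly.

```python
# Exons of PMS2 confirmed by MLPA rather than aCGH, per the exception above.
PMS2_MLPA_EXONS = frozenset(range(11, 16))  # exons 11-15


def cnv_confirmation_methods(gene: str, exons: set[int],
                             full_gene_duplication: bool = False) -> set[str]:
    """Return the set of confirmation methods for a reported CNV event.

    gene: gene symbol; exons: exon numbers spanned by the event;
    full_gene_duplication: True if the event duplicates the whole gene.
    """
    if gene == "PMP22" and full_gene_duplication:
        return set()  # validated NGS-only call: no orthogonal confirmation
    if gene == "PMS2":
        methods = set()
        if exons & PMS2_MLPA_EXONS:
            methods.add("MLPA")  # part of the event lies in exons 11-15
        if exons - PMS2_MLPA_EXONS:
            methods.add("aCGH")  # other PMS2 exons follow the default policy (assumption)
        return methods
    return {"aCGH"}  # default: custom exon-focused array CGH


if __name__ == "__main__":
    print(sorted(cnv_confirmation_methods("PMS2", {9, 10, 11})))  # ['MLPA', 'aCGH']
```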
Pseudodeficiency alleles are DNA variants that can lead to false positive results on biochemical enzyme studies but are not known to cause clinical symptoms or lead to disease. Because enzyme studies cannot differentiate between true pathogenic variants and pseudodeficiency alleles, the two must be distinguished by molecular studies.
Enzyme studies measure enzyme activity, or the ability of an enzyme to convert a specific substrate to a product. The inability (or reduced ability) of an enzyme to catalyze this conversion can lead to disease. In a laboratory enzyme assay, synthetic substrates are commonly used instead of the substrate naturally found in the body. Pseudodeficiency alleles are known to impair an enzyme’s ability to convert this artificial substrate to product, which can lead to a false positive result on enzyme tests. Enzymes encoded by pseudodeficiency alleles can process natural substrate normally, or at a level that does not result in disease.
Why does Invitae report pseudodeficiency alleles?
Invitae reports pseudodeficiency alleles to help clinicians interpret abnormal biochemical results. Both diagnostic studies and large-scale screening programs (such as newborn screening, prenatal carrier screening, and Tay-Sachs carrier screening) frequently use enzyme studies to identify at-risk individuals, and false positive results are not uncommon. Molecular analysis can identify variants known to be pseudodeficiency alleles and can therefore distinguish a true positive abnormal biochemical result from a false positive one.
Invitae reports pseudodeficiency alleles identified by sequencing in our results because these variants can provide an explanation for previous or future abnormal enzyme testing. This information can reassure the clinician and the patient that the patient is not considered to be affected with the respective disorder despite abnormal enzyme studies.
How common are pseudodeficiency alleles?
The overall incidence of pseudodeficiency alleles is unknown, but large-scale screening programs have found that approximately 2% of Ashkenazi Jewish individuals and approximately 36% of non-Ashkenazi individuals identified as Tay-Sachs disease carriers by enzyme testing carry a HEXA pseudodeficiency allele (1). Approximately 3.9% of the healthy Japanese population is homozygous for a common pseudodeficiency allele for glycogen storage disease type II (Pompe disease; GAA gene) (2). Pseudodeficiency alleles have also been identified in metachromatic leukodystrophy (ARSA gene), mucopolysaccharidosis (MPS) type 1 (also known as Hurler syndrome or Scheie syndrome; IDUA gene), GM1 gangliosidosis (GLB1 gene), Krabbe disease (GALC gene), Sandhoff disease (HEXB gene), Fabry disease (GLA gene), MPS type 7 (also known as Sly syndrome; GUSB gene), and fucosidosis (FUCA1 gene) (3). In addition, a pseudodeficiency allele has been reported in tyrosinemia type I (FAH gene), which is not a lysosomal storage disorder (4).
Are pseudodeficiency alleles inherited?
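These DNA changes are inherited like any other genetic variant and can be passed on to offspring. An individual may carry a single pseudodeficiency allele (heterozygous), two copies of the same pseudodeficiency allele (homozygous), or a pseudodeficiency allele together with a different variant in the same gene (compound heterozygous).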
Can two pseudodeficiency alleles in the same gene or a pseudodeficiency allele inherited with a known pathogenic allele in the same gene cause disease?
Based on currently available data, pseudodeficiency alleles are not thought to be associated with clinical symptoms. Many unaffected individuals with two pseudodeficiency alleles, or with one pathogenic allele and one pseudodeficiency allele, have been identified in the population (data obtained from the ExAC and gnomAD databases).
Can the presence of a pseudodeficiency allele in an affected individual with two pathogenic variants cause more severe disease?
At this time, there is no evidence showing a more severe clinical presentation in individuals with two pathogenic variants and one or more pseudodeficiency alleles.
1) Park NJ, Morgan C, Sharma R, et al. Pediatr Res. 2010;67(2):217-20.
2) Labrousse P, Chien YH, Pomponio RJ, et al. Mol Genet Metab. 2010;99(4):379-83.
3) Thomas GH. Am J Hum Genet. 1994;54(6):934-40.
4) Rootwelt H, Brodtkorb E, Kvittingen EA. Am J Hum Genet. 1994;55(6):1122-7.
Why are termination codons in the last exon reported as VUS?
Premature termination codons in the last exon are not classified as pathogenic without additional evidence because they have a fundamentally different effect on the protein product than premature termination codons in other exons.
To understand why, we first need to know how the cell makes protein from RNA and the role that termination codons usually play in that process:
First, the cell copies (transcribes) the DNA into a precursor messenger RNA (pre-mRNA) molecule that contains both exons and introns.
Next, the spliceosome removes the introns and joins the exons together, leaving an exon junction complex (EJC) at the position of each original splice junction.
Then, the protein translation machinery (ribosomes) starts translating the messenger RNA into protein. The process stops when the ribosome reaches the termination codon, which is the signal that tells the translation machinery to stop adding amino acids to the growing protein chain. Along the way, the translation machinery also removes the exon junction complexes from the RNA.
Once one copy of the protein product is made from the RNA, dozens, if not hundreds, of additional protein copies are made from that one molecule of RNA.
Sometimes, a variant creates a new termination codon earlier in the gene. This is known as a premature termination codon.
The RNA copy is made and spliced normally, leaving exon-junction complexes wherever splicing occurred.
In this situation, the translation machinery stops when it reaches the premature termination codon instead of the original termination codon, and at least one exon junction complex remains on the RNA downstream of the point where translation stopped.
Now, a different process kicks in. Because exon junction complexes are normally removed during translation, an RNA molecule that still carries exon junction complexes is taken as a sign of a premature termination codon. Specialized surveillance machinery, a pathway known as nonsense-mediated mRNA decay (NMD), is used to find these RNA molecules.
Once the machinery finds the RNA molecules, it breaks them down so that they don’t continue to create truncated protein products.
Now that we understand how the cell makes protein from RNA and the role of termination codons, we can return to our original question: “Why are termination codons in the last exon reported as VUS?”
If the premature termination codon is located within the last exon, no exon junction complexes remain on the RNA, so the surveillance machinery cannot identify and break it down. As a result, the RNA continues to produce protein, but the product lacks the amino acids that would normally be encoded downstream of the premature termination codon. So while most premature termination codons elsewhere in the gene lead to a nearly complete loss of the protein product, premature termination codons in the last exon are more akin to a deletion of the end of the gene. Without additional clinical or functional evidence showing that the loss of these amino acids is deleterious, premature truncations in the last exon are classified as variants of uncertain significance.
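The surveillance pathway described above is generally known as nonsense-mediated mRNA decay (NMD), and its core positional logic can be sketched as a simple check: place an exon junction complex at every exon-exon junction of the spliced mRNA, assume the ribosome clears every complex upstream of where it stops, and flag the RNA for decay if any complex remains downstream. This is a simplified illustration under stated assumptions (the exon structure is given as a hypothetical list of exon lengths, and positions are in spliced-mRNA coordinates); it is not a variant classification tool, and real NMD has additional exceptions. The optional `rule_nt` parameter reflects the commonly cited refinement that a termination codon within roughly 50-55 nucleotides upstream of the last exon-exon junction also tends to escape NMD; the default of 0 reproduces the model in the text.

```python
from itertools import accumulate


def escapes_nmd(exon_lengths: list[int], ptc_position: int, rule_nt: int = 0) -> bool:
    """Predict whether a premature termination codon (PTC) escapes NMD.

    exon_lengths: lengths (nt) of the exons in the spliced mRNA, 5' to 3'.
    ptc_position: position of the first PTC nucleotide in spliced-mRNA coordinates.
    rule_nt: a junction at most this many nt downstream of the PTC is assumed
             not to trigger decay (0 = the simple model described in the text).
    """
    if not 1 <= ptc_position <= sum(exon_lengths):
        raise ValueError("PTC position lies outside the transcript")
    # An EJC sits at every exon-exon junction, i.e. after each exon except the last.
    junctions = list(accumulate(exon_lengths))[:-1]
    # EJCs upstream of the stopped ribosome are removed; any junction lying
    # more than rule_nt nt downstream of the PTC marks the mRNA for decay.
    return all(j - ptc_position <= rule_nt for j in junctions)


if __name__ == "__main__":
    exons = [100, 150, 200]  # hypothetical three-exon transcript; junctions at 100 and 250
    print(escapes_nmd(exons, 300))              # True: PTC in the last exon
    print(escapes_nmd(exons, 120))              # False: an EJC remains at 250
    print(escapes_nmd(exons, 230, rule_nt=50))  # True: last junction only 20 nt downstream
```

A single-exon transcript has no junctions, so under this model any termination codon in it escapes NMD.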
No, absolutely not. All of our interpretations are made independently according to the Sherloc guidelines, and we don’t take into account other labs’ interpretations in any way whatsoever. This does occasionally lead to different interpretations of the same variant, and there are many reasons why this could occur.
To better understand why differing interpretations occur and how they can be minimized in the future, we are active participants in an NIH-funded project examining the reasons for varied interpretations.
We are also transparent about what evidence goes into our interpretations and what additional information we would need for a more definitive classification, and we have open dialogues with other clinical laboratories to help resolve any differences.
We are one of the leading submitters to ClinVar, in part because we do not rely on previously existing interpretations.
Invitae finds scientific articles using several complementary methods. The primary method is a natural-language algorithm that automatically searches through hundreds of thousands of scientific articles and displays to the interpreter only the literature likely to contain information about the variant. A second method searches publicly available databases, such as ClinVar, for additional articles. Finally, the interpreter manually reviews each article and may identify other relevant materials during that review. In our experience, our natural-language algorithm finds significantly more information than manual searches or the references available in public databases.
Once we’ve found the literature, the interpreter looks at all of the available evidence and reads each article to identify specific information that fits the Sherloc evidence criteria. The interpreter’s role is only to gather and apply the evidence; the evidence itself determines the final classification.
Some genes undergo alternative splicing, a process in which different combinations of exons (and sometimes intronic sequence) are joined together during splicing, so that a single gene can produce several different mRNAs and protein isoforms. Each of these alternative mRNA products is known as a transcript of the gene. For some genes, different transcripts are expressed in different tissues or at different stages of development.
First, Invitae scientists review the available literature to find clinically relevant variants in a gene. They then compare these variants with the available transcripts for the gene and select the transcript that captures the most clinically reported variants. When no single transcript captures all reported variants because of alternative splicing, we report on several transcripts so that previously described clinically relevant variants are not missed.
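One way to picture this selection is as a greedy set-cover procedure: take the transcript that captures the most reported variants, then keep adding whichever transcript captures the most still-uncovered variants until every reported variant is covered. This is an illustrative sketch, not a description of Invitae's internal tooling; the function name, transcript IDs and variant labels are made up.

```python
def select_transcripts(coverage: dict[str, set[str]]) -> list[str]:
    """Greedily choose transcripts until all reported variants are covered.

    coverage: maps a (hypothetical) transcript ID to the set of clinically
    reported variants that can be described on that transcript.
    """
    uncovered = set().union(*coverage.values())
    chosen: list[str] = []
    while uncovered:
        # Pick the transcript capturing the most variants not yet covered.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:  # defensive: nothing left that any transcript can cover
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    example = {
        "TX_A": {"v1", "v2", "v3"},  # captures the majority of variants
        "TX_B": {"v3", "v4"},        # needed for v4, which TX_A misses
        "TX_C": {"v1"},
    }
    print(select_transcripts(example))  # ['TX_A', 'TX_B']
```

Greedy selection does not always find the smallest possible set of transcripts, but it keeps the primary transcript first, matching the "majority first, then add more" description above.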
If at least one pathogenic variant exists in a gene, any variant in that gene could potentially be pathogenic. Conversely, if there are no conclusively pathogenic variants in a gene, we can't be sure that the gene causes disease. Although reviewing the evidence for each variant in each gene is time-consuming, we want to make sure that the evidence meets our own high standards.
To help move the industry forward, we are active participants in collaborative efforts to identify which genes and variants cause disease. One of these projects is the ClinGen Gene-Disease Validity project, although its scope differs slightly from Invitae’s: in addition to determining which genes cause which diseases, the ClinGen project compares the relative amount of available evidence for each gene.
We do not provide interpretations for variants that have not been formally evaluated by our report writing team. Our interpretation process, Sherloc, integrates prior curation, historical data, software-assisted literature searches, clinical information from the patient or family, laboratory metrics, and multiple quality control steps, which we can only apply to variants detected in our lab. We routinely share our interpretations with ClinVar, and we have described the Sherloc guidelines in detail in PMID: 28492532.
If you have specific questions about variants we have submitted to ClinVar or general questions about how to implement Sherloc in your own lab, please contact us at email@example.com.
General population allele frequencies, such as those made available by ExAC and gnomAD, are invaluable for variant interpretation. For this reason, Invitae has developed an approach for evaluating population data that is more sophisticated than simply comparing allele frequencies against a single threshold.
How does Invitae calculate allele frequency values?
Typically, the evaluation of population data involves a very simple allele frequency (AF)* calculation for each variant.
However, this approach does not work well when comparing allele frequencies derived from cohorts of different sizes, which is common in gnomAD and ExAC. For example, a variant in an intronic or promoter region may be covered by a cohort of a few thousand individuals, while a variant in an exonic region may be covered by a few hundred thousand individuals. Even if the two variants have the same allele frequency, the precision of those frequency values will be vastly different. To account for this, we assess population frequency by calculating a 95% lower confidence bound on the raw allele frequency. We use a statistical model based on the beta distribution, which allows us to say, “we are >95% confident the allele frequency of this variant is at least xxx%.” These beta-distribution-derived values are what we use to assess variants.
For illustrative purposes, here are gnomAD data from two BRCA1 variants. Both variants occur at an allele frequency right around 0.1%. However, due to the small sample size for the second variant, our confidence in the allele frequency is much lower.
Variant | Source | # of variants | # of chromosomes sequenced | Raw allele frequency | I am 95% confident that the allele frequency is at least...
BRCA1 NM_007294.3:c.148G>A (rs28897677) | gnomAD (non-Finnish Europeans) | 114 | 128956 | 0.09% | 0.076%
BRCA1 NM_007294.3:c.1745C>T (rs786202386) | gnomAD (other) | 1 | 1084 | 0.09% | 0.032%
For more background, see the Wikipedia article on the beta distribution.
This value can be reproduced in Excel with the inverse beta-distribution function BETA.INV(probability, alpha, beta), setting probability to 0.05, alpha to the number of variants plus one, and beta to the number of chromosomes sequenced minus the number of variants, plus one.
*AF = total variant count / total # of chromosomes sequenced
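Outside Excel, the same lower bound can be computed with any inverse beta CDF (for example, `scipy.stats.beta.ppf(0.05, alpha, beta)` in SciPy). The sketch below avoids third-party dependencies: it evaluates the regularized incomplete beta function with the standard continued-fraction method (the approach described in Numerical Recipes) and finds the 5th percentile by bisection. It uses the same parameters as the Excel recipe (alpha = variant count + 1, beta = chromosomes - variant count + 1); the function names are illustrative and not part of any Invitae tool.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 100_000, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("incomplete beta continued fraction did not converge")


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b): the CDF of a Beta(a, b) distribution evaluated at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_inv(p: float, a: float, b: float, iterations: int = 200) -> float:
    """Inverse Beta CDF by bisection (the quantity Excel's BETA.INV(p, a, b) returns)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if regularized_incomplete_beta(a, b, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def af_lower_bound(variant_count: int, chromosomes: int, confidence: float = 0.95) -> float:
    """Allele frequency we are `confidence` sure the variant is at least."""
    if not 0 <= variant_count <= chromosomes:
        raise ValueError("variant count must be between 0 and the number of chromosomes")
    alpha = variant_count + 1
    beta = chromosomes - variant_count + 1
    return beta_inv(1.0 - confidence, alpha, beta)


if __name__ == "__main__":
    for label, count, chroms in [("c.148G>A", 114, 128956), ("c.1745C>T", 1, 1084)]:
        print(f"{label}: raw AF {count / chroms:.3%}, "
              f"95% lower bound {af_lower_bound(count, chroms):.4%}")
```

Run on the two BRCA1 rows in the table, this gives lower bounds of roughly 0.076% and 0.033%, which agree with the tabulated values to within the last reported digit.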
What allele frequency thresholds does Invitae use?
The American College of Medical Genetics and Genomics (ACMG) guidelines recommend that when “(an) allele frequency is greater than expected for a disorder,” it should be considered strong evidence for a benign classification (PMID: 25741868). Rather than setting arbitrary thresholds, we derived the appropriate thresholds empirically from the allele frequencies of known pathogenic variants, as described previously in PMID: 28166811.
Based on this method, we derived three thresholds:
Very high: In the absence of evidence supporting a pathogenic classification, variants with an allele frequency in this range are classified as Benign. This threshold was empirically calculated as an allele frequency higher than that of approximately 99.9% of all known pathogenic variants.
High: In the absence of evidence supporting a pathogenic classification, variants with an allele frequency in this range are classified as Likely Benign. This threshold was empirically calculated as an allele frequency higher than that of approximately 99.7% of all known pathogenic variants.
Somewhat high: An allele frequency range that suggests the variant is benign, although the variant remains a VUS in the absence of additional supporting evidence. This threshold was empirically calculated as an allele frequency higher than that of approximately 95% of all known pathogenic variants.
Finally, because pathogenic variants tend to occur at higher allele frequencies in recessive conditions than in dominant conditions, we calculated these thresholds separately for each inheritance pattern. They are as follows:
Allele frequency thresholds (based on 95% confidence interval):
Dominant: Very high (0.261%), High (0.052%), Somewhat high (0.020%)
Recessive: Very high (0.523%), High (0.157%), Somewhat high (0.038%)
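Applying these thresholds amounts to a lookup on the 95% lower bound of a variant's allele frequency. The sketch below encodes the table values as fractions; the function name is hypothetical, and the use of a strict "greater than" comparison at each boundary is an assumption. The default classification it returns applies only in the absence of evidence supporting a pathogenic classification.

```python
# Thresholds from the table above, converted from percent to fractions
# and listed from highest to lowest tier.
THRESHOLDS = {
    "dominant": [("Very high", 0.00261, "Benign"),
                 ("High", 0.00052, "Likely Benign"),
                 ("Somewhat high", 0.00020, "VUS")],
    "recessive": [("Very high", 0.00523, "Benign"),
                  ("High", 0.00157, "Likely Benign"),
                  ("Somewhat high", 0.00038, "VUS")],
}


def frequency_tier(af_lower_bound: float, inheritance: str):
    """Return (tier, default classification) or None if below every threshold.

    af_lower_bound: 95%-confidence lower bound on allele frequency (a fraction).
    inheritance: "dominant" or "recessive".
    """
    for tier, threshold, default_class in THRESHOLDS[inheritance]:
        if af_lower_bound > threshold:  # strict ">" is an assumption
            return tier, default_class
    return None  # frequency alone does not argue for a benign classification


if __name__ == "__main__":
    # The same hypothetical lower bound (0.1%) lands in different tiers by inheritance.
    print(frequency_tier(0.0010, "dominant"))   # ('High', 'Likely Benign')
    print(frequency_tier(0.0010, "recessive"))  # ('Somewhat high', 'VUS')
```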
Sherloc is a logical framework for evaluating genetic evidence and combining this evidence into a consistent and reproducible classification of both genes and variants. Importantly, this framework is based on the guidelines for the Interpretation of Sequence Variants (ISV) published by a joint ACMG and AMP working group in 2015. Through an iterative, experience-driven process, we started with 33 ISV rules, refined them to provide greater clarity, and introduced more than 100 additional rules to capture a wider variety of the edge cases and exceptions routinely encountered during clinical genetic testing. The outcome of these changes is a more reliable, scalable, and efficient variant interpretation process. If you would like to hear more about the specific improvements we’ve made and how Sherloc compares to the ACMG-AMP ISV guidelines, please see the invited commentary in Genetics in Medicine or watch our Sherloc webinar.
For a more detailed description of the Sherloc framework and evidence criteria used to interpret sequence variants see the manuscript published in Genetics in Medicine.
For diagnostic CFTR testing, variants in the polymorphic TG/T tract are analyzed and interpreted, and are reported only if classified as “Likely pathogenic” or “Pathogenic.” The 5T allele is classified as Pathogenic and would be included, if present. The benign polymorphisms 7T and 9T, as well as uncertain variants, are not included in the primary report but are available upon request.
For carrier screening, a 5T variant is reported when it is present together with 11TG, 12TG, or 13TG. We do not report the presence of 5T when it occurs with any other TG tract length (e.g., 10TG). A 5T variant is always associated with a specific number of TG repeats on the same allele: some TG repeat numbers (11, 12, 13) are known to increase the disease risk of 5T, to different degrees, while others (10) are not thought to be pathogenic.
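The two reporting rules above can be combined into one check. This is a minimal sketch with hypothetical names; it assumes, as the text implies, that 5T is the only poly-T allele classified as Pathogenic, so every other poly-T length is left off the primary report.

```python
from typing import Optional

# TG repeat lengths with which a 5T allele is reported on carrier screening.
CARRIER_REPORTABLE_TG = {11, 12, 13}


def report_poly_t(test_type: str, poly_t: int, tg_repeats: Optional[int] = None) -> bool:
    """Decide whether a CFTR poly-T allele appears on the primary report.

    test_type: "diagnostic" or "carrier"; poly_t: number of T residues (5, 7, 9, ...);
    tg_repeats: number of TG repeats on the same allele as the poly-T tract.
    """
    if poly_t != 5:
        # 7T and 9T are benign polymorphisms: available on request, not on the primary report.
        return False
    if test_type == "diagnostic":
        return True  # 5T is classified as Pathogenic
    if test_type == "carrier":
        return tg_repeats in CARRIER_REPORTABLE_TG
    raise ValueError(f"unknown test type: {test_type!r}")


if __name__ == "__main__":
    print(report_poly_t("carrier", 5, 12))  # True
    print(report_poly_t("carrier", 5, 10))  # False
```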
For panel testing, Invitae performs full-gene sequencing and deletion/duplication analysis using next-generation sequencing technology. For more information about our assay, please visit our Assay page.
For exome testing, Invitae uses the most advanced next-generation sequencing capture technology, rigorous bioinformatics, and detailed phenotypic and clinical information to yield the most accurate interpretation for your patient. For more information about our exome methodology, please visit the Exome webpage.
Yes, Invitae’s panel tests detect deletion/duplication events encompassing a single exon or more. In some cases, specific genes and exons are excluded from analysis. Please consult our Test Catalog for details.
Our copy number detection algorithm can also identify large deletion/duplication events that include and extend beyond a targeted gene, although the boundaries of those events cannot be determined beyond the gene itself. However, Invitae's testing does not provide cytogenetic analysis for large chromosomal anomalies or copy-neutral changes, such as reciprocal translocations, uniparental disomies, or inversions (one exception: we offer detection of the MSH2 exon 1-7 inversion).
Note: In August 2015 we started offering del/dup analysis for the full PMS2 gene. For more information about our methods, please visit our Validation studies page. (Deletion/duplication analysis is not guaranteed for gDNA samples.)
In contrast to Invitae's gene panel sequencing, where single-exon del/dups are detected, the greater variability in depth of coverage across an exome limits high-confidence detection to del/dups spanning 4 or more exons; smaller events may be detected and will be reported when sufficient resolution exists.
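The trade-off between panel and exome resolution can be illustrated with a toy caller that looks for runs of consecutive exons whose normalized read depth departs from the expected two copies, keeping only runs of at least a minimum length (1 exon for a panel-style setting, 4 for an exome-style setting). This is not Invitae's copy number algorithm, which is not described here; the depth ratios, the 0.7/1.3 cutoffs and the function names are hypothetical. Ideally a heterozygous deletion gives a ratio near 0.5 and a single-copy duplication a ratio near 1.5. Shorter runs are simply dropped in this sketch, whereas the text notes they may still be reported when resolution allows.

```python
from itertools import groupby


def exon_state(ratio: float, del_cutoff: float, dup_cutoff: float) -> str:
    """Classify one exon's depth ratio (observed / expected coverage)."""
    if ratio < del_cutoff:
        return "DEL"
    if ratio > dup_cutoff:
        return "DUP"
    return "NORMAL"


def call_cnv_runs(ratios: list[float], min_exons: int,
                  del_cutoff: float = 0.7, dup_cutoff: float = 1.3) -> list[tuple[int, int, str]]:
    """Return (first_exon, last_exon, type) for qualifying runs (0-based exon indices)."""
    calls = []
    start = 0
    for state, group in groupby(ratios, key=lambda r: exon_state(r, del_cutoff, dup_cutoff)):
        length = sum(1 for _ in group)
        # Keep only abnormal runs long enough to trust at this coverage quality.
        if state != "NORMAL" and length >= min_exons:
            calls.append((start, start + length - 1, state))
        start += length
    return calls


if __name__ == "__main__":
    depth_ratios = [1.0, 0.5, 1.0, 1.0, 0.52, 0.48, 0.5, 0.49, 1.02]
    print(call_cnv_runs(depth_ratios, min_exons=1))  # [(1, 1, 'DEL'), (4, 7, 'DEL')]
    print(call_cnv_runs(depth_ratios, min_exons=4))  # [(4, 7, 'DEL')]
```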