Next-Generation Sequencing (NGS)

Variant calling as a diagnostic tool for complex cases

Over the past few decades, remarkable progress has been made in the field of genetic research. One of the biggest breakthrough technologies has been next-generation sequencing, which has greatly improved diagnostic performance for Mendelian diseases. Nowadays, it is possible to perform complete genomic and transcriptomic testing in under 24 hours, allowing the detection of abnormalities present in a patient's DNA. Multiple methods have been developed, each carrying its own set of advantages and limitations:

Whole-Exome Sequencing (WES): Targets coding regions of the DNA for a faster and more affordable evaluation of the patient's potential causative mutations.

Whole-Genome Sequencing (WGS): Allows for a complete genomic overview of the patient, including non-coding regions, which represent around 98% of the human genome and are thought to be largely implicated in gene regulation.

RNA-sequencing (RNA-seq): Captures the gene expression profile in a tissue-specific manner, providing functional insight into the effect of detected coding variants.

While these approaches are fast and high-throughput, the enormous amount of data they generate requires efficient bioinformatics tools for analysis, and these tools have been evolving steadily to accomplish the task. One of the crucial steps of data processing is variant calling: while some algorithms have established themselves as gold standards in the field, such as GATK's HaplotypeCaller for Single Nucleotide Variants, other types of mutations are not always as easy to detect. Many tools were developed specifically for those detections, whether for Structural Variants (Lumpy; Delly; CNVkit...), Alternative Splicing (DEXSeq; rMATS; SpliceAI...) or even Repeat Expansions (TRF; ExpansionHunter; STRetch...). Hence, it is important to assess which tools are best suited for a given analysis. Additionally, different sequencing approaches generally require different processing due to the nature of the data format. For these reasons, our lab's bioinformatic pipelines are optimized for each project in order to accurately detect and understand the nature of a patient's complex genetic pathology.

Muscle Project & Tools for VUS investigation

Myopathies and muscular dystrophies are a heterogeneous group of progressive genetic neuromuscular diseases that generally arise from structural, metabolic or ion-channel dysfunction. Gradual degeneration of the muscle fibers prompts variable clinical symptoms that interfere with daily activities, such as muscle weakness, cramps, stiffness, postural instability or spasms. Owing partially to the large heterogeneity of these disorders, obtaining a molecular diagnosis for a patient who does not carry a common pathogenic variant is relatively complex. As such, at least 25% of myopathy cases remain without an official genetic cause following clinical genetic testing.

While the effect of "Variants of Unknown Significance" (VUS) is very hard to assess once they have been identified by DNA sequencing, functional information such as changes in mRNA expression or splicing profiles can be used to better interpret these candidate VUS. Therefore, there is great potential for RNA-sequencing of complex cases. For this reason, muscle biopsies were obtained from a cohort of 17 patients with rare undiagnosed myopathies in order to perform transcriptomic sequencing from a disease-relevant tissue, with the aim of identifying novel pathogenic variants and improving the diagnostic yield of myopathies.

Although we aim to identify the genetic cause for all patients, one of our main research focuses has recently been a very interesting homozygous stop-gain variant in MLIP, a gene with very little literature. Indeed, little is known about MLIP functions and molecular interactions, but the nature of the variant, combined with decreased mRNA levels and a seemingly abnormal exon-junction profile, led to further functional validation of the candidate. Evidence gathered from Long-Read Sequencing and CRISPR-Cas9 experiments enables the genotype-phenotype association of the dysfunctional gene, and should enhance the genetic knowledge for this specific type of myopathy.

Genetically representative models of patients using C. elegans are yet another powerful tool we use for the interpretation of VUS. Despite significant evolutionary distance, these small worms share many homologous genes with humans, as well as a structurally, compositionally, and functionally conserved sarcomere. Congenital myopathies are most commonly caused by variants in RYR1, one of the largest known genes. The sheer size of this gene makes it especially prone to polymorphisms, of which many have been identified but few classified. Using two CRISPR/Cas9 gene-edited C. elegans models, we have discovered strong evidence for the pathogenicity of two previously unclassified RYR1 variants. One such variant, a de novo mutation, was observed in a mother and her son who both demonstrate symptoms of a static myopathy. The other RYR1 variant was discovered in a cohort of 21 individuals who, despite being carriers of the same novel heterozygous missense mutation, demonstrate extremely diverse muscle phenotypes. While both cohorts are one step closer to a genetic diagnosis, with further study these data will also provide insight into the mechanisms of clinical heterogeneity.

Finally, in another cohort composed of three siblings with a mild congenital muscular dystrophy and unaffected parents, we discovered a muscle-specific allelic imbalance in the gene IARS. This imbalance results in a loss of the paternal allele and limited expression of the mutant allele carrying the missense variant (maternal allele). Knock-down of the IARS homologue in C. elegans did not produce an abnormal phenotype; therefore, we are pursuing an IARS knock-out myoblast cell model to further our investigations.
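As an illustration of how an allelic imbalance can be quantified from RNA-seq, the sketch below applies an exact two-sided binomial test to the read counts supporting each allele at a heterozygous site. It is a minimal example under stated assumptions, not the analysis used in this project: the function names, the read counts and the 0.5 expected fraction are all hypothetical, and a real analysis would also need to account for mapping bias and multiple testing.

```python
from math import exp, lgamma, log


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous site that carry the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def allelic_imbalance_p(ref_reads: int, alt_reads: int,
                        expected_ref_fraction: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the observed allele counts.

    Null hypothesis (an assumption of this sketch): each read comes from the
    reference allele with probability `expected_ref_fraction`.
    """
    if not 0.0 < expected_ref_fraction < 1.0:
        raise ValueError("expected_ref_fraction must be strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    p = expected_ref_fraction

    def pmf(k: int) -> float:
        # Binomial probability computed in log space to stay stable at high depth.
        log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))

    observed = pmf(ref_reads)
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(q for q in (pmf(k) for k in range(n + 1))
               if q <= observed * (1.0 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical counts: 10 reference reads, 0 alternate reads.
    print(alt_allele_fraction(10, 0), allelic_imbalance_p(10, 0))
```

A small p-value only indicates that the two alleles are unlikely to be equally represented among the reads; deciding which parental allele is lost still requires phasing information.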

Ataxia Project & Combination of sequencing technologies

The clinical consequences of neuronal degeneration vary greatly depending on the structure affected. As the cerebellum is known for its major role in motor control, in addition to participating in many cognitive functions such as language and working memory, cerebellar degeneration generally leads to ataxia. While many subsets of the condition have been described (Spinocerebellar Ataxias (SCA); Autosomal Recessive Cerebellar Ataxias (ARCA); Friedreich Ataxia (FRDA) ...), our laboratory has been focusing on Episodic Ataxias (EA).

Characterized by sporadic loss of voluntary movement coordination, EA typically manifests with a late onset as well as high clinical and genetic heterogeneity, setting additional hurdles to diagnosis. While four genes have been linked to the eight subtypes of EA, with KCNA1 and CACNA1A variants accounting for a majority of cases, many patients are left without a molecular diagnosis due to the limitations of individual DNA-sequencing methods.

Our team has conducted a pilot study to assess the diagnostic performance of combining the complete genomic overview offered by WGS with the functional insight provided by RNA-seq. Preliminary data suggest diagnostic yields similar to or greater than those of other approaches, and we hope future candidate validation will improve the EA literature as well as care management for the patients affected by the condition.

Combination of other omics such as Epigenomics

Parkinson Project

Parkinson’s disease (PD) is the second most common neurodegenerative disease. The long-term condition is known to cause mainly motor symptoms, such as tremor, rigidity, slowness of movement, and difficulty with walking, but cognitive symptoms also arise later in the course of the disease. Clinical heterogeneity has been reported, with about 70% of patients presenting a “typical” parkinsonian disorder while the others exhibit an “atypical” parkinsonism. The clinical diagnosis of these various subtypes is challenging, especially in the first five years after disease onset. In the islands of Guadeloupe and Martinique, however, two-thirds of patients present Caribbean atypical Parkinsonism (CAP), which has led to multiple hypotheses in which environmental factors (e.g. pesticides) would cause epigenetic alterations that would explain the abnormal ratio of atypical PD patients.

Since flaviviruses (Zika & Dengue) are endemic in this region, we are proposing a third hypothesis: the epigenetic alterations are secondary to flaviviral infections. This idea is supported by the fact that parkinsonism has been described as a neurological manifestation of Zika infection, and is well-aligned with the growing evidence of the important role played by the immune system in the pathophysiology of PD. Several studies have shown that viral infections can lead to transcriptomic and epigenomic alterations, with methylation profile changes being permanent and potentially impacting immune memory.

To assess this hypothesis, 120 patients diagnosed with either PD, CAP, or other types of atypical parkinsonism such as PSP or MSA have been recruited with age-matched controls for blood sampling. The combination of bulk RNA-sequencing with methyl-ATAC-sequencing should allow the capture of any transcriptomic or epigenomic profile changes. Integrative multi-omic data analysis should enable the definition of immune signatures that at least partially explain the clinical variability observed with Parkinson's disease.

The goal of this study is to identify clinically relevant biomarkers or molecular signatures that will contribute to enhancing the diagnosis of PD and atypical parkinsonisms at early or even pre-symptomatic stages. While treatment options are restricted, better care management should lead to an improved quality of life for patients receiving an early diagnosis.

Single-cell

Single-cell sequencing technologies represent a rapidly evolving approach with multiplying applications in all biological fields. Our laboratory focuses on harnessing the multimodal capabilities of the 10X platform to identify specific markers of Parkinson’s Disease (PD) and other Parkinsonian syndromes that could highlight disease-specific mechanisms and help in the diagnosis of these complex diseases. To that end, we rely on primary blood samples from patients and controls to compare immune compartments and gene expression within our cohort. We apply cutting-edge multiplexing approaches to increase the scalability of our study. Immune profiling of the effector populations also allows for correlation of TCR clonotypes with gene expression, for a deeper phenotyping of implicated cells. We hope to identify markers of mitochondrial-specific autoimmunity in PD by adding epitope-TCR experiments to complete our immune landscape characterization.

Usage of brand new technologies

Long-Read Sequencing with GridION

Sometimes described as Third-Generation Sequencing, Long-Read Sequencing (LRS) technologies (PacBio & Oxford Nanopore) are the most recent advances in the genomic field. The main advantage of this approach is that DNA or RNA molecules do not need to be sheared into small fragments prior to sequencing, which also works on unamplified material. Resulting reads are therefore more representative, bypassing the need for short-read stacking in order to obtain high-confidence alignment to the reference genome. While LRS currently offers less accurate base calling than its short-read counterpart, high coverage and constant improvement of related bioinformatic tools have enabled the precision of the approach to catch up to other NGS technologies.

One of the biggest strengths of LRS lies in the assembly of de novo tissue-specific transcript annotations, which points towards a considerable improvement of current knowledge regarding transcriptomic splicing. Indeed, the fact that exon junctions do not need to be predicted during the alignment stage allows for a more accurate quantification of the expression profile of genes, while also increasing the chance of catching alternative splicing events. Many tools, such as minimap2 and FLAIR, allow for a quick and efficient analysis of long-read data, and it would not be surprising to see the technology reach a more widespread audience in the near future.
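Once long reads have been aligned, the exon junctions spanned by each read can be read directly from its alignment record. The following sketch shows the idea for a single SAM-style record: the N operation in a CIGAR string marks a region skipped on the reference, which spliced aligners use to represent introns. The function name and the example CIGAR string are hypothetical, and the sketch is not a substitute for dedicated tools such as FLAIR.

```python
import re

# One CIGAR operation: a length followed by a single operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = frozenset("MDN=X")


def junctions_from_cigar(ref_start: int, cigar: str) -> list[tuple[int, int]]:
    """Return the introns spanned by one alignment as (start, end) pairs.

    Coordinates are 0-based and half-open on the reference; `ref_start` is the
    0-based position of the first aligned base (an assumption of this sketch,
    since SAM files store 1-based positions).
    """
    position = ref_start
    junctions = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            # A skipped reference region: record it as an intron.
            junctions.append((position, position + length))
        if op in REF_CONSUMING:
            position += length
    return junctions


if __name__ == "__main__":
    # Hypothetical read covering three exons separated by two introns.
    print(junctions_from_cigar(100, "50M200N30M2I20M1000N40M"))
```

Counting how many reads support each (start, end) pair gives a simple junction-usage table that can be compared between a patient and controls.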

Another attractive aspect of LRS is its capacity to detect large structural variants, which were previously hard to assess from short-read data. The technology therefore makes it possible to investigate those large events directly, for example by accurately quantifying repeat expansions, without requiring a lengthy Southern Blot.
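To make the repeat-expansion case concrete, the sketch below counts the longest uninterrupted run of a repeat motif within a single read that spans the whole repeat. It is a deliberately naive illustration: the function name, the motif and the toy read are hypothetical, and because it requires exact matches it will undercount in real long reads, where sequencing errors interrupt the repeat.

```python
import re


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of consecutive exact copies of `motif` in `read`."""
    if not motif:
        raise ValueError("motif must not be empty")
    # Match one or more back-to-back copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    best = 0
    for match in pattern.finditer(read.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


if __name__ == "__main__":
    # Toy read: 12 copies of GAA, an interruption, then 3 more copies.
    toy_read = "ACGT" + "GAA" * 12 + "TTC" + "GAA" * 3 + "ACG"
    print(longest_repeat_run(toy_read, "GAA"))
```

Dedicated repeat-expansion callers tolerate mismatches and interruptions, so this exact-match count is best read as a lower bound on the repeat size.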

Our laboratory has recently purchased a GridION from Oxford Nanopore, and is planning to set up efficient protocols in order to collaborate with many other researchers who may benefit from the use of this highly interesting technology.

CRISPR/Cas9 Cultures

In order to understand how genetic variants are capable of inducing clinical phenotypes, it is essential to study the molecular mechanisms associated with the affected genes. As such, disease modeling is highly suitable for the functional validation of a variant. The CRISPR-Cas9 system allows for precise gene editing and/or modulation of a candidate causal variant in a disease-relevant immortalized cell line, which better recapitulates the tissue-specific effects of the mutation. Not only is the technology highly valuable for gene characterisation, but it is also an excellent tool for therapy research, such as drug discovery screens.

The technology exploits the bacterial Cas9, an enzyme that is guided by a gRNA (CRISPR Sequence), allowing it to recognize and cleave specific double-stranded DNA. Because the targeting specificity is defined by simple RNA-DNA base pairing and a PAM sequence, the system is fairly easy to engineer, hence the versatility of applications it offers. The double-strand break is then repaired either by the error-prone Non-Homologous End Joining (NHEJ) or by the Homology Directed Repair (HDR) mechanism. NHEJ is very useful for target gene disruption in knock-out studies due to its relative simplicity. On the other hand, designing a custom ssDNA (or dsDNA) template enables the harnessing of HDR, leading to the insertion of a specific variant for precise gene editing & functional studies.
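The targeting rule described above (base pairing with the guide plus an adjacent PAM) can be sketched as a simple sequence search. The example below lists candidate target sites on the forward strand of a DNA sequence, assuming the commonly used Streptococcus pyogenes Cas9 with a 20-nucleotide protospacer followed by a 5'-NGG-3' PAM; the text above does not specify which Cas9 or PAM is used, so these values, the function name and the toy sequence are assumptions. The sketch ignores the reverse strand and off-target scoring, which a real guide-design workflow would need.

```python
import re


def find_forward_targets(sequence: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, protospacer, PAM) for every forward-strand NGG site.

    `start` is the 0-based index of the first protospacer base. A lookahead is
    used so that overlapping candidate sites are all reported.
    """
    seq = sequence.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence with a single candidate site.
    toy = "TTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
    for start, protospacer, pam in find_forward_targets(toy):
        print(start, protospacer, pam)
```

Running the same search on the reverse complement of the sequence would give the candidate sites on the opposite strand.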

In our lab, we've mostly focused on CRISPR-Cas9 editing of in vitro cell lines, performing gene knock-out as well as specific variant knock-in, but hope to expand the application to more complex models such as C. elegans or zebrafish. Our goal is to streamline the process of functionally validating candidate variants through models that are more relevant to the pathology than PBMCs, which are often the only available patient sample.