Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/4298
Title: A systematic comparison of integrated genomic platforms and bioinformatics pipelines for next generation sequencing in patients with rare neuromuscular disease
Authors: Alrohaif, Hadil E. D. M.
Issue Date: 2018
Publisher: Newcastle University
Abstract: Neuromuscular disorders are a group of genetically and phenotypically heterogeneous disorders and pose a challenge for molecular diagnosis. Next generation sequencing is increasingly used in research and clinical settings for diagnosis and disease gene discovery. Inconsistencies in bioinformatics pipelines, research and validation results suggest that bioinformatics tools for next generation sequencing are still problematic and that further research is needed. In addition, sequencing data, bioinformatics tools, clinical data and databases of current knowledge of the human genome need to be integrated in an effective workflow that facilitates diagnosis and novel gene discoveries. Furthermore, optimising and standardising analysis workflows for next generation sequencing allows data from different projects and research sites to be shared and validated. At the John Walton Muscular Dystrophy Research Centre, Newcastle University, three genomics platforms are used to analyse whole exome and whole genome sequencing data for patients with rare neuromuscular disease. These three platforms, namely the RD-Connect Genome-Phenome Analysis Platform (CNAG, Barcelona, academic), seqr (Broad Institute, Boston, academic) and the Clinical Sequence Analyser (CSA, WuXi NextCODE, commercial), use combinations of different bioinformatics tools and integrate different software applications and databases for variant annotation, filtering and prioritisation. Here, the aim was to compare the yield of genome sequencing with that of exome sequencing for patients with rare neuromuscular disorders and to assess the degree of agreement between the three genomic platforms and their respective bioinformatics pipelines. I also aimed to evaluate the value of using an integrated genomics platform in diagnosis and novel gene discovery in patients with rare neuromuscular disorders.
The analysis showed that whole genome sequencing offers more uniform coverage of coding regions in the genome and has the potential to detect additional coding variants in known neuromuscular disease genes that are missed by exome sequencing due to low coverage. Low coverage was associated with genomic features such as high GC-content and low sequence heterogeneity. The uniform coverage and sequencing methods used for whole genome sequencing may also lead to improved detection of InDels and copy number variants in this group of patients. Analysis of the bioinformatics pipelines at the three sites using patient WES and WGS data revealed that the highest agreement was between the RD-Connect and the CSA platforms (75%). However, using high-quality reference data revealed higher concordance rates (up to 91%). As for variant output from the three genomics platforms, the mean variant concordance for all three platforms was 37%, and the highest pairwise concordance rate was 66%, for seqr and RD-Connect. Looking at variant type, agreement in variant output was largely accounted for by single nucleotide variants, whereas InDel agreement was significantly low. Overall, the variant output of the three platforms showed very low agreement, which highlights variant annotation software and filtering algorithms as contributors to the discrepancy. Whole exome sequencing data from molecularly undiagnosed families with limb girdle muscle weakness were used on the seqr platform to assess the utility of an integrated genomics platform in diagnosis and disease gene discovery in patients with rare neuromuscular disorders. A genetic diagnosis was proposed for 65.6% of families. This included a number of proposed novel genetic associations in neuromuscular disorders, including the recently published MYMK gene and the FILIP1 gene, which is proposed as a strong candidate for syndromic congenital myopathy.
Once a genetic diagnosis for a rare disease is established, phenotype-genotype correlations can be drawn. A group of patients with genetically confirmed GNE myopathy from Kuwait was studied, and a description of the clinical, genetic and epidemiological aspects of the disease in the Kuwaiti Bedouin population is given. In conclusion, next generation sequencing undoubtedly continues to offer new insights into rare neuromuscular disorders. However, advances in bioinformatics need to match advances in sequencing technologies. Whole genome sequencing offers additional value over whole exome sequencing. Nevertheless, it remains costly and data interpretation is still problematic. A targeted approach to the analysis of whole genome sequencing data may be a more appropriate intermediate approach. Analysis pipelines require a standardised approach for development and validation. Moreover, bioinformatics algorithms remain an area for continued assessment and optimisation. This will maximise the benefit from research in next generation sequencing and enable data to be shared and compared. Lastly, integrated genomics platforms are an ideal interface between the researcher and all relevant genetic, phenotypic, population and bioinformatics prediction data, for diagnosis and novel gene discoveries in patients with rare neuromuscular disorders.
Description: MD Thesis
URI: http://theses.ncl.ac.uk:8080/jspui/handle/10443/4298
Appears in Collections: Institute of Genetic Medicine

Files in This Item:
File | Description | Size | Format
Alrohaif, H 2018.pdf | Thesis | 3.52 MB | Adobe PDF
dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF

