NGS-DA: NGS & NGS Data Analysis
RNA Atlas: A Nucleotide Resolution Map Of The Human Transcriptome
1Ghent University, Belgium; 2Baylor College of Medicine, US; 3Aarhus University, Denmark; 4Illumina, US
Technological advances in RNA expression profiling revealed that the human genome is pervasively transcribed, generating an unexpectedly complex transcriptome consisting of various classes of RNA molecules and a huge isoform diversity. Many of these RNAs show high tissue specificity, with some being expressed in only one or a few cell types. While numerous large-scale RNA-sequencing studies have been performed, the samples involved are often complex tissues, masking transcripts expressed in low-frequency cell populations, and sequencing methods typically focus on one class of RNA transcripts.
We assembled the most comprehensive human transcriptome across an extensive cohort of human samples, consisting of 160 different normal cell types, 45 tissues and 93 cancer cell lines. For each sample, total RNA, poly-A RNA and small RNA libraries were generated and sequenced using Illumina technology, yielding a total of 65 billion reads. Transcriptome assemblies for mRNAs, lncRNAs, miRNAs and circRNAs were matched with chromatin state maps from the Roadmap Epigenomics Consortium to define stringent gene models for each RNA biotype. Count data from poly-A and total RNA sequencing libraries were combined to reveal the polyadenylation status of each transcript in each sample. We identified a total of 50235 gene loci, of which 19668 were novel. From these loci, 37140 circRNAs were expressed. While a small fraction of novel genes was predicted to have coding potential, the majority of novel genes were non-coding, single exonic, and highly enriched for non-polyadenylated transcripts. Interestingly, a subset of genes showed variable polyadenylation status across samples, mainly driven by alternative isoform usage. Biological information content of each RNA biotype was assessed by evaluating associations between RNA expression and sample ontology, and by complex tissue deconvolution. Furthermore, we exploited the availability of intron reads from the total RNA sequencing data to assess the regulatory potential of miRNAs, lncRNAs and circRNAs at the transcriptional and post-transcriptional level. Taken together, the RNA atlas serves as a unique resource for further studies on the function, organization and regulation of the different layers of the human transcriptome.
Getting More out of RNA-Seq Data: Transcriptomic Analysis of Ischemic Stroke
1Institute of Biotechnology CAS, Czech Republic; 2Institute of Experimental Medicine CAS, Czech Republic; 3Faculty of Science, Palacky University Olomouc, Czech Republic
In recent years, RNA sequencing has become a standard method for genome-wide transcriptional analysis. Despite the extensive informational content of RNA-Seq data, many studies limit their scope to differentially expressed genes and/or pathway enrichment analysis, leaving a substantial part of the information unexplored. Here, we present a comprehensive transcriptomic analysis of ischemic stroke in young and aged animals. We assessed differential gene expression across injury status and age, and performed detailed pathway analysis and unsupervised co-expression analysis, identifying modules of genes associated with various responses to injury. We complemented these results with estimates of cell-type proportion changes obtained by computational deconvolution, and compared our results with findings from previous studies of similar design and with publicly available databases. By employing these simple, yet often underutilized, analytical approaches we found disease signatures consistent with the literature and extended these results with new findings. We show strikingly variable responses of different cell types and specific cellular pathways between young and aged ischemic animals, particularly in relation to the immune response. Together, these results paint a picture of ischemic stroke as a complex age-related disease and provide insights into the interaction of age and stroke at the cellular and molecular level.
Liquid Biopsies For Personalized Medicine: The Omiterc Project
University of Florence, Italy
OMITERC is a data-sharing project sponsored by Regione Toscana, Italy, that aims to develop an electronic registry aggregating and linking cancer genomic and pharmacogenetic-pharmacogenomic data with clinical outcomes from wild-type BRAF metastatic melanomas and RAS-mutated metastatic colorectal cancers. The project aims to aggregate, harmonize and share clinical and molecular data obtained during routine medical practice. To reach this goal, a comprehensive database is being implemented that includes all the clinical and molecular data deriving from the analysis of the primary tumor and the liquid biopsy in the case study.
Twenty RAS-mutated metastatic colorectal cancer patients and eleven BRAF-wild-type metastatic melanoma patients were enrolled in the study and underwent serial blood sampling before therapy and at different time intervals during follow-up.
We present the data deriving from the analysis of the liquid biopsy before and during therapeutic treatment in metastatic colorectal cancer patients: circulating tumor cell (CTC) detection and counting by CellSearch and mutational analysis by targeted NGS of cell-free DNA (cfDNA) and single CTCs (in a subset of cases).
CTCs were detected in 7 patients at baseline and, except in 4 subjects, were not found in subsequent blood draws during follow-up. Overall, the presence of CTCs had prognostic significance and correlated with the efficacy of treatment.
KRAS mutational status in cfDNA from colon cancer patients at baseline was concordant with that of the primary tissue in 90% of cases.
The longitudinal study of cfDNA allowed dynamic monitoring of the disease by assessing the presence of specific tumor-related mutations and evaluating their allelic frequency over time.
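The allelic-frequency tracking described above can be illustrated with a minimal sketch. The read counts, time-point labels and function names are hypothetical and are not taken from the study; the sketch only shows the arithmetic of a variant allele frequency (variant-supporting reads divided by total reads at the locus) followed across serial blood draws.

```python
from typing import List, Tuple


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a locus that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def vaf_trajectory(draws: List[Tuple[str, int, int]]) -> List[Tuple[str, float]]:
    """Map (label, alt_reads, total_reads) per blood draw to (label, VAF)."""
    return [
        (label, variant_allele_frequency(alt, total))
        for label, alt, total in draws
    ]


# Hypothetical counts for one mutation across three serial draws.
example_draws = [
    ("baseline", 120, 2000),
    ("follow-up 1", 30, 2500),
    ("follow-up 2", 5, 2000),
]
trajectory = vaf_trajectory(example_draws)
for label, vaf in trajectory:
    print(f"{label}: {vaf:.2%}")
```

In practice the counts would come from the variant caller's output for each cfDNA sample, and a limit of detection would have to be applied before interpreting very low frequencies.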
The results indicate that it is important to study both CTCs and cfDNA, since they represent two different aspects of the liquid biopsy that can be integrated into a non-invasive approach to cancer patients.
Tiled Amplicons Panels in NGS-Based Genetic Testing
ecSeq Bioinformatics GmbH, Germany
Next-generation sequencing increasingly replaces traditional Sanger sequencing for routine genetic testing applications. The higher throughput allows higher sensitivities for detecting low-frequency DNA mutations. However, more sequence reads do not automatically lead to a higher sensitivity and accuracy. Technical limits grounded in NGS technology and the library preparation, such as PCR duplicates, low complexity and false positives, need to be addressed. This leads to an increase in the diversity and complexity of available commercial NGS sample preparation kits.
Recent kits targeting low-frequency variant detection (for oncology) combine approaches such as unique molecular identifiers, single primer extension and tiled amplicon designs to address these issues. We show how these approaches work and what their practical consequences are for the observed sequence reads and for the downstream bioinformatics analysis.
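As an illustration of why unique molecular identifiers (UMIs) change the downstream analysis, the following is a minimal sketch of UMI-based read collapsing. It is not the procedure of any particular kit or tool: the read representation `(umi, start_position, sequence)`, the exact-match grouping and the per-column majority vote are simplifying assumptions. With amplicon designs, reads from the same amplicon share their start coordinate, so position alone cannot separate PCR duplicates from independent molecules; the UMI supplies that information.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, int, str]  # (umi, start_position, sequence); hypothetical layout


def collapse_by_umi(reads: List[Read]) -> Dict[Tuple[str, int], str]:
    """Group reads by (UMI, start) and return a majority-vote consensus per group."""
    families: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for umi, start, seq in reads:
        families[(umi, start)].append(seq)

    consensus: Dict[Tuple[str, int], str] = {}
    for key, seqs in families.items():
        length = min(len(s) for s in seqs)
        # For each column, keep the base seen in most reads of the family.
        consensus[key] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Hypothetical reads: three PCR copies of one molecule (one copy carries an
# error at the third base) and a single read from a second molecule.
example_reads = [
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACTT"),
    ("TTGA", 100, "ACTT"),
]
molecules = collapse_by_umi(example_reads)
```

Here four reads collapse to two molecules. The mismatch seen in one of three copies of the first family is voted out as a likely PCR or sequencing error, whereas the same base in the second family is retained because it is the only evidence for that molecule. Real pipelines additionally tolerate errors within the UMI itself and usually require a minimum family size.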
PCR Based Target Enrichment For Variant Confirmation, Gene Panels And Multiplex PCR Sample Tracking In A Whole Exome Sequencing Workflow
1Center for Medical Genetics Ghent, Ghent University Hospital, Ghent, Belgium; 2pxlence bvba, Dendermonde, Belgium; 3Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
Targeted PCR-based resequencing is an important application in clinical diagnostics. Using our best-in-class primer design tool primerXL, we have designed almost one million PCR assays for both fresh frozen and formalin-fixed paraffin-embedded samples, covering the entire human exome. In total, over 6200 assays for hundreds of clinically relevant genes were wet-lab validated. In addition, over 5000 patient-specific variants from exome sequencing were confirmed using pxlence PCR assays. All singleplex PCR assays work under universal PCR conditions and result in equimolar sequencing coverage. As the latest addition, we present the compatibility of pxlence assays with multiplex PCR applications. As a first product, we designed and validated a cost-effective and flexible sample tracking test. This primer pool enables fast identification of sample swapping or contamination, which may occur in laborious library preparation workflows.
Thirty SNPs were selected based on their minor allele frequency, exonic location and overlap with the capture region of exome enrichment kits. We evaluated three different mastermixes for multiplex PCR and two library preparation methods, followed by 150 bp paired-end sequencing on a MiSeq instrument (Illumina).
The SsoAdvanced PreAmp Supermix (Bio-Rad) resulted in superior homogeneous coverage following multiplex PCR of all SNP assays (pxlence). No significant difference in coverage uniformity was observed between the Nextera DNA Flex and the Nextera XT DNA library prep methods (both Illumina). In virtually all tested DNA samples (n=393), 86.29% of the SNPs had a uniform coverage within 2-fold of the mean. Based on the SNP genotypes, DNA samples could be discriminated unambiguously.
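The uniformity figure quoted above can be computed along the following lines. This is a minimal sketch under one stated assumption: "within 2-fold of the mean" is read as a per-assay coverage between half and twice the mean coverage of the sample. The assay names and depths are hypothetical.

```python
from statistics import mean
from typing import Dict


def fraction_within_fold(coverage: Dict[str, float], fold: float = 2.0) -> float:
    """Fraction of assays whose coverage lies within `fold` of the sample mean."""
    if not coverage:
        raise ValueError("coverage must not be empty")
    avg = mean(coverage.values())
    lower, upper = avg / fold, avg * fold
    within = sum(1 for depth in coverage.values() if lower <= depth <= upper)
    return within / len(coverage)


# Hypothetical per-SNP read depths for one sample (mean depth is 1000).
example_coverage = {
    "snp01": 900,
    "snp02": 1100,
    "snp03": 1000,
    "snp04": 300,   # below half the mean, counted as non-uniform
    "snp05": 1700,
}
uniformity = fraction_within_fold(example_coverage)
```

For this toy sample four of five assays fall inside the 500 to 2000 read window. Applying the same function per sample and summarising across samples gives a cohort-level uniformity estimate.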
In conclusion, pxlence provides high-quality and versatile PCR assays for various targeted resequencing applications. Here, we designed and validated a novel sample tracking test for whole exome or whole genome sequencing, involving a straightforward single multiplex PCR reaction followed by DNA sequencing library prep. In principle, our strategy could also be used to design gene panel-specific sample tracking solutions.
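How SNP genotypes from such a tracking test can flag a sample swap is sketched below. The comparison logic, the data layout (SNP identifier mapped to a two-letter genotype call) and all identifiers are assumptions made for illustration and do not describe the authors' software: the genotype profile from the tracking PCR is compared with the profile called from each exome, and the best-matching exome should be the one carrying the same sample label.

```python
from typing import Dict, Optional

Genotypes = Dict[str, str]  # SNP id -> genotype call such as "AA", "AG", "GG"


def concordance(a: Genotypes, b: Genotypes) -> Optional[float]:
    """Fraction of SNPs called in both profiles that carry the same genotype."""
    shared = [snp for snp in a if snp in b]
    if not shared:
        return None
    # sorted() makes the comparison independent of allele order ("AG" == "GA").
    matches = sum(1 for snp in shared if sorted(a[snp]) == sorted(b[snp]))
    return matches / len(shared)


def best_match(query: Genotypes, references: Dict[str, Genotypes]) -> Optional[str]:
    """Name of the reference profile most concordant with the query."""
    scored = {name: concordance(query, ref) for name, ref in references.items()}
    scored = {name: score for name, score in scored.items() if score is not None}
    return max(scored, key=scored.get) if scored else None


# Hypothetical profiles: one tracking-test result and two exome call sets.
tracking_profile = {"snp1": "AG", "snp2": "CC", "snp3": "TT"}
exome_profiles = {
    "sample_A": {"snp1": "GA", "snp2": "CC", "snp3": "TT"},
    "sample_B": {"snp1": "AA", "snp2": "CT", "snp3": "TT"},
}
matched_sample = best_match(tracking_profile, exome_profiles)
```

If the best match differs from the label under which the tracking DNA was registered, or if concordance with the expected exome is clearly below 1, a swap or contamination should be suspected. With thirty informative SNPs, as in the panel described, chance matches between unrelated individuals become very unlikely.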
Handling of Spurious Molecular Species Dictates the Outcome of High-throughput 16S rRNA Gene Amplicon Profiling
ZIEL Institute for Food and Health, Core Facility Microbiome/NGS, Technical University of Munich, Freising, Germany
16S rRNA gene sequencing has become a popular method for rapid and comprehensive analysis of the diversity and composition of complex microbial communities. However, this method is prone to technical artefacts at various levels of the workflow. The most common way to analyse 16S amplicon data is to cluster sequences at 97% sequence identity into operational taxonomic units (OTUs), each representing a single microbial entity. Diversity measures derived from OTU-based datasets are strongly influenced by parameter settings such as the filtering of spurious OTUs, which in turn affects interpretation, reproducibility and quality. This study aims to bring clarity about filtering thresholds that can be used to exclude spurious OTUs from high-throughput 16S rRNA amplicon datasets.
To determine an appropriate filtering cutoff, two types of mock communities were used: ten different communities from published studies and two in-house generated datasets. This was complemented by the analysis of fecal samples from four gnotobiotic mice colonized with different mixtures of bacteria. To analyse the impact of filtering, two studies with openly accessible sequence datasets were used as references.
When data are filtered with the commonly used method of removing singletons, on average 71% of all OTUs are artefacts. A filtering cutoff of 0.25% reduces this number to 1.17% in mock communities and 3.57% in gnotobiotic mice, while still capturing 85% of true positives. Even with a low cumulative abundance of 1.14%, these artefacts appear in the dataset and are treated as sequences when building the phylogenetic tree. Richness in particular is influenced by the absolute number of OTUs and shows different results in both reference studies (0.25% = 195 ± 78 and 156 ± 44; no singletons = 364 ± 140 and 531 ± 201). Intra-individual stability of the microbiome depends on the filtering method used, as does the stability of richness, which is less dynamic when filtering with the 0.25% cutoff (p-value < 0.001). A shift in median unweighted UniFrac distance of 0.36 per individual can be observed, which suggests a more stable microbiome. It is worth noting that the outcome in the first study is inverse for generalized UniFrac distance. The second study shows the same pattern for both methods and distances, even though there is a difference in distances for unweighted UniFrac. This affects the interpretation of the stability of the human gut microbiome.
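The two filtering strategies compared above can be contrasted with a minimal sketch. Two points are assumptions rather than details given in the abstract: singletons are taken to be OTUs supported by a single read across the whole dataset, and the 0.25% cutoff is applied as "keep an OTU if it reaches that relative abundance in at least one sample". The table layout and counts are hypothetical.

```python
from typing import Dict, List

OtuTable = Dict[str, List[int]]  # OTU id -> read counts per sample


def remove_singletons(table: OtuTable) -> OtuTable:
    """Drop OTUs supported by a single read across all samples."""
    return {otu: counts for otu, counts in table.items() if sum(counts) > 1}


def relative_abundance_filter(table: OtuTable, cutoff: float = 0.0025) -> OtuTable:
    """Keep OTUs reaching `cutoff` relative abundance in at least one sample."""
    n_samples = len(next(iter(table.values()))) if table else 0
    totals = [sum(counts[i] for counts in table.values()) for i in range(n_samples)]
    kept: OtuTable = {}
    for otu, counts in table.items():
        if any(t > 0 and c / t >= cutoff for c, t in zip(counts, totals)):
            kept[otu] = counts
    return kept


# Hypothetical table with two samples of 10,000 reads each.
example_table = {
    "otu1": [6000, 5000],
    "otu2": [3980, 4990],
    "otu3": [19, 8],   # below 0.25% (25 reads) in both samples
    "otu4": [1, 0],    # singleton
    "otu5": [0, 2],    # not a singleton, but far below the cutoff
}
richness_no_singletons = len(remove_singletons(example_table))
richness_cutoff = len(relative_abundance_filter(example_table))
```

In this toy table singleton removal leaves four OTUs while the proportional cutoff leaves two, which mirrors the direction of the richness difference reported for the reference studies. Because the cutoff is relative to each sample's read total, it does not depend on sequencing depth in the way an absolute count threshold does.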
With this work we would like to raise awareness of how the outcome of 16S rRNA gene sequencing data is interpreted. Since there is no standardisation, it is important to know the methods behind the analysis and to be aware of the possible impact of different filtering approaches and of the distance matrices used. It is important not only to interpret results carefully but also to obtain high-quality results. Using a proportional cutoff is an independent filtering method to remove spurious OTUs in microbial datasets.
QC Measurements for Predicting Downstream NGS Success with FFPE and Circulating Cell-Free DNA Plasma Samples
Promega Corporation, 2800 Woods Hollow Rd. Madison, WI
The quantity and quality of DNA from formalin-fixed, paraffin-embedded (FFPE) tumor tissue samples are highly variable, with degradation and crosslinking due to the fixation process leading to issues with amplification and difficulty in NGS analysis. An alternative to FFPE is circulating cell-free DNA (ccfDNA) from plasma or other biological fluids. Compared to gDNA, ccfDNA yields are typically low, with tumor-derived DNA present at significantly lower frequencies. Due to the inherent variability of FFPE and ccfDNA, knowing the quantity of DNA is not in itself reliably predictive of downstream NGS success. In this presentation, we describe novel methods for predicting sequencing result quality utilizing a multiplexed qPCR assay.