Abstract
Determining whether the RNA isoforms from medically relevant genes have distinct functions could facilitate direct targeting of RNA isoforms for disease treatment. Here, as a step toward this goal for neurological diseases, we sequenced 12 postmortem, aged human frontal cortices (6 Alzheimer disease cases and 6 controls; 50% female) using one Oxford Nanopore PromethION flow cell per sample. We identified 1,917 medically relevant genes expressing multiple isoforms in the frontal cortex, of which 1,018 had multiple isoforms with different protein-coding sequences. Of these 1,018 genes, 57 are implicated in brain-related diseases including major depression, schizophrenia, Parkinson’s disease and Alzheimer disease. Our study also uncovered 53 new RNA isoforms in medically relevant genes, including several where the new isoform was one of the most highly expressed for that gene. We also report five mitochondrially encoded, spliced RNA isoforms. We found 99 differentially expressed RNA isoforms between cases with Alzheimer disease and controls.
Main
Human protein-coding genes average more than eight RNA isoforms, resulting in almost four distinct protein-coding sequences1,2. As a result of practical limitations in standard short-read sequencing technologies, researchers have historically been forced to collapse all isoforms into a single gene expression measurement, a major oversimplification of the underlying biology. Many isoforms from a single gene body appear to have distinct interactomes at the protein level3. Distinct functions for individual isoforms from a single gene body have already been demonstrated for a handful of genes4,5,6. Notably, isoforms can play entirely different, or even opposite, roles within a given cell; a classic example is the pair of well-studied BCL-X (BCL2L1) transcripts with opposite functions, where BCL-XL is anti-apoptotic and BCL-XS is pro-apoptotic6. Changes in the expression ratio between the BCL-X isoforms are implicated in cancer and are being studied as therapeutic targets7, demonstrating the importance of understanding individual RNA isoform function rather than treating them as a ‘single’ gene.
Knowing which tissues and cell types express each isoform is an important first step in understanding isoform function. The limitations of using short-read sequencing for studying differential RNA isoform expression/usage8,9 include relying on heuristics to assemble and quantify isoforms10,11,12. As a result of these limitations, detailed analysis of individual isoforms has been limited to highly studied genes. In principle, long reads can sequence entire isoforms directly12. However, the imperfections of long-read data13 still require some heuristics to estimate the expression of each isoform13,14. Recent long-read RNA sequencing (RNA-seq) studies used targeted approaches to uncover aberrant splicing events in sporadic Alzheimer disease (AD)15, dystrophinopathies16 and cancers17,18. Two other studies demonstrated that long-read sequencing can discover new RNA isoforms across several human tissues, including the brain19,20. Although both studies revealed important biology, including reporting new RNA isoforms, they had limited sequencing coverage (averaging <6 million aligned reads per sample). Read depth is essential to accurately quantify individual RNA isoforms, given that a total of >250,000 annotated RNA isoforms have been reported, as of July 2023 (ref. 2). In addition, neither of the studies focused on the medical relevance of using long-read RNA-seq. Although long-read sequencing does not resolve all challenges related to isoform sequencing (for example, those related to RNA degradation), our goal is to demonstrate the utility and importance of using long-read sequencing for both academic research and clinical diagnostics in the context of RNA isoforms (for example, reporting newly discovered RNA isoforms in medically relevant genes and variant interpretation in genes expressing multiple RNA isoforms).
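The depth argument above can be made concrete with a back-of-the-envelope calculation. The sketch below divides the aligned-read figure quoted for the earlier studies by the number of annotated isoforms. It assumes, unrealistically, that reads are spread evenly across all annotated isoforms; real expression is highly skewed, so most isoforms would receive far fewer reads than this average. The function name `mean_reads_per_isoform` is illustrative only.

```python
# Bounds quoted in the text: ">250,000" annotated isoforms and
# "<6 million" aligned reads per sample in the earlier studies.
ANNOTATED_ISOFORMS = 250_000
PRIOR_STUDY_READS = 6_000_000


def mean_reads_per_isoform(aligned_reads, n_isoforms=ANNOTATED_ISOFORMS):
    """Average reads per isoform under a (simplifying) uniform assumption."""
    return aligned_reads / n_isoforms


# Because the read count is an upper bound and the isoform count a lower
# bound, this value is itself an upper bound on the per-isoform average.
prior_average = mean_reads_per_isoform(PRIOR_STUDY_READS)
print(f"At most {prior_average:.0f} reads per annotated isoform on average")
```

Under these bounds, the earlier studies averaged fewer than 24 reads per annotated isoform, which leaves little signal for lowly expressed isoforms.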
In the present study, we demonstrate that RNA isoform quantification through deep long-read sequencing can be a step toward understanding the function of individual RNA isoforms, and provide insights into how they may impact human health and disease. Specifically, in addition to discovering new (that is, unannotated) RNA isoforms in known medically relevant genes, we also discovered new spliced mitochondria-encoded RNA isoforms and entirely new gene bodies in nuclear DNA, and demonstrated the complexity of RNA isoform diversity for medically relevant genes within a single tissue (human frontal cortex from patients with AD and controls). Last, we showed the potential of differential RNA isoform expression analysis to reveal disease-relevant transcriptomic signatures unavailable at the gene level (that is, when collapsing all isoforms into a single expression measurement). Summary data from the present study are readily explorable through a public web application to visualize individual RNA isoform expression in aged human frontal cortex tissue (https://ebbertlab.com/brain_rna_isoform_seq.html).
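A toy example may help show why a signature can be invisible at the gene level. The counts below are invented for illustration and are not data from the present study: a hypothetical gene has two isoforms whose expression swaps between groups, so the collapsed gene-level total is identical while the isoform fractions differ sharply. All names (`control`, `case`, `isoform_A`, `isoform_B`) are placeholders.

```python
# Hypothetical read counts for one gene with two isoforms.
# These numbers are made up for illustration; they are not study data.
control = {"isoform_A": 80, "isoform_B": 20}
case = {"isoform_A": 20, "isoform_B": 80}


def gene_level(counts):
    """Collapse all isoforms into a single gene expression measurement."""
    return sum(counts.values())


def isoform_fractions(counts):
    """Fraction of the gene's reads contributed by each isoform."""
    total = gene_level(counts)
    return {name: n / total for name, n in counts.items()}


# Gene-level totals are equal, so a gene-level test sees no difference...
print(gene_level(control), gene_level(case))
# ...but the isoform fractions reveal a complete switch in usage.
print(isoform_fractions(control), isoform_fractions(case))
```

An actual differential isoform expression analysis would use a statistical model with replicates and normalization; this sketch only illustrates the information lost by collapsing isoforms.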
Results
Methodological and results overview
Traditional RNA-seq studies relied on short-read sequencing approaches that excel at quantifying gene-level expression, but cannot accurately assemble and quantify a large proportion of RNA isoforms11,21 (Fig. 1a). Thus, we sequenced 12 postmortem, aged, dorsolateral prefrontal cortex (Brodmann area 9/46) brain samples individually from six patients with AD and six cognitively unimpaired controls (50% female; Fig. 1b). All samples had postmortem intervals <5 h and an RNA integrity score (RIN) ≥ 9.0; demographics, summary sequencing statistics and read length distributions are shown in Supplementary Table 1 and Supplementary Figs. 1–4. Poly(A)-enriched complementary DNA from each sample was sequenced using one PromethION flow cell. Sequencing yielded a median of 35.5 million aligned reads per sample after excluding reads lacking the primer on either end and those with a mapping quality <10 (Extended Data Fig. 1a). By excluding all reads missing primers, reads included in the present study should closely represent the RNA as it was at extraction…
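The read-filtering criteria described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Read` record and its field names are hypothetical, and a real workflow would derive primer status and mapping quality from the sequencing and alignment output rather than from hand-built objects. The only values taken from the text are the mapping quality threshold (reads with mapping quality <10 are excluded) and the requirement that a read carry the primer on both ends.

```python
from dataclasses import dataclass
from statistics import median

MIN_MAPQ = 10  # from the text: reads with mapping quality <10 are excluded


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (field names are assumptions)."""
    mapq: int
    has_start_primer: bool
    has_end_primer: bool


def passes_filters(read):
    """Keep a read only if both primers are present and MAPQ >= 10."""
    return read.has_start_primer and read.has_end_primer and read.mapq >= MIN_MAPQ


def count_retained(reads):
    """Number of reads in one sample that survive both filters."""
    return sum(1 for read in reads if passes_filters(read))


def median_retained(reads_by_sample):
    """Median retained-read count across samples."""
    return median(count_retained(reads) for reads in reads_by_sample.values())


# Toy input with invented values, for illustration only.
toy = {
    "sample_a": [Read(60, True, True), Read(5, True, True), Read(30, False, True)],
    "sample_b": [Read(10, True, True), Read(20, True, True)],
}
print(median_retained(toy))
```

In the toy input, `sample_a` retains one read (one fails the mapping quality threshold and one lacks a primer) and `sample_b` retains two, so the median across samples is 1.5.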