Molecular Medicine Israel

Germline structural variation globally impacts the cancer transcriptome including disease-relevant genes

Highlights

  • Combined RNA and germline SV analysis using the PCAWG consortium datasets
  • Several genes impacted by germline SVs could conceivably contribute to cancer
  • Several genes are essential in cell lines or involve patient survival associations

Summary

Germline variation and somatic alterations contribute to the molecular profile of cancers. We combine RNA with whole genome sequencing across 1,218 cancer patients to determine the extent to which germline structural variants (SVs) impact expression of nearby genes. For hundreds of genes, recurrent and common germline SV breakpoints within 100 kb associate with increased or decreased expression in tumors spanning various tissues of origin. A significant fraction of germline SV expression associations involves duplication of intergenic enhancers or 3′ UTR disruption. Genes altered by both somatic and germline SVs include ATRX and CEBPA. Genes essential in cancer cell lines include BARD1 and IRS2. Genes with both expression and germline SV breakpoint patterns associated with patient survival include GCLM. Our results capture a class of phenotypic variation at work in the disease setting, including genes with cancer roles. Specific germline SVs represent potential cancer risk variants for genetic testing, including those involving genes with targeting implications.

Introduction

Structural variation is a broad class of chromosomal variation that includes copy number variants (deletions and duplications), balanced rearrangements (e.g., inversions and translocations), and insertions (e.g., from mobile elements). In recent years, structural variants (SVs) have been associated with an increasing number of normal phenotypic variations, as well as common and rare human diseases.1,2 Recent technological advances, including higher resolution microarrays and massively parallel next-generation sequencing, have allowed for more accurate cataloging of structural variation across many thousands of individuals.3 Cancer is a multistep process involving mutations in multiple genes, to which germline mutations may contribute and provide a head start on the neoplastic process.4

Moderate- to high-penetrance germline variants in cancer predisposition genes underlie 5%–10% of all cancers, with SVs representing an uncommon cause of cancer susceptibility, underlying perhaps 1.5% of cancer cases by some estimates.5 SV-impacted genes in cancer would include cancer predisposition genes, DNA damage response genes, and somatic driver genes.5,6 The functional effects of germline SVs on gene expression in normal tissues or cell lines derived from normal tissues have been explored,7,8,9,10,11 whereby SVs can have large effect sizes on adjacent genes and are often constitutive across diverse tissues.7,10 At the same time, the impact of germline SVs on expression variation in human cancers would also be of interest.

Both germline variation and somatic alterations contribute to the molecular profile of cancers. Somatic structural variation exerts a strong influence on gene expression in cancer.12,13,14,15,16 Recently, the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium aggregated whole genome sequencing (WGS) data from 2,658 cancers across 38 tumor types involving 20 major tissues of origin to identify germline variants and somatically acquired mutations.6 In the PCAWG cohort, integration of somatic SVs with corresponding RNA data identified genes with altered expression associated with nearby SV breakpoints.14,16,17 Genes deregulated or disrupted in this way included many oncogenes, such as TERT, MDM2, CDK4, ERBB2, CD274, PDCD1LG2, BCL2, and IGF2; and tumor suppressor genes, such as PTEN, RB1, STK11, and TP53. In the PCAWG-led studies, common and rare germline variants were found to affect somatic mutation patterns, including SVs. Germline deletion SVs impacting cancer susceptibility genes were cataloged in the PCAWG cohort, e.g., involving BRCA1 and BRCA2. However, the above studies did not systematically explore the potential for germline SVs, including SVs with breakpoints outside of genes, to impact gene expression, as was explored for the somatic SVs.

In the present study, we combined patient germline SV data (taken from a normal blood sample) with tumor RNA sequencing (RNA-seq) data across the PCAWG cohort to systematically catalog gene-level associations with altered tumor expression in conjunction with nearby germline SV breakpoints. We took a gene-centric approach to the data, allowing us to focus on genes of interest that were recurrently altered in relation to SVs. The germline SV-expression associations identified cut across tumors from various tissues of origin. Most of the significant genes would not necessarily have specific roles in cancer, but would instead reflect germline variations. However, some genes had known roles in cancer, seemed to be targeted by somatic SVs, were essential in cancer cell lines, or had germline patterns associated with cancer patient survival.
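
The gene-centric step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the tuple layouts, the placeholder gene names (GENE_A, GENE_B), and all coordinates are hypothetical. The 100 kb window comes from the Summary; measuring it from the gene body (rather than, say, the transcription start site) is an assumption made here for simplicity.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # 100 kb window, taken from the Summary


def genes_near_breakpoints(genes, breakpoints, window=WINDOW_BP):
    """Map each gene to the set of patients who carry a germline SV
    breakpoint within `window` bp of the gene body.

    genes:       iterable of (name, chrom, start, end)   # hypothetical layout
    breakpoints: iterable of (patient, chrom, position)  # hypothetical layout
    """
    # Group breakpoints by chromosome so each gene only scans its own chromosome.
    by_chrom = defaultdict(list)
    for patient, chrom, pos in breakpoints:
        by_chrom[chrom].append((pos, patient))

    hits = {}
    for name, chrom, start, end in genes:
        lo, hi = start - window, end + window
        hits[name] = {p for pos, p in by_chrom[chrom] if lo <= pos <= hi}
    return hits


# Hypothetical toy data: placeholder genes and made-up coordinates.
genes = [
    ("GENE_A", "chr1", 500_000, 520_000),
    ("GENE_B", "chr2", 1_000_000, 1_050_000),
]
breakpoints = [
    ("P1", "chr1", 450_000),    # 50 kb upstream of GENE_A: inside the window
    ("P2", "chr1", 630_000),    # 110 kb downstream of GENE_A: outside
    ("P3", "chr2", 1_100_000),  # 50 kb downstream of GENE_B: inside
    ("P1", "chr2", 5_000_000),  # far from GENE_B: outside
]
hits = genes_near_breakpoints(genes, breakpoints)
```

With a per-gene set of carrier patients in hand, tumor expression of that gene could then be compared between carriers and non-carriers; the statistical model used for that comparison is not described in this section.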

Results

Germline structural variation patterns across cancer patients

To explore germline structural variation in cancer, we referred to the PCAWG dataset of germline and somatic SV calls representing 1,218 patients (Table S1). Based on the blood normal sample, a median of 1,163 germline SVs was identified per patient (with an SD of 650.4). On average, the numbers of germline SVs detected did not vary widely according to tumor tissue origin (Figure 1A). However, four patients had very high numbers of SVs detected in their blood sample, only a fraction of which constituted previously identified SVs. The four patients (one breast cancer, three uterine cancers) might conceivably have undergone clonal hematopoiesis,19 although the SVs unique to these patients did not contribute to the SV-expression associations described below (Table S2). Across patients, we observed no strong associations between germline SV numbers and patient age, tumor ploidy, or tumor cellularity (Figure S1A). In contrast with the somatic SVs, most germline SVs in the PCAWG dataset were deletions (78%), followed by inversions and duplications (12.5% and 9.3%, respectively) (Figure 1B). Of 1,427,378 total germline SVs in the PCAWG dataset, only 68 were inter-chromosomal translocations, whereas 24% of the 322,734 PCAWG somatic SVs were translocations. In addition, germline SVs were highly recurrent across patients in terms of breakpoint positions, with more than 93% of germline SV calls involving SVs with both breakpoints found for two or more patients, while this applied to only 0.1% of somatic SV calls (Figure 1C). For intra-chromosomal SVs, the median distance between breakpoints was much smaller for germline versus somatic SVs, 2.1 kb versus 175 kb, respectively (Figure 1D).
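
The recurrence figure above (the share of SV calls whose two breakpoints both appear in two or more patients) can be sketched as follows. This is a hypothetical illustration, not the PCAWG procedure: the function name and tuple layout are invented, and matching breakpoints by exact position is an assumption, since this section does not say whether any positional tolerance was applied.

```python
from collections import defaultdict


def recurrent_call_fraction(calls, min_patients=2):
    """Fraction of SV calls whose breakpoint pair is seen in at least
    `min_patients` distinct patients.

    calls: list of (patient, chrom1, pos1, chrom2, pos2)  # hypothetical layout
    Assumption: two calls are "the same SV" only if both breakpoints match exactly.
    """
    if not calls:
        return 0.0

    # Collect the distinct patients carrying each breakpoint pair.
    patients_by_sv = defaultdict(set)
    for patient, chrom1, pos1, chrom2, pos2 in calls:
        patients_by_sv[(chrom1, pos1, chrom2, pos2)].add(patient)

    # Count the calls (not the unique SVs) that belong to a recurrent pair.
    recurrent = sum(
        1
        for _patient, *sv in calls
        if len(patients_by_sv[tuple(sv)]) >= min_patients
    )
    return recurrent / len(calls)


# Hypothetical toy calls: one SV shared by three patients, one private SV.
toy_calls = [
    ("P1", "chr1", 100, "chr1", 2_200),
    ("P2", "chr1", 100, "chr1", 2_200),
    ("P3", "chr1", 100, "chr1", 2_200),
    ("P1", "chr3", 5_000, "chr3", 9_000),
]
fraction = recurrent_call_fraction(toy_calls)  # 3 of 4 calls are recurrent
```

Note that the denominator is the number of calls rather than the number of unique SVs, which matches the wording "93% of germline SV calls" in the text.
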
In addition, the numbers of somatic SVs detected per tumor varied widely across patients (Figure S1B), and an increased total number of somatic SVs was significantly associated with higher patient age (Spearman’s r = 0.13; p < 1E−5) and strongly associated with higher tumor ploidy (r = 0.37; p < 1E−40).
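
For readers unfamiliar with the statistic, Spearman's r as reported above is the Pearson correlation computed on ranks. A minimal pure-Python sketch follows; the helper names and the toy values are hypothetical and are not PCAWG data, and in practice a statistics library would normally be used (it would also supply the p value, which this sketch does not compute).

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: patient age vs. number of somatic SVs.
ages = [30, 45, 60, 75]
somatic_sv_counts = [12, 40, 25, 90]
r = spearman_r(ages, somatic_sv_counts)  # 0.8 for this toy example
```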
