Molecular Medicine Israel

Topography of mutational signatures in human cancer

Highlights

  • Mutations imprinted by mutational signatures are affected by topographical genomic features
  • Mutational signatures with related etiologies are similarly affected by genomic topography
  • Periodicity and cancer-type-specific enrichments/depletions are observed for some signatures
  • Updated COSMIC database links 76 signatures in 40 cancer types with 516 topography features

Summary

The somatic mutations found in a cancer genome are imprinted by different mutational processes. Each process exhibits a characteristic mutational signature, which can be affected by the genome architecture. However, the interplay between mutational signatures and topographical genomic features has not been extensively explored. Here, we integrate mutations from 5,120 whole-genome-sequenced tumors from 40 cancer types with 516 topographical features from ENCODE to evaluate the effect of nucleosome occupancy, histone modifications, CTCF binding, replication timing, and transcription/replication strand asymmetries on the cancer-specific accumulation of mutations from distinct mutagenic processes. Most mutational signatures are affected by topographical features, with signatures of related etiologies being similarly affected. Certain signatures exhibit periodic behaviors or cancer-type-specific enrichments/depletions near topographical features, revealing further information about the processes that imprinted them. Our findings, disseminated via the COSMIC (Catalog of Somatic Mutations in Cancer) signatures database, provide a comprehensive online resource for exploring the interactions between mutational signatures and topographical features across human cancer.

Introduction

Cancer genomes are peppered with somatic mutations imprinted by the activities of different endogenous and exogenous mutational processes. Due to their intrinsic biophysical and biochemical properties, each mutational process engraves a characteristic pattern of somatic mutations, known as a mutational signature. Our previous analyses encompassing more than 5,000 whole-genome- and 20,000 whole-exome-sequenced human cancers have revealed the existence of at least 78 single-base substitution (SBS), 11 doublet-base substitution (DBS), and 18 insertion or deletion (ID) mutational signatures. Through statistical associations and further experimental characterizations, etiology has been proposed for approximately half of the identified signatures. Prior studies have also explored the interactions between somatic mutations imprinted by different mutational processes and the topographical features of the human genome for certain cancer types and for a small subset of topographical features. However, no previous study has comprehensively evaluated the effect of genome architecture and topographical features on the accumulation of somatic mutations from different mutational signatures across human cancer.

Early studies have shown that late-replicating regions and condensed chromatin regions accumulate more mutations than early-replicating regions, actively transcribed regions, and open chromatin regions. Subsequent analyses of hundreds of cancer genomes have revealed that differential DNA repair can explain variations in mutation rates across some cancer genomes and that chromatin features of the cell of origin that gave rise to the tumor can affect both the mutation rate and the distribution of somatic mutations. Recently, Morganella et al. examined the effect of the genomic and epigenomic architecture on the activity of 12 SBS signatures in breast cancer. These analyses demonstrated that mutations generated by different mutational processes exhibit distinct strand asymmetries and that mutational signatures are differently affected by replication timing and nucleosome occupancy. Pan-cancer exploration of strand asymmetries was also conducted for different mutation types across multiple cancer types, as well as for different mutational signatures. In particular, pan-cancer analyses of more than 3,000 cancers have revealed the strand asymmetries and replication timings of the 30 SBS mutational signatures from the Catalog of Somatic Mutations in Cancer v.2 signatures database (COSMICv.2). Similarly, more than 3,000 cancer genomes were used to elucidate the effect of nucleosome occupancy for the 30 substitution signatures from COSMICv.2.

More recently, a study has also shown the interplay between the three-dimensional genome organization and the activity of certain mutational signatures.

Here, we report the most comprehensive evaluation of the effect of nucleosome occupancy, histone modifications, CCCTC-binding factor (CTCF) binding sites, replication timing, transcription strand asymmetry, and replication strand asymmetry on the cancer-specific accumulation of somatic mutations from distinct mutational signatures. Our analysis leverages the complete set of known COSMICv.3.3 signatures (78 SBS, 11 DBS, and 18 ID), and it examines 5,120 whole-genome-sequenced cancers while simultaneously utilizing 516 unique tissue-matched topographical features from the ENCODE project (Table S1) [27]. In all analyses, the observed patterns of somatic mutations are compared to background simulation models of mutational signatures that mimic both the trinucleotide pattern of these signatures and their mutational burden within each chromosome in each examined sample (STAR Methods). Our results confirm many of the observations previously reported for strand asymmetry, replication timing, and nucleosome periodicity for the original COSMICv.2 signatures. Further, the richer and larger COSMICv.3.3 dataset allowed us to elucidate novel biological findings for some of these 30 SBS signatures, revealing previously unobserved pan-cancer and cancer-specific dependencies. Additionally, this resource provides the first-ever map of the genome topography of ID, doublet-base, and another 24 substitution signatures in human cancer. Moreover, our study is the first to examine the tissue-specific effect of CTCF binding and 11 different histone modifications on the accumulation of somatic mutations from different mutational signatures. As part of the results, we provide a global view of the topography of mutational signatures across 5,120 whole-genome-sequenced tumors from 40 types of human cancer, and we include cancer-type-specific examples. As part of the discussion, we zoom in on two distinct case studies: (1) the topography of different types of clustered somatic mutations and (2) using the topography of mutational signatures to separate mutational signatures with similar patterns. Lastly, the reported results are released as part of the COSMICv.3.3 signatures database, https://cancer.sanger.ac.uk/signatures, providing an unprecedented online resource for examining the topography of mutational signatures within and across human cancer types.
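The idea of comparing observed mutations against simulated backgrounds can be illustrated with a minimal Python sketch. It is not the authors' pipeline (STAR Methods describe that); the function name, the one-sided empirical p value, and the example counts below are assumptions made purely for illustration.

```python
from statistics import mean


def enrichment_vs_simulations(observed, simulated):
    """Compare an observed mutation count near a feature with simulated counts.

    observed  -- number of real mutations overlapping the feature
    simulated -- counts from background simulations of the same sample
                 (hypothetical input; one count per simulation)
    Returns (fold_change, empirical_p) for enrichment.
    """
    if not simulated:
        raise ValueError("at least one simulation is required")
    expected = mean(simulated)
    fold_change = observed / expected if expected > 0 else float("inf")
    # One-sided empirical p value with the usual +1 correction, so that
    # the estimate is never exactly zero.
    at_least_as_extreme = sum(1 for s in simulated if s >= observed)
    empirical_p = (1 + at_least_as_extreme) / (1 + len(simulated))
    return fold_change, empirical_p


# Illustrative numbers only, not data from the study.
fold, p = enrichment_vs_simulations(150, [100, 90, 110, 105, 95])
print(f"fold change = {fold:.2f}, empirical p = {p:.3f}")
```

A fold change above 1 indicates more mutations near the feature than the background predicts, and a value below 1 indicates depletion. With only a handful of simulations the empirical p value is coarse, so a real analysis would use many more.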

Results

Transcription strand asymmetries

Transcription strand asymmetries have been generally attributed to transcription-coupled nucleotide excision repair (TC-NER) since bulky adducts (e.g., ones due to tobacco carcinogens) in actively transcribed regions of the genome will be preferentially repaired by TC-NER [28]. Additionally, transcription-coupled (TC) damage may also lead to transcription strand asymmetry due to one of the strands being preferentially damaged during transcription [22].

Mutational signatures with similar etiologies generally exhibited consistent patterns of transcription strand asymmetries across cancer types. Specifically, most signatures attributed to exogenous mutational processes showed transcription strand bias with mutations usually enriched on the transcribed strand (Figures 1A and 1E). This included signatures SBS4/DBS2 (both previously attributed to mutagens in tobacco smoke), SBS16 (alcohol consumption), SBS24 (aflatoxin), SBS29 (tobacco chewing), SBS25/SBS31/SBS35/DBS5 (prior chemotherapy), and SBS32 (prior treatment with azathioprine). Nevertheless, for some exogenous signatures, strand asymmetries could differ between cancer types. For example, while transcriptional asymmetries for C>A and T>A mutations from SBS4 were observed across most cancer types, asymmetries for C>G mutations were only observed in lung adenocarcinoma and cancers of the head and neck (Figure 1C). Interestingly, C>T mutations attributed to SBS4 had strand asymmetry only in lung adenocarcinoma. In contrast, mutational signatures due to direct damage from ultraviolet light (viz., SBS7a/b/c/d and DBS1) were the only known exogenous mutational processes to exhibit transcription strand asymmetry with strong enrichment of mutations on the untranscribed strand, consistent with damage from ultraviolet light on cytosine (Figures 1A and 1E).
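To make the notion of transcription strand asymmetry concrete, the sketch below shows one way mutations in genic regions could be assigned to the transcribed or untranscribed strand and then tallied. It assumes the common convention of describing each substitution by the pyrimidine of the mutated base pair; the function names and the input format (reference base on the plus strand, alternate base, gene strand) are hypothetical and are not taken from the study.

```python
from collections import Counter

PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_pyrimidine_context(ref, alt):
    """Express a substitution with the pyrimidine as reference (e.g. G>T -> C>A)."""
    if ref in PYRIMIDINES:
        return f"{ref}>{alt}"
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def classify_strand(ref, gene_strand):
    """Assign a mutation to the transcribed or untranscribed strand.

    ref         -- reference base on the plus strand of the genome
    gene_strand -- '+' or '-', the coding strand of the overlapping gene
    The pyrimidine lies on the plus strand when ref is C or T. If that is
    also the coding strand, the mutation is labeled untranscribed;
    otherwise the pyrimidine sits on the template (transcribed) strand.
    """
    pyrimidine_on_plus = ref in PYRIMIDINES
    coding_is_plus = gene_strand == "+"
    return "untranscribed" if pyrimidine_on_plus == coding_is_plus else "transcribed"


def strand_counts(mutations):
    """Count mutations per (substitution type, strand) pair."""
    counts = Counter()
    for ref, alt, gene_strand in mutations:
        key = (to_pyrimidine_context(ref, alt), classify_strand(ref, gene_strand))
        counts[key] += 1
    return counts


# Illustrative input only: (ref on plus strand, alt, gene strand).
example = [("C", "A", "+"), ("G", "T", "+"), ("G", "T", "+"), ("C", "A", "-")]
counts = strand_counts(example)
print(counts[("C>A", "transcribed")], counts[("C>A", "untranscribed")])
```

In a real analysis, such counts would be compared with the counts expected from background simulations before calling an asymmetry, since gene density and sequence composition alone can skew the raw numbers.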
