Molecular Medicine Israel

Strand-resolved mutagenicity of DNA damage and repair

Abstract

DNA base damage is a major source of oncogenic mutations1. Such damage can produce strand-phased mutation patterns and multiallelic variation through the process of lesion segregation2. Here we exploited these properties to reveal how strand-asymmetric processes, such as replication and transcription, shape DNA damage and repair. Despite distinct mechanisms of leading and lagging strand replication3,4, we observe identical fidelity and damage tolerance for both strands. For small alkylation adducts of DNA, our results support a model in which the same translesion polymerase is recruited on-the-fly to both replication strands, starkly contrasting the strand asymmetric tolerance of bulky UV-induced adducts5. The accumulation of multiple distinct mutations at the site of persistent lesions provides the means to quantify the relative efficiency of repair processes genome wide and at single-base resolution. At multiple scales, we show DNA damage-induced mutations are largely shaped by the influence of DNA accessibility on repair efficiency, rather than gradients of DNA damage. Finally, we reveal specific genomic conditions that can actively drive oncogenic mutagenesis by corrupting the fidelity of nucleotide excision repair. These results provide insight into how strand-asymmetric mechanisms underlie the formation, tolerance and repair of DNA damage, thereby shaping cancer genome evolution.

Main

There is an elegant symmetry to the structure and replication of DNA, in which the two strands separate and each acts as a template for the synthesis of new daughter strands. Despite this holistic symmetry, many activities of DNA are strand asymmetric: (1) during replication, different enzymes mainly synthesize the leading and lagging strands3,4,6,7, (2) RNA transcription uses only one strand of the DNA as a template8, (3) one side of the DNA double helix is more associated with transcription factors9, and (4) alternating strands of DNA face towards or away from the nucleosome core10,11. These processes can each impart strand asymmetric mutational patterns that reflect the cumulative DNA transactions of the cells in which the mutations accrued1,9,10,12,13.

Cancer genomes are the result of diverse mutational processes1,14, often accumulated over decades, making it challenging to identify and subsequently interpret their relative roles in generating spatial and temporal mutational asymmetries. The relative contribution of DNA damage, surveillance and repair processes to observed patterns of mutational asymmetry remains poorly understood, although mapping of DNA damage15,16,17,18 and repair intermediates19,20 has provided key insights.

To understand the mechanistic asymmetries of DNA damage and repair on a genome-wide basis, we have exploited an established mouse model of liver carcinogenesis21,22, in which mutations are induced through a single DNA-damaging exposure to diethylnitrosamine (DEN; an alkylating agent that is bioactivated by the hepatocyte-expressed enzyme Cyp2e1). The exposure results in mutagenic DNA base damage, referred to as DNA lesions, that are inherited and resolved as mutations in subsequent cell cycles2. This phenomenon of lesion segregation, in which damaged lesion-containing strands segregate into separate daughter cells, results in pronounced, chromosome-scale mutational asymmetry. In a clonally expanded cell population, such as a tumour, this asymmetry can identify which damaged DNA strand was inherited by the ancestor of each tumour (Fig. 1a). Using this approach, we can determine the lesion-containing strand for approximately 50% of the autosomal genome and the entire X chromosome for each tumour2 (Extended Data Fig. 1). We analysed data from 237 clonally distinct tumours from 98 mice and could resolve the lesion strand for over 7 million base substitution mutations (Fig. 1b). Most (more than 75%) of the mutations are from T nucleotides on the lesion strand (Fig. 1c), consistent with previous analyses of DEN-induced tumours2,22, and biochemical evidence of frequent mutagenic alkylation adducts on thymine23.
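
As an illustration of the strand assignment described above, the following minimal Python sketch calls the lesion-containing strand for one genomic segment from the reference bases at its mutated sites. It is not the published pipeline: the function name `call_lesion_strand` and the thresholds `min_mutations` and `min_fraction` are hypothetical, and the only property taken from the text is that most mutations arise from T on the lesion strand.

```python
from collections import Counter


def call_lesion_strand(ref_bases, min_mutations=20, min_fraction=0.8):
    """Call the lesion-containing strand for one genomic segment.

    ref_bases: forward-strand reference bases at the mutated sites.
    Returns 'forward', 'reverse' or 'unresolved'.
    Both thresholds are illustrative assumptions, not published values.
    """
    counts = Counter(base.upper() for base in ref_bases)
    from_t = counts["T"]  # mutations from T on the forward strand
    from_a = counts["A"]  # mutations from T on the reverse strand
    total = from_t + from_a
    if total < min_mutations:
        return "unresolved"  # too few informative mutations
    if from_t / total >= min_fraction:
        return "forward"  # lesions were on the forward strand
    if from_a / total >= min_fraction:
        return "reverse"  # lesions were on the reverse strand
    return "unresolved"  # balanced: both lesion strands retained
```

A segment in which mutations from T and from A are balanced is left unresolved, which corresponds to regions where the asymmetry cannot identify a single inherited lesion strand.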

The range of mutagenic alkylation adducts generated by activated DEN overlaps those from tobacco smoke exposure, unavoidable endogenous mutagens and alkylating chemotherapeutics such as temozolomide23,24,25. More generally, the mechanism of lesion segregation, which the strand-resolved analysis relies on, appears to be a ubiquitous property of base-damaging mutagens2. Here we newly exploit these strand-resolved lesions as a powerful tool to quantify how mitotic replication, transcription and DNA–protein binding mechanistically shape DNA damage, genome repair and mutagenesis.

The mutational symmetry of replication

These well-powered and experimentally controlled in vivo data provide a unique opportunity to evaluate whether DNA damage on the template for leading strand replication results in the same rate and spectrum of mutations as on the lagging strand template. There are several reasons why they might differ. First, leading and lagging strand replication use distinct replicative enzymes3,4,6,7, which may differ in how they handle unrepaired damage on the DNA template strand. Second, it is unknown whether the leading and lagging strand polymerases recruit different translesion polymerases, which could generate distinct error profiles. Third, substantially longer replication gaps are expected on the leading strand, if there is polymerase stalling26. Consequently, leading and lagging strands are thought to differ in their lesion bypass5 and post-replicative gap filling27,28.

On the basis of hepatocyte-derived measures of replication fork directionality (using Repli-seq and OK-seq, see Methods; Extended Data Fig. 2) and patterns of mutation asymmetry, we inferred whether the lesion-containing strand preferentially templated the leading or lagging replication strand (Fig. 1d). This was separately resolved for each genomic locus on a per tumour basis. Our initial analysis suggested a significantly higher mutation rate for lagging strand synthesis over a lesion-containing template (Pearson’s correlation coefficient cor = −0.86, P = 3.2 × 10⁻⁹; Fig. 1e). However, gene orientation, and thus the directionality of transcription, also correlates with replication direction29,30 and DEN lesions are subject to transcription-coupled repair (TCR)2. We therefore measured transcriptome-wide gene expression in the mouse liver on postnatal day 15 (P15), corresponding to the timing of DEN mutagenesis. This confirmed that the direction of transcription is strongly biased to match replication fork movement, and the effect is disproportionately evident in regions of extreme replication bias (Fig. 1e).
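
The inference of leading versus lagging strand templating can be sketched in Python as below. This is a minimal sketch under stated assumptions: replication fork directionality is summarised as a single value `rfd` in [−1, 1] with positive values meaning predominantly rightward-moving forks, and `min_bias` is a hypothetical cutoff. The only biology encoded is that the nascent leading strand grows 5′ to 3′ in the direction of fork movement, so a rightward fork copies the reverse strand as its leading strand template.

```python
def templated_replication_strand(lesion_strand, rfd, min_bias=0.2):
    """Infer which nascent strand is synthesised over the lesion strand.

    lesion_strand: 'forward' or 'reverse' (the lesion-containing template).
    rfd: fork directionality in [-1, 1]; positive = mostly rightward forks
         (sign convention assumed for this sketch).
    min_bias: illustrative cutoff below which the locus is unresolved.
    Returns 'leading', 'lagging' or 'unresolved'.
    """
    if lesion_strand not in ("forward", "reverse"):
        raise ValueError("lesion_strand must be 'forward' or 'reverse'")
    if abs(rfd) < min_bias:
        return "unresolved"  # no clear replication direction
    # A rightward fork extends the leading strand left to right, so its
    # template is the reverse strand; a leftward fork is the mirror image.
    leading_template = "reverse" if rfd > 0 else "forward"
    return "leading" if lesion_strand == leading_template else "lagging"
```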

To disentangle the effects of transcription from replication, we measured mutation rates, jointly stratifying the genome by transcription state, replication strand bias, replication timing and genic annotation (Fig. 1f and Extended Data Fig. 3). Although transcribed regions exhibit a strong correlation of mutation rate with replication strand bias (Pearson’s cor = −0.86, P = 3.1 × 10⁻⁷), genome-wide multivariate regression shows that the strongest independent effect on the DEN-induced mutation rate is transcription over the lesion-containing strand (P < 1 × 10⁻³⁰⁰), followed by replication time (P = 6 × 10⁻¹⁶²). As mismatch repair is biased towards earlier replicating genomic regions31, it may be partially responsible for correcting some mismatch–lesion heteroduplexes. We considered genic and non-genic regions of the genome across 21 quantiles of replication timing and found that, although there is a correlation between mutation rate and replication time supportive of mismatch repair, its role is minor relative to TCR (Extended Data Fig. 4). Replication strand bias has the smallest effect on mutation rate of the tested measures (Extended Data Fig. 3j). Outside of genic regions, the correlation of replication strand bias with mutation rate is negligible (Fig. 1f and Extended Data Fig. 3j). This unexpected consistency in the rate of mutations generated by replication over alkyl lesions points to a shared mechanism of lesion bypass for the leading and lagging strands, possibly involving recruitment of the same translesion polymerases.

Strand-resolved collateral mutagenesis

It has been proposed that when translesion polymerases replicate across damaged bases, they can generate proximal tracts of low-fidelity synthesis32,33,34. In bacteria and yeast, this mechanism produces clusters of mutations35,36 and such collateral mutagenesis has recently been reported in vertebrates37. Consistent with these models, we found that mutations within 10 nt of each other are significantly elevated over permuted expectation (two-sided Fisher’s test, odds ratio 11.9, P < 2.2 × 10⁻¹⁶). This enrichment is most pronounced at 1–2 nt spacing, decreases after one DNA helical turn (approximately 10 nt) and decays to background within 20 nt (Fig. 2a and Extended Data Fig. 5). These short clusters are overwhelmingly isolated pairs of mutations (98% pairs, 2% trios) phased on the same chromosome (Extended Data Fig. 5e).
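
The comparison of closely spaced mutations against a permuted expectation can be illustrated with the Python sketch below. It is a simplified stand-in rather than the analysis used here: positions are permuted uniformly within a region of assumed length, whereas a real permutation would typically preserve sequence context, and the 0.5 continuity correction in `odds_ratio` is a choice made for this sketch so that empty cells do not cause division by zero. All function names are hypothetical.

```python
import random


def count_close_pairs(positions, max_gap=10):
    """Count adjacent mutation pairs spaced 1..max_gap nt apart."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if 0 < b - a <= max_gap)


def permute_positions(n_mutations, region_length, rng):
    """Draw n distinct positions uniformly (simplified null model)."""
    return rng.sample(range(region_length), n_mutations)


def odds_ratio(obs_close, obs_far, perm_close, perm_far):
    """Odds ratio of a 2x2 table with a 0.5 continuity correction."""
    return ((obs_close + 0.5) * (perm_far + 0.5)) / (
        (obs_far + 0.5) * (perm_close + 0.5)
    )


def close_pair_enrichment(positions, region_length, max_gap=10, seed=0):
    """Compare observed close pairs with one uniform permutation."""
    rng = random.Random(seed)
    n_pairs = len(positions) - 1  # number of adjacent pairs
    obs_close = count_close_pairs(positions, max_gap)
    permuted = permute_positions(len(positions), region_length, rng)
    perm_close = count_close_pairs(permuted, max_gap)
    return odds_ratio(
        obs_close, n_pairs - obs_close, perm_close, n_pairs - perm_close
    )
```

A P value for the same 2 × 2 table could then be obtained with a two-sided Fisher's exact test from a statistics library; that step is omitted to keep the sketch dependency-free.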

We oriented the clusters by their lesion-containing strand. The first mutation site to be replicated over on the lesion-containing template was designated the upstream (5′) mutation, and subsequent mutations were designated downstream (3′). Upstream mutations showed a mutation spectrum closely resembling that of the tumours as a whole (Fig. 2b and Extended Data Fig. 5a,b,i), indicating that they represent typical lesion-templated substitutions.

By contrast, downstream mutations have distinct mutation spectra (Extended Data Fig. 5c). Those located more than two nucleotides downstream show a strong preference for G→T substitutions (Fig. 2c and Extended Data Fig. 5h,l–n). As mutations are called relative to the lesion-containing template strand, this indicates the preferential misincorporation of A nucleotides opposite a template G nucleotide, thus newly revealing the intrinsic error profile of an extending translesion polymerase. Mutation pairs with closer spacing (2 nt or fewer) exhibit somewhat divergent mutation signatures (Extended Data Fig. 5h,j,k), probably reflecting both sequence-composition constraints and processes such as the transition between alternate translesion polymerases (Fig. 2d).
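
The step from a template-strand substitution to the nucleotide misincorporated by the polymerase is a simple complement lookup, sketched here in Python. The function name `misincorporation` is hypothetical; the logic assumes only Watson–Crick pairing and that the substitution is called relative to the lesion-containing template strand, as stated above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def misincorporation(template_ref, template_alt):
    """Translate a template-strand substitution into the polymerase error.

    template_ref, template_alt: the substitution ref -> alt as called on
    the lesion-containing template strand.
    Returns (template base, correct nucleotide, incorporated nucleotide).
    """
    correct = COMPLEMENT[template_ref]  # what a faithful polymerase adds
    incorporated = COMPLEMENT[template_alt]  # what was added instead
    return template_ref, correct, incorporated
```

For the G→T substitutions described above, `misincorporation("G", "T")` returns `("G", "C", "A")`: an A inserted opposite a template G where a C was expected.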

Extending these observations of collateral translesion mutagenesis, we found significant clustering of insertion and deletion mutations with base substitutions (insertion/deletion mutation within 100 bp of a substitution, two-sided Fisher’s test odds ratio 103, P < 2.2 × 10⁻¹⁶ compared with permuted expectation; Fig. 2e,f and Extended Data Fig. 6a–i). Single-base deletions preferentially remove T nucleotides from the lesion strand both genome wide and in mutation clusters (Fig. 2g; two-sided Fisher’s test odds ratio 16.5, P = 1.04 × 10⁻¹⁶), which indicates a base-skipping mode of lesion bypass. These single-base deletions are associated with downstream substitutions within 10 nt that include the G→T substitutions already identified as a signature of collateral translesion mutagenesis, but more prominently a distinct substitution signature of A→C on the lesion strand (Fig. 2g). In contrast to deletions, nucleotide insertions are clustered downstream of typical DEN adduct-induced base substitutions, pointing to collateral insertion mutagenesis by translesion polymerases (Fig. 2g and Extended Data Fig. 6h,i).

Three lines of evidence support a model in which the same translesion polymerases are recruited with equal efficiency and processivity to both the leading and the lagging strands. First, the leading and lagging strands have essentially identical relative rates of mutation clusters (Fig. 2h). Second, the mutation spectra of the downstream mutations are the same (Extended Data Fig. 5o). Third, the length distribution of clusters matches between leading strand-biased and lagging strand-biased regions (Kolmogorov–Smirnov test, P = 0.15, showing no significant difference in size distribution despite more than 98% power to detect a difference in the distribution of cluster lengths of 4% or more; Extended Data Fig. 5p,q).
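
The comparison of cluster length distributions rests on the two-sample Kolmogorov–Smirnov statistic, the largest gap between two empirical cumulative distributions. The Python sketch below computes only that statistic, using the standard library; it does not reproduce the P value or the power calculation reported above, and the function name `ks_statistic` and its inputs are illustrative.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so check each one.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d
```

Here `sample_a` and `sample_b` would be the cluster lengths (in nt) from leading strand-biased and lagging strand-biased regions. A statistics library routine such as SciPy's `ks_2samp` returns the same statistic together with a P value.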

Having established the replicative symmetry of damage-induced mutagenesis and determined the relative contributions of replication and transcription on mutation rate, we next looked in detail at the pronounced strand-specific effects of transcription on DNA repair and mutagenesis.

Multiallelism reveals repair kinetics

Using liver RNA sequencing data (P15 mice), we found that nascent transcription estimates provide a better correlation with mutation rate than steady-state transcript levels (Extended Data Fig. 7a–d), as expected8. Increased transcription decreases the mutation rate for template strand lesions up to an expression level of ten nascent transcripts per million (Fig. 3a,b). Beyond this, the mutation rate plateaus and is not further reduced by additional transcription, suggesting that the remaining mutagenic lesions are largely invisible to TCR (Extended Data Fig. 7c,d)…
