Molecular Medicine Israel

Model-guided engineering of DNA sequences with predictable site-specific recombination rates

Abstract

Site-specific recombination (SSR) is an important tool in synthetic biology, but its applications are limited by the inability to predictably tune SSR reaction rates. Facile rate manipulation could be achieved by modifying the DNA substrate sequence; however, this approach lacks rational design principles. Here, we develop an integrated experimental and computational method to engineer the DNA attachment sequence attP for predictably modulating the inversion reaction mediated by the recombinase Bxb1. After developing a qPCR method to measure SSR reaction rate, we design, select, and sequence attP libraries to inform a machine-learning model that computes Bxb1 inversion rate as a function of attP sequence. We use this model to predict reaction rates of attP variants in vitro and demonstrate their utility in gene circuit design in Escherichia coli. Our high-throughput, model-guided approach for rationally tuning SSR reaction rates enhances our understanding of recombinase function and expands the synthetic biology toolbox.

Introduction

Site-specific recombination (SSR) technology relies on recombinases that can precisely recognize two DNA sites, form an intermediate complex, cut, swap, and recombine the DNA in a new configuration, resulting in gene insertions, deletions, or inversions [1]. Based on the active residue in the catalytic domain, the recombinase superfamily is divided into two groups: tyrosine recombinases and serine recombinases [2]. Each group can be further subdivided by directionality (bidirectional/unidirectional) for tyrosine recombinases or by size (large/small) for serine recombinases [3]. Among these four recombinase subgroups, the large serine recombinases (LSRs) are considered one of the most powerful tools in synthetic biology based on the following properties [4]:

Irreversibility: LSRs have non-identical recognition sites typically known as attB (attachment site bacteria) and attP (attachment site phage) and yield hybrid product sites attL and attR. LSRs cannot target the hybrid attL and attR sites to regenerate attP and attB, resulting in an exceptionally stable DNA recombination product, which is in contrast to the commonly used Cre-lox and FLP-FRT systems [5]. This feature is important in applications such as human cell genome editing [6] and gene circuits for data storage in living cells [7].

Simplicity: In contrast to some tyrosine recombinases such as λ integrase, which requires long attP sites (>200 bp), supercoiled DNA structure, and other factors to stabilize DNA bending, serine recombinases have short DNA sites (<50 bp) and require no particular DNA topology or cofactors [8].

Efficiency: LSRs such as Bxb1, TP901 and φC31 have been demonstrated to be efficient in both prokaryotic and eukaryotic cells, in gene therapy [9], memory circuit design [10,11], and genome editing [12].

However, SSR applications in gene circuit design have been largely limited to the creation of logic gates and memory, with a focus on the end products and not the rates at which they are produced [7,10,11]. The ability to predictably control SSR reaction rates would enable rational tuning of timescales in the aforementioned applications and would also enable advanced circuit designs that rely on differential SSR reaction rates. An understanding of the DNA determinants of such processes could not only lead to improvements in wild-type recombination rates but could also provide a suite of parts that could be coupled together to enable higher-order information processing in genetic circuits via kinetic control. Here, we focused on understanding and engineering the DNA inversion reaction mediated by the mycobacteriophage integrase Bxb1 [13], though given the shared functional mechanisms of LSRs, our approach should be readily translatable to other LSRs as well.

Previous engineering approaches for regulating biochemical reaction rates have focused on altering key amino acid residues in the enzyme [14]. However, rational protein design is limited by the lack of high-resolution recombinase-DNA complex crystal structures. To our knowledge, only one structure of a DNA-bound LSR, Listeria phage A118 integrase, has been reported and the resolution of the protein-DNA interface is not sufficiently high [15]. Despite ongoing efforts to understand the interactions between amino acid residues of recombinases and nucleotides, static recombinase-DNA complex crystal structures cannot provide sufficient information to understand DNA sequence determinants of SSR rates [15-19]. In addition to direct protein-DNA contacts, charge/shape complementarity and water-mediated interactions contribute to the SSR rate of different DNA sequences [20]. Furthermore, mutations in the recombinase can alter protein stability or solubility, confounding efforts to rationally tune reaction rates via enzyme engineering. Although engineering a recombinase is in principle possible using a library-based approach, this would be more challenging due to the lack of a high-throughput enzyme selection method for SSR rates. Additionally, for gene circuit applications, it is easier to create multiple DNA substrates for a single recombinase than to create multiple recombinases for the same substrate. Therefore, we focused on engineering the DNA attachment sites and developing a method to rationally design DNA attachment site sequences to modulate the SSR reaction rate.

Previously, the impact of single and double base substitutions in the Bxb1-attP site on specificity and directionality was reported [21]. Another group used a high-throughput approach to identify DNA specificity determinants by selecting saturation mutagenesis DNA site libraries [22]. Although essential for better understanding the recombination mechanism and revealing potential off-target substrates, these studies focused only on DNA sequence specificity. Considering the potential application of LSRs in the design of genetic circuits, it would also be useful to be able to rationally tune their reaction kinetics. Through this mode of predictable kinetic control at the DNA level, it would be possible to use recombinases in applications beyond genetic memory, such as coordination of protein expression dynamics and temporal ordering of genetically encoded processes.

We therefore sought to develop a method to programmably tune the Bxb1 reaction rate via the DNA attachment sequences. However, a method to accurately measure SSR reaction rates or a platform to screen DNA sequences for recombination on a large scale has not been well established. In this study, we developed a qPCR-based method for quantifying relative SSR reaction rate as well as a platform for profiling SSR reaction rates of selected sequences from a designed DNA library using next-generation sequencing (NGS). Then, we constructed a machine-learning model to quantify the contributions of different nucleotide substitutions in attP to the overall SSR reaction rate. Finally, using assays both in vitro and in E. coli, we demonstrated accurate model predictions of rates of DNA inversion. Our study enables rational modulation of SSR reaction rates, providing a form of kinetic control for predictably tuning synthetic genetic circuits and gene editing.
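To make concrete what "quantifying the contributions of different nucleotide substitutions" could look like, the following is a minimal sketch of one simple possibility: an additive model in which the log of the reaction rate is a baseline plus one effect per substitution, fitted by ordinary least squares. This is an illustrative assumption, not necessarily the model used in the study. The 4-nt `REFERENCE` sequence, the synthetic effects, and all function names are hypothetical placeholders, not real attP data.

```python
import numpy as np

BASES = "ACGT"
REFERENCE = "ACGT"  # hypothetical 4-nt stand-in, NOT the real attP sequence


def encode(seq, reference=REFERENCE):
    """Return one 0/1 indicator per (position, non-reference base) substitution."""
    return [1.0 if base == alt else 0.0
            for ref_base, base in zip(reference, seq)
            for alt in BASES if alt != ref_base]


def single_mutants(reference=REFERENCE):
    """Enumerate every single-base substitution of the reference."""
    return [reference[:i] + alt + reference[i + 1:]
            for i, ref_base in enumerate(reference)
            for alt in BASES if alt != ref_base]


def fit_additive_model(seqs, log_rates):
    """Least-squares fit of log(rate) = intercept + sum of substitution effects."""
    X = np.array([[1.0] + encode(s) for s in seqs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_rates, dtype=float), rcond=None)
    return coef


def predict_log_rate(seq, coef):
    """Predict the log rate of a variant from the fitted coefficients."""
    return float(coef[0] + np.dot(coef[1:], encode(seq)))


# Synthetic, noise-free training data (assumption): the reference has log rate 0
# and each single mutant shifts it by one made-up effect.
rng = np.random.default_rng(0)
true_effects = rng.normal(0.0, 0.5, size=3 * len(REFERENCE))
train_seqs = [REFERENCE] + single_mutants()
train_log_rates = [0.0] + list(true_effects)

coef = fit_additive_model(train_seqs, train_log_rates)

# Under additivity, a double mutant is predicted as the sum of its single effects.
double_mutant = "CCGA"  # substitutions at the first and last positions
predicted = predict_log_rate(double_mutant, coef)
```

In this toy setting the fit recovers the synthetic effects exactly because the data are noise-free and the design matrix has full rank. Real library data would be noisy and could contain interactions between positions that an additive model does not capture.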

Results

DNA library design

For our initial DNA library design, we first identified nucleotide positions in the attP and attB DNA attachment sites of Bxb1 that could potentially be substituted to vary the reaction rate while maintaining recognition specificity. As shown in Fig. 1a, during the SSR reaction, the attP and attB attachment sites each bind a Bxb1 dimer. After synapsis through the interaction between the coiled-coil (CC) motifs at the different sites, the DNA sequence is cleaved at the center and then rotated 180°. After rotation, the conformation of the CC groups bound together on the same DNA strand is much more thermodynamically stable; thus, the ligation step is essentially irreversible and drives the overall reaction from the attP and attB substrate sites to the attL and attR product sites [23]. Notably, the attP and attB sites for Bxb1 have non-identical sequences, with both having two quasi-half-sites attP-L/attP-R and attB-L/attB-R (Fig. 1b). A Bxb1 monomer bound to the attP-L half-site has direct contact with the DNA site sequence via the zinc ribbon domain (ZD), the recombinase domain (RD), and the linker connecting ZD and RD (Fig. 1c) [15]. The shorter length of the attB site forces the linker to adopt a different conformation when bound to attB half-sites [15]. As shown in Fig. 1d (underlined), the four half-sites attP-L, attP-R, attB-L, and attB-R have ~50% conserved bases. Previous studies have demonstrated that the homologous positions are highly conserved for specificity [24]. Therefore, we hypothesized that these conserved positions may directly interact with protein residues and are necessary for DNA recognition, whereas bases at other asymmetric positions might be substituted to vary reaction rates.
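The selection of candidate positions described above can be sketched in a few lines: given the four half-sites aligned to a common length and orientation, positions where all four share a base are treated as conserved, and the remaining positions become candidates for substitution. The sequences below are hypothetical 8-nt placeholders chosen so that half the positions are conserved. They are not the real Bxb1 half-sites, and the real sites would first need to be aligned, since attB is shorter than attP.

```python
def conserved_positions(half_sites):
    """Return indices at which every aligned half-site carries the same base."""
    length = len(half_sites[0])
    if any(len(site) != length for site in half_sites):
        raise ValueError("half-sites must be aligned to equal length")
    return [i for i in range(length)
            if len({site[i] for site in half_sites}) == 1]


# Hypothetical placeholder sequences (assumption), already aligned and oriented.
toy_half_sites = {
    "attP-L": "GATCAAGC",
    "attP-R": "GCTCTAGC",
    "attB-L": "GATGAATC",
    "attB-R": "GTTCCAGC",
}

sites = list(toy_half_sites.values())
conserved = conserved_positions(sites)
# Non-conserved positions are the candidates for rate-tuning substitutions.
variable = [i for i in range(len(sites[0])) if i not in conserved]
```

With these placeholders, `conserved` holds the positions to leave untouched and `variable` holds the positions a library would randomize.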
