Molecular Medicine Israel

Evolutionary and functional history of the Escherichia coli K1 capsule


Abstract

Escherichia coli is a leading cause of invasive bacterial infections in humans. Capsule polysaccharide has an important role in bacterial pathogenesis, and the K1 capsule has been firmly established as one of the most potent capsule types in E. coli through its association with severe infections. However, little is known about its distribution, evolution and functions across the E. coli phylogeny, which is fundamental to elucidating its role in the expansion of successful lineages. Using systematic surveys of invasive E. coli isolates, we show that the K1-cps locus is present in a quarter of bloodstream infection isolates and has emerged in at least four different extraintestinal pathogenic E. coli (ExPEC) phylogroups independently in the last 500 years. Phenotypic assessment demonstrates that K1 capsule synthesis enhances E. coli survival in human serum independent of genetic background, and that therapeutic targeting of the K1 capsule re-sensitizes E. coli from distinct genetic backgrounds to human serum. Our study highlights that assessing the evolutionary and functional properties of bacterial virulence factors at population levels is important to better monitor and predict the emergence of virulent clones, and to also inform therapies and preventive medicine to effectively control bacterial infections whilst significantly lowering antibiotic usage.

Introduction

Bacteria across a wide range of phyla produce capsular polysaccharides that are associated with diverse experimentally validated functions and have been shown to improve bacterial persistence and adaptation to new environments1,2. Human bacterial pathogens use these capsules as major virulence determinants to promote colonization and persistence in the gastrointestinal, respiratory, and urogenital tracts, as well as other tissues3,4,5. Capsular polysaccharides are often identical or similar in structure and chemical composition to polysaccharides found in human tissues, thereby providing non-immunogenic coatings for bacteria to survive in human tissues and to cause disease5. In addition, capsules can reduce the efficiency of antimicrobial peptides and complement6,7,8,9, suppress phagocytosis by innate immune cells and promote intracellular survival10,11,12,13,14,15,16,17, and contribute to defence against antimicrobial agents8,18. Although E. coli can produce around 80 distinct capsular chemotypes that are organised into four major groups19, only a subset of these chemically distinct capsular types is associated with the capacity to cause invasive extraintestinal diseases; such infections include bloodstream infections (BSI), pyelonephritis and meningitis20. In particular, the polysialic acid-containing K1 capsule21,22, chondroitin-containing K4 capsule23 and heparosan-containing K5 capsule24,25 are associated with the extraintestinal pathogenic E. coli (ExPEC) clones linked to such invasive diseases26,27, likely through the propensity of these capsule types to mimic polysaccharides present on cells in human tissues5. However, the epidemiology of the capsular types remains largely unexplored due to the absence of serological typing data or specific methods that can computationally predict E. coli capsular types based on whole-genome sequencing data. Our current understanding of the evolution and functional properties of the distinct capsular polysaccharides in the global E. coli population is therefore limited and mainly based on pre-genomic studies.

The K1 capsule polysaccharide has repeatedly been linked to BSI, neonatal meningitis and pyelonephritis5,28,29,30,31. The K1 capsule is a homopolymer of α−2,8-linked N-acetylneuraminic acid (sialic acid; NeuNAc) termed polysialic acid (polySia) that mimics the polySia modification found on human neuronal and immune cells32,33,34,35 and likely promotes the capacity of K1-expressing E. coli to hide and reside within the blood and neuronal compartments. Indeed, polySia prevents full activation of innate host defences and confers resistance to complement- and phagocyte-mediated killing36,37,38. In agreement with epidemiological links of K1 capsule to invasive human infections, experimental animal models using isogenic strains have revealed that K1 expression promotes stable gastrointestinal (GI) tract colonisation and promotes the development of invasive systemic infections by E. coli39,40,41,42. In contrast to the association of K1 with BSI and meningitis, other well-studied K antigens including types K2, K4 and K5 are mostly associated with UTIs5,6,43,44. Despite the association of K1 encapsulated E. coli with BSI and meningitis, and the fact that polySia can be utilized as a powerful tool for diagnosis and therapeutic targeting45,46, we lack basic knowledge on the prevalence, evolution and functional properties of the K1 capsule at the population level. This lack of knowledge limits our capacity to develop efficient strategies to combat E. coli infections.

In this study, we elucidated the prevalence, distribution, and evolution of the K1 capsule in E. coli populations by considering a global dataset of 5065 clinical isolates. The position and synteny of the K1 capsule (K1-cps) locus in the genomes were resolved by considering K1 complete genomes. Using a Bayesian inference approach, we estimated the introduction date of K1-cps in the most predominant extraintestinal lineages. To test whether the K1 capsule was functional among the distinct lineages, we performed several phenotypic assays, which showed that the K1 capsule was expressed and conferred immune resistance on E. coli independent of its genetic background. Our results show that a quarter of BSIs are caused by E. coli carrying the K1-cps locus, and that this is driven by multiple introductions of K1-cps into the ExPEC pathotype. For the first time, we estimated the introduction times of the capsule among the main ExPEC lineages and estimated that the K1-cps locus was acquired at least 500 years ago. In support of the role of the K1 capsule in virulence, we show that the enzymatic removal of the K1 capsule renders E. coli susceptible to complement-replete human serum, suggesting that the therapeutic use of capsule depolymerases is likely to be a promising approach for the prevention and treatment of these infections.

Results

A quarter of the ExPEC population has the K1-cps locus

Despite the association of K1-cps with the ExPEC pathotype since the 1980s21,47,48,49, and the availability of whole-genome sequencing data for ExPEC isolates, the epidemiology of the K1 capsular type has remained largely unexplored in the post-genomic era. The K1 capsule biosynthesis locus belongs to the group 2 capsules26 and is composed of eight genes in two conserved regions (regions 1 and 3) shared between all group 2 capsule types and an additional six genes (region 2) unique to the K1-cps locus (Fig. 1a).

To estimate the prevalence of the K1-cps locus among ExPEC isolates, we assessed two unbiased longitudinal studies, NORM50 and BSAC51, that characterized BSIs in Norway (n = 3254) and the United Kingdom (n = 1509), respectively. As the BSI isolates in both studies were collected regardless of their clonal background, antimicrobial resistance profile or other bacterial phenotypic or genotypic characteristics, the NORM and BSAC collections offer a representative survey of BSI clones that have circulated in comparable host populations during the timespan of the studies. These datasets therefore provide a valuable platform to estimate the K1-cps prevalence in E. coli BSI populations. The population prevalence of the K1-cps locus among BSI isolates was estimated to be 24.0% and 22.9% for the NORM and BSAC collections, respectively (Table 1). The finding that a similar proportion of BSI isolates are positive for K1-cps in the two independent unbiased longitudinal studies provides strong evidence that the K1 capsule is linked with the propensity of E. coli to cause BSIs.
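As a rough illustration of how close the two prevalence estimates are, the sketch below computes 95% Wilson score intervals from the reported proportions and cohort sizes. The Wilson interval is chosen here for illustration only and is not a method described in the study; the function name `wilson_interval` is hypothetical.

```python
from math import sqrt


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed in n samples."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Prevalence and cohort size as reported in the text (Table 1 is not reproduced here).
cohorts = {"NORM": (0.240, 3254), "BSAC": (0.229, 1509)}
intervals = {name: wilson_interval(p, n) for name, (p, n) in cohorts.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: {100 * lo:.1f}%-{100 * hi:.1f}%")

# Overlapping intervals are consistent with a similar prevalence in both cohorts.
norm_lo, norm_hi = intervals["NORM"]
bsac_lo, bsac_hi = intervals["BSAC"]
intervals_overlap = norm_lo <= bsac_hi and bsac_lo <= norm_hi
```

Overlap of the two intervals does not by itself prove equal prevalence, but it is consistent with the statement that both surveys found a similar proportion of K1-cps+ isolates.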

Group 2 capsules, including K1, are classically assumed to be expressed in E. coli isolates causing extraintestinal infections, but not in E. coli causing diarrhoeal diseases26. To clarify whether the K1-cps is associated with any other E. coli pathotypes, we analysed the Horesh et al. collection, a comprehensive, high-quality and pathotype-defined collection of E. coli genomes52. We specifically screened for the presence of the K1-cps locus in the 5236 diarrhoeagenic isolates from the Horesh et al. collection, which includes the pathotypes (i) enteropathogenic E. coli (EPEC), (ii) enterotoxigenic E. coli (ETEC), (iii) enterohaemorrhagic E. coli (EHEC), (iv) enteroaggregative E. coli (EAEC), (v) enteroinvasive E. coli (EIEC), (vi) diffusely adherent E. coli (DAEC) and (vii) adherent invasive E. coli (AIEC). We observed that only 0.1% (5/5236) of the diarrhoeagenic isolates carried the K1-cps locus, which argues against a role of the K1 capsule in diarrhoeagenic diseases.

Given that ~25% of ExPEC isolates possess the K1-cps locus, it is important to analyze the clonality of these isolates to understand the population structure and to define how ExPEC clones with enhanced virulence properties emerge and evolve. We, therefore, examined the distribution of the K1-cps locus across the distinct ExPEC phylogroups (phylogroups A, B1, B2, C, D, E and F). For both the NORM and BSAC datasets, the K1-cps was frequent in phylogroup B2 (31.7% and 29.4%, respectively) and phylogroup F (50.4% and 23.2%, respectively), but also observed in phylogroups A (9.3% and 6.1%, respectively) and D (0.6% and 1.4%, respectively). We did not detect K1-cps in phylogroups B1, C or E. The distribution of the K1-cps locus among clonal complexes (CC, defined here as belonging to a lineage named after the dominant sequence type) was almost identical for both NORM and BSAC collections (Table 1). Within phylogroup B2, the K1-cps was mostly present in CC95 followed by CC141, CC144, and CC80 (Table 1). In addition, we observed the capsule in a few isolates of the pan-susceptible clade B of CC131, but not in the multidrug-resistant (MDR) C1 and C2 clades of CC131. The K1-cps locus was also present in CC59 and CC62 within phylogroup F and in CC10 within phylogroup A (Table 1). These data indicate that the 22–24% of BSI isolates carrying K1-cps do not belong to a single successful K1-cps+ ExPEC clone, but to multiple distinct K1-cps+ clones.

To analyze the distribution and evolution of the K1-cps across a global collection of ExPEC isolates, we expanded the NORM and BSAC studies by incorporating isolates from other countries, time periods and sources that were genotypically or phenotypically detected as K1-cps+ or K1+, respectively. This batch of samples included: (i) newly generated adult and neonatal samples phenotypically detected as K1+ (n = 201) from six countries (Brazil, Laos, Mexico, Poland, United Kingdom, United States), (ii) samples carrying the K1-cps locus from the pre-antibiotic era (from 1932 onwards) that form part of the Murray collection53 (n = 15) and (iii) samples carrying the K1-cps locus from the Horesh collection52 (n = 86). We collated all isolate genomes (n = 5065) and generated a maximum-likelihood phylogeny. A total of 1427 isolates possessed the K1-cps locus (Fig. 1b). The phylogeny overlaid with the metadata (also available in Dataset S1) can be interactively queried in the following Microreact project: https://microreact.org/project/cm8f8adPyWoQqemAAgvrmi-k1-context. For isolates in which the complete K1-cps locus, from kpsM to kpsF (Fig. 1a), was assembled in one contig (1409/1427, 98.7%), the product of kpsT showed the most conserved protein sequence, with 98.7% (1391/1409) of isolates carrying an identical amino acid sequence (Supplementary Fig. 1). In contrast, neuE, the most diverse gene in the locus, displayed a total of 76 distinct variants (Supplementary Fig. 1).
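The dataset composition above can be checked with a few lines of arithmetic. The sketch below assumes that the Murray and Horesh subsets contain 15 and 86 isolates, respectively (with 53 and 52 being citation numbers), an interpretation supported by the fact that the five collections then sum to the stated total of 5065. The dictionary keys are descriptive labels chosen for this example.

```python
# Isolate counts per collection, as read from the text.
collections = {
    "NORM (Norway, BSI)": 3254,
    "BSAC (United Kingdom, BSI)": 1509,
    "newly sequenced K1+ isolates": 201,
    "Murray collection (assumed n = 15)": 15,
    "Horesh collection (assumed n = 86)": 86,
}

total_isolates = sum(collections.values())

# Counts reported for the K1-cps locus.
k1_positive = 1427        # isolates carrying K1-cps
locus_one_contig = 1409   # complete locus assembled in a single contig
kpst_identical = 1391     # identical KpsT amino acid sequence


def percent(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


print("total isolates:", total_isolates)
print("locus in one contig (%):", percent(locus_one_contig, k1_positive))
print("identical KpsT (%):", percent(kpst_identical, locus_one_contig))
```

Both percentages round to 98.7% even though they are computed from different numerators and denominators, which explains why the same figure appears twice in the paragraph.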

The dominant ExPEC lineage CC95 acquired the K1-cps locus over 250 years ago

CC95 is one of the most dominant ExPEC clones associated with community-onset and nosocomial infections worldwide20, and the K1-cps locus is ubiquitous in invasive isolates of this lineage (99.8–100% of CC95 isolates are K1-cps+; Table 1). We hypothesized that the successful expansion and establishment of the K1-cps+ CC95 clone in the E. coli population has been driven by a single acquisition of the K1-cps locus and clonal expansion, rather than multiple independent K1-cps acquisitions by CC95. We further investigated the genetic context and evolution of the K1-cps in the lineage by obtaining a dated phylogeny of CC95 NORM genomes to determine the time to the most recent common ancestor (tMRCA) for K1-cps within CC95, and by unambiguously characterizing the position and gene synteny of the K1-cps locus in the E. coli genome by retrieving all CC95 genomes with an associated RefSeq complete genome (n = 44). Nearly all NORM genomes (441/442) from CC95 (Table 1) possessed the K1-cps, suggesting that this locus was acquired by the most recent common ancestor (MRCA) of the lineage.

To determine the origin of CC95 (tMRCA), we used BactDating, which uses Markov chain Monte Carlo (MCMC) sampling to perform Bayesian inference and produce a dated phylogeny. We estimated the origin of CC95 (tMRCA), and thus the introduction of the K1-cps locus, to be around the year 1768 [95% CI, 1721–1806] (Fig. 2a). In Dataset S2, we confirmed the MCMC convergence of the model as shown by (i) the effective sample sizes of the parameters and (ii) the Gelman and Rubin convergence diagnostic. In addition, we show a significant temporal signal by comparing the resulting model against a model in which all isolates were assigned the same sampling date, using the deviance information criterion (DIC) (Dataset S2). Thus, the high frequency of K1-cps in CC95 is readily explained by a single acquisition event that occurred over 250 years ago, followed by the worldwide spread of a single clone.
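To make the convergence check more concrete, the following is a minimal sketch of the Gelman and Rubin diagnostic (the potential scale reduction factor, often written R-hat) for a single parameter sampled by several MCMC chains. It is a generic textbook formulation in plain Python, not the code used by BactDating or by the authors, and the chains are synthetic values generated only for this example.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of at least two equal-length lists of posterior samples.
    Values close to 1 suggest the chains sample the same distribution.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


# Synthetic chains (assumption: illustrative only, not study data).
rng = random.Random(0)
converged = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(2)]
diverged = [
    [rng.gauss(0.0, 1.0) for _ in range(1000)],
    [rng.gauss(5.0, 1.0) for _ in range(1000)],  # second chain stuck elsewhere
]

rhat_converged = gelman_rubin(converged)
rhat_diverged = gelman_rubin(diverged)
print(f"R-hat, converged chains: {rhat_converged:.3f}")
print(f"R-hat, diverged chains:  {rhat_diverged:.3f}")
```

A commonly used rule of thumb treats values below roughly 1.1 as compatible with convergence; the exact threshold applied in the study is given in Dataset S2 rather than in the text.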

The K1-cps locus was located next to the tRNA-pheV site, at approximately position 800,000 of the chromosome (matching the position described in pre-genome literature as 67 min in the E. coli genome54,55), as revealed by analysis of the RefSeq CC95 complete genomes (Dataset S3). Upstream of the K1-cps locus, gene synteny is highly conserved and is characterized by the presence of a type II secretion system56. Downstream of the locus, two predominant genome configurations were defined across CC95: (i) the insertion of a pathogenicity island (PAI) that has resulted in the truncation of the tRNA pheV gene (26/44, 59%), and (ii) the absence of the PAI and an intact pheV gene (16/44, 36%). The remaining 2 genomes (2/44, 5%) each showed a distinct gene synteny as determined by Panaroo. In the NORM CC95 isolates, we could detect the PAI in 80.27% (354/441) of the isolates based on the truncation of pheV and the presence of the intB gene, while 15.87% (70/441) of isolates did not carry the PAI, and in 3.85% of the cases (17/441) we could not determine the presence or absence of the PAI due to fragmented short-read assemblies (Fig. 2a).
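The decision rule described above for short-read assemblies can be sketched as a small classifier. This is a hypothetical simplification: the function name `classify_pai`, its boolean inputs and the handling of conflicting markers are assumptions made for illustration, and the study's actual pipeline may differ. The counts at the end are the ones reported in the text.

```python
def classify_pai(phev_truncated, intb_present, region_assembled=True):
    """Return 'PAI+', 'PAI-' or 'undetermined' for one isolate.

    phev_truncated:   True if the tRNA pheV gene is truncated.
    intb_present:     True if the intB integrase gene is detected.
    region_assembled: False if the region is too fragmented to assess.
    """
    if not region_assembled:
        return "undetermined"
    if phev_truncated and intb_present:
        return "PAI+"
    if not phev_truncated and not intb_present:
        return "PAI-"
    # Conflicting markers are left unresolved in this sketch (assumption).
    return "undetermined"


# Counts reported for the NORM CC95 isolates carrying K1-cps.
norm_cc95 = {"PAI+": 354, "PAI-": 70, "undetermined": 17}
n_isolates = sum(norm_cc95.values())
shares = {k: round(100 * v / n_isolates, 2) for k, v in norm_cc95.items()}

for status, share in shares.items():
    print(f"{status}: {norm_cc95[status]}/{n_isolates} ({share}%)")
```

The three categories account for all 441 K1-cps+ CC95 isolates in the NORM collection, and the recomputed shares match the percentages quoted in the paragraph.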

This PAI had an average size of 52.4 kbp and was previously termed PAI-V (Fig. 2b)57,58. Because PAIs carry one or more accessory genes that encode virulence factors that often function as adhesins, iron-acquisition systems, host defence mechanisms or toxins59, the acquisition and loss of this PAI is likely to impact phenotypes and infectivity. We did not observe any trend in the association between the presence/absence of the PAI and the isolation source (BSI vs UPEC) of these 44 complete genomes (Dataset S3), indicating that acquisition or loss of this PAI is not exclusively linked to either type of invasive infection. The mosaic structure and flexible pool of virulence genes carried by PAIs could mean that a certain virulence gene is associated with the PAI in the CC95 lineage. However, a comparison of the virulence genes of the isolates that lost the PAI with the closest PAI+ isolate in the phylogeny indicated that the main difference resided in the presence or absence of the papGII locus encoding the P fimbriae and an iron-acquisition (ireA) locus. Interestingly, these genes have been recently identified as key features distinguishing invasive from non-invasive UPEC60. Additional genes that have roles in promoting bacterial survival or virulence were not identified on the PAI in CC95 isolates.

Based on the CC95 dated phylogeny (Fig. 2a), we determined that the MRCA did not possess the PAI and that it is likely that there were two introductions of a PAI downstream of the kpsF gene in this lineage. The PAI was estimated to have been introduced between 1823 [95% CI, 1789–1851] and 1840 [95% CI, 1809–1864], and subsequently became fixed in the genome and predominated through clonal expansion. Interestingly, we estimated a minimum of 14 excision events of the PAI linked to recombination between the two direct short repeats at pheV (22 bp)61 that resemble att sites, in a process likely mediated by the intB P4-like integrase gene58,62. This suggests that the PAI can be excised and lost from the E. coli genome. In support of this, the PAI can form a circular excision product that may be exported by larger mobile genetic elements (MGEs)61. We also observed a minor and independent introduction of an island downstream of the kpsF gene, estimated to have occurred between 1864 [95% CI, 1832–1892] and 1904 [95% CI, 1878–1927] (Fig. 2b). Collectively, these data indicate that the introduction of the K1-cps locus in CC95 precedes the acquisition of the PAI, but that accessory genes encoded on the PAI could additionally contribute to the success of CC95 as an ExPEC clone.

Multiple K1-cps locus acquisitions across the major ExPEC lineages

Given that the K1-cps+ CC95 clone emerged over 250 years ago and is globally successful, we hypothesized that the CC95 population is likely to have acted as the donor of the K1-cps locus during the emergence of other K1-cps+ ExPEC clones in other phylogroups. For each K1-cps+ lineage, we performed a hybrid assembly to obtain a complete chromosomal sequence (Dataset S4), which allowed us to infer the gene synteny downstream of the K1-cps locus shown in Fig. 4.

We first analysed the acquisition of the K1-cps locus in other phylogroups (CC59 and CC62 in phylogroup F; CC10 in phylogroup A). Surprisingly, the K1-cps locus was introduced into CC59 by its MRCA around the year 1525 [95% CI, 1044–1730] (Fig. 3, Dataset S2). The BactDating model inferring the tMRCA of CC59 showed MCMC convergence of the parameters and a significant temporal signal (Dataset S2). The median number of SNP differences and the SNP profile of the neu region (Supplementary Fig. 2), together with the conserved gene synteny observed downstream of the kpsF gene (Fig. 4), indicated that CC59 does not share the same K1-cps locus with CC95. Instead, CC59 shared the same K1-cps locus with CC62, and shared a common ancestor with CC62 within the species-wide E. coli tree (Fig. 1). This suggests that the MRCA of these lineages in phylogroup F already carried the K1-cps locus. Correspondingly, the data conclusively refute the hypothesis that CC95 was the original donor of the K1-cps locus for CC59/CC62, and vice versa, that CC59/CC62 was the original donor(s) of the K1-cps locus for CC95…
