Molecular Medicine Israel

Global diversity of enterococci and description of 18 previously unknown species


Enterococci are among the most widely distributed microbes in animal gut consortia. Over 60 species have been identified, including Enterococcus faecalis and Enterococcus faecium commonly found in the human gut. Importantly, both emerged in the antibiotic era as leading causes of multidrug-resistant infection. Microbial traits that determine membership in microbiomes of various hosts are largely unknown and enterococci represent a unique opportunity to determine core underlying principles of that association. This study examined 886 enterococcal specimens from a wide range of hosts in diverse geographies and ecologies. Generalist to specialist enterococcal species were found including 18 previously undescribed species. This study identified new features associated with species radiations and provides evidence that tremendous genetic diversity in Enterococcus remains to be discovered.


Enterococci are gut microbes of most land animals. Likely appearing first in the guts of arthropods as they moved onto land, they diversified over hundreds of millions of years adapting to evolving hosts and host diets. Over 60 enterococcal species are now known. Two species, Enterococcus faecalis and Enterococcus faecium, are common constituents of the human microbiome. They are also now leading causes of multidrug-resistant hospital-associated infection. The basis for host association of enterococcal species is unknown. To begin identifying traits that drive host association, we collected 886 enterococcal strains from widely diverse hosts, ecologies, and geographies. This identified 18 previously undescribed species expanding genus diversity by >25%. These species harbor diverse genes including toxins and systems for detoxification and resource acquisition. Enterococcus faecalis and E. faecium were isolated from diverse hosts highlighting their generalist properties. Most other species showed a more restricted distribution indicative of specialized host association. The expanded species diversity permitted the Enterococcus genus phylogeny to be viewed with unprecedented resolution, allowing features to be identified that distinguish its four deeply rooted clades, and the entry of genes associated with range expansion such as B-vitamin biosynthesis and flagellar motility to be mapped to the phylogeny. This work provides an unprecedentedly broad and deep view of the genus Enterococcus, including insights into its evolution, potential new threats to human health, and where substantial additional enterococcal diversity is likely to be found.

Enterococci are unusually rugged and environmentally persistent microbes (1). This hardiness appears to have aided their transmission among arthropods and then tetrapods as animals began to colonize land, and it now contributes to the spread of antibiotic-resistant enterococci in hospitals (2). Likely because of their emergence in the early days of terrestrialization, enterococci are among the most widely distributed members of gut microbiomes in land animals, from invertebrates to humans (3). Their occurrence in animals widely varying in gut physiologies, diets, and social habits provides a unique opportunity to explore how diverse host backgrounds drive gut microbiome membership.

Pioneering surveys in the 1960s and 1970s by Mundt and colleagues provided early evidence for the widespread occurrence of enterococci in diverse hosts including mammals and birds (4), insects (5), and animal-inhabited environments (6, 7). However, at the time the genus Enterococcus had yet to be recognized as distinct from Streptococcus (8) and was resolved into species at low resolution by a small number of metabolic tests (9). Although evidence of diversity among the enterococci was found, the inability to precisely assess species and strain differences limited the ability to associate well-defined Enterococcus species with particular hosts. Genomics now provides a high-resolution tool capable of detecting differing traits between species of microbes from varying hosts and for quantifying the extent of their divergence.

The goal of the current study was to sample the Earth broadly for enterococci from diverse hosts, geographies, and environments, to gain a first approximation of the diversity of species on the planet, and to compare the content and degree of divergence of their genomes toward the broader goal of understanding the mechanisms that drive association with particular hosts. To achieve these goals, 430 enterococci from 381 unprocessed animal samples, as well as 456 enterococci isolated by contributors from diverse sources, were collected and taxonomically identified at the DNA sequence level, creating a collection of 886 isolates. The genomes of strains whose sequence diversity suggested only a distant relationship to any known species were then sequenced in their entirety. This identified 18 previously undescribed species of Enterococcus and 1 new species of the ancestrally related genus Vagococcus. Genome sequence analysis also showed that substantial enterococcal diversity remains to be discovered, most prominently in arthropod hosts and insectivores. Further, genetic novelty was found not only in novel species but also circulating in well-known Enterococcus species, including a divergent BoNT-type toxin (10) and a new family of pore-forming toxins (11), highlighting the importance of broader knowledge of the enterococcal gene pool.


Broad Survey Samples Enterococcus Host Diversity.

To understand the breadth of enterococcal diversity, we examined little-sampled (non-clinical, non-human) environments, including those minimally impacted by human habitation or pollution. To maximize global coverage, we assembled the Enterococcal Diversity Consortium (EDC), an international group of scientists and adventurers, who contributed 456 colony-purified presumptive enterococci and 579 whole specimens, typically insects and scat, in addition to commercially procured samples. Enterococci were enriched from 381 of these 579 whole specimens (Dataset S1). Sampling began before the Nagoya Protocol entered into force in October 2014. The protocol's objective is the fair and equitable sharing of benefits arising from the utilization of genetic resources, thereby contributing to the conservation and sustainable use of biodiversity. All strains showing novelty as described below were nevertheless registered with the country of origin, irrespective of isolation date. Specimens derived from a wide range of hosts and host diets (e.g., carnivores versus herbivores), geographies, and environments (e.g., captive versus wild). The diversity of sources spanned penguins migrating through sub-Antarctic waters (12, 13); duiker and elephants from Uganda; insects, bivalves, sea turtles, and wild turkeys from Brazil to the United States; kestrel and vultures from Mongolia; wallaby, swans, and wombats from Australia; and zoo animals and wild birds from Europe. Two selective media, CHROMagar Orientation agar (14) and bile-esculin azide agar (9), were used to culture presumptive enterococci to minimize potential selection bias against natural enterococcal isolates with unknown properties. To recover enterococci potentially present in low abundance, we performed isolations both directly from samples and from enrichment cultures. Some specimens yielded presumptive enterococcal colonies with varying morphologies, and in those cases, each morphotype was separately analyzed (Dataset S1).
At least one presumptive enterococcal isolate was culturable from 55% of samples (318 of 579), including extracts from dead insects, wild animal feces, tissue swabs, or samples of water and soil likely contaminated with animal fecal matter. These 318 presumptive Enterococcus-positive samples yielded 430 morphologically distinct colony types that were further analyzed. Together with the 456 putatively enterococcal isolates contributed as pure cultures by EDC members, this resulted in an analytical set of 886 presumptive enterococci (Dataset S1, Tab 1) derived from 774 different sample sources.
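
The tallies in the preceding paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch in Python. The variable names are illustrative, and the reading that the 774 sample sources are the 318 positive specimens plus the 456 contributed isolates is an assumption inferred from the numbers, not something the text states.

```python
# Counts as reported in the paragraph above; variable names are illustrative.
whole_specimens = 579        # whole specimens processed
positive_specimens = 318     # specimens yielding at least one presumptive enterococcus
colony_types = 430           # morphologically distinct colony types from those specimens
contributed_isolates = 456   # pure cultures contributed by EDC members

# Fraction of whole specimens from which an isolate was culturable.
culturable_fraction = positive_specimens / whole_specimens

# Size of the analytical set of presumptive enterococci.
analytical_set = colony_types + contributed_isolates

# Assumption: each contributed isolate is counted as its own sample source.
sample_sources = positive_specimens + contributed_isolates

print(f"culturable: {culturable_fraction:.1%}")
print(f"analytical set: {analytical_set}")
print(f"sample sources: {sample_sources}")
```

Under that assumption the sums reproduce the reported 886 isolates and 774 sources, and the culturable fraction rounds to the reported 55%.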

Preliminary Screen for Species Diversity.

Species-level diversity within the genus Enterococcus is not well resolved by nucleotide polymorphisms in the 16S rRNA gene (15). Thus, as an initial measure of taxonomic diversity, we developed a high-resolution PCR amplification and amplicon sequencing protocol. For initial screening of strain diversity, we selected an internal, highly polymorphic 97-bp fragment of an RNA methylase gene (SI Appendix, Fig. S1A), designated EF1984 in the Enterococcus faecalis V583 genome and previously found to be core to the Enterococcus genus (2). The fragment is flanked by short conserved sequence stretches that could be used for amplification. Sequence variation within this 97-bp diversity locus (DL) proved better able to discriminate species than variability within the entire 16S rRNA gene (horizontal lines, SI Appendix, Fig. S1B). Further, sequence polymorphism within the DL recapitulated well the phylogenetic relationships between the 47 known enterococcal species based on genome-wide average nucleotide identity (ANI) (SI Appendix, Fig. S1C).
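
To make the idea of sequence variation within a 97-bp locus concrete, the sketch below counts mismatched positions between two aligned fragments and converts the count to percent identity. It is an illustration only: the sequences are made-up placeholders rather than real DL sequences, the function names are hypothetical, and the comparison assumes the fragments are already aligned without gaps, which the text does not specify.

```python
def snp_count(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that are identical between two aligned sequences."""
    if not a:
        raise ValueError("sequences must not be empty")
    return 100.0 * (1 - snp_count(a, b) / len(a))


def mutate(seq: str, changes: dict) -> str:
    """Return a copy of seq with the given {position: base} substitutions applied."""
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos] = base
    return "".join(bases)


# Made-up 97-nt placeholder, NOT a real DL sequence.
ref = "ACGT" * 24 + "A"
# Introduce four substitutions to mimic a variant of the same fragment.
query = mutate(ref, {0: "C", 10: "T", 50: "A", 96: "T"})

print(len(ref), snp_count(ref, query), round(percent_identity(ref, query), 2))
```

Four differences across 97 positions correspond to roughly 95.9% identity, which gives a sense of how much resolution a fragment this short can offer.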

DL Variation Presumptively Identified Novel Species.

Colonies of all 886 presumptive enterococcal isolates were subjected to DL amplification and amplicon sequencing. Of those, 34 isolates yielded no product. Amplification and sequencing of the full-length 16S rRNA gene of those 34 showed them to be Carnobacterium (9/34), Lactobacillus (10/34), or Vagococcus (14/34), all closely related to Enterococcus, likely accounting for their growth on selective media.

DL-positive enterococci (852 of 886 isolates) derived from 41 taxonomic orders of animal hosts from 16 countries on 6 continents, representing many climatic zones (Fig. 1A). Largely reflecting representation in the collection, positively identified enterococcal isolates were obtained from mammals (29%), birds (28%), insects (18%), reptiles and amphibians (9%), coastal fish (4%), bivalves (2%), and gastropod mollusks (2%) (Fig. 1B). Samples derived from primary consumers (e.g., herbivores) as well as predators and scavengers (Fig. 1C). Over half of the isolates (53%) derived from wild environments with very low human activity (Fig. 1D).

Strains belonging to the same known species all shared DL sequence variations of 4 bp or less, and DL sequence variation was able to resolve known fine-scale differences between Enterococcus faecium clades A and B (known to share ~94% ANI) (16, 17). Most DL sequences (96%, 824/853) matched with fewer than 4 single-nucleotide polymorphisms (SNPs) to 1 of 65 previously identified enterococcal species for which a genome had been sequenced (Fig. 2A and Dataset S1), to which these closely related isolates were presumptively assigned. The most frequently encountered species in our collection were: E. faecalis (340/853; 40%), E. faecium (125/853; 15% of isolates, including 66 [9%] clade A and 59 [7%] clade B), E. mundtii (119/853; 13%), E. casseliflavus (81/853; 10%), and Enterococcus hirae (68/853; 8%). Importantly, this approach identified 27 isolates possessing 19 different DL sequences that exceeded the threshold for likely membership in a known species (>4 SNPs from any known species), identifying them as potentially novel (Fig. 2B and Dataset S1). These 27 isolates derived from insects (14 isolates), birds (9 isolates), and herbivorous reptiles (4 isolates) (Fig. 2C). In contrast, isolates of the most frequently encountered species (E. faecalis, E. faecium, and E. mundtii) derived mainly from mammals and birds, but were not exclusive to those hosts (Fig. 2C)….
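
The screening logic described above, nearest known species by SNP count with a cutoff for potential novelty, can be sketched as follows. This is a hedged illustration, not the authors' pipeline. The reference and isolate sequences are short made-up placeholders, the names `classify_isolate` and `SNP_THRESHOLD` are hypothetical, and treating 4 or fewer SNPs as within-species (so that more than 4 flags novelty) is an assumption based on the ">4 SNPs" criterion in the text.

```python
from typing import Dict, Tuple

# Assumption: <= 4 SNPs to the nearest reference counts as a presumptive match,
# so that > 4 SNPs from every known species flags a potentially novel isolate.
SNP_THRESHOLD = 4


def snp_count(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_isolate(
    dl_seq: str, references: Dict[str, str], threshold: int = SNP_THRESHOLD
) -> Tuple[str, str, int]:
    """Return (label, nearest reference species, SNP distance to it)."""
    nearest, distance = min(
        ((name, snp_count(dl_seq, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )
    label = "presumptive match" if distance <= threshold else "potentially novel"
    return label, nearest, distance


# Made-up 20-nt placeholders, NOT real DL sequences or real species.
references = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "TTTTACGTACGTACGTGGGG",
}
isolate_1 = "ACGTACGTACGTACGTACGA"  # one difference from species_A
isolate_2 = "GGGGGGGGACGTACGTCCCC"  # far from both references

print(classify_isolate(isolate_1, references))
print(classify_isolate(isolate_2, references))
```

A flag from a screen of this kind is only presumptive. In the study, isolates flagged this way were followed up by sequencing their genomes.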
