Molecular Medicine Israel

Recent gene selection and drug resistance underscore clinical adaptation across Candida species

Abstract

Understanding how microbial pathogens adapt to treatments, humans and clinical environments is key to infer mechanisms of virulence, transmission and drug resistance. This may help improve therapies and diagnostics for infections with a poor prognosis, such as those caused by fungal pathogens, including Candida. Here we analysed genomic variants across approximately 2,000 isolates from six Candida species (C. glabrataC. aurisC. albicansC. tropicalisC. parapsilosis and C. orthopsilosis) and identified genes under recent selection, suggesting a highly complex clinical adaptation. These involve species-specific and convergently affected adaptive mechanisms, such as adhesion. Using convergence-based genome-wide association studies we identified known drivers of drug resistance alongside potentially novel players. Finally, our analyses reveal an important role of structural variants and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Our results provide insights on how opportunistic pathogens adapt to human-related environments and unearth candidate genes that deserve future attention.

Main

Fungal infections pose a serious health threat, affecting more than one billion people and causing approximately 1.5 million deaths each year1,2. The problem is growing due to insufficient diagnostic and therapeutic options3,4, increasing numbers of susceptible patients1,5, the expansion of pathogens partly linked to climate change6,7 and the alarming rise of antifungal drug resistance4,8,9Candida species are a major cause of severe hospital-acquired infections1, prompting the classification of some species (Candida aurisCandida albicansCandida glabrataCandida tropicalis and Candida parapsilosis) as critical or high-priority targets by the World Health Organization2.

A promising strategy to improve current therapies is to understand the evolutionary mechanisms of adaptation to antifungal drugs as well as to the human host. Candida pathogens have highly dynamic genomes (both within species10,11,12 and within patient13,14), which probably underlie these adaptive processes13,15,16,17,18. For example, in vitro evolution studies have pinpointed genome-wide changes underlying drug resistance19,20,21. In addition, analyses of serial clinical isolates13,14, genome-wide association studies (GWAS)22,23 and population genomics research11,12,24 have partially clarified the clinical relevance of resistance mechanisms. Similarly, directed evolution experiments in mice25,26,27, the analysis of paired clinical isolates13 and population genomics studies12,28 have explored host adaptation mechanisms involving virulence, adhesion or filamentous growth. Furthermore, some studies used ratios between non-synonymous and synonymous variation (such as πN/πS) to infer signatures of selection, which are useful to predict genes involved in clinical adaptation where the relevant phenotypes (such as drug susceptibility or cell adhesion within a patient) are not measurable12,29,30,31.

However, our understanding of how Candida species adapt in a clinical context is limited due to many reasons. First, most clinical studies include small sample sizes and/or lack rigorous statistical testing of the associations between genotypes and adaptive changes. Second, most studies involve only C. albicans, leaving open questions in other species2. Third, despite the importance of structural variants (SVs; such as deletions, duplications, inversions and/or translocations; Fig. 1)32,33,34, their contribution to clinically relevant adaptation remains largely unexplored. Fourth, similarities in adaptation mechanisms across species remain elusive because most studies focus on only one species and use different methods. This is key to understanding the epidemiology of these pathogens as well as enabling personalized treatments and prevention strategies. Fifth, many exploratory clinical studies focus only on known adaptive mechanisms (that is, known drug-resistance genes, as discussed previously23), which means that there may be unexplored factors. Finally, current studies of selection consider all variants within a gene, which may reflect ancient adaptation unrelated to the clinics. It may be important to only analyse recently emerged variants, as they are more likely to reflect clinically relevant selective pressures (as proposed in ref. 35).

To address these gaps, we used approximately 2,000 available genomes from major Candida species to investigate two open questions in clinical adaptation. First, we used phylogenetics and πN/πS-inspired tools to infer the genes with signatures of recent and potentially clinically relevant selection in C. glabrataC. aurisC. albicansC. tropicalisC. parapsilosis and C. orthopsilosis. Second, we used convergence-based GWAS to infer the genomic drivers of resistance to echinocandins, polyenes and azoles in C. glabrataC. auris and C. albicans. In both cases we measured the contribution of various variant types, including SVs. Our analyses revealed both expected and novel adaptive mechanisms, including those convergently acting in several species.

Results and discussion

Public sequences allow the study of recent evolution in Candida

To identify genes under recent selection in Candida pathogens, we retrieved all publicly available short-read whole-genome sequencing data for pure isolates (that is, clinical and environmental) of six major species and identified four variant types: single-nucleotide polymorphisms (SNPs), small insertions and deletions (indels), SVs and copy-number variants (CNVs; Methods, Fig. 1 and Extended Data Fig. 1). We enriched genomic information with strain metadata from the literature, including isolation source and antifungal drug susceptibility, where available (Fig. 1b and Supplementary Table 1). This dataset, comprising 1,987 high-quality samples (available at https://candidamine.org), is unprecedented in terms of the types of variants and number of strains considered11,12,24,28,36.

To provide a phylogenetic framework to our analysis, we inferred a strain tree (Fig. 1b and Supplementary Table 1) and used a systematic approach to identify genetically divergent monophyletic clades in each species (Methods and Supplementary Fig. 1a). A comparison with previously defined clades (Methods) revealed an overall consistency, underscoring the validity of our clade-definition approach, but also showed that our dataset encompasses a higher intraspecific diversity. In summary, we generated a dataset with unprecedented power to study the signs of selection and drug-resistance mechanisms in major Candida pathogens.

Structural variants underlie intraspecific variation

To determine the relevance of considering different variant types in subsequent analyses, we quantified their relative contribution to genetic diversity. Such comparative analysis across Candida species is lacking, as most previous studies have focused on SNPs and used specific methodologies. For each variant type, we measured the genetic distance (variants kb−1) between all pairs of isolates within a given species. We found that most species span high levels of diversity so that some distant conspecific strains have a genetic distance of about 10 SNPs kb−1 (1% divergence) or higher (Fig. 2a,b). In some species (that is, C. orthopsilosis36 and C. albicans10), this could be attributed to their hybrid nature. For non-hybrid species (C. glabrata and C. auris), this indicates that their diversification predates human colonization, which must have occurred in parallel in divergent clades for each species. C. parapsilosis is an exception to this trend, pointing to a more recent origin of this lineage.

Regarding non-SNP variants, we found that the SV and indel distances correlate to SNP distances (Fig. 2a), which suggests that they were accurately called. Conversely, the CNV and SNP distances were not always correlated (Fig. 2a), which could be attributed to inaccurate definitions of CNV boundaries, probably complicating distance metrics. As expected, SNPs were quantitatively the most common variant type, followed by indels—one order of magnitude less prominent—and then SVs and CNVs at much lower frequencies (Fig. 2b). Despite their lower abundance, SVs and CNVs can affect a significant fraction of protein-coding genes (Fig. 2c), highlighting their relevance. We investigated the mechanisms underlying the formation of SVs and CNVs, and found that most variants are unrelated to repetitive elements or rearrangements derived from homologous recombination (Extended Data Fig. 2 and Methods). This suggests that non-homologous-end-joining DNA repair pathways37,38 could be the main driver of SVs and CNVs in Candida species, consistent with such repair often resulting in rearrangements39. In summary, we found that all variant types are quantitatively important and therefore should not be overlooked in subsequent analyses.

Signatures of recent selection reveal adaptation mechanisms

To infer the signatures of recent clinically relevant selection, we took advantage of the predominance of clinical strains in our collection. We reasoned that recently acquired variants in clinical isolates may be enriched in those acquired in a clinical context and could therefore inform on selective pressures related to adaptation to human-related environments. The standard approach of calculating πN/πS ratios12,31,40,41 is not suitable for our aim for the following reasons. First, we focus on recently acquired variants and πN/πS considers all mutations in a gene, thereby also detecting ancient selection. Second, considering only recent variants poses a statistical challenge to reliably calculate πN/πS, given that many genes have few recent variants and thus a πS of zero. Third, πN/πS cannot be applied to indels, SVs and CNVs, which we deem important.

To overcome these drawbacks, we developed a πN/πS-inspired method that detects genes with an excess of recent functionally relevant variants (non-synonymous SNPs, in-frame indels, gene duplications or truncations; Methods, Fig. 3a and Extended Data Figs. 3,4). Duplications could be SVs or CNVs, and deletions could be nonsense SNPs, frameshifting indels, SVs or CNVs. Note that an excess of deletions in a gene could reflect either positive selection acting on deletions or recent relaxation of purifying selection. To focus on recent variants, we identified monophyletic clusters comprising only clinical strains with high genetic relatedness (Supplementary Fig. 1) and only considered variants inferred to have appeared within the cluster. These clusters probably represent clonally propagating lineages that evolved in human-associated environments (as they are closely related and recurrently isolated from patients), and therefore recent mutations may reflect selective pressures related to adaptation to the host, hospital environments or antifungal drugs. We used these variants to define genes under recent selection as those with an excess of recurrent functionally relevant variants.

We detected several recently selected genes using our approach, belonging to 879/7,499 orthologous groups (OGs, a proxy for gene families) (Fig. 3b and Supplementary Table 2). The low numbers in C. orthopsilosis and C. parapsilosis probably reflect reduced statistical power due to few strains and low intraspecific diversity, respectively. Thus, further sequencing efforts will be needed to fully understand the signs of selection in these species. Most OGs are affected by a single variant type, with few exceptions that suggest complex evolutionary interactions (sometimes antagonistic) among the OG members (Fig. 3b and Supplementary Results). We found several expected genes related to virulence and drug resistance, providing support for the validity of our approach (Fig. 3b and Supplementary Table 2). Some examples include ALS genes in C. albicans (implicated in adhesion and biofilm formation42); TAC1bERG11 and MRR1 in C. auris (related to azole resistance11,21,43,44); PDR1 in C. glabrata (implicated in azole resistance19,45); EPA genes in C. glabrata (related to adhesion32,46); a drug exporter in C. orthopsilosis (gene CORT_0G00240) or filamentous growth proteins in C. tropicalis (genes CTRG_00655 and CTRG_03085). In addition, significant overlaps between these genes and those with recurrent mutations across clonal within-patient serial isolates were observed, providing support for the idea that these genes are often involved in clinical adaptation (Supplementary Results). Furthermore, there were signs of selection on all variant types in most species, suggesting that considering SVs and CNVs is relevant. This gene catalogue constitutes a valuable resource to validate the clinical relevance of evolutionary mechanisms inferred from future non-clinical studies (that is, in vitro evolution19,21, virulence in animal models27 or high-throughput genotype–phenotype screenings47,48).

To understand the similarities in selective processes across species, we screened for OGs with a gene affected by selection in multiple taxa. Only 68/879 such OGs were identified, suggesting that each species has unique signatures of selection (Fig. 3c). Although this could be partly attributed to different sampling criteria and statistical power across taxa, it is consistent with generally different infection mechanisms in each species, which is also reflected in mostly non-overlapping transcriptional profiles on host interactions49,50. However, in many instances the number of overlapping OGs was higher than expected by chance (P < 0.05; Methods, Fig. 3c and Supplementary Table 2), pointing to convergent adaptive mechanisms in Candida pathogens. Relevant example genes within these OGs include ALS genes from C. albicans and C. aurisOPT2 and OPT3 (transporters related to pseudohyphal growth and fluconazole presence) in C. albicans and C. tropicalisMRR1a in C. auris and C. tropicalis (related to drug resistance), FLO8 and MSS11 (related to pseudohyphal growth) in C. glabrata and C. aurisMDS3 (virulence factor) in C. albicans and C. aurisCST6 (associated with azole resistance22) in C. glabrata and C. auris, and WOR4 (related to phenotype switching) in C. albicans and C. auris.

We performed enrichment analyses on functional annotations and identified 1,074 domains, 151 gene ontology (GO) terms, five MetaCyc and three Reactome pathways that were enriched across all gene sets (Fig. 4, Supplementary Fig. 2 and Supplementary Table 2), including hyphal growth, biofilm formation, transcriptional regulation, response to temperature, cell adhesion, carbohydrate metabolism, cell wall and membrane regions (Fig. 4). Most enriched functional groups are unique to a single species (991/1,074 domains, 143/151 GO terms, and all MetaCyc and Reactome pathways), suggesting that each species has unique signatures of recent selection also at the pathway and domain level (Fig. 4). These species-specific enrichments reflect the distinct adaptive paths affecting each of these Candida pathogens (discussed in Supplementary Results). However, there are several convergently affected pathways and domains, which may reflect conserved adaptive mechanisms (Fig. 4 and Supplementary Fig. 2). Relevant examples include a zinc-dependent transcription factor domain in C. tropicalisC. albicans and C. auris; disordered regions in C. tropicalisC. albicans and C. glabrata, and hyphally regulated cell wall proteins in C. tropicalisC. albicans and C. auris (Supplementary Fig. 2). Several GO terms related to adhesion (‘biological process involved in symbiotic interaction’, ‘adhesion of symbiont to host’ and ‘cell–cell adhesion’) were also enriched in genes with selected deletions from C. tropicalisC. albicans and C. glabrata, suggesting recurrent rewiring of these functions (Fig. 4). Further research is needed to associate these functions with possible adaptive advantages. For instance, disordered proteins can generate new traits in yeast51 and the deletion of adhesion genes could modulate host attachment, biofilm formation or immune evasion52,53,54,55, therefore improving survival. In summary, our results suggest hundreds of gene families (about 10% of all families) and pathways under recent selective pressure, often in a single species. This may be explained by the natural niche of these pathogens being massively different to the human host. In addition, we found convergently selected families and pathways that may be at the core of recent adaptation and constitute interesting therapeutic targets. Future experiments should validate these results and pinpoint the most important drivers of recent adaptation….

Sign up for our Newsletter