Abstract
X chromosome inactivation (XCI) generates clonal heterogeneity within XX individuals. Combined with sequence variation between human X chromosomes, XCI gives rise to intra-individual clonal diversity, whereby two sets of clones express mutually exclusive sequence variants present on one or the other X chromosome. Here we ask whether such clones merely co-exist or potentially interact with each other to modulate the contribution of X-linked diversity to organismal development. Focusing on X-linked coding variation in the human STAG2 gene, we show that Stag2variant clones contribute to most tissues at the expected frequencies but fail to form lymphocytes in Stag2WT Stag2variant mouse models. Unexpectedly, the absence of Stag2variant clones from the lymphoid compartment is due not solely to cell-intrinsic defects but requires continuous competition by Stag2WT clones. These findings show that interactions between epigenetically diverse clones can operate in an XX individual to shape the contribution of X-linked genetic diversity in a cell-type-specific manner.
Main
Eutherian mammals such as humans and mice compensate for differences in X-linked gene dosage between males and females by X chromosome inactivation1 (XCI; Fig. 1a). In XX embryos, each cell randomly chooses one of its two X chromosomes for inactivation, which results in the silencing of the majority of genes on that chromosome1,2,3,4. XX embryos therefore resemble mixtures of clones expressing genes from either their maternal or paternal X chromosome. The identities of the active (Xa) and inactive (Xi) X chromosomes are clonally propagated through organismal development by epigenetic mechanisms5,6. Hence, XX individuals are clonally heterogeneous as a result of XCI and its propagation.
Human populations show extensive genetic diversity, including single-nucleotide polymorphisms7 (SNPs), which occur at comparable frequencies on autosomes and X chromosomes8 (Supplementary Table 1). The human X chromosome harbors >600 protein-coding genes annotated in OMIM, the Online Catalog of Human Genes and Genetic Disorders9. Together, these genes contain ~400k nonsynonymous SNPs that change their coding potential10, indicating extensive variation between human X chromosomes. This variation, combined with XCI and its epigenetic propagation, gives rise to intra-individual clonal diversity in XX individuals.
Given that X-linked intra-individual diversity is widespread among XX individuals, it is of interest to consider its potential significance for organismal development. What is known so far is that stochastic and selective processes can affect the deployment of intra-individual clonal diversity.
Stochastic X-linked bias can arise from sampling errors early in embryonic development, when founder cells are allocated to the three germ layers (ectoderm, endoderm and mesoderm), and can be further amplified by the allocation of cells to particular fates within each germ layer4 (Extended Data Fig. 1a). The resulting bias has been exploited to estimate the number of founder cells for cell types and tissues in embryonic development4 and the number of hematopoietic stem cells (HSCs) that contribute to the regeneration of blood cells in later life11.
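The link between sampling error and founder cell number can be illustrated with a minimal sketch. It assumes a simple binomial model in which each of N founder cells independently keeps the maternal X active with probability p = 0.5, so that the variance of the maternal-Xa fraction across individuals is p(1 − p)/N. The function names and the simulated cohort are hypothetical, and the estimators used in the cited studies may differ.

```python
import random
import statistics


def simulate_xa_fractions(n_founders, n_individuals, p=0.5, seed=0):
    """Simulate the maternal-Xa fraction in each individual's founder pool.

    Assumption: every founder cell chooses its active X independently.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_individuals):
        maternal = sum(rng.random() < p for _ in range(n_founders))
        fractions.append(maternal / n_founders)
    return fractions


def estimate_founders(fractions, p=0.5):
    """Invert Var(fraction) = p(1 - p) / N to estimate N."""
    variance = statistics.pvariance(fractions)
    if variance == 0:
        raise ValueError("no variation between individuals; N is not identifiable")
    return p * (1 - p) / variance


# Hypothetical cohort: 5,000 individuals, each derived from 16 founder cells.
fractions = simulate_xa_fractions(n_founders=16, n_individuals=5000)
estimate = estimate_founders(fractions)
print(f"estimated founder cells: {estimate:.1f}")
```

Under this model, a smaller founder pool produces wider variation in X chromosome usage between individuals, which is why the observed spread is informative about N.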
A distinct form of X-linked bias arises from clonal selection against deleterious genetic variants that compromise the ability of variant-expressing clones to expand or survive in a cell-intrinsic fashion (Extended Data Fig. 1b). Clonal selection results in the dominance of clones that have inactivated the X chromosome harboring the deleterious variant and is relevant in the context of human disease, where intra-individual clonal diversity can mean a more favorable outcome in XX than XY individuals2,12.
Here we ask a different question, namely whether epigenetically diverse clones, which arise from the combined effect of XCI and X-linked genetic variation, merely co-exist in XX individuals, or whether they interact, and, if so, how such interactions may shape the landscape of X-linked clonal diversity. To this end, we generate mouse models of X-linked genetic variation found in the human STAG2 gene and uncover a non-cell-autonomous mode of X-linked bias that is distinct from stochastic variation and selection against deleterious variants. We find that clones expressing Stag2 variants fail to adopt a lymphoid fate in the presence of competitor clones that have silenced the variant allele by XCI. Unexpectedly, however, the absence of competitors expressing wild-type (WT) Stag2 restored the full range of cell fate choices to clones expressing Stag2 variants. Our observations reveal that clonal interactions have the potential to shape the contribution of X-linked genetic diversity to specific cell types and tissues in XX individuals.
Results
Sequence variation and XCI combine to generate intra-individual genetic diversity
Analysis of 3,775 X chromosomes across 2,504 individuals from phase 3 of the 1000 Genomes Project13 found 13,796 nonsynonymous SNPs (SNPs that alter the amino acid sequence of proteins encoded on the X chromosome). The average number of such missense variants between any two X chromosomes was 138 (minimum = 3 and maximum = 232), omitting genes that escape X-inactivation in humans3,4. Ninety percent of X chromosome pairs harbored at least 101 missense variants. This analysis shows that sequence variation has the potential to generate intra-individual diversity in XX individuals when combined with XCI and its clonal propagation (Fig. 1a,b).
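The pairwise comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for the analysis: each X chromosome is represented as a set of (gene, amino acid change) pairs, variants in escape genes are dropped, and the number of differences between two chromosomes is the size of the symmetric difference. All gene names other than STAG2 and all toy variants are hypothetical. The helper `count_sex_karyotypes` shows that the reported totals are consistent with 1,271 XX and 1,233 XY individuals, assuming only these two karyotypes are present.

```python
from itertools import combinations
from statistics import mean


def count_sex_karyotypes(n_individuals, n_x_chromosomes):
    """Solve xy + xx = n_individuals and xy + 2 * xx = n_x_chromosomes."""
    xx = n_x_chromosomes - n_individuals
    xy = n_individuals - xx
    return xx, xy


def pairwise_missense_differences(haplotypes, escape_genes=frozenset()):
    """Count missense variants that differ between every pair of X chromosomes."""
    kept = {
        name: {(gene, change) for gene, change in variants if gene not in escape_genes}
        for name, variants in haplotypes.items()
    }
    # The symmetric difference holds variants present on exactly one chromosome.
    return [len(kept[a] ^ kept[b]) for a, b in combinations(sorted(kept), 2)]


# Toy haplotypes (hypothetical data for illustration only).
toy_haplotypes = {
    "X_a": {("STAG2", "R370P"), ("GENE1", "A10V")},
    "X_b": {("GENE1", "A10V"), ("GENE2", "S5L")},
    "X_c": {("GENE1", "A10V"), ("ESCAPEE1", "G7D")},
}
diffs = pairwise_missense_differences(toy_haplotypes, escape_genes={"ESCAPEE1"})
print(diffs, mean(diffs), min(diffs), max(diffs))
print(count_sex_karyotypes(2504, 3775))
```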
Sequence variants in the X-linked STAG2 gene disrupt cohesin–CTCF binding
STAG2 is an essential X-linked gene that is evolutionarily highly conserved14 (Fig. 1c and Extended Data Fig. 2) and encodes a subunit of cohesin, a protein complex that contributes to 3D genome organization as well as DNA replication, DNA repair and the stable propagation of chromosomes through cell division15. A survey of 125,748 human exomes10 (gnomAD v2.1) found that STAG2 coding variation was lower than predicted by chance, indicating a level of constraint expected for an essential gene (Fig. 1c,d). Nevertheless, >150 distinct missense variants were observed (Fig. 1c and Extended Data Fig. 2). We focused on gnomAD variant X-123185062-G-C (GRCh37) found in HG02885, an XX individual of African origin who self-reported as healthy and participated with her husband and daughter in the control (nondisease) cohort of gnomAD v2.1.1. This SNP changes STAG2 arginine 370 to proline (R370P). STAG2 R370 contributes to an interaction interface that is formed jointly by the cohesin subunits STAG1/STAG2 and RAD21 (Fig. 1e). This interface has been described as a ‘conserved essential surface’ and is bound by the following cohesin-interacting proteins that are engaged in a range of DNA-based processes: CTCF in 3D genome organization16 (Fig. 1e), Shugoshin in sister chromatid cohesion17,18, MCM3 (minichromosome maintenance protein 3) in DNA replication19 and likely other cohesin interaction partners20. We used isothermal titration calorimetry to assess the impact of STAG2R370P on cohesin–CTCF interactions and found a complete loss of binding (Fig. 1f). Hence, sequence variation in the X-linked STAG2 gene illustrates the potential for clonal heterogeneity within XX individuals.
Stag2 variant progenitors fail to form lymphocytes in heterozygous XX individuals
To explore the impact of X-linked sequence variation at the organismal level, we generated mouse models of Stag2 variants in the conserved essential surface between STAG2 and CTCF (Fig. 1e). Stag2R370Q had a tenfold lower CTCF binding affinity than WT (Fig. 1f). A second variant, Stag2W334A, abolished the STAG2–CTCF interaction to the same extent as the human R370P variant (Fig. 1f). As expected16, STAG2–CTCF interface variants retained the ability to form DNA-bound cohesin complexes (Extended Data Fig. 3c). Stag2R370Q and Stag2W334A variants showed equivalent phenotypes and are therefore described together.
WT and variant Stag2 were equally represented in genomic DNA (gDNA) from heterozygous XStag2-WT and XStag2-variant female mice, as illustrated for gDNA from blood (Fig. 2a, left). An equivalent representation of Stag2WT and Stag2variant genomic sequences was expected, as the presence of gDNA is unaffected by the epigenetic inactivation of one X chromosome in XX individuals1. We next analyzed a range of cell types and tissues in heterozygous female mice to determine the contribution of clones in which the active X chromosome harbored the Stag2WT allele (Stag2WT clones) versus clones in which the active X chromosome harbored the Stag2variant allele (Stag2variant clones). We isolated RNA, reverse-transcribed it into complementary DNA (cDNA) and sequenced the cDNA. Brain, gut and other tissues showed a roughly equal representation of Stag2WT and Stag2variant clones (Fig. 2a), while skewing toward Stag2WT clones was found in skeletal muscle (Fig. 2a). cDNA isolated from peripheral blood mononuclear cells showed a markedly reduced expression of variant Stag2 (Fig. 2a and Extended Data Fig. 4a,b), indicating that Stag2variant clones were underrepresented in blood.
To quantify the contribution of Stag2variant versus Stag2WT clones, we used allele-specific qRT–PCR (see Extended Data Fig. 4c for calibration). This analysis confirmed reduced representation of Stag2variant clones in blood mononuclear cells (Fig. 2b) and in skeletal muscle and revealed increased representation of Stag2variant clones in the heart (Fig. 2b and variants are shown separately in Extended Data Fig. 4d).
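One way to turn allele-specific qRT–PCR readings into clone fractions is sketched below. This is a hedged illustration, not the procedure behind Extended Data Fig. 4c: it assumes that standards with known variant fractions are measured, that ΔCt (Ct of the variant-specific reaction minus Ct of the WT-specific reaction) is linear in log2 of the variant:WT ratio, and that the fitted line is then inverted for unknown samples. The standard values are idealized (perfect amplification efficiency) and all names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Calibration standards: known variant fractions in mixed templates.
standard_fractions = [0.1, 0.25, 0.5, 0.75, 0.9]
log2_ratios = [math.log2(f / (1 - f)) for f in standard_fractions]
# Idealized readings: one cycle of delta-Ct per twofold change in ratio.
standard_delta_ct = [-x for x in log2_ratios]
slope, intercept = fit_line(log2_ratios, standard_delta_ct)


def variant_fraction(delta_ct):
    """Convert a sample's delta-Ct into the fraction of variant transcripts."""
    log2_ratio = (delta_ct - intercept) / slope
    ratio = 2.0 ** log2_ratio  # variant : WT
    return ratio / (1.0 + ratio)


print(round(variant_fraction(0.0), 3))           # equal WT and variant signal
print(round(variant_fraction(math.log2(9)), 3))  # variant signal delayed
```

With real standards, the fitted slope and intercept absorb primer-specific differences in amplification efficiency, which is the purpose of the calibration step.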
T and B lymphocytes are the major mononuclear cell types in blood. CD4 T and B cells isolated from lymph nodes of Stag2variant Stag2WT heterozygous females (Fig. 2c(i) and gating strategy in Extended Data Fig. 4e) showed a near-complete absence of Stag2variant clones as determined by sequencing (Fig. 2c(ii)) and allele-specific qRT–PCR (Fig. 2c(iii)). We developed a reporter system to directly visualize individual cells expressing Stag2variant or Stag2WT by inserting a Luc/βGal reporter construct21,22 into the X-linked Atrx gene, which is subject to XCI and broadly expressed across cell types and tissues, including the hematopoietic system23. AtrxLuc/βGal allows the visualization and prospective isolation of live AtrxLuc/βGal cells by flow cytometry, based on the conversion of nonfluorescent fluorescein di-β-D-galactopyranoside (FDG) into green fluorescent fluorescein isothiocyanate (FITC) by the enzymatic activity of β-galactosidase (βGal). We confirmed that FDG conversion was indeed dependent on the presence of the AtrxLuc/βGal reporter (Extended Data Fig. 5a–c). In female mice that were heterozygous for the AtrxLuc/βGal reporter and had two WT alleles of Stag2, FDG to FITC conversion occurred in approximately half of all T and B lymphocytes (Fig. 2c(iv), top) and other hematopoietic cell types examined (Extended Data Fig. 5a–c). This indicates that the reporter itself does not substantially skew X chromosome usage. Sanger sequencing and allele-specific qRT–PCR confirmed the fidelity of the reporter, as well as the monoallelic expression of Stag2 in XX individuals (Extended Data Fig. 5d). In lymphocytes isolated from Stag2WT Stag2variant AtrxLuc/βGal heterozygous females, Stag2WT clones dominated over Stag2variant AtrxLuc/βGal clones (Fig. 2c(iv), bottom, and Extended Data Fig. 5c). 
Taken together with the sequencing and allele-specific qRT–PCR data, these results indicate that Stag2variant clones fail to contribute substantially to mature T and B lymphocytes in Stag2WT Stag2variant heterozygous females.
Blood cells are continuously replenished by hematopoietic stem and progenitor cells11 (Fig. 2d), allowing the developmental origin of skewed X chromosome usage to be traced. T cell fate specification of bone marrow-derived progenitors occurs in the thymus, and we therefore examined the representation of Stag2variant clones among thymocyte subsets at successive stages of development (Fig. 2e(i) and gating strategy in Extended Data Fig. 4e). Sequencing (Fig. 2e(ii)), allele-specific qRT–PCR (Fig. 2e(iii)) and FDG labeling of Stag2WT Stag2variant AtrxLuc/βGal thymocytes (Fig. 2e(iv) and Extended Data Fig. 5c) showed that Stag2variant clones were barely detectable among developing T cells. Thymocyte differentiation of Stag2variant clones was not rescued by provision of rearranged lymphocyte receptor transgenes (Extended Data Fig. 6). Stag2variant clones were also absent from developing pro-B and pre-B cells in the bone marrow (Extended Data Fig. 7).
We next examined the representation of variant Stag2 RNA in hematopoietic stem (LSK), c-kit+ and common lymphoid progenitor (CLP) cells isolated from the bone marrow of heterozygous Stag2WT Stag2variant female mice (Fig. 2f(i) and gating strategy in Extended Data Fig. 4e). Sequencing (Fig. 2f(ii)), allele-specific qRT–PCR (Fig. 2f(iii)) and FDG labeling (Fig. 2f(iv) and Extended Data Fig. 5b) revealed skewing against Stag2variant clones in hematopoietic stem and progenitor cells. In contrast to lymphocytes, the representation of Stag2variant clones among mature myeloid cells remained comparable to that among hematopoietic stem and progenitor cells (Extended Data Fig. 7).
In conclusion, the hematopoietic system of Stag2WT Stag2variant heterozygous individuals appeared outwardly normal with respect to the number and composition of cell types in bone marrow, thymus and peripheral lymph nodes. However, the clonal composition of the hematopoietic system was skewed toward Stag2WT clones, and few, if any, Stag2variant clones contributed to immature and mature lymphocyte subsets. These findings suggested that hematopoietic progenitors with an active X chromosome harboring Stag2 variants were unable to undergo lymphoid specification and differentiation.
Reduced lymphoid priming in Stag2 variant hematopoietic progenitors
We isolated lineage-negative, c-kit+ Stag2WT and Stag2variant cells from the bone marrow of heterozygous females for single-cell RNA-sequencing (scRNA-seq; Fig. 3a, Extended Data Fig. 8a and gating strategy in Extended Data Fig. 4e) and identified progenitors based on established marker genes (Supplementary Data 1). DESeq2 found 1,600 upregulated and 802 downregulated genes in Stag2variant progenitors (adjusted P < 0.01; Fig. 3b and representative gene ontology terms in Extended Data Fig. 8b). As STAG2 is part of the cohesin complex, we analyzed the relationship between cohesin binding and deregulated gene expression in Stag2variant progenitors. Leveraging cohesin chromatin immunoprecipitation followed by sequencing (ChIP–seq) from hematopoietic progenitors, we found that genes that were deregulated in Stag2variant progenitors were highly enriched for cohesin promoter binding compared to non-deregulated genes (Extended Data Fig. 8c), which links transcriptional deregulation in Stag2variant cells to cohesin….