Molecular Medicine Israel

Science Forum: The Human Cell Atlas

Abstract
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
Introduction
The cell is the fundamental unit of living organisms. Hooke reported the discovery of cells in plants in 1665 (Hooke, 1665) and named them for their resemblance to the cells inhabited by monks, but it took nearly two centuries for biologists to appreciate their central role in biology. Between 1838 and 1855, Schleiden, Schwann, Remak, Virchow and others crystalized an elegant Cell Theory (Harris, 2000), stating that all organisms are composed of one or more cells; that cells are the basic unit of structure and function in life; and that all cells are derived from pre-existing cells (Mazzarello, 1999; Figure 1).
To study human biology, we must know our cells. Human physiology emerges from normal cellular functions and intercellular interactions. Human disease entails the disruption of these processes and may involve aberrant cell types and states, as seen in cancer. Genotypes give rise to organismal phenotypes through the intermediate of cells, because cells are the basic functional units, each regulating its own program of gene expression. Therefore, genetic variants that contribute to disease typically manifest their action through their impact on particular cell types: for example, genetic variants in the IL23R locus increase risk of autoimmune diseases by altering the function of dendritic cells and T-cells (Duerr et al., 2006), and DMD mutations cause muscular dystrophy through specific effects in skeletal muscle cells (Murray et al., 1982).

For more than 150 years, biologists have sought to characterize and classify cells into distinct types based on increasingly detailed descriptions of their properties, including their shape, their location and relationship to other cells within tissues, their biological function, and, more recently, their molecular components. At every step, efforts to catalog cells have been driven by advances in technology. Improvements in light microscopy were obviously critical. So too was the invention of synthetic dyes by chemists (Nagel, 1981), which biologists rapidly found stained cellular components in different ways (Stahnisch, 2015). In pioneering work beginning in 1887, Santiago Ramón y Cajal applied a remarkable staining process discovered by Camillo Golgi to show that the brain is composed of distinct neuronal cells, rather than a continuous syncytium, with stunningly diverse architectures found in specific anatomical regions (Ramón y Cajal, 1995); the pair shared the 1906 Nobel Prize in Physiology or Medicine for their work.

Starting in the 1930s, electron microscopy provided up to 5000-fold higher resolution, making it possible to discover and distinguish cells based on finer structural features. Immunohistochemistry, pioneered in the 1940s (Arthur, 2016) and accelerated by the advent of monoclonal antibodies (Köhler and Milstein, 1975) and Fluorescence-Activated Cell Sorting (FACS; Dittrich and Göhde, 1971; Fulwyler, 1965) in the 1970s, made it possible to detect the presence and levels of specific proteins. This revealed that morphologically indistinguishable cells can vary dramatically at the molecular level and led to exceptionally fine classification systems, for example, of hematopoietic cells, based on cell-surface markers. In the 1980s, Fluorescence in situ Hybridization (FISH; Langer-Safer et al., 1982) enhanced the ability to characterize cells by detecting specific DNA loci and RNA transcripts. Along the way, studies showed that distinct molecular phenotypes typically signify distinct functionalities. Through these remarkable efforts, biologists have achieved an impressive understanding of specific systems, such as the hematopoietic and immune systems (Chao et al., 2008; Jojic et al., 2013; Kim and Lanier, 2013) or the neurons in the retina (Sanes and Masland, 2015).

Despite this progress, our knowledge of cell types remains incomplete. Moreover, current classifications are based on different criteria, such as morphology, molecules and function, which have not always been related to each other. In addition, molecular classification of cells has largely been ad hoc – based on markers discovered by accident or chosen for convenience – rather than systematic and comprehensive. Even less is known about cell states and their relationships during development: the full lineage tree of cells from the single-cell zygote to the adult is only known for the nematode C. elegans, which is transparent and has just ~1000 cells.

At a conceptual level, one challenge is that we lack a rigorous definition of what we mean by the intuitive terms ‘cell type’ and ‘cell state’. Cell type often implies a notion of persistence (e.g., being a hepatic stellate cell or a cerebellar Purkinje cell), while cell state often refers to more transient properties (e.g., being in the G1 phase of the cell cycle or experiencing nutrient deprivation). But the boundaries between these concepts can be blurred, because cells change over time in ways that are far from fully understood. Ultimately, data-driven approaches will likely refine our concepts.

The desirability of having much deeper knowledge about cells has been well recognized for a long time (Brenner, 2010; Eberwine et al., 1992; Shapiro, 2010; Van Gelder et al., 1990). However, only in the past few years has it begun to seem feasible to undertake the kind of systematic, high-resolution characterization of human cells needed to create a comprehensive cell atlas.

The key has been the recent ability to apply genomic profiling approaches to single cells. By ‘genomic approaches’ we mean methods for large-scale profiling of the genome and its products, including DNA sequence, chromatin architecture, RNA transcripts, proteins, and metabolites (Lander, 1996). It has long been appreciated that such methods provide rich and comprehensive descriptions of biological processes. Historically, however, they could only be applied to bulk tissue samples composed of an ensemble of many cells, providing average genomic measures for a sample but masking the differences between individual cells. The result is as unsatisfying as trying to understand New York, London or Mumbai based on the average properties of their inhabitants.
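The averaging problem can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the expression values, the threshold, and the function names `bulk_measure` and `split_by_threshold` are hypothetical and do not come from any real dataset or profiling pipeline. It shows that a sample with two very different subpopulations and a sample of identical cells can yield the same bulk average, while per-cell values keep them apart.

```python
from statistics import mean

# Hypothetical expression levels (arbitrary units) of one gene in six cells.
# Mixed sample: three cells express the gene strongly, three not at all.
mixed_sample = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0]
# Uniform sample: every cell expresses the gene at the same moderate level.
uniform_sample = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]


def bulk_measure(values):
    """Mimic bulk profiling: report one average over all cells in the sample."""
    return mean(values)


def split_by_threshold(values, threshold):
    """Mimic single-cell profiling: keep per-cell values and separate
    cells at or above the (hypothetical) threshold from those below it."""
    high = [v for v in values if v >= threshold]
    low = [v for v in values if v < threshold]
    return high, low


# Both samples look identical when only the average is measured.
print(bulk_measure(mixed_sample), bulk_measure(uniform_sample))

# Per-cell values reveal that the mixed sample contains two subpopulations.
high, low = split_by_threshold(mixed_sample, threshold=5.0)
print(len(high), len(low))
```

Real single-cell data are far noisier and involve thousands of genes per cell, so grouping cells in practice relies on statistical clustering rather than a single fixed threshold.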

The first single-cell genomic characterization method to become feasible at large scale is transcriptome analysis by single-cell RNA-seq (Box 1; Hashimshony et al., 2012; Jaitin et al., 2014; Picelli et al., 2013; Ramsköld et al., 2012; Shalek et al., 2013). Initial efforts first used microarrays and then RNA-seq to profile RNA from small numbers of single cells, which were obtained either by manual picking from in situ fixed tissue, using flow-sorting or, later on, with microfluidic devices, adapted from devices developed initially for qPCR-based approaches (Crino et al., 1996; Dalerba et al., 2011; Marcus et al., 2006; Miyashiro et al., 1994; Zhong et al., 2008). Now, massively parallel assays can process tens to hundreds of thousands of single cells simultaneously to measure their transcriptional profiles at rapidly decreasing costs (Klein et al., 2015; Macosko et al., 2015; Shekhar et al., 2016) with increasing accuracy and sensitivity (Svensson et al., 2017; Ziegenhain et al., 2017). In some cases, it is even possible to register these sorted cells to their spatial positions in images (Vickovic et al., 2016). Single-cell RNA sequencing (scRNA-seq) is rapidly becoming widely disseminated…
