Before we understood that DNA was the genetic code, scientists knew that bacteria transferred it between cells. In 1928, 25 years before the structure of DNA was solved, British bacteriologist Frederick Griffith demonstrated that live, nonvirulent bacteria could transform into virulent microbes after being incubated with a heat-killed virulent strain. Fifteen years later, a trio of researchers at the Rockefeller Institute for Medical Research (now Rockefeller University), Oswald Avery, Colin MacLeod, and Maclyn McCarty, demonstrated that this transformation was mediated by DNA. Even dead bacteria, it seemed, could share their genes.
This DNA-sharing process, known as horizontal or lateral gene transfer (LGT), is now understood to occur by the direct movement of DNA between two organisms. Almost all bacterial genomes show evidence of past LGT events, and the phenomenon is known to have profound effects on microbial biology, from spreading antibiotic resistance genes to creating new pathways for degrading chemicals. But LGT is not limited to bacteria. Scientists now recognize that microbes transfer DNA to the plants, fungi, and animals they infect or reside in, and conversely, human long interspersed elements (LINEs) have been found in bacterial genomes. Moreover, researchers have documented LGT from fungi to insects and from algae to sea slugs. There is reason to believe that any two major groups of organisms—including humans—can share their genetic codes.
People have long been intrigued by the prospect of foreign DNA within our own genomes. Human genomes harbor evidence of beneficial LGTs from bacteria in the recent past, and there is evidence that transfers may occur regularly between resident bacteria and somatic cells of the body. How commonly bacteria-animal LGT occurs is unclear, as are the mechanisms of these transfers. But if LGTs induce harmful mutations, they may be an unrecognized cause of disease.
Gene swap
Bacteria are a genomically promiscuous bunch. They do not reproduce sexually but are among the most genetically varied organisms because they are constantly exchanging bits of their genetic code via LGT. Their diversity has allowed them to adapt to every ecological niche on the planet, from deep-sea hydrothermal vents to the frozen lakes of Antarctica, from rock crevices to our own intestines. LGT between bacteria has been categorized as transformation by free DNA (genetic material is released into the environment by bacteria and taken up by living microbes, as in Griffith’s experiment), transduction by viruses, and direct cell-cell transfer through conjugation.
The mechanisms of transfer from bacteria to other organisms are less clear, but are likely similar. Bacteria’s type IV secretion system is a syringe-like protein complex known to inject molecules from bacteria into their host cell through cell-cell contact. It is an important mediator of LGT between Agrobacterium and plants in the wild, as well as in the lab, where it can be used to create genetically modified crops and can even mediate transfer between Agrobacterium and human cells. Using whole genome sequencing, researchers have found that the genomes of numerous insects and nematode worms sometimes contain DNA from microbes inhabiting or infecting their bodies. Some species contain vast arrays of Wolbachia endosymbiont DNA, for example, in some cases amounting to many complete copies of the bacterial genome. (See illustration.)
These large LGTs can be nearly identical in sequence to the endosymbiont genome, suggesting that they happened quite recently. Some insect species carry remnants of much older gene transfers that were beneficial to the recipient species and have been selected for over time. The coffee berry borer, for example, coopted a bacterial mannanase gene that allows it to eat coffee berries.1 Coopted bacterial mannanase genes may also underlie crop destruction caused by the invasive brown marmorated stink bug.2 And aphids synthesize their own carotenoids using genes transferred from fungi to produce a colorful appearance important to defense.3 As more examples of LGT among diverse organisms crop up in the literature, it’s only natural to focus on the human angle. Does it occur in us, and if so, how often, and what are the consequences?
LGT in humans
The extent and importance of LGT in vertebrate animals are less clear, in part because fewer vertebrate genomes than invertebrate genomes have been sequenced or analyzed with suitable methods. The human genome, one of the most extensively studied vertebrate genomes, has yielded solid evidence of ancient LGT events.
In 2001, the first draft sequence of the human genome was suggested to have 223 LGT-derived regions that were not present in other species’ genomes that had been sequenced at that time.4 Some researchers quickly disputed this number as an overestimate, even suggesting that all of the proposed LGTs were more likely explained through alternative mechanisms such as gene loss or convergent evolution.5 A new analysis published last year by Alastair Crisp of the University of Cambridge and colleagues found more than 130 traces of possible LGT events in the human genome—including the presence of fungal hyaluronan synthases, a fat mass and obesity associated gene (FTO), and the gene responsible for blood types (ABO). But most, if not all, of the identified events predate the human and primate lineages and were identified because the researchers chose to no longer limit the results to LGTs that exist only in humans and not in other animal species.6
In order for a nonhuman gene to appear in the genomes of many people, however, the LGT needs to occur in the germline so that it can be passed to future generations; and it has to confer some benefit to the host. Such LGTs may be rare, because humans may not experience strong selection for new functions in our genome, and because our germ cells are thought to be protected from other organisms and their DNA. However, LGT might be possible in the somatic human genome; such insertional mutations would be very difficult to detect, though, without sequencing large numbers of human cells.
Once they are present in the human somatic genome, it’s not hard to imagine how LGT insertions could cause disease. In fact, while definitive evidence of recent LGT in humans is still lacking, there are other types of DNA transfer that are well known to negatively impact humans. For example, human papillomavirus (HPV) is the cause of 80 percent to 100 percent of cervical cancers. The virus can integrate into the chromosomes of cervical cells, and if the integration is incomplete, certain HPV proteins can become unregulated, leading to disruption of apoptosis, an increase in cell proliferation, and ultimately cancer. Likewise, hepatitis B virus (HBV) causes hepatocellular cancer and has been found to insert its DNA into infected hepatocytes as the cells regenerate. HBV recurrently integrates its viral enhancer gene and its core gene into cancer-related genes, causing increased cell growth and survival, two hallmarks of cancer.7
Given the known risk of such integrations, we have focused on identifying LGT of bacterial DNA in the human genome. We knew that we needed to look at data from a large number of individuals, so we relied on publicly available human sequence data from the original public and private human genome projects and the 1000 Genomes Project. We quickly realized that if an LGT happened in a terminally differentiated cell that no longer replicates its DNA, it would exist in only one copy, and we would never be able to distinguish it from noise during sequencing. So we turned to tumors. We figured that, should an insertion occur in a progenitor cell of the tumor, it should be propagated in the tumor and be detected multiple times.
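The reasoning about tumors can be made concrete with a small sketch. It is not the pipeline used in the study; it assumes a hypothetical input in which each chimeric read pair (one mate mapping to the human genome, the other to a bacterium) has already been reduced to a tuple of chromosome, position, and bacterial taxon. The function name, the 500-base window, and the support threshold of three are illustrative choices only. The point is that an integration present in a single non-dividing cell yields at most one supporting read pair, whereas one propagated through a tumor yields several at the same site.

```python
from collections import Counter

def recurrent_junctions(chimeric_pairs, window=500, min_support=3):
    """Keep human/bacterial junctions supported by several read pairs.

    chimeric_pairs: iterable of (human_chrom, human_pos, bacterial_taxon),
    a hypothetical pre-parsed format, one tuple per chimeric read pair.
    Positions are binned into fixed windows so nearby reads count together.
    """
    counts = Counter(
        (chrom, pos // window, taxon) for chrom, pos, taxon in chimeric_pairs
    )
    # A single supporting pair cannot be told apart from sequencing noise.
    return {key: n for key, n in counts.items() if n >= min_support}

pairs = [
    ("chr1", 1000, "Pseudomonas"),
    ("chr1", 1100, "Pseudomonas"),
    ("chr1", 1200, "Pseudomonas"),
    ("chr7", 50, "Acinetobacter"),  # seen once: dropped as likely noise
]
print(recurrent_junctions(pairs))
```

Fixed windows are a simplification; reads that straddle a window boundary would be split between two bins, so a real analysis would cluster positions instead.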
We analyzed genome sequence data from nine different tumor types from Cancer Genome Atlas projects and used bioinformatics tools to identify potential DNA integrations. In results published in 2013, we found sequences from Acinetobacter species in acute myeloid leukemia (AML) samples and from Pseudomonas species in stomach adenocarcinoma (STAD) samples. There were recurrent insertions in cancer-related genes in the STAD samples.8
In both the AML and STAD cancer samples, we only identified evidence of bacterial 16S and 23S rRNA fragments integrating into the human genome. Karsten Sieber, then a graduate student in the Dunning Hotopp lab, created models of the STAD integrations in cancer-related genes and observed that these pieces of the rRNA genes contain secondary structures that form numerous stem-loops, or hairpin loops. These integrations occur in the 5’-untranslated region (5’-UTR) of the cancer-related genes, meaning that they are transcribed but not translated. The stem-loops predicted in the inserted rRNA gene fragments could alter the secondary structures of the transcripts, thereby disrupting transcription and/or translation. We also noticed that the putative STAD integrations occur in G-rich regions of the cancer-related genes, which can also be important for gene regulation.9
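As a rough illustration of the two sequence features described here, the toy sketch below scans a DNA sequence for perfect stem-loop candidates and reports the fraction of G bases. It is a deliberately simplified stand-in, not the modeling that was actually performed: real secondary-structure prediction relies on thermodynamic folding software and allows mismatches and G-U pairs. The function names and the stem and loop lengths are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length) for each perfect stem-loop candidate.

    A candidate is a stretch of `stem` bases followed, after a loop of
    min_loop..max_loop bases, by its exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the would-be pairing arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits

def g_fraction(seq):
    """Fraction of bases that are G; 0.0 for an empty sequence."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0

print(find_hairpins("GGGCAAAAGCCC"))  # [(0, 4)]
print(g_fraction("GGAT"))             # 0.5
```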
Ying Xu at the University of Georgia has also identified LGT events in human tumors. His team looked for evidence of genetic material from Helicobacter pylori bacteria and the Epstein-Barr virus, both of which have been associated with gastric cancer. The researchers identified H. pylori integrations in 36 genes in the gastric samples, with more integrations present in the tumors relative to controls.10 Chronic infection with H. pylori can cause double-strand DNA breaks, and human cells may “heal” these double-strand breaks by inserting pieces of stray DNA. Often this is nuclear or mitochondrial DNA, but if bacterial DNA is present, including H. pylori DNA, it could become integrated. These integrations could therefore be a side effect, rather than a cause, of cancer.
It’s unclear how bacterial DNA evades the human immune system, which recognizes most forms of nucleic acids. But these studies suggest that LGT events can and do occur in human tissues, perhaps with devastating consequences. So far, these are the only reported cases of bacterial DNA integrations in human cancers. Whether such events cause cancer, and if so, how commonly, remains to be seen.
A steep road ahead
A number of challenges face researchers hoping to assess the presence and impact of bacterial DNA integrations in the genomes of human cells. A thorough study of this sort is still expensive, and after the samples have been sequenced it takes significant resources to develop, implement, and run a computational tool to identify LGT.
Contamination also remains a barrier. This was an issue in an analysis of the human genome in 2001, and it continues to be a problem today. DNA extraction kits have been shown to contain bacterial nucleic acids. Contaminants can also be introduced via sample handling, from reagents, and during sequencing. During the process of creating the DNA library to be sequenced, chimeras that look like bacterial DNA integrations might form. Sure enough, earlier this year, researchers found that the extent of LGT in the tardigrade genome was initially overestimated; some proposed LGTs likely arose from the genomes of bacterial contaminants and not from the tardigrade genome itself.11
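One way to guard against these artifacts can be sketched in a few lines. The filter below is a hypothetical heuristic, not a published method: it assumes each candidate integration is recorded with the bacterial taxon involved and the set of sequencing libraries in which it was seen, and that a list of taxa detected in reagent-only controls is available. The field names and the two-library threshold are illustrative.

```python
def drop_likely_contaminants(candidates, control_taxa, min_libraries=2):
    """Filter candidate integrations using two simple heuristics.

    candidates: list of dicts with keys 'taxon' (str) and 'libraries'
    (set of library IDs), a hypothetical record format.
    control_taxa: set of taxa observed in reagent-only control samples.
    """
    kept = []
    for cand in candidates:
        if cand["taxon"] in control_taxa:
            continue  # taxon also found in blank controls: likely reagent contamination
        if len(cand["libraries"]) < min_libraries:
            continue  # seen in one library only: could be a library-prep chimera
        kept.append(cand)
    return kept

candidates = [
    {"taxon": "Pseudomonas", "libraries": {"lib1", "lib2"}},
    {"taxon": "Ralstonia", "libraries": {"lib1", "lib2"}},
    {"taxon": "Acinetobacter", "libraries": {"lib3"}},
]
print(drop_likely_contaminants(candidates, control_taxa={"Ralstonia"}))
```

Passing such a filter does not prove an integration is real; it only removes the candidates that are easiest to explain as artifacts.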
Despite skepticism from some corners of the scientific community and the difficulties of studying bacterial DNA integrations, we believe that LGTs are an important form of insertional mutagenesis. Perhaps now that putative bacterial DNA integrations have been identified in cancer, more researchers will look for these mutations in other diseases. A bacterial DNA integration that occurs in a human cell and leads to the expression of a bacterial compound recognized by the human immune system has the potential to trigger autoimmune disease, for example. Further research on the occurrence and consequences of LGT in human cells will likely reveal the phenomenon to be much more common and important than currently appreciated.