Symbiosis with Ancient Viruses Critical for Human Development

Thu, 12/03/2015 - 8:54am
Cynthia Fox, Science Writer

Human blastocysts don't develop when ancient virus RNA, "trapped" in our DNA for millions of years, is artificially blocked. (Image: Wikimedia)
In recent years, we have come to understand that humans are products not just of Darwinian natural selection but also of symbiosis. For two billion years, there were only bacteria and archaea. Then a single archaeon engulfed a bacterium in such a way that the bacterium became its power pack. Complex life exploded out of this symbiosis.

Remnants of that moment are alive in humans today: experimental and genetic analyses show that the power packs of our cells, the mitochondria, are indeed descendants of those ancient bacteria.

This week, a Stanford University team reported another key symbiosis finding in Nature Genetics: human embryos need RNA from ancient viruses, trapped in the non-protein-coding regions of our genomes, in order to develop. These viral remnants are essential for our existence.

Those ancient viral sequences “were acquired in the primate lineage, some millions of years ago, by infection/insertion into the germ-cell lineage that gives rise to eggs and sperm,” developmental biologist Renee Reijo Pera, Ph.D., told Bioscience Technology. Reijo Pera, co-senior author on the study, is now vice president for research and economic development at Montana State University. “Our development without them would have been fundamentally different. Different species likely all use their own sequences. To the best of our knowledge, there are no other data to show that single non-coding, human-specific, retrovirally derived genes are essential for timing or cell fate decisions in human development. We were very surprised by the results.”

Harvard University/Massachusetts Institute of Technology geneticist John Rinn, Ph.D., told Bioscience Technology that function had previously been linked to some retroviral RNA elements in isolated stem cells and in induced pluripotency. But he agreed that, until now, no seminal role had been found for those elements in actual human development: “This manuscript makes significant progress in understanding the functional roles of ERV-lncRNA [endogenous retroviral long non-coding RNA].” Rinn was uninvolved in the work.

“The paper is interesting because it is one of the first to probe the role of long non-coding RNA— a relatively poorly understood class of regulatory RNAs— in early human development and pluripotency,” Harvard stem cell researcher George Daley, M.D., Ph.D., told Bioscience Technology. Daley, director of the Dana-Farber Cancer Institute/Boston Children’s Hospital Stem Cell Transplantation Program, made some early lncRNA findings with Rinn. He was also uninvolved in the new work.

“Quite a nice paper, bringing the field quite a step ahead,” Riken Center for Life Sciences geneticist Piero Carninci, Ph.D., told Bioscience Technology. Carninci, also uninvolved in the work, said that while some researchers saw this coming, most did not: “Many colleagues still see ‘danger’ and ‘DNA damage’ as the main, and possibly only, role for retroviral elements.”

Renee Reijo Pera, Ph.D.
The quest

Reijo Pera and Stanford co-senior author Vittorio Sebastiano, Ph.D., wanted to study the genetics of embryonic stem (ES) cell pluripotency, or the ability to form all cell types. The team isolated all genes that were highly expressed, or active, in human ES cells, including genes in the non-coding regions. The team knew that a class of non-coding RNA molecules called long intergenic noncoding RNAs (lincRNAs) had recently been shown to be active in many biological processes, including the acquisition of pluripotency. Transcribed from the so-called “junk DNA” regions of the genome, these RNA molecules do not make proteins, but they can affect the expression (the “on/off” state) of protein-coding genes.

It has been difficult to characterize these non-protein-coding genes, since so many lincRNAs contain similar, repetitive regions. But using their new RNA sequencing technology, the team pulled it off. They found more than 2,000 previously unknown RNA sequences in human ES cells, 146 of which were expressed only in ES cells. From those 146, they selected the 23 most highly expressed.

Unexpectedly, they found all 23 were retroviral elements.

The team named these HPAT1-23 and discovered that 13 of them derived from HERV-H retroviruses. Retroviruses, like HIV, spread by infecting cells and co-opting their DNA machinery to produce new virus particles, which then infect other cells. When germ cells are infected, the retroviral sequences inserted into their DNA can be inherited by offspring.

Because they stem from infections that occurred millions of years ago, most of these trapped ancient viral sequences were thought to be non-functional. Increasingly, some have been found to be dangerous, complicit in cancers. Few have been called advantageous.

But after isolating HPAT1-23 in ES cells, the Stanford team analyzed their expression in human blastocysts and saw that HPAT2, HPAT3, and HPAT5 were expressed only in the blastocysts’ inner cell mass, which gives rise to ES cells and the fetus. When the team blocked expression of these three elements in one cell of a two-cell human embryo, that cell no longer contributed to the inner cell mass. The team also found that the three genes were necessary for making induced pluripotent stem cells (iPSCs) from adult cells.

Furthermore, the team found those three virally derived RNA molecules apparently exist only in primates. This may mean they are partly responsible for some seminal ways humans differ from other animals.

Finally, the team discovered that HPAT5, in particular, affects pluripotency in concert with let-7 microRNAs.

Since Aristotle, developmental pathways thought universal

Reijo Pera told Bioscience Technology one key reason her team was surprised: “Since the earliest days of developmental biology and genetics dating back to Aristotle, there has been regard for what some have called ‘the principle of universality.’ Under this principle, there is a focus on the conserved properties that establish developmental pathways across all animals from frogs and fish, to mice and humans. Indeed, early in my career, I used conservation of DNA sequences as a test of gene identity, under the assumption that genes that are required for development must be conserved.”

Reijo Pera continued that although that assumption “has borne fruit in science throughout the decades, we must consider, as well, that studies of human genetics and disease often uncover variants that do not map to conserved sequences (protein-coding genes). Indeed, it is estimated that only 12 percent of SNPs (single nucleotide polymorphisms) associated with disease or common traits are located in, or occur in tight linkage disequilibrium with, protein-coding regions of genes. This is in spite of the fact that SNPs in protein-coding regions are vastly over-represented in GWAS (genome wide association studies).”

Therefore, Reijo Pera said, the team sought to examine “function of a subset of a large group of human-specific transcripts that may be linked to human developmental timing, and potentially ultimately to disease. What we found is that disruption (knockout) or overexpression of single human-(and nonhuman primate)-specific genes changes development, influencing the balance between embryonic and differentiated cell fates.”

The team used both ES cells, and iPSCs, “to validate our findings, insure they were not just an artifact of one cell type or the other.”

Challenging the primacy of protein-coding regions?

Carninci, senior author on the first paper reporting a large-scale screen for active versions of these elements in mammalian stem cells, told Bioscience Technology that, for a long time, genes were considered “essentially to overlap with proteins. In the genome field, search for genes has been mostly focused on the identification of protein-coding ones. As an example, the Mammalian Gene Collection project (led by the NIH) in the past decade focused on making a catalog of protein-coding genes only.”

However, he said: “For a decade we have known there are many long non-coding RNA, but we have been behind with the identification of their function for many reasons, including the very large number of lncRNAs, and the belief that they may not be functional. So it has been difficult to raise large funds for studying these RNAs.”

But several studies have now “identified the expression of a subset of lncRNA from repetitive elements that include the retrotransposon elements (RE) identified in the new Stanford study. RE can retrotranspose, and cause genome damages, like disruption of genes where they insert. Most colleagues still see these elements as true parasites of the genome, something dangerous that cells have to strictly control. Additionally, many of the RE elements are quite specific (that is, mouse and human elements are different). Hence, in light of classic evolutionary biology, they are not conserved, and they ‘should not’ have functions. We and others have seen they are expressed in ES cells.”

The “next level” is achieved: critical role in human development

In the new paper, he said, the Stanford team examined “three of these lncRNAs deriving from TE elements. Against common belief, these RNA do have function. In particular, the function of one of them is linked to important regulation of both pluripotency, and nuclear reprogramming. This work brings the functional characterization of human-lineage specific lncRNAs to the next level, suggesting we cannot ignore these RNAs in biological studies.”

Identifying the function of lncRNAs like this also helps explain “the complexity of regulation of high organisms like vertebrates,” Carninci said. “We have very complex developmental plans, yet the number of protein-coding genes is not much different from the C. elegans worm. The existence of regulatory lncRNAs, like those in the study, helps to explain the gene-number paradox, and how genomes are regulated. There are many more thousands of lncRNA like these in stem cells awaiting functional studies, and tens of thousands broadly expressed in many cells.”

Daley agreed. “There are several papers that have shown that loci associated with transposable elements or endogenous retroviruses play roles in the early embryo,” he told Bioscience Technology. “Perhaps the most interesting aspect of these studies is they imply that our human genome, and indeed human development, has been shaped by mutations and genetic variation due to transposable elements or endogenous retroviruses.”

In 2010, Rinn and Daley were the first to find a function for a lincRNA gene in pluripotency (by using a lincRNA to create iPSCs). “We have known these elements can be functional since 2010, with the first example being linc-ROR in human reprogramming,” Rinn told Bioscience Technology. “What was surprising was that, the next year, we found hundreds of ERV-derived lncRNA genes that only like to turn on in early development cells, e.g. stem cells.”

The future in lab and clinic

Rinn thinks key next steps will be “to understand how these ERV elements could be evolving new genes. Both human and mouse have different ERV that accomplish the same goal of producing pluripotent-specific lncRNA genes. A fun experiment we have on the back burner is to try and engineer a new ERV site in an otherwise vacant region of the genome. It would be interesting to see if the cell learns to evolve a new gene and or depend on it.”

Said Reijo Pera: “An important next step is to examine other genes, as we identified more than 200 genes, and to examine effects of these loci on human disease via GWAS. We think there may be future clinical applications in diagnosis of normal human development and potentially, depending on results, association with complex disease.”