As the cost of whole genome sequencing (WGS) declines, the number of human genome sequences is increasing rapidly. This is revealing new biochemistry, for example, exogenous RNA in our blood, incorporation of genetic sequences from progeny in the parent genome, and existence and operation of different genomes in the same organism. Google’s Calico subsidiary raises serious questions about the use of data from WGS.
In 2012 and before, my simplistic view of human genomics was:
- DNA is the master template. Each of us has a set of unique chromosomes that, except for cancer, are invariant in the lifetime of a particular human.
- Cancer results from a mutation. Mutations may occur stochastically or may be induced by oncogenic agents, including viruses and specific chemicals.
- RNA (messenger, transfer, and ribosomal) is coded ultimately from an individual’s DNA. RNA, proteins, and hormones are responsible for controlling homeostasis.
- Blood supplies nutrients to the body and removes the waste products.
As 2013 comes to an end, it seems that my simple, closed world is nearly all wrong, or at least incomplete.
For example, a paper in PLOS in late December 2012 reported that human plasma contains percentage concentrations of RNA that is exogenous to the host and is traced to bacteria, fungi, and other sources.1 Exogenous RNA (eRNA) is from the host’s digestive tract. Some eRNA is detected in intracellular complexes, where it might influence the cell’s activity. The authors hypothesize that eRNAs may directly mediate human-microbiome function, which is essential to homeostasis. So the third and fourth points above must now be expanded to include eRNA carried by blood. How it is protected from plasma RNases is not clear.
The variable and dynamic genome
On September 17, the lead report in the Science section of The New York Times (Zimmer, C., “DNA Double Take”)2 pointed out that $1000 whole genome sequencing is uncovering genetic causes of rare mutants. Some examples: Autopsies of patients with no known genetic defect, including cancer, show that cells within the same host can operate with different genetic blueprints. One highly visible result is patterns in skin color attributed to mosaicism.* Mosaicism arises from a spontaneous mutation during cell division, for example, a rearrangement that moves part of one chromosome to another. The mutation originates in a single cell. The rearrangement must produce a viable cell with the ability to replicate in a controlled manner, or the mutation does not propagate.
Other instances of genome modification in individuals cited by Zimmer include: One donor’s blood was found to be a mixture of type O and A. She acquired blood from her twin brother in the womb. A kidney transplant patient was found to possess two genomes, which were expressed individually in her eggs.
Another example shows that a woman can “inherit” genomes from her progeny. This seems quite common. Autopsy of the brains of 59 women showed Y (male) chromosomes in 63%. A similar study of female breast tissue found 56% Y chromosome.
One possible explanation is that the cells in each sample were a mixture of cells with different pedigrees, as described for the presence of Y chromosomes. The high percentages of samples containing Y chromosomes possibly indicate that the pathway is very active. My guess is that the samples without Y chromosomes (~40%) are from women who did not have a male baby. In any case, the first and second points above are not correct.
The new paradigm will impact many fields, such as forensics. Zimmer cited a case where the genetic code of saliva and sperm from the same person did not match. Another example: Analysis of cheek swabs of bone marrow transplants showed a mix of genomes from the patient and donor. However, is this a mixture of cells or did the genomes indeed rearrange in the cells? Single-cell DNA sequencing would probably answer this question, but at added expense.
Affinity groups fund whole genome databases
A September 17 article in The Wall Street Journal3 described how various cancer foundations are investing tens of millions of U.S. dollars in sequencing hundreds of genomes from afflicted patient cohorts in a desperate effort to improve therapeutic efficacy. This is in contrast to the traditional approach of recruiting cohorts for clinical trials based upon the symptoms of presenting patients. Symptoms are few (fever, ache, pain, nasal congestion, fluid in lungs, etc.) compared to the huge number of diseases. Physicians are responsible for connecting the dots between the macro symptoms and the numerous possible causes.
One example of affinity groups: The Multiple Myeloma Society committed $40 million to a 1000-patient study called “Compass” to connect the symptoms and root causes of blood cancer. This effort seems to be paying off since six new drugs have been approved and 18 are in the pipeline.
Another example: A program to curate DNA sequences from a cohort of neonates.4 The hypothesis: The DNA profile could reveal more information than is derived from current (“heelstick”) neonatal screens for phenylketonuria (PKU), lysosomal storage disorders or heart defects (“blue baby heart”), and 8000 more.5 Searches of genomic data gathered at birth including the transcriptome could be extended to spotlight people at high risk for various diseases, including yet-undiagnosed developmental disorders.
However, I can see others who might want to mine the data, even later in life. Parents could unintentionally pierce the veil of privacy for their progeny. Just suppose that genetic technology could advance to allow prediction of sexual development. Would a parent want to enable searching databases for hermaphrodites?6 What if sexual preference turns out to have a genetic component? Are we ready to have this information as part of a public database?
My point is that WGS, including the transcriptome, can and probably will be a key for releasing information that some would prefer to keep private. One can also expect a learning curve, where some predictions are erroneous. In time, the prediction engines will undoubtedly improve, but the impact of false positives could be life altering.
At the 2013 NoSQL Now! Conference in August in San Jose, CA, lecturers gave examples of searching gigantic databases.7 Plus, graph analytical technology can quickly draw apparent correlations. Some correlations are weak and occasionally dead wrong, but graph search and inference technology seems to be very powerful.
Commercialization of the genome
Then, on September 19, my local paper (Contra Costa Times, which is affiliated with the San Jose Mercury News) introduced Google’s new subsidiary, Calico. The new venture will focus on health, wellness, and aging. “Calico” struck me as a great name, one that I connected to mosaicism in cats. On the genetic level, mosaicism is a rare but often visible genetic disorder.
Larry Page, a cofounder of Google, characterized Calico as a “moon shot,” possibly to curate all the medical information needed to support the goals above. He appointed Art Levinson of Genentech and Apple fame to head the new venture. Funding “moon shots” is usually fine. After all, the money is theirs, and since health care is expensive and inefficient, they can try to improve a bad situation.
Calico appears to envision integration of all health-related information. Genomics, including transcriptomics, would be only a part of the picture, but in terms of IT, it would be a major part. An annotated genome for an individual is about a GB. Adding the transcriptome could add another GB. Add a third GB for the remainder of a person’s records. Three GB is not large by today’s standards, but factor in a billion people, and one is inside the Exabyte (EB) realm.
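The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and takes the rough figures from the text at face value (about 1 GB each for the annotated genome, the transcriptome, and the remaining records). The function and variable names are illustrative only.

```python
# Decimal (SI) storage units; an assumption, since the text does not say
# whether decimal or binary units are meant.
GB = 10**9
EB = 10**18


def total_storage_bytes(people, gb_per_person):
    """Return the total storage in bytes for a population."""
    return people * gb_per_person * GB


# Rough per-person figures from the text: genome + transcriptome + other records.
per_person_gb = 1 + 1 + 1

total = total_storage_bytes(1_000_000_000, per_person_gb)
print(f"{total / EB:.1f} EB")  # prints "3.0 EB"
```

With these assumptions, a billion people at 3 GB each comes to about 3 EB, which is consistent with the exabyte estimate.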
How could, should, and will Google use the information? Could Calico have a revenue model that includes selling advertising to pharmaceutical firms? Life insurance firms might be interested in searching the profiles of prospective policyholders. Firms selling annuities would certainly like to avoid issuing a policy to an individual with “longevity genes.”
Cloud providers of IT services (including Google) all claim that information is secure and private. Users report otherwise, at least recently. As technology races ahead, those of us with technical training need to be aware of the advances and act accordingly. The public is probably blissfully asleep.
In genomics, the simple picture is probably useful for the layperson, but when one is looking for diseases, one needs to be aware of the complexity and modes of failure. The success of precision medicine will rest on establishing the mode of action for each disease and mapping it to smart diagnostics and tools to fix our problems.
*Mosaicism denotes the presence of two or more populations of cells with different genotypes in one individual who has developed from a single fertilized egg. See http://en.wikipedia.org/wiki/Mosaic_%28genetics%29. In contrast, chimerism arises from the combination of two or more eggs with different genomes.
- Winslow, R. Patients share DNA for cures. Wall St. J. Sept 17, 2013, p B-1.
Robert L. Stevenson, Ph.D., is a Consultant and Editor of Separation Science for American Laboratory/Labcompare; e-mail: firstname.lastname@example.org.