Can Technology Crack and Extract Useful Knowledge From Proteomics, or is the Proteome too Complex?

Society has invested tens of billions of dollars in the last two decades to sequence the human genome and look for biomarkers to fight disease and guide therapy. Neither has come close to expectations, at least the initial ones. Why? The short answer is “Oh, if it were so simple.” We underestimated the complexity. Simple models of associating a disease with a biomarker protein or single nucleotide polymorphism (SNP) in the genome seldom work. There is progress, however.

The human genome is riddled with errors, including duplications and inversions. One estimate is that about 30% of the human genome is wrong. To get it right, Prof. David C. Schwartz (University of Wisconsin, Madison) is correcting the numerous errors in genomes sequenced by shotgun technology by sequencing linear DNA using capture genome hybridization (CGH). Individual strands of DNA are stretched into a line on the surface of a microscope slide and cut with restriction enzymes at specific locations. Since the connectivity is established and confirmed by the site specificity of the restriction enzymes, assembling the genome is much more accurate. This technique catches duplications, insertions, and inversions, all of which are common. The output is a series of bars that reads like a bar code.
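As a rough sketch of the bar-code idea (my illustration, not Prof. Schwartz's actual pipeline), consider how a restriction enzyme with a fixed recognition sequence turns a stretch of DNA into an ordered list of fragment lengths. The toy sequence below is invented; the EcoRI recognition site (GAATTC) is real.

```python
# Minimal sketch of the restriction "bar code" idea: a restriction
# enzyme cuts DNA at a fixed recognition sequence, and the ordered
# list of fragment lengths between cuts acts as a bar code for that
# region of the genome.
SITE = "GAATTC"  # EcoRI recognition site


def restriction_barcode(dna: str, site: str = SITE) -> list[int]:
    """Return the ordered lengths of the fragments between cut sites."""
    return [len(fragment) for fragment in dna.split(site)]


# Invented toy sequence built around three cut sites:
genome = "AAA" + SITE + "CCCCC" + SITE + "TTTTTTT" + SITE + "GG"
print(restriction_barcode(genome))  # → [3, 5, 7, 2]
```

Because the enzyme cuts only at its specific site, the order of the lengths is fixed, which is what makes comparison across individuals more reliable than piecing together anonymous shotgun reads.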

For comparison, bar codes from a series of individuals are aligned, revealing similarities and differences in sequence. Capture genome hybridization for humans and chimps shows that humans have 510 deletions, generally in genes for proteins associated with mental development. Another study compared normal and cancerous breast tissue. Prof. Schwartz found that the genome of the cancerous tissue had been scrambled, with long sequences exchanged among the 23 pairs of chromosomes.

Lack of a standard reference genome is still another issue. Since we are all different, there is no “standard” genome; one has to work with a consensus genome that recognizes and respects our diversity. Understanding genetic diversity is an essential key to gaining value from the sequence. We now have displays that aid the eye and mind by presenting data in a useful form, but these took time to develop.
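The notion of a consensus built from many individual sequences can be sketched in a few lines. The toy below (all sequences invented) simply takes the most common base at each aligned position; real consensus genomes are built from full alignments and must also represent the variation itself, not just a single winner per position.

```python
from collections import Counter

# Toy consensus: for each aligned column, keep the plurality base.
# Assumes the sequences are already aligned and equal in length.


def consensus(sequences: list[str]) -> str:
    """Return the most common base at each position across sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # plurality base per column
        for column in zip(*sequences)
    )


individuals = ["ACGTAC", "ACGTAT", "ACCTAC", "ACGTAC"]
print(consensus(individuals))  # → ACGTAC
```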

One unanticipated problem was the size of the data files. Worse, each database seems to have a different structure and protocol, so searching and using them is a major technical issue. Semantic technology (ST) shows facility in collecting and associating data from disparate data banks. With all the problems, genomics is not a waste, but getting useful results is much more challenging and slower than anticipated.
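A toy flavor of the integration problem ST addresses: two data banks that describe the same gene under different field names can only be associated once their vocabularies are mapped onto a shared one. All field names and records below are invented for illustration; real semantic stacks use formal ontologies rather than a hand-written dictionary.

```python
# Hypothetical field-name mappings from two invented data banks onto
# a shared vocabulary; this stands in for the ontology work that
# semantic technology automates at scale.
FIELD_MAP = {
    "bank_a": {"gene_symbol": "gene", "chrom": "chromosome"},
    "bank_b": {"symbol": "gene", "chr": "chromosome"},
}


def normalize(source: str, record: dict) -> dict:
    """Rewrite a record's keys into the shared vocabulary."""
    return {FIELD_MAP[source].get(key, key): value
            for key, value in record.items()}


a = normalize("bank_a", {"gene_symbol": "TP53", "chrom": "17"})
b = normalize("bank_b", {"symbol": "TP53", "chr": "17"})
print(a == b)  # → True: once normalized, the records can be associated
```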

If the genome is daunting, the proteome is even more so, since it is much less suitable for rule-based understanding. With the genome, one has coding triplets of bases (codons) that specify the amino acids in a protein chain, which gives the primary structure. Some amino acids have multiple codons, but the dots are connected. The primary structure comes from the DNA, but proteins are often adorned with side groups via processes called post-translational modification (PTM). Under this heading, proteins can conjugate with other molecules either covalently or by forming a complex. For example, Prof. Al Burlingame (University of California, San Francisco) reports evidence that extremely rare (1 in 10⁵) phosphorylation events are important in AIDS.

Detecting such rare events requires very careful sample preparation and expensive analytics. Plus, the informatics tools required for describing these complex networks are inadequate, despite the efforts of many brilliant programmers.
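A back-of-envelope calculation (my illustration, not from Prof. Burlingame's work) shows why a 1-in-10⁵ modification demands such care: simply to have good odds of sampling one modified copy, one needs on the order of hundreds of thousands of molecules, before any losses in preparation or detection.

```python
import math

# How many protein copies must be sampled to have a 95% chance of
# catching at least one modified copy, if the modification occurs at
# a frequency of 1 in 10^5? (Illustrative sampling math only; real
# mass-spec detection limits involve far more than counting.)
p = 1e-5       # frequency of the modified form
target = 0.95  # desired probability of seeing at least one copy

# P(at least one hit in n draws) = 1 - (1 - p)^n; solve for n:
n = math.log(1 - target) / math.log(1 - p)
print(round(n))  # roughly 3e5 molecules needed
```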

Protein complexes are potentially dynamic entities since they exist in a soup of enzymes designed to modify or degrade them. Ideally, covalent modifications such as phosphorylation, glycosylation, ubiquitination, or any mixture of the hundreds of possibilities need to be stable enough for current analytical technology. Analytical techniques for dynamic PTMs will probably come, possibly from NMR or back-scattering interferometry.

Despite the warnings Leigh Anderson delivered five years ago (PepTalk meeting, San Diego, February 2006; Plasma Proteome Institute), hundreds of millions of dollars have since been spent on serum biomarkers. Prof. Giorgio Righetti (Politecnico di Milano, Italy) slammed the door on this at the 2011 Joint Conference in San Diego: the selective enrichment technology does not work for serum, or at least not well enough to be useful. At the same meeting, Prof. Barry Karger (Barnett Institute, Boston, MA) and others did show promising results for measuring biomarkers in tissue. The advantage of tissue is that the diversity of proteins is lower and their concentrations are higher. Thus, the future probably involves focusing on individual diseases and associated markers rather than shotgun or fingerprinting approaches.

Going larger, the biochemistry of genes and proteins also requires explanation of the higher-order (quaternary) structures that are responsible for biological function. DNA is wrapped around histones, and proteins fold into their active states. CASSS is organizing a symposium on higher-order structures to provide a forum for this research. There seems to be a need for much better instrumentation to study higher-order structures.

All of this needs to be compared to reality. As new entities are found and annotated, could some correlations be flukes? Biological systems involve huge populations of molecules, so random, nonsensical correlations are probable. Prof. Karger studied proteins from breast cancer and normal tissue. Even at the 99% confidence level, 121 proteins were significantly differentially expressed. Network analysis grouped these into two suggested networks, one of which appears to make sense and may have some predictive value in the clinic. The other appears to be nonsensical.
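A quick calculation shows why flukes are expected. At the 99% confidence level, each protein tested has a 1% chance of appearing significant purely by accident; the 10,000-protein scale below is an assumed round number for illustration, not a figure from the study.

```python
# Multiple-testing arithmetic: at 99% confidence each individual test
# carries a 1% false-positive rate, so a large screen produces
# spurious "hits" even if nothing is truly differentially expressed.
alpha = 0.01         # per-protein false-positive rate (99% confidence)
n_proteins = 10_000  # assumed size of the screen (illustrative)

expected_flukes = alpha * n_proteins
print(expected_flukes)  # → 100.0
```

This is why follow-up such as network analysis and clinical validation is needed to separate a meaningful network from a nonsensical one.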

Regulatory science, which is the study of the science used to regulate commercial biotherapeutics, is still another complex topic. More than 80% of the cost of a new therapeutic is associated with regulatory issues. Armies of scientists around the world are involved in formulation, stability, clinical trials, production engineering, etc., of biotherapeutics.

Then, consider the sorry lot of the regulators. They have to deal with mountains of data generated by armies of scientists in biopharm, and they are outmanned and outfunded by at least 1000:1. Fortunately, the ethical standards at the bench level are similar for the regulators and the regulated. Flagrant fraud seems to be rare, and when it does occur, it seems to be located in executive suites and boardrooms. The take-home message: Be skeptical of the leaders; they may have sold out.

Finally, consider biosimilars. “Generic drugs,” which appear after patent protection expires for small-molecule drugs, seem to be working as intended. But the complexity of biotherapeutics is beyond the ability of even the most skilled scientists to reverse engineer. The original vendor, the “innovator,” is able to use its experience to sideline potential generic competitors. The public clearly suffers from the associated monopolistic practices.

With the current state of the art, no generic solution for biotherapeutics is visible. Europe has a protocol that works, but only for the large countries. For the U.S.A., the dance proposed by the FDA for exchange of intellectual property is prone to abuse. In addition, the protocol is not harmonized with the rest of the world. The public will certainly continue to suffer. To me, the best route for generic products is to develop a second- or third-generation “biobetter.” Yes, this involves clinical trials, but the path is well known and effective. Plus, biobetters free the developer to take advantage of the latest technology in production, formulation, and delivery.

Readers should take away an optimistic view of the future of biotherapeutics. Progress has been slow, since the biochemistry is much more complex than initially expected. New tools had to be developed to remove confounding errors in the human genome. In proteomics, the mysteries of post-translational modifications and higher-order structures will require study, probably including new instrumentation. Now, informatics is recognized as a key contributing technology. After all, humans need to understand and utilize the knowledge that comes from our investment.

Dr. Stevenson is a Consultant and Editor of Separation Science for American Laboratory/Labcompare; e-mail: