Connecting the Dots With a Foundation for Biological Registration

The last decade has seen huge advances in biological research. The advent of genomics and proteomics, together with experimental approaches such as next-generation sequencing and high-throughput screening, has led to a deeper understanding of systems biology and the discovery of new biomarkers. Biotherapeutics innovation is thriving as research organizations move beyond small molecules to explore antibodies, vaccines, siRNA, and other biologics as potential drug candidates. Researchers are gaining new insights every day from their work with stem cell lines, plasmids, algae strains, and more. A key factor in the success of these initiatives is the ability to “connect the dots,” i.e., to systematically track biologic research data in order to compare and analyze experimental findings, better understand the relationships between biologic entities, and, most importantly, make discoveries that have not been made before. The Accelrys Biological Registration system (Accelrys, San Diego, CA) is a tool designed to help researchers reach these goals.

How, for example, can an organization quickly determine whether a promising siRNA candidate has already been patented? What is the safety code associated with a specific biologic entity? Has another department or project team within the organization conducted experiments involving the same ribonucleic acid sequence currently being studied?

Figure 1 - Registration, in this case of biological entities, is a fundamental process to ensure subsequent data aggregation. The registration number/ID acts as the key used to track all activity against that entity. Registration is a prerequisite for any master data management program.

The problem is that making these types of critical associations is often far easier said than done in modern scientific enterprises, where researchers struggle to keep up with enormous volumes of data. To alleviate these challenges, speed innovation, and boost competitive advantage, a reliable and scalable system for biological registration is needed (Figure 1).

Millions of needles; thousands of haystacks

Biological research data are vast and extremely complex, often spanning thousands and even millions of experimental protocols, proteins, cell lines, compounds, and more. As research operations become increasingly global in scope, these data are typically distributed across geographies, organizational departments, and project teams, and locked within discipline- and format-specific systems, instruments, and databases. Adding to the chaos are data that must be incorporated from the literature and from public databases such as GenBank and other government or academic sources. These external sources cannot simply be set aside, because researchers need to compare their work against what other organizations have already done and take advantage of existing knowledge in their fields.

What all of this means is that accessing information on just a single biological entity, in order to find out what is known about it, which scientists are working with it, and what processes are involved in producing it, can be like finding a needle in a haystack or, more accurately, finding a needle that may be located in any of hundreds or thousands of haystacks. Multiply this problem across the millions of possible entities a research organization may want to study, and it is easy to see the importance of being able to uniquely identify and track biologics across data systems and knowledge sources.

Biological registration: The next informatics frontier

Figure 2 - The registration process protects intellectual property (IP) by establishing a unique ID for the biological entity and giving the entity a time stamp (Original Electronic File, OEF). The ID acts as a unique key to track the entity through the screening database, inventory system, and documents. This key is what allows the entity and all of its data to be aggregated throughout the R&D enterprise.

Pharmaceutical organizations have relied on registration systems in the chemical realm for years, using them to identify and track chemical compounds during the drug discovery process. Similarly, a system for biological registration can help scientists protect their life science discoveries, keep tabs on experimental progress, access related information about promising biologic candidates, and build on the valuable research that has come before, both within their own organizations and across the broader scientific community (Figure 2).
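To make the idea of a registration key concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not a description of how the Accelrys system works. It assumes that an entity's identity is simply its type plus a normalized sequence, and every name in it (`BioRegistry`, the `BIO-` ID format, the linked system names, the safety code value) is hypothetical. Registering the same sequence a second time returns the existing ID rather than minting a new one, which is how a registry can reveal that another team already holds that entity; the time stamp, safety code, and links to other systems all hang off the single ID.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Registration:
    """One registered biological entity (hypothetical record layout)."""
    reg_id: str
    entity_type: str
    sequence: str
    registered_at: datetime  # time stamp recording when the entity was first registered
    safety_code: str | None = None
    links: dict[str, list[str]] = field(default_factory=dict)  # system name -> record IDs


class BioRegistry:
    """Toy in-memory registry: one unique ID per distinct (entity type, normalized sequence)."""

    def __init__(self) -> None:
        self._by_key: dict[str, Registration] = {}
        self._by_id: dict[str, Registration] = {}
        self._counter = 0

    @staticmethod
    def _normalize(sequence: str) -> str:
        # Assumption for this sketch: identity ignores whitespace and letter case.
        return "".join(sequence.split()).upper()

    def _key(self, entity_type: str, sequence: str) -> str:
        canonical = f"{entity_type}:{self._normalize(sequence)}"
        return hashlib.sha256(canonical.encode()).hexdigest()

    def register(self, entity_type: str, sequence: str,
                 safety_code: str | None = None) -> tuple[Registration, bool]:
        """Return (record, is_new); an already-registered entity keeps its original ID."""
        key = self._key(entity_type, sequence)
        if key in self._by_key:
            return self._by_key[key], False
        self._counter += 1
        record = Registration(
            reg_id=f"BIO-{self._counter:06d}",
            entity_type=entity_type,
            sequence=self._normalize(sequence),
            registered_at=datetime.now(timezone.utc),
            safety_code=safety_code,
        )
        self._by_key[key] = record
        self._by_id[record.reg_id] = record
        return record, True

    def link(self, reg_id: str, system: str, record_id: str) -> None:
        """Attach a record from another system (assay DB, inventory, document) to an ID."""
        self._by_id[reg_id].links.setdefault(system, []).append(record_id)


registry = BioRegistry()
first, is_new = registry.register("siRNA", "GGCUACGUCCAGGAGCGCA", safety_code="BSL-1")
print(first.reg_id, is_new)  # BIO-000001 True

# A second team submits the same siRNA with different formatting.
again, is_new = registry.register("siRNA", "ggcuacgucc aggagcgca")
print(again.reg_id, is_new, again.safety_code)  # BIO-000001 False BSL-1

registry.link(first.reg_id, "screening_db", "ASSAY-42")
print(first.links)  # {'screening_db': ['ASSAY-42']}
```

Keying on a hash of normalized content is only one design choice; a production system would need richer, entity-specific rules for deciding when two submissions are the same thing.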

Improved operational efficiency

It is not unusual for scientists to spend 50% or even 75% of their time searching through databases, formatting and collating information for analysis, and comparing results across departments and disciplines. This is hugely wasteful. An ability to quickly find information on biologics empowers researchers to spend less time managing data and more time on actual science. The resulting efficiency gains will not only speed the discovery process, but will also save money and resources.

Reduced redundancies

Registration systems can also help scientists more easily find out if similar research is being undertaken elsewhere in the organization, and thus reduce duplicate efforts. This is an issue that has been exacerbated by the trend toward distributed global operations. For example, a project team located in China may have already run numerous experiments on a protein that another group of scientists located in the U.S. is also interested in studying. With a registration system, this existing knowledge can be reused, avoiding redundant experimentation (including its associated costs).

Increased safety

By linking safety codes to a unique biologic ID, organizations can more effectively ensure that their researchers are aware of, and can mitigate, biohazard risks. If safety information is not registered with the entity, organizations can miss vital health and safety warnings about the entities they are working with, especially when research is handed off to separate teams and specialists during the course of a project.

IP protection

Imagine being hit with a $1 billion patent lawsuit after spending years and millions of dollars bringing a biotherapeutic to market. These kinds of cases can and do happen and are more likely to do so when researchers have no systematic way of comparing their biologic candidates with patent databases and other public sources of information. It is absolutely critical to be able to identify potential patent conflicts before investing a great deal of time and money in R&D, and equally critical to be able to protect potentially lucrative IP from competitive infringement. By tracking the history, experimental protocols, and processing steps associated with a biologic entity, and by identifying similar efforts, registration offers a cost-effective insurance policy against such risks.

A foundational, integrated approach

Biological registration systems are needed so that research organizations can track biologics and their relationships, establishing critical intellectual property positions as well as connections to past research and manufacturing processes. Yet, unlike the small molecules tracked by chemical registration systems, biological entities such as proteins, antibodies, vaccines, viruses, or siRNA are notoriously complex and difficult to identify in a consistent manner. For example, if a protein is expressed in two different cell lines, one scientist may consider the two products to be the same thing because they share the same amino acid sequence, while another may consider them to be different because of their different glycosylation patterns. Additionally, biologics typically comprise anywhere from hundreds to millions of atoms, compared with the 20–100 atoms found in a typical small molecule. Finally, the knowledge base surrounding biological entities is continually evolving. Two observations that seem unrelated today may lead to an unexpected connection tomorrow.
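The glycosylation example shows that deciding whether two samples are "the same entity" is a policy choice, not just a property of the data. The short Python sketch below, an illustration rather than a description of any product, makes that explicit: the identity key is built from whichever attributes a policy lists, so the same two protein lots collapse to one entity under a sequence-only rule and remain separate when glycosylation is included. The record fields, policy names, peptide sequence, and cell-line and glycoform labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinLot:
    """Hypothetical record for one expressed batch of a protein."""
    amino_acid_sequence: str
    expression_cell_line: str
    glycosylation_pattern: str


def identity_key(lot: ProteinLot, attributes: tuple[str, ...]) -> tuple[str, ...]:
    """Return the attribute values that, under a given policy, define 'the same entity'."""
    return tuple(getattr(lot, name) for name in attributes)


# Two competing identity policies, mirroring the two scientists' views.
SEQUENCE_ONLY = ("amino_acid_sequence",)
SEQUENCE_AND_GLYCOFORM = ("amino_acid_sequence", "glycosylation_pattern")

# The same (made-up) sequence expressed in two cell lines with different glycoforms.
cho_lot = ProteinLot("MKTAYIAKQR", "CHO-K1", "G0F")
hek_lot = ProteinLot("MKTAYIAKQR", "HEK293", "G1F")

print(identity_key(cho_lot, SEQUENCE_ONLY) == identity_key(hek_lot, SEQUENCE_ONLY))  # True
print(identity_key(cho_lot, SEQUENCE_AND_GLYCOFORM)
      == identity_key(hek_lot, SEQUENCE_AND_GLYCOFORM))  # False
```

Keeping the policy separate from the stored records is one way a registry could revise its notion of identity as the science evolves without rewriting the underlying data.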

Getting biological registration right requires a foundational approach that takes into account the complexity inherent in the field, and one flexible enough to evolve with the science. The Accelrys Biological Registration system is an “intelligent” solution for registering, associating, searching, and retrieving data for entities such as siRNA, plasmids, cell lines, proteins, antibodies, vaccines, and future biological entities. The system was developed through a consortium approach that involved close collaboration with leading pharmaceutical companies, including Merck & Co., Inc. (Whitehouse Station, NJ) and Abbott Laboratories (Abbott Park, IL). The consortium approach was critical because it enabled Accelrys to incorporate “in the trenches” insight from real-world end users about what capabilities are most important.