Automated Structure Verification by NMR, Part 1: Lead Optimization Support in Drug Discovery

The pharmaceutical industry has always relied on testing and evaluating novel compounds in a wide range of chemical space to feed its pipelines. Physical properties and the ability of the compound to hit its intended target are crucial in determining the success of a medicinal chemistry campaign. The chemist’s interpretation of the structure/activity relationship (SAR) of candidate molecules is at the very core of lead optimization and successful nomination of a specific compound as a clinical candidate. Nothing, then, can be more confounding or wasteful than counterintuitive or misleading structure/activity relationships that can derail or misguide the efforts of a team while selecting the next compounds to make for testing.

Therapeutic area teams must trust the integrity of the molecules being tested, yet mistakes happen. Molecules with incorrect chemical structures are submitted every day across the industry. Sometimes these mistakes are subtle and stem from administrative errors, such as misplacing a functional group on a ring or omitting a double bond when drawing the structure electronically. Even peer-reviewed journal articles are subject to the occasional error in the interpretation or drawing of reported molecules.1 Certain blogs chronicle the misdrawn and misrepresented molecules that are published; one in particular has numerous alerts regarding structure errors.2 The more detrimental cases occur when a reaction has an unexpected or undetected outcome and yields a product that differs from the expected structure.

It is critical to the outcome of a project that samples submitted for testing are correct. Incorrect structures have both a short- and long-term impact on research costs. Chemists find nothing more frustrating than beginning a discovery program with a series of leads only to find that, upon resynthesis, the leads are not active. This is especially true if a lead appeared to be exceptionally potent. In instances in which sufficient material remains for analysis, a great deal of effort can be expended to perform structure elucidation.

The first part of this two-part series focuses on the scientific and technical benefits of implementing an automated structure verification system. Part 2 will cover the return on investment and financial analysis of system implementation.

Use of mass spectrometry (MS) and nuclear magnetic resonance (NMR) to characterize and validate compounds

Analytical techniques such as mass spectrometry and nuclear magnetic resonance are generally used to characterize and validate the compounds submitted for testing. Before the advent of high-throughput automation systems such as sample changers and autoinjectors in the 1980s and ’90s, chemists typically submitted compounds to staff analytical chemists. More recently, this responsibility has shifted to the chemists themselves as organizations have continued to reduce support staff. The fundamental problem lies in interpreting the data. Greater reliance has been placed on MS data to characterize compound integrity because the method is simple and the data are easy to interpret. Heavy dependence on this single analytical technique is not without risk, however, since many aspects of MS can lead to overly optimistic interpretation.

Some molecules are not easily ionized and may be invisible to MS, UV, or even evaporative light scattering (ELS) detectors. More easily ionized compounds, on the other hand, may actually be the less abundant species in a sample and may be overrepresented in characterization. Most pharmaceutical drug discovery teams require proton-NMR, MS, HPLC, or other auxiliary analytical methods as evidence of the successful synthesis of proposed target molecules. Yet only a small fraction of the information in the collected analytical data is actually used to verify or confirm the identity of those molecules. The main reason is that the primary function of a medicinal chemist is to synthesize compounds, not to interpret spectral data. The more time chemists spend characterizing their molecules, the less time they spend making them. Therefore, management and researchers must strike a delicate balance between the quality and the quantity of compounds produced.

Implementing automated chemistry

Automated chemistry is often used to increase productivity,3–5 and typically comes with reduced requirements for analytical characterization as well. This can lessen the workload for medicinal and analytical chemists. Industry requirements for purity depend on the purpose of the compound. In general, individual custom synthetic molecules are required to meet a minimum 95% purity criterion, while compounds for libraries synthesized under automation are typically accepted within a 65–80% purity range.

In many cases, characterization of compounds synthesized using automation requires only a passing LC-MS result. This is often done for practical reasons related to sensitivity. While these types of methods can be employed to increase “shots on goal” for hit/lead identification, they cannot meet the quality requirements and amount of material necessary in the lead optimization process, which is the focus of this discussion. Here, the higher requirements for purity allow us to more readily investigate the consistency of a compound’s spectral parameters (NMR or LC-MS) relative to derived or calculated values. For NMR, these parameters are chemical shifts, coupling constants, and correlations. For MS, they are the associated parent ions and adducts.
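As an illustration of the MS side of such a consistency check, the following minimal Python sketch derives the expected m/z values of a few common singly charged adducts from a monoisotopic mass and looks for them in a list of observed peaks. It is not part of the system described in this article: the function names, the adduct list, and the 10-ppm default tolerance are assumptions chosen for the example.

```python
# Illustrative sketch only; names and tolerance are assumptions, not the
# article's system. Mass shifts (Da) are for common singly charged adducts.
PROTON = 1.007276
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -PROTON,
}


def expected_ions(mono_mass):
    """Return the calculated m/z for each adduct of a neutral monoisotopic mass."""
    return {name: mono_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def match_ions(mono_mass, observed_mz, tol_ppm=10.0):
    """Map each adduct to the closest observed peak within tol_ppm, if any."""
    matches = {}
    for name, mz in expected_ions(mono_mass).items():
        tol = mz * tol_ppm / 1e6  # absolute tolerance in Da at this m/z
        hits = [obs for obs in observed_mz if abs(obs - mz) <= tol]
        if hits:
            matches[name] = min(hits, key=lambda obs: abs(obs - mz))
    return matches


if __name__ == "__main__":
    # Caffeine, C8H10N4O2, monoisotopic mass 194.080376 Da.
    # The peak list is made up for the example.
    print(match_ions(194.080376, [195.0877, 217.0696, 300.0]))
```

A proposed structure whose parent ion and adducts are all absent from the peak list would be flagged for closer review. As noted earlier, the reverse does not hold: a matching ion alone is weak evidence that the structure is correct.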

To further reduce the chemist’s data interpretation workload, spectral interpretation can be automated. Only recently has work been published demonstrating this principle. When automated interpretation is run as a background process, greater value can be extracted from the data that have already been collected (NMR, LC-MS, etc.), which may otherwise have been overlooked due to the limited time available for manual interpretation. Figure 1 depicts an example of the automated interpretation output. The key to successfully accomplishing this goal in the authors’ laboratory was to build the work flow into existing activities and provide the results with no extra burden to the chemists.

Figure 1 - Typical autoverification result for a commercial compound using the HSQC experiment.

NMR verification system description

The automated NMR verification system that has been implemented is designed to avoid additional work for the chemist. A simple convention for referencing the lab notebook number at sample login on the spectrometer is all that is needed to establish the basic structure-to-spectrum relationship. In addition to the proton-NMR experiment, the 2-D (1H/13C) heteronuclear single quantum coherence (HSQC) experiment, which typically runs for 15 min, is also collected.6,7 With higher-field, more sensitive instrumentation, the HSQC data can be collected in 1 min. All other aspects of the work flow remain the same, as depicted in Figure 2. Automated functions run completely in the background and are triggered automatically upon completion of compound registration. Automated interpretation and the subsequent generation of a verification score require no additional effort on the part of the chemist, spectroscopist, or supervisor. The results are reported back to the LIMS and compound registration database, where they are available for final review by supervisors prior to approval of the compound (see Figure 1).
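To make the idea of a verification score more concrete, the sketch below compares predicted HSQC cross-peaks for a proposed structure with the peaks picked from a spectrum. It is a simplified, hypothetical illustration and not the scoring algorithm of the commercial software used here: the function name, the greedy matching strategy, and the tolerances (0.3 ppm for 1H, 5 ppm for 13C) are all assumptions made for the example.

```python
# Hypothetical sketch of an HSQC consistency score; this is NOT the vendor
# algorithm, and the tolerances below are illustrative assumptions.

def hsqc_match_score(predicted, observed, tol_h=0.3, tol_c=5.0):
    """Greedily pair predicted (1H, 13C) shifts with observed cross-peaks.

    Returns (fraction of predicted peaks matched, number of unexplained
    observed peaks). Each observed peak is used at most once.
    """
    remaining = list(observed)
    matched = 0
    for ph, pc in predicted:
        best, best_d = None, None
        for oh, oc in remaining:
            dh = abs(oh - ph) / tol_h  # 1H deviation in units of tolerance
            dc = abs(oc - pc) / tol_c  # 13C deviation in units of tolerance
            if dh <= 1.0 and dc <= 1.0:
                d = max(dh, dc)
                if best_d is None or d < best_d:
                    best, best_d = (oh, oc), d
        if best is not None:
            remaining.remove(best)
            matched += 1
    fraction = matched / len(predicted) if predicted else 0.0
    return fraction, len(remaining)


if __name__ == "__main__":
    # Made-up shift values (ppm) for demonstration only.
    predicted = [(7.26, 128.5), (3.70, 55.2), (2.10, 21.0)]
    observed = [(7.30, 129.0), (3.65, 54.8), (1.20, 14.1)]
    print(hsqc_match_score(predicted, observed))
```

In this toy example two of the three predicted correlations are found and one observed peak is left unexplained. That is the kind of partial agreement a supervisor might want to inspect before approving a compound.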

Figure 2 - Standard work flow diagram.

System components

The current implementation of the system utilizes the NMR Expert, Automation Server, and C+H NMR predictors (version 12)8 (ACD/Labs, Toronto, Ontario, Canada).