Automating and Validating Analysis in High-Throughput Single-Cell Assays

While genomics and transcriptomics provide insight into the potential of a cell, single-cell phenomic data reflect its current reality, allowing researchers to determine how genomic variants affect phenotypes and providing deep insight into the causes of disease. A core technology in single-cell phenomics is flow cytometry, in which cells in suspension are labeled with fluorophore-bound antibodies, interrogated individually with laser light in a stream of fluid at several thousand cells per second, and measured using a flow cytometer that detects both scatter and fluorescence parameters (Figure 1b). The data produced by these instruments are stored as event-level data with associated metadata, which gives researchers the ability to determine the phenotype of many thousands of cells (Figure 1d).

Figure 1 – Flow cytometry. From left to right: (a) a single-cell suspension stained with different fluorophore-bound antibodies is interrogated (b) with light from one or more lasers (illustrated as blue and green). This excites fluorescence at unique emission wavelengths that can be detected by a flow cytometer, which in turn produces (c) list-mode data files. In this manner, fluorescence may be correlated with the protein to which the antibody specifically bound, and cells can be identified (d). The resulting fluorescence data may be visualized as a 2D histogram in a FlowJo pseudocolor dot plot, in which the regions of highest event frequency are heat-mapped in red.

Flow cytometry and automated analysis

Modern flow cytometry originated in the late 1960s in the Herzenberg Laboratory at Stanford University (Stanford, Calif.). As cytometers evolved into the 1990s, a more powerful and user-friendly analysis and display program was needed. FlowJo cell analysis software (FlowJo, LLC, Ashland, Ore.) is able to analyze flow cytometry data at the experiment level, enabling researchers to quickly and reproducibly cluster cell phenotypes by “gating” (Figure 1d). However, the acquisition of large numbers of samples and the need to run a 96-well plate in under 4 minutes and a 1536-well plate in an hour (refs. 1–4) have created a data analysis bottleneck that limits the potential of available analysis tools.

Clearly, a better way to facilitate automated analysis had to be found. The authors used a dataset from the Supporting Health by Integrating Nutrition and Exercise (SHINE) study, which examined neuroendocrine modulation of immune function. The analysis challenge is summarized in Figure 2a. The study included 120 cell phenotypes and statistics with 632 clinical samples, 3–4 time points over a one-year period and 163 unique study subjects. Manual analysis via gating and review was an enormous computing challenge because raw data comprise the majority of the memory footprint (Figure 2b). In addition, while FlowJo provides reproducible and rapid analysis, validation in this workflow is performed by manual review (Figure 2 shows only six gates from a representative sample). In this study, the manual analysis of each file and the review of each of the 120 gates and statistics on all 632 clinical samples and controls would take 110 cumulative days.
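The scale of the manual review can be made concrete with a little arithmetic. The sketch below uses only the counts quoted above (120 gates and statistics, 632 samples, 110 cumulative days). The article does not say how long a "day" is, so the two values tried for hours per day are assumptions, and the function names are illustrative.

```python
# Back-of-the-envelope check of the manual review workload.
GATES_PER_SAMPLE = 120   # from the text
SAMPLES = 632            # from the text
CUMULATIVE_DAYS = 110    # from the text


def total_gate_reviews(gates_per_sample: int, samples: int) -> int:
    """Number of individual gate/statistic reviews across the study."""
    return gates_per_sample * samples


def minutes_per_review(days: float, hours_per_day: float, reviews: int) -> float:
    """Average minutes spent per review for a given length of day."""
    return days * hours_per_day * 60 / reviews


reviews = total_gate_reviews(GATES_PER_SAMPLE, SAMPLES)
print(f"gate reviews: {reviews}")

# hours_per_day is an assumption; the article does not define a "day".
for hours_per_day in (8, 24):
    minutes = minutes_per_review(CUMULATIVE_DAYS, hours_per_day, reviews)
    print(f"{hours_per_day} h/day -> {minutes:.2f} min per review")
```

Under either assumption, the reviewer has on the order of a minute or two per gate, which is why review rather than gating itself dominates the manual workflow.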

Figure 2 – Clinical study analysis challenge. Top left: summary of patients, time points, acquisition runs and statistics and cell populations of interest in the SHINE study. Top right: derived versus raw data produced from the entire study and analysis. In flow cytometry, populations are identified with “gates,” which are shapes drawn on cell clusters. Bottom: the first six hierarchical gates on a representative sample, identifying cells acquired after an initial burst that may contain sample crossover, singlet cells and live cells (left to right, row 1), and peripheral blood mononuclear cells (PBMC), T cells and CD4+ and CD8+ T cells (left to right, row 2). Axis parameters correspond to fluorescence from bound antibodies, thus allowing population identification.

Using the software, template analysis files can be produced based on a completed assay, and the templates can be reused for similar analyses. A template contains the entire analysis minus any raw data, leaving artifacts such as group-owned gates and statistics, layout and table outputs, third-party nodes (such as those from R or GPS), scripts and functions. Templates also retain group inclusion criteria, statistic-based gates and metadata flagging.

FlowJo Enterprise is an acquisition-to-insight software platform that extends the functionality of FlowJo software. It gives researchers the tools to construct pipelines of templates called protocols, which facilitate adaptive assays by modularizing analysis components and making decisions based on constraint criteria (Figures 3 and 4).
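The idea of a protocol as a chain of templates with constraint-based decisions can be illustrated with a minimal sketch. This is not the FlowJo Enterprise API: `TemplateStep`, `run_protocol`, the two example steps and the 0.5 singlet-fraction threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Stats = Dict[str, float]


@dataclass
class TemplateStep:
    """One modular analysis step, with an optional constraint on earlier results."""
    name: str
    analyze: Callable[[Stats], Stats]
    constraint: Optional[Callable[[Stats], bool]] = None


def run_protocol(steps: List[TemplateStep], stats: Stats) -> Tuple[Stats, List[str]]:
    """Run steps in order, skipping any step whose constraint is not met."""
    results = dict(stats)
    executed: List[str] = []
    for step in steps:
        if step.constraint is not None and not step.constraint(results):
            continue
        results.update(step.analyze(results))
        executed.append(step.name)
    return results, executed


def count_singlets(stats: Stats) -> Stats:
    """Hypothetical step: derive the fraction of events that are singlets."""
    return {"singlet_fraction": stats["singlets"] / stats["events"]}


def gate_t_cells(stats: Stats) -> Stats:
    """Hypothetical step: stands in for applying a T-cell gating template."""
    return {"t_cell_gate_applied": 1.0}


protocol = [
    TemplateStep("singlets", count_singlets),
    # The 0.5 threshold is an illustrative assumption, not a value from the study.
    TemplateStep("t_cells", gate_t_cells,
                 constraint=lambda s: s["singlet_fraction"] >= 0.5),
]

_, ran = run_protocol(protocol, {"events": 10000.0, "singlets": 9000.0})
print(ran)
```

Keeping each step's constraint separate from its analysis is what makes the components reusable: the same template can appear in several protocols under different conditions.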

Using this enhancement to the traditional analysis workflow, a protocol pipeline was generated to analyze data from the SHINE study and add critical decisions, e.g., flagging samples with <70% viability and excluding populations with fewer than 100 events from final summary analyses. The 4thWall application is installed on the instrument-associated computer and allows the investigator to link raw data to a protocol for automated analysis immediately following data acquisition and before data transfer. This adds a layer of data provenance and transfer checks and accelerates analysis. With this integrated workflow, which combines protocol construction in FlowJo with the compute power of server-based high-performance engines, the time needed to analyze the data, including final review, decreased by more than tenfold. Moreover, a data ecosystem was created in which protocols, raw data and derivative reports were linked together (Figure 3).
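The two decision rules named above can be written down as a short sketch. Only the thresholds (70% viability, 100 events) come from the text; the `SampleResult` structure, the `review_sample` function and the example values are hypothetical and do not represent how FlowJo Enterprise stores or evaluates results.

```python
from dataclasses import dataclass
from typing import Dict

MIN_VIABILITY_PCT = 70.0  # from the text: flag samples with <70% viability
MIN_EVENTS = 100          # from the text: exclude populations with <100 events


@dataclass
class SampleResult:
    """Hypothetical container for one sample's statistics."""
    sample_id: str
    viability_pct: float
    population_events: Dict[str, int]


def review_sample(sample: SampleResult) -> dict:
    """Apply both rules and report what is flagged, kept and excluded."""
    kept = {name: n for name, n in sample.population_events.items()
            if n >= MIN_EVENTS}
    excluded = sorted(set(sample.population_events) - set(kept))
    return {
        "sample_id": sample.sample_id,
        "flagged_low_viability": sample.viability_pct < MIN_VIABILITY_PCT,
        "summary_populations": kept,
        "excluded_populations": excluded,
    }


# Example values are invented for illustration only.
example = SampleResult("sample_001", 65.0, {"CD4+ T cells": 1500, "CD8+ T cells": 80})
print(review_sample(example))
```

A flagged sample is still analyzed in this sketch; the flag only marks it for attention, while low-count populations are dropped from the summary.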

Figure 3 – Solutions for data management and analysis in flow cytometry.
Figure 4 – Analysis automation. Protocols tie together multiple analysis templates (left) for conditional execution based on the results of previous analyses (right) and allow for parallelization and pipelining of analysis strategies and report generation.

To validate the analysis pipeline, manual and automated analyses were compared using F-measure, a standard for flow cytometry analysis comparison (ref. 5). F-measure compares, at the event level, which cells are included in or excluded from a population. For validation, F-measure was calculated for the terminal populations, the glucocorticoid receptor-positive (GR+) and mineralocorticoid receptor-positive (MR+) populations, since they were expected to show the greatest variance. To visualize this, the F-measure statistics were calculated for each population on one of the 12 runs of the study and mapped on a scale from discordance (F-measure of 0) to complete agreement (F-measure of 1) (Figure 5). This analysis reveals the similarity between automated gating and manual adjustment of gates at the phenotype level in this large study. In addition, F-measure allows identification of the most problematic populations: the monocyte populations defined by CD14 and CD16 markers, which are notoriously hard to gate because they appear as smears with varying expression levels of both markers. These populations present an opportunity for pipeline optimization, and identifying them automatically frees up the analyst's time.
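For a single population, the event-level comparison can be sketched as follows, treating the manual gate as the reference. This is the standard harmonic mean of precision and recall, not the authors' implementation, and it does not reproduce how the cited paper aggregates across populations. The function name and the convention of returning 1.0 when both gates are empty are assumptions made here.

```python
from typing import Iterable


def f_measure(manual_events: Iterable[int], automated_events: Iterable[int]) -> float:
    """F-measure between two gates, given the indices of the events in each."""
    manual = set(manual_events)
    automated = set(automated_events)
    if not manual and not automated:
        return 1.0  # assumed convention: two empty gates agree completely
    overlap = len(manual & automated)
    if overlap == 0:
        return 0.0  # no shared events: complete discordance
    precision = overlap / len(automated)  # share of automated events also gated manually
    recall = overlap / len(manual)        # share of manual events recovered automatically
    return 2 * precision * recall / (precision + recall)


# Invented example: two gates of 10 events each that share 8 events.
manual_gate = range(0, 10)
automated_gate = range(2, 12)
print(f"{f_measure(manual_gate, automated_gate):.2f}")
```

Because the score is computed from event membership rather than gate geometry, two gates drawn with different shapes score 1 as long as they capture the same cells.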

Figure 5 – Analysis validation. Discordance and agreement (left) and heat-mapped F-measures for the GR+ and MR+ populations indicated (right).

Conclusion

An analysis pipeline designed from raw data file to finished report was created to address the needs of researchers studying flow cytometry data. This study demonstrated the similarity between automated gating and manual adjustment of gates at the phenotype level, as well as the construction of an analysis-centric data management system. FlowJo Enterprise enables streaming data directly from acquisition into statistically validated analysis pipelines; sharing reproducible, individual components of analyses; and quickly obtaining phenotype or clinical outcome data from predictive machine-learning algorithms. A library of templates can be created and shared among researchers to build automation pipelines that apply hierarchical gating to calculate and import clustering results, generate reports and additional experimental annotation and update a database depending on the results of the ongoing analysis. The platform extends the functionality of the most widely used analysis software in flow cytometry and addresses the challenges presented by the increased volume and velocity of single-cell data.

References

  1. “Flow Cytometry Powers High-Throughput Screening Advances,” Society for Lab Automation and Screening Electronic Laboratory Neighborhood; March 26, 2015; http://eln.slas.org/story/1/156-flowcytometry-powers-high-throughput-screening-advances
  2. Black, C.B.; Duensing, T.D. et al. Cell-based screening using high-throughput flow cytometry. Assay Drug Devel. Technol. 2011, 9(1), 13–20.
  3. Edwards, B.S.; Kuckuck, F.W. et al. HTPS flow cytometry: a novel platform for automated high-throughput drug discovery and characterization. J. Biomol. Screen. 2001, 6, 83–90.
  4. Edwards, B.S.; Young, S.M. et al. High-throughput flow cytometry for drug discovery. Expert Opin. Drug Discov. 2007, 2, 685–96.
  5. Aghaeepour, N.; Finak, G. et al. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods 2013, 10(3), 228–38.

The authors are with FlowJo, LLC, 385 Williamson Way, Ashland, Ore. 97520, U.S.A.; tel.: 800-366-6045; e-mail: [email protected]; www.flowjo.com. The authors wish to thank Jeffrey Milush, Ph.D., and Bill Hyun of the University of California, San Francisco, for providing access to the Supporting Health by Integrating Nutrition and Exercise (SHINE) dataset.
