Big Data: Unseen by Human Eyes

An article titled “Learning to See Data” by Benedict Carey in the Review section of the Sunday New York Times (March 29, 2015, p. SR1) described in layman’s terms the problem with data-rich modern science: multidimensional data too often cannot be reduced to simple 2-D or 3-D graphs. Because of its sheer volume, it is also incomprehensible. Transmitting the data is often problematic as well: oversized files cause FTP transfers to time out. And humans can curate only so much before the mind goes numb.

New ways to present data are needed. Since numbers are much harder for people to process than images, most tools for visualizing large data sets rely on generating infographics. Heat maps are an early example. However, color-coded heat maps do little more than show regions of similarity and contrast. Paging through heat maps is better than scanning the same number of raw data points, but both are mind-numbing.
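As a concrete illustration, a color-coded heat map takes only a few lines in Python. The article names no particular tool, so matplotlib and the random stand-in data below are assumptions for the sketch:

```python
# Minimal heat-map sketch; matplotlib/NumPy and the random matrix are
# assumptions standing in for a real multivariate data set.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(50, 20)            # e.g., 50 samples x 20 measured variables
fig, ax = plt.subplots()
im = ax.imshow(data, cmap="viridis", aspect="auto")
ax.set_xlabel("variable")
ax.set_ylabel("sample")
fig.colorbar(im, label="normalized response")
plt.show()
```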

One way to add information is to add shape to the color. Spider plots (also called radar charts) are an example of a graph that helps one compare multivariate responses. Spider plots lend themselves to grouping the data by similarities and outliers, and thus are much more useful than two-color heat maps.
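A minimal spider-plot sketch, again assuming matplotlib; the two samples and five variables are invented for illustration:

```python
# Spider (radar) plot comparing two hypothetical samples across five variables.
import numpy as np
import matplotlib.pyplot as plt

labels = ["pH", "conductivity", "turbidity", "TOC", "hardness"]
sample_a = [0.8, 0.6, 0.3, 0.9, 0.5]
sample_b = [0.4, 0.7, 0.8, 0.2, 0.6]

angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]                      # repeat first angle to close the polygon

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for values, name in [(sample_a, "Sample A"), (sample_b, "Sample B")]:
    vals = values + values[:1]            # close the polygon for this sample
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.legend()
plt.show()
```

The shape of each polygon makes similar samples and outliers recognizable at a glance, which is the grouping advantage described above.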

Graph computing also generates useful images, with nodes connected by edges, particularly when semantic analysis is employed, and can help in presenting unstructured data. Combined with inference engines, graphs often lead to interesting questions.
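As a sketch of the idea, the Python networkx library (an assumption; the article names no software) can build and draw a small node-and-edge graph from extracted relations; the entities and relations here are invented:

```python
# Toy semantic graph: nodes are entities, edge attributes are inferred relations.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_edge("compound X", "assay 12", relation="tested_in")
G.add_edge("assay 12", "liver chip", relation="run_on")
G.add_edge("compound X", "toxicity flag", relation="associated_with")

pos = nx.spring_layout(G, seed=1)        # fixed seed for a reproducible layout
nx.draw(G, pos, with_labels=True, node_color="lightsteelblue")
nx.draw_networkx_edge_labels(
    G, pos, edge_labels=nx.get_edge_attributes(G, "relation"))
plt.show()
```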

In the article, Carey advocates using “perceptual learning” to take advantage of the instincts people develop through prior experience. Perceptual learning involves visual skill training with computer-game-like modules that demand split-second decisions. With practice, trainees develop the ability to see familiar patterns in a data display and instantly make the corresponding associations. Meaningful sights are detected and noise is filtered out. Clearly, this is a human endeavor.

How good is it? Carey cites the example of teaching a person to fly using a video-game-like lesson developed by Dr. Philip Kellman. With it, novices were able to respond in a flight simulator as well as pilots with 1,000 hours of flight training. Carey further reports that the UCLA medical school has developed perceptual learning modules for histology and for reading electrocardiograms.

Reading tissue sample slides is generally limited to about 200 specimens per day per person. With organs-on-a-chip, experiments generating tens of thousands to millions of slides per day can be anticipated, with correspondingly large data sets. As Carey points out, the experienced eye can see things quickly and intuitively.
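A back-of-the-envelope calculation using the figures above shows the scale gap:

```python
# Illustrative arithmetic only, using the throughput numbers cited in the text.
human_rate = 200           # specimens per reader per day
chip_output = 1_000_000    # slides per day, high end of the projection
print(chip_output // human_rate)   # -> 5000 readers needed just to keep pace
```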

Meeting this challenge of scale will require a computer that can mimic human recognition and intuition. This is possible, as IBM showed by winning the "Jeopardy!" challenge in 2011 with its Watson system. Team IBM developed a semantic search capability that mimicked human experience by computing fit probabilities for the edges connecting the nodes (dates, people, locations, etc.) contained in the "Jeopardy!" clues. The model provided accurate responses but was slow (~30 minutes), so IBM added almost 3,000 more computing cores, and the computer won the challenge.
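A toy sketch of that scoring idea, not IBM's actual pipeline: invented candidate answers and clue entities are ranked by the combined fit probabilities of their connecting edges:

```python
# Toy edge-fit scoring; all entities and probabilities are invented.
import math

# P(edge holds | evidence) for each (candidate, clue-entity) pair -- assumed values
edge_fit = {
    ("Chicago", "1871"): 0.9,
    ("Chicago", "Great Fire"): 0.8,
    ("Toronto", "1871"): 0.3,
    ("Toronto", "Great Fire"): 0.1,
}

def score(candidate, clue_entities):
    # Sum log-probabilities, treating the edges as independent evidence.
    return sum(math.log(edge_fit.get((candidate, e), 1e-6)) for e in clue_entities)

clue = ["1871", "Great Fire"]
best = max(["Chicago", "Toronto"], key=lambda c: score(c, clue))
print(best)  # -> Chicago
```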

The Watson project may be showing the way to scale perceptual learning to challenges that are too large for humans, such as screening organ chips, or too complex and tedious, such as histology.

Robert L. Stevenson, Ph.D., is Editor Emeritus, American Laboratory/Labcompare; e-mail: [email protected].
