Highlights From Sigma Xi’s Annual Meeting

As a member of the media, I was honored to be invited to the annual meeting of Sigma Xi (The Scientific Research Honor Society), held in San Francisco, October 26 and 27, 2018. I’d heard Sigma Xi mentioned a few times during my 50+-year career, but I’d passed it off as just another fraternity. How naïve I was. Sigma Xi is a vibrant honor society that provides STEM researchers with a forum for cross-disciplinary communication.

For 2018, the meeting tagline was “Big Data and the Future of Research.” The technical program featured junior faculty members who are striving for tenure. The novelty of their topics and the clarity of their presentations bode well for their progress along the tenure trail. A few plenary lectures from established (older) researchers were salted in throughout the program.

Let’s look at some of the topics that caught my eye:

High-speed imaging

High-resolution images involve large data files. “High speed” means more images and even larger files. Speakers reported work at object scales ranging from the entire universe down to single atoms.

Astronomy

A plenary lecture from Professor Steve Ritz of the University of California, Santa Cruz, described two very fast, wide-field, high-resolution imagers for studying the growth of the universe, including the behavior of dark matter. The Large Synoptic Survey Telescope (LSST) will use an 8.4-meter primary mirror and a 3.2-gigapixel camera to record astronomical events across an entire hemisphere of sky during a decade-long photo shoot. This will produce about 20 terabytes of data per night.
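As a rough sanity check on that nightly figure, here is a back-of-envelope estimate in Python. The bytes-per-pixel and exposures-per-night values are illustrative assumptions, not LSST specifications; only the 3.2-gigapixel camera comes from the lecture.

```python
# Back-of-envelope estimate of nightly raw data volume (illustrative only).
PIXELS_PER_IMAGE = 3.2e9      # 3.2-gigapixel camera, from the lecture
BYTES_PER_PIXEL = 2           # assumption: 16-bit raw pixels
EXPOSURES_PER_NIGHT = 2000    # assumption: rough survey cadence

bytes_per_night = PIXELS_PER_IMAGE * BYTES_PER_PIXEL * EXPOSURES_PER_NIGHT
print(f"~{bytes_per_night / 1e12:.0f} TB of raw pixels per night")
# Prints ~13 TB -- the same order of magnitude as the ~20 TB/night quoted above.
```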

Since its launch in 2008, the Fermi Gamma-ray Space Telescope has been in low Earth orbit, surveying the entire sky every three hours for high-energy events. One can anticipate the need to search LSST files after Fermi calls attention to a high-energy event, and vice versa. The databases of both will be astronomical.

Google Cloud enables research

The technical program for the first day closed with a plenary lecture by Jeff Dean, who leads the Google Brain project. This is a major program that applies artificial intelligence (AI) and custom-built hardware to machine-learning challenges. Using published papers as an index of activity, Dean reported a publication rate in late 2018 of about 90 papers per day. This is phenomenal, since artificial intelligence, including machine learning, has been a topic since the 1960s. Fifty years is a long gestation period.

Machine learning can require enormous computing power. A few years ago, Google projected the computing requirements implied by anticipated technology and demand and found that meeting them would double the size and power requirements of the Google Cloud Platform. In a daring move, Google redesigned its AI machines as fit-for-purpose computing hardware. This permitted large-scale addition of AI capacity to each of Google’s cloud farms in the form of densely packed stacks, each about the size of a 40-foot cargo container. Each stack is a massive assembly of special water-cooled printed circuit boards carrying novel AI processing chips. It is amazing what one can do if money is not a limiting factor.

Dean’s lecture touched on numerous applications of AI. Many were drawn from the National Academy of Engineering’s Grand Challenges for Engineering for the 21st Century. Other topics included histology and microbiology. However, Dean’s major message was that Google is most concerned about doing AI right. I am impressed with the careful thought Google has put into its AI program.

Principles of artificial intelligence at Google

Artificial intelligence, including machine learning, is a hot topic today because it has the potential to profoundly improve our lives. It may be useful for predicting the risk of wildfires and earthquakes, diagnosing cancers, preventing blindness, and much more.

With such important problems to solve, Google is investing heavily in AI R&D. Google also releases AI technologies and tools as open-source code, which encourages users to develop applications built on Google’s code and on services such as Google Cloud.

However, there are risks that powerful technology will be misused. Google intends to assess AI applications against the following seven criteria.

Google believes that AI should:

  • Be socially beneficial—Google will proceed where they believe that the overall likely benefits substantially exceed the foreseeable risks and downsides. They intend to thoughtfully evaluate when to make technologies available on a noncommercial basis.
  • Avoid creating or reinforcing unfair bias.
  • Be built and tested for safety—Google intends to develop and apply strong safety and security practices to avoid unintended results that create risks of harm.
  • Be accountable to people—Google expects their AI technologies will be subject to appropriate human direction and control.
  • Incorporate privacy design principles—Google’s AI products and services will include privacy protection.
  • Uphold high standards of scientific excellence.
  • Be available for uses in accordance with the above principles.

In addition to the above objectives, Google will not design or deploy AI in the following application areas:

  • Those that may cause harm.
  • Weapons designed to injure people.
  • Technologies focused on gathering and using information for surveillance that violate internationally accepted standards.
  • Applications conflicting with generally accepted international law and human rights.

On October 30, 2018, Google announced a $25 million grant program called the AI Impact Challenge to encourage development of AI-based solutions to major world problems.1

Big data in biology and medicine

In the lecture program, I attended “Big Data in Biology and Medicine,” which was one of five parallel lecture tracks. Another focused on astronomy.

Brain disorders affect almost 20% of the global population. Useful phenotype–genotype and hereditary associations exist, but causal understanding is less well developed. Professor Mark B. Gerstein of Yale University described his work using the transcriptome to elucidate the molecular pathways underlying brain function and psychiatric disorders.

Using deep-learning technology, Gerstein developed a model that predicts phenotype from genotype. The model improves disease prediction sixfold compared with additive polygenic risk scores. The program also highlighted genes associated with specific disorders.
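For readers unfamiliar with the baseline, an additive polygenic risk score is simply a weighted sum of risk-allele counts across many variants. The sketch below is a generic illustration with made-up genotypes and weights; it is not Gerstein’s model, whose deep-learning approach replaces this linear combination with a nonlinear mapping.

```python
import numpy as np

# Generic additive polygenic risk score: a weighted sum of risk-allele counts
# (0, 1, or 2 copies per variant). Genotypes and effect sizes are made up
# for illustration; they are not from the Yale study.
rng = np.random.default_rng(0)
n_people, n_variants = 5, 100
genotypes = rng.integers(0, 3, size=(n_people, n_variants))  # allele counts
effect_sizes = rng.normal(0.0, 0.05, size=n_variants)        # per-variant weights

prs = genotypes @ effect_sizes  # one additive score per person
print(prs)
```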

Petabases of genomic data qualify as big data. The Sequence Read Archive (SRA) now contains over a million data files requiring many petabytes of storage, according to a lecture by Benjamin Langmead of Johns Hopkins University. An individual genome is about a gigabyte. Several projects already in the data-acquisition phase will greatly expand the size of these databases.
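The scale is easy to check with simple arithmetic, using the round numbers quoted above (order of magnitude only):

```python
# Order-of-magnitude check on SRA storage, using the round numbers quoted above.
n_files = 1_000_000        # "over a million data files"
gb_per_file = 1            # "an individual genome is about a gigabyte"
print(f"~{n_files * gb_per_file / 1e6:.0f} PB")
# Prints ~1 PB; raw reads, quality scores, and redundant runs push the
# archive to many petabytes.
```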

Professor Langmead combined programs focused on RNA transcripts with cloud computing to accelerate transcriptome research, including studies of splicing patterns and indels (insertions and deletions).

A lecture by Ovidiu D. Iancu of Oregon Health & Science University described the dependence of methamphetamine addiction on polymorphisms of the TAAR1 gene. His team used network and graph analysis to locate brain regions, genes, and pathways associated with meth addiction or resistance in animals.
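“Network and graph analysis” covers a family of techniques. A minimal sketch of one common variant, building a gene co-expression network from correlated expression profiles and ranking hub genes, is shown below; the data, threshold, and gene names are invented for illustration and are not from the Oregon study.

```python
import numpy as np
import networkx as nx

# Minimal co-expression network sketch (illustrative only): genes are nodes,
# and an edge is added when two genes' expression profiles are strongly
# correlated. Data and cutoff are invented, not from the lecture.
rng = np.random.default_rng(1)
genes = [f"gene_{i}" for i in range(20)]
expression = rng.normal(size=(20, 30))   # 20 genes x 30 samples
corr = np.corrcoef(expression)           # gene-by-gene correlation matrix

G = nx.Graph()
G.add_nodes_from(genes)
threshold = 0.4                          # assumed correlation cutoff
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) >= threshold:
            G.add_edge(genes[i], genes[j], weight=abs(corr[i, j]))

# Rank candidate "hub" genes by degree centrality.
hubs = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:5]
print(hubs)
```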

As scientists develop and curate large genomic data files, the subjects are predominantly European Caucasians. Professor Dana C. Crawford of Case Western Reserve University pointed out that this is already leading to misdiagnosis and inappropriate treatment of underrepresented human phenotypes.

The bias toward Caucasians is leaving a gaping hole in our understanding of human evolution, since Africans are nearly excluded, according to Professor Latifa Jackson of Howard University. The African population is heterogeneous, so many more genomes need to be recorded in different locations. Several have been identified, but funding has been low compared with the need. And the issue is about much more than just ancestry, as Professor Crawford pointed out.

The poster sessions covered a much wider range of topics than the lectures. Many focused on the intersection of science and current affairs, and many appeared to report work in progress. One that caught my eye dealt with the injury risk of American football; another discussed machine classification of “fake news.”

BMI and American football

A poster by George Moll, M.D., Ph.D., focused on the body mass index (BMI) of American football players and the lifelong (or life-ending) injuries endured by male high-school players. Some statistics: 1,057,382 players in 2016–2017, and an injury rate of 4.36 per 1,000 participants, including 12 fatalities (the highest of all sports). High BMI and related obesity are strong correlates. A supporting paper by Dr. Moll advocated setting a maximum BMI of 30 for participation in public high-school football.2 The study used public data obtained from published game programs, which include position, height, weight, etc., thus avoiding the bureaucratic delays associated with institutional review boards.
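For reference, BMI is weight in kilograms divided by the square of height in meters. The snippet below applies that formula to the feet/inches and pounds listed in game programs and flags players above the proposed cutoff of 30; the roster entries are invented, not taken from the poster’s data set.

```python
# BMI = weight (kg) / height (m)^2, computed from game-program listings.
LB_TO_KG = 0.453592
IN_TO_M = 0.0254
BMI_CUTOFF = 30  # maximum BMI proposed in the supporting paper

roster = [  # (player, height_ft, height_in, weight_lb) -- hypothetical entries
    ("Player A", 6, 2, 310),
    ("Player B", 5, 10, 175),
]

for name, ft, inches, lb in roster:
    height_m = (ft * 12 + inches) * IN_TO_M
    bmi = lb * LB_TO_KG / height_m ** 2
    flag = "over cutoff" if bmi > BMI_CUTOFF else "ok"
    print(f"{name}: BMI {bmi:.1f} ({flag})")
```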

Fact-checking the Internet

A poster by Tsion Coulter and colleagues at the University of North Carolina at Chapel Hill discussed machine-aided fact-checking of Internet content.3 They used FEVER, a large-scale public data set that sorts claims into three classes: “Supported,” “Refuted,” and “Not Enough Info.” Initially using a long short-term memory (LSTM) model, they achieved a classification accuracy of 59.33%. Expanding the model to use bidirectional word associations improved the accuracy to 62.45%. Work to improve accuracy is continuing.
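For readers curious what such a model looks like, here is a minimal three-class claim classifier built around a bidirectional LSTM in PyTorch. The architecture, dimensions, and placeholder tokenization are assumptions for illustration, not the Chapel Hill group’s implementation.

```python
import torch
import torch.nn as nn

# Minimal three-class claim classifier (Supported / Refuted / Not Enough Info)
# built around a bidirectional LSTM. Dimensions and vocabulary are placeholder
# assumptions, not the implementation described in the poster.
class ClaimClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classify = nn.Linear(2 * hidden_dim, n_classes)  # 2x: both directions

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)         # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)     # hidden: (2, batch, hidden_dim)
        final = torch.cat([hidden[0], hidden[1]], dim=1)  # forward + backward states
        return self.classify(final)              # logits over the three classes

# Toy usage: a batch of two already-tokenized claims (IDs are placeholders).
model = ClaimClassifier()
batch = torch.randint(1, 20000, (2, 12))
print(model(batch).shape)  # torch.Size([2, 3])
```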

The 2019 Sigma Xi meeting is scheduled for November 14–17, 2019, in Madison, WI. Keep track at https://community.sigmaxi.org/home.

References

  1. Baron, E. Google puts up $25 million for AI research. East Bay Times, Oct 30, 2018, p. C8.
  2. Maeda, K. and Moll, G. American football sets players’ body mass index. Global Pediatric Health 2018, 5, 1–7. doi: 10.1177/2333794X18785540.
  3. Coulter, T. et al. Fake news: fact verification using evidence extraction and logical entailment. Poster UG-MCS-410, Sigma Xi Annual Meeting, 2018.

Robert L. Stevenson, Ph.D., is Editor Emeritus, American Laboratory/Labcompare; e-mail: [email protected]
