A Review of Quetzal: A Linguistic Search Engine for Biomedical Literature

Quetzal is a biomedical literature search engine that pairs simple search strategies containing only a few keywords with an extensive biomedical vocabulary and with filter sets that mirror the discovery process often used by biomedical researchers. Quetzal (Quertle, Henderson, Nev., www.quertle.com) goes beyond keyword searching by using natural language processing to extract statements from each result that show why it is relevant.

Each of the options available to researchers for searching the biomedical literature has unique strengths and limitations.1,2 Most of these search engines, including Google Scholar and PubMed, use keyword searching to generate results.3 This strategy can return irrelevant results because the keywords can appear anywhere in the indexed text, including the references or author names. For example, searching for “bone cancer” in these search engines could return an article containing the word “cancer” written by an author with the last name “Bone.” This problem has spurred the development of new, user-friendly biomedical search engines that extract meaning from the search results. Quetzal (also known as Quertle V5.0) was developed by biologists to solve the searching problems they encountered while conducting biomedical research.4 Instead of relying solely on the co-occurrence of keywords within the indexed text, Quetzal uses what the company calls Quantum Logic Linguistic Technology to retrieve articles that contain assertions about relationships between the search terms. This technology combines natural language processing and statistical algorithms to tease out relevant statements made by authors within the text.
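
To make the “bone cancer” false positive concrete, here is a minimal Python sketch. It assumes a made-up record and a deliberately naive matcher; the record and function names are hypothetical, and this is not how PubMed or Google Scholar actually index articles. Matching query words anywhere in the record (author names included) produces the false hit, while restricting matching to content fields, similar in spirit to Quetzal keeping authors and journals in separate search fields, does not.

```python
import re

# Hypothetical indexed record (made up for illustration).
record = {
    "title": "Cancer incidence in rural clinics",
    "authors": "Bone, A.; Smith, J.",
    "abstract": "We report cancer incidence rates in rural clinics.",
}


def words(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def naive_keyword_match(record, query):
    """True if every query word appears in ANY field, author names included."""
    indexed = set().union(*(words(value) for value in record.values()))
    return all(word in indexed for word in words(query))


def field_restricted_match(record, query, fields):
    """True only if every query word appears in the listed content fields."""
    indexed = set().union(*(words(record[name]) for name in fields))
    return all(word in indexed for word in words(query))


# The author "Bone" makes the naive matcher report a "bone cancer" hit.
print(naive_keyword_match(record, "bone cancer"))                            # True
print(field_restricted_match(record, "bone cancer", ["title", "abstract"]))  # False
```

Neither matcher looks at relationships between terms; the sketch only shows why field cross-talk causes irrelevant hits.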

Quetzal is best used by entering a few concepts into the search field and then using filters to refine the search. The manually curated Quetzal ontology finds synonyms using information from dictionaries, thesauri, and hierarchical and nonhierarchical relationships. The ontology is tailored specifically to biomedical research. For example, when searching for NO, a common abbreviation for nitric oxide, Quetzal uses its Entity Identification Engine to distinguish the term from similarly written terms using context and, where appropriate, letter case. This type of disambiguation also works for gene names. Additionally, Quetzal separates the search fields for keywords, journals, authors and affiliations to avoid cross-talk between the fields (Figure 1). Quetzal does not support Boolean logic, so users should not include “ands,” “ors” or “nots.”

Figure 1 ‒ Search box. The Quetzal search interface has separate fields for search concepts (A), authors/investigators (B) and journal names (C). An affiliation search will be added soon (D).

Search results are separated into Focused Results, generated by Quantum Logic Linguistic Technology, and Broader Results, generated by a more general keyword search (Figure 2). The search results link out to the material on external websites such as PubMed or the United States Patent and Trademark Office. Quetzal displays Relevant Statements from each article alongside its search result. Contextual Highlighting marks the search terms and their synonyms within the Relevant Statements to indicate why the result was retrieved (Figure 2).

Figure 2 ‒ Search results. Quetzal separates its search results into Focused Results based on assertions made by the authors in the text (A), and Broader Results (B) based on a more traditional keyword search. Each focused search result displays the citation along with one Relevant Statement (C, red box) in which the search terms and synonyms are highlighted. More relevant statements can be displayed by clicking “more Relevant Statements” below the result. The search and applied filters can be saved by clicking the pink star at the top right of the search results (D, blue box).

For example, a search for “cancer genes” retrieves results in which the concept “genes” is closely linked with the concept “cancer.” The Focused Results show Relevant Statements like “expression of the gene is associated with cancer” (Table 1). Relevant Statements spare the user from reading the entire text to find relevant assertions. By default, only one Relevant Statement is shown with each result, but the user can opt to see all of the Relevant Statements in a document by selecting “Show all relevant statements” above the search results (Figure 2). Quetzal also finds results that contain synonyms and related terms for cancer, such as specific cancer types (melanoma) or cancer-related terms (tumor or carcinogenesis).

Table 1 ‒ Quetzal example searches*

To find associations between cancer and specific genes, Quetzal provides “Power Terms,”5 which are designated by a $ symbol before the term of interest. For example, searching the Power Term “$Genes” finds the names of all genes in the ontology without searching for the word “gene.” This strategy eliminates generic statements about genes that do not name a specific gene. Searching “Cancer $Genes” now returns results with Relevant Statements like “The abundance of this HSP90 species does not necessarily correspond with the total HSP90 expression in the tumor.” Users cannot create their own Power Terms; however, Quetzal staff are responsive to requests for new ones. Another useful feature is the inclusion of verbs in the ontology. For example, searching “$Genes inhibit cancer” returns results with Relevant Statements like “elevated MPS-1 decreases HNSCC tumor growth” (Table 1).
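
Conceptually, a Power Term behaves like a named set of specific terms that stands in for a whole category. The following minimal Python sketch assumes a tiny, hypothetical ontology; the gene list, the synonym sets and the function names are illustrative only and are not Quetzal's internals. It shows why “Cancer $Genes” matches a statement that names HSP90 but not the generic statement “expression of the gene is associated with cancer.”

```python
import re

# Hypothetical mini-ontology; Quetzal's curated ontology is far larger.
POWER_TERMS = {
    "$Genes": {"hsp90", "mps-1", "pdpn", "tp53"},  # specific gene names only
}
SYNONYMS = {
    "cancer": {"cancer", "tumor", "melanoma", "carcinogenesis"},
}


def tokens(text):
    """Lowercase tokens; hyphens are kept so names like MPS-1 stay whole."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def expand(query):
    """Map each query word to the set of terms that can satisfy it."""
    concepts = []
    for word in query.split():
        if word in POWER_TERMS:
            concepts.append(POWER_TERMS[word])
        else:
            concepts.append(SYNONYMS.get(word.lower(), {word.lower()}))
    return concepts


def matches(statement, concepts):
    """True if the statement mentions at least one term from every concept."""
    found = tokens(statement)
    return all(found & concept for concept in concepts)


query = expand("Cancer $Genes")
specific = ("The abundance of this HSP90 species does not necessarily "
            "correspond with the total HSP90 expression in the tumor.")
generic = "expression of the gene is associated with cancer"
print(matches(specific, query))  # True: names HSP90 and mentions a tumor
print(matches(generic, query))   # False: "gene" is not a specific gene name
```

This sketch only checks co-occurrence within a statement; Quetzal's linguistic analysis of the relationship between terms is not modeled here.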

Quetzal Filters narrow a search, for example by publication date or publication type (Figure 3A). Selecting the Clinical Trial filter in the cancer gene example returns only reports of clinical trials. This drastically reduces the number of results, and the Relevant Statements returned are limited to human studies, like “The continuous supplementation of anticancer therapies with the medical nutriment MSC helps to reduce the incidence of treatment-related febrile neutropenia in children with solid cancers” (Table 1). Quetzal is very effective at retrieving specific information, but it is not perfect: in this example, the Entity Identification Engine misidentified the medical nutriment MSC as a gene because of similar text formatting. However, because the Relevant Statement is displayed with the result, the user can spot this error without reading the entire text.

Figure 3 ‒ Filters. Quetzal has predefined filters (A, red box) that allow the user to limit by publication date and type. Filters can be created by using the Also Containing and Not Containing fields (B, blue box). Additionally, positive and negative assertions can be differentiated using the Negative Statements filter (C, green box). Finally, Quetzal generates Key Concepts that can be used as filters based on the Relevant Statements found in the search results (D, orange box).

In addition to the date and publication type filters, Quetzal generates useful filters based on the search results. The Key Concept Identification Engine generates filters from the Relevant Statements in the search results (Figure 3D). In the “$Genes inhibit cancer” example, the Key Concepts generated include relevant gene names, types of cancer and a list of verb synonyms for “inhibit.” The user can select one or more Key Concepts to filter the results. For example, selecting the Key Concept melanoma restricts the results to those about melanoma, with Relevant Statements like “MASL targets PDPN to inhibit melanoma cell growth” (Table 1).

Users can also create their own filters by using the “Also Containing” or “Not Containing” search fields to add concepts to the search strategy (Figure 3B). Quetzal adds a further layer of granularity by letting the user require this additional term to appear either in the Relevant Statements or anywhere in the document. For example, adding an “Also Containing” filter for the Power Term “$AnimalModel” restricts the melanoma results above to animal models of melanoma (Table 1).

Finally, Quetzal allows users to filter for Negative Statements, a common need when searching the biomedical literature. With typical keyword searches, it is nearly impossible to distinguish “A causes B” from “A does not cause B.” This filter checks the Relevant Statements for the presence or absence of “not” and, depending on the user's preference, displays either the statements that contain “not” or those that do not (Figure 3C). This feature makes it easy to find statements that genes are not involved in certain types of cancer. Overall, these filters make it much easier to get high-quality search results than the complicated search strategies needed to achieve the same resolution in other search engines such as PubMed.
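
The Negative Statements filter, as described above, can be approximated by a simple pattern test. This Python sketch assumes the filter only checks each Relevant Statement for the standalone word “not”; the function name is hypothetical, and a production system would likely need to handle other negation cues (for example, “no” or “cannot”), which this sketch deliberately ignores.

```python
import re

# Matches "not" as a standalone word only ("cannot" is deliberately not matched).
NOT_PATTERN = re.compile(r"\bnot\b", re.IGNORECASE)


def negative_filter(statements, want_negative):
    """Keep statements containing 'not' if want_negative, otherwise those lacking it."""
    return [s for s in statements if bool(NOT_PATTERN.search(s)) == want_negative]


statements = [
    "The abundance of this HSP90 species does not necessarily correspond "
    "with the total HSP90 expression in the tumor.",
    "elevated MPS-1 decreases HNSCC tumor growth",
]
print(negative_filter(statements, want_negative=True))   # only the HSP90 statement
print(negative_filter(statements, want_negative=False))  # only the MPS-1 statement
```

Because the test runs on the extracted Relevant Statements rather than on the whole document, a “not” elsewhere in the article does not affect the result.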

Quetzal allows users to save their searches, filters and individual citations as well as receive e-mail alerts when new, relevant material appears. Clicking the pink star next to “Save searches and filters” saves the search and applied filters in the user’s Quetzal account (Figure 4). From here, the search can be run again or an e-mail alert can be set up. Individual articles can be saved to the user’s Quetzal account by clicking the black diskette next to the search result. The checkboxes next to each reference can be used to export the articles in bulk to a bibliographic manager (.ris format) or a spreadsheet (.csv format) (Figure 2).
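
For readers unfamiliar with the two export formats, the sketch below shows what a .ris and a .csv export of one citation can look like. It is a minimal Python illustration, assuming a hypothetical record structure and hypothetical function names, not Quetzal's exporter. The RIS tags used (TY, AU, TI, JO, PY, VL, IS, SP, EP, ER) are standard RIS tags, and the citation is reference 3 from this review.

```python
import csv
import io

# Hypothetical citation record, as a search tool might hold it before export.
citations = [
    {
        "authors": ["Anders, M.E.", "Evans, D.P."],
        "title": "Comparison of PubMed and Google Scholar literature searches",
        "journal": "Respir. Care",
        "year": "2010",
        "volume": "55",
        "issue": "5",
        "start_page": "578",
        "end_page": "583",
    },
]

FIELDS = ["authors", "title", "journal", "year", "volume", "issue", "start_page", "end_page"]


def to_ris(records):
    """Serialize records as RIS: one 'TAG  - value' line per field; 'ER  - ' ends a record."""
    lines = []
    for rec in records:
        lines.append("TY  - JOUR")
        for author in rec["authors"]:
            lines.append(f"AU  - {author}")
        lines.append(f"TI  - {rec['title']}")
        lines.append(f"JO  - {rec['journal']}")
        lines.append(f"PY  - {rec['year']}")
        lines.append(f"VL  - {rec['volume']}")
        lines.append(f"IS  - {rec['issue']}")
        lines.append(f"SP  - {rec['start_page']}")
        lines.append(f"EP  - {rec['end_page']}")
        lines.append("ER  - ")
    return "\n".join(lines) + "\n"


def to_csv(records):
    """Serialize the same records as CSV, joining multiple authors with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow(dict(rec, authors="; ".join(rec["authors"])))
    return buffer.getvalue()


print(to_ris(citations))
print(to_csv(citations))
```

RIS keeps one tag per author, which is what lets a bibliographic manager import the authors individually; the flat CSV has to join them into a single cell.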

Figure 4 ‒ Saved searches. Users can save searches, including the applied filters, to their account. The search parameters and filters are listed on the left (A). The user can rerun the search (B), set up an alert (C) or delete the search (D).

An account is required to use any Quetzal service. Some services are free in the Basic version, which lets users search the contents of the PubMed database with Power Terms and filter the results by publication date and type. The Professional version, which is geared toward researchers and physicians, includes all the Basic features plus user-created filters, a wider range of searchable content (TOXNET, news, NIH grants), Key Concepts filters, a “Journal club” feature, and the ability to save searches, receive e-mail alerts for new articles relevant to a search, link to library holdings, access PDFs and export results. The Advanced version, aimed at information professionals and researchers who need advanced features and patent content, is recommended for institutional accounts. It includes all the features of the Professional version plus additional searchable content (AHRQ guidelines, patents, full text) and the Negative Statements filter.

Quetzal addresses many of the shortcomings of standard keyword-based search algorithms by providing a biomedically focused ontology that includes verbs and by extracting relevant statements from the search results. Its mode of searching is also geared toward discovery (i.e., searching for a general topic and then filtering), which is how researchers generally approach literature searching. One notable feature absent from the Basic version is the ability to export references, which costs users time later when they need to work with those references. The most impressive features are restricted to the Professional (Key Concepts and user-defined filters) and Advanced (Negative Statements) versions, but the costs are not prohibitive, especially at the institutional level.

References

  1. Lu, Z. PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford) 2011, 2011, baq036. doi: 10.1093/database/baq036. PubMed PMID: 21245076; PMCID: PMC3025693.
  2. Steinbrook, R. Searching for the right search—reaching the medical literature. N. Engl. J. Med. 2006, 354(1), 4‒7. doi: 10.1056/NEJMp058128. PubMed PMID: 16394296.
  3. Anders, M.E. and Evans, D.P. Comparison of PubMed and Google Scholar literature searches. Respir. Care 2010, 55(5), 578‒83. PubMed PMID: 20420728.
  4. Coppernoll-Blach, P. Quertle: the conceptual relationships alternative search engine for PubMed. J. Med. Libr. Assoc. 2011, 99(2), 176‒7. doi: 10.3163/1536-5050.99.2.017. PMCID: PMC3066589.
  5. www.quetzal-search.info/pages/powerterms.shtml

C. Tobin Magle is a biomedical science research support specialist, Health Sciences Library, University of Colorado Anschutz Medical Campus, 12950 E. Montview Blvd., Aurora, Col. 80045, U.S.A.; tel.: 303-724-2114.
