How Several Research Areas Relate to BioASQ Tasks


Classification, Semantic Indexing, Machine Learning

Task 4a requires participants to assign MeSH headings to newly published biomedical articles, before PubMed curators annotate them manually. In effect, this is a classification task that requires documents to be automatically classified into a hierarchy of classes (MeSH headings); it can also be viewed as a semantic indexing task, where an index mapping documents to concepts (MeSH headings) is required. Task 4b Phase A requires, among other outputs, English questions to be assigned concepts from designated ontologies. This is also a classification task, requiring questions to be classified into classes from multiple hierarchies; it can again be viewed as a semantic indexing task, where an index mapping questions to concepts is required. Task 4b Phase B requires, among other outputs, “yes” or “no” responses for yes/no questions; this can also be viewed as a (binary) classification task. Most classification methods employ machine learning. Machine learning is also used extensively in Information Retrieval, Question Answering, Information Extraction, Summarization, and Textual Entailment (see below).
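
As an illustration, the following is a minimal sketch of semantic indexing treated as flat multi-label classification, using TF-IDF features and one-vs-rest logistic regression in scikit-learn. The toy abstracts and MeSH headings are invented for illustration, and the sketch ignores the hierarchical structure of MeSH, which hierarchical classification methods would additionally exploit.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy abstracts and invented MeSH headings; real systems train on millions
# of PubMed citations and tens of thousands of headings.
abstracts = [
    "Insulin resistance in type 2 diabetes mellitus patients.",
    "Beta-amyloid plaques in Alzheimer disease progression.",
    "Metformin therapy for type 2 diabetes mellitus.",
]
headings = [
    {"Diabetes Mellitus, Type 2", "Insulin Resistance"},
    {"Alzheimer Disease", "Amyloid beta-Peptides"},
    {"Diabetes Mellitus, Type 2", "Metformin"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(headings)  # one binary column per MeSH heading

# One binary logistic regression classifier per heading, over TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(abstracts, Y)

# Rank headings for a new article by predicted probability.
probs = model.predict_proba(["A trial of metformin in diabetic patients."])[0]
top = np.argsort(probs)[::-1][:3]
print([(mlb.classes_[i], round(float(probs[i]), 2)) for i in top])
```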


Information Extraction, Named Entity Recognition and Disambiguation, Relation Extraction, Fact Checking

Given an English yes/no, factoid, or list biomedical question and relevant PubMed articles, relevant article snippets, relevant concepts, and relevant RDF triples (from ontologies), Task 4b Phase B requires participants to produce, among other outputs, an ‘exact’ answer, i.e., a “yes” or “no” response for yes/no questions, a named entity for factoid questions, or a list of named entities for list questions. Since most of the exact answers are named entities (or lists of named entities), this task is also directly related to named entity recognition and named entity disambiguation. Many of the questions in effect ask about relations between entities; hence, this task is also related to relation extraction and, more generally, information extraction. Fact-checking methods can also be applied, for example to find support for positive or negative responses to yes/no questions.
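
As a crude illustration of the named entity recognition and disambiguation component, the sketch below tags snippets with a small hand-built gazetteer that maps surface forms to canonical concepts, preferring the longest match. The gazetteer entries are invented; realistic systems use learned taggers and much larger terminologies, such as MeSH or UMLS.

```python
import re

# Invented gazetteer: surface form -> canonical concept.
GAZETTEER = {
    "type 2 diabetes": "Diabetes Mellitus, Type 2",
    "diabetes": "Diabetes Mellitus",
    "metformin": "Metformin",
    "insulin": "Insulin",
}
# Longest surface forms first, so "type 2 diabetes" wins over "diabetes".
PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(GAZETTEER, key=len, reverse=True)),
    re.IGNORECASE,
)

def extract_entities(snippet):
    """Return (surface form, canonical concept) pairs found in a snippet."""
    return [(m.group(0), GAZETTEER[m.group(0).lower()])
            for m in PATTERN.finditer(snippet)]

print(extract_entities("Metformin is a first-line treatment for type 2 diabetes."))
```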


Information Retrieval, Document Retrieval, Passage Retrieval, RDF Triple Retrieval

Given an English biomedical question, Task 4b Phase A requires participants to retrieve relevant PubMed articles, relevant article snippets, and relevant RDF triples from ontologies. Hence, this task is directly related to information retrieval, including document retrieval, passage retrieval, and RDF triple retrieval.
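
To make the retrieval component concrete, the following is a from-scratch sketch of BM25, a standard ranking function that could underlie the document and passage retrieval of Phase A. The documents, query, and parameter values (k1 = 1.5, b = 0.75, the usual defaults) are illustrative; production systems would also use stemming, query expansion, and learned rankers.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(tokenized)
    # Document frequency of each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "metformin lowers blood glucose in type 2 diabetes",
    "amyloid plaques accumulate in alzheimer disease",
    "insulin resistance is a hallmark of type 2 diabetes",
]
query = "drugs for type 2 diabetes"
ranked = sorted(zip(bm25_scores(query, docs), docs), reverse=True)
print(ranked[0])  # best-scoring document with its BM25 score
```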


Question Answering from Texts and Structured Information

Given an English yes/no, factoid, or list biomedical question and relevant PubMed articles, relevant article snippets, relevant concepts, and relevant RDF triples (from ontologies), Task 4b Phase B requires participants to produce, among other outputs, an ‘exact’ answer, i.e., a “yes” or “no” response for yes/no questions, a named entity for factoid questions, and a list of named entities for list questions. Hence, this task is directly related to question answering, from both texts and structured information.
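
As a minimal illustration of factoid question answering over retrieved snippets, the sketch below ranks candidate entities (for example, produced by a tagger like the one sketched earlier) by how often they occur in the relevant snippets and returns the most frequent one as the ‘exact’ answer. The candidates and snippets are invented; real systems combine many more features, such as answer type checking and proximity to question terms.

```python
from collections import Counter

def answer_factoid(candidates, snippets):
    """Pick the candidate entity mentioned most often across the snippets."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for cand in candidates:
            counts[cand] += text.count(cand.lower())
    return counts.most_common(1)[0][0]

snippets = [
    "Metformin is recommended as first-line therapy for type 2 diabetes.",
    "First-line treatment of type 2 diabetes is metformin monotherapy.",
    "Sulfonylureas are used when metformin is contraindicated.",
]
print(answer_factoid(["metformin", "sulfonylureas", "insulin"], snippets))
# -> metformin
```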


Textual Entailment and Reasoning

Given an English biomedical question, Task 4b requires participants to identify relevant PubMed articles, relevant article snippets, relevant concepts, and relevant RDF triples (Phase A), as well as to construct (in Phase B) ‘exact’ answers (e.g., named entities, in the case of factoid questions) and ‘ideal’ answers (paragraph-sized summaries). Since the questions may be phrased differently from the relevant articles, snippets, and English renderings of RDF triples, textual entailment and paraphrasing methods can be particularly useful (in Phase A). Textual entailment and paraphrasing methods can also be employed to avoid repeating the same information (expressed in different ways) in the ‘ideal’ answers (in Phase B). Reasoning at the level of logical representations can also be used, for example, to deduce answers that are not directly present in the available information sources, but can be inferred from them.
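
As a rough illustration of how paraphrase detection can help avoid repetition in ‘ideal’ answers, the sketch below uses TF-IDF cosine similarity as a crude stand-in for textual entailment or paraphrase recognition, dropping any sentence too similar to one already kept. The 0.6 threshold and the example sentences are illustrative choices; dedicated entailment models would handle paraphrases with little lexical overlap.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def deduplicate(sentences, threshold=0.6):
    """Keep each sentence only if no previously kept sentence is too similar."""
    vec = TfidfVectorizer().fit(sentences)
    kept = []
    for s in sentences:
        if kept:
            sims = cosine_similarity(vec.transform([s]), vec.transform(kept))
            if max(sims[0]) >= threshold:
                continue  # near-paraphrase of a kept sentence: drop it
        kept.append(s)
    return kept

sentences = [
    "Metformin reduces hepatic glucose production.",
    "Hepatic glucose production is reduced by metformin.",
    "Metformin also improves insulin sensitivity.",
]
print(deduplicate(sentences))  # the second (paraphrased) sentence is dropped
```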


Text Summarization and Natural Language Generation

Given an English biomedical question and relevant PubMed articles, relevant article snippets, relevant concepts, and relevant RDF triples (from ontologies), Task 4b Phase B requires participants to produce, among other outputs, an ‘ideal’ answer, i.e., a paragraph-sized summary, ideally fluent and coherent, which reports all the relevant information without repetition and without irrelevant information. Hence, this task is directly related to text summarization, especially multi-document summarization, and to concept-to-text natural language generation.
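
As an illustration of question-focused extractive multi-document summarization, the sketch below applies Maximal Marginal Relevance (MMR), greedily selecting sentences that are relevant to the question while penalizing redundancy with sentences already selected. The trade-off parameter, the two-sentence budget, and the example texts are illustrative, and the sketch performs only sentence extraction, not the natural language generation that a fluent ‘ideal’ answer may require.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(question, sentences, k=2, lam=0.7):
    """Greedily pick k sentences trading off relevance against redundancy."""
    vec = TfidfVectorizer().fit(sentences + [question])
    S = vec.transform(sentences)
    q = vec.transform([question])
    relevance = cosine_similarity(S, q).ravel()
    selected = []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Redundancy: similarity to the closest already-selected sentence.
            redundancy = (max(cosine_similarity(S[i], S[selected]).ravel())
                          if selected else 0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]

question = "What is the first-line drug for type 2 diabetes?"
sentences = [
    "Metformin is the first-line drug for type 2 diabetes.",
    "The first-line drug for type 2 diabetes is metformin.",
    "Metformin also has a favourable safety profile.",
]
# The two near-paraphrases are not both selected, despite high relevance.
print(mmr_summary(question, sentences))
```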