The fifth edition of the BioASQ challenge will comprise three tasks. Participants may choose to participate in any or all of the tasks and their subtasks.
BioASQ Task 5a: Large-scale online biomedical semantic indexing
This task will be based on the standard process followed by PubMed to index journal abstracts. The participants will be asked to classify new PubMed documents, written in English, as they become available online, before PubMed curators annotate (in effect, classify) them manually. The classes will come from the MeSH hierarchy; they will be the subject headings that are currently used to manually index the abstracts, excluding those that are already provided by the authors of each article. As new manual annotations become available, they will be used to evaluate the classification performance of the participating systems (which classify articles before they are manually annotated), using standard measures (e.g., precision, recall, accuracy) as well as hierarchical variants of them. The participants will be able to train their classifiers using the whole history of manually annotated abstracts.
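To make the flat measures concrete, here is a minimal sketch (not the official BioASQ evaluation code) of micro-averaged precision, recall, and F1 for multi-label MeSH assignment; the article IDs and headings below are illustrative. The hierarchical variants used in the challenge additionally give credit for related headings in the MeSH tree, which this sketch does not attempt.

```python
def micro_prf(gold, predicted):
    """gold, predicted: dicts mapping article id -> set of MeSH headings."""
    tp = fp = fn = 0
    for pmid, gold_labels in gold.items():
        pred_labels = predicted.get(pmid, set())
        tp += len(gold_labels & pred_labels)   # correctly assigned headings
        fp += len(pred_labels - gold_labels)   # spurious headings
        fn += len(gold_labels - pred_labels)   # missed headings
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative gold and predicted annotations for two hypothetical articles.
gold = {"28000001": {"Humans", "Neoplasms"},
        "28000002": {"Mice", "Liver"}}
pred = {"28000001": {"Humans", "Neoplasms", "Mice"},
        "28000002": {"Mice"}}
micro_prf(gold, pred)  # -> (0.75, 0.75, 0.75)
```

Micro-averaging pools the counts over all articles before computing the ratios, so frequently annotated articles weigh more than rarely annotated ones; macro-averaged variants average per-article scores instead.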
Task 5a will run for three consecutive periods (batches) of 5 weeks each. The first batch will start on February 06, 2017. Separate winners will be announced for each batch. Participation in the task can be partial, i.e., covering only some of the batches.
BioASQ Task 5b: Biomedical Semantic QA (involves IR, QA, summarization)
Task 5b will use benchmark datasets containing training and test biomedical questions, in English, along with gold standard (reference) answers. The participants will have to respond to each test question with relevant concepts (from designated terminologies and ontologies), relevant articles (in English, from designated article repositories), relevant snippets (from the relevant articles), relevant RDF triples (from designated ontologies), exact answers (e.g., named entities in the case of factoid questions) and 'ideal' answers (English paragraph-sized summaries). A set of 1,799 training questions (used as dry-run or test questions in previous years) is already available, along with their gold standard answers (relevant concepts, articles, snippets, exact answers, summaries). At least 500 new test questions will be used this year. All the questions are constructed by biomedical experts from around Europe.
The test dataset of Task 5b will be released in five batches, each containing approximately 100 questions. The first batch will start on March 08, 2017. Separate winners will be announced for each batch. Participation in the task can be partial; for example, it is acceptable to participate in only some of the batches, to return only relevant articles (and no concepts, triples, or article snippets), or to return only exact answers (or only 'ideal' answers). System responses will be evaluated both automatically and manually.
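As an illustration of automatic evaluation for exact answers, the sketch below computes mean reciprocal rank (MRR) over ranked candidate answers to factoid questions. This is an assumed scoring scheme for illustration, not the official BioASQ evaluation code, and the gold answers and system outputs are invented.

```python
def mean_reciprocal_rank(gold_answers, ranked_candidates):
    """gold_answers: one set of acceptable answer strings per question;
    ranked_candidates: one ranked list of candidate answers per question."""
    total = 0.0
    for gold, candidates in zip(gold_answers, ranked_candidates):
        gold_norm = {g.lower() for g in gold}
        for rank, cand in enumerate(candidates, start=1):
            if cand.lower() in gold_norm:
                total += 1.0 / rank   # reciprocal rank of first correct answer
                break                  # later correct answers do not add credit
    return total / len(gold_answers)

# Hypothetical factoid questions: one gold set and one system run per question.
gold = [{"BRCA1"}, {"aspirin", "acetylsalicylic acid"}]
runs = [["TP53", "BRCA1", "BRCA2"], ["aspirin"]]
mean_reciprocal_rank(gold, runs)  # -> (1/2 + 1/1) / 2 = 0.75
```

MRR rewards systems that place a correct answer near the top of their candidate list; a question with no correct candidate contributes zero.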
BioASQ Task 5c: Funding Information Extraction from Biomedical Literature
For this introductory year, Task 5c will run in a single batch. The participants will be asked to extract grant information from new PubMed documents whose full text is available in PubMed Central. The participants will have to respond to each test article with the grant IDs and grant agencies mentioned in the article's full text. Annotations from PubMed will be used to evaluate the information extraction performance of the participating systems.
A test batch for Task 5c will be released on April 18, consisting of biomedical articles from PubMed with full text available from PubMed Central. Participant systems should extract funding information from the full text of the articles included in the test set. Three properties of the participants' answers will be evaluated independently for each system submission (extraction of grant IDs, extraction of grant agencies, and their combination), and separate winners will be announced for each category.
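A simple baseline for the grant ID part of the task could rely on pattern matching over the funding statements. The sketch below is a hypothetical starting point, not a BioASQ-provided baseline: the regular expression is an assumption tuned to a couple of common shapes (e.g., "grant R01CA123456" or "grant no. BB/1234/5") and would miss many real-world formats.

```python
import re

# Assumed pattern: the word "grant(s)", an optional "no.", then a code that
# starts with 1-4 letters followed by digits (slashes and hyphens allowed).
GRANT_ID = re.compile(
    r"\bgrants?\s+(?:no\.?\s*)?([A-Z]{1,4}[\s/]?\d{2,}[A-Z0-9/-]*)",
    re.IGNORECASE,
)

def extract_grant_ids(full_text):
    """Return the set of candidate grant IDs found in the text."""
    return {m.group(1).strip() for m in GRANT_ID.finditer(full_text)}

text = ("This work was supported by grant R01CA123456 from the National "
        "Cancer Institute and by grant no. BB/1234/5 from the BBSRC.")
extract_grant_ids(text)  # -> {'R01CA123456', 'BB/1234/5'}
```

A competitive system would likely combine such surface patterns with agency-name dictionaries and a learned classifier, since grant IDs vary widely across funders.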