The first BioASQ challenge will comprise two tasks. Participants may choose to participate in any or all of the tasks and their subtasks.
BioASQ Task 1a: Large-scale online biomedical semantic indexing
This task will be based on the standard process that PubMed follows to index journal abstracts. The participants will be asked to classify new English PubMed documents as they become available online, before PubMed curators annotate (in effect, classify) them manually. The classes will come from the MeSH hierarchy; they will be the subject headings that are currently used to manually index the abstracts, excluding those already provided by the authors of each article. As new manual annotations become available, they will be used to evaluate the classification performance of participating systems (which classify articles before they are manually annotated), using standard IR measures (e.g., precision, recall, accuracy) as well as their hierarchical variants. The participants will be able to train their classifiers using the whole history of manually annotated abstracts.
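To make the flat and hierarchical evaluation measures concrete, here is a minimal sketch. The MeSH headings, the toy ancestor map, and the particular hierarchical variant (expanding each label with its ancestors before comparing sets) are illustrative assumptions, not the official BioASQ evaluation code.

```python
# Sketch of flat vs. hierarchical precision/recall for multi-label
# MeSH indexing. Headings and the ancestor map are toy examples.

def precision_recall(predicted, gold):
    """Flat precision and recall between two label sets."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def hierarchical_precision_recall(predicted, gold, ancestors):
    """One common hierarchical variant: expand each label with its
    ancestors in the hierarchy, then compare the expanded sets, so a
    near-miss in a sibling branch still earns partial credit."""
    def expand(labels):
        return set().union(*({l} | ancestors.get(l, set()) for l in labels))
    return precision_recall(expand(predicted), expand(gold))

# Toy fragment of the MeSH tree: both headings descend from "Neoplasms".
ancestors = {
    "Breast Neoplasms": {"Neoplasms"},
    "Lung Neoplasms": {"Neoplasms"},
}

p, r = precision_recall(["Breast Neoplasms"], ["Lung Neoplasms"])
hp, hr = hierarchical_precision_recall(
    ["Breast Neoplasms"], ["Lung Neoplasms"], ancestors)
# Flat scores are (0.0, 0.0); the hierarchical scores are higher,
# (0.5, 0.5), because the shared ancestor "Neoplasms" matches.
```

The flat measures treat "Breast Neoplasms" vs. "Lung Neoplasms" as a complete miss, while the hierarchical variant rewards the system for landing in the right region of the tree.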
Task 1a will run for three consecutive periods (batches) of six weeks each. The first batch will start on April 22, 2013. Separate winners will be announced for each batch. Participation in the task can be partial, i.e., systems may take part in only some of the batches.
BioASQ Task 1b: Introductory biomedical semantic QA
Task 1b will use benchmark datasets containing development and test questions, in English, along with gold standard (reference) answers. The benchmark datasets are being constructed by a team of biomedical experts from around Europe. Task 1b will run in two phases:
- Phase A: BioASQ will release questions from the benchmark datasets. The participants will have to respond with relevant concepts (from designated terminologies and ontologies), relevant articles (in English, from designated article repositories), relevant snippets (from the relevant articles), and relevant RDF triples (from designated ontologies).
- Phase B: BioASQ will release questions and gold (correct) relevant concepts, articles, snippets, and RDF triples from the benchmark datasets. The participants will have to respond with exact answers (e.g., named entities in the case of factoid questions) and ideal answers (paragraph-sized summaries), both in English.
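The two phases above can be illustrated with one factoid question. The field names and placeholder values below are assumptions made for illustration only, not the official BioASQ submission format (which is specified separately by the organizers).

```python
# Hypothetical shape of one system's responses for a single factoid
# question in the two phases. All field names and placeholder values
# are illustrative assumptions, not the official submission format.

# Phase A: relevant concepts, articles, snippets, and RDF triples.
phase_a_response = {
    "question": "Which gene is mutated in cystic fibrosis?",
    "concepts": ["<concept URI from a designated terminology>"],
    "documents": ["<article ID from a designated repository>"],
    "snippets": [
        {"document": "<article ID>", "text": "<passage from the article>"}
    ],
    "triples": [
        {"subject": "<s>", "predicate": "<p>", "object": "<o>"}
    ],
}

# Phase B: exact and ideal answers, given the gold Phase A material.
phase_b_response = {
    "question": "Which gene is mutated in cystic fibrosis?",
    "exact_answer": "CFTR",  # a named entity, since this is a factoid question
    "ideal_answer": "Cystic fibrosis is caused by mutations in the "
                    "CFTR gene.",  # a paragraph-sized summary
}
```

The key distinction is that Phase A asks for retrieved material (concepts, articles, snippets, triples), while Phase B asks for the answers themselves, produced with the gold retrieved material already in hand.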
The test dataset of Task 1b will be released in three batches, each containing approximately 100 questions. For each batch, only the questions will be released at first, and the participants will have 24 hours to submit their Phase A answers (concepts, articles, snippets, and RDF triples). The gold concepts, articles, snippets, and RDF triples for the questions of the batch will then be provided, and the participants will again have 24 hours to submit their Phase B answers ("exact" and "ideal" answers). The first batch will start on Monday, June 17, 2013.