Challenge Overview

This year the BioASQ challenge will comprise the following tasks. Participants may enter any or all of the tasks and their subtasks. Participant registration for any of the tasks will open here.

BioASQ Task 14b: Biomedical Semantic QA (involves IR, QA, summarization)

Task 14b will use benchmark datasets containing training and test biomedical questions, in English, along with gold standard (reference) answers. The participants will have to respond to each test question with relevant articles (in English, from designated article repositories), relevant snippets (from the relevant articles), exact answers (e.g., named entities in the case of factoid questions) and 'ideal' answers (English paragraph-sized summaries). More than 5,700 training questions (used as dry-run or test questions in previous years) are already available, along with their gold standard answers (relevant articles, snippets, exact answers, summaries). About 300 new test questions will be used this year. All the questions are constructed by biomedical experts from around the world.

The test dataset of Task 14b will be released in batches of approximately 80 questions each. The task will start in March 2026. Separate winners will be announced for each batch. Participation in the task can be partial; for example, it is acceptable to participate in only some of the batches, to return only relevant articles (and no article snippets), or to return only exact answers (or only 'ideal' answers). System responses will be evaluated both automatically and manually.

BioASQ Task Synergy 14: Biomedical Semantic QA for Developing Issues

Task Synergy will use benchmark datasets of test biomedical questions, in English, on developing issues such as COVID-19. The participants will have to respond to each test question with relevant articles (in English, from designated article repositories), relevant snippets (from the relevant articles), exact answers (e.g., named entities in the case of factoid questions) and 'ideal' answers (English paragraph-sized summaries). No special training questions are available for this task; instead, expert feedback will be provided incrementally, based on participant responses in each round. Using this feedback, participants can improve their systems and provide better answers for persisting and/or new questions. Meanwhile, participants may also train their systems using data from previous versions of Task Synergy, which consist of approximately 400 questions on developing topics with incremental annotations, including relevant material and answers, as well as the training dataset of Task b; both are available at the BioASQ Participants Area. All the questions are constructed and assessed by biomedical experts from around the world. Participation in the task can be partial, i.e., participants may enter the task in any of the rounds.

The Synergy Task will run in four rounds, starting in January 2026 with an initial set of questions on developing issues, such as COVID-19. The questions will persist in later rounds until fully answered. In addition, new versions of the questions or new questions may be added in later rounds. Separate winners will be announced for each round. Participation in the task can be partial; for example, it is acceptable to participate in only some of the rounds, to return only relevant articles (or only article snippets), or to return only exact answers (or only 'ideal' answers). System responses will be manually assessed, and feedback on the responses will be provided at the end of each round.

BioASQ Task MultiClinSum-2: Multilingual Clinical Summarization

Task MultiClinSum-2 will rely on a corpus of manually selected full clinical case reports and their corresponding summaries, derived from case report publications written in several languages. For evaluation purposes, automatically generated summaries will be compared against the summaries written by the original authors, using ROUGE-2 and BERTScore. You can join anytime from March 2026 onwards.
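ROUGE-2 measures how many word bigrams a system summary shares with a reference summary. The following is a minimal sketch of that idea, assuming simple lowercasing and whitespace tokenization; it is not the organizers' scorer, and official tooling may tokenize, stem, or aggregate differently, so its numbers will not necessarily match leaderboard scores. The function names `bigrams` and `rouge2` are illustrative.

```python
from collections import Counter


def bigrams(text):
    """Count word bigrams after lowercasing and whitespace tokenization (a simplification)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Clipped bigram overlap between a system summary and a reference summary."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram (clipping)
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    # When overlap > 0, both precision and recall are positive, so the division is safe
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, the candidate "patient presented with severe chest pain" shares 3 of the 5 bigrams in the reference "the patient presented with chest pain", so its ROUGE-2 recall is 3/5 = 0.6. BERTScore, the second metric, instead compares contextual token embeddings and needs a pretrained model, so it is not sketched here.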

Clinical content, such as medical records and case reports, is rapidly growing and written in multiple languages, not just English. These reports are often lengthy, making it difficult for domain experts to extract and track key clinical insights. Generative AI and Large Language Models (LLMs) have shown promise in summarizing such content, condensing detailed reports into shorter texts while preserving essential medical information. This highlights the urgent need to evaluate and benchmark clinical summarization methods across multilingual case reports.

Since clinical case reports share similarities with medical discharge summaries, findings from the MultiClinSum project are also relevant to broader clinical summarization tasks. The MultiClinSum dataset includes cases related to rare diseases and to specialties such as cardiology and rheumatology, offering valuable resources for ongoing clinical NLP efforts, particularly the BARITONE, DataTool4Heart, and AI4HF projects.

The MultiClinSum task focused on the automatic summarization of long clinical case reports written in multiple languages, specifically English, Spanish, French, and Portuguese. In addition, new languages such as Italian, Swedish, or Czech are being considered, further broadening the linguistic coverage of Task MultiClinSum-2. For evaluation, the automatically generated summaries were compared to human-written summaries using metrics such as ROUGE-2 and BERTScore.

Acknowledgements:

The BioASQ Task MultiClinSum is co-organized with the Barcelona Supercomputing Center.

BioASQ Task BioNNE-R: Nested Relation Extraction in Russian and English

The BioNNE-R shared task addresses the NLP challenge of relation extraction involving nested named entities, i.e., entities that contain other entities within their boundaries. Participants in this task must develop models for nested relation extraction, which can be either monolingual or bilingual, depending on the track of the task. You can join anytime from February 2026 onwards.
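To make "entities that contain other entities within their boundaries" concrete, the sketch below represents mentions as character spans and lists which mentions are nested inside others. It is an illustration under assumptions: the `Entity` class, the `contains` and `nested_pairs` helpers, and the label strings are hypothetical, not the BioNNE-R data format, and the sketch only detects nesting; it does not assign relation types.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical label names, e.g. "DISO", "ANATOMY", "CHEM"


def contains(outer, inner):
    """True if inner lies within outer's boundaries and is not the same span."""
    return (outer.start <= inner.start and inner.end <= outer.end
            and (outer.start, outer.end) != (inner.start, inner.end))


def nested_pairs(entities):
    """Return (outer, inner) pairs where one mention is nested inside another."""
    return [(a, b) for a, b in permutations(entities, 2) if contains(a, b)]


# Illustrative sentence: an anatomical mention nested inside a disorder mention
text = "acute kidney injury after cisplatin"
disorder = Entity(0, 19, "DISO")     # "acute kidney injury"
kidney = Entity(6, 12, "ANATOMY")    # "kidney"
cisplatin = Entity(26, 35, "CHEM")   # "cisplatin"

pairs = nested_pairs([disorder, kidney, cisplatin])
```

Here `pairs` holds the single pair `(disorder, kidney)`. A relation extraction model could, for instance, use such outer/inner pairs as candidate arguments alongside ordinary non-nested pairs; how the task itself defines relation candidates is not specified here.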

The train/dev datasets include relations involving annotated mentions of disorders, anatomical structures, and chemicals. We designed our data to account for the complex structure of nested entity mentions and the partial nature of medical terminology. Participants are allowed to train any model architecture on any publicly available data to achieve the best performance.

The BioASQ Task BioNNE-R is co-organized with the Kazan Federal University.

BioASQ Task ELCardioCC: Clinical Coding of Greek Cardiology Discharge Letters

The ELCardioCC 2026 shared task concerns the automatic assignment of cardiology-related ICD-10 codes to hospital discharge letters at the document level. The task can be approached in two ways: either by combining Named Entity Recognition (NER) and Entity Linking (EL) techniques, or by formulating it as a multi-label classification (MLC) problem. To support both directions, we introduce a mixed dataset consisting of 1,500 documents annotated at both the document and mention levels, suitable for NER+EL approaches, and 3,500 documents annotated only at the document level, intended for MLC approaches. In total, the dataset comprises 5,000 documents for training and development and 1,000 documents for testing. You can join anytime from February 2026 onwards.
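Both routes end in the same output: a set of ICD-10 codes per discharge letter. The sketch below shows, under stated assumptions, how mention-level NER+EL output can be collapsed into a document-level code set, and how such a set becomes the multi-hot target vector an MLC model would predict. The helper names, the mention strings, and the small label space are hypothetical; the codes I10, I21.9, and I50.9 are used only as examples and are not drawn from the dataset.

```python
def document_codes(mention_links):
    """Collapse mention-level (mention_text, icd10_code) links into one document-level code set."""
    return {code for _mention, code in mention_links}


def to_multi_hot(codes, label_space):
    """Encode a document's code set as a 0/1 vector over a fixed, ordered label space."""
    return [1 if label in codes else 0 for label in label_space]


# Hypothetical NER+EL output for one letter: two mentions link to the same code
links = [
    ("acute myocardial infarction", "I21.9"),
    ("acute MI", "I21.9"),
    ("hypertension", "I10"),
]
codes = document_codes(links)  # duplicate links collapse into one code

# Hypothetical label space; a real system would use the task's full code inventory
label_space = ["I10", "I21.9", "I50.9"]
vector = to_multi_hot(codes, label_space)
```

Repeated mentions of the same condition contribute a single document-level code, which is why mention-level annotations can supervise an NER+EL pipeline while still being evaluated against document-level labels, the only level available for the 3,500 MLC-oriented documents.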

The BioASQ Task ELCardioCC is co-organized with the Aristotle University of Thessaloniki.

BioASQ Task GutBrainIE: Gut-Brain Interplay Information Extraction

The Task GutBrainIE aims to foster the development of Information Extraction (IE) systems that support experts by automatically extracting and linking knowledge from scientific literature, facilitating the understanding of gut-brain interplay and its role in neurological diseases. The task is divided into three subtasks. In the first subtask, participants are provided with PubMed abstracts discussing the gut-brain interplay and asked to extract named entities from them. In the second subtask, participants are asked to identify binary relations (i.e., presence/absence) between any pair of entities they extract within an abstract. In the third subtask, participants are asked to link extracted named entities to the corresponding concepts in a reference ontology. The submitted runs are evaluated with precision, recall, and F1 for each subtask, using gold annotations created by domain experts. You can join anytime from February 2026 onwards.
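Precision, recall, and F1 can all be computed by treating predictions and gold annotations as sets and counting the items they share. The sketch below applies this to the binary relation subtask. It assumes, as a simplification, that a relation is an unordered entity pair scoped to one abstract; the official scorer may match relations differently (for example, by direction, entity type, or mention offsets). The helper names, document id, and entity strings are hypothetical.

```python
def prf1(predicted, gold):
    """Set-based precision, recall and F1; true positives are items present in both sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def relation(doc_id, a, b):
    # Assumption: relations are undirected, so (a, b) and (b, a) count as the same pair
    return (doc_id, frozenset((a, b)))


gold = {
    relation("doc1", "Lactobacillus", "depression"),
    relation("doc1", "gut microbiota", "Parkinson's disease"),
}
pred = {
    relation("doc1", "depression", "Lactobacillus"),  # matches gold despite reversed order
    relation("doc1", "gut microbiota", "anxiety"),    # not in gold: a false positive
}
p, r, f = prf1(pred, gold)
```

With one of two predictions correct and one of two gold relations found, precision, recall, and F1 are all 0.5. The same `prf1` function would work for entity extraction or ontology linking if each item were encoded as a suitable tuple, e.g. (document, span, label) or (document, span, concept).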

The dataset for the GutBrainIE task includes approximately 1,000 PubMed abstracts annotated by experts with entity mentions, corresponding concepts in the reference ontology, and binary relations.

The BioASQ Task GutBrainIE is co-organized with the University of Padua.
