BioASQ Workshop Scope

Every day, we generate 2.5 quintillion bytes of data. In domains such as biomedicine, approximately 3000 new articles are published on the Web every day, which averages to more than 2 articles every minute. In addition to the sheer amount of information available on the Web, the variety of this information increases every day, ranging from structured data in the form of ontologies to unstructured data in the form of documents. Staying on top of this huge amount of diverse data requires methods for detecting and integrating the portions of datasets that satisfy the information needs of given users, drawn from sources such as documents, ontologies and Linked Data sets. Developing tools to achieve this bold goal requires combining techniques from several disciplines, including Natural Language Processing (e.g., question answering, document summarization, ontology verbalization), Information Retrieval (e.g., document and passage retrieval), Machine Learning (e.g., large-scale hierarchical classification, clustering), Semantic Web/Linked Data (e.g., reasoning, link discovery) and Databases (e.g., storage and retrieval of triples, indexing).

The aim of the BioASQ workshop is to bring experts from these domains together in order to push the research frontier towards hybrid information systems that can handle the full diversity of the Web, especially, but not exclusively, in the context of biomedicine. During the workshop, the results of the open BioASQ challenge will also be presented.

The topics of interest include (but are not restricted to):

  • Large-scale hierarchical text classification
  • Large-scale classification of documents onto ontology concepts (semantic indexing)
  • Classification of questions onto ontological concepts
  • Scalable approaches to document clustering
  • Text summarization, especially multi-document and query-focused summarization
  • Verbalization of structured information and related queries (RDF, OWL, SPARQL, etc.)
  • Question answering over structured, semi-structured and unstructured data
  • Reasoning for information retrieval and question answering
  • Information retrieval over fragmented sources of information
  • Efficient indexing and storage structures for information retrieval
  • Delivery of the retrieved information in a concise and user-understandable form