BioASQ Releases Continuous Space Word Vectors Obtained by Applying Word2Vec to PubMed Abstracts

The word2vec tool (https://code.google.com/p/word2vec/) processes a large text corpus and maps each word of the corpus to a vector in a continuous space. The word vectors can then be used, for example, to estimate the relatedness of two words or to perform query expansion. We applied word2vec to a corpus of 10,876,004 English abstracts of biomedical articles from PubMed. The resulting vectors for 1,701,632 distinct words (types) are now publicly available from http://bioasq.lip6.fr/tools/BioASQword2vec/. File size: 1.3 GB (compressed), 3.5 GB (uncompressed). More information here.
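As a rough illustration of how word vectors can be used to estimate relatedness, the sketch below computes cosine similarity between vectors and ranks the nearest neighbours of a word, which is one simple basis for query expansion. The three-dimensional vectors and the words in `toy_vectors` are invented for illustration and are not taken from the released files; the helper names `cosine_similarity` and `most_similar` are likewise hypothetical. Cosine similarity is a common choice for comparing word2vec vectors, but it is an assumption here rather than a method stated in this announcement.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=2):
    """Rank the other words in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, made up for this example (NOT values from the released data).
toy_vectors = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "femur": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for neighbour, score in most_similar("aspirin", toy_vectors):
        print(f"{neighbour}\t{score:.3f}")
```

With real vectors, the dictionary would be filled from the downloaded files instead of the toy values, and the top-ranked neighbours of a query term could be added to the query as expansion candidates.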