BioInfer: Bio Information Extraction Resource

This is the home page of the Biomedical Information Extraction Resource (BioInfer), a public resource providing a manually annotated corpus and related resources for information extraction in the biomedical domain.

Corpus data (UPDATED 12.9.2008)

A version of BioInfer with all relationships represented as binary relations between proteins is available on the download page. The process used to create this data is described in the paper "Complex-to-Pairwise Mapping of Biological Relationships using a Semantic Network Representation" (SMBM'08 proceedings, pp. 45-52).

A new release (1.1.1) of the corpus is available on the download page. This release fixes minor technical issues in the corpus XML file, mainly token character offsets that were occasionally miscalculated. The annotation is identical to that of release 1.1.0.
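
Because release 1.1.1 mainly corrects token character offsets, users who post-process the corpus may want to verify that offsets agree with the sentence text. The following Python sketch is not part of the BioInfer software. It assumes tokens are represented as hypothetical (text, start, end) tuples with end-exclusive offsets into the sentence string, which may differ from how the corpus XML actually encodes offsets.

```python
from typing import List, Tuple

# HYPOTHETICAL token representation: (text, start, end), end-exclusive offsets
# into the sentence string. The real corpus XML may encode offsets differently.
Token = Tuple[str, int, int]


def check_offsets(sentence: str, tokens: List[Token]) -> List[Token]:
    """Return the tokens whose offset span does not reproduce the token text."""
    return [tok for tok in tokens if sentence[tok[1]:tok[2]] != tok[0]]


def recompute_offsets(sentence: str, texts: List[str]) -> List[Token]:
    """Recompute offsets by searching for each token left to right.

    Assumes tokens occur in sentence order and appear verbatim in the text.
    """
    tokens: List[Token] = []
    pos = 0
    for text in texts:
        start = sentence.find(text, pos)
        if start < 0:
            raise ValueError(f"token {text!r} not found after offset {pos}")
        end = start + len(text)
        tokens.append((text, start, end))
        pos = end  # continue searching after this token
    return tokens
```

For example, recompute_offsets("Actin binds profilin.", ["Actin", "binds", "profilin", "."]) yields ("Actin", 0, 5), ("binds", 6, 11), ("profilin", 12, 20) and (".", 20, 21), while check_offsets flags a token stored as ("binds", 5, 10), since that span covers " bind".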

Version 1.1.0 of the corpus is available on the download page. This version includes full dependency annotation following the Stanford dependency scheme, as described in the paper "On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA" (ACL BioNLP'07 workshop). In addition, the link types of the parallel LG linkage have now been fully manually corrected. This version deprecates the serial and raw LG linkages.

Supporting software (UPDATED 21.6.2007)

Supporting software for users of the corpus is available for any use under an open source licence and can be found here. With version 1.1.0 of the corpus, we also release under an open source licence the liblp2lp program and the conversion rules used to transform the undirected LG linkages into directed dependency graphs in the Stanford scheme.
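
The general idea of turning an undirected LG link into a directed dependency can be illustrated with a rule table keyed on the link type, where each rule states which end of the link is the head and which Stanford label to assign. The Python sketch below is only a toy illustration under that assumption: the four entries in the hypothetical RULES table are simplified examples, not the actual liblp2lp conversion rules, which are considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Link:
    """An undirected LG link between two word positions (left < right)."""
    left: int
    right: int
    ltype: str  # full LG link type, e.g. "Ss", "Os", "Ds"


@dataclass(frozen=True)
class Dependency:
    """A directed dependency from head to dependent with a Stanford-style label."""
    head: int
    dependent: int
    label: str


# HYPOTHETICAL, simplified rule table (not the liblp2lp rules):
# main LG link type -> (which end of the link is the head, dependency label).
RULES: Dict[str, Tuple[str, str]] = {
    "S": ("right", "nsubj"),  # subject noun (left) attaches to finite verb (right)
    "O": ("left", "dobj"),    # object (right) attaches to transitive verb (left)
    "D": ("right", "det"),    # determiner (left) attaches to noun (right)
    "A": ("right", "amod"),   # attributive adjective (left) attaches to noun (right)
}


def main_type(ltype: str) -> str:
    """Keep the leading uppercase letters: 'Ss' -> 'S', 'MVp' -> 'MV'."""
    prefix = []
    for ch in ltype:
        if not ch.isupper():
            break
        prefix.append(ch)
    return "".join(prefix)


def to_dependencies(links: List[Link]) -> Tuple[List[Dependency], List[Link]]:
    """Direct each link using RULES; links without a matching rule are returned separately."""
    deps: List[Dependency] = []
    unconverted: List[Link] = []
    for link in links:
        rule = RULES.get(main_type(link.ltype))
        if rule is None:
            unconverted.append(link)
            continue
        head_side, label = rule
        if head_side == "left":
            deps.append(Dependency(link.left, link.right, label))
        else:
            deps.append(Dependency(link.right, link.left, label))
    return deps, unconverted
```

For "The protein binds actin" (words 0 to 3) with links Ds(0, 1), Ss(1, 2) and Os(2, 3), to_dependencies returns det, nsubj and dobj dependencies with (head, dependent) pairs (1, 0), (2, 1) and (2, 3). Returning unmatched links rather than silently dropping them makes gaps in a rule set easy to spot.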


A list of publications concerning and using BioInfer can be found on the publications page.

