The BioInfer corpus is distributed in XML format and can be parsed with any of several freely available XML parsers. However, because the corpus annotation is relatively complex, we provide the following supporting software to make the corpus easier to use:
- Basic classes: An extendable framework of Python classes for parsing the corpus XML and representing it as a hierarchical object structure. The basic classes offer functions for easily accessing different aspects of the corpus annotation.
- Extractor: A simple command-line Python program for extracting different aspects of the corpus annotation in a human-readable format. Besides providing simplified access to the corpus annotation, the extractor serves as an example of how to use the basic classes to transform the corpus annotation into other formats.
- Visualizer: A Python program with a graphical user interface for viewing the corpus sentences and their annotation types. The visualizer helps users understand the different annotation types and how they combine to capture the information stated in the sentences.
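The minimal sketch below illustrates the general idea behind the basic classes and the extractor. It parses XML into a small hierarchy of Python objects and then flattens that hierarchy into one readable line per relation, using only the standard-library `xml.etree.ElementTree` module. It is not the BioInfer API or the real BioInfer schema. The element and attribute names (`corpus`, `sentence`, `entity`, `relation`, `arg1`, `arg2`) are hypothetical, as are the classes `Entity`, `Relation`, `Sentence` and the functions `parse_corpus` and `extract_relations`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# HYPOTHETICAL markup: element and attribute names are illustrative only
# and do not reproduce the actual BioInfer corpus schema.
SAMPLE_XML = """<corpus>
  <sentence id="s1" text="Profilin binds actin.">
    <entity id="e1" type="Protein" text="Profilin"/>
    <entity id="e2" type="Protein" text="actin"/>
    <relation type="BIND" arg1="e1" arg2="e2"/>
  </sentence>
</corpus>"""


@dataclass
class Entity:
    ident: str
    etype: str
    text: str


@dataclass
class Relation:
    rtype: str
    arg1: Entity
    arg2: Entity


@dataclass
class Sentence:
    ident: str
    text: str
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)


def parse_corpus(xml_text: str) -> list[Sentence]:
    """Build a list of Sentence objects from the (hypothetical) corpus XML."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s_el in root.iter("sentence"):
        sent = Sentence(ident=s_el.get("id"), text=s_el.get("text"))
        for e_el in s_el.findall("entity"):
            ent = Entity(e_el.get("id"), e_el.get("type"), e_el.get("text"))
            sent.entities[ent.ident] = ent
        for r_el in s_el.findall("relation"):
            # Resolve argument ids to the Entity objects of the same sentence
            sent.relations.append(
                Relation(
                    r_el.get("type"),
                    sent.entities[r_el.get("arg1")],
                    sent.entities[r_el.get("arg2")],
                )
            )
        sentences.append(sent)
    return sentences


def extract_relations(sentences: list[Sentence]) -> list[str]:
    """Flatten the object hierarchy into tab-separated, human-readable lines."""
    lines = []
    for sent in sentences:
        for rel in sent.relations:
            lines.append(f"{sent.ident}\t{rel.rtype}\t{rel.arg1.text}\t{rel.arg2.text}")
    return lines


if __name__ == "__main__":
    for line in extract_relations(parse_corpus(SAMPLE_XML)):
        print(line)
```

Run on the sample above, the script prints a single line: `s1`, `BIND`, `Profilin` and `actin`, separated by tabs. The structure is split into a parsing step that builds objects and a separate formatting step that reads them. This mirrors the description of the extractor as a client of the basic classes, so supporting a new output format would only require a new formatting function.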
All of the supporting software is available under the GNU Lesser General Public License (LGPL), a free software license that permits use, modification and redistribution for any purpose, including commercial use.
Detailed class documentation for the BioInfer supporting software can be found here.
Download and installation
To download the supporting software and read installation instructions, please see the download page.