Two tutorials will be held at SMBM 2008 on September 1st, the first day of the symposium. Each tutorial spans half of the day, and the tutorials are independent, so attendees can attend one or both according to their interests. The following introduces the two tutorials; more information for attendees will be provided shortly.

UIMA for Semantic Text Mining in Biomedicine

NEW! Tutorial handouts available (8.5 MB PDF).

The Unstructured Information Management Architecture (UIMA) is a framework for analyzing unstructured data such as text to discover knowledge that is relevant to end users. UIMA enables novice users to build text mining applications from existing components (such as named entity recognition or relation extraction) in a non-programmatic way, and it allows expert users to flexibly build large-scale systems tailored to their specific needs. UIMA is efficient and scalable and holds promise of becoming a de facto standard for large-scale natural language processing systems.

The tutorial addresses practitioners and researchers who are interested in using UIMA for semantic text mining applications in biomedicine. Attendees will get a solid grounding in UIMA. They will understand what UIMA is and what benefits it can bring to their work, and they will know how to build NLP applications with UIMA and where to look for ready-made components.

For more details, see the full tutorial description here (PDF file).

Tutorial organizers:

  • Ekaterina Buyko, Jena University Language & Information Engineering (JULIE) Lab
  • Thilo Götz, IBM Germany
  • Katrin Tomanek, Jena University Language & Information Engineering (JULIE) Lab

For more information on UIMA, see

Resources for Semantic Mining: Corpora and Ontologies

A growing number of increasingly sophisticated resources are available for biomedical text mining. Recently, corpora and ontological resources supporting the extraction and normalized representation of complex events from domain-specific scientific literature have become available. Resources such as the BioInfer and GENIA Event corpora and numerous domain ontologies provide many opportunities, but their inherent complexity also presents users with some challenges to adoption. This tutorial explores these issues in detail, presenting the current generation of semantic mining resources and techniques for their use.

The tutorial will be presented in two parts, one focusing on corpora and the other on ontologies.

Tutorial organizers (corpora)

  • Jin-Dong Kim, University of Tokyo, Tsujii laboratory ("GENIA group")
  • Sampo Pyysalo, University of Turku, Bioinformatics laboratory ("BioInfer group")

Tutorial organizers (ontologies)

  • Jung-jae Kim, European Bioinformatics Institute, Rebholz Group (text mining)
  • Dietrich Rebholz-Schuhmann, European Bioinformatics Institute, Rebholz Group (text mining)

For more information, see