Title | An Annotated German-Language Medical Text Corpus as Language Resource |
Author(s) | Joachim Wermter, Udo Hahn; Text Knowledge Engineering Lab, Freiburg University, Werthmannplatz 1, D-79098 Freiburg, Germany |
Session | P6-T |
Abstract | We describe the structure of a German-language corpus which contains a variety of medical text genres. Clinical documents (discharge summaries, pathology, histology and surgery reports) are distinguished from non-clinical ones (textbook articles and consumer health care documents from a Web portal). After introducing a medical extension of the general-language STTS tagset which accounts for unique features of the medical sublanguage encountered in these documents, we discuss some of the quantitative properties of the annotations (e.g., distribution patterns of part-of-speech tags). |
Keyword(s) | Text corpus, medical application, annotation, tagging, sublanguage |
Language(s) | German |
Full Paper | 614.pdf |