Title

Corpus-based Learning of Lexical Resources for German Named Entity Recognition

Author(s)

Marc Rössler

Computational Linguistics, University Duisburg-Essen, Duisburg - Germany (marc.roessler@uni-duisburg.de)

Session

O15-W

Abstract

This paper explores the use of unlabeled data in a knowledge-poor approach to German NER. German is an especially interesting case for NER because all nouns, not only names, are capitalized, so capitalization alone does not separate names from common nouns. Large and reliable lexical resources are therefore necessary to develop and adapt NER systems. Motivated by a model of word form observation that distinguishes three levels of granularity, a method for the automatic creation of domain-sensitive lexical resources for NER is proposed. The approach uses linear SVMs and relies solely on an annotated corpus of reasonable size and a large amount of unlabeled data.
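The abstract does not spell out the algorithm, so the following is only a rough, hypothetical sketch of how its two named ingredients might fit together. A linear SVM is trained on tokens from an annotated corpus and then applied to unlabeled text. Its token-level decisions are aggregated per word form into a lexicon that records how often each capitalized form was classified as a name. The feature set, the Pegasos-style subgradient solver, and the functions `features`, `train_linear_svm` and `build_lexicon` are assumptions made for illustration and do not come from the paper. The sketch also does not model the three levels of granularity mentioned above.

```python
from collections import defaultdict

def features(tokens, i):
    """Hypothetical sparse features for token i: word form, 3-char suffix, neighbours."""
    w = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<S>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    return ["w=" + w.lower(), "suf3=" + w[-3:].lower(),
            "prev=" + prev.lower(), "next=" + nxt.lower(), "bias"]

def train_linear_svm(examples, epochs=10, lam=0.01):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    examples: list of (feature list, label) with label in {-1, +1}.
    Returns a sparse weight vector as a plain dict.
    """
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for feats, y in examples:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(w[f] for f in feats)  # uses the weights before this step
            for f in list(w):  # regularisation: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: step towards the example
                for f in feats:
                    w[f] += eta * y
    return dict(w)

def build_lexicon(w, unlabeled_sentences):
    """Classify each capitalised token in unlabeled text and aggregate per word form.

    Returns {word form: share of its occurrences classified as a name}.
    """
    votes = defaultdict(lambda: [0, 0])  # word form -> [positive decisions, occurrences]
    for sent in unlabeled_sentences:
        for i, tok in enumerate(sent):
            if not tok[:1].isupper():
                continue  # German capitalises all nouns, so this only selects candidates
            score = sum(w.get(f, 0.0) for f in features(sent, i))
            votes[tok][1] += 1
            if score > 0:
                votes[tok][0] += 1
    return {form: pos / total for form, (pos, total) in votes.items()}

# Usage on a toy annotated sentence (label +1 = part of a name).
annotated = [(["Herr", "Müller", "kauft", "das", "Haus"], [-1, 1, -1, -1, -1])]
train = [(features(toks, i), y) for toks, ys in annotated for i, y in enumerate(ys)]
weights = train_linear_svm(train)
lexicon = build_lexicon(weights, [["Frau", "Müller", "verkauft", "ein", "Haus"]])
```

Because the result is a ratio per word form rather than a hard label, a cutoff could be applied to turn it into a name list; any such threshold would be a further assumption.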

Keyword(s)

Named Entity Recognition, linear SVM, learning from unlabeled data

Language(s)

German

Full Paper

373.pdf