Title | Memory-based Classification of Proper Names in Norwegian |
Author(s) | Anders Nøklestad, Department of Linguistics, University of Oslo |
Session | P5-W |
Abstract | This paper describes the classifier part of a named entity recogniser for Norwegian which uses memory-based learning to categorise proper names. Each name is classified into one of the categories Person, Organisation, Location, Work, Event, or Other. We test the effect of using different features as input to the model, ranging from knowledge-poor features such as windows of inflected forms to features that require high-level processing such as syntactic analysis. We run training sessions with four different k-values for the k-nearest neighbour classifier, and with four different feature weighting schemes. We also apply a document-centred approach (a one-sense-per-discourse strategy) to the output of the memory-based learner. We find that the most important features are the use of gazetteers and the inclusion of lemmas that constitute multi-word proper names, and that document-centred post-processing contributes substantially to the performance of the classifier. The best version of the classifier achieves an accuracy of 90.67% using leave-one-out testing and 83.18% using ten-fold cross-validation. The classifier outperforms a maximum entropy model using the same set of features. |
Keyword(s) | Named entity recognition, memory-based learning, Norwegian, comparison between memory-based learning and maximum entropy modelling |
Language(s) | Norwegian |
Full Paper | 652.pdf |
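
The abstract names two mechanisms that a small example may make more concrete: k-nearest-neighbour classification over symbolic features with per-feature weights, and one-sense-per-discourse post-processing. The Python sketch below is illustrative only and is not the paper's implementation. The feature layout (previous lemma, name lemma, two gazetteer flags), the weight values, the toy training examples, and all function names are assumptions made for this example. It also simplifies in two ways: it takes the k nearest stored examples, and it resolves each name by a plain majority vote within one document.

```python
from collections import Counter


def overlap_distance(a, b, weights):
    """Weighted overlap metric: add a feature's weight when the two values differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def knn_classify(instance, memory, weights, k=1):
    """Return the majority label among the k stored examples nearest to `instance`."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0], weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def one_sense_per_discourse(predictions):
    """Relabel each (name, label) pair in one document with that name's majority label."""
    counts = {}
    for name, label in predictions:
        counts.setdefault(name, Counter())[label] += 1
    return [(name, counts[name].most_common(1)[0][0]) for name, _ in predictions]


# Hypothetical feature layout: (previous lemma, name lemma,
# in person gazetteer?, in location gazetteer?)
MEMORY = [
    (("herr", "hansen", "yes", "no"), "Person"),
    (("i", "oslo", "no", "yes"), "Location"),
    (("fra", "bergen", "no", "yes"), "Location"),
    (("fru", "olsen", "yes", "no"), "Person"),
]

# Assumed weights: gazetteer features count twice as much as lemma features.
WEIGHTS = (1.0, 1.0, 2.0, 2.0)

if __name__ == "__main__":
    print(knn_classify(("i", "bergen", "no", "yes"), MEMORY, WEIGHTS, k=3))
    document = [("Bergen", "Location"), ("Bergen", "Person"), ("Bergen", "Location")]
    print(one_sense_per_discourse(document))
```

In this sketch, a feature weighting scheme is just a different `WEIGHTS` tuple, and the post-processing step only ever sees the classifier's output labels, which mirrors the abstract's description of applying it to the output of the memory-based learner.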