Title | Cluster Analysis and Classification of Named Entities |
Author(s) |
Joaquim F. Ferreira da Silva (1), Zornitsa Kozareva (2), José Gabriel Pereira Lopes (1)
(1) Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2725 Monte da Caparica, Portugal, jfs@di.fct.unl.pt; (2) Faculty of Mathematics and Informatics, Plovdiv University, 236, Bulgaria blvd., Plovdiv, Bulgaria, zkozareva@hotmail.com |
Session | P2-W |
Abstract | This paper presents a statistics-based, language-independent, unsupervised approach for clustering possible named entities. We describe and motivate the features and statistical filters used by our clustering process. Using the Model-Based Clustering Analysis software, we obtained different clusters of named entities. The method was applied to Bulgarian and English. For some clusters, precision is close to 100%, which eases human validation and saves time; other clusters still need further refinement. Based on the obtained clusters, it is possible to classify new named entities. |
Keyword(s) | Named Entities, Multiword Lexical Units, Clustering |
Language(s) | English, Bulgarian |
Full Paper | 796.pdf |