Title |
Semantic Vectors: a Scalable Open Source Package and Online Technology Management Application |
Authors |
Dominic Widdows and Kathleen Ferraro |
Abstract |
This paper describes the open source SemanticVectors package, which efficiently creates semantic vectors for words and documents from a corpus of free text articles. We believe that this package can play an important role in furthering research in distributional semantics and, perhaps more importantly, can help to significantly reduce the gap between good research results and valuable applications in production software. Two principles have guided the creation of the package so far: ease of use and scalability. The basic package installs and runs easily on any Java-enabled platform, and depends only on Apache Lucene. Dimension reduction is performed using Random Projection, which enables the system to scale much more effectively than other algorithms used for the same purpose. This paper also describes a trial application in the Technology Management domain, which highlights some user-centred design challenges that we believe are also key to successful deployment of this technology. |
Language |
English |
Topics |
Tools, systems, applications, Acquisition, Machine Learning, Document Classification, Text categorisation |
Bibtex |
@InProceedings{WIDDOWS08.300,
author = {Dominic Widdows and Kathleen Ferraro},
title = {Semantic Vectors: a Scalable Open Source Package and Online Technology Management Application},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = may,
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |