LREC 2014 Proceedings

Summary of the paper

Title	An Effortless Way To Create Large-Scale Datasets For Famous Speakers
Authors	François Salmon and Félicien Vallet
Abstract	The creation of large-scale multimedia datasets has become a scientific matter in itself. Indeed, the fully manual annotation of hundreds or thousands of hours of video and/or audio turns out to be practically infeasible. In this paper, we propose an extremely handy approach to automatically construct a database of famous speakers from TV broadcast news material. We then run a user experiment with a correctly designed tool that demonstrates that very reliable results can be obtained with this method. In particular, a thorough error analysis demonstrates the value of the approach and provides hints for the improvement of the quality of the dataset.
Topics	Speech Resource/Database, Person Identification
Full paper	An Effortless Way To Create Large-Scale Datasets For Famous Speakers
Bibtex	@InProceedings{SALMON14.32, author = {François Salmon and Félicien Vallet}, title = {An Effortless Way To Create Large-Scale Datasets For Famous Speakers}, booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)}, year = {2014}, month = {may}, date = {26-31}, address = {Reykjavik, Iceland}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-8-4}, language = {english} }