Title	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
Author(s)	Christopher Cieri (1), Joseph P. Campbell (2), Hirotaka Nakasone (3), David Miller (1), Kevin Walker (1). Affiliations: (1) University of Pennsylvania, Linguistic Data Consortium, Philadelphia, PA, USA; (2) MIT Lincoln Laboratory, Lexington, MA, USA; (3) Federal Bureau of Investigation, Quantico, VA, USA
Session	P9-SE
Abstract	This paper describes efforts to create corpora to support and evaluate systems that perform speaker recognition where channel and language may vary. Beyond the ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and cross-channel dimensions. We report on specific data collection efforts at the Linguistic Data Consortium and the research ongoing at the US Federal Bureau of Investigation and MIT Lincoln Laboratory. We cover the design and requirements, the collections, and the final properties of the corpus, integrating discussions of data preparation, research, technology development, and evaluation on a grand scale.
Keyword(s)	Language resources, speech, speaker recognition, bilingual, multilingual, cross-channel, extended data
Language(s)	Arabic, English, Mandarin, Russian, Spanish
Full Paper	771.pdf