| Title | A Study on Expert Sourcing Enterprise Question Collection and Classification | 
  
  | Authors | Yuan Luo, Thomas Boucher, Tolga Oral, David Osofsky and Sara Weber | 
  
  | Abstract | Large enterprises, such as IBM, accumulate petabytes of free-text data within their organizations. To mine this big data, a critical ability is to enable meaningful question answering beyond keyword search. In this paper, we present a study on the characteristics and classification of IBM sales questions. The characteristics are analyzed both semantically and syntactically, and a question classification guideline evolves from this analysis. We adopted an enterprise-level expert sourcing approach to gather questions, annotate them based on the guideline, and manage the quality of annotations via enhanced inter-annotator agreement analysis. We developed a question feature extraction system and experimented with rule-based, statistical and hybrid question classifiers. We share our annotated corpus of questions and report our experimental results. Statistical classifiers based separately on n-grams and on hand-crafted rule features give reasonable macro-F1 scores of 61.7% and 63.1%, respectively. The rule-based classifier gives a macro-F1 of 77.1%. The hybrid classifier, which combines n-gram and rule features using a second-guess model, further improves the macro-F1 to 83.9%. | 
  
  | Topics | Question Answering, Crowdsourcing | 
  
  | Full paper | A Study on Expert Sourcing Enterprise Question Collection and Classification | 
  
  | Bibtex | @InProceedings{LUO14.25, author =  {Yuan Luo and Thomas Boucher and Tolga Oral and David Osofsky and Sara Weber},
 title =  {A Study on Expert Sourcing Enterprise Question Collection and Classification},
 booktitle =  {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
 year =  {2014},
 month =  {may},
 date =  {26-31},
 address =  {Reykjavik, Iceland},
 editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
 publisher =  {European Language Resources Association (ELRA)},
 isbn =  {978-2-9517408-8-4},
 language =  {english}
 }
 |