| Title | Crowdsourcing and Annotating NER for Twitter #drift | 
  
  | Authors | Hege Fromreide, Dirk Hovy and Anders Søgaard | 
  
  | Abstract | We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to "catch up" with language drift. | 
  
  | Topics | Social Media Processing, Crowdsourcing | 
  
  | Full paper  | Crowdsourcing and Annotating NER for Twitter #drift | 
  
  | Bibtex | @InProceedings{FROMREIDE14.421, author =  {Hege Fromreide and Dirk Hovy and Anders Søgaard},
 title =  {Crowdsourcing and Annotating NER for Twitter \#drift},
 booktitle =  {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
 year =  {2014},
 month =  {may},
 date =  {26-31},
 address =  {Reykjavik, Iceland},
 editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
 publisher =  {European Language Resources Association (ELRA)},
 isbn =  {978-2-9517408-8-4},
 language =  {english}
 }
 |