Summary of the paper

Title: Construction of the Turkish National Corpus (TNC)
Authors: Yeşim Aksan, Mustafa Aksan, Ahmet Koltuksuz, Taner Sezer, Ümit Mersinli, Umut Ufuk Demirhan, Hakan Yılmazer, Gülsüm Atasoy, Seda Öz, İpek Yıldız and Özlem Kurtoğlu
Abstract: This paper addresses theoretical and practical issues encountered in the construction of the Turkish National Corpus (TNC). TNC is designed to be a balanced, large-scale (50 million words), general-purpose corpus of contemporary Turkish. It draws on previous practice and effort in corpus construction: TNC generally follows the framework of the British National Corpus, with adjustments to its corpus design made wherever needed. Throughout the process, different open-source software tools are used for specific tasks, and the resulting corpus is freely available for non-commercial use. This paper presents TNC's design features, its web-based corpus management system, its carefully planned workflow and its user-friendly web-based search interface.
Topics: Corpus (creation, annotation, etc.), Morphology, Part-of-speech tagging
Full paper: Construction of the Turkish National Corpus (TNC)
BibTeX:
@InProceedings{AKSAN12.991,
  author = {Yeşim Aksan and Mustafa Aksan and Ahmet Koltuksuz and Taner Sezer and Ümit Mersinli and Umut Ufuk Demirhan and Hakan Yılmazer and Gülsüm Atasoy and Seda Öz and İpek Yıldız and Özlem Kurtoğlu},
  title = {Construction of the Turkish National Corpus (TNC)},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }
Powered by ELDA © 2012 ELDA/ELRA