LREC 2012 Proceedings

Summary of the paper

Title	A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
Authors	Hongzhi Xu, Helen Kaiyun Chen, Chu-Ren Huang, Qin Lu, Dingxu Shi and Tin-Shing Chiu
Abstract	We adopt a corpus-informed approach to selecting example sentences for the construction of a reference grammar. In the process, we construct a database of sentences carefully selected by linguistic experts, covering the full range of linguistic facts described in an authoritative Chinese reference grammar, and structure it according to that grammar. A search engine is developed to help users find the most typical examples they need to study a linguistic problem or prove their hypotheses. The database can also serve computational linguists as a training corpus for models of Chinese word segmentation, POS tagging, and sentence parsing.
Topics	Corpus (creation, annotation, etc.), LR Infrastructures and Architectures, Tools, systems, applications
Bibtex	@InProceedings{XU12.401,
  author = {Hongzhi Xu and Helen Kaiyun Chen and Chu-Ren Huang and Qin Lu and Dingxu Shi and Tin-Shing Chiu},
  title = {A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
}