Title |
Construction and Annotation of a French Folkstale Corpus |
Authors |
Anne Garcia-Fernandez, Anne-Laure Ligozat and Anne Vilnat |
Abstract |
In this paper, we present the digitization and annotation of a corpus of tales composed of historical texts (with Old French parts). To our knowledge, it is the only available French tales corpus classified according to the Aarne & Thompson classification. We first studied whether the pre-processing tools, namely OCR and PoS-tagging, are accurate enough to allow automatic analysis. We also manually annotated this corpus with several types of information that could prove useful for future work: character references, episodes, and motifs. The contributions are the creation of a corpus of French tales from classical anthropology material, which will be made available to the community; the evaluation of OCR and NLP tools on this corpus; and the annotation with anthropological information. |
Topics |
Information Extraction, Information Retrieval, Digital Libraries |
Full paper |
Construction and Annotation of a French Folkstale Corpus |
Bibtex |
@InProceedings{GARCIAFERNANDEZ14.1070,
  author = {Anne Garcia-Fernandez and Anne-Laure Ligozat and Anne Vilnat},
  title = {Construction and Annotation of a French Folkstale Corpus},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |