SUMMARY: Session O39-W Lexicons, Semantics, Clustering Tools

 

Title An Introduction to NLP-based Textual Anonymisation
Authors B. Medlock
Abstract We introduce the problem of automatic textual anonymisation and present a new publicly available, pseudonymised benchmark corpus of personal email text for the task, dubbed ITAC (Informal Text Anonymisation Corpus). We discuss the method by which the corpus was constructed and consider some important issues related to the evaluation of textual anonymisation systems. We also present some initial baseline results on the new corpus using a state-of-the-art HMM-based tagger.
Keywords Anonymisation, pseudonymisation, sensitivity, annotation, machine learning
Full paper An Introduction to NLP-based Textual Anonymisation