Summary of the paper

Title: Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers
Authors: Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah Kermes, Ekaterina Lapshinova-Koltunski, Noam Ordan and Elke Teich
Abstract: We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data source is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based methods of feature extraction and data mining techniques.
Topics: Information Extraction, Information Retrieval, Knowledge Discovery/Representation
Bibtex:
@InProceedings{DEGAETANOORTLIEB14.291,
  author = {Stefania Degaetano-Ortlieb and Peter Fankhauser and Hannah Kermes and Ekaterina Lapshinova-Koltunski and Noam Ordan and Elke Teich},
  title = {Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}