TOPICS: Browse articles of the conference sorted by topic
C |
Cognitive Methods |
Semi-Supervised Methods for Expanding Psycholinguistics Norms by Integrating Distributional Similarity with the Structure of WordNet
#mygoal: Finding Motivations on Twitter
A Graph-based Approach for Computing Free Word Associations
Design and Development of an Online Computational Framework to Facilitate Language Comprehension Research on Indian Languages
Mining a Multimodal Corpus for Non-Verbal Behavior Sequences Conveying Attitudes
Turkish Resources for Visual Word Recognition
|
Collaborative Resource Construction |
The DWAN Framework: Application of a Web Annotation Framework for the General Humanities to the Domain of Language Resources
Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain―Some Mantras
Mapping Between English Strings and Reentrant Semantic Graphs
Developing Text Resources for Ten South African Languages
Zmorge: a German Morphological Lexicon Extracted from Wiktionary
Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage
Digital Library 2.0: Source of Knowledge and Research Collaboration Platform
Linguistic Landscaping of South Asia Using Digital Language Resources: Genetic Vs. Areal Linguistics
SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling
CFT13: a Resource for Research into the Post-editing Process
Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners
A Colloquial Corpus of Japanese Sign Language: Linguistic Resources for Observing Sign Language Conversations
Can Numerical Expressions Be Simpler? Implementation and Demostration of a Numerical Simplification System for Spanish
The eIdentity Text Exploration Workbench
Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
CLARA: A New Generation of Researchers in Common Language Resources and Their Applications
Can Crowdsourcing Be Used for Effective Annotation of Arabic?
TweetNorm_es: an Annotated Corpus for Spanish Microtext Normalization
Corpus Annotation Through Crowdsourcing: Towards Best Practice Guidelines
Towards an Environment for the Production and the Validation of Lexical Semantic Resources
Towards an Encyclopedia of Compositional Semantics: Documenting the Interface of the English Resource Grammar
MUHIT: a Multilingual Harmonized Dictionary
Pivot-based Multilingual Dictionary Building Using Wiktionary
The AMARA Corpus: Building Parallel Language Resources for the Educational Domain
Exploiting Networks in Law
Terminology Resources and Terminology Work Benefit from Cloud Services
|
Computer-Assisted Language Learning (CALL) |
FLELex: a Graded Lexical Resource for French Foreign Learners
MAT: a Tool for L2 Pronunciation Errors Annotation
Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners
Reusing Swedish FrameNet for Training Semantic Roles
A Database of Freely Written Texts of German School Students for the Purpose of Automatic Spelling Error Classification
Automatic Error Detection Concerning the Definite and Indefinite Conjugation in the Hunlearner Corpus
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
An Innovative World Language Centre: Challenges for the Use of Language Technology
Open Philology at the University of Leipzig
|
Controlled Languages |
Presenting a System of Human-Machine Interaction for Performing Map Tasks
|
Corpus (Creation, Annotation, etc.) |
Statistical Analysis of Multilingual Text Corpus and Development of Language Models
Smile and Laughter in Human-Machine Interaction: a Study of Engagement
A Conventional Orthography for Tunisian Arabic
The AMARA Corpus: Building Parallel Language Resources for the Educational Domain
A Multimodal Dataset for Deception Detection
Human Annotation of ASR Error Regions: is "gravity" a Sharable Concept for Human Annotators?
Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie
Erlangen-CLP: A Large Annotated Corpus of Speech from Children with Cleft Lip and Palate
Semi-Automatic Annotation of the UCU Accents Speech Corpus
Evaluation of Automatic Hypernym Extraction from Technical Corpora in English and Dutch
The CLE Urdu POS Tagset
Automatic Detection of Other-Repetition Occurrences: Application to French Conversational Speech
EMOVO Corpus: an Italian Emotional Speech Database
A Tagged Corpus and a Tagger for Urdu
A Multidialectal Parallel Corpus of Arabic
Identification of Multiword Expressions in the brWaC
Phone Boundary Annotation in Conversational Speech
NoSta-D Named Entity Annotation for German: Guidelines and Dataset
Mörkum Njálu. An Annotated Corpus to Analyse and Explain Grammatical Divergences Between 14th-Century Manuscripts of Njál's Saga
Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources
The Polish Summaries Corpus
Variations on Quantitative Comparability Measures and Their Evaluations on Synthetic French-English Comparable Corpora
Teenage and Adult Speech in School Context: Building and Processing a Corpus of European Portuguese
On the Importance of Text Analysis for Stock Price Prediction
A Corpus of Comparisons in Product Reviews
The IULA Spanish LSP Treebank
A System for Experiments with Dependency Parsers
Sockpuppet Detection in Wikipedia: a Corpus of Real-World Deceptive Writing for Linking Identities
ALICO: a Multimodal Corpus for the Study of Active Listening
Corpus and Method for Identifying Citations in Non-Academic Text
A Cross-Language Corpus for Studying the Phonetics and Phonology of Prominence
Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages
Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain―Some Mantras
On the Use of a Fuzzy Classifier to Speed Up the Sp_ToBI Labeling of the Glissando Spanish Corpus
Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing
Praaline: Integrating Tools for Speech Corpus Research
Interoperability and Customisation of Annotation Schemata in Argo
Polish Coreference Corpus in Numbers
A Gold Standard Dependency Corpus for English
A Corpus of Machine Translation Errors Extracted from Translation Students Exercises
Co-Training for Classification of Live Or Studio Music Recordings
Creating and Using Large Monolingual Parallel Corpora for Sentential Paraphrase Generation
A New Framework for Sign Language Recognition Based on 3D Handshape Identification and Linguistic Modeling
Crowdsourcing for the Identification of Event Nominals: an Experiment
Semantic Technologies for Querying Linguistic Annotations: an Experiment Focusing on Graph-Structured Data
A Hierarchical Taxonomy for Classifying Hardness of Inference Tasks
The Sweet-Home Speech and Multimodal Corpus for Home Automation Interaction
Tools for Arabic Natural Language Processing: a Case Study in Qalqalah Prosody
Aligning Predicate-Argument Structures for Paraphrase Fragment Extraction
Automatic Creation of WordNets from Parallel Corpora
Pre-Ordering of Phrase-based Machine Translation Input in Translation Workflow
A Wikipedia-based Corpus for Contextualized Machine Translation
Motàmot Project: Conversion of a French-Khmer Published Dictionary for Building a Multilingual Lexical System
Building a Corpus of Manually Revised Texts from Discourse Perspective
Single-Person and Multi-Party 3D Visualizations for Nonverbal Communication Analysis
Interoperability of Dialogue Corpora Through ISO 24617-2-based Querying
The Database for Spoken German ― DGD2
Simple Effective Microblog Named Entity Recognition: Arabic as an Example
Priberam Compressive Summarization Corpus: a New Multi-Document Summarization Corpus for European Portuguese
The MMASCS Multi-Modal Annotated Synchronous Corpus of Audio, Video, Facial Motion and Tongue Motion Data of Normal, Fast and Slow Speech
Constructing a Chinese―Japanese Parallel Corpus from Wikipedia
Modelling Irony in Twitter: Feature Analysis and Evaluation
Corpus and Evaluation of Handwriting Recognition of Historical Genealogical Records
Computational Narratology: Extracting Tense Clusters from Narrative Texts
Designing the Latvian Speech Recognition Corpus
Aligning Parallel Texts with Intertext
From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers
A Corpus of Spontaneous Speech in Lectures: the KIT Lecture Corpus for Spoken Language Processing and Translation
The Pragmatic Annotation of a Corpus of Academic Lectures
Comparative Analysis of Verbal Alignment in Human-Human and Human-Agent Interactions
The eIdentity Text Exploration Workbench
Emilya: Emotional Body Expression in Daily Actions Database
The LIMA Multilingual Analyzer Made Free: FLOSS Resources Adaptation and Correction
Exploring Factors That Contribute to Successful Fingerspelling Comprehension
On the Annotation of TMX Translation Memories for Advanced Leveraging in Computer-Aided Translation
Named Entity Recognition on Turkish Tweets
On Complex Word Alignment Configurations
Linguistic Resources and Cats: How to Use ISOcat, RELcat and SCHEMAcat
Cross-Linguistic Annotation of Narrativity for English / French Verb Tense Disambiguation
Evaluating Corpora Documentation with Regards to the Ethics and Big Data Charter
Introducing a Web Application for Labeling, Visualizing Speech and Correcting Derived Speech Signals
Vocabulary-based Language Similarity Using Web Corpora
S-Pot - a Benchmark in Spotting Signs Within Continuous Signing
TweetNorm_es: an Annotated Corpus for Spanish Microtext Normalization
The Procedure of Lexico-Semantic Annotation of Składnica Treebank
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
Crowdsourcing as a Preprocessing for Complex Semantic Annotation Tasks
Automatic Annotation of Machine Translation Datasets with Binary Quality Judgements
Learning from Domain Complexity
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
Deep Syntax Annotation of the Sequoia French Treebank
Developing a French FrameNet: Methodology and First Results
Innovations in Parallel Corpus Search Tools
Representing Multimodal Linguistic Annotated Data
A Corpus of European Portuguese Child and Child-Directed Speech
'interHist' - an Interactive Visual Interface for Corpus Exploration
Hashtag Occurrences, Layout and Translation: a Corpus-Driven Analysis of Tweets Published by the Canadian Government
Presenting a System of Human-Machine Interaction for Performing Map Tasks
Hesita(Te) in Portuguese
MUHIT: a Multilingual Harmonized Dictionary
The Munich Biovoice Corpus: Effects of Physical Exercising, Heart Rate, and Skin Conductance on Human Speech Production
Conceptual Transfer: Using Local Classifiers for Transfer Selection
Annotating Arguments: the Nomad Collaborative Annotation Tool
Correcting Errors in a New Gold Standard for Tagging Icelandic Text
Experiences with Parallelisation of an Existing NLP Pipeline: Tagging Hansard
Named Entity Corpus Construction Using Wikipedia and DBpedia Ontology
Euronews: a Multilingual Speech Corpus for ASR
Thomas Aquinas in the Tündra: Integrating the Index Thomisticus Treebank Into CLARIN-D
Towards Linked Hypernyms Dataset 2.0: Complementing DBpedia with Hypernym Discovery
The Slovene BNSI Broadcast News Database and Reference Speech Corpus GOS: Towards the Uniform Guidelines for Future Work
Language Editing Dataset of Academic Texts
Japanese Conversation Corpus for Training and Evaluation of Backchannel Prediction Model
Aix Map Task Corpus: the French Multimodal Corpus of Task-Oriented Dialogue
Multiword Expressions in Machine Translation
CROMER: a Tool for Cross-Document Event and Entity Coreference
Automatic Language Identity Tagging on Word and Sentence-Level in Multilingual Text Sources: a Case-Study on Luxembourgish
CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis
Classifying Inconsistencies in DBpedia Language Specific Chapters
The Halliday Centre Tagger: an Online Platform for Semi-Automatic Text Annotation and Analysis
NIF4OGGD - NLP Interchange Format for Open German Governmental Data
Verbs of Saying with a Textual Connecting Function in the Prague Discourse Treebank
A Language-Independent and Fully Unsupervised Approach to Lexicon Induction and Part-Of-Speech Tagging for Closely Related Languages
New Bilingual Speech Databases for Audio Diarization
Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
UnixMan Corpus: A Resource for Language Learning in the Unix Domain
GraPAT: a Tool for Graph Annotations
The Tutorbot Corpus ― a Corpus for Studying Tutoring Behaviour in Multiparty Face-To-Face Spoken Dialogue
TweetCaT: a Tool for Building Twitter Corpora of Smaller Languages
Re-Using an Argument Corpus to Aid in the Curation of Social Media Collections
Rapid Deployment of Phrase Structure Parsing for Related Languages: a Case Study of Insular Scandinavian
Assessment of Non-Native Prosody for Spanish as L2 Using Quantitative Scores and Perceptual Evaluation
Exploiting the Large-Scale German Broadcast Corpus to Boost the Fraunhofer IAIS Speech Recognition System
Exploring the Utility of Coreference Chains for Improved Identification of Personal Names
Co-Clustering of Bilingual Datasets as a Mean for Assisting the Construction of Thematic Bilingual Comparable Corpora
The Extended DIRNDL Corpus as a Resource for Coreference and Bridging Resolution
A Flexible Language Learning Platform Based on Language Resources and Web Services
Extracting Semantic Relations from Portuguese Corpora Using Lexical-Syntactic Patterns
An Analysis of Ambiguity in Word Sense Annotations
Disclose Models, Hide the Data - How to Make Use of Confidential Corpora Without Seeing Sensitive Raw Data
Exploiting Networks in Law
Parsing Chinese Synthetic Words with a Character-based Dependency Model
Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain
Building a Dataset for Summarization and Keyword Extraction from Emails
|
Crowdsourcing |
Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties
A Crowdsourcing Smartphone Application for Swiss German: Putting Language Documentation in the Hands of the Users
A Study on Expert Sourcing Enterprise Question Collection and Classification
Collaboration in the Production of a Massively Multilingual Lexicon
The Newsome Corpus: a Unifying Opinion Annotation Framework Across Genres and in Multiple Languages
A SICK Cure for the Evaluation of Compositional Distributional Semantic Models
Morpho-Syntactic Study of Errors from Speech Recognition System
Crowdsourcing and Annotating NER for Twitter #drift
Designing and Evaluating a Reliable Corpus of Web Genres Via Crowd-Sourcing
Crowdsourcing as a Preprocessing for Complex Semantic Annotation Tasks
A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic
Crowd-Sourcing Evaluation of Automatically Acquired, Morphologically Related Word Groupings
Propa-L: a Semantic Filtering Service from a Lexical Network Created Using Games with a Purpose
When Transliteration Met Crowdsourcing: an Empirical Study of Transliteration Via Crowdsourcing Using Efficient, Non-Redundant and Fair Quality Control
|
E |
Emotion Recognition/Generation |
Toward a Unifying Model for Opinion, Sentiment and Emotion Information Extraction
Eliciting and Annotating Uncertainty in Spoken Language
Annotating Events in an Emotion Corpus
Speech-based Emotion Recognition: Feature Selection by Self-Adaptive Multi-Criteria Genetic Algorithm
The AV-LASYN Database: a Synchronous Corpus of Audio and 3D Facial Marker Data for Audio-Visual Laughter Synthesis
Emilya: Emotional Body Expression in Daily Actions Database
The D-Ans Corpus: the Dublin-Autonomous Nervous System Corpus of Biosignal and Multimodal Recordings of Conversational Speech
Texafon 2.0: a Text Processing Tool for the Generation of Expressive Speech in TTS Applications
Media Monitoring and Information Extraction for the Highly Inflected Agglutinative Language Hungarian
The SSPNet-Mobile Corpus: Social Signal Processing Over Mobile Phones
EMOVO Corpus: an Italian Emotional Speech Database
The Munich Biovoice Corpus: Effects of Physical Exercising, Heart Rate, and Skin Conductance on Human Speech Production
Voce Corpus: Ecologically Collected Speech Annotated with Physiological and Psychological Stress Assessments
Alert!... Calm Down, There is Nothing to Worry About. Warning and Soothing Speech Synthesis.
Modeling, Managing, Exposing, and Linking Ontologies with a Wiki-based Tool
Smile and Laughter in Human-Machine Interaction: a Study of Engagement
|
Endangered Languages |
PanLex: Building a Resource for Panlingual Lexical Translation
Enriching ODIN
TLAXCALA: a Multilingual Corpus of Independent News
Untrained Forced Alignment of Transcriptions and Audio for Language Documentation Corpora Using WebMAUS
Finite-State Morphological Transducers for Three Kypchak Languages
A Finite-State Morphological Analyzer for a Lakota HPSG Grammar
Open-Domain Interaction and Online Content in the Sami Language
Using Transfer Learning to Assist Exploratory Corpus Annotation
Linguistic Evaluation of Support Verb Constructions by OpenLogos and Google Translate
First Approach Toward Semantic Role Labeling for Basque
The Gulf of Guinea Creole Corpora
An Innovative World Language Centre: Challenges for the Use of Language Technology
|
Evaluation Methodologies |
VERTa: Facing a Multilingual Experience of a Linguistically-based MT Evaluation
Combining Elicited Imitation and Fluency Features for Oral Proficiency Measurement
ETER: a New Metric for the Evaluation of Hierarchical Named Entity Recognition
Measuring Readability of Polish Texts: Baseline Experiments
Bridging the Gap Between Speech Technology and Natural Language Processing: an Evaluation Toolbox for Term Discovery Systems
Building a Database of Japanese Adjective Examples from Special Purpose Web Corpora
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization
Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from Them
Creating and Using Large Monolingual Parallel Corpora for Sentential Paraphrase Generation
A Comparative Evaluation Methodology for NLG in Interactive Systems
An Evaluation of the Role of Statistical Measures and Frequency for MWE Identification
Using a Machine Learning Model to Assess the Complexity of Stress Systems
Translation Errors from English to Portuguese: an Annotated Corpus
Discosuite - a Parser Test Suite for German Discontinuous Structures
Corpus and Evaluation of Handwriting Recognition of Historical Genealogical Records
PACE Corpus: a Multilingual Corpus of Polarity-Annotated Textual Data from the Domains Automotive and CEllphone
A Benchmark Database of Phonetic Alignments in Historical Linguistics and Dialectology
Introducing a Framework for the Evaluation of Music Detection Tools
The Taraxü Corpus of Human-Annotated Machine Translations
Detecting Document Structure in a Very Large Corpus of UK Financial Reports
S-Pot - a Benchmark in Spotting Signs Within Continuous Signing
Machine Translation for Subtitling: a Large-Scale Evaluation
Extrinsic Corpus Evaluation with a Collocation Dictionary Task
HuRIC: a Human Robot Interaction Corpus
On the Origin of Errors: a Fine-Grained Analysis of MT and PE Errors and Their Relationship
Dense Components in the Structure of WordNet
MADAMIRA: a Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic
A Rank-based Distance Measure to Detect Polysemy and to Determine Salient Vector-Space Features for German Prepositions
Building a Crisis Management Term Resource for Social Media: the Case of Floods and Protests
The Use of a FileMaker Pro Database in Evaluating Sign Language Notation Systems
A Quality-based Active Sample Selection Strategy for Statistical Machine Translation
A Large-Scale Evaluation of Pre-Editing Strategies for Improving User-Generated Content Translation
ACTIV-ES: a Comparable, Cross-Dialect Corpus of Everyday Spanish from Argentina, Mexico, and Spain
Overview of Todai Robot Project and Evaluation Framework of Its NLP-based Problem Solving
Crowdsourcing for Evaluating Machine Translation Quality
Student Achievement and French Sentence Repetition Test Scores
Fuzzy V-Measure - an Evaluation Method for Cluster Analyses of Ambiguous Data
Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics
KoKo: an L1 Learner Corpus for German
An Efficient and User-Friendly Tool for Machine Translation Quality Estimation
LexTerm Manager: Design for an Integrated Lexicography and Terminology System
The ETAPE Speech Processing Evaluation
|
M |
Machine Translation, SpeechToSpeech Translation |
Bilingual Dictionary Construction with Transliteration Filtering
Large SMT Data-Sets Extracted from Wikipedia
Two-Step Machine Translation with Lattices
MTWatch: A Tool for the Analysis of Noisy Parallel Data
Collecting Natural SMS and Chat Conversations in Multiple Languages: the BOLT Phase 2 Corpus
Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from Them
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
Incorporating Alternate Translations Into English Translation Treebank
Multival - Towards a Multilingual Valence Lexicon
A Unified Annotation Scheme for the Semantic / Pragmatic Components of Definiteness
On the Reliability and Inter-Annotator Agreement of Human Semantic MT Evaluation Via HMEANT
Dual Subtitles as Parallel Corpora
Bootstrapping Open-Source English-Bulgarian Computational Dictionary
Collection of a Simultaneous Translation Corpus for Comparative Analysis
Translation Errors from English to Portuguese: an Annotated Corpus
English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
CFT13: a Resource for Research into the Post-editing Process
Creating a Massively Parallel Bible Corpus
Evaluating the Effects of Interactivity in a Post-Editing Workbench
ParCor 1.0: a Parallel Pronoun-Coreference Corpus to Support Statistical MT
An Efficient Language Independent Toolkit for Complete Morphological Disambiguation
A Corpus of Spontaneous Speech in Lectures: the KIT Lecture Corpus for Spoken Language Processing and Translation
On the Annotation of TMX Translation Memories for Advanced Leveraging in Computer-Aided Translation
The Taraxü Corpus of Human-Annotated Machine Translations
The Strategic Impact of META-NET on the Regional, National and International Level
An Iterative Approach for Mining Parallel Sentences in a Comparable Corpus
Collocation Or Free Combination? ― Applying Machine Translation Techniques to Identify Collocations in Japanese
Multiword Expressions in Machine Translation
Crowdsourcing for Evaluating Machine Translation Quality
HindEnCorp - Hindi-English and Hindi-Only Corpus for Machine Translation
caWaC - a Web Corpus of Catalan and Its Application to Language Modeling and Machine Translation
Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus
LinkedHealthAnswers: Towards Linked Data-driven Question Answering for the Health Care Domain
Chasing the Perfect Splitter: a Comparison of Different Compound Splitting Tools
A Comparison of MT Errors and ESL Errors
Improving Evaluation of English-Czech MT Through Paraphrasing
DCEP - Digital Corpus of the European Parliament
An Efficient and User-Friendly Tool for Machine Translation Quality Estimation
|
Metadata |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach
Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction
Developing a Framework for Describing Relations Among Language Resources
Global Intelligent Content: Active Curation of Language Resources Using Linked Data
Experiences with the ISOcat Data Category Registry
The Dutch LESLLA Corpus
The EASR Corpora of European Portuguese, French, Hungarian and Polish Elderly Speech
Three Dimensions of the So-Called "interoperability" of Annotation Schemes
TagNText: a Parallel Corpus for the Induction of Resource-Specific Non-Taxonomical Relations from Tagged Images
META-SHARE: One Year After
Meta-Classifiers Easily Improve Commercial Sentiment Detection Tools
HindEnCorp - Hindi-English and Hindi-Only Corpus for Machine Translation
Recent Developments in DeReKo
Vulnerability in Acquisition, Language Impairments in Dutch: Creating a Valid Data Archive
Improving Entity Linking Using Surface Form Refinement
Facing the Identification Problem in Language-Related Scientific Data Analysis
|
Morphology |
DerivBase.Hr: a High-Coverage Derivational Morphology Resource for Croatian
Generating and Using Probabilistic Morphological Resources for the Biomedical Domain
Computer-Aided Morphology Expansion for Old Swedish
DeLex, a Freely-Avaible, Large-Scale and Linguistically Grounded Morphological Lexicon for German
Automatic Refinement of Syntactic Categories in Chinese Word Structures
Bootstrapping Open-Source English-Bulgarian Computational Dictionary
Amazigh Verb Conjugator
Szeged Corpus 2.5: Morphological Modifications in a Manually POS-Tagged Hungarian Corpus
The Syn-Series Corpora of Written Czech
Corpus of 19th-Century Czech Texts: Problems and Solutions
Automatic Error Detection Concerning the Definite and Indefinite Conjugation in the Hunlearner Corpus
A Language-Independent Approach to Extracting Derivational Relations from an Inflectional Lexicon
Morpho-Syntactic Study of Errors from Speech Recognition System
Can Crowdsourcing Be Used for Effective Annotation of Arabic?
Word-Formation Network for Czech
Glàff, a Large Versatile French Lexicon
The CMU METAL Farsi NLP Approach
Language Resource Addition: Dictionary Or Corpus?
The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis
Correcting Errors in a New Gold Standard for Tagging Icelandic Text
The Hungarian Gigaword Corpus
Measuring the Impact of Spelling Errors on the Quality of Machine Translation
Automatic Acquisition of Urdu Nouns (along with Gender and Irregular Plurals)
Chasing the Perfect Splitter: a Comparison of Different Compound Splitting Tools
|
MultiWord Expressions & Collocations |
PropBank: Semantics of New Predicate Types
4FX: Light Verb Constructions in a Multilingual Parallel Corpus
Semi-Compositional Method for Synonym Extraction of Multi-Word Terms
Linguistic Resources and Cats: How to Use ISOcat, RELcat and SCHEMAcat
Identifying Idioms in Chinese Translations
Identification of Multiword Expressions in the brWaC
Extrinsic Corpus Evaluation with a Collocation Dictionary Task
Comprehensive Annotation of Multiword Expressions in a Social Web Corpus
T2K²: a System for Automatically Extracting and Organizing Knowledge from Texts
Reconstructing the Semantic Landscape of Natural Language Processing
ISLEX ― a Multilingual Web Dictionary
TermWise: A CAT-tool with Context-Sensitive Terminological Support
Compounds and Distributional Thesauri
SwissAdmin: a Multilingual Tagged Parallel Corpus of Press Releases
Summarizing News Clusters on the Basis of Thematic Chains
Named Entity Tagging a Very Large Unbalanced Corpus: Training and Evaluating NE Classifiers
LexTerm Manager: Design for an Integrated Lexicography and Terminology System
|
Multilinguality |
Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages
Universal Stanford Dependencies: a Cross-Linguistic Typology
Pivot-based Multilingual Dictionary Building Using Wiktionary
Production of Phrase Tables in 11 European Languages Using an Improved Sub-Sentential Aligner
The Making of Ancient Greek WordNet
Extracting a Bilingual Semantic Grammar from FrameNet-Annotated Corpora
Etymological WordNet: Tracing the History of Words
TLAXCALA: a Multilingual Corpus of Independent News
Relating Frames and Constructions in Japanese FrameNet
Tharwa: a Large Scale Dialectal Arabic - Standard Arabic - English Lexicon
Automatic Methods for the Extension of a Bilingual Dictionary Using Comparable Corpora
Aggregation Methods for Efficient Collocation Detection
GlobalPhone: Pronunciation Dictionaries in 20 Languages
Linguistic Evaluation of Support Verb Constructions by OpenLogos and Google Translate
Building a Dataset of Multilingual Cognates for the Romanian Lexicon
Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings
English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
Constructing a Chinese―Japanese Parallel Corpus from Wikipedia
xLiD-Lexica: Cross-lingual Linked Data Lexica
An Efficient Language Independent Toolkit for Complete Morphological Disambiguation
4FX: Light Verb Constructions in a Multilingual Parallel Corpus
Resources in Conflict: a Bilingual Valency Lexicon Vs. a Bilingual Treebank Vs. a Linguistic Theory
Buy One Get One Free: Distant Annotation of Chinese Tense, Event Type and Modality
Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese
Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech
On Complex Word Alignment Configurations
Bring vs. MTRoget: Evaluating Automatic Thesaurus Translation
The Strategic Impact of META-NET on the Regional, National and International Level
Bilingual Dictionary Induction as an Optimization Problem
Bootstrapping Term Extractors for Multiple Languages
Clustering of Multi-Word Named Entity Variants: Multilingual Evaluation
A Multidialectal Parallel Corpus of Arabic
Transfer Learning of Feedback Head Expressions in Danish and Polish Comparable Multimodal Corpora
Comparing Two Acquisition Systems for Automatically Building an English―Croatian Parallel Corpus from Multilingual Websites
Hashtag Occurrences, Layout and Translation: a Corpus-Driven Analysis of Tweets Published by the Canadian Government
On the Origin of Errors: a Fine-Grained Analysis of MT and PE Errors and Their Relationship
YouDACC: the YouTube Dialectal Arabic Comment Corpus
Improving the Exploitation of Linguistic Annotations in ELAN
Automatic Extraction of Synonyms for German Particle Verbs from Parallel Data with Distributional Similarity as a Re-Ranking Feature
NASTIA: Negotiating Appointment Setting Interface
Applying Accessibility-Oriented Controlled Language (CL) Rules to Improve Appropriateness of Text Alternatives for Images: an Exploratory Study
The DIRHA Simulated Corpus
High Quality Word Lists as a Resource for Multiple Purposes
ISLEX ― a Multilingual Web Dictionary
Exploiting Catenae in a Parallel Treebank Alignment
Multiple Choice Question Corpus Analysis for Distractor Characterization
Euronews: a Multilingual Speech Corpus for ASR
Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and a Network-based ASR System
How to Construct a Multi-Lingual Domain Ontology
Mining Online Discussion Forums for Metaphors
TALC-Sef, a Manually-Revised POS-Tagged Literary Corpus in Serbian, English and French
The Development of the Multilingual LUNA Corpus for Spoken Language System Porting
An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora
Quality Estimation for Synthetic Parallel Data Generation
Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0
A Framework for Compiling High Quality Knowledge Resources from Raw Corpora
Extending HeidelTime for Temporal Expressions Referring to Historic Dates
Enabling Language Resources to Expose Translations as Linked Data on the Web
Multilingual Extended WordNet Knowledge Base: Semantic Parsing and Translation of Glosses
A Comparison of MT Errors and ESL Errors
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized
Building the Sense-Tagged Multilingual Parallel Corpus
A Hindi-English Code-Switching Corpus
Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts
RECSA: Resource for Evaluating Cross-Lingual Semantic Annotation
|
Multimedia Document Processing |
Multimodal Corpora for Silent Speech Interaction
Extending Standoff Annotation
Expanding N-Gram Analytics in ELAN and a Case Study for Sign Synthesis
TVD: a Reproducible and Multiply Aligned TV Series Dataset
New Functions for a Multipurpose Multimodal Tool for Phonetic and Linguistic Analysis of Very Large Speech Corpora
|
P |
Parsing |
A Gold Standard Dependency Corpus for English
Boosting the Creation of a Treebank
A System for Experiments with Dependency Parsers
Improving Open Relation Extraction Via Sentence Re-Structuring
Universal Stanford Dependencies: a Cross-Linguistic Typology
Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing
Incorporating Alternate Translations Into English Translation Treebank
Pre-Ordering of Phrase-based Machine Translation Input in Translation Workflow
Towards Building a Kashmiri Treebank: Setting Up the Annotation Pipeline
Information Extraction from German Patient Records Via Hybrid Parsing and Relation Extraction Strategies
Parsing Heterogeneous Corpora with a Rich Dependency Grammar
Mapping Diatopic and Diachronic Variation in Spoken Czech: the Ortofon and Dialekt Corpora
The Norwegian Dependency Treebank
All Fragments Count in Parser Evaluation
A Persian Treebank with Stanford Typed Dependencies
A Japanese Word Dependency Corpus
Converting an HPSG-based Treebank Into Its Parallel Dependency-based Treebank
Legal Aspects of Text Mining
Treelet Probabilities for HPSG Parsing and Error Correction
Swift Aligner, a Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer
Projection-based Annotation of a Polish Dependency Treebank
Towards an Encyclopedia of Compositional Semantics: Documenting the Interface of the English Resource Grammar
The CMU Metal Farsi NLP Approach
The Setimes.Hr Linguistically Annotated Corpus of Croatian
Croatian Dependency Treebank 2.0: New Annotation Guidelines for Improved Parsing
Constituency Parsing of Bulgarian: Word- Vs Class-based Parsing
An Out-Of-Domain Test Suite for Dependency Parsing of German
Automatically Enriching Spoken Corpora with Syntactic Information for Linguistic Studies
Because Size Does Matter: the Hamburg Dependency Treebank
Dependency Parsing Representation Effects on the Accuracy of Semantic Applications ― an Example of an Inflective Language
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized
Validation Issues Induced by an Automatic Pre-Annotation Mechanism in the Building of Non-Projective Dependency Treebanks
Bidirectional Converter Between Syntactic Annotations: from French Treebank Dependencies to Passage Annotations, and Back
|
Part-of-Speech Tagging |
PoliTa: a Multitagger for Polish
DeLex, a Freely-Available, Large-Scale and Linguistically Grounded Morphological Lexicon for German
The Kiezdeutsch Korpus (KiDKo) Release 1.0
ColLex.EN: Automatically Generating and Evaluating a Full-Form Lexicon for English
Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development
Finite-State Morphological Transducers for Three Kypchak Languages
Using Transfer Learning to Assist Exploratory Corpus Annotation
Szeged Corpus 2.5: Morphological Modifications in a Manually POS-Tagged Hungarian Corpus
The CLE Urdu POS Tagset
Adapting Freely Available Resources to Build an Opinion Mining Pipeline in Portuguese
Using Stem-Templates to Improve Arabic POS and Gender / Number Tagging
CoRoLa ― The Reference Corpus of Contemporary Romanian Language
The LIMA Multilingual Analyzer Made Free: FLOSS Resources Adaptation and Correction
Bootstrapping Term Extractors for Multiple Languages
The Gulf of Guinea Creole Corpora
A Corpus of European Portuguese Child and Child-Directed Speech
A Tagged Corpus and a Tagger for Urdu
Talapi ― a Thai Linguistically Annotated Corpus for Language Processing
Language Resource Addition: Dictionary Or Corpus?
The Setimes.Hr Linguistically Annotated Corpus of Croatian
Activ-Es: a Comparable, Cross-Dialect Corpus of Everyday Spanish from Argentina, Mexico, and Spain
TALC-Sef: a Manually-Revised POS-Tagged Literary Corpus in Serbian, English and French
Morfeusz Reloaded
SwissAdmin: a Multilingual Tagged Parallel Corpus of Press Releases
Standardisation and Interoperation of Morphosyntactic and Syntactic Annotation Tools for Spanish and Their Annotations
A 500 Million Word POS-Tagged Icelandic Corpus
Macrosyntactic Segmenters of a French Spoken Corpus
KoKo: an L1 Learner Corpus for German
|
Person Identification |
Sockpuppet Detection in Wikipedia: a Corpus of Real-World Deceptive Writing for Linking Identities
An Effortless Way to Create Large-Scale Datasets for Famous Speakers
Comparison of Gender- and Speaker-Adaptive Emotion Recognition
German Alcohol Language Corpus - the Question of Dialect
|
Phonetic Databases, Phonology |
On the Use of a Fuzzy Classifier to Speed Up the Sp_ToBI Labeling of the Glissando Spanish Corpus
Using a Machine Learning Model to Assess the Complexity of Stress Systems
The Nijmegen Corpus of Casual Czech
Computer-Aided Quality Assurance of an Icelandic Pronunciation Dictionary
Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
GRASS: the Graz Corpus of Read and Spontaneous Speech
Design and Development of an RDB Version of the Corpus of Spontaneous Japanese
Glàff, a Large Versatile French Lexicon
C-Phonogenre: a 7-Hours Corpus of 7 Speaking Styles in French: Relations Between Situational Features and Prosodic Properties
|
Profiling |
CLiPS Stylometry Investigation (CSI) Corpus: a Dutch Corpus for the Detection of Age, Gender, Personality, Sentiment and Deception in Text
How to Use Less Features and Reach Better Performance in Author Gender Identification
Modeling and Evaluating Dialog Success in the Last Minute Corpus
Recognising Suicidal Messages in Dutch Social Media
|
Prosody |
ALICO: a Multimodal Corpus for the Study of Active Listening
A Cross-Language Corpus for Studying the Phonetics and Phonology of Prominence
Praaline: Integrating Tools for Speech Corpus Research
Evaluating Improvised Hip Hop Lyrics - Challenges and Observations
Eliciting and Annotating Uncertainty in Spoken Language
Teenage and Adult Speech in School Context: Building and Processing a Corpus of European Portuguese
Prosodic, Syntactic, Semantic Guidelines for Topic Structures Across Domains and Corpora
Annotation Pro + TGA: Automation of Speech Timing Analysis
New Spanish Speech Corpus Database for the Analysis of People Suffering from Parkinson's Disease
Towards Automatic Transformation Between Different Transcription Conventions: Prediction of Intonation Markers from Linguistic and Acoustic Features
RSS-TOBI - a Prosodically Enhanced Romanian Speech Corpus
Using Audio Books for Training a Text-To-Speech System
Assessment of Non-Native Prosody for Spanish as L2 Using Quantitative Scores and Perceptual Evaluation
C-Phonogenre: a 7-Hours Corpus of 7 Speaking Styles in French: Relations Between Situational Features and Prosodic Properties
The Extended Dirndl Corpus as a Resource for Coreference and Bridging Resolution
New Functions for a Multipurpose Multimodal Tool for Phonetic and Linguistic Analysis of Very Large Speech Corpora
DisMo: a Morphosyntactic, Disfluency and Multi-Word Unit Annotator. an Evaluation on a Corpus of French Spontaneous and Read Speech
Segmentation Evaluation Metrics, a Comparison Grounded on Prosodic and Discourse Units
|
S |
Semantic Web |
Accommodations in Tuscany as Linked Data
The DWAN Framework: Application of a Web Annotation Framework for the General Humanities to the Domain of Language Resources
A Meta-Data Driven Platform for Semi-Automatic Configuration of Ontology Mediators
N-Gram Counts and Language Models from the Common Crawl
TMO ― the Federated Ontology of the TRENDMINER Project
A SKOS-based Schema for TEI encoded Dictionaries at ICLTT
Efficient Reuse of Structured and Unstructured Resources for Ontology Population
Linked Open Data and Web Corpus Data for Noun Compound Bracketing
Newsreader: Recording History from Daily News Streams
Discovering and Visualising Stories in News
From Natural Language to Ontology Population in the Cultural Heritage Domain. a Computational Linguistics-based Approach.
NIF4OGGD - NLP Interchange Format for Open German Governmental Data
The LRE Map Disclosed
Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0
VOAR: A Visual and Integrated Ontology Alignment Environment
|
Semantics |
PropBank: Semantics of New Predicate Types
Reusing Swedish FrameNet for Training Semantic Roles
A Rank-based Distance Measure to Detect Polysemy and to Determine Salient Vector-Space Features for German Prepositions
Image Annotation with ISO-Space: Distinguishing Content from Structure
Semantic Approaches to Software Component Retrieval with English Queries
Definition Patterns for Predicative Terms in Specialized Lexical Resources
The Making of Ancient Greek WordNet
Augmenting English Adjective Senses with Supersenses
Evaluation of Simple Distributional Compositional Operations on Longer Texts
Relating Frames and Constructions in Japanese FrameNet
Crowdsourcing for the Identification of Event Nominals: an Experiment
Tharwa: a Large Scale Dialectal Arabic - Standard Arabic - English Lexicon
Semantic Technologies for Querying Linguistic Annotations: an Experiment Focusing on Graph-Structured Data
A Unified Annotation Scheme for the Semantic / Pragmatic Components of Definiteness
Aligning Predicate-Argument Structures for Paraphrase Fragment Extraction
On the Reliability and Inter-Annotator Agreement of Human Semantic MT Evaluation Via HMEANT
Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources
Adapting VerbNet to French Using Existing Resources
Corpus-based Computation of Reverse Associations
Annotating Relation Mentions in Tabloid Press
Mapping Diatopic and Diachronic Variation in Spoken Czech: the Ortofon and Dialekt Corpora
Constructing a Corpus of Japanese Predicate Phrases for Synonym / Antonym Relations
Distributed Distributional Similarities of Google Books Over the Centuries
How to Tell a Schneemann from a Milchmann: an Annotation Scheme for Compound-Internal Relations
Construction of Diachronic Ontologies from People's Daily of Fifty Years
Resources in Conflict: a Bilingual Valency Lexicon Vs. a Bilingual Treebank Vs. a Linguistic Theory
Buy One Get One Free: Distant Annotation of Chinese Tense, Event Type and Modality
Building a Reference Lexicon for Countability in English
Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech
WordNet―Wikipedia―Wiktionary: Construction of a Three-Way Alignment
Evaluation of Automatic Hypernym Extraction from Technical Corpora in English and Dutch
Discovering Frames in Specialized Domains
Resources for the Detection of Conventionalized Metaphors in Four Languages
Annotation of Computer Science Papers for Semantic Relation Extraction
Using C5.0 and Exhaustive Search for Boosting Frame-Semantic Parsing Accuracy
Automatic Semantic Relation Extraction from Portuguese Texts
Lexical Substitution Dataset for German
Polysemy Index for Nouns: an Experiment on Italian Using the Parole Simple CLiPS Lexical Database
Manual Analysis of Structurally Informed Reordering in German-English Machine Translation
Criteria for Identifying and Annotating Caused Motion Constructions in Corpus Data
Web-Imageability of the Behavioral Features of Basic-Level Concepts
Semi-Compositional Method for Synonym Extraction of Multi-Word Terms
From Synsets to Videos: Enriching ItalWordNet Multimodally
Mining Online Discussion Forums for Metaphors
Classifying Inconsistencies in DBpedia Language Specific Chapters
Flow Graph Corpus from Recipe Texts
To Pay Or to Get Paid: Enriching a Valency Lexicon with Diatheses
Annotating the Focus of Negation in Japanese Text
Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
Combining Dependency Information and Generalization in a Pattern-based Approach to the Classification of Lexical-Semantic Relation Instances
Dependency Parsing Representation Effects on the Accuracy of Semantic Applications ― an Example of an Inflective Language
Extending the Coverage of a MWE Database for Persian CPs Exploiting Valency Alternations
Single Classifier Approach for Verb Sense Disambiguation Based on Generalized Features
An Analysis of Ambiguity in Word Sense Annotations
SANA: a Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis
Word Semantic Similarity for Morphologically Rich Languages
Focusing Annotation for Semantic Role Labeling
|
Sign Language Recognition/Generation |
SLMotion - an Extensible Sign Language Oriented Video Analysis Tool
Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather
Expanding N-Gram Analytics in ELAN and a Case Study for Sign Synthesis
LinkedHealthAnswers: Towards Linked Data-driven Question Answering for the Health Care Domain
|
Social Media Processing |
A Corpus of Comparisons in Product Reviews
A Corpus of Participant Roles in Contentious Discussions
Modelling Irony in Twitter: Feature Analysis and Evaluation
Getting Reliable Annotations for Sarcasm in Online Dialogues
Finding Romanized Arabic Dialect in Code-Mixed Tweets
Votter Corpus: a Corpus of Social Polling Language
A German Twitter Snapshot
SenTube: a Corpus for Sentiment Analysis on Youtube Social Media
Simple Effective Microblog Named Entity Recognition: Arabic as an Example
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis
The Dangerous Myth of the Star System
Crowdsourcing and Annotating NER for Twitter #drift
When POS Data Sets Don't Add Up: Combatting Sample Bias
Benchmarking Twitter Sentiment Analysis Tools
Comprehensive Annotation of Multiword Expressions in a Social Web Corpus
Building a Crisis Management Term Resource for Social Media: the Case of Floods and Protests
Who Cares About Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis.
Named Entity Corpus Construction Using Wikipedia and DBpedia Ontology
Towards Shared Datasets for Normalization Research
Nomad: Linguistic Resources and Tools Aimed at Policy Formulation and Validation
TweetCaT: a Tool for Building Twitter Corpora of Smaller Languages
A Framework for Public Health Surveillance
|
Speech Recognition/Understanding |
The ETAPE Speech Processing Evaluation
Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks
Automatically Enriching Spoken Corpora with Syntactic Information for Linguistic Studies
ASR-based CALL Systems and Learner Speech Data: New Resources and Opportunities for Research and Development in Second Language Learning
Ciempiess: a New Open-Sourced Mexican Spanish Radio Corpus
Speech Recognition Web Services for Dutch
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
Free English and Czech Telephone Speech Corpus Shared Under the CC-BY-SA 3.0 License
The DIRHA Simulated Corpus
TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation
Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and a Network-based ASR System
The Slovene BNSI Broadcast News Database and Reference Speech Corpus GOS: Towards the Uniform Guidelines for Future Work
A Toolkit for Efficient Learning of Lexical Units for Speech Recognition
Basque Speecon-Like and Basque SpeechDat MDB-600: Speech Databases for the Development of ASR Technology for Basque
Using a Serious Game to Collect a Child Learner Speech Corpus
A LDA-based Topic Classification Approach from Highly Imperfect Automatic Transcriptions
Exploiting the Large-Scale German Broadcast Corpus to Boost the Fraunhofer IAIS Speech Recognition System
El-Woz: a Client-Server Wizard-Of-Oz Interface
|
Speech Resource/Database |
Phoneme Set Design Using English Speech Database by Japanese for Dialogue-based English Call Systems
Croatian Memories
Designing the Latvian Speech Recognition Corpus
The Kiezdeutsch Korpus (KiDKo) Release 1.0
The RATS Collection: Supporting HLT Research with Degraded Audio Data
Untrained Forced Alignment of Transcriptions and Audio for Language Documentation Corpora Using WebMAUS
The Sweet-Home Speech and Multimodal Corpus for Home Automation Interaction
Collection of a Simultaneous Translation Corpus for Comparative Analysis
The Database for Spoken German ― DGD2
SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling
Speech Recognition Web Services for Dutch
ML-Optimization of Ported Constraint Grammars
Phone Boundary Annotation in Conversational Speech
The Research and Teaching Corpus of Spoken German ― Folk
Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish
An Effortless Way to Create Large-Scale Datasets for Famous Speakers
Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
GRASS: the Graz Corpus of Read and Spontaneous Speech
German Alcohol Language Corpus - the Question of Dialect
Development of a TV Broadcasts Speech Recognition System for Qatari Arabic
Design and Development of an RDB Version of the Corpus of Spontaneous Japanese
Automatic Long Audio Alignment and Confidence Scoring for Conversational Arabic Speech
Semi-Automatic Annotation of the UCU Accents Speech Corpus
AusTalk: an Audio-Visual Corpus of Australian English
The SSPNet-Mobile Corpus: Social Signal Processing Over Mobile Phones.
Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather
Mapping CPA Patterns onto OntoNotes Senses
Voce Corpus: Ecologically Collected Speech Annotated with Physiological and Psychological Stress Assessments
A Multimodal Corpus of Rapid Dialogue Games
Alert!... Calm Down, There is Nothing to Worry About. Warning and Soothing Speech Synthesis.
CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis
Basque Speecon-Like and Basque SpeechDat MDB-600: Speech Databases for the Development of ASR Technology for Basque
Erlangen-CLP: A Large Annotated Corpus of Speech from Children with Cleft Lip and Palate
Using a Serious Game to Collect a Child Learner Speech Corpus
Using Audio Books for Training a Text-To-Speech System
Discovering the Italian Literature: Interactive Access to Audio Indexed Text Resources
VOLIP: a Corpus of Spoken Italian and a Virtuous Example of Reuse of Linguistic Resources
A Hindi-English Code-Switching Corpus
El-Woz: a Client-Server Wizard-Of-Oz Interface
Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain
|
Speech Synthesis |
The MMASCS Multi-Modal Annotated Synchronous Corpus of Audio, Video, Facial Motion and Tongue Motion Data of Normal, Fast and Slow Speech
Texafon 2.0: a Text Processing Tool for the Generation of Expressive Speech in TTS Applications
RSS-TOBI - a Prosodically Enhanced Romanian Speech Corpus
A Flexible Language Learning Platform Based on Language Resources and Web Services
|
Standards for LRs |
On Paraphrase Identification Corpora
Image Annotation with ISO-Space: Distinguishing Content from Structure
N-Gram Counts and Language Models from the Common Crawl
Benchmarking of English-Hindi Parallel Corpora
RELISH LMF: Unlocking the Full Power of the Lexical Markup Framework
The CMD Cloud
Interoperability of Dialogue Corpora Through ISO 24617-2-based Querying
A Benchmark Database of Phonetic Alignments in Historical Linguistics and Dialectology
Using TEI, CMDI and ISOcat in CLARIN-DK
Legal Aspects of Text Mining
Towards an Integration of Syntactic and Temporal Annotations in Estonian
Adapting a Part-Of-Speech Tagset to Non-Standard Text: the Case of STTS
An Open Source Part-Of-Speech Tagger for Norwegian: Building on Existing Language Resources
Vulnerability in Acquisition, Language Impairments in Dutch: Creating a Valid Data Archive
Facing the Identification Problem in Language-Related Scientific Data Analysis.
Off-Road LAF: Encoding and Processing Annotations in NLP Workflows
|
Statistical and Machine Learning Methods |
Gold-Standard for Topic-Specific Sentiment Analysis of Economic Texts
Semantic Approaches to Software Component Retrieval with English Queries
Missed Opportunities in Translation Memory Matching
Use of Unsupervised Word Classes for Entity Recognition: Application to the Detection of Disorders in Clinical Reports
ColLex.EN: Automatically Generating and Evaluating a Full-Form Lexicon for English
Event Extraction Using Distant Supervision
A Vector Space Model for Syntactic Distances Between Dialects
The AV-LASYN Database: a Synchronous Corpus of Audio and 3D Facial Marker Data for Audio-Visual Laughter Synthesis
Exploring and Visualizing Variation in Language Resources
SLMotion - an Extensible Sign Language Oriented Video Analysis Tool
Boosting the Creation of a Treebank
Improvements to Dependency Parsing Using Automatic Simplification of Data
From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers
Comparison of Gender- and Speaker-Adaptive Emotion Recognition
Disambiguating Verbs by Collocation: Corpus Lexicography Meets Natural Language Processing
GenitivDB ― a Corpus-Generated Database for German Genitive Classification
3D Face Tracking and Multi-Scale, Spatio-Temporal Analysis of Linguistically Significant Facial Expressions and Head Positions in ASL
All Fragments Count in Parser Evaluation
A Language-Independent Approach to Extracting Derivational Relations from an Inflectional Lexicon
Bring vs. MTRoget: Evaluating Automatic Thesaurus Translation
Latent Semantic Analysis Models on Wikipedia and TASA
Shata-Anuvadak: Tackling Multiway Translation of Indian Languages
Narrowing the Gap Between Termbases and Corpora in Commercial Environments
Machine Translationness: Machine-Likeness in Machine Translation Evaluation
Using C5.0 and Exhaustive Search for Boosting Frame-Semantic Parsing Accuracy
Projection-based Annotation of a Polish Dependency Treebank
A Deep Context Grammatical Model for Authorship Attribution
DINASTI: Dialogues with a Negotiating Appointment Setting Interface
LQVSumm: a Corpus of Linguistic Quality Violations in Multi-Document Summarization
Choosing Which to Use? A Study of Distributional Models for Nominal Lexical Semantic Classification
Estimation of Speaking Style in Speech Corpora Focusing on Speech Transcriptions
A Quality-based Active Sample Selection Strategy for Statistical Machine Translation
Metadata as Linked Open Data: Mapping Disparate XML Metadata Registries Into One RDF / OWL Registry.
Hindi to English Machine Translation: Using Effective Selection in Multi-Model SMT
New Spanish Speech Corpus Database for the Analysis of People Suffering from Parkinson's Disease
Automatic Language Identity Tagging on Word and Sentence-Level in Multilingual Text Sources: a Case-Study on Luxembourgish
Crowd-Sourcing Evaluation of Automatically Acquired, Morphologically Related Word Groupings
A Language-Independent and Fully Unsupervised Approach to Lexicon Induction and Part-Of-Speech Tagging for Closely Related Languages
Quality Estimation for Synthetic Parallel Data Generation
Online Optimisation of Log-Linear Weights in Interactive Machine Translation
Finding a Tradeoff Between Accuracy and Rater's Workload in Grading Clustered Short Answers
Evaluation of Technology Term Recognition with Random Indexing
Utilizing Constituent Structure for Compound Analysis
|
Summarisation |
Building a Dataset for Summarization and Keyword Extraction from Emails
The Polish Summaries Corpus
The Impact of Cohesion Errors in Extraction Based Summaries
Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline
Locating Requests Among Open Source Software Communication Messages
How Could Veins Speed Up the Process of Discourse Parsing
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization
|
T |
Text Mining |
Gold-Standard for Topic-Specific Sentiment Analysis of Economic Texts
HiEve: A Corpus for Extracting Event Hierarchies from News Stories
Co-Clustering of Bilingual Datasets as a Mean for Assisting the Construction of Thematic Bilingual Comparable Corpora
Enrichment of Bilingual Dictionary Through News Stream Data
Event Extraction Using Distant Supervision
SinoCoreferencer: An End-to-End Chinese Event Coreference Resolver
Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction
Annotating Inter-Sentence Temporal Relations in Clinical Notes
Tools for Arabic Natural Language Processing: a Case Study in Qalqalah Prosody
Dual Subtitles as Parallel Corpora
Variations on Quantitative Comparability Measures and Their Evaluations on Synthetic French-English Comparable Corpora
Linking Pictographs to Synsets: Sclera2Cornetto
Information Extraction from German Patient Records Via Hybrid Parsing and Relation Extraction Strategies
Constructing a Corpus of Japanese Predicate Phrases for Synonym / Antonym Relations
On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter
Automatic Semantic Relation Extraction from Portuguese Texts
Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts
Estimation of Speaking Style in Speech Corpora Focusing on Speech Transcriptions
AraNLP: a Java-based Library for the Processing of Arabic Text
Who Cares About Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis.
Coreference Resolution for Latvian
Ranking Job Offers for Candidates: Learning Hidden Knowledge from Big Data
Clustering Tweets Using Wikipedia Concepts
Hot Topics and Schisms in NLP: Community and Trend Analysis with Saffron on ACL and LREC Proceedings
The American Local News Corpus
When Transliteration Met Crowdsourcing: an Empirical Study of Transliteration Via Crowdsourcing Using Efficient, Non-Redundant and Fair Quality Control
|
Textual Entailment and Paraphrasing |
On Paraphrase Identification Corpora
Multimodal Dialogue Segmentation with Gesture Post-Processing
A SICK Cure for the Evaluation of Compositional Distributional Semantic Models
Semantic Clustering of Pivot Paraphrases
The Multilingual Paraphrase Database
Annotating the Focus of Negation in Japanese Text
Improving Evaluation of English-Czech MT Through Paraphrasing
|
Tools, Systems, Applications |
VERTa: Facing a Multilingual Experience of a Linguistically-based MT Evaluation
Accommodations in Tuscany as Linked Data
Discovering the Italian Literature: Interactive Access to Audio Indexed Text Resources
Using Stem-Templates to Improve Arabic POS and Gender / Number Tagging
A Meta-Data Driven Platform for Semi-Automatic Configuration of Ontology Mediators
The Ellogon Pattern Engine: Context-Free Grammars Over Annotations
Missed Opportunities in Translation Memory Matching
Native Language Identification Using Large, Longitudinal Data
Enriching ODIN
Creating Summarization Systems with SUMMA
Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development
MomResp: a Bayesian Model for Multi-Annotator Document Labeling
Towards Automatic Detection of Narrative Structure
A Method for Building Burst-Annotated Co-Occurrence Networks for Analysing Trends in Textual Data
Annotating Inter-Sentence Temporal Relations in Clinical Notes
Refractive: an Open Source Tool to Extract Knowledge from Syntactic and Semantic Relations
A Finite-State Morphological Analyzer for a Lakota HPSG Grammar
Motàmot Project: Conversion of a French-Khmer Published Dictionary for Building a Multilingual Lexical System
Just.Ask, a QASystem That Learns to Answer New Questions from Previous Interactions
Open-Domain Interaction and Online Content in the Sami Language
Guampa: a Toolkit for Collaborative Translation
RELISH LMF: Unlocking the Full Power of the Lexical Markup Framework
Ciempiess: a New Open-Sourced Mexican Spanish Radio Corpus
Exploring and Visualizing Variation in Language Resources
A New Form of Humor ― Mapping Constraint-based Computational Morphologies to a Finite-State Representation
A Multi-Cultural Repository of Automatically Discovered Linguistic and Conceptual Metaphors
First Approach Toward Semantic Role Labeling for Basque
Aligning Parallel Texts with Intertext
Extending Standoff Annotation
Turkish Resources for Visual Word Recognition
ROOTS: a Toolkit for Easy, Fast and Consistent Processing of Large Sequential Annotated Data Collections
Constructing and Exploiting an Automatically Annotated Resource of Legislative Texts
The EASR Corpora of European Portuguese, French, Hungarian and Polish Elderly Speech
Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
HFST-SweNER ― a New NER Resource for Swedish
Introducing a Framework for the Evaluation of Music Detection Tools
Detecting Document Structure in a Very Large Corpus of UK Financial Reports
Latent Semantic Analysis Models on Wikipedia and TASA
Sprinter: Language Technologies for Interactive and Multimedia Language Learning
Bilingual Dictionary Induction as an Optimization Problem
A Set of Open Source Tools for Turkish Natural Language Processing
The Procedure of Lexico-Semantic Annotation of Składnica Treebank
French Resources for Extraction and Normalization of Temporal Expressions with HeidelTime
CLARIN-NL: Major Results
Machine Translation for Subtitling: a Large-Scale Evaluation
The N2 Corpus: a Semantically Annotated Collection of Islamist Extremist Stories
Benchmarking Twitter Sentiment Analysis Tools
Corpus Annotation Through Crowdsourcing: Towards Best Practice Guidelines
Machine Translationness: Machine-Likeness in Machine Translation Evaluation
Towards an Environment for the Production and the Validation of Lexical Semantic Resources
The Distress Analysis Interview Corpus of Human and Computer Interviews
Representing Multimodal Linguistic Annotated Data
Comparative Analysis of Portuguese Named Entities Recognition Tools
Collocation Or Free Combination? ― Applying Machine Translation Techniques to Identify Collocations in Japanese
The Wavesurfer Automatic Speech Recognition Plugin
A Cascade Approach for Complex-Type Classification
Online Experiments with the Percy Software Framework - Experiences and Some Early Results
Improving the Exploitation of Linguistic Annotations in ELAN
Sentence Rephrasing for Parsing Sentences with OOV Words
Clinical Data-Driven Probabilistic Graph Processing
A Compact Interactive Visualization of Dependency Treebank Query Results
ILLINOISCLOUDNLP: Text Analytics Services in the Cloud
Reconstructing the Semantic Landscape of Natural Language Processing
High Quality Word Lists as a Resource for Multiple Purposes
Language Processing Infrastructure in the XLike Project
Sharing Resources Between Free / Open-Source Rule-based Machine Translation Systems: Grammatical Framework and Apertium
A Stream Computing Approach Towards Scalable NLP
Hindi to English Machine Translation: Using Effective Selection in Multi-Model SMT
Experiences with Parallelisation of an Existing NLP Pipeline: Tagging Hansard
A Model to Generate Adaptive Multimodal Job Interviews with a Virtual Recruiter
Identification of Technology Terms in Patents
A Toolkit for Efficient Learning of Lexical Units for Speech Recognition
Rule-based Reordering Space in Statistical Machine Translation
TVD: a Reproducible and Multiply Aligned TV Series Dataset
The Halliday Centre Tagger: an Online Platform for Semi-Automatic Text Annotation and Analysis
Heuristic Hyper-Minimization of Finite State Lexicons
Nomad: Linguistic Resources and Tools Aimed at Policy Formulation and Validation
Standardisation and Interoperation of Morphosyntactic and Syntactic Annotation Tools for Spanish and Their Annotations
The Tutorbot Corpus ― a Corpus for Studying Tutoring Behaviour in Multiparty Face-To-Face Spoken Dialogue
Combining Dependency Information and Generalization in a Pattern-based Approach to the Classification of Lexical-Semantic Relation Instances
An Exercise in Reuse of Resources: Adapting General Discourse Coreference Resolution for Detecting Lexical Chains in Patent Documentation
VOAR: A Visual and Integrated Ontology Alignment Environment
DisMo: a Morphosyntactic, Disfluency and Multi-Word Unit Annotator. an Evaluation on a Corpus of French Spontaneous and Read Speech
Integration of Workflow and Pipeline for Language Service Composition
Large Scale Arabic Error Annotation: Guidelines and Framework
MAT: a Tool for L2 Pronunciation Errors Annotation
Taalportaal: an Online Grammar of Dutch and Frisian
A Framework for Public Health Surveillance
Languagesindanger.Eu - Including Multimedia Language Resources to Disseminate Knowledge and Create Educational Material on Less-Resourced Languages
|
Topic Detection & Tracking |
Extracting Information for Context-Aware Meeting Preparation
Newsreader: Recording History from Daily News Streams
The Slovak Categorized News Corpus
Clustering Tweets Using Wikipedia Concepts
Hot Topics and Schisms in NLP: Community and Trend Analysis with Saffron on ACL and LREC Proceedings
A Modular System for Rule-based Text Categorisation
|
Typological Databases |
Etymological WordNet: Tracing the History of Words
Language Collage: Grammatical Description with the Lingo Grammar Matrix
|
|
|