TOPICS: Browse articles of the conference sorted by topic
C |
Cognitive Methods |
VoxML: A Visualization Modeling Language
Metonymy Analysis Using Associative Relations between Words
A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
Cognitively Motivated Distributional Representations of Meaning
English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
Multimodal Resources for Human-Robot Communication Modelling
Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
Database of Mandarin Neighborhood Statistics
Cohere: A Toolkit for Local Coherence
Collaborative Resource Construction |
A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
Phonetic Inventory for an Arabic Speech Corpus
A Multi-Layered Annotated Corpus of Scientific Papers
Corpus Resources for Dispute Mediation Discourse
New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
Introducing the Asian Language Treebank (ALT)
Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
Building Language Resources for Exploring Autism Spectrum Disorders
Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
Resources for building applications with Dependency Minimal Recursion Semantics
Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Building Evaluation Datasets for Consumer-Oriented Information Retrieval
CLARIN-EL Web-based Annotation Tool
EDISON: Feature Extraction for NLP, Simplified
Computer-Assisted Language Learning (CALL) |
The Validation of MRCPD Cross-language Expansions on Imageability Ratings
Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Leveraging Native Data to Correct Preposition Errors in Learners' Dutch
Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
A Shared Task for Spoken CALL?
DALILA: The Dialectal Arabic Linguistic Learning Assistant
Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
Joining-in-type Humanoid Robot Assisted Language Learning System
Controlled Languages |
LELIO: An Auto-Adaptative System to Acquire Domain Lexical Knowledge in Technical Texts
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
Corpus (Creation, Annotation, etc.) |
Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
An Interaction-Centric Dataset for Learning Automation Rules in Smart Homes
C-WEP―Rich Annotated Collection of Writing Errors by Professionals
The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
Ecological Gestures for HRI: the GEE Corpus
How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
Who was Pietro Badoglio? Towards a QA system for Italian History
Croatian Error-Annotated Corpus of Non-Professional Written Language
New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
An Annotated Corpus of Direct Speech
Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
NLP Infrastructure for the Lithuanian Language
Sense-annotating a Lexical Substitution Data Set with Ubyline
Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
The OpenCourseWare Metadiscourse (OCWMD) Corpus
An Open Corpus for Named Entity Recognition in Historic Newspapers
Domain Adaptation for Named Entity Recognition Using CRFs
Building a Dataset for Possessions Identification in Text
Age and Gender Prediction on Health Forum Data
Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
Manual and Automatic Paraphrases for MT Evaluation
CodE Alltag: A German-Language E-Mail Corpus
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
MWEs in Treebanks: From Survey to Guidelines
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Improving corpus search via parsing
Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
A Turkish-German Code-Switching Corpus
Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Introducing the LCC Metaphor Datasets
Passing a USA National Bar Exam: a First Corpus for Experimentation
Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Factuality Annotation and Learning in Spanish Texts
Using Word Embeddings to Translate Named Entities
Privacy Issues in Online Machine Translation Services - European Perspective
The Alaskan Athabascan Grammar Database
Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation
DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Phrase Level Segmentation and Labelling of Machine Translation Errors
Building the Macedonian-Croatian Parallel Corpus
The ACQDIV Database: Min(d)ing the Ambient Language
Towards Automatic Transcription of ILSE ― an Interdisciplinary Longitudinal Study of Adult Development and Aging
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
SatiricLR: a Language Resource of Satirical News Articles
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
The Query of Everything: Developing Open-Domain, Natural-Language Queries for BOLT Information Retrieval
Spanish Word Vectors from Wikipedia
Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Universal Dependencies for Japanese
Annotating and Detecting Medical Events in Clinical Notes
Collecting Language Resources for the Latvian e-Government Machine Translation Platform
Multiword Expressions Dataset for Indian Languages
Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
The Validation of MRCPD Cross-language Expansions on Imageability Ratings
SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
A Dependency Treebank of the Chinese Buddhist Canon
Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
Learning from Within? Comparing PoS Tagging Approaches for Historical Text
Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
Question-Answering with Logic Specific to Video Games
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
Multi-language Speech Collection for NIST LRE
Selection Criteria for Low Resource Language Programs
Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
Japanese Word―Color Associations with and without Contexts
Phonetic Inventory for an Arabic Speech Corpus
A Language Resource of German Errors Written by Children with Dyslexia
MarsaGram: an excursion in the forests of parsing trees
The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
Compilation of an Arabic Children's Corpus
CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
Annotating Logical Forms for EHR Questions
Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Benchmarking Lexical Simplification Systems
AIMU: Actionable Items for Meeting Understanding
Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
Arabic to English Person Name Transliteration using Twitter
Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
A Multi-Layered Annotated Corpus of Scientific Papers
Korean TimeML and Korean TimeBank
TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
SYN2015: Representative Corpus of Contemporary Written Czech
Challenges of Evaluating Sentiment Analysis Tools on Social Media
EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
A Corpus of Images and Text in Online News
WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
POS-tagging of Historical Dutch
Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
A Bilingual Discourse Corpus and Its Applications
Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
Language Resource Addition Strategies for Raw Text Parsing
Information structure in the Potsdam Commentary Corpus: Topics
Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus
The Negochat Corpus of Human-agent Negotiation Dialogues
KorAP Architecture ― Diving in the Deep Sea of Corpus Data
Name Translation based on Fine-grained Named Entity Recognition in a Single Language
Wikification for Scriptio Continua
Two Decades of Terminology: European Framework Programmes Titles
The IFCASL Corpus of French and German Non-native and Native Read Speech
Legal Text Interpretation: Identifying Hohfeldian Relations from Text
Learning Tone and Attribution for Financial Text Mining
Mirroring Facial Expressions and Emotions in Dyadic Conversations
SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
Uzbek-English and Turkish-English Morpheme Alignment Corpora
Text Segmentation of Digitized Clinical Texts
Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Lexical Resources to Enrich English Malayalam Machine Translation
Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
TTS for Low Resource Languages: A Bangla Synthesizer
A Semantically Compositional Annotation Scheme for Time Normalization
PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology
Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference
Correcting Errors in a Treebank Based on Tree Mining
Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
Corpus Resources for Dispute Mediation Discourse
The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
4Couv: A New Treebank for French
Domain-Specific Corpus Expansion with Focused Webcrawling
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
An Arabic-Moroccan Darija Code-Switched Corpus
The OFAI Multi-Modal Task Description Corpus
A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
Universal Dependencies v1: A Multilingual Treebank Collection
FABIOLE, a Speech Database for Forensic Speaker Comparison
A Japanese Chess Commentary Corpus
InScript: Narrative texts annotated with script information
Finding Definitions in Large Corpora with Sketch Engine
Towards a Multi-dimensional Taxonomy of Stories in Dialogue
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Corpus-Based Diacritic Restoration for South Slavic Languages
AfriBooms: An Online Treebank for Afrikaans
Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
UPPC - Urdu Paraphrase Plagiarism Corpus
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
AMISCO: The Austrian German Multi-Sensor Corpus
Emotion Analysis on Twitter: The Hidden Challenge
A Database of Laryngeal High-Speed Videos with Simultaneous High-Quality Audio Recordings of Pathological and Non-Pathological Voices
Identifying Content Types of Messages Related to Open Source Software Projects
WTF-LOD - A New Resource for Large-Scale NER Evaluation
C4Corpus: Multilingual Web-size Corpus with Free License
Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
Improving Information Extraction from Wikipedia Texts using Basic English
Exploiting a Large Strongly Comparable Corpus
Purely Corpus-based Automatic Conversation Authoring
FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
CINTIL DependencyBank PREMIUM - A Corpus of Grammatical Dependencies for Portuguese
A General Framework for the Annotation of Causality Based on FrameNet
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
D(H)ante: A New Set of Tools for XIII Century Italian
LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
QUEMDISSE? Reported speech in Portuguese
Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
A Classification-based Approach to Economic Event Detection in Dutch News Text
A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
Construction of an English Dependency Corpus incorporating Compound Function Words
Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
Design and Development of the MERLIN Learner Corpus Platform
EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
The Universal Dependencies Treebank of Spoken Slovenian
Introducing the Asian Language Treebank (ALT)
The COPLE2 corpus: a learner corpus for Portuguese
TGermaCorp -- A (Digital) Humanities Resource for (Computational) Linguistics
1 Million Captioned Dutch Newspaper Images
ANTUSD: A Large Chinese Sentiment Dictionary
Multimodal Resources for Human-Robot Communication Modelling
Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
Corpus for Customer Purchase Behavior Prediction in Social Media
metaTED: a Corpus of Metadiscourse for Spoken Language
Universal Dependencies for Norwegian
TweetMT: A Parallel Microblog Corpus
Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
GRaSP: A Multilayered Annotation Scheme for Perspectives
Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
NLP and Public Engagement: The Case of the Italian School Reform
Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
Parallel Discourse Annotations on a Corpus of Short Texts
BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
Poly-GrETEL: Cross-Lingual Example-based Querying of Syntactic Constructions
Web Chat Conversations from Contact Centers: a Descriptive Study
MEANTIME, the NewsReader Multilingual Event and Time Corpus
LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
Crowdsourced Corpus with Entity Salience Annotations
ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain
Features for Generic Corpus Querying
Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
Identification of Drug-Related Medical Conditions in Social Media
Emotion Corpus Construction Based on Selection from Hashtags
Mining the Spoken Wikipedia for Speech Data and Beyond
On the Use of a Serious Game for Recording a Speech Corpus of People with Intellectual Disabilities
A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
Construction and Analysis of a Large Vietnamese Text Corpus
The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics
The Methodius Corpus of Rhetorical Discourse Structures and Generated Texts
SpaceRef: A corpus of street-level geographic descriptions
That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Constructing a Norwegian Academic Wordlist
Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
CEPLEXicon ― A Lexicon of Child European Portuguese
Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
Parallel Speech Corpora of Japanese Dialects
Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
Towards a Corpus of Violence Acts in Arabic Social Media
Affective Lexicon Creation for the Greek Language
The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
Building Language Resources for Exploring Autism Spectrum Disorders
Comprehensive and Consistent PropBank Light Verb Annotation
Summ-it++: an Enriched Version of the Summ-it Corpus
Automatic Corpus Extension for Data-driven Natural Language Generation
European Union Language Resources in Sketch Engine
Extracting Structured Scholarly Information from the Machine Translation Literature
Edit Categories and Editor Role Identification in Wikipedia
Inconsistency Detection in Semantic Annotation
Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
Developing a Dataset for Evaluating Approaches for Document Expansion with Images
Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Datasets for Aspect-Based Sentiment Analysis in French
Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
DART: a Dataset of Arguments and their Relations on Twitter
Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump
MADAD: A Readability Annotation Tool for Arabic Text
Finding Alternative Translations in a Large Corpus of Movie Subtitle
ASPEC: Asian Scientific Paper Excerpt Corpus
Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
A Large-Scale Multilingual Disambiguation of Glosses
Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
Crowdsourcing Salient Information from News and Tweets
Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
TEITOK: Text-Faithful Annotated Corpora
Extracting Interlinear Glossed Text from LaTeX Documents
A Shared Task for Spoken CALL?
From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Laughter in French Spontaneous Conversational Dialogs
A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods
Persian Proposition Bank
Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
Creation of comparable corpora for English-{Urdu, Arabic, Persian}
Detecting Annotation Scheme Variation in Out-of-Domain Treebanks
SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Universal Dependencies for Persian
Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation
BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
Gulf Arabic Linguistic Resource Building for Sentiment Analysis
If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
The CIRDO Corpus: Comprehensive Audio/Video Database of Domestic Falls of Elderly People
Annotating Named Entities in Consumer Health Questions
VPS-GradeUp: Graded Decisions on Usage Patterns
Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
PARC 3.0: A Corpus of Attribution Relations
Hard Time Parsing Questions: Building a QuestionBank for French
SuperCAT: The (New and Improved) Corpus Analysis Toolkit
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
AppDialogue: Multi-App Dialogues for Intelligent Assistants
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Urdu Summary Corpus
Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
OSMAN ― A Novel Arabic Readability Metric
Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
Typed Entity and Relation Annotation on Computer Science Papers
Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
Automatic Construction of Discourse Corpora for Dialogue Translation
TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation
The Royal Society Corpus: From Uncharted Data to Corpus
The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
ArchiMob - A Corpus of Spoken Swiss German
Building Evaluation Datasets for Consumer-Oriented Information Retrieval
Annotating Topic Development in Information Seeking Queries
Detection of Reformulations in Spoken French
Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
A Proposition Bank of Urdu
A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
A Large Scale Corpus of Gulf Arabic
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
A Regional News Corpora for Contextualized Entity Discovery and Linking
Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
A Dataset for Open Event Extraction in English
Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
Coreference Annotation Scheme and Relation Types for Hindi
A Study of Reuse and Plagiarism in LREC papers
A Reading Comprehension Corpus for Machine Translation Evaluation
Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Controlled Propagation of Concept Annotations in Textual Corpora
Exploiting Arabic Diacritization for High Quality Automatic Annotation
An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation
Coreference in Prague Czech-English Dependency Treebank
Joining-in-type Humanoid Robot Assisted Language Learning System
Searching in the Penn Discourse Treebank Using the PML-Tree Query
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
A Multi-domain Corpus of Swedish Word Sense Annotation
A Corpus of Native, Non-native and Translated Texts
He Said She Said ― a Male/Female Corpus of Polish
Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
On Developing Resources for Patient-level Information Retrieval
Graphical Annotation for Syntax-Semantics Mapping
Monolingual Social Media Datasets for Detecting Contradiction and Entailment
Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
Improving the Annotation of Sentence Specificity
Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
Czech Legal Text Treebank 1.0
Building A Case-based Semantic English-Chinese Parallel Treebank
NorGramBank: A Deep Treebank for Norwegian
VerbLexPor: a lexical resource with semantic roles for Portuguese
OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
First Steps Towards Coverage-Based Sentence Alignment
Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
Sentiframes: A Resource for Verb-centered German Sentiment Inference
Temporal Information Annotation: Crowd vs. Experts
PotTS: The Potsdam Twitter Sentiment Corpus
Parallel Chinese-English Entities, Relations and Events Corpora
Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
Adapting the TANL tool suite to Universal Dependencies
Crowdsourcing |
A Gold Standard for Scalar Adjectives
Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
Arabic Corpora for Credibility Analysis
A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
Japanese Word-Color Associations with and without Contexts
Wikipedia Titles As Noun Tag Predictors
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
Crowdsourcing Ontology Lexicons
The Negochat Corpus of Human-agent Negotiation Dialogues
Analysis of English Spelling Errors in a Word-Typing Game
Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
InScript: Narrative texts annotated with script information
Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
Palabras: Crowdsourcing Transcriptions of L2 Speech
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
Crowdsourced Corpus with Entity Salience Annotations
Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
EasyTree: A Graphical Tool for Dependency Tree Annotation
Towards a Corpus of Violence Acts in Arabic Social Media
Crowdsourcing Salient Information from News and Tweets
Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
Temporal Information Annotation: Crowd vs. Experts
D |
Dialogue |
An Annotated Corpus of Direct Speech
Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
A Dependency Treebank of the Chinese Buddhist Canon
Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
AIMU: Actionable Items for Meeting Understanding
The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
The Negochat Corpus of Human-agent Negotiation Dialogues
Mirroring Facial Expressions and Emotions in Dyadic Conversations
Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Towards a Multi-dimensional Taxonomy of Stories in Dialogue
A Document Repository for Social Media and Speech Conversations
Purely Corpus-based Automatic Conversation Authoring
A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics
PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
The DialogBank
Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
Managing Linguistic and Terminological Variation in a Medical Dialogue System
Laughter in French Spontaneous Conversational Dialogs
A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
AppDialogue: Multi-App Dialogues for Intelligent Assistants
A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
ArchiMob - A Corpus of Spoken Swiss German
Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
Deep Learning of Audio and Language Features for Humor Prediction
Digital Libraries |
A Computational Perspective on the Romanian Dialects
Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
Measuring Lexical Quality of a Historical Finnish Newspaper Collection - Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
South African National Centre for Digital Language Resources
Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
Data Management Plans and Data Centers
Lin|gu|is|tik: Building the Linguist's Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Discourse Annotation, Representation and Processing |
Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
The OpenCourseWare Metadiscourse (OCWMD) Corpus
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
A Multi-Layered Annotated Corpus of Scientific Papers
A Bilingual Discourse Corpus and Its Applications
Information structure in the Potsdam Commentary Corpus: Topics
The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
Learning Tone and Attribution for Financial Text Mining
Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
Corpus Resources for Dispute Mediation Discourse
A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Fine-Grained Chinese Discourse Relation Labelling
A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
Argument Mining: the Bottleneck of Knowledge and Language Resources
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
metaTED: a Corpus of Metadiscourse for Spoken Language
Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
Parallel Discourse Annotations on a Corpus of Short Texts
A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
The Methodius Corpus of Rhetorical Discourse Structures and Generated Texts
The DialogBank
From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Applying Core Scientific Concepts to Context-Based Citation Recommendation
SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis
PARC 3.0: A Corpus of Attribution Relations
Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
Automatic Construction of Discourse Corpora for Dialogue Translation
Annotating Topic Development in Information Seeking Queries
Coreference Annotation Scheme and Relation Types for Hindi
Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
Searching in the Penn Discourse Treebank Using the PML-Tree Query
Cohere: A Toolkit for Local Coherence
Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
Improving the Annotation of Sentence Specificity
Document Classification, Text categorisation |
Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
Age and Gender Prediction on Health Forum Data
Comparing Speech and Text Classification on ICNALE
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
SatiricLR: a Language Resource of Satirical News Articles
Compilation of an Arabic Children's Corpus
Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
Learning Tone and Attribution for Financial Text Mining
Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
Towards a Multi-dimensional Taxonomy of Stories in Dialogue
A Semi-Supervised Approach for Gender Identification
Identifying Content Types of Messages Related to Open Source Software Projects
Ensemble Classification of Grants using LDA-based Features
Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
Emotion Corpus Construction Based on Selection from Hashtags
A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
Towards a Corpus of Violence Acts in Arabic Social Media
Edit Categories and Editor Role Identification in Wikipedia
Exploring the Realization of Irony in Twitter Data
Evaluation Set for Slovak News Information Retrieval
Discriminating Similar Languages: Evaluations and Explorations
Modeling Language Change in Historical Corpora: The Case of Portuguese
Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
Specialising Paragraph Vectors for Text Polarity Detection
A Corpus of Native, Non-native and Translated Texts
He Said She Said - a Male/Female Corpus of Polish
Cohere: A Toolkit for Local Coherence
Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
Detecting Expressions of Blame or Praise in Text
Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
E |
Emotion Recognition/Generation |
Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
Mirroring Facial Expressions and Emotions in Dyadic Conversations
Detecting Implicit Expressions of Affect from Text using Semantic Knowledge on Common Concept Properties
Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
Emotion Analysis on Twitter: The Hidden Challenge
AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
Affective Lexicon Creation for the Greek Language
Datasets for Aspect-Based Sentiment Analysis in French
Evaluating Context Selection Strategies to Build Emotive Vector Space Models
Sentiment Analysis in Social Networks through Topic modeling
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Deep Learning of Audio and Language Features for Humor Prediction
PotTS: The Potsdam Twitter Sentiment Corpus
Endangered Languages |
Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
A Finite-state Morphological Analyser for Tuvan
Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
The Alaskan Athabascan Grammar Database
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
Selection Criteria for Low Resource Language Programs
Data Formats and Management Strategies from the Perspective of Language Resource Producers - Personal Diachronic and Social Synchronic Data Sharing -
A Morphological Lexicon of Esperanto with Morpheme Frequencies
Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
Legacy language atlas data mining: mapping Kru languages
A Rule-based Shallow-transfer Machine Translation System for Scots and English
Evaluation Methodologies |
Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
Ecological Gestures for HRI: the GEE Corpus
Complementarity, F-score, and NLP Evaluation
DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
Manual and Automatic Paraphrases for MT Evaluation
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
Revisiting Summarization Evaluation for Scientific Articles
What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
RankDCG: Rank-Ordering Evaluation Measure
Spanish Word Vectors from Wikipedia
The Language Application Grid and Galaxy
Multi-language Speech Collection for NIST LRE
An Empirical Study of Arabic Formulaic Sequence Extraction Methods
Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Measuring Lexical Quality of a Historical Finnish Newspaper Collection - Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
Use of Domain-Specific Language Resources in Machine Translation
Exploitation of Co-reference in Distributional Semantics
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
A Novel Evaluation Method for Morphological Segmentation
Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
Linguistically Inspired Language Model Augmentation for MT
UPPC - Urdu Paraphrase Plagiarism Corpus
Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
Word Embedding Evaluation and Combination
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
D(H)ante: A New Set of Tools for XIII Century Italian
Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Using Contextual Information for Machine Translation Evaluation
Evaluating the Impact of Light Post-Editing on Usability
Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
Evaluating Machine Translation in a Usage Scenario
Cross-validating Image Description Datasets and Evaluation Metrics
OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
Comparing the Level of Code-Switching in Corpora
Evaluation Set for Slovak News Information Retrieval
The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Tools and Guidelines for Principled Machine Translation Development
Generating Task-Pertinent sorted Error Lists for Speech Recognition
Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
OSMAN - A Novel Arabic Readability Metric
EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs
Analysing Constraint Grammars with a SAT-solver
The Trials and Tribulations of Predicting Post-Editing Productivity
Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization
A Regional News Corpora for Contextualized Entity Discovery and Linking
Evaluating Interactive System Adaptation
Applying the Cognitive Machine Translation Evaluation Approach to Arabic
A Reading Comprehension Corpus for Machine Translation Evaluation
B2SG: a TOEFL-like Task for Portuguese
Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
Distributional Thesauri for Information Retrieval and vice versa
MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
L |
Language Identification |
Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
Multi-language Speech Collection for NIST LRE
An Arabic-Moroccan Darija Code-Switched Corpus
Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
Assessing the Potential of Metaphoricity of verbs using corpus data
Discriminating Similar Languages: Evaluations and Explorations
Language Modelling |
MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
Deriving Morphological Analyzers from Example Inflections
Discriminative Analysis of Linguistic Features for Typological Study
Morphological Analysis of Sahidic Coptic for Automatic Glossing
Factuality Annotation and Learning in Spanish Texts
Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
Using SMT for OCR Error Correction of Historical Texts
Domain-Specific Corpus Expansion with Focused Webcrawling
Linguistically Inspired Language Model Augmentation for MT
Leveraging Native Data to Correct Preposition Errors in Learners' Dutch
GRaSP: A Multilayered Annotation Scheme for Perspectives
SCALE: A Scalable Language Engineering Toolkit
Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
Extracting Weighted Language Lexicons from Wikipedia
Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
Discriminating Similar Languages: Evaluations and Explorations
Lexicon, Lexical Database |
Semantic Links for Portuguese
A Gold Standard for Scalar Adjectives
A Finite-state Morphological Analyser for Tuvan
The Gavagai Living Lexicon
VerbCROcean: A Repository of Fine-Grained Semantic Verb Relations for Croatian
Rule-based Automatic Multi-word Term Extraction and Lemmatization
A New Integrated Open-source Morphological Analyzer for Hungarian
Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
Enriching a Portuguese WordNet using Synonyms from a Monolingual Dictionary
Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
Tēzaurs.lv: the Largest Open Lexical Database for Latvian
NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
VoxML: A Visualization Modeling Language
Example-based Acquisition of Fine-grained Collocation Resources
A Finite-State Morphological Analyser for Sindhi
A Computational Perspective on the Romanian Dialects
The on-line version of Grammatical Dictionary of Polish
A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Synset Ranking of Hindi WordNet
Evaluating Lexical Similarity to build Sentiment Similarity
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
An Empirical Study of Arabic Formulaic Sequence Extraction Methods
Japanese Word-Color Associations with and without Contexts
A Language Resource of German Errors Written by Children with Dyslexia
Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
LVF-lemon - Towards a Linked Data Representation of "Les Verbes français"
A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
Crowdsourcing Ontology Lexicons
Curation of Dutch Regional Dictionaries
A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
A lexicon of perception for the identification of synaesthetic metaphors in corpora
Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Wikification for Scriptio Continua
Two Decades of Terminology: European Framework Programmes Titles
Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
A Morphological Lexicon of Esperanto with Morpheme Frequencies
How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
Lexical Resources to Enrich English Malayalam Machine Translation
Creating a General Russian Sentiment Lexicon
TTS for Low Resource Languages: A Bangla Synthesizer
GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Building Concept Graphs from Monolingual Dictionary Entries
Detecting Optional Arguments of Verbs
New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus
DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
A Japanese Chess Commentary Corpus
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
Encoding Adjective Scales for Fine-grained Resources
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Automatic Enrichment of WordNet with Common-Sense Knowledge
Semantic Layer of the Valence Dictionary of Polish Walenty
Ambiguity Diagnosis for Terms in Digital Humanities
A General Framework for the Annotation of Causality Based on FrameNet
LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
QUEMDISSE? Reported speech in Portuguese
Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
The Hebrew FrameNet Project
Addressing the MFS Bias in WSD systems
A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
Italian VerbNet: A Construction-based Approach to Italian Verb Classification
TGermaCorp - A (Digital) Humanities Resource for (Computational) Linguistics
LELIO: An Auto-Adaptative System to Acquire Domain Lexical Knowledge in Technical Texts
Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Challenges of Adjective Mapping between plWordNet and Princeton WordNet
Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study
Accessing and Elaborating Walenty - a Valence Dictionary of Polish - via Internet Browser
CEPLEXicon - A Lexicon of Child European Portuguese
Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
Affective Lexicon Creation for the Greek Language
A Large Rated Lexicon with French Medical Words
Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
Multi-prototype Chinese Character Embedding
Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
Extracting Weighted Language Lexicons from Wikipedia
Best of Both Worlds: Making Word Sense Embeddings Interpretable
Evaluating Context Selection Strategies to Build Emotive Vector Space Models
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
Managing Linguistic and Terminological Variation in a Medical Dialogue System
Assessing the Potential of Metaphoricity of verbs using corpus data
Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
A comparison of Named-Entity Disambiguation and Word Sense Disambiguation
BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
Gulf Arabic Linguistic Resource Building for Sentiment Analysis
A Lexical Resource for the Identification of Weak Words in German Specification Documents
PARSEME Survey on MWE Resources
Generating a Large-Scale Entity Linking Dictionary from Wikipedia Link Structure and Article Text
Refurbishing a Morphological Database for German
ANEW+: Automatic Expansion and Validation of Affective Norms of Words Lexicons in Multiple Languages
Recent Advances in Development of a Lexicon-Grammar of Polish: PolNet 3.0
Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
A Rule-based Shallow-transfer Machine Translation System for Scots and English
Effect Functors for Opinion Inference
PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Multiword Expressions in Child Language
A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
Database of Mandarin Neighborhood Statistics
Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
Graph-Based Induction of Word Senses in Croatian
SlangNet: A WordNet like resource for English Slang
B2SG: a TOEFL-like Task for Portuguese
A Multi-domain Corpus of Swedish Word Sense Annotation
Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
Distributional Thesauri for Information Retrieval and vice versa
ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
VerbLexPor: a lexical resource with semantic roles for Portuguese
A Multilingual Predicate Matrix
Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
Sentiframes: A Resource for Verb-centered German Sentiment Inference
Named Entity Resources - Overview and Outlook
Merging Data Resources for Inflectional and Derivational Morphology in Czech
Linked Data |
Semantic Links for Portuguese
Publishing the Trove Newspaper Corpus
Cross-lingual RDF Thesauri Interlinking
Concepticon: A Resource for the Linking of Concept Lists
LVF-lemon - Towards a Linked Data Representation of "Les Verbes français"
A Corpus of Images and Text in Online News
WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
WTF-LOD - A New Resource for Large-Scale NER Evaluation
Riddle Generation using Word Associations
Challenges of Adjective Mapping between plWordNet and Princeton WordNet
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Crosswalking from CMDI to Dublin Core and MARC 21
Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
Generating a Large-Scale Entity Linking Dictionary from Wikipedia Link Structure and Article Text
Lin|gu|is|tik: Building the Linguist's Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
LR Infrastructures and Architectures |
Two Architectures for Parallel Processing of Huge Amounts of Text
Trends in HLT Research: A Survey of LDC's Data Scholarship Program
How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
Publishing the Trove Newspaper Corpus
Corpus Query Lingua Franca (CQLF)
Providing a Catalogue of Language Resources for Commercial Users
Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Collecting Language Resources for the Latvian e-Government Machine Translation Platform
The Language Application Grid and Galaxy
Learning from Within? Comparing PoS Tagging Approaches for Historical Text
ELRA Activities and Services
New Developments in the LRE Map
Data Formats and Management Strategies from the Perspective of Language Resource Producers - Personal Diachronic and Social Synchronic Data Sharing -
Korean TimeML and Korean TimeBank
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
Analysis of English Spelling Errors in a Word-Typing Game
A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
EstNLTK - NLP Toolkit for Estonian
South African National Centre for Digital Language Resources
A Document Repository for Social Media and Speech Conversations
C4Corpus: Multilingual Web-size Corpus with Free License
Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
Design and Development of the MERLIN Learner Corpus Platform
The Hebrew FrameNet Project
FLAT: Constructing a CLARIN Compatible Home for Language Resources
The BAS Speech Data Repository
CLARIAH in the Netherlands
Crosswalking from CMDI to Dublin Core and MARC 21
LREC as a Graph: People and Resources in a Network
Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump
MADAD: A Readability Annotation Tool for Arabic Text
Data Management Plans and Data Centers
Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Facilitating Metadata Interoperability in CLARIN-DK
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
Towards a Language Service Infrastructure for Mobile Environments
Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
GATE-Time: Extraction of Temporal Expressions and Events
corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
NorGramBank: A Deep Treebank for Norwegian
CLARIN-EL Web-based Annotation Tool
LR National/International Projects, Infrastructural/Policy issues |
NLP Infrastructure for the Lithuanian Language
CodE Alltag: A German-Language E-Mail Corpus
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Providing a Catalogue of Language Resources for Commercial Users
Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
ELRA Activities and Services
Language Resource Citation: the ISLRN Dissemination and Further Developments
The ELRA License Wizard
Review on the Existing Language Resources for Languages of France
Selection Criteria for Low Resource Language Programs
New Developments in the LRE Map
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
SYN2015: Representative Corpus of Contemporary Written Czech
Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
South African Language Resources: Phrase Chunking
A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
CLARIAH in the Netherlands
LREC as a Graph: People and Resources in a Network
Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
Persian Proposition Bank
Data Management Plans and Data Centers
Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
Evaluating Interactive System Adaptation
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
The Public License Selector: Making Open Licensing Easier
Graphical Annotation for Syntax-Semantics Mapping
Government Domain Named Entity Recognition for South African Languages
M |
Machine Translation, SpeechToSpeech Translation |
Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models
Manual and Automatic Paraphrases for MT Evaluation
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
Privacy Issues in Online Machine Translation Services - European Perspective
Phrase Level Segmentation and Labelling of Machine Translation Errors
The United Nations Parallel Corpus v1.0
Building the Macedonian-Croatian Parallel Corpus
Collecting Language Resources for the Latvian e-Government Machine Translation Platform
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
Syntax-based Multi-system Machine Translation
Use of Domain-Specific Language Resources in Machine Translation
A Bilingual Discourse Corpus and Its Applications
Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
CATaLog Online: Porting a Post-editing Tool to the Web
The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus
Name Translation based on Fine-grained Named Entity Recognition in a Single Language
Uzbek-English and Turkish-English Morpheme Alignment Corpora
Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Using SMT for OCR Error Correction of Historical Texts
Lexical Resources to Enrich English Malayalam Machine Translation
Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
Linguistically Inspired Language Model Augmentation for MT
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Exploiting a Large Strongly Comparable Corpus
Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
Introducing the Asian Language Treebank (ALT)
TweetMT: A Parallel Microblog Corpus
Evaluating Translation Quality and CLIR Performance of Query Sessions
Using Contextual Information for Machine Translation Evaluation
That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Evaluating the Impact of Light Post-Editing on Usability
Bootstrapping a Hybrid MT System to a New Language Pair
Evaluating Machine Translation in a Usage Scenario
Using BabelNet to Improve OOV Coverage in SMT
WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Finding Alternative Translations in a Large Corpus of Movie Subtitle
ASPEC: Asian Scientific Paper Excerpt Corpus
Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
Evaluation of the KIT Lecture Translation System
Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
Tools and Guidelines for Principled Machine Translation Development
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
The Trials and Tribulations of Predicting Post-Editing Productivity
A Rule-based Shallow-transfer Machine Translation System for Scots and English
Applying the Cognitive Machine Translation Evaluation Approach to Arabic
A Reading Comprehension Corpus for Machine Translation Evaluation
Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
IRIS: English-Irish Machine Translation System
Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
Building A Case-based Semantic English-Chinese Parallel Treebank
OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
Towards producing bilingual lexica from monolingual corpora
First Steps Towards Coverage-Based Sentence Alignment
Metadata |
The United Nations Parallel Corpus v1.0
Review on the Existing Language Resources for Languages of France
New Developments in the LRE Map
A Language Resource of German Errors Written by Children with Dyslexia
The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
Compilation of an Arabic Childrens Corpus
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
FLAT: Constructing a CLARIN Compatible Home for Language Resources
CLARIAH in the Netherlands
Crosswalking from CMDI to Dublin Core and MARC 21
Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
LREC as a Graph: People and Resources in a Network
A Lexical Resource for the Identification of Weak Words in German Specification Documents
PARSEME Survey on MWE Resources
Facilitating Metadata Interoperability in CLARIN-DK
The Royal Society Corpus: From Uncharted Data to Corpus
Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
Morphology |
A Finite-state Morphological Analyser for Tuvan
Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
A New Integrated Open-source Morphological Analyzer for Hungarian
A Proposal for a Part-of-Speech Tagset for the Albanian Language
Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
Tēzaurs.lv: the Largest Open Lexical Database for Latvian
A Finite-State Morphological Analyser for Sindhi
Deriving Morphological Analyzers from Example Inflections
Morphological Analysis of Sahidic Coptic for Automatic Glossing
The on-line version of Grammatical Dictionary of Polish
Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
Farasa: A New Fast and Accurate Arabic Word Segmenter
A Novel Evaluation Method for Morphological Segmentation
A Morphological Lexicon of Esperanto with Morpheme Frequencies
How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
Universal Dependencies v1: A Multilingual Treebank Collection
Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools
Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
DALILA: The Dialectal Arabic Linguistic Learning Assistant
Refurbishing a Morphological Database for German
A Large Scale Corpus of Gulf Arabic
A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
Exploiting Arabic Diacritization for High Quality Automatic Annotation
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
A Neural Lemmatizer for Bengali
Merging Data Resources for Inflectional and Derivational Morphology in Czech
Multilinguality |
Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
A Computational Perspective on the Romanian Dialects
A Turkish-German Code-Switching Corpus
Introducing the LCC Metaphor Datasets
Comparing Speech and Text Classification on ICNALE
Modelling a Parallel Corpus of French and French Belgian Sign Language
The United Nations Parallel Corpus v1.0
Building the Macedonian-Croatian Parallel Corpus
Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Universal Dependencies for Japanese
Cross-lingual RDF Thesauri Interlinking
Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
Speech Synthesis of Code-Mixed Text
Crowdsourcing Ontology Lexicons
CATaLog Online: Porting a Post-editing Tool to the Web
Sentiment Lexicons for Arabic Social Media
The IFCASL Corpus of French and German Non-native and Native Read Speech
Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
Uzbek-English and Turkish-English Morpheme Alignment Corpora
Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
South African National Centre for Digital Language Resources
C4Corpus: Multilingual Web-size Corpus with Free License
Cognitively Motivated Distributional Representations of Meaning
Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
The COPLE2 corpus: a learner corpus for Portuguese
Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
Challenges of Adjective Mapping between plWordNet and Princeton WordNet
Poly-GrETEL: Cross-Lingual Example-based Querying of Syntactic Constructions
MEANTIME, the NewsReader Multilingual Event and Time Corpus
Evaluating Translation Quality and CLIR Performance of Query Sessions
Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
European Union Language Resources in Sketch Engine
FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies
Evaluating Machine Translation in a Usage Scenario
Finding Alternative Translations in a Large Corpus of Movie Subtitle
ASPEC: Asian Scientific Paper Excerpt Corpus
Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
A Large-Scale Multilingual Disambiguation of Glosses
MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
Comparing the Level of Code-Switching in Corpora
Creation of comparable corpora for English-{Urdu, Arabic, Persian}
Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
Applying the Cognitive Machine Translation Evaluation Approach to Arabic
Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
Coreference in Prague Czech-English Dependency Treebank
IRIS: English-Irish Machine Translation System
Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
A Multilingual Predicate Matrix
Towards producing bilingual lexica from monolingual corpora
Multimedia Document Processing |
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
A Corpus of Images and Text in Online News
Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context
A Japanese Chess Commentary Corpus
Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
1 Million Captioned Dutch Newspaper Images
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
Developing a Dataset for Evaluating Approaches for Document Expansion with Images
ArchiMob - A Corpus of Spoken Swiss German
MultiWord Expressions & Collocations |
Rule-based Automatic Multi-word Term Extraction and Lemmatization
Example-based Acquisition of Fine-grained Collocation Resources
MWEs in Treebanks: From Survey to Guidelines
Multiword Expressions Dataset for Indian Languages
An Empirical Study of Arabic Formulaic Sequence Extraction Methods
A lexicon of perception for the identification of synaesthetic metaphors in corpora
Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
TermoPL - a Flexible Tool for Terminology Extraction
GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
Construction of an English Dependency Corpus incorporating Compound Function Words
Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun
A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
Forecasting Emerging Trends from Scientific Literature
Comprehensive and Consistent PropBank Light Verb Annotation
Inconsistency Detection in Semantic Annotation
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
PARSEME Survey on MWE Resources
Recent Advances in Development of a Lexicon-Grammar of Polish: PolNet 3.0
Multiword Expressions in Child Language
O |
Ontologies |
Ecological Gestures for HRI: the GEE Corpus
Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
Metonymy Analysis Using Associative Relations between Words
Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Annotating Logical Forms for EHR Questions
Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform
Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
Constructing a Norwegian Academic Wordlist
Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
Extracting Structured Scholarly Information from the Machine Translation Literature
Managing Linguistic and Terminological Variation in a Medical Dialogue System
The Event and Implied Situation Ontology (ESO): Application and Evaluation
Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
Automatic Biomedical Term Polysemy Detection
Opinion Mining / Sentiment Analysis |
Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
OPFI: A Tool for Opinion Finding in Polish
SatiricLR: a Language Resource of Satirical News Articles
Evaluating Lexical Similarity to build Sentiment Similarity
Using Data Mining Techniques for Sentiment Shifter Identification
Challenges of Evaluating Sentiment Analysis Tools on Social Media
EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
A Dataset for Detecting Stance in Tweets
Sentiment Lexicons for Arabic Social Media
Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Detecting Implicit Expressions of Affect from Text using Semantic Knowledge on Common Concept Properties
Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
Creating a General Russian Sentiment Lexicon
A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
Encoding Adjective Scales for Fine-grained Resources
Emotion Analysis on Twitter: The Hidden Challenge
EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
A Language Independent Method for Generating Large Scale Polarity Lexicons
ANTUSD: A Large Chinese Sentiment Dictionary
Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
GRaSP: A Multilayered Annotation Scheme for Perspectives
Emotion Corpus Construction Based on Selection from Hashtags
SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
Exploring the Realization of Irony in Twitter Data
Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
Sentiment Analysis in Social Networks through Topic modeling
Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation
Gulf Arabic Linguistic Resource Building for Sentiment Analysis
PARC 3.0: A Corpus of Attribution Relations
ANEW+: Automatic Expansion and Validation of Affective Norms of Words Lexicons in Multiple Languages
A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
Effect Functors for Opinion Inference
Specialising Paragraph Vectors for Text Polarity Detection
Sentiframes: A Resource for Verb-centered German Sentiment Inference
Optical Character Recognition |
An Open Corpus for Named Entity Recognition in Historic Newspapers
Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
Using SMT for OCR Error Correction of Historical Texts
Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
Other |
Two Architectures for Parallel Processing of Huge Amounts of Text
Trends in HLT Research: A Survey of LDC's Data Scholarship Program
Who was Pietro Badoglio? Towards a QA system for Italian History
Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
Metonymy Analysis Using Associative Relations between Words
A Finite-State Morphological Analyser for Sindhi
Discriminative Analysis of Linguistic Features for Typological Study
Privacy Issues in Online Machine Translation Services - European Perspective
The ACQDIV Database: Min(d)ing the Ambient Language
Building Tempo-HindiWordNet: A resource for effective temporal information access in Hindi
Review on the Existing Language Resources for Languages of France
Corpus for Childrens Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
Wikipedia Titles As Noun Tag Predictors
SYN2015: Representative Corpus of Contemporary Written Czech
Automatic Anomaly Detection for Dysarthria across Two Speech Styles: Read vs Spontaneous Speech
User, who art thou? User Profiling for Oral Corpus Platforms
Curation of Dutch Regional Dictionaries
Semi-automatically Alignment of Predicates between Speech and OntoNotes data
Wikification for Scriptio Continua
Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
Crossmodal Network-Based Distributional Semantic Models
Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
EstNLTK - NLP Toolkit for Estonian
The OFAI Multi-Modal Task Description Corpus
A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
Fine-Grained Chinese Discourse Relation Labelling
Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
Parallel Discourse Annotations on a Corpus of Short Texts
Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
Features for Generic Corpus Querying
The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
A Large Rated Lexicon with French Medical Words
IMS HotCoref DE: A Data-driven Co-reference Resolver for German
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Laughter in French Spontaneous Conversational Dialogs
Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
A comparison of Named-Entity Disambiguation and Word Sense Disambiguation
Universal Dependencies for Persian
Modeling Language Change in Historical Corpora: The Case of Portuguese
The CIRDO Corpus: Comprehensive Audio/Video Database of Domestic Falls of Elderly People
Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
SuperCAT: The (New and Improved) Corpus Analysis Toolkit
SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
Word Segmentation for Akkadian Cuneiform
Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
Yes, We Care! Results of the Ethics and Natural Language Processing Surveys
NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
The Public License Selector: Making Open Licensing Easier
Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
Deep Learning of Audio and Language Features for Humor Prediction
Improving the Annotation of Sentence Specificity
ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
Detecting Expressions of Blame or Praise in Text
CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
Temporal Information Annotation: Crowd vs. Experts
EDISON: Feature Extraction for NLP, Simplified
Entity Linking with a Paraphrase Flavor
Accurate Deep Syntactic Parsing of Graphs: The Case of French
Enriching a Portuguese WordNet using Synonyms from a Monolingual Dictionary
An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
Embedding Open-domain Common-sense Knowledge from Text
OPFI: A Tool for Opinion Finding in Polish
Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
Evaluating Lexical Similarity to build Sentiment Similarity
Annotating and Detecting Medical Events in Clinical Notes
Multiword Expressions Dataset for Indian Languages
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
The ELRA License Wizard
CASSAurus: A Resource of Simpler Spanish Synonyms
CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
Farasa: A New Fast and Accurate Arabic Word Segmenter
Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
LibN3L: A Lightweight Package for Neural NLP
Extractive Summarization under Strict Length Constraints
DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
ANTUSD: A Large Chinese Sentiment Dictionary
Universal Dependencies for Norwegian
Can Tweets Predict TV Ratings?
Web Chat Conversations from Contact Centers: a Descriptive Study
MEANTIME, the NewsReader Multilingual Event and Time Corpus
Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
Neural Scoring Function for MST Parser
TEITOK: Text-Faithful Annotated Corpora
TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
Generating Task-Pertinent sorted Error Lists for Speech Recognition
Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation
French Learners Audio Corpus of German Speech (FLACGS)
Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
A Neural Lemmatizer for Bengali
P |
Parsing |
Accurate Deep Syntactic Parsing of Graphs: The Case of French
Punctuation Prediction for Unsegmented Transcript Based on Word Vector
Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
Explicit Fine grained Syntactic and Semantic Annotation of the Idafa Construction in Arabic
Phrase Level Segmentation and Labelling of Machine Translation Errors
Universal Dependencies for Japanese
A Dependency Treebank of the Chinese Buddhist Canon
Evaluating a Deterministic Shift-Reduce Neural Parser for Constituent Parsing
Language Resource Addition Strategies for Raw Text Parsing
E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses
4Couv: A New Treebank for French
AfriBooms: An Online Treebank for Afrikaans
Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
CINTIL DependencyBank PREMIUM - A Corpus of Grammatical Dependencies for Portuguese
Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
Construction of an English Dependency Corpus incorporating Compound Function Words
South African Language Resources: Phrase Chunking
Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools
EasyTree: A Graphical Tool for Dependency Tree Annotation
Neural Scoring Function for MST Parser
Extracting Interlinear Glossed Text from LaTeX Documents
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
Hard Time Parsing Questions: Building a QuestionBank for French
Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks
Towards Building Semantic Role Labeler for Indian Languages
Old French Dependency Parsing: Results of Two Parsers Analysed from a Linguistic Point of View
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
Towards Comparability of Linguistic Graph Banks for Semantic Parsing
Czech Legal Text Treebank 1.0
NorGramBank: A Deep Treebank for Norwegian
Government Domain Named Entity Recognition for South African Languages
Part-of-Speech Tagging |
A Proposal for a Part-of-Speech Tagset for the Albanian Language
Morphological Analysis of Sahidic Coptic for Automatic Glossing
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Learning from Within? Comparing PoS Tagging Approaches for Historical Text
Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
Wikipedia Titles As Noun Tag Predictors
POS-tagging of Historical Dutch
Language Resource Addition Strategies for Raw Text Parsing
New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
TGermaCorp -- A (Digital) Humanities Resource for (Computational) Linguistics
Features for Generic Corpus Querying
Constructing a Norwegian Academic Wordlist
Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
The hunvec framework for NN-CRF-based sequential tagging
Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
A Large Scale Corpus of Gulf Arabic
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
Exploiting Arabic Diacritization for High Quality Automatic Annotation
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
FlexTag: A Highly Flexible PoS Tagging Framework
Person Identification |
Comparing Speech and Text Classification on ICNALE
Arabic to English Person Name Transliteration using Twitter
Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context
FABIOLE, a Speech Database for Forensic Speaker Comparison
Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
He Said She Said ― a Male/Female Corpus of Polish
Predicting Author Age from Weibo Microblog Posts
Phonetic Databases, Phonology |
New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
Phonetic Inventory for an Arabic Speech Corpus
Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
The IFCASL Corpus of French and German Non-native and Native Read Speech
The BAS Speech Data Repository
Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
Profiling |
Building a Dataset for Possessions Identification in Text
Age and Gender Prediction on Health Forum Data
SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
A Semi-Supervised Approach for Gender Identification
TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
Predicting Author Age from Weibo Microblog Posts
Prosody |
Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
AMISCO: The Austrian German Multi-Sensor Corpus
Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
On the Use of a Serious Game for Recording a Speech Corpus of People with Intellectual Disabilities
Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
S |
Semantics |
A Gold Standard for Scalar Adjectives
The Gavagai Living Lexicon
VerbCROcean: A Repository of Fine-Grained Semantic Verb Relations for Croatian
VoxML: A Visualization Modeling Language
Example-based Acquisition of Fine-grained Collocation Resources
Embedding Open-domain Common-sense Knowledge from Text
Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
Introducing the LCC Metaphor Datasets
DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
Medical Concept Embeddings via Labeled Background Corpora
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Spanish Word Vectors from Wikipedia
Synset Ranking of Hindi WordNet
Neural Embedding Language Models in Semantic Clustering of Web Search Results
SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
Using Data Mining Techniques for Sentiment Shifter Identification
Question-Answering with Logic Specific to Video Games
Concepticon: A Resource for the Linking of Concept Lists
Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
Annotating Logical Forms for EHR Questions
Exploitation of Co-reference in Distributional Semantics
A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
A lexicon of perception for the identification of synaesthetic metaphors in corpora
A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
A Dataset for Detecting Stance in Tweets
Semi-automatically Alignment of Predicates between Speech and OntoNotes data
Legal Text Interpretation: Identifying Hohfeldian Relations from Text
Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
Crossmodal Network-Based Distributional Semantic Models
A Semantically Compositional Annotation Scheme for Time Normalization
PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology
GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
Building Concept Graphs from Monolingual Dictionary Entries
CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
Semantic Layer of the Valence Dictionary of Polish Walenty
Riddle Generation using Word Associations
A General Framework for the Annotation of Causality Based on FrameNet
Cognitively Motivated Distributional Representations of Meaning
Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
The Hebrew FrameNet Project
Addressing the MFS Bias in WSD systems
Argument Mining: the Bottleneck of Knowledge and Language Resources
Italian VerbNet: A Construction-based Approach to Italian Verb Classification
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
metaTED: a Corpus of Metadiscourse for Spoken Language
ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain
Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform
SpaceRef: A corpus of street-level geographic descriptions
Visualisation and Exploration of High-Dimensional Distributional Features in Lexical Semantic Classification
Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
A Large Rated Lexicon with French Medical Words
Comprehensive and Consistent PropBank Light Verb Annotation
Inconsistency Detection in Semantic Annotation
Datasets for Aspect-Based Sentiment Analysis in French
DART: a Dataset of Arguments and their Relations on Twitter
Multi-prototype Chinese Character Embedding
Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
Best of Both Worlds: Making Word Sense Embeddings Interpretable
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
Can Topic Modelling benefit from Word Sense Information?
Resources for building applications with Dependency Minimal Recursion Semantics
Typology of Adjectives Benchmark for Compositional Distributional Models
Assessing the Potential of Metaphoricity of verbs using corpus data
Persian Proposition Bank
Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks
Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
Typed Entity and Relation Annotation on Computer Science Papers
EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs
Towards Building Semantic Role Labeler for Indian Languages
Effect Functors for Opinion Inference
A Dataset for Open Event Extraction in English
A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
Graph-Based Induction of Word Senses in Croatian
Towards Comparability of Linguistic Graph Banks for Semantic Parsing
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
GATE-Time: Extraction of Temporal Expressions and Events
Building A Case-based Semantic English-Chinese Parallel Treebank
VerbLexPor: a lexical resource with semantic roles for Portuguese
A Multilingual Predicate Matrix
Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
Merging Data Resources for Inflectional and Derivational Morphology in Czech
Semantic Web |
Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
Concepticon: A Resource for the Linking of Concept Lists
Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
Context-enhanced Adaptive Entity Linking
DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Sign Language Recognition/Generation |
A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Modelling a Parallel Corpus of French and French Belgian Sign Language
CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
Detection of Major ASL Sign Types in Continuous Signing For ASL Recognition
Social Media Processing |
Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
Building a Dataset for Possessions Identification in Text
CodE Alltag: A German-Language E-Mail Corpus
A Turkish-German Code-Switching Corpus
What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Speech Synthesis of Code-Mixed Text
Challenges of Evaluating Sentiment Analysis Tools on Social Media
A Dataset for Detecting Stance in Tweets
Sentiment Lexicons for Arabic Social Media
An Arabic-Moroccan Darija Code-Switched Corpus
Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus
A Document Repository for Social Media and Speech Conversations
A Language Independent Method for Generating Large Scale Polarity Lexicons
Corpus for Customer Purchase Behavior Prediction in Social Media
TweetMT: A Parallel Microblog Corpus
Can Tweets Predict TV Ratings?
Web Chat Conversations from Contact Centers: a Descriptive Study
Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
Exploring the Realization of Irony in Twitter Data
Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
DART: a Dataset of Arguments and their Relations on Twitter
Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
Sentiment Analysis in Social Networks through Topic modeling
Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
Segmenting Hashtags using Automatically Created Training Data
What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis
A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
Monolingual Social Media Datasets for Detecting Contradiction and Entailment
Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
Predicting Author Age from Weibo Microblog Posts
Effects of Sampling on Twitter Trend Detection
PotTS: The Potsdam Twitter Sentiment Corpus
FlexTag: A Highly Flexible PoS Tagging Framework
Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
Speech Recognition/Understanding |
Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Punctuation Prediction for Unsegmented Transcript Based on Word Vector
The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data
Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
AIMU: Actionable Items for Meeting Understanding
A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
Syllable based DNN-HMM Cantonese Speech to Text System
Palabras: Crowdsourcing Transcriptions of L2 Speech
Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
SCALE: A Scalable Language Engineering Toolkit
The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
Mining the Spoken Wikipedia for Speech Data and Beyond
A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
Parallel Speech Corpora of Japanese Dialects
Generating Task-Pertinent sorted Error Lists for Speech Recognition
The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource
AppDialogue: Multi-App Dialogues for Intelligent Assistants
Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
Joining-in-type Humanoid Robot Assisted Language Learning System
Speech Resource/Database |
Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data
Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation
Towards Automatic Transcription of ILSE ― an Interdisciplinary Longitudinal Study of Adult Development and Aging
Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
Operational Assessment of Keyword Search on Oral History
Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
User, who art thou? User Profiling for Oral Corpus Platforms
Semi-automatically Alignment of Predicates between Speech and OntoNotes data
Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
FABIOLE, a Speech Database for Forensic Speaker Comparison
A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
AMISCO: The Austrian German Multi-Sensor Corpus
A Database of Laryngeal High-Speed Videos with Simultaneous High-Quality Audio Recordings of Pathological and Non-Pathological Voices
FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
Syllable based DNN-HMM Cantonese Speech to Text System
Palabras: Crowdsourcing Transcriptions of L2 Speech
Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
The BAS Speech Data Repository
Mining the Spoken Wikipedia for Speech Data and Beyond
Parallel Speech Corpora of Japanese Dialects
The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
A Shared Task for Spoken CALL?
A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource
A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
SPA: Web-based Platform for easy Access to Speech Processing Modules
Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
Database of Mandarin Neighborhood Statistics
An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation
Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
Speech Synthesis |
Speech Synthesis of Code-Mixed Text
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
TTS for Low Resource Languages: A Bangla Synthesizer
AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
Standards for LRs |
An Annotated Corpus of Direct Speech
A Proposal for a Part-of-Speech Tagset for the Albanian Language
MWEs in Treebanks: From Survey to Guidelines
Corpus Query Lingua Franca (CQLF)
Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
RankDCG: Rank-Ordering Evaluation Measure
Language Resource Citation: the ISLRN Dissemination and Further Developments
Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
The Universal Dependencies Treebank of Spoken Slovenian
Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
The DialogBank
Facilitating Metadata Interoperability in CLARIN-DK
Towards Comparability of Linguistic Graph Banks for Semantic Parsing
Graphical Annotation for Syntax-Semantics Mapping
Statistical and Machine Learning Methods |
Punctuation Prediction for Unsegmented Transcript Based on Word Vector
Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models
A Machine Learning based Music Retrieval and Recommendation System
Medical Concept Embeddings via Labeled Background Corpora
Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
Evaluating a Deterministic Shift-Reduce Neural Parser for Constituent Parsing
POS-tagging of Historical Dutch
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
A Novel Evaluation Method for Morphological Segmentation
Text Segmentation of Digitized Clinical Texts
How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
Detecting Optional Arguments of Verbs
Corpus-Based Diacritic Restoration for South Slavic Languages
Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
A Semi-Supervised Approach for Gender Identification
Word Embedding Evaluation and Combination
Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
South African Language Resources: Phrase Chunking
Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
Syllable based DNN-HMM Cantonese Speech to Text System
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Bootstrapping a Hybrid MT System to a New Language Pair
Building Language Resources for Exploring Autism Spectrum Disorders
A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
A Sequence Model Approach to Relation Extraction in Portuguese
MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
Segmenting Hashtags using Automatically Created Training Data
Detection of Major ASL Sign Types in Continuous Signing For ASL Recognition
Word Segmentation for Akkadian Cuneiform
A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
Specialising Paragraph Vectors for Text Polarity Detection
NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
Learning Thesaurus Relations from Distributional Features
Summarisation |
Revisiting Summarization Evaluation for Scientific Articles
What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
Extractive Summarization under Strict Length Constraints
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization
Urdu Summary Corpus
Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization
T |
Text Mining |
Event Coreference Resolution with Multi-Pass Sieves
The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
Arabic Corpora for Credibility Analysis
Medical Concept Embeddings via Labeled Background Corpora
Using Data Mining Techniques for Sentiment Shifter Identification
Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
A Large DataBase of Hypernymy Relations Extracted from the Web
JATE 2.0: Java Automatic Term Extraction with Apache Solr
Text Segmentation of Digitized Clinical Texts
Creating a General Russian Sentiment Lexicon
A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge
Odin's Runes: A Rule Language for Information Extraction
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
Identifying Content Types of Messages Related to Open Source Software Projects
Ensemble Classification of Grants using LDA-based Features
Ambiguity Diagnosis for Terms in Digital Humanities
A Classification-based Approach to Economic Event Detection in Dutch News Text
Corpus for Customer Purchase Behavior Prediction in Social Media
NLP and Public Engagement: The Case of the Italian School Reform
LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
Edit Categories and Editor Role Identification in Wikipedia
Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization
Crowdsourcing Salient Information from News and Tweets
More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
The Event and Implied Situation Ontology (ESO): Application and Evaluation
Typed Entity and Relation Annotation on Computer Science Papers
Detection of Reformulations in Spoken French
A Study of Reuse and Plagiarism in LREC papers
Controlled Propagation of Concept Annotations in Textual Corpora
Predictive Modeling: Guessing the NLP Terms of Tomorrow
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Detecting Expressions of Blame or Praise in Text
Effects of Sampling on Twitter Trend Detection
Studying the Temporal Dynamics of Word Co-occurrences: An Application to Event Detection
Automatic Biomedical Term Polysemy Detection
Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
Textual Entailment and Paraphrasing |
SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
Passing a USA National Bar Exam: a First Corpus for Experimentation
Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
UPPC - Urdu Paraphrase Plagiarism Corpus
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
Detection of Reformulations in Spoken French
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Monolingual Social Media Datasets for Detecting Contradiction and Entailment
Tools, Systems, Applications |
Event Coreference Resolution with Multi-Pass Sieves
An Interaction-Centric Dataset for Learning Automation Rules in Smart Homes
Two Architectures for Parallel Processing of Huge Amounts of Text
Sieve-based Coreference Resolution in the Biomedical Domain
How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
Croatian Error-Annotated Corpus of Non-Professional Written Language
MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
NLP Infrastructure for the Lithuanian Language
Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
Sense-annotating a Lexical Substitution Data Set with Ubyline
Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
A Machine Learning based Music Retrieval and Recommendation System
Publishing the Trove Newspaper Corpus
Deriving Morphological Analyzers from Example Inflections
SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
The on-line version of Grammatical Dictionary of Polish
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
RankDCG: Rank-Ordering Evaluation Measure
CASSAurus: A Resource of Simpler Spanish Synonyms
MarsaGram: an excursion in the forests of parsing trees
Operational Assessment of Keyword Search on Oral History
Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Benchmarking Lexical Simplification Systems
Syntax-based Multi-system Machine Translation
Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
Farasa: A New Fast and Accurate Arabic Word Segmenter
Use of Domain-Specific Language Resources in Machine Translation
A Large DataBase of Hypernymy Relations Extracted from the Web
Automatic Anomaly Detection for Dysarthria across Two Speech Styles: Read vs Spontaneous Speech
JATE 2.0: Java Automatic Term Extraction with Apache Solr
CATaLog Online: Porting a Post-editing Tool to the Web
The ILMT-s2s Corpus - A Multimodal Interlingual Map Task Corpus
KorAP Architecture - Diving in the Deep Sea of Corpus Data
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
TermoPL - a Flexible Tool for Terminology Extraction
Correcting Errors in a Treebank Based on Tree Mining
Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
LibN3L: A Lightweight Package for Neural NLP
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
EstNLTK - NLP Toolkit for Estonian
SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking
Finding Definitions in Large Corpora with Sketch Engine
Fine-Grained Chinese Discourse Relation Labelling
Corpus-Based Diacritic Restoration for South Slavic Languages
Ensemble Classification of Grants using LDA-based Features
Riddle Generation using Word Associations
Purely Corpus-based Automatic Conversation Authoring
Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun
1 Million Captioned Dutch Newspaper Images
Multimodal Resources for Human-Robot Communication Modelling
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
NLP and Public Engagement: The Case of the Italian School Reform
FLAT: Constructing a CLARIN Compatible Home for Language Resources
SCALE: A Scalable Language Engineering Toolkit
LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Construction and Analysis of a Large Vietnamese Text Corpus
Accessing and Elaborating Walenty - a Valence Dictionary of Polish - via Internet Browser
Visualisation and Exploration of High-Dimensional Distributional Features in Lexical Semantic Classification
Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
EasyTree: A Graphical Tool for Dependency Tree Annotation
Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
Bootstrapping a Hybrid MT System to a New Language Pair
Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
Adapting an Entity Centric Model for Portuguese Coreference Resolution
FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies
Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
Cross-validating Image Description Datasets and Evaluation Metrics
Using BabelNet to Improve OOV Coverage in SMT
A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
MADAD: A Readability Annotation Tool for Arabic Text
IMS HotCoref DE: A Data-driven Co-reference Resolver for German
Resources for building applications with Dependency Minimal Recursion Semantics
More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
TEITOK: Text-Faithful Annotated Corpora
Extracting Interlinear Glossed Text from LaTeX Documents
MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
BAS Speech Science Web Services - an Update of Current Developments
Evaluation of the KIT Lecture Translation System
CirdoX: an on/off-line multisource speech and sound analysis software
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Tools and Guidelines for Principled Machine Translation Development
Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
SuperCAT: The (New and Improved) Corpus Analysis Toolkit
SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
Urdu Summary Corpus
Refurbishing a Morphological Database for German
OSMAN - A Novel Arabic Readability Metric
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central - State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
The hunvec framework for NN-CRF-based sequential tagging
SPA: Web-based Platform for easy Access to Speech Processing Modules
Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
Towards Multiple Antecedent Coreference Resolution in Specialized Discourse
Word Segmentation for Akkadian Cuneiform
Towards a Language Service Infrastructure for Mobile Environments
NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
Controlled Propagation of Concept Annotations in Textual Corpora
The Public License Selector: Making Open Licensing Easier
Searching in the Penn Discourse Treebank Using the PML-Tree Query
IRIS: English-Irish Machine Translation System
Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
On Developing Resources for Patient-level Information Retrieval
ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
Czech Legal Text Treebank 1.0
FlexTag: A Highly Flexible PoS Tagging Framework
CLARIN-EL Web-based Annotation Tool
Adapting the TANL tool suite to Universal Dependencies
Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
EDISON: Feature Extraction for NLP, Simplified
Topic Detection & Tracking |
Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Forecasting Emerging Trends from Scientific Literature
Can Topic Modelling benefit from Word Sense Information?
Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
Automatic Construction of Discourse Corpora for Dialogue Translation
Predictive Modeling: Guessing the NLP Terms of Tomorrow
Studying the Temporal Dynamics of Word Co-occurrences: An Application to Event Detection
Typological Databases |
Discriminative Analysis of Linguistic Features for Typological Study
The Alaskan Athabascan Grammar Database
Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Typology of Adjectives Benchmark for Compositional Distributional Models
Legacy language atlas data mining: mapping Kru languages