Languages



 
Language Title

60 languages

The OPUS corpus - parallel and free

A

 

Abbey

WALA: a multilingual resource repository for West African Languages

Afrikaans

A Chatbot as a Novel Corpus Visualization Tool

A Spoken Afrikaans Language Resource Designed for Research on Pronunciation Variations

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The African Speech Technology Project: An Assessment

Albanian

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

MED-TYP: A Typological Database for Mediterranean Languages

All

A Registry of Standard Data Categories for Linguistic Annotation

Mapping Dependency Structures to Phrase Structures and the Automatic Acquisition of Mapping Rules

All text encodable languages

Migrating Language Resources from SGML to XML: the Text Encoding Initiative Recommendations

All Unicode supported languages

Callisto: A Configurable Annotation Workbench

American English

The American English SALA-II Data Collection

The American National Corpus First Release

Any

WinPitch Corpus, a Text to Speech Alignment Tool for Multimodal Corpora

Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment

eGram - a Grammar Development Environment and its Usage for Language Generation

ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs

Anyi

CoGesT: A Formal Transcription System for Conversational Gesture

Anyi

WALA: a multilingual resource repository for West African Languages

Arabic

A Chatbot as a Novel Corpus Visualization Tool

A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering

A Multi-Modal Documentation System for Warao

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium

Automatic Language-Independent Induction of Gazetteer Lists

Collection and Evaluation of Broadcast News Data for Arabic

Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

Generating an Arabic full-form lexicon for bidirectional morphology lookup

Language Model Adaptation for Statistical Machine Translation based on Information Retrieval

Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text

NEMLAR - An Arabic Language Resources Project

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation

The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

Towards basic categories for describing properties of texts in a corpus

Arabic dialects

MED-TYP: A Typological Database for Mediterranean Languages

B

 

Balkan languages

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

Basque

A Xml-Based Term Extraction Tool for Basque

Abar-Hitz: An Annotation Tool for the Basque Dependency Treebank

Cross-Language Acquisition of Semantic Models for Verbal Predicates

Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish

Evaluation of a Spoken Phonetic Database in Basque Language

Exploring Portability of Syntactic Information from English to Basque

Towards the MEANING Top Ontology: Sources of Ontological Meaning

Translation memories enrichment by statistical bilingual segmentation

Basque (standard)

Designing and Recording an Audiovisual Database of Emotional Speech in Basque

Baule

WALA: a multilingual resource repository for West African Languages

Bengali

A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering

Berber

An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

MED-TYP: A Typological Database for Mediterranean Languages

Bulgarian

A Hybrid Strategy for Regular Grammar Parsing

A Language Resources Infrastructure for Bulgarian

A Methodology and Associated Tools for Building Interlingual Wordnets

Cluster Analysis and Classification of Named Entities

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The CLaRK System: XML-based Corpora Development System for Rapid Prototyping

Unexpected Productions May Well be Errors

Verb Valency Descriptors for a Syntactic Treebank

C

 

Cantonese

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Catalan

 

ALLES: Integrating NLP in ICALL Applications

Bilingual Connections for Trilingual Corpora: An XML Approach

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

FreeLing: An Open-Source Suite of Language Analyzers

MED-TYP: A Typological Database for Mediterranean Languages

Mercedes, A Term-In-Context Highlighter

NLP-enhanced error Checking for Catalan unrestricted text

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources

The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities

Towards the MEANING Top Ontology: Sources of Ontological Meaning

Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

Chinese

A Model of Semantic Representations Analysis For Chinese Sentences

A Multi-Modal Documentation System for Warao

An Information Repository Model for Advanced Question Answering Systems

Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium

Augmenting Manual Dictionaries for Statistical Machine Translation Systems

Automatic Language-Independent Induction of Gazetteer Lists

Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment

Collocation Extraction Using Web Statistics

Distributional Consistency: As a General Method for Defining a Core Lexicon

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy

Language Model Adaptation for Statistical Machine Translation based on Information Retrieval

Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text

MEAD - A Platform for Multidocument Multilingual Text Summarization

Pattern Discovery in Named Organization Corpus

Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO

Speech & Expression - The Value of a Longitudinal Corpus

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop

The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation

Chol

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Classical Arabic

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Contemporary Italian

Representing Italian Complex Nominals: a Pilot Study

Croatian

Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Cypriot Greek

Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation

Czech

A Methodology and Associated Tools for Building Interlingual Wordnets

Annotators' Agreement: The Case of Topic-Focus Articulation

Derivational Relations in Flectional Languages - Czech Case

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH Project

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment

Prague Czech-English Dependency Treebank, Syntactically Annotated Resources for Machine Translation

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The Core of the Czech Derivational Dictionary

The COST278 pan-European Broadcast News Database

The Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems

Tiered Tagging Revisited

Top Ontology as a Tool for Semantic Role Tagging

Word Association Norms as a Unique Supplement of Traditional Language Resources

D

 

DAML+OIL

Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms

Danish

A Corpus-based Syntactic Lexicon for Adverbs

A Danish Lexicon Resource - Ready for Applications

A Flexible Language Acquisition Tool Kit for Natural Language Processing

A Named Entity Recognizer for Danish

Evaluation of a Multimodal Dialogue System for Small-screen Devices

Human Language Technology Elements in a Knowledge Organisation System -The VID project

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The Bilingual Web Dictionary on Demand

Dutch

Automatic Phonemic Labeling and Segmentation of Spoken Dutch

Automatic Sentence Simplification for Subtitling in Dutch and English

Discarding noise in an automatically acquired lexicon of support verb constructions

Evaluating Multimodal NLG using Production Experiments

Evaluation and Adaptation of the Celex Dutch Morphological Database

Improving Automatic Phonetic Transcription of Spontaneous Speech through Variant-Based Pronunciation Variation Modelling

Intelligent Building of Language Resources for HLT Applications

Linguistic annotation of the Spoken Dutch Corpus: If we had to do it all over again ...

On the Usefulness of Large Spoken Language Corpora for Linguistic Research

Putting the Dutch PAROLE Corpus to Work

Reusable Lexical Representations for Idioms

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Term Translations in Parallel Corpora: Discovery and Consistency Check

The Centre for Dutch Language and Speech Technology (TST Centre)

The COST278 pan-European Broadcast News Database

The Influence of the Labeller’s Regional Background on Phonetic Transcriptions: Implications for the Evaluation of Spoken Language Resources

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Integrated Language Database of 8th - 21st-Century Dutch

The new Dutch-Flemish HLT Programme: a concerted effort to stimulate the HLT sector

Use and Evaluation of Prosodic Annotations in Dutch

Using a Parallel Transcript/Subtitle Corpus for Sentence Compression

Using large multi-purpose corpora for specific research questions: discourse phenomena related to wh-questions in the Spoken Dutch Corpus

Dutch (historical)

The Integrated Language Database of 8th - 21st-Century Dutch

E

 

Ega

Securing Interpretability: The Case of Ega Language Documentation

WALA: a multilingual resource repository for West African Languages

EL

Multimodal Multilingual Resources in the Subtitling Process

EN

Multimodal Multilingual Resources in the Subtitling Process

English

A Chatbot as a Novel Corpus Visualization Tool

A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation

A comparison of summarisation methods based on term specificity estimation

A Comparison of Two Variant Corpora: The Same Content with Different Sources

A Critical Survey of the Methodology for IE Evaluation

A Domain-Independent Approach to IE Rule Development 

A Fine-Grained Evaluation Method for Speech-to-Speech Machine Translation Using Concept Annotations

A Flexible Language Acquisition Tool Kit for Natural Language Processing

A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering

A Framework for Temporal Resolution

A Freely Available Automatically Generated Thesaurus of Related Words

A General-Purpose off-the-shelf Anaphora Resolution Module: Implementation and Preliminary Evaluation

A Grammar and Style Checker Based on Internet Searches

A Labelled Corpus for Prepositional Phrase Attachment

A Large-Scale Resource for Storing and Recognizing Technical Terminology

A Lexicon Module for a Grammar Development Environment

A Methodology and Associated Tools for Building Interlingual Wordnets

A Multilingual Database of Idioms

A Multi-Modal Documentation System for Warao

A natural language approach to information management: tracking scientific advances through the structure of words

A New ITU-T Recommendation on the Evaluation of Telephone-Based Spoken Dialogue Systems

A pattern extraction workbench combining multiple linguistic levels

A powerful and versatile XML format for representing role-semantic annotation

A practical competition of different filters used in automatic term extraction

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

A Public Reference Implementation of the RAP Anaphora Resolution Algorithm

A Similarity Measure for Unsupervised Semantic Disambiguation

A Suite of Tools for Marking Up Textual Data for Temporal Text Mining Scenarios

A word alignment system based on a translation equivalence extractor

A2Q: an agent-based architecture for multilingual Q&A

Abstracting a Dialogue Act Tagset for Meeting Processing

Acquiring Bayesian Networks from Text

Acquiring Reusable Multilingual Phonotactic Resources

Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs

Agreement in Human Factoid Annotation for Summarization Evaluation

ALLES: Integrating NLP in ICALL Applications

An Analysis of the Relative Difficulty of Reuters-21578 Subsets

An Annotation Scheme for Information Status in Dialogue

An argumentative annotation schema for meeting discussions

An Automatic Method for Constructing Domain-Specific Ontology Resources

An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

An Information Repository Model for Advanced Question Answering Systems

Annotating a corpus for building a domain-specific knowledge base

Annotating Noun Argument Structure for NomBank

Annotation of anaphoric expressions in an aligned bilingual corpus

Annotation of Coreference Relations Among Linguistic Expressions and Images in Biological Articles

Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium

Application of the BLEU Method for Evaluating Free-text Answers in an E-learning Environment

Augmenting Manual Dictionaries for Statistical Machine Translation Systems

Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences

Automatic Acquisition of Sense Examples using ExRetriever

Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data

Automatic Building Gazetteers of Co-referring Named Entities

Automatic Classification of Geographical Named Entities

Automatic Generation of Glosses in the OntoLearn System

Automatic Keyword Extraction from Spoken Text. A Comparison of two Lexical Resources: the EDR and WordNet

Automatic Language-Independent Induction of Gazetteer Lists

Automatic Sentence Simplification for Subtitling in Dutch and English

Automatic transformation of phrase treebanks to dependency trees

Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration

Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval

Beyond TREC's Filtering Track

BootCaT: Bootstrapping Corpora and Terms from the Web

Building a Maritime Domain Lexicon: a Few Considerations on the Database Structure and the Semantic Coding

Building and Using a Corpus of Shallow Dialog Annotated Meetings

Building Part-of-speech Corpora through Histogram Hopping

Calibrating Resource-light Automatic MT Evaluation: A Cheap Approach to Ranking MT Systems by the Usability of their Output

Can Anaphoric Definite Descriptions be Replaced by Pronouns?

Categorizing Web Pages as a Preprocessing Step for Information Extraction

CHeM: A System for the Automatic Analysis of e-mails in the Restoration and Conservation Domain

Cluster Analysis and Classification of Named Entities

Clustering Concept Hierarchies from Text

CoGesT: A Formal Transcription System for Conversational Gesture

Collection of SLR in the Asian-Pacific area

Collocation Extraction Using Web Statistics

Combining Heterogeneous Lexical Resources

Comparative Evaluation Of A Stochastic Parser On Semantic And Syntactic-Semantic Labels

Computing Reliability for Coreference Annotation

Concept Creation in Lexical Ontologies

Connector Usage in the English Essay Writing of Japanese EFL Learners

Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach

Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

Converting Treebank Annotations to Language Neutral Syntax

Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients

Creation of reusable components and language resources for Named Entity Recognition in Russian

Cross-effective cross-lingual document classification

Cross-Language Acquisition of Semantic Models for Verbal Predicates

CST Bank: A Corpus for the Study of Cross-document Structural Relationships

Data Driven Ontology Evaluation

Definition, dictionaries and tagger for Extended Named Entity Hierarchy

Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System

Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus

Detection of Domain Specific Terminology Using Corpora Comparison

Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing

Development of Ontologies with Minimal Set of Conceptual Relations

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Enriching a Thai Lexical Database with Selectional Preferences

Enriching WordNet Via Generative Metonymy and Creative Polysemy

EuroWordNet as a Resource for Cross-language Information Retrieval

Evaluating Conversation with Hans Christian Andersen

Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus

Evaluating Lexical Resources for A Semantic Tagger

Evaluating Name-Matching for Coreference Resolution

Evaluating Variants of the Lesk Approach for Disambiguating Words

Evaluation and Adaptation of a Specialised Language Checking Tool for Non-specialised Machine Translation and Non-expert MT Users for Multi-lingual Telecooperation

Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus

Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment

Evaluation of Multi-party Virtual Reality Dialogue Interaction

Evaluation of Transcription and Annotation Tools for a Multi-modal, Multi-party Dialogue Corpus

Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain

Exploiting Anchor Text as a Lexical Resource

Exploiting Language Resources for Semantic Web Annotations

Exploiting Semantic Web Technologies for Intelligent Access to Historical Documents

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

Exploring Portability of Syntactic Information from English to Basque

Extending a verb-lexicon using a semantically annotated corpus

Extending WordNets to Implicit Information

FreeLing: An Open-Source Suite of Language Analyzers

French-English multi-word term alignment based on lexical context analysis

Frequent Term Distribution Measures for Dataset Profiling

How Does Automatic Machine Translation Evaluation Correlate With Human Scoring as the Number of Reference Translations Increases?

How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words

Human dialogue modelling using annotated corpora

Identifying Definitions in Text Collections for Question Answering

Improving Collocation Extraction for High Frequency Words

Incremental Knowledge Acquisition from WordNet and EuroWordNet

Incremental Methods to Select Test Sentences for Evaluating Translation Ability

Information Retrieval System Using Latent Contextual Relevance

INSPIRE: Evaluation of a Smart-Home System for Infotainment Management and Device Control

Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project

Issues in Corpus Development for Multi-party Multi-modal Task-oriented Dialogue

Language Model Adaptation for Statistical Machine Translation based on Information Retrieval

Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text

Linguistic Corpus Search

Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text

MEAD - A Platform for Multidocument Multilingual Text Summarization

Meaningful Clusters

Mercedes, A Term-In-Context Highlighter

Mining the Web for Discourse Markers

Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality

MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Multi-Document Summarization using Multiple-Sequence Alignment

Multilingual Corpus-based Approach to the Resolution of English -ing

Multi-lingual Evaluation of a Natural Language Generation System

Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

Multimodal Meaning Representation for Generic Dialogue Systems Architectures

NameNet: A Self-Improving Resource for Name Classification

N-Gram Language Modeling for Robust Multi-Lingual Document Classification

NLP-enhanced Content Filtering within the POESIA Project

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

Open Resources for Language Technology

Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Parsing Ungrammatical Input: An Evaluation Procedure

Part-of-Speech Annotation of Biology Research Abstracts

Polysemy and Category Structure in WordNet: An Evidential Approach

Prague Czech-English Dependency Treebank, Syntactically Annotated Resources for Machine Translation

Pronominal Anaphora Resolution for Unrestricted Text

Proper Names and Polysemy: from a Lexicographic Experience

Publicly Available Topic Signatures for all WordNet Nominal Senses

Querying both time-aligned and hierarchical corpora with NXT Search

Raising the Bar: Stacked Conservative Error Correction Beyond Boosting

Resources and Techniques for Multilingual Information Extraction

Resources for Place Name Analysis

Reusable Lexical Representations for Idioms

Re-using high-quality resources for continued evaluation of automated summarization systems

RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation

Road-testing the English Resource Grammar over the British National Corpus

SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed

Selecting the Correct English Synset for a Spanish Sense

Semi-Automatic Construction of a Question Treebank

Semi-automatic Syntactic and Semantic Corpus Annotation with a Deep Parser

Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO

Some Meaning Procedures of Ontological Semantics

Spanish WordNet 1.6: Porting the Spanish WordNet Across Princeton Versions

Speech & Expression - The Value of a Longitudinal Corpus

Steps towards Semantically Annotated Language Resources

Summarization of Multimodal Information

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Term Translations in Parallel Corpora: Discovery and Consistency Check

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop

Text Corpora, Local Grammars and Prediction

Textual Distraction as a Basis for Evaluating Automatic Summarisers

The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora

The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation

The Bilingual Web Dictionary on Demand

The Corpógrafo – a Web-based environment for corpora research

The Cross-Breeding of Dictionaries

The DeepThought Core Architecture Framework

The Effect of Bias on an Automatically-built Word Sense Corpus

The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text

The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources

The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

The MULI Project: Annotation and Analysis of Information Structure in German and English

The NIST Meeting Room Pilot Corpus

The OLISSIPO and LECTIO Projects

The overview of the SST speech corpus of Japanese learner English and evaluation through the experiment on automatic detection of learners' errors

The Penn Discourse Treebank

The Rationale for Building Resources Expressly for NLP

The Role of MultiWord Terminology in Knowledge Management

The Translation Correction Tool: English-Spanish User Studies

Tiered Tagging Revisited

Tone-of-Voice and Controlled Language Techniques

Top Ontology as a Tool for Semantic Role Tagging

Towards basic categories for describing properties of texts in a corpus

Towards the MEANING Top Ontology: Sources of Ontological Meaning

Training a Sentence-Level Machine Translation Confidence Measure

Unsupervised Text Mining for Ontology Extraction: An Evaluation of Statistical Measures

Using Paradigm Tables to Generate New Utterances Similar to those Existing in Linguistic Resources

Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: A Case Study

Using the Penn Treebank to Evaluate Non-Treebank Parsers

Using the Web as a Corpus for the Syntactic-Based Collocation Identification

Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts

Using WordNet to Measure Semantic Orientations of Adjectives

Utilization of Multiple Language Resources for Robust Grammar-Based Tense and Aspect Classification

Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation

Why do you ignore me? - Proof that not all direct speech is bad

Word Association Norms as a Unique Supplement of Traditional Language Resources

Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet

Word Sense Disambiguation Using Random Indexing

'You stupid tin box' - children interacting with the AIBO robot: A cross-linguistic emotional speech corpus

Semi-automatic Acquisition of Command Grammar

English (in scientific texts)

An Annotation Scheme for a Rhetorical Analysis of Biology Articles

English (U.S., Belize)

Developing Language Resources for a Transnational Digital Government System

Estonian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Tiered Tagging Revisited

F

 

Farsi

Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Finnish

 

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Infrastructure for Collaborative Annotation of Speech

FR

Multimodal Multilingual Resources in the Subtitling Process

French

A Chatbot as a Novel Corpus Visualization Tool

A complete understanding speech system based on semantic concepts

An Evaluation Protocol For Text Mining Tools: ALCESTE SAS TEXT MINER SPAD-CRM AND TEMIS Text Mining Solutions Testing

Annotation of anaphoric expressions in an aligned bilingual corpus

Automatic audio and manual transcripts alignment, time-code transfer and selection of exact transcripts

Automatisation Of The Activity Of Term Collection In Different Languages

Building Part-of-speech Corpora through Histogram Hopping

Calibrating Resource-light Automatic MT Evaluation: A Cheap Approach to Ranking MT Systems by the Usability of their Output

Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment

Development of New Telephone Speech Databases for French: The NEOLOGOS Project

Enriching a French Treebank

Evaluating an Authentic Audio-Visual Expressive Speech Corpus

Evaluation Of A Speech Cuer: From Motion Capture To A Concatenative Text-To-Cued Speech System

Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”

Experiments on Building Language Resources for Multi-Modal Dialogue Systems

French-English multi-word term alignment based on lexical context analysis

Generating Coreferential Descriptions from a Structured Model of the Context

Intelligent Building of Language Resources for HLT Applications

Language Modeling using Dynamic Bayesian Networks

Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects

MED-TYP: A Typological Database for Mediterranean Languages

Metaphors in Wordnets: from Theory to Practice

Methodology For Building Thematic Indexes In Medicine For French

Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality

Morphology Based Automatic Acquisition of Large-coverage Lexica

Multilingual Corpus-based Approach to the Resolution of English -ing

NLP-enhanced Content Filtering within the POESIA Project

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Resources and Techniques for Multilingual Information Extraction

SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed

Semi-automatic Acquisition of Command Grammar

Semi-Automatic Derivation of a French Lexicon from CLIPS

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Term Translations in Parallel Corpora: Discovery and Consistency Check

The Bilingual Web Dictionary on Demand

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

The ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

Using the Web as a Corpus for the Syntactic-Based Collocation Identification

Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts

French Sign Language

Toward an Annotation Software for Video of Sign Language, Including Image Processing Tools and Signing Space Modelling

Friulan

MED-TYP: A Typological Database for Mediterranean Languages

G

 

Gaelic

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Galician

A Galician Textual Corpus for Morphosyntactic Tagging with Application to Text-to-Speech Synthesis

Parallel corpora for the Galician language: building and processing of the CLUVI (Linguistic Corpus of the University of Vigo)

The COST278 pan-European Broadcast News Database

Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News

General

Distributional Consistency: As a General Method for Defining a Core Lexicon

German

The BITS Speech Synthesis Corpus for German

A High Quality Partial Parser for Annotating German Text Corpora

A powerful and versatile XML format for representing role-semantic annotation

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

ALLES: Integrating NLP in ICALL Applications

An Annotated Corpus of Tutorial Dialogs on Mathematical Theorem Proving

An Annotated German-Language Medical Text Corpus as Language Resource

Annotating a corpus for building a domain-specific knowledge base

Automated Morphological Segmentation and Evaluation

Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences

Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data

Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons

Automatic transformation of phrase treebanks to dependency trees

Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration

Automatisation Of The Activity Of Term Collection In Different Languages

Bootstrapping a database of German multi-word expressions

CoGesT: A Formal Transcription System for Conversational Gesture

Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach

Corpus based Enrichment of GermaNet Verb Frames

Corpus-based Learning of Lexical Resources for German Named Entity Recognition

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Evaluation and Adaptation of a Specialised Language Checking Tool for Non-specialised Machine Translation and Non-expert MT Users for Multi-lingual Telecooperation

Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus

Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework

Exploiting Coreference Annotations for Text-to-Hypertext Conversion

How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words

Identifying Morphosyntactic Preferences in Collocations

Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project

Intelligent Building of Language Resources for HLT Applications

Linguistic Corpus Search

MAUS Goes Iterative

Metaphors in Wordnets: from Theory to Practice

Multilingual Corpus-based Approach to the Resolution of English -ing

N-Gram Language Modeling for Robust Multi-Lingual Document Classification

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Querying both time-aligned and hierarchical corpora with NXT Search

Resources and Techniques for Multilingual Information Extraction

Rethinking readability of digital editions - the case of the AAC’s "Digital Brenner"

SMOR: A German Computational Morphology Covering Derivation, Composition, and Inflection

Speech recognition simulation and its application for Wizard of Oz experiments

Steps towards Semantically Annotated Language Resources

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora

The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases

The DeepThought Core Architecture Framework

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

The MULI Project: Annotation and Analysis of Information Structure in German and English

The Statistical Analysis of Morphosyntactic Distributions

The TüBa-D/Z Treebank: Annotating German with a Context-Free Backbone

Tools for Upgrading Printed Dictionaries by Means of Corpus-based Lexical Acquisition

Towards a Dynamic Lexicon: Predicting the Syntactic Argument Structure of Complex Verbs

Unexpected Productions May Well be Errors

'You stupid tin box' - children interacting with the AIBO robot: A cross-linguistic emotional speech corpus

Pumping Documents Through a Domain and Genre Classification Pipeline

German (Deutsch)

Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain

Towards Ontology Engineering Based on Linguistic Analysis

Greek

A Bayesian Model for Shallow Syntactic Parsing of Natural Language Texts

A Methodology and Associated Tools for Building Interlingual Wordnets

Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval

Bypassing Greeklish!

Corpus Design, Recording and Phonetic Analysis of Greek Emotional Database 

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

Handling Subtle Sense Distinctions through Wordnet Semantic Types

Learning to predict Pitch Accents using Bayesian Belief Networks for Greek Language

Multi-lingual Evaluation of a Natural Language Generation System

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Reusing Language Resources for Speech Applications involving Emotion

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The COST278 pan-European Broadcast News Database

H

 

Hebrew

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Hindi

Automatic Generation of Compound Word Lexicon for Hindi Speech Synthesis

Automatic Language-Independent Induction of Gazetteer Lists

Collection of SLR in the Asian-Pacific area

Information Extraction from Hindi Texts

Hungarian

Combining symbolic and statistical methods in morphological analysis and unknown word guessing

Creating open language resources for Hungarian

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases

Tiered Tagging Revisited

I

 

Ibibio

WALA: a multilingual resource repository for West African Languages

Iko

WALA: a multilingual resource repository for West African Languages

Independent

Towards A Language Infrastructure for the Semantic Web

Indic Scripts

An XML Representation for Annotated Handwriting Datasets for Online Handwriting Recognition

Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts

Irish

Acquiring Reusable Multilingual Phonotactic Resources

Italian

A2Q: an agent-based architecure for multilingual Q&A

Automatisation Of The Activity Of Term Collection In Different Languages

BootCaT: Bootstrapping Corpora and Terms from the Web

Building a Large Grammar for Italian

Building a Maritime Domain Lexicon: a Few Considerations on the Database Structure and the Semantic Coding

Building Distributed Language Resources by Grid Computing

CHeM: A System for the Automatic Analysis of e-mails in the Restoration and Conservation Domain

Computational Lexicography and Carlo Emilio Gadda, Principe dell'Analisi e Duca della Buona Cognizione

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Cross-Language Acquisition of Semantic Models for Verbal Predicates

Discovery of (New) Knowledge and the Analysis of Text Corpora

Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”

How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words

Hybrid Constraints for Robust Parsing: First Experiments and Evaluation

Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project

Introducing the La Repubblica Corpus: A large Annotated TEI(XML)-Compliant Corpus of Newspaper Italian

Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects

MED-TYP: A Typological Database for Mediterranean Languages

Metaphors in Wordnets: from Theory to Practice

Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

NLP-enhanced Content Filtering within the POESIA Project

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

Proper Names and Polysemy: from a Lexicographic Experience

Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques

Semi-Automatic Derivation of a French Lexicon from CLIPS

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Term Translations in Parallel Corpora: Discovery and Consistency Check

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

Towards the MEANING Top Ontology: Sources of Ontological Meaning

Unifying Lexicons in View of a Phonological and Morphological Lexical DB

Using cooccurrence statistics and the web to discover synonyms in a technical language

Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-ORAL-ROM Italian

Using Semantic Language Resources to Support Textual Inference for Question Answering

J

 

Japanese

A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation

A Comparison of Two Variant Corpora: The Same Content with Different Sources

A Lexicon Module for a Grammar Development Environment

An Information Repository Model for Advanced Question Answering Systems

Automatic Extraction of Hyponyms from Japanese Newspapers Using Lexico-syntactic Patterns

Building a Paraphrase Corpus for Speech Translation

Classification of Japanese Spatial Nouns

Collecting Spontaneously Spoken Queries for Information Retrieval

Comparison of some automatic and manual methods for summary evaluation based on the Text Summarization Challenge 2

Concept-based queries: Combining and Reusing Linguistic Corpus Formats and Query Languages

Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach

Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora

Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources

Definition, dictionaries and tagger for Extended Named Entity Hierarchy

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Evaluating the FOKS Error Model

Extraction of Hyperonymy of Adjectives from Large Corpora by Using the Neural Network Model

How Does Automatic Machine Translation Evaluation Correlate With Human Scoring as the Number of Reference Translations Increases?

Incremental Methods to Select Test Sentences for Evaluating Translation Ability

Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy

Making an XML-based Japanese-Slovene Learners' Dictionary

Multilingual Corpus-based Approach to the Resolution of English -ing

Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification

Phrase-Based Dependency Evaluation of a Japanese Parser

Related Word-pairs Extraction without Dictionaries

Semi-supervised learning by Fuzzy clustering and Ensemble learning

Speech & Expression - The Value of a Longitudinal Corpus

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Terminal Device Oriented Comparable Corpora and its Alignment -- Towards Extracting Paraphrasing Patterns --

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop

Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames

Collection of SLR in the Asian-Pacific area

K

 

Korean

A Comparison of Two Variant Corpora: The Same Content with Different Sources

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

Collection of SLR in the Asian-Pacific area

Creation and Assessment of Korean Speech and Noise DB in Car Environment

Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy

Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers

Sejong Korean Corpora in the Making

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop

L

 

Language independent

A Global Data Category Registry for Interoperable Language Resources

Data Driven Ontology Evaluation

Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework

Infrastructure for Collaborative Annotation of Speech

Online Evaluation of Coreference Resolution

Principles of a system for terminological concept modelling

A Graphical Tool for Handling Rule Grammars in Java Speech Grammar Format

A Search Tool for Corpora with Positional Tagsets and Ambiguities

Highlighting latent structure in documents

Linguistic Corpus Search

Pumping Documents Through a Domain and Genre Classification Pipeline

Standardization in Multimodal Content Representation: Some Methodological Issues

SVMTool: A general POS tagger generator based on Support Vector Machines

Towards an International Standard on Feature Structure Representation

Language-independent (multilingual interface)

An Environment for Dialogue Corpora Collection (ENDIACC)

Latvian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Lithuanian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

M

 

Maltese

MED-TYP: A Typological Database for Mediterranean Languages

Mambila

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Mandarin

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

Collection of SLR in the Asian-Pacific area

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

Many

Current Projects in Languages of Military Interest at the Defense Language Institute

Maori

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Mapudungun

Data Collection and Analysis of Mapudungun Morphology for Spelling Correction

Mexican Spanish

VOXMEX Speech Database: Design of a Phonetically Balanced Corpus

Modern Greek

MED-TYP: A Typological Database for Mediterranean Languages

Modern Greek in a multilingual context

Creating multi-purpose linguistic resources for Modern Greek: a deep Modern Greek Grammar

Modern Hebrew

MED-TYP: A Typological Database for Mediterranean Languages

Modern Standard Arabic

MED-TYP: A Typological Database for Mediterranean Languages

Moroccan Arabic

An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

Multilingual

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

Rethinking Reusable Resources

Semi-automatic UNL Dictionary Generation using WordNet.PT

Multilingual approach

Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration

Intelligent Building of Language Resources for HLT Applications

Multiple

NIST Language Technology Evaluation Cookbook

SLR Validation: Current Trends and Developments

N

 

Nahuatl

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Norwegian

A Lexicon Module for a Grammar Development Environment

Memory-based Classification of Proper Names in Norwegian

O

 

Old-Church Slavonic

Towards Intelligent Written Cultural Heritage Processing - Lexical Processing

OWL

Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms

P

 

Persian

Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients

Polish

A Search Tool for Corpora with Positional Tagsets and Ambiguities

Extraction of Polish Named-Entities

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Portuguese

A Multilingual Database of Idioms

An Efficient Word Confidence Measure Using Likelihood Ratio Scores

Design and Implementation of a Semantic Search Engine for Portuguese

Evaluating Solutions for the Rapid Development of State-of-the-Art POS taggers for Portuguese

Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”

Extending WordNets to Implicit Information

INQUER: A WordNet-based Question-Answering Application

Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects

Multifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications

On the problems of creating a golden standard of inflected forms in Portuguese

Portuguese Large-scale Language Resources for NLP Applications

Providing on-line access to Portuguese language resources: corpora and lexicons

SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

The Corpógrafo – a Web-based environment for corpora research

The COST278 pan-European Broadcast News Database

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Lácio-Web: Corpora and Tools to advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools

The Verb in the Terminological Collocations. Contribution to the Development of a Morphological Analyser. MorphoComp

What is my Style? Using Stylistic Features of Portuguese Web Texts to classify Web pages according to Users' Needs

Portuguese (European)

An Acoustic Corpus Contemplating Regional Variation for Studies of European Portuguese Nasals

Potentially all

Using the Penn Treebank to Evaluate Non-Treebank Parsers

Provençal

MED-TYP: A Typological Database for Mediterranean Languages

Q

 

Q’anjob’al (Mayan Guatemala)

Applying Computational Linguistic Techniques in a Documentary Project for Q’anjob’al (Mayan Guatemala)

Quechua

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

R

 

RDF(S)

Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms

Resian

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Romanian

A Methodology and Associated Tools for Building Interlingual Wordnets

A word alignment system based on a translation equivalence extractor

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Tiered Tagging Revisited

Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet

Russian

A Flexible Language Acquisition Tool Kit for Natural Language Processing

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Creation of reusable components and language resources for Named Entity Recognition in Russian

Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing

Development of Ontologies with Minimal Set of Conceptual Relations

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Integration of Russian Language Resources

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Russian Information Retrieval Evaluation Seminar

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

Towards basic categories for describing properties of texts in a corpus

Word Association Norms as a Unique Supplement of Traditional Language Resources

S

 

Sardinian

MED-TYP: A Typological Database for Mediterranean Languages

Scottish

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Serbian

A Methodology and Associated Tools for Building Interlingual Wordnets

Combining Heterogeneous Lexical Resources

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

Serbo-Croatian

MED-TYP: A Typological Database for Mediterranean Languages

Slovak

The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases

Slovakian

The COST278 pan-European Broadcast News Database

Slovene

Making an XML-based Japanese-Slovene Learners' Dictionary

MED-TYP: A Typological Database for Mediterranean Languages

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

Tiered Tagging Revisited

Slovenian

A data-driven adaptation of prosody in a multilingual TTS

Acquisition and Annotation of Slovenian Broadcast News Database

Creating Slovenian Language Resources for Development of Speech-to-Speech Translation Components

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer

Development of Slovenian Broadcast News Speech Database

The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases

The COST278 pan-European Broadcast News Database

Sotho

The African Speech Technology Project: An Assessment

South African English

The African Speech Technology Project: An Assessment

Spanish

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

ALLES: Integrating NLP in ICALL Applications

Application of the BLEU Method for Evaluating Free-text Answers in an E-learning Environment

Automatically selecting domain markers for terminology extraction

AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition

Bilingual Connections for Trilingual Corpora: An XML Approach

Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Cross-effective cross-lingual document classification

Cross-Language Acquisition of Semantic Models for Verbal Predicates

Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer

Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish

Enriching EWN with Syntagmatic Information by means of WSD

Enriching the Spanish EuroWordNet by Collocations

EuroWordNet as a Resource for Cross-language Information Retrieval

Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”

FreeLing: An Open-Source Suite of Language Analyzers

Intelligent Building of Language Resources for HLT Applications

Lexical Entry Templates for Robust Deep Parsing

Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects

MED-TYP: A Typological Database for Mediterranean Languages

Mercedes, A Term-In-Context Highlighter

Methodology for Rapid Prototyping and Testing of ASR Based User Interfaces

MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora

Multilingual Corpus-based Approach to the Resolution of English -ing

Multiple Sequence Alignment for characterizing the linear structure of revision

NLP-enhanced Content Filtering within the POESIA Project

NLP-enhanced error Checking for Catalan unrestricted text

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed

Selecting the Correct English Synset for a Spanish Sense

Semantic categorization of Spanish se-constructions

Spanish WordNet 1.6: Porting the Spanish WordNet Across Princeton Versions

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases

The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources

The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities

The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

The SPARTACUS-Database: a Spanish Sentence Database for Offline Handwriting Recognition

The Translation Correction Tool: English-Spanish User Studies

Towards the MEANING Top Ontology: Sources of Ontological Meaning

Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

Training a Sentence-Level Machine Translation Confidence Measure

Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News

Translation memories enrichment by statistical bilingual segmentation

Spanish (Latin American)

Developing Language Resources for a Transnational Digital Government System

Swahili

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Swedish

A pattern extraction workbench combining multiple linguistic levels

Finding the Correct Interpretation of Swedish Compounds a Statistical Approach

MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain

Open Resources for Language Technology

Probabilistic Detection of Context-Sensitive Spelling Errors

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

T

 

Tamil

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Thai

Collection of SLR in the Asian-Pacific area

Enriching a Thai Lexical Database with Selectional Preferences

Open Collaborative Development of the Thai Language Resources for Natural Language Processing

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Tibetan

A Syntactically Annotated Corpus of Tibetan

Turkish

A Methodology and Associated Tools for Building Interlingual Wordnets

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Development of a Corpus Workbench for the METU Turkish Corpus

Duration Modeling for Turkish Text-to-Speech Synthesis System

Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

MED-TYP: A Typological Database for Mediterranean Languages

OrienTel - Telephony Databases Across Northern Africa and the Middle East

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Tzeltal

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

Tzotzil

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

U

 

Universal

A Large Metadata Domain of Language Resources

Architecture for Distributed Language Resource Management and Archiving

Cross-Disciplinary Integration of Metadata Descriptions

Design of an Interactive Web-based User Interface for Speech Database Query Formation

US-English

Bilingual Connections for Trilingual Corpora: An XML Approach

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

V

 

Various Native American

An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

Vietnamese

Developping tools and building linguistic resources for Vietnamese morpho-syntactic processing

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Spoken and Written Language Resources for Vietnamese

W

 

Warao

A Multi-Modal Documentation System for Warao

Warlpiri

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

X

 

Xhosa

The African Speech Technology Project: An Assessment

Z

 

Zeltal

Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

Zulu

Software Tools for Morphological Tagging of Zulu Corpora and Lexicon Development

The African Speech Technology Project: An Assessment