LREC 2000 - Abstracts

LREC 2000 2nd International Conference on Language Resources & Evaluation

Conference Papers and Abstracts

Paper	Paper Title	Abstract

116	Galaxy-II as an Architecture for Spoken Dialogue Evaluation	The GALAXY-II architecture, composed of a centralized hub mediating the interaction among a suite of human language technology servers, provides both a useful tool for implementing systems and a streamlined way of configuring the evaluation of these systems. In this paper, we discuss our ongoing efforts in the evaluation of spoken dialogue systems, with particular attention to the way in which the architecture facilitates the development of a variety of evaluation configurations. We furthermore propose two new metrics for automatic evaluation of the discourse and dialogue components of a spoken dialogue system, which we call “user frustration” and “information bit rate.”

279	GeDeriF: Automatic Generation and Analysis of Morphologically Constructed Lexical Resources	One of the most frequent problems in text retrieval stems from the large number of words encountered that are not listed in general-language dictionaries. However, these words are very often morphologically complex, and as such have a meaning that is predictable from their structure. Furthermore, such words typically belong to specialized language uses (e.g. scientific, philosophical or media technolects). Consequently, tools for listing and analysing such words can help enrich a terminological database. The purpose of this paper is to present a system that automatically generates morphologically complex French lexical items that are not listed in dictionaries and also provides a structural and semantic analysis of these items. The output of this system is a morphological database (currently in progress) that forms a powerful lexical resource. It will be very useful in Natural Language Processing (NLP) and Information Retrieval (IR) applications. Indeed, the system generates a potentially infinite set of complex (derived) lexical units (henceforth CLUs) automatically associated with a rich array of morpho-semantic features, and is thus capable of dealing with morphologically complex structures that are unlisted in dictionaries.

164	Grammarless Bracketing in an Aligned Bilingual Corpus	We propose a simple grammarless procedure to extract phrasal examples from aligned parallel texts. It is based on the difference in word sequence between the two languages.

7	GREEK ToBI: A System for the Annotation of Greek Speech Corpora	Greek ToBI is a system for the annotation of (Standard) Greek spoken corpora that encodes intonational, prosodic and phonetic information. It is used to develop a large and publicly available database of prosodically annotated utterances for research, engineering and educational purposes. Greek ToBI is based on the system developed for American English (ToBI), but includes novel features (“tiers”) designed to address particularities of Greek prosody that merit annotation, such as stress and juncture. Thus Greek ToBI includes five tiers: the Tone Tier shows the intonational analysis of the utterance; the Prosodic Words Tier is a phonetic transcription; the Break Index Tier shows indices of cohesion; the Words Tier gives the text in romanization; the Miscellaneous Tier is used to encode other relevant information (e.g., disfluency or pitch-halving). The development of GRToBI is largely based on the transcription and analysis of a corpus of spoken Greek that includes data from several speakers and speech styles, but it also draws on existing quantitative research on Greek prosody.

289	GRUHD: A Greek database of Unconstrained Handwriting	In this paper we present the GRUHD database of Greek characters, text, digits, and other symbols in unconstrained handwriting mode. The database consists of 1,760 forms that contain 667,583 handwritten symbols and 102,692 words in total, written by 1,000 writers, 500 men and an equal number of women. Special attention was paid to gathering data from writers of different ages and educational levels. The GRUHD database is accompanied by the GRUHD software, which facilitates its installation and use and enables the user to extract and process the data from the forms selectively, depending on the application. The various types of possible installations make it appropriate for the training and validation of character recognition, character segmentation and text-dependent writer identification systems.

77	Guidelines for Japanese Speech Synthesizer Evaluation	Speech synthesis technology is one of the most important elements required for better human interfaces for communication and information systems. This paper describes the “Guidelines for Speech Synthesis System Performance Evaluation Methods” created by the Speech Input/Output Systems Expert Committee of the Japan Electronic Industry Development Association (JEIDA). JEIDA has been investigating speech synthesizer evaluation methods since 1993 and previously reported the provisional version of the guidelines. The guidelines comprise six chapters: General rules, Text analysis evaluation, Syllable articulation test, Word intelligibility test, Sentence intelligibility test, and Overall quality evaluation.
