LREC 2000: 2nd International Conference on Language Resources & Evaluation
Conference Papers and Abstracts
Evaluation of a Dialogue System Based on a Generic Model that Combines Robust Speech Understanding and Mixed-initiative Control | This paper presents a generic model to combine robust speech understanding and mixed-initiative dialogue control in spoken dialogue systems. It relies on the use of semantic frames to conceptually store user interactions, a frame-unification procedure to deal with partial information, and a stack structure to handle initiative control. This model has been successfully applied in a dialogue system being developed at our lab, named SAPLEN, which aims to deal with the telephone-based product orders and queries of fast food restaurants’ clients. In this paper we present the dialogue system and describe the new model, together with the results of a preliminary evaluation of the system concerning recognition time, word accuracy, implicit recovery and speech understanding. Finally, we present the conclusions and indicate possibilities for future work.
MDWOZ: A Wizard of Oz Environment for Dialog Systems Development | This paper describes MDWOZ, a development environment for spoken dialog systems based on the Wizard of Oz technique, whose main goal is to facilitate the collection of data (speech signal and dialog-related information) and the building of interaction models. Both tasks can be quite difficult, and such an environment facilitates them considerably. Because MDWOZ was implemented in a modular way, parts of it can be reused in the final dialog system. The environment provides language-transparent facilities and accessible methods so that even non-computing specialists can participate in spoken dialog system development. The main features of the environment are presented, together with some test experiments.
A Web-based Text Corpora Development System | One of the most important starting points for any NLP endeavor is the construction of text corpora of appropriate size and quality. This paper presents a web-based text corpora development system which addresses both the size and the quality of these corpora. The quantitative problem is solved by using the Internet as a practically limitless source of texts. To ensure quality, we enrich the texts with relevant information that makes them fit for further use, treating in an integrated manner the problems of morpho-syntactic annotation, lexical ambiguity resolution, and diacritic character restoration. Although at the moment it is targeted at texts in Romanian, the system can be adapted to other languages, provided that appropriate auxiliary resources are available.
Term-based Identification of Sentences for Text Summarisation | The present paper describes a methodology for automatic text summarisation of Greek texts which combines terminology extraction and sentence spotting. Since generating abstracts has proven a hard NLP task of questionable effectiveness, the paper focuses on the production of a special kind of abstract, called an extract: a set of sentences taken from the original text. These sentences are selected on the basis of the amount of information they carry about the subject content. The proposed corpus-based, statistical approach exploits several heuristics to determine the summary-worthiness of sentences. It uses statistical occurrences of terms (the TF·IDF formula) and several cue phrases to calculate sentence weights, and then extracts the top-scoring sentences, which form the extract.
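As an aside for readers unfamiliar with the technique: the sentence weighting this abstract refers to can be sketched in a few lines. This is a minimal illustration of TF·IDF-based extraction only; the paper's Greek-specific term extraction and cue-phrase weights are omitted, and all names are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def build_idf(sentences):
    """Inverse document frequency, treating each sentence as a 'document'."""
    n = len(sentences)
    df = Counter()
    for sent in sentences:
        for term in set(sent.lower().split()):
            df[term] += 1
    return {term: math.log(n / count) for term, count in df.items()}

def extract_summary(sentences, ratio=0.2):
    """Weight each sentence by the summed TF.IDF of its terms and
    return the top-scoring sentences in their original order."""
    idf = build_idf(sentences)
    scored = []
    for index, sent in enumerate(sentences):
        tf = Counter(sent.lower().split())
        scored.append((sum(freq * idf[term] for term, freq in tf.items()), index))
    k = max(1, int(len(sentences) * ratio))
    keep = sorted(index for _, index in sorted(scored, reverse=True)[:k])
    return [sentences[i] for i in keep]
```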
Morphemic Analysis and Morphological Tagging of Latvian Corpus | The Latvian Corpus contains approximately 8 million running words, an initial size for investigations based on a national corpus. The corpus contains different kinds of texts: modern written Latvian, various newspapers, Latvian classical literature, the Bible, Latvian Folk Beliefs, Latvian Folk Songs, Latvian Fairy-tales and others. The methodology and the software for SGML tagging were developed by the Artificial Intelligence Laboratory; approximately 3 million running words are marked up in SGML. The first step was to develop morphemic analysis, in co-operation with Dr. B. Kangere of Stockholm University. The first morphological analyzer was developed in 1994 at the Artificial Intelligence Laboratory and has its own tag system. Later, the tags for the morphological analyzer were elaborated according to the MULTEXT-East recommendations. The Latvian morphological system is rather complicated, and the recognition of words and word forms is difficult because Latvian has many homonymous forms. The first corpus of morphologically analysed texts was marked up manually; in total it covers approximately 10,000 words of modern written Latvian. The results of this work will be used in further investigations.
Textual Information Retrieval Systems Test: The Point of View of an Organizer and Corpuses Provider | Amaryllis is an evaluation programme for text retrieval systems which has been carried out as two test campaigns. The second Amaryllis campaign took place in 1998/1999. Corpora of documents, topics, and the corresponding responses were first sent to each of the participating teams for system learning purposes. Corpora of new documents and a set of new topics were then supplied for evaluation purposes. Two optional tracks were added: an Internet track and an interlingual track. The first contained a test via the Internet: INIST sent topics to the systems and collected responses directly, thus reducing the need for manipulation by the system designers. The second contained tests on different European Community language pairs; the corpora of documents consisted of records of questions and answers from the European Commission, in parallel official language versions, and participants could use any language pair for their tests. The aim of this paper is to give the point of view of an organizer and corpus provider (INIST) on the organization of an operation of this sort. In particular, it describes the difficulties encountered during the tests (corpus construction, translation of topics, and system evaluation), and suggests avenues to explore for future tests.
The Spoken Dutch Corpus. Overview and First Evaluation | In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of computational linguistics and language and speech technology. The paper first gives an overall description of the project, its aims, structure and organization. It then goes on to discuss the considerations - both methodological and practical - that have played a role in the design of the corpus as well as in its compilation and annotation. The paper concludes with an account of the data available in the first release of the first part of the corpus, which came out on March 1st, 2000.
A Strategy for the Syntactic Parsing of Corpora: from Constraint Grammar Output to Unification-based Processing | This paper presents a strategy for syntactic analysis based on the combination of two different parsing techniques: lexical syntactic tagging and phrase structure syntactic parsing. The basic proposal is to take advantage of the good results of lexical syntactic tagging to improve the overall performance of unification-based parsing. The syntactic functions attached to every word by the lexical syntactic tagging are used as head features in the unification-based grammar, and form the basis for the grammar rules.
Producing LRs in Parallel with Lexicographic Description: the DCC project | This paper is a brief presentation of some aspects of the most important lexicographical project being carried out in Catalonia: the DCC (Dictionary of Contemporary Catalan) project. After a general description of the aims of the project, the specific goal of my contribution is to present the general strategy of our lexicographical description, which consists in producing an electronic dictionary that can serve as the common repository from which we will obtain different derived products (among them, the dictionary for human users). My concern is to show to what extent human and computer lexicography can share descriptions, and to what extent the results of lexicographic work can be taken as a language resource in this new perspective. I present different aspects and criteria of our dictionary, taking the different layers (morphology, syntax, semantics) as a guideline.
A Novelty-based Evaluation Method for Information Retrieval | In information retrieval research, precision and recall have long been used to evaluate IR systems. However, given that a number of retrieval systems resembling one another are already available to the public, it is valuable to retrieve novel relevant documents, i.e., documents that cannot be retrieved by those existing systems. In view of this problem, we propose an evaluation method that favors systems retrieving as many novel documents as possible. We also used our method to evaluate systems that participated in the IREX workshop.
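One plausible way to operationalise the novelty criterion described above is to credit a system only for relevant documents that none of the existing baseline systems retrieve. The sketch below is an assumption about the general idea, not the authors' exact formula; all names are illustrative.

```python
def novelty_precision(retrieved, relevant, baseline_runs):
    """Fraction of retrieved documents that are relevant AND novel,
    i.e. not retrieved by any existing baseline system.
    `retrieved` is a ranked list of doc ids; `relevant` a set of ids;
    `baseline_runs` a list of sets of ids retrieved by existing systems."""
    seen = set().union(*baseline_runs) if baseline_runs else set()
    novel_hits = [d for d in retrieved if d in relevant and d not in seen]
    return len(novel_hits) / len(retrieved) if retrieved else 0.0

def novelty_recall(retrieved, relevant, baseline_runs):
    """Fraction of the novel relevant documents that the system found."""
    seen = set().union(*baseline_runs) if baseline_runs else set()
    novel_relevant = relevant - seen
    found = sum(1 for d in retrieved if d in novel_relevant)
    return found / len(novel_relevant) if novel_relevant else 0.0
```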
Towards More Comprehensive Evaluation in Anaphora Resolution | The paper presents a package of evaluation tasks for anaphora resolution. We argue that these newly added tasks, which have been carried out on Mitkov's (1998) knowledge-poor, robust approach, provide a better picture of the performance of an anaphora resolution system. The paper also outlines future work on the development of a 'consistent' evaluation environment for anaphora resolution.
Galaxy-II as an Architecture for Spoken Dialogue Evaluation | The GALAXY-II architecture, comprising a centralized hub that mediates the interaction among a suite of human language technology servers, provides both a useful tool for implementing systems and a streamlined way of configuring the evaluation of these systems. In this paper, we discuss our ongoing efforts in the evaluation of spoken dialogue systems, with particular attention to the way in which the architecture facilitates the development of a variety of evaluation configurations. We furthermore propose two new metrics for automatic evaluation of the discourse and dialogue components of a spoken dialogue system, which we call “user frustration” and “information bit rate.”
Building the Croatian-English Parallel Corpus | The contribution gives a survey of the procedures and formats used in building the Croatian-English parallel corpus being collected at the Institute of Linguistics of the Philosophical Faculty, University of Zagreb. The primary text source is the newspaper Croatia Weekly, which has been published since the beginning of 1998 by HIKZ (Croatian Institute for Information and Culture). After a quick survey of existing English-Croatian parallel corpora, the article covers the procedures involved in text conversion and text encoding, particularly alignment. Several recent suggestions for alignment encoding are elaborated. Preliminary statistics on the numbers of S and W elements in each language are given at the end of the article.
Lexical and Translation Equivalence in Parallel Corpora | In the present paper we investigate to what extent the use of parallel corpora can help to eliminate some of the difficulties noted with bilingual dictionaries. The particular issues addressed are the bidirectionality of translation equivalence, the coverage of multiword units, and the amount of implicit knowledge presupposed on the part of the user in interpreting the data. Three lexical items belonging to different word classes were chosen for analysis: the noun head, the verb give and the preposition with. George Orwell's novel 1984, which is available in English-Hungarian sentence-aligned form, was used as source material. It is argued that the analysis of translation equivalents displayed in sets of concordances with aligned sentences in the target language holds important implications for bilingual lexicography and automatic word alignment methodology.
Towards a Standard for Meta-descriptions of Language Resources | The aim is to improve the availability of Language Resources (LRs) on intranets and the Internet. It is suggested that this can be achieved by creating a browsable and searchable universe of meta-descriptions. This calls for the development of a standard for tagging LRs with meta-data, and for several conventions to be agreed within the community.
Object-oriented Access to the Estonian Phonetic Database | The paper introduces the Estonian Phonetic Database developed at the Laboratory of Phonetics and Speech Technology of the Institute of Cybernetics at the Tallinn Technical University, and its integration into QuickSig – an object-oriented speech processing environment developed at the Acoustics Laboratory of the Helsinki University of Technology. Methods of database access are discussed, relations between different speech units – sentences, words, phonemes – are defined, examples of predicate functions are given to perform searches for different contexts, and the advantage of an object-oriented paradigm is demonstrated. The introduced approach has proven to provide a flexible research environment, allowing studies to be performed more efficiently.
ItalWordNet: a Large Semantic Database for Italian | The focus of this paper is on the work we are carrying out to develop a large semantic database within an Italian national project, SI-TAL, which aims at realizing a set of integrated (compatible) resources and tools for the automatic processing of the Italian language. Within SI-TAL, ItalWordNet is the reference lexical resource, which will contain information related to about 130,000 word senses grouped into synsets. This lexical database is not being created ex novo, but by extending and revising the Italian lexical wordnet built in the framework of the EuroWordNet project. In this paper we first describe how the lexical coverage of our wordnet is being extended by adding adjectives, adverbs and proper nouns, plus a terminological subset belonging to the economic and financial domain. The relevant changes these extensions entail, both in the linguistic model and in the data structure, are then illustrated. In particular we discuss i) the new semantic relations identified to encode information on adjectives and adverbs, and ii) the new architecture including the terminological subset.
FAST - Towards a Semi-automatic Annotation of Corpora | As the use of annotated corpora in natural language processing applications increases, flexible annotation tools become a necessity: tools that not only support manual annotation, but also enable post-editing of a text which has already been automatically annotated by a separate processing tool, and even interaction with that tool during the annotation process. In practice, we have been confronted with the problem of converting the output of different tools to SGML format while preserving the previous annotation, as well as with the difficulty of manually post-editing an annotated text. We realised that designing an interface between an annotation tool and any automatic tool would not only provide an easy way of taking advantage of automatic annotation, but would also allow easier interactive manual editing of the results. FAST was designed as a manual tagger that can also be used in conjunction with automatic tools to speed up human annotation.
Coreference Resolution Evaluation Based on Descriptive Specificity | This paper introduces a new evaluation method for the coreference resolution task. Considering that coreference resolution is a matter of linking expressions to discourse referents, we set our evaluation criterion in terms of an evaluation of the denotations assigned to the expressions. This criterion requires that the coreference chains identified in one annotation stand in a one-to-one correspondence with the coreference chains in the other. To determine this correspondence, and with a view to keeping closer to what a human interpretation of the coreference chains would be, we take into account the fact that, in a coreference chain, some expressions are more specific to their referent than others. With this observation in mind, we measure the similarity between the chains in one annotation and the chains in the other, and then compute the optimal similarity between the two annotations. Evaluation then consists in checking whether the denotations assigned to the expressions are correct or not. New measures to analyse errors are also introduced. A comparison with other methods is given at the end of the paper.
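The one-to-one correspondence between key and response chains can be computed as an optimal assignment problem. The sketch below uses plain Jaccard overlap between mention sets rather than the specificity-weighted similarity the paper proposes, so it illustrates only the matching step; all names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_chain_alignment(key_chains, response_chains):
    """Find the one-to-one chain correspondence that maximises total
    chain-to-chain similarity. Chains are sets of mention ids;
    similarity here is plain Jaccard (the paper instead weights
    mentions by their descriptive specificity)."""
    sim = np.zeros((len(key_chains), len(response_chains)))
    for i, key in enumerate(key_chains):
        for j, resp in enumerate(response_chains):
            sim[i, j] = len(key & resp) / len(key | resp)
    rows, cols = linear_sum_assignment(-sim)  # negate to maximise
    return sim[rows, cols].sum(), list(zip(rows.tolist(), cols.tolist()))
```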
A Text->Meaning->Text Dictionary and Process | In this article we deal with various applications of a multilingual semantic network named the Integral Dictionary. We review different commercial applications that use semantic networks, and show the corresponding results with the Integral Dictionary. The details of the semantic calculations are not given here, but we show that, contrary to the WordNet semantic net, the Integral Dictionary provides most of the data and relations needed for these calculations. The article presents results and discussion on lexical expansion, lexical reduction, WSD, query expansion, lexical translation extraction, document summarisation, e-mail sorting, catalogue access and information retrieval. We conclude that a resource like the Integral Dictionary can be a good next step for all those who have tried to compute semantics with WordNet, and that the complementarity between the two dictionaries could be seriously studied in a shared project.
A French Phonetic Lexicon with Variants for Speech and Language Processing | This paper reports on a project aiming at the semi-automatic development of a large orthographic-phonetic lexicon for French, based on the Multext dictionary. It details the various stages of the project, with an emphasis on the methodological and design aspects. Information regarding the lexicon’s content is also given, together with a description of interface tools which should facilitate its exploitation.
Annotating Communication Problems Using the MATE Workbench | The increasing commercialisation and sophistication of language engineering products reinforces the need for tools and standards in support of a more cost-effective development and evaluation process than has been possible so far. This paper presents results of the MATE project, which was launched in response to the need for standards and tools in support of creating, annotating, evaluating and exploiting spoken language resources. Focusing on the MATE workbench, we illustrate its functionality and usability through its use for markup of communication problems.
A Methodology for Evaluating Spoken Language Dialogue Systems and Their Components | As spoken language dialogue systems (SLDSs) proliferate in the marketplace, the issue of SLDS evaluation has come to attract wide interest from research and industry alike. Yet it is only recently that spoken dialogue engineering researchers have come to face SLDS evaluation in its full complexity. This paper presents results of the European DISC project concerning technical evaluation and usability evaluation of SLDSs and their components. The paper presents a methodology for complete and correct evaluation of SLDSs and components, together with a generic evaluation template for describing the evaluation criteria needed.
Evaluating Translation Quality as Input to Product Development | In this paper we present a corpus-based method to evaluate the translation quality of machine translation (MT) systems. We start with a shallow analysis of a large corpus and gradually focus the attention on the translation problems. The method constitutes an efficient way to identify the most important grammatical and lexical weaknesses of an MT system and to guide development towards improved translation quality. The evaluation described in the paper was carried out as a cooperation between an MT technology developer, Sail Labs, and the Computational Linguistics group at the University of Zurich.
Evaluation of Word Alignment Systems | Recent years have seen a few serious attempts to develop methods and measures for the evaluation of word alignment systems, notably the Blinker project (Melamed, 1998) and the ARCADE project (Veronis and Langlais, forthcoming). In this paper we discuss different approaches to the problem and report on results from a project where two word alignment systems have been evaluated. These results include methods and tools for the generation of reference data and a set of measures for system performance. We note that the selection and sampling of reference data can have a great impact on scoring results.
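For concreteness, the core measures in this kind of evaluation are precision and recall of proposed word links against a hand-made reference alignment. A minimal sketch follows; the cited projects also distinguish degrees of link confidence, which is omitted here, and all names are illustrative.

```python
def alignment_scores(system_links, reference_links):
    """Precision, recall and F1 of a word alignment, with links
    represented as (source_position, target_position) pairs."""
    system, reference = set(system_links), set(reference_links)
    correct = system & reference
    precision = len(correct) / len(system) if system else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```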
How To Evaluate and Compare Tagsets? A Proposal | We propose a methodology for evaluating the distributional qualities of a tagset and for comparing tagsets with one another. Evaluating a tagset is crucial, since tagging is often one of the first tasks in language processing: its aim is to summarise linguistic information as well as possible for further processing such as syntactic parsing. The idea is to consider these further steps in order to evaluate a given tagset, and thus to measure the pertinence of the information the tagset provides for these steps. For this purpose, a machine learning system, ALLiS, is used, whose goal is to learn phrase structures from bracketed corpora and to generate a formal grammar describing these structures. ALLiS learning is based on the detection of structural regularities. In this way, non-distributional behaviours of the tagset can be pointed out, and thus some of its weaknesses or inadequacies.
Determining the Tolerance of Text-handling Tasks for MT Output | With the explosion of the internet and access to increased amounts of information provided by international media, the need to process this abundance of information in an efficient and effective manner has become critical. The importance of machine translation (MT) in the stream of information processing has become apparent. With this new demand on the user community comes the need to assess an MT system before adding such a system to the user’s current suite of text-handling applications. The MT Functional Proficiency Scale project has developed a method for ranking the tolerance of a variety of information processing tasks to possibly poor MT output. This ranking allows for the prediction of an MT system’s usefulness for particular text-handling tasks.
A Parallel Corpus of Italian/German Legal Texts | This paper presents the creation of a parallel corpus of Italian and German legal documents which are translations of one another. The corpus, which contains approximately 5 million words, is primarily intended as a resource for (semi-)automatic terminology acquisition. The guidelines of the Corpus Encoding Standard have been applied for encoding structural information, segmentation information, and sentence alignment. Since the parallel texts have a one-to-one correspondence on the sentence level, building a perfect sentence alignment is rather straightforward. As a result, the corpus also constitutes a valuable testbed for the evaluation of alignment algorithms. The paper discusses the intended use of the corpus, the various phases of corpus compilation, and basic statistics.
Integrating Seed Names and ngrams for a Named Entity List and Classifier | We present a method for building a named-entity list and a machine-learned named-entity classifier from a corpus of Dutch newspaper text, a rule-based named entity recognizer, and labeled seed name lists taken from the internet. The seed names, labeled as PERSON, LOCATION, ORGANIZATION, or ADJECTIVAL name, are looked up in an 83-million-word corpus, and their immediate contexts are stored as 8-grams that serve as instances of the seed's label. These 8-grams are used by a memory-based machine learning algorithm that, after training, (i) can produce high-precision labeling of instances to be added to the seed lists, and (ii) more generally labels new, unseen names. Unlabeled named-entity types are labeled with a precision of 61% and a recall of 56%. On free text, named-entity token labeling accuracy is 71%.
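In the spirit of the memory-based step described above, a toy nearest-neighbour classifier over fixed-width name contexts might look as follows. The paper uses an existing memory-based learner; this sketch, with its simple positional overlap metric, is only an assumed approximation of that idea. Contexts would be the tokens around a name occurrence, e.g. a few tokens to each side for the 8-grams mentioned above.

```python
from collections import Counter

class MemoryBasedNameClassifier:
    """Toy IB1-style classifier: store labeled context n-grams around
    seed names, then label a new name by majority vote among the
    stored contexts with the highest positional overlap."""
    def __init__(self):
        self.instances = []  # (context_tuple, label)

    def train(self, contexts, labels):
        self.instances = list(zip(contexts, labels))

    def classify(self, context):
        def overlap(a, b):
            return sum(x == y for x, y in zip(a, b))
        best = max(overlap(ctx, context) for ctx, _ in self.instances)
        votes = Counter(label for ctx, label in self.instances
                        if overlap(ctx, context) == best)
        return votes.most_common(1)[0][0]
```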
Automatic Expansion of Thesaurus Entries with a Different Thesaurus | We propose a method for expanding the entries in a thesaurus using a different thesaurus constructed around another concept system. The method constructs a mapping table between the concept codes of the two thesauri; almost all entries of the latter thesaurus can then be assigned the concept codes of the former via this mapping table. To confirm whether the method is effective, we construct a mapping table between the ''Kadokawa-shin-ruigo'' thesaurus (hereafter, ''ShinRuigo'') and ''Nihongo-goitaikei'' (hereafter, ''Goitaikei''), and assign about 350 thousand entries with the mapping table. About 10% of the entries cannot be assigned automatically. It is shown that this method can save cost in expanding a thesaurus.
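The mapping-table construction can be pictured as follows: words listed in both thesauri vote for code-to-code correspondences, and each code of the second thesaurus is mapped to the code of the first that it co-occurs with most often. This is a minimal sketch of that idea under assumed data structures, not the authors' exact procedure.

```python
from collections import Counter, defaultdict

def build_code_mapping(thesaurus_a, thesaurus_b):
    """Each thesaurus maps a word to a list of concept codes.
    Words present in both induce a code-to-code co-occurrence table;
    each B code is mapped to its most frequent A counterpart."""
    cooc = defaultdict(Counter)
    for word, codes_b in thesaurus_b.items():
        for code_a in thesaurus_a.get(word, []):
            for code_b in codes_b:
                cooc[code_b][code_a] += 1
    return {code_b: counts.most_common(1)[0][0]
            for code_b, counts in cooc.items()}

def assign_new_entries(thesaurus_a, thesaurus_b):
    """Assign A codes to entries that only the B thesaurus contains."""
    mapping = build_code_mapping(thesaurus_a, thesaurus_b)
    return {word: [mapping[c] for c in codes if c in mapping]
            for word, codes in thesaurus_b.items()
            if word not in thesaurus_a}
```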
Learning Verb Subcategorization from Corpora: Counting Frame Subsets | We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents of a verb in the Czech treebank as either arguments or adjuncts. Using our techniques, we are able to achieve 88% accuracy on unseen parsed text.
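The abstract does not name the three statistical filters compared. A standard baseline for deciding whether an observed verb-frame co-occurrence is genuine is the binomial hypothesis test of Brent (1993), sketched here under assumed parameter names; it may or may not coincide with any of the paper's three techniques.

```python
from math import comb

def keep_frame(frame_count, verb_count, error_rate, alpha=0.05):
    """Binomial hypothesis test: keep a (verb, frame) pair if observing
    `frame_count` or more co-occurrences among `verb_count` verb tokens
    would be unlikely (p < alpha) under a noise-only model in which each
    token shows the frame spuriously with probability `error_rate`."""
    p_value = sum(comb(verb_count, k)
                  * error_rate ** k * (1 - error_rate) ** (verb_count - k)
                  for k in range(frame_count, verb_count + 1))
    return p_value < alpha
```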
Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets | The paper evaluates tagging techniques on a corpus of Slovene, where we are faced with a large number of possible word-class tags and only a small (hand-tagged) dataset. We report on training and testing of four different taggers on the Slovene MULTEXT-East corpus, containing about 100,000 words and 1,000 different morphosyntactic tags. Results show, first of all, that training times of the Maximum Entropy Tagger and the Rule Based Tagger are unacceptably long, while they are negligible for the Memory Based Taggers and the TnT trigram tagger. Results on a random split show that tagging accuracy varies between 86% and 89% overall, between 92% and 95% on known words and between 54% and 55% on unknown words. Best results are obtained by TnT. The paper also investigates performance in relation to our EAGLES-based morphosyntactic tagset. Here we compare the per-feature accuracy on the full tagset, and accuracies on these features when training on a reduced tagset. Results show that PoS accuracy is quite high, while accuracy on Case is lowest. Tagset reduction helps improve accuracy, but less than might be expected.
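Per-feature accuracy over positional MULTEXT-East-style tags (e.g. 'Ncmsn' for a common masculine singular nominative noun) can be computed along these lines. The feature inventory shown is illustrative; in the real tagset the meaning of each position depends on the part of speech.

```python
from collections import Counter

def per_feature_accuracy(gold_tags, predicted_tags,
                         features=('PoS', 'Type', 'Gender', 'Number', 'Case')):
    """Compare positional morphosyntactic tags feature by feature.
    '-' marks a slot that does not apply and is skipped."""
    correct, total = Counter(), Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        for position, name in enumerate(features):
            if position < len(gold) and gold[position] != '-':
                total[name] += 1
                if position < len(pred) and pred[position] == gold[position]:
                    correct[name] += 1
    return {name: correct[name] / total[name]
            for name in features if total[name]}
```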
Cross-lingual Interpolation of Speech Recognition Models | A method is proposed for implementing the cross-lingual porting of recognition models for rapid prototyping of speech recognisers in new target languages, specifically when the collection of large speech corpora for training would be economically questionable. The paper describes a way to build up a multilingual model which includes the phonetic structure of all the constituent languages, and which can be exploited to interpolate the recognition units of a different language. The CTSU (Classes of Transitory-Stationary Units) approach is exploited to derive a well balanced set of recognition models, as a reasonable trade-off between precision and trainability. The phonemes of the untrained language are then mapped onto the multilingual inventory of recognition units, and the corresponding CTSUs are then obtained. The procedure was tested with a preliminary set of 10 Rumanian speakers starting from an Italian-English-Spanish CTSU model. The optimal mapping of the vowel phone set of this language onto the multilingual phone set was obtained by inspecting the F1 and F2 formants of the vowel sounds from two male and female Rumanian speakers, and by comparing them with the values of F1 and F2 of the other three languages. Results in terms of recognition word accuracy measured on a preliminary test set of 10 speakers are reported.
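The vowel-mapping step described above amounts to nearest-neighbour search in (F1, F2) formant space. A minimal sketch follows; the formant values in the usage comment are placeholders, not measurements from the paper.

```python
import math

def map_vowels(target_vowels, multilingual_vowels):
    """Map each vowel of the new language onto the acoustically closest
    vowel of the multilingual inventory by Euclidean distance in
    (F1, F2) formant space."""
    def distance(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return {vowel: min(multilingual_vowels,
                       key=lambda m: distance(formants, multilingual_vowels[m]))
            for vowel, formants in target_vowels.items()}

# Placeholder values only:
# map_vowels({'ɨ': (430, 1420)},
#            {'i': (280, 2250), 'e': (430, 2100), 'a': (700, 1300),
#             'o': (450, 880), 'u': (310, 870)})
```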
Lexicalised Systematic Polysemy in WordNet | This paper describes an attempt to gain more insight into the mechanisms that underlie lexicalised systematic polysemy. This phenomenon is interpreted as systematic sense combinations that are valid for more than one word. WordNet is exploited to create a working definition of systematic polysemy and to extract polysemic patterns at a level of generalisation that allows the identification of fine-grained semantic relations between the senses of the words participating in a specific polysemic pattern.