LREC 2002 Workshop

Beyond PARSEVAL
Towards Improved Evaluation Measures for Parsing Systems

Overview

The PARSEVAL metrics for evaluating the accuracy of parsing systems have underpinned recent advances in stochastic parsing with grammars learned from treebanks (most prominently the Penn Treebank of English). However, a new generation of parsing systems is emerging based on different underlying frameworks and covering other languages. PARSEVAL is not appropriate for many of these approaches: the NLP community therefore needs to come together and agree on a new set of parser evaluation standards.

Motivation and Aims

In line with increasing interest in fine-grained syntactic and semantic representations, stochastic parsing is currently being applied to several high-level syntactic frameworks, such as unification-based grammars, tree-adjoining grammars and combinatory categorial grammars. Various types of training data are being used, including dependency annotations, phrase structure trees, and unlabelled text. Other researchers are building parsing systems using shallower frameworks, based for example on finite-state transducers. Many of these novel parsing approaches use alternative evaluation measures -- based on dependencies, valencies, or exact or selective category match -- since the PARSEVAL measures (of bracketing match with respect to atomic-labelled phrase structure trees) cannot be applied, or are uninformative.

The field is therefore confronted with a lack of common evaluation metrics, and also of appropriate gold standard evaluation corpora in languages other than English. We need a new and uniform scheme for parser evaluation that covers both shallow and deep grammars, and allows for comparison and benchmarking across different syntactic frameworks and different language types.

A previous LREC-hosted workshop on parser evaluation in 1998 (see http://ceres.ugr.es/~rubio/elra/parsing.htm) brought together a number of researchers advocating parser evaluation based on dependencies or grammatical relations as a viable alternative to the PARSEVAL measures.

The aim of this workshop is to start an initiative by bringing together four relevant parties:

Researchers in symbolic and stochastic parsing
Builders of annotated corpora
Representatives from different syntactic frameworks
Groups with interests in and proposals for parser evaluation

The workshop will provide a forum for discussion with the aim of defining a new parser evaluation metric; we also intend the workshop to kick off a sustained collaborative effort to build or derive sufficiently large evaluation corpora, and possibly training corpora, appropriate to the new metric. To maintain the momentum of this initiative, we will work towards setting up a parsing competition based on the new standard evaluation corpora and evaluation metric.

Topics of Interest

The workshop organisers invite papers focussing on:

Benchmarking the accuracy of individual parsing systems
Parser evaluation
Design of annotation schemes covering different languages and grammar frameworks
Creation of high-quality evaluation corpora

Papers on the following topics will be particularly welcome:

Descriptions of experiments using alternative evaluation measures with existing (stochastic or symbolic) parsers, focussing on comparison and discussion of qualitative differences
Methods for creation of evaluation (or training) corpora, allowing flexible adaptation to a new evaluation standard based on dependencies or grammatical relations
Comparisons of existing or possible new schemes for dependency-based evaluation (differences, similarities, problems)

Workshop Agenda

The one-day workshop will consist of (30-minute) paper presentations, a panel session, and an extended open session at which important results of the workshop will be summarised and discussed.

As a follow-up, we hope to arrange a half-day meeting outside the workshop format to discuss concrete action plans, create working groups, and plan future collaboration.

Workshop Organisers

John Carroll	University of Sussex (UK)	John.Carroll@cogs.susx.ac.uk
Anette Frank	DFKI GmbH, Saarbruecken (Germany)
Dekang Lin	University of Alberta (Canada)
Detlef Prescher	DFKI GmbH, Saarbruecken (Germany)
Hans Uszkoreit	DFKI GmbH and Saarland University, Saarbruecken (Germany)

Programme Committee

Salah Ait-Mokhtar	XRCE Grenoble
Thorsten Brants	Xerox PARC
Gosse Bouma	Rijksuniversiteit Groningen
Ted Briscoe	University of Cambridge
John Carroll	University of Sussex
Jean-Pierre Chanod	XRCE Grenoble
Michael Collins	AT&T Labs-Research
Anette Frank	DFKI Saarbruecken
Josef van Genabith	Dublin City University
Gregory Grefenstette	Clairvoyance, Pittsburgh
Julia Hockenmaier	University of Edinburgh
Dekang Lin	University of Alberta
Chris Manning	Stanford University
Detlef Prescher	DFKI Saarbruecken
Khalil Sima'an	University of Amsterdam
Hans Uszkoreit	DFKI Saarbruecken and Saarland University

Submissions

Abstracts for workshop contributions should not exceed two A4 pages (excluding references). An additional title page should state: the title; author(s); affiliation(s); and contact author's e-mail address, as well as postal address, telephone and fax numbers.

Submissions should be sent by e-mail, preferably in PostScript or PDF format, to John Carroll before 1st February 2002. Abstracts will be reviewed by at least three members of the programme committee.

Formatting instructions for the final full version of papers will be sent to authors after notification of acceptance.

Important Dates

Deadline for receipt of abstracts	1st February 2002
Notification of acceptance	22nd February 2002
Camera-ready final version for workshop proceedings	12th April 2002
Workshop	2nd June 2002

Workshop Registration Fees

The registration fees for the workshop are:

If you are not attending LREC: 140 Euro
If you are attending LREC: 90 Euro

All attendees will receive a copy of the workshop proceedings.
