LREC 2002 Workshop

Event Modelling for Multilingual Document Linking

Specific Motivation and Aims

There is a growing need for modelling the content of multilingual documents for purposes such as IR and IE which rest on classification and document similarity.

This workshop aims to discuss the issues surrounding content modelling of multilingual documents, and in particular whether event representations can function as cross-document links. The specific issues are:

How large-scale language resources can be used in content modelling.
How NLP techniques can be used to full advantage in event/content modelling. Can generic tools be used, or is it inevitable that the tools developed will be application-specific?
Discussion of learning techniques that have been applied to such tasks.
Discussion of applications that use content/event modelling as an intermediate stage or end result in document linkage.
Issues concerning evaluation of how well the content is modelled.
Multilingual document analysis.

General Motivation and Aims

The application of HLT to current IT trends requires large amounts of specific linguistic resources. However, existing large-scale resources are never intended (i.e. designed and handcrafted) for specific application tasks.
In order to bridge the existing gap, a variety of methods for acquisition, adaptation and integration of linguistic resources have been proposed in the NLP research area since the late 1980s. Machine learning and statistical techniques have been widely employed as the major devices able to deal with the scale and the complexity of the problem. Although this is a large area of research, the impact of these technologies on applications is still low with respect to their potential.
Open problems are:

The unclear targets of the learning activity: no general consensus exists among the proposed approaches on the quality and quantity of linguistic information needed for the different tasks (e.g. which representation best captures selective information from the LR training material so as to optimize parsing accuracy? Is it fully grammatical, as in bracketed corpora, or lexical?).
The heterogeneity of sources: relevant information for the adaptation task can be distributed in different repositories (LKBs and texts) or expressed differently (in different languages and/or raw, e.g. texts, vs. semistructured data, e.g. HTML/XML formats).
The architectural idiosyncrasies: the proposed learning systems refer to different sources of information in different pipelined (or redundant, as in voting) application architectures.
The application scope: current applications make limited use (if any) of available adaptation technologies. This often limits the scale reachable by current HLT applications.

The above issues give an indication of the complexity of the problem in current research, given the enormous potential of the application field in areas like Web Mining, Q&A and Knowledge Management.

This workshop aims to bring together researchers of both academic and industrial organizations interested in:

Theoretical and Practical aspects of adaptive Natural Language Processing.
Models of Acquisition and Integration of Domain Knowledge.
Integration of induction models from heterogeneous data (lexicons vs. ontologies, texts vs. HTML/XML pages).
Learning Multilingual Information exploiting Multilingual Resources (e.g. EWN).
Theoretical and Practical aspects of Lexical Acquisition in multilingual scenarios.
Architectures for learning, adaptation, and integration of LR.
Adaptive HLT applications (including but not limited to search, retrieval, navigation and QA).

Papers are invited that present theoretical and methodological aspects of Machine Learning of Natural Language, as well as approaches that make effective use of adaptive methods with a view to pre-industrial or industrial applications.

Programme Committee (still pending)

Roberta Catizone	University of Sheffield, 211 Portobello Street, Regent Court, S1 4DP Sheffield (UK)	Phone: +44 114 222 1804; Fax: +44 114 2725004; Email: r.catizone@dcs.shef.ac.uk
Walter Daelemans	CNTS/Language Technology Group, Antwerp University	daelem@uia.ua.ac.be
M. V. Marabello	KnowledgeStones S.p.A, Via C. Colombo 256, 00145 Roma (Italy)	Phone: +39 06 59606539; Fax: +39 06 5402987; Email: mv.marabello@knowledgestones.com
M. T. Pazienza	University of Roma, Tor Vergata, Via di Tor Vergata 110, 00133 Roma (Italy)	Phone: +39 06 72597378; Fax: +39 06 72597460; Email: pazienza@info.uniroma2.it
G. Rigau	Polytechnical University of Catalunia, Jordi Girona 31, 08034 Barcelona (Spain)	Phone: +34 93 401 56 51; Fax: +34 93 401 70 14; Email: g.rigau@lsi.upc.es
Horacio Rodriguez	Polytechnical University of Catalunia (Spain)	horacio@lsi.upc.es
A. Setzer	University of Sheffield (GB)	andrea@dcs.shef.ac.uk
N. Webb	University of Sheffield (GB)	n.webb@dcs.shef.ac.uk
Y. Wilks	University of Sheffield (GB)	y.wilks@dcs.shef.ac.uk
Rémi Zajac	New Mexico State University (USA)	zajac@crl.nmsu.edu
F.M. Zanzotto	University of Roma, Tor Vergata (Italy)	zanzotto@info.uniroma2.it

Contact Person

Roberta Catizone
University of Sheffield
211 Portobello Street, Regent Court, S1 4DP Sheffield (UK)
Phone: +44 114 2221897
Fax: +44 114 2221810
Email: r.catizone@dcs.shef.ac.uk

Important Dates

Deadline for workshop abstract submission	20th of March 2002
Notification of Acceptance	27th of March 2002
Final version of paper for proceedings	15th of April 2002
Workshop	2nd of June 2002

Workshop Agenda - Morning Session

1st Invited Talk	8:00-9:00
Technical Papers	9:00-11:30
2nd Invited Talk	11:30-12:30
Panel and Round Table	12:30-13:30

A summary of the intended workshop Call for Participation.

In the workshop the following invited speakers are expected:

Roberto Basili (University of Roma, Tor Vergata)
Fabio Ciravegna (University of Sheffield)

A panel session on "Adaptive Technologies and their implications on advanced HLT applications (IR, IE, Q&A and KM)"

Distinguished panelists will be invited. Those who have already confirmed their participation include, among others:

Nino Varile (EC Commission)
F. Gardin (AISoftware)

Submissions

Papers should describe existing research connected to the topics of the workshop. Each presentation at the workshop will be 30 minutes long (20 minutes for presentation and 10 minutes for questions and discussion). Each submission should show: title; author(s); affiliation(s); and the contact author's e-mail address, postal address, telephone and fax numbers. Abstracts are limited to a maximum of 2 pages, in plain-text format.

The final version of the accepted papers should be no longer than 10 A4 pages. Instructions for formatting and presentation of the final version will be sent to authors upon notification of acceptance.