Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications

ADAPTING LEXICAL AND CORPUS RESOURCES TO SUBLANGUAGES AND APPLICATIONS

a workshop to be held at the

FIRST INTERNATIONAL CONFERENCE
ON LANGUAGE RESOURCES AND EVALUATION

GRANADA, SPAIN, 26 MAY 1998

The workshop will provide a forum for those researchers involved in the development of methods to integrate corpora and MRDs, with the aim of adding adaptive capabilities to existing linguistic resources.

Organisers:

Roberto Basili (University of Roma "Tor Vergata"),
Roberta Catizone (University of Sheffield),
Maria Teresa Pazienza (University of Roma "Tor Vergata"),
Paola Velardi (University of Roma "La Sapienza"),
Yorick Wilks (University of Sheffield)

WORKSHOP SCOPE AND AIMS

Lexicons, i.e., those components of an NLP system that contain "computable" information about words, cannot be considered as static objects. Words may behave very differently in different domains, and there are language phenomena that do not generalize across sublanguages. Lexicons are a snapshot of a given stage of development of a language, normally provided without support for adapting to change, whether caused by language creativity and development or by a shift to a previously unencountered domain.

The divergence of corpus usages from lexical norms has been studied computationally at least since the late Sixties, but only recently has the availability of large on-line corpora made it possible to establish methods to cope systematically with this problem. An emerging branch of research is now involved in studies and experiments on corpus-driven linguistics, with the aim of complementing and extending earlier work on lexicon acquisition based on Machine Readable Dictionaries (MRDs): data are extracted from texts, as embodiments of language in use, so as to capture lexical regularities and to code them into operational forms. The purpose of this workshop will be to provide an updated snapshot of current work in the area, and to promote discussion of how to make progress.

Central topics will be (though this list is in no way exclusive):

corpus-driven tuning of MRDs to optimize domain-specific inferences,
terminology and jargon acquisition,
sense extensions,
acquisition of preference or subcategorization information from corpora,
taxonomy adaptation,
statistical weighting of senses etc. to domains,
use of MRDs to provide explanations of linguistic phenomena in corpora,
the scope of "lexical tuning",
the evaluation of lexical tuning as a separate task, or as part of a more generic task.

INDUSTRIAL PANEL

Automatic adaptation of lexicons to new domains through the use of application corpora makes NLP applications more adaptable and portable. The Program Committee is organizing a joint panel to discuss this and other issues concerning next-generation Information Extraction (IE) systems. The panel intends to bring in industrial representatives to compare, from their viewpoint, expectations of IE with the degree of maturity of what is currently on offer.

The following (and other) issues will be discussed:

Is there a market for IE?
What is the demand in domains such as New Services for citizens, Telecommunications, Management Support, etc.?
What are the technical requirements?
Is the technology near to the market?

PROGRAM COMMITTEE

Yorick Wilks, University of Sheffield
Roberta Catizone, University of Sheffield
Paola Velardi, University of Roma "La Sapienza"
Maria Teresa Pazienza, University of Roma "Tor Vergata"
Roberto Basili, University of Roma "Tor Vergata"
Bran Boguraev, Brandeis University
Sergei Nirenburg, New Mexico State University
James Pustejovsky, Brandeis University
Ralph Grishman, New York University
Christiane Fellbaum, Princeton University

PAPER SUBMISSION

FORMATTING GUIDELINES:
Papers should not exceed 4000 words or 10 pages.

HARD COPIES:
Three hard copies should be sent to:

Paola Velardi
Dipartimento di Scienza dell'Informazione
via Salaria 113
00198 Roma
Italy

ELECTRONIC SUBMISSION:

Electronic submissions will be accepted in PostScript, Word for Mac, or RTF. An ftp site will be made available on request. Authors should send an informational email to Paola Velardi (velardi@dsi.uniroma1.it) even if they submit in paper form. An electronic submission should be accompanied by a plain ASCII text in the following form:

# NAME : Name of first author
# TITLE: Title of the paper
# PAGES: Number of pages
# FILES: Name of file (if also submitted electronically)
# NOTE : Anything you'd like to add
# KEYS : Keywords
# EMAIL: Email of the first author
# ABSTR: Abstract of the paper
# . . . . . .

IMPORTANT DATES

Paper Submission Deadline (Hard Copy/Electronic): March 10
Paper Notification: April 1
Camera-Ready Papers Due: April 20
L&CT workshop: May 26

WORKSHOP PROGRAM

8:00 - 8:30 Registration

8:30 - 8:50 "ASIUM: Learning subcategorization frames and restrictions of selection" D. Faure and C. Nédellec
8:50 - 9:10 "When <> Strikes Back: Will corpus evidence bear out my theoretical claims?" Magnar Brekke
9:10 - 9:30 "Textual Semantics and Corpus Specific Lexicons" Marc Cavazza
9:30 - 9:50 "An empirical approach to Lexical Tuning" R. Basili, R. Catizone, M.T. Pazienza, M. Stevenson, P. Velardi, M. Vindigni, Y. Wilks
9:50 - 10:10 "Towards Inductive lexicons" W. Daelemans, G. Durieux, and A. van den Bosch

10:10 - 10:30 coffee break

10:30 - 10:50 "Subcorpora-based tuning of Swedish generic lexical resources" D. Kokkinakis and S. Kokkinakis
10:50 - 11:10 "Machine Learning for Domain-Adaptive Word Sense Disambiguation" G. Paliouras, V. Karkaletsis, C. Spiropoulos
11:10 - 11:30 "Acquisition of Language Resources for Special Applications" S. Sheremetyeva
11:30 - 11:50 "Electronic Dictionary management in a multiplatform environment" L. de Yzaguirre, J. Vivaldi, M.T. Cabré
11:50 - 12:10 "Bridging the gap between lexicon and corpus: convergence of formalisms" A. Kilgarriff

12:10 - 12:30 coffee break

12:30 - 14:00 Industrial Panel:
Will adaptive lexical resources bridge the gap between industrial and research-oriented Information Extraction systems?
The panelists are requested to address the following issues:
*       Is there a market for IE?
*       What is the demand in domains such as Telecommunications, New Web Services, Consumer Electronics, etc.?
*       What are the technical requirements?
*       Is the technology near to the market?
*       Linguistic resources for IE: how to create, adapt, and integrate linguistic resources for adaptive IE systems?

CONFERENCE INFORMATION

General information about the conference is at:

Specific queries about the conference should be directed to:

LREC Secretariat
Facultad de Traduccion e Interpretacion
Dpto. de Traduccion e Interpretacion
C/ Puentezuelas, 55
18002 Granada, SPAIN
Tel: +34 58 24 41 00 - Fax: +34 58 24 41 04
reli98@goliat.ugr.es