Author(s) |
Tamás Gröbler, Gábor Hodász, Balázs Kis
MorphoLogic, Orbánhegyi út 5. H-1126 Budapest, Hungary, {grobler;hodasz;kis}@morphologic.hu
|
Abstract |
This paper discusses aspects of bilingual resource processing within a rule-based translation memory (TM) system currently under development. Translation memories can be viewed as translation tools incorporating parallel corpora, mainly aligned at the sentence level. Usually, these corpora have no linguistic annotation, as commercial TM systems perform queries at the character level, using fuzzy matches.
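As a rough illustration of what a character-level fuzzy match involves, the following minimal sketch scores a query segment against stored source segments and returns the best translation unit above a threshold. It is not taken from any commercial TM system: Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure such systems use, and the function names, the 0.7 threshold, and the sample segment pairs are assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, stored: str) -> float:
    """Character-level similarity in [0, 1]; no linguistic analysis is involved."""
    return SequenceMatcher(None, query.lower(), stored.lower()).ratio()


def best_match(query, memory, threshold=0.7):
    """Return (score, source, target) for the most similar stored unit.

    `memory` is a list of (source, target) pairs. Returns None if the
    memory is empty or no unit reaches the (hypothetical) threshold.
    """
    scored = [(fuzzy_score(query, src), src, tgt) for src, tgt in memory]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[0])
    return best if best[0] >= threshold else None


# Illustrative translation units (invented for this sketch).
memory = [
    ("Open the file.", "Nyissa meg a fájlt."),
    ("Close the window.", "Zárja be az ablakot."),
]

hit = best_match("Open the files.", memory)
if hit is not None:
    score, source, target = hit
    print(f"{score:.2f}  {source}  ->  {target}")
```

Because the score depends only on shared character runs, two segments that differ in a single inflected word form can score lower than two segments that are unrelated in meaning but similar in spelling, which is the limitation that linguistic annotation is meant to address.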
The proposed translation memory system uses linguistic analysis (morphology and parsing) to determine the similarity between two source-language segments and, if the entire source segment is not found, attempts to assemble a sensible translation from translations of source-language chunks. This is achieved by integrating a rule-based machine translation (RBMT) engine. The drawback of this approach is language dependence; however, appropriate grammar acquisition methods are being developed to speed up grammar preparation for further language pairs.
This paper addresses the problem of adding sufficient linguistic annotation to segment pairs, called translation units (TUs), so that new segment pairs integrate with the RBMT scheme. This should be fully automatic, because adding a new translation unit to a translation memory must be transparent and require no user intervention. The paper discusses a method robust enough to obtain as much linguistic annotation as possible while keeping the error rate low.
|