Title | Developing Language Resources for a Transnational Digital Government System |
Author(s) | Violetta Cavalli-Sforza, Jaime G. Carbonell, Peter J. Jansen (Language Technologies Institute, Carnegie Mellon University) |
Session | O25-EGSW |
Abstract | We describe ongoing efforts to develop language resources for a transnational digital government project aimed at applying information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational movement of illicit drugs. The project seeks to support information sharing, coordination, and collaboration among government agencies within a country and across national boundaries by combining a variety of technologies, including a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The prototype system is being developed by U.S. universities in collaboration with an international agency and with universities and government agencies in Belize and the Dominican Republic. This paper focuses on the linguistic resources and their use in Example-Based Machine Translation (EBMT). We are developing an English-Spanish parallel corpus, focused on the domain of information elicited and used at border crossings, to fuel the EBMT system. Although significant parallel corpora are available for these two languages in the newswire domain, they proved to be of very limited use for the border-crossing application, which prompted us to develop our own resources. |
Keyword(s) | Language resources development, language resources management, machine translation, border crossing domain, transnational digital government |
Language(s) | English (U.S., Belize) and Spanish (Latin American) |
Full Paper | 713.pdf |