Title |
Information Extraction from Hindi Texts |
Author(s) |
Kamlesh Dutta (1), Saroj Kaushik (2), Nupur Prakash (3) (1) National Institute of Technology, Hamirpur (HP), INDIA 177005, kd@recham.ernet.in; (2) Indian Institute of Technology, New Delhi, INDIA, saroj@cse.iitd.ernet.in; (3) Indira Gandhi Institute of Technology, New Delhi, INDIA, nupurprakash@rediffmail.com |
Session |
P23-W |
Abstract |
The paper presents an information extraction system that takes Hindi texts as input and improves the information content retrieved by using an anaphor/pronoun resolution mechanism. The system consists of three major modules: the language parser, the resolution system and the information extractor. The language parser is based on HPSG (Head-Driven Phrase Structure Grammar) and provides both syntactic and semantic information to the anaphor resolution system. HPSG was chosen because it provides a set of constraints on the co-referential structures in the language, which narrows the search for an antecedent to a more precise location in the discourse. The semantic information included in its parse may also help remove ambiguity in anaphor/pronoun resolution. The anaphor resolution system uses a few heuristic rules to resolve intrasentential references, while centering theory is used for intersentential resolution. |
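The two-level resolution strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` class, the role labels, the number-agreement check and the role ranking are all hypothetical simplifications, and the input stands in for parser output rather than coming from an HPSG parser. Within a sentence, the sketch picks the nearest preceding mention that agrees in number; across sentences, it falls back to the highest-ranked compatible entity of the previous sentence, a much reduced stand-in for centering theory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mention:
    """A noun phrase or pronoun as a parser might hand it over (hypothetical)."""
    text: str
    role: str                      # "subj", "obj" or "other" (assumed labels)
    number: str                    # "sg" or "pl"
    is_pronoun: bool = False
    antecedent: Optional["Mention"] = None


# Assumed salience ranking: subject before object before everything else.
ROLE_RANK = {"subj": 0, "obj": 1, "other": 2}


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    # Toy agreement check: number only.
    return pronoun.number == candidate.number


def resolve(sentences: List[List[Mention]]) -> None:
    """Attach an antecedent to each pronoun, in place."""
    prev_ranked: List[Mention] = []
    for sentence in sentences:
        seen: List[Mention] = []
        for mention in sentence:
            if mention.is_pronoun:
                # Intrasentential heuristic: nearest preceding agreeing mention.
                candidates = [c for c in reversed(seen) if agrees(mention, c)]
                if not candidates:
                    # Intersentential fallback: most salient agreeing entity
                    # of the previous sentence.
                    candidates = [c for c in prev_ranked if agrees(mention, c)]
                if candidates:
                    best = candidates[0]
                    # Follow an already resolved pronoun back to its referent.
                    mention.antecedent = best.antecedent or best
            seen.append(mention)
        prev_ranked = sorted(seen, key=lambda m: ROLE_RANK[m.role])


def extract(sentences: List[List[Mention]]) -> List[Tuple[str, str]]:
    """Return (surface form, resolved entity) pairs for every mention."""
    return [
        (m.text, m.antecedent.text if m.antecedent else m.text)
        for sentence in sentences
        for m in sentence
    ]


# Transliterated toy input: "Ram ne Sita ko dekha. Vah khush tha."
s1 = [Mention("Ram", "subj", "sg"), Mention("Sita", "obj", "sg")]
s2 = [Mention("vah", "subj", "sg", is_pronoun=True)]
# A sentence with an antecedent available inside the same sentence.
s3 = [Mention("ladke", "subj", "pl"), Mention("ve", "other", "pl", is_pronoun=True)]

resolve([s1, s2, s3])
pairs = extract([s1, s2, s3])
```

In this toy run the pronoun `vah` has no candidate in its own sentence, so it is linked to the subject of the preceding sentence, while `ve` is resolved inside its own sentence. A real system would draw on the richer syntactic and semantic constraints that the abstract attributes to the HPSG parse.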
Keyword(s) |
information extraction, anaphor, HPSG, discourse |
Language(s) | Hindi |
Full Paper |