Language Processing
Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese
Le Hong Phuong, Nguyen Thi Minh Huyen, Nguyen Phuong Thai, Azim Roussanaly
In this paper, we present a system that automatically extracts lexicalized tree adjoining grammars (LTAG) from treebanks. We first discuss the extraction algorithms in detail and compare them to previous work. We then report the first LTAG extraction result for Vietnamese, using a recently released Vietnamese treebank. The implementation of an open-source, language-independent system for automatic extraction of LTAG grammars is also discussed.
Grammars in general, and lexicalized tree adjoining grammars in particular, are among the most important elements of natural language processing (NLP). Since the development of hand-crafted grammars is a time-consuming and labor-intensive task, many studies on automatic and semi-automatic grammar development have been carried out during the last decades.
After decades of NLP research concentrated mostly on English and other well-studied languages, recent years have seen increased interest in less common languages, notably because of their growing presence on the Internet. Vietnamese, which ranks among the 20 most spoken languages, is one of these new focuses of interest. Obstacles remain, however, for NLP research in general and grammar development in particular: Vietnamese does not yet have large, readily available linguistic resources upon which to build effective statistical models, nor reference works against which new ideas may be tested.
Moreover, most existing research has focused on testing the applicability of methods and tools developed for English or other Western languages, under the assumption that their logical or statistical well-foundedness guarantees cross-language validity. In fact, such tools always embed assumptions about the structure of a language, and these assumptions must be amended to adapt the tools to different linguistic phenomena. For an isolating language such as Vietnamese, techniques developed for inflectional languages cannot be applied “as is”.
The primary motivation for developing a system that can automatically extract an LTAG grammar for Vietnamese is the need for a wide-coverage grammar with rich statistical information, which may contribute more effectively to the development of basic linguistic resources and tools for the automatic processing of Vietnamese text.
In this article, we present a system that automatically extracts lexicalized tree adjoining grammars from treebanks. We first discuss the extraction algorithms in detail and compare them to previous work. We then report the first LTAG extraction result for Vietnamese, using the recently released Vietnamese treebank. The implementation of an open-source, language-independent system for automatic extraction of LTAG grammars from treebanks is also discussed.