Document details
IDENTIFIER

oai:arXiv.org:2409.14657

Subject
Computer Science - Computation and...
Author
Sarveswaran, Kengatharaiyer
Category

Computer Science

Year

2024

Citation date

25/9/2024

Keywords
annotation linguistic tamil

Abstract

Treebanks are important linguistic resources: structured corpora with rich linguistic annotations.

These resources are used in Natural Language Processing (NLP) applications, support linguistic analyses, and are essential for training and evaluating various computational models.

This paper discusses the creation of Tamil treebanks using three distinct approaches: manual annotation, computational grammars, and machine learning techniques.

Manual annotation, though time-consuming and requiring linguistic expertise, ensures high-quality and rich syntactic and semantic information.

Computational deep grammars, such as Lexical Functional Grammar (LFG), offer deep linguistic analyses but necessitate significant knowledge of the formalism.

Machine learning approaches, utilising off-the-shelf frameworks and tools like Stanza, UDPipe, and UUParser, facilitate the automated annotation of large datasets but depend on the availability of quality annotated data, cross-linguistic training resources, and computational power.

The paper discusses the challenges encountered in building Tamil treebanks, including issues with Internet data, the need for comprehensive linguistic analysis, and the difficulty of finding skilled annotators.

Despite these challenges, the development of Tamil treebanks is essential for advancing linguistic research and improving NLP tools for Tamil.

Comment: 10 pages

Sarveswaran, Kengatharaiyer, 2024, Building Tamil Treebanks
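
The tools named in the abstract (Stanza, UDPipe, UUParser) work with Universal Dependencies treebanks, which are commonly stored in the tab-separated, ten-column CoNLL-U format. As a rough illustration of what an annotated treebank sentence looks like to a program, the following Python sketch reads one sentence and finds its syntactic root. The names `Token`, `parse_conllu_sentence`, and `find_root` are hypothetical, and the English toy sentence is an assumption made for illustration; neither comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One word-level row of a CoNLL-U sentence (subset of the ten columns)."""
    idx: int      # ID column: 1-based position in the sentence
    form: str     # FORM column: surface word
    upos: str     # UPOS column: universal part-of-speech tag
    head: int     # HEAD column: index of the governing word, 0 for the root
    deprel: str   # DEPREL column: dependency relation to the head


def parse_conllu_sentence(block: str) -> list[Token]:
    """Parse a single CoNLL-U sentence block into a list of Token objects."""
    tokens = []
    for line in block.strip().splitlines():
        line = line.strip()
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def find_root(tokens: list[Token]) -> Token:
    """Return the single token whose head is 0 (the sentence root)."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


# Illustrative toy sentence (an assumption, not data from the paper).
SAMPLE = "\n".join([
    "# text = She reads books",
    "\t".join(["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "reads", "read", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "books", "book", "NOUN", "_", "_", "2", "obj", "_", "_"]),
])

if __name__ == "__main__":
    sentence = parse_conllu_sentence(SAMPLE)
    print(find_root(sentence).form)
```

A structure like this is only a starting point: a real treebank file contains many sentences separated by blank lines, plus lemma, morphological feature, and enhanced-dependency columns that this sketch ignores.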
