Document details
IDENTIFIER

oai:arXiv.org:2408.06675

Subject
Computer Science - Computation and...
Authors
Hudspeth, Marisa; O'Connor, Brendan; Thompson, Laure
Category

Computer Science

Year

2024

Citation date

21/8/2024

Keywords
treebanks
Abstract

Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures.

Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers.

However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data.

In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre.

We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar.

From this, we build new time-period data splits that draw from the existing treebanks, which we use to perform a broad cross-time analysis for POS and morphological feature tagging.

We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.

Hudspeth, Marisa; O'Connor, Brendan; Thompson, Laure. 2024. Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time.
