Document details
Identifier

oai:arXiv.org:2402.13604

Subject
Computer Science - Computation and...; Economics - Econometrics; I.2.7; I.7.0
Authors
Dahl, Christian Møller; Johansen, Torben; Vedel, Christian
Category

Computer Science

Year

2024

Date indexed

10 April 2024

Keywords
descriptions, OccCANINE
Abstract

This paper introduces a new tool, OccCANINE, to automatically transform occupational descriptions into the HISCO classification system.

The manual work involved in processing and classifying occupational descriptions is error-prone, tedious, and time-consuming.

We finetune a preexisting language model (CANINE) to do this automatically, thereby performing in seconds and minutes what previously took days and weeks.

The model is trained on 14 million pairs of occupational descriptions and HISCO codes in 13 different languages contributed by 22 different sources.

Our approach is shown to have accuracy, recall, and precision above 90 percent.

Our tool breaks the metaphorical HISCO barrier and makes this data readily available for analysis of occupational structures with broad applicability in economics, economic history, and various related disciplines.

Comment: All code and guides on how to use OccCANINE are available on GitHub: https://github.com/christianvedels/OccCANINE

Dahl, Christian Møller; Johansen, Torben; Vedel, Christian, 2024, Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE
