Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models

Document details

Identifier

oai:arXiv.org:2410.10577

Subject

Computer Science - Robotics

Authors

Ryu, Chanhoe; Seong, Hyunki; Lee, Daegyu; Moon, Seongwoo; Min, Sungjae; Shim, D. Hyunchul

Category

Computer Science

Year

2024

Date indexed

16/10/2024

Keywords

navigation using model elevation language vehicle human instructions

Abstract

This paper introduces an innovative application of foundation models, enabling Unmanned Ground Vehicles (UGVs) equipped with an RGB-D camera to navigate to designated destinations based on human language instructions.

Unlike learning-based methods, this approach does not require prior training but instead leverages existing foundation models, thus facilitating generalization to novel environments.

Human language instructions are first transformed by a large language model (LLM) into a 'cognitive route description': a detailed navigation route expressed in human language.

The vehicle then decomposes this description into landmarks and navigation maneuvers.

The vehicle also determines elevation costs and identifies navigability levels of different regions through a terrain segmentation model, GANav, trained on open datasets.

Semantic elevation costs, which take both elevation and navigability levels into account, are estimated and provided to the Model Predictive Path Integral (MPPI) planner, responsible for local path planning.

Concurrently, the vehicle searches for target landmarks using foundation models, including YOLO-World and EfficientViT-SAM.

Ultimately, the vehicle executes the navigation commands to reach the designated destination, the final landmark.

Our experiments demonstrate that this application successfully guides UGVs to their destinations following human language instructions in novel environments, such as unfamiliar terrain or urban settings.
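
The cost and planning steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the semantic elevation cost, so the multiplicative combination, the navigability labels and weights, and the function names below are all illustrative assumptions rather than the authors' method. The weighting step is the standard MPPI update (softmax of negative trajectory cost), shown in plain Python for clarity.

```python
import math

# Hypothetical navigability multipliers. The level names and values are
# illustrative assumptions, not taken from the paper or from GANav.
NAVIGABILITY_WEIGHT = {"smooth": 1.0, "rough": 2.0, "bumpy": 4.0}
FORBIDDEN_COST = 1e6  # assumed cost for cells classified as non-navigable


def semantic_elevation_cost(elevation_cost, navigability):
    """Combine a per-cell elevation cost with a navigability label (assumed form)."""
    if navigability == "forbidden":
        return FORBIDDEN_COST
    return elevation_cost * NAVIGABILITY_WEIGHT[navigability]


def mppi_weights(trajectory_costs, temperature=1.0):
    """Standard MPPI importance weights: softmax of negative trajectory cost."""
    best = min(trajectory_costs)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(c - best) / temperature) for c in trajectory_costs]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def mppi_update(nominal_controls, perturbations, trajectory_costs, temperature=1.0):
    """Shift each nominal control by the cost-weighted mean of its perturbations."""
    weights = mppi_weights(trajectory_costs, temperature)
    updated = []
    for t, u in enumerate(nominal_controls):
        delta = sum(w * eps[t] for w, eps in zip(weights, perturbations))
        updated.append(u + delta)
    return updated
```

In this sketch, each sampled trajectory would be scored by summing `semantic_elevation_cost` over the cells it crosses, and low-cost samples then dominate the control update. A real planner would also add goal and control-effort terms, which are omitted here.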

Comment: 7 pages, 7 figures

Ryu, Chanhoe; Seong, Hyunki; Lee, Daegyu; Moon, Seongwoo; Min, Sungjae; Shim, D. Hyunchul (2024). Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models.

