Document detail
ID

oai:arXiv.org:2404.14565

Topic
Computer Science - Computer Vision...
Author
Chen, Jiaqi; Barath, Daniel; Armeni, Iro; Pollefeys, Marc; Blum, Hermann
Category

Computer Science

Year

2024

Listing date

11/13/2024


Abstract

Natural language interfaces to embodied AI are becoming increasingly ubiquitous in our daily lives.

This opens up further opportunities for language-based interaction with embodied agents, such as a user verbally instructing an agent to execute some task in a specific location.

For example, "put the bowls back in the cupboard next to the fridge" or "meet me at the intersection under the red sign."

As such, we need methods that interface between natural language and map representations of the environment.

To this end, we explore the question of whether we can use an open-set natural language query to identify a scene represented by a 3D scene graph.

We define this task as "language-based scene-retrieval." It is closely related to "coarse-localization," except that we search for a match among a collection of disjoint scenes and not necessarily within a large-scale continuous map.

We present Text2SceneGraphMatcher, a "scene-retrieval" pipeline that learns joint embeddings between text descriptions and scene graphs to determine if they are a match.

The code, trained models, and datasets will be made public.
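The retrieval step implied by a joint embedding space can be illustrated with a minimal sketch. It assumes that a text query and each scene graph have already been encoded into vectors of the same dimension, and that candidates are ranked by cosine similarity. The scoring function, the function names `cosine_similarity` and `retrieve_scene`, and the toy vectors are all assumptions made for illustration; the abstract does not specify how Text2SceneGraphMatcher scores a match.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_scene(text_embedding, scene_embeddings):
    """Rank scenes by similarity to the query; returns (scene_id, score) pairs, best first."""
    scores = {
        scene_id: cosine_similarity(text_embedding, embedding)
        for scene_id, embedding in scene_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-written embeddings standing in for learned encoder outputs.
scene_embeddings = {
    "kitchen": [0.9, 0.1, 0.0],
    "office": [0.1, 0.9, 0.2],
    "street": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stand-in for an encoded text description

ranking = retrieve_scene(query_embedding, scene_embeddings)
best_scene = ranking[0][0]
print(best_scene)
```

In a learned system the vectors would come from trained text and scene-graph encoders, and a threshold on the score could turn the ranking into a match or no-match decision.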

Chen, Jiaqi; Barath, Daniel; Armeni, Iro; Pollefeys, Marc; Blum, Hermann, 2024, "Where am I?" Scene Retrieval with Language
