Document details
ID

oai:arXiv.org:2406.00725

Subject
Computer Science - Information Ret...
Authors
Chen, Xiaocong; Wang, Siyu; Yao, Lina
Category

Computer Science

Year

2024

Listing date

5 June 2024

Keywords
online approach reward methods decision
Abstract

Reinforcement learning-based recommender systems (RLRS) have recently gained popularity.

However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains.

To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline datasets.

A prominent example of this is the Decision Transformer.

Despite its popularity, the Decision Transformer approach has inherent drawbacks, particularly evident in recommendation methods based on it.

This paper identifies two key shortcomings in existing Decision Transformer-based methods: a lack of stitching capability and limited effectiveness in online adoption.

In response, we introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for Offline RLRS (EDT4Rec).

Our approach begins with a max entropy perspective, leading to the development of a max entropy enhanced exploration strategy.

This strategy is designed to facilitate more effective exploration in online environments.

Additionally, to augment the model's capability to stitch sub-optimal trajectories, we incorporate a unique reward relabeling technique.

To validate the effectiveness and superiority of EDT4Rec, we have conducted comprehensive experiments across six real-world offline datasets and in an online simulator.

Chen, Xiaocong; Wang, Siyu; Yao, Lina (2024). Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation.
