Document detail
ID

oai:arXiv.org:2405.02449

Topic
Statistics - Machine Learning; Condensed Matter - Materials Scien...; Computer Science - Machine Learnin...; Quantitative Biology - Biomolecule...
Author
Nguyen, Quan; Dieng, Adji Bousso
Category

Computer Science

Year

2024

Listing date

5/8/2024

Keywords
machine experimental design
Abstract

Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery.

However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima.

This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data.

In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality.
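
As a rough illustration of the idea in the sentence above, the sketch below computes an order-1 Vendi Score as the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix, then scales it by the average quality of the items. The mean-quality weighting, the function names, and the toy inputs are assumptions made for illustration and are not taken from this listing; the authors' repository linked in the comment below is the authoritative implementation.

```python
import numpy as np


def vendi_score(K):
    """Order-1 Vendi Score of a similarity matrix K.

    K is assumed to be symmetric positive semidefinite with ones on the
    diagonal, so the eigenvalues of K / n are nonnegative and sum to 1.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues; 0 * log(0) is treated as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def quality_weighted_vendi_score(K, scores):
    """Assumed form: mean quality of the items times their Vendi Score."""
    return float(np.mean(scores)) * vendi_score(K)


if __name__ == "__main__":
    distinct = np.eye(3)          # three mutually dissimilar items
    identical = np.ones((3, 3))   # three copies of the same item
    quality = [0.5, 0.5, 0.5]     # hypothetical quality scores
    print(vendi_score(distinct), vendi_score(identical))
    print(quality_weighted_vendi_score(distinct, quality))
```

Under this sketch, a set of fully dissimilar items scores as high as its size, a set of duplicates scores close to one, and multiplying by the mean quality lowers the score of a diverse but low-quality set.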

We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning.

We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points.

Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.

Comment: Published in International Conference on Machine Learning, ICML 2024.

Code can be found in the Vertaix GitHub: https://github.com/vertaix/Quality-Weighted-Vendi-Score.

Paper dedicated to Kwame Nkrumah

Nguyen, Quan; Dieng, Adji Bousso. 2024. Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design.
