From data archetypes to collider data: two perspectives on the Wasserstein metric as a distance between data distributions

Abstract

Interest from the machine learning community in optimal transport has surged over the past five years. A key reason is that the Wasserstein metric offers a natural way to measure the distance between data distributions: it respects the geometry of the underlying space, and it remains well behaved even when the distributions have disjoint supports.

In today’s talk, I will present two recent works that exploit these properties in very different contexts. First, I will describe how the Wasserstein metric can be used to define a new notion of archetypal analysis, in which one approximates a data distribution by the uniform probability measure on a convex polygon, so that the polygon's vertices serve as exemplars of the extreme points of the data. Next, I will discuss an application of optimal transport to collider physics, in which comparing collider events using the Wasserstein metric allowed us to achieve state-of-the-art accuracy with vastly improved computational efficiency. In both cases, I will discuss the theoretical benefits as well as the computational challenges of using optimal transport in machine learning.
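The archetypal objective can be illustrated with a small sketch: for a fixed candidate polygon, estimate the squared 2-Wasserstein distance between the data and the uniform measure on the polygon by drawing as many uniform samples from the polygon as there are data points and solving an optimal assignment problem with `scipy.optimize.linear_sum_assignment`. This is an assumption-laden illustration, not the method presented in the talk: the choice of W2, the Monte Carlo approximation, the fan-triangulation sampler, and the function names are all hypothetical, and the sketch only evaluates the objective; it does not optimize the vertices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sample_uniform_convex_polygon(vertices, n, rng):
    """Draw n points uniformly from a convex polygon (vertices listed in order),
    using a fan triangulation from vertex 0 and area-weighted triangle choice."""
    v = np.asarray(vertices, dtype=float)
    a, b, c = v[0], v[1:-1], v[2:]  # triangles (v0, v_i, v_{i+1})
    # Triangle areas from the 2D cross product of the two edge vectors.
    areas = 0.5 * np.abs((b[:, 0] - a[0]) * (c[:, 1] - a[1])
                         - (b[:, 1] - a[1]) * (c[:, 0] - a[0]))
    idx = rng.choice(len(areas), size=n, p=areas / areas.sum())
    # Uniform point in a triangle: draw (u, w) in the unit square and
    # reflect it into the lower-left half when u + w > 1.
    u, w = rng.random(n), rng.random(n)
    flip = u + w > 1
    u[flip], w[flip] = 1 - u[flip], 1 - w[flip]
    return a + u[:, None] * (b[idx] - a) + w[:, None] * (c[idx] - a)

def archetype_objective(data, vertices, rng):
    """Hypothetical Monte Carlo estimate of W2^2 between the empirical data measure
    and the uniform measure on the polygon. With equal sample sizes and equal
    weights, an optimal coupling is a permutation, i.e. an assignment problem."""
    data = np.asarray(data, dtype=float)
    samples = sample_uniform_convex_polygon(vertices, len(data), rng)
    cost = ((data[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())

rng = np.random.default_rng(0)
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
data = sample_uniform_convex_polygon(square, 200, rng)  # synthetic stand-in for data
print(archetype_objective(data, square, rng))  # polygon matches the data
print(archetype_objective(data, [(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)], rng))  # too small
```

The first estimate should be much smaller than the second, since the shrunken square fails to cover the data. The assignment step costs roughly cubic time in the sample size, which hints at the computational challenges the talk mentions.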