Chilean Probability Seminar: “On Model-Based Clustering with Entropic Optimal Transport”
Abstract:
We develop a new methodology for model-based clustering. Optimizing the log-likelihood provides a principled statistical framework for clustering, with solutions found via the EM algorithm. However, because the log-likelihood is nonconvex, only convergence to stationary points can be guaranteed, and practitioners often use multiple starting points in the hope that one will converge to the global solution. We consider a new loss function based on entropic optimal transport that shares the same global optimum as the log-likelihood but has a much better-behaved landscape, thereby avoiding the spurious local optima that are pervasive with the log-likelihood. Just as the log-likelihood is optimized by the EM algorithm, this new loss can be optimized by the Sinkhorn-EM algorithm, which we show converges at a rate comparable to that of EM. Through extensive numerical experiments and two real-world applications, image segmentation in C. elegans microscopy and clustering in spatial transcriptomics, we show that this new loss outperforms log-likelihood optimization, indicating that it is a valuable clustering methodology for practitioners. We also comment on finite-sample properties of this procedure, leveraging novel convergence bounds for objects arising from entropic optimal transport.
Speaker: Gonzalo Mena (UC Berkeley)
Join Zoom Meeting
https://reuna.zoom.us/j/84521834914?pwd=OTZ6Y0NWM3pYTGtTbEt3c0luTG96UT09
Meeting ID: 845 2183 4914
Passcode: 997973