DONKEY: A Flexible and Accurate Algorithm for Clustering.

Kára J, Acheson K, Kirrander A

We propose an accurate clustering algorithm suitable for the varied and multidimensional data sets that correspond to temporal snapshots from on-the-fly nonadiabatic trajectory-based simulations of photoexcited dynamics. The algorithm approximates the underlying probability density function using variable kernel density estimation, with local maxima corresponding to cluster centers. Each data point is then assigned to one of the maxima by employing a maximization procedure. Finally, clusters artificially separated by minor fluctuations in the probability density are merged. The algorithm does not require parameter tuning, which ensures flexibility and reduces the risk of bias. It is tested on several synthetic data sets, where it consistently outperforms conventional clustering algorithms. As a final example, the algorithm is applied to the excited-state dynamics of the norbornadiene ⇌ quadricyclane (C7H8) molecular photoswitch, demonstrating how distinct reaction pathways can be identified.
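The three stages described above (density estimation, assignment to maxima, merging) can be illustrated with a minimal NumPy sketch. This is not the DONKEY implementation: the k-nearest-neighbour bandwidth, the discrete hill climbing over a neighbour graph, and the saddle-to-peak merge criterion are swapped-in stand-ins chosen for brevity, and the function names and the parameters `k` and `ratio` are hypothetical. Unlike the algorithm summarized in the abstract, this sketch does rely on user-chosen parameters.

```python
import numpy as np

def variable_kde(points, k=10):
    """Sample-point adaptive Gaussian KDE evaluated at the data points.
    Assumption: each point's bandwidth is its distance to its k-th neighbour."""
    n, d = points.shape
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    h = np.maximum(np.sort(dist, axis=1)[:, k], 1e-12)
    # kern[i, j]: kernel centred at point j (bandwidth h[j]) evaluated at point i
    kern = np.exp(-0.5 * (dist / h[None, :]) ** 2)
    kern /= (2.0 * np.pi) ** (d / 2) * h[None, :] ** d
    return kern.mean(axis=1), dist

def assign_to_maxima(density, dist, k=10):
    """Discrete hill climbing: each point steps to the densest of its k nearest
    neighbours until it reaches a point that is denser than all its neighbours."""
    n = len(density)
    idx = np.arange(n)
    nbrs = np.argsort(dist, axis=1)[:, :k + 1]
    best = nbrs[idx, np.argmax(density[nbrs], axis=1)]
    # move only on a strict density increase, so every chain terminates
    parent = np.where(density[best] > density, best, idx)
    labels = parent.copy()
    while not np.array_equal(parent[labels], labels):
        labels = parent[labels]
    return labels  # each label is the index of a local-maximum point

def merge_minor_modes(labels, density, dist, k=10, ratio=0.5):
    """Merge neighbouring clusters whose estimated saddle density is at least
    `ratio` times the lower of the two peak densities (hypothetical criterion)."""
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]
    root = {int(m): int(m) for m in np.unique(labels)}

    def find(m):
        while root[m] != m:
            m = root[m]
        return m

    # saddle estimate: highest min-density over neighbouring cross-cluster pairs
    saddle = {}
    for i in range(len(labels)):
        for j in nbrs[i]:
            a, b = int(labels[i]), int(labels[j])
            if a != b:
                key = (min(a, b), max(a, b))
                value = min(density[i], density[j])
                saddle[key] = max(saddle.get(key, 0.0), value)

    # process the highest saddles first; the higher peak represents the merger
    for (a, b), s in sorted(saddle.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb and s >= ratio * min(density[ra], density[rb]):
            keep, drop = (ra, rb) if density[ra] >= density[rb] else (rb, ra)
            root[drop] = keep
    return np.array([find(int(m)) for m in labels])

# Usage on two well-separated synthetic blobs (illustrative data only)
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
    rng.normal([30.0, 0.0], 1.0, size=(100, 2)),
])
density, dist = variable_kde(points)
labels = merge_minor_modes(assign_to_maxima(density, dist), density, dist)
print("clusters found:", len(np.unique(labels)))
```

Each returned label is the index of the point that serves as the cluster's density maximum, so `points[np.unique(labels)]` gives approximate cluster centers. The pairwise distance matrix makes this sketch quadratic in the number of points, which is acceptable for small snapshots but not for large data sets.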