Recipes of complex stochastic processes in Bayesian machine learning

Abstract: In recent years, the need for automatic identification and classification of documents, images, sounds and other objects has been felt in many aspects of everyday life, such as cataloging, biometric identification, phone banking, spam filtering, search engines and online shopping. In modeling the random mechanism guiding the behavior of the objects being processed, exchangeability, latent variables, high dimensionality, sharing and sparsity have emerged as the key concepts. A simple stochastic process that leads to natural sharing behavior, and thus to a distribution on partitions, is the Chinese restaurant process (CRP). The process is intimately connected with the Dirichlet process, the most widely studied object in Bayesian nonparametrics. When a virtually infinite number of features are considered simultaneously, a factorial analog of the sharing pattern of the CRP is generated by the Indian buffet process (IBP). In this talk, we outline the basic properties of the CRP and the IBP, in particular their representations as limits of multinomial processes, the predictive distributions of new objects, stick-breaking representations, and their relations with the Dirichlet process and the beta process, respectively, the latter being another fundamental process in Bayesian survival analysis. We shall also outline computational strategies and describe potential applications in machine learning.
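As a rough illustration of the sharing behavior of the CRP, the following minimal sketch simulates the standard sequential seating rule: a new customer joins an existing table with probability proportional to the number of customers already seated there, or opens a new table with probability proportional to a concentration parameter. The function name `sample_crp`, the parameter name `alpha` and the fixed seed are assumptions made for this example and do not come from the talk.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Simulate table assignments under a Chinese restaurant process.

    alpha is the (assumed) concentration parameter and must be positive.
    Returns (assignments, table_sizes), where assignments[i] is the table
    index of customer i and table_sizes[k] is the occupancy of table k.
    """
    rng = random.Random(seed)
    table_sizes = []   # number of customers at each occupied table
    assignments = []   # table index chosen by each customer, in order

    for _ in range(n_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table
        else:
            table_sizes[k] += 1     # join an existing table
        assignments.append(k)

    return assignments, table_sizes


if __name__ == "__main__":
    assignments, table_sizes = sample_crp(n_customers=20, alpha=1.0)
    print("assignments:", assignments)
    print("table sizes:", table_sizes)
```

The occupied tables form a random partition of the customers. Larger values of `alpha` tend to produce more tables, while the proportional-to-size rule makes already popular tables more likely to grow.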