Plenary talks

Title: From Mahalanobis Distance to Fractile Graphs via Sample Survey

Speaker: Probal Chaudhuri, Indian Statistical Institute, Kolkata

I shall present a historical review of three major methodological contributions of Prasanta Chandra Mahalanobis in statistics.

Title: i-Fusion: Efficient Fusion Learning for Individualized Inference from Diverse Data Sources

Speaker: Regina Liu, Rutgers University, NJ, USA

Inferences from different databases or studies can often be fused to yield a more powerful overall inference than any individual study alone. Fusion learning refers to effective approaches for synthesizing what is learned from different data sources. Effective fusion learning is in great demand in decision-making in many domains, such as medicine, the life sciences, and the social sciences, where massive automatic data collection from different sources, often with varying forms of complexity and heterogeneity in data structure, is ubiquitous.

This talk presents some new fusion approaches for extracting and merging useful information. Particular focus is on the i-Fusion (individualized fusion) method, an individual-to-clique approach that fuses information from relevant entities to draw inference for a target individual entity. Drawing inference from a clique allows “borrowing strength” from similar entities to enhance the inference efficiency for each individual. The i-Fusion method is flexible, computationally efficient, and can be scaled up to search through massive databases. The key tool underlying these fusion approaches is the so-called “confidence distribution” (CD), which, simply put, is a versatile distributional inference scheme (unlike the usual point or interval inferences) that requires no priors. Time permitting, applications of the i-Fusion method to financial modeling, star formation in galaxies, precision medicine, and forecasting will also be discussed.
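As a side note, the CD concept can be illustrated with a standard textbook case (this is not the talk's i-Fusion machinery, only the underlying idea): for a normal sample with known standard deviation sigma, H_n(mu) = Phi(sqrt(n)(mu - xbar)/sigma) is a confidence distribution for the mean, and familiar point estimates and confidence intervals drop out as functionals of H_n. A minimal sketch, with all data and parameter values illustrative:

```python
import numpy as np
from scipy.stats import norm

def confidence_distribution(sample, sigma):
    """CD for a normal mean with known sigma:
    H_n(mu) = Phi(sqrt(n) * (mu - xbar) / sigma).
    H_n is a distribution function on the parameter space:
    for each mu, H_n(mu) is the confidence level attached to (-inf, mu]."""
    xbar, n = np.mean(sample), len(sample)
    return lambda mu: norm.cdf(np.sqrt(n) * (mu - xbar) / sigma)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)  # simulated data, true mean 2
H = confidence_distribution(x, sigma=1.0)

# Usual inferences are functionals of H: the median of H is xbar,
# and its 2.5%/97.5% quantiles give a 95% confidence interval.
xbar = float(np.mean(x))
ci_lo = xbar + 1.0 / np.sqrt(len(x)) * norm.ppf(0.025)
ci_hi = xbar + 1.0 / np.sqrt(len(x)) * norm.ppf(0.975)
```

Because a CD carries a whole distribution of confidence levels rather than a single point or interval, CDs from separate studies can be combined, which is what makes them a natural vehicle for fusion learning.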

This is joint work with John Kolassa, Jieli Shen and Minge Xie, Rutgers University.

Title: Statistical learning in models made of modules

Speaker: Christian Robert, Ceremade - Université Paris-Dauphine, Paris

In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference, prediction, or decision problem. In such circumstances, it is convenient to use a graphical model to represent the statistical dependencies, via a set of connected "modules", each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the estimate and update of others, often in unpredictable ways. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of approaches that restrict the information propagation between modules, for example by restricting propagation to only particular directions along the edges of the graph. In this talk, we investigate why these modular approaches might be preferable to the full model in misspecified settings. We propose principled criteria to choose between modular and full-model approaches. The question arises in many applied settings, including large stochastic dynamical systems, meta-analysis, epidemiological models, air pollution models, pharmacokinetics-pharmacodynamics, and causal inference with propensity scores.

This is joint work with Pierre Jacob, Lawrence Murray, and Chris Holmes.