ISI-ISM-ISSAS Joint Conference, 2017


Talks

Hironori Fujisawa, ISM

Robust and Sparse Regression Modelling

For high-dimensional data, many sparse regression methods have been proposed. However, they may not be robust against outliers. Recently, the use of density power weights has been studied for robust parameter estimation, and the corresponding divergences have been discussed. One such divergence is the gamma-divergence, and the robust estimator based on it is known to have strong robustness. Here, we consider robust and sparse regression based on the gamma-divergence. We extend the gamma-divergence to the regression problem and show that it retains strong robustness under heavy contamination even when the outliers are heterogeneous. The loss function is constructed from an empirical estimate of the gamma-divergence with sparse regularization, and the parameter estimate is defined as the minimizer of the loss function. To obtain the robust and sparse estimate, we propose an efficient update algorithm that monotonically decreases the loss function. In particular, we discuss the linear regression problem with L1 regularization in detail. In numerical experiments and real data analyses, the proposed method outperforms existing robust and sparse methods.
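
A minimal sketch, under illustrative assumptions, of what a gamma-divergence-type loss with an L1 penalty can look like for Gaussian linear regression; the exact empirical loss, the choices of gamma and lambda, and the monotone update algorithm follow the authors' paper.

```python
# Sketch only: gamma-cross-entropy of a Gaussian linear regression plus an L1
# penalty.  Residuals enter through exp(-gamma * r^2 / (2 sigma^2)), so large
# residuals (outliers) contribute almost nothing to the fit.
import numpy as np

def gamma_lasso_loss(beta, X, y, sigma, gamma=0.5, lam=0.1):
    r = y - X @ beta                                   # residuals
    t1 = 0.5 * np.log(2 * np.pi * sigma**2) \
         - np.log(np.mean(np.exp(-gamma * r**2 / (2 * sigma**2)))) / gamma
    # second term: log of the integral of f^(1+gamma), which does not involve y
    t2 = -(gamma * np.log(2 * np.pi * sigma**2) + np.log(1 + gamma)) / (2 * (1 + gamma))
    return t1 + t2 + lam * np.sum(np.abs(beta))
```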

This is a joint work with Takayuki Kawashima, Ph.D. student of the Institute of Statistical Mathematics, Japan.

Ci-Ren Jiang, ISSAS

Sensible Functional Linear Discriminant Analysis

The focus of this paper is to extend Fisher's linear discriminant analysis (LDA) to both densely recorded functional data and sparsely observed longitudinal data for general \(c\)-category classification problems. We propose an efficient approach to identify the optimal LDA projections in addition to managing the noninvertibility issue of the covariance operator emerging from this extension. A conditional expectation technique is employed to tackle the challenge of projecting sparse data to the LDA directions. We study the asymptotic properties of the proposed estimators and show that asymptotically perfect classification can be achieved in certain circumstances. The performance of this new approach is further demonstrated with numerical examples.
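
As background (classical LDA, not a result of the paper), the leading Fisher projection being extended is the one that maximizes the ratio of between-class to within-class variation,
\[
\beta^{*} \;=\; \arg\max_{\beta} \; \frac{\beta^{\top} B \beta}{\beta^{\top} W \beta},
\]
where \(B\) and \(W\) are the between-class and within-class covariance matrices (subsequent directions are found under orthogonality constraints). In the functional setting, \(W\) becomes a covariance operator, whose noninvertibility is the issue the paper addresses.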

Swagata Nandi, ISI

Estimating the fundamental frequency using a modified Newton-Raphson algorithm

In this talk, we propose a modified Newton-Raphson algorithm to estimate the frequency parameter of the fundamental frequency model in the presence of additive stationary error. With a proper step-factor modification, we start the algorithm from an initial estimator of order \(O_p(n^{-1})\) and obtain an estimator with convergence rate \(O_p(n^{-3/2})\), the same rate as the least squares estimator. We will discuss numerical results using simulated data as well as real datasets.
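
A minimal sketch, with illustrative assumptions, of a step-factor-modified Newton-Raphson iteration for a fundamental frequency: the criterion here is the sum of the periodogram over the first few harmonics, and the step factor of 1/4, the numerical derivatives, and the fixed iteration count are placeholders for the quantities derived in the authors' paper.

```python
import numpy as np

def harmonic_criterion(lam, y, p=3):
    """Sum of the periodogram of y at the harmonics lam, 2*lam, ..., p*lam."""
    n, t = len(y), np.arange(len(y))
    return sum(abs(np.sum(y * np.exp(-1j * j * lam * t)))**2 for j in range(1, p + 1)) / n

def modified_newton_raphson(y, lam0, p=3, step=0.25, iters=20, h=1e-5):
    """Damped Newton iterations started from a consistent initial estimator lam0."""
    lam = lam0
    for _ in range(iters):
        q0 = harmonic_criterion(lam, y, p)
        qp = (harmonic_criterion(lam + h, y, p) - harmonic_criterion(lam - h, y, p)) / (2 * h)
        qpp = (harmonic_criterion(lam + h, y, p) - 2 * q0 + harmonic_criterion(lam - h, y, p)) / h**2
        lam -= step * qp / qpp        # the step factor damps the Newton correction
    return lam
```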

This is a joint work with Debasis Kundu of IIT Kanpur.

Yuya Ariyoshi, ISM

Space Debris Modelling for Sustainable Space Utilization

Space debris consists of artificial, defunct objects orbiting the Earth, including abandoned satellites, rocket bodies, breakup fragments, and so on. Even small debris, such as a breakup fragment, can destroy or disable an operational spacecraft in a collision. The number of artificial objects in space is currently increasing, and the space environment is being degraded by space debris.

This presentation introduces space debris modeling for evaluating the long-term evolution of the space environment. The first half presents the current space environment around the Earth, and then the future space debris population predicted using NEODEEM (Near Earth Orbital Debris Environment Evolutionary Model) is demonstrated. The latter half reports on the estimation of physical properties of fragments generated by breakup events using historical orbital elements.

This is partially a joint work with Toshiya Hanada of Kyushu University, Japan and Satomi Kawamoto of Japan Aerospace Exploration Agency, Japan.

Jing-Shiang Hwang, ISSAS

Detecting the spread of personal mood through networks using online contact diaries

Most existing studies have demonstrated that certain emotions tend to spread from direct contacts over a short period of time, but relatively few have examined whether personal mood may also spread from indirect contacts within social networks over a longer period of time. By extending the bottom-up approach to social network studies of diffusion, we aim to examine such a pattern of mood spread using data collected via an online survey platform, ClickDiary, over a seven-month period between May 1 and November 30, 2014. During the study period, 133 diary keepers recorded 127,455 contacts with 12,070 persons. Diary keepers reported the mood status of their network members during each contact on a scale from 1 to 4, representing poor, good, very good and excellent, respectively. The overall mood of each network member was calculated as the average of that person’s mood scores over the seven-month study period. Diary keepers also rated how well a given pair of network members knew each other. These entries helped construct the complete contact network of each diary keeper, with rich information about the interpersonal ties and contacts among all actors involved, as well as their moods during the contacts, as reported by the diary keepers.

Using mixed-effects models while controlling for covariates at both the tie and contact levels, we analyzed how personal mood varies with the moods of those within the complete contact networks. The results showed that personal mood varied significantly with the average mood among those who were directly connected to the person, while the positive effect, reduced by about a half, remained significant among those connected to the person at two degrees of separation. The mood of anyone separated by more than two degrees was statistically irrelevant. We concluded that personal mood is likely to spread through strong connections and intense contacts. Our findings revealed that such transmissions could extend up to two degrees of separation within contact networks.

Anil K. Ghosh, ISI

Multi-scale Classification Using Localized Spatial Depth

In this talk, we develop and investigate a new classifier based on features extracted using spatial depth. Our construction is based on fitting a generalized additive model to the posterior probabilities of the competing classes. To cope with the possibly multimodal and non-elliptic nature of the population distributions, we also develop a localized version of spatial depth and use it with varying degrees of localization to build the classifier. Final classification is done by aggregating several posterior probability estimates, each obtained using this localized spatial depth with a fixed scale of localization. The proposed classifier can be conveniently used even when the dimension of the data is larger than the sample size, and its good discriminatory power for such data is established using theoretical as well as numerical results.
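
A minimal sketch, with illustrative assumptions, of spatial depth and one way to localize it with a kernel; the paper's actual definition of localized spatial depth, its scales of localization, and the aggregation across scales may differ.

```python
import numpy as np

def spatial_depth(x, data):
    """Classical spatial depth: 1 minus the norm of the average unit vector from x to the data."""
    diff = x - data
    norms = np.linalg.norm(diff, axis=1)
    u = diff[norms > 0] / norms[norms > 0, None]
    return 1.0 - np.linalg.norm(u.mean(axis=0))

def localized_spatial_depth(x, data, h=1.0):
    """Illustrative localization: downweight distant points by a Gaussian kernel with bandwidth h."""
    diff = x - data
    norms = np.linalg.norm(diff, axis=1)
    keep = norms > 0
    w = np.exp(-0.5 * (norms[keep] / h) ** 2)
    u = diff[keep] / norms[keep, None]
    return 1.0 - np.linalg.norm((w[:, None] * u).sum(axis=0) / w.sum())
```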

This is a joint work with Subhajit Dutta and Soham Sarkar.

Su-Yun Huang, ISSAS

On the weak convergence and Central Limit Theorem of blurring and nonblurring processes with application to robust location estimation

Weak convergence and the associated central limit theorem for blurring and nonblurring processes will be presented and then applied to robust estimation of a location parameter. Simulation studies show that location estimation based on the blurring process is more robust, and often more efficient, than that based on the nonblurring process.

Shogo Kato, ISM

Some properties of a family of distributions on the sphere related to the Möbius transformation

We discuss some properties of a family of distributions on the sphere. The family is a generalization of the wrapped Cauchy or circular Cauchy family on the circle. Its connection with the Möbius transformation is explored. It is shown that the family is closed under the Möbius transformation on the sphere of the sample space and that there is a similar induced transformation on the parameter space. The family is related to the conformal Cauchy distribution on the Euclidean space via the stereographic projection. Some properties of a marginal distribution of the spherical Cauchy such as certain moments and a closure property associated with the real Möbius group are obtained. Maximum likelihood estimation is studied in some detail. Closed-form expressions for the maximum likelihood estimators are available when the sample size is not greater than three, and the unimodality holds for the maximized likelihood functions.
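
For orientation, the circular member that the family generalizes is the wrapped (circular) Cauchy distribution, whose density is (standard background, not a result of the talk)
\[
f(\theta; \mu, \rho) \;=\; \frac{1-\rho^{2}}{2\pi\left\{1+\rho^{2}-2\rho\cos(\theta-\mu)\right\}}, \qquad 0 \le \theta < 2\pi,\; 0 \le \rho < 1;
\]
the spherical family discussed in the talk plays the analogous role on higher-dimensional spheres.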

This is a joint work with Peter McCullagh of the University of Chicago, USA.

Anish Sarkar, ISI

Towards a finer result about the learning-from-neighbour model

I will introduce the learning-from-neighbour model, briefly review the known results, and describe the result we attempt to establish for this model. Finally, for a simplified model, I will show how a similar result can be obtained.

This is a joint work with Dr. Kumajit Saha.

Arijit Chakrabarty, ISI

Asymptotic behaviour of Gaussian minima

In this work, we investigate what happens when an entire sample path of a smooth Gaussian process on a compact interval lies above a high level. Specifically, we determine the precise asymptotic probability of such an event, the extent to which the high level is exceeded, the conditional shape of the process above the high level, and the location of the minimum of the process given that the sample path is above a high level.

This is a joint work with Gennady Samorodnitsky.

Krishanu Maulik, ISI

Strong Law for Urn Models with Random Replacement Matrices

Urn models were introduced by Eggenberger and Pólya in 1923. Since then the model has been studied intensively and has found applications in modelling social behaviour, gas laws, population dynamics, database management, reinforcement learning, analysis of algorithms, clinical trials and so on. Originally, the replacement matrices were taken to be multiples of the identity matrix. However, later extensions allowed more general ones, where the balance condition (equal row sums) and nonnegativity of the entries of the replacement matrices were relaxed; even random replacement matrices were allowed. One of the fundamental questions is the asymptotic behaviour of the configuration vector of the urn. It is known that the vector scales linearly, but the limit differs interestingly depending on the replacement matrix: it is random for multiples of the identity matrix, where it is a Dirichlet random vector whose distribution depends on the initial configuration, and deterministic in the positive regular case, where it is the left eigenvector corresponding to the principal (Perron-Frobenius) eigenvalue and is independent of the initial configuration. However, such results typically require the replacement matrix to be balanced or to have finite second moment.

We consider a model where the replacement matrices have nonnegative entries but are random, and the \(n\)-th draw is independent of the corresponding replacement matrix given the past. We also assume a finite \(p\)-th moment for the matrices and that the replacement matrix sequence is conditionally \(L^p\) bounded. Further, the conditional expectations of the replacement matrices are assumed to be concentrated around an irreducible matrix \(H\), which is possibly random. Using stochastic approximation techniques, we show that the configuration vector scales linearly almost surely and that the limit is the product of the principal eigenvalue of \(H\) and its corresponding left eigenvector, normalized to be a probability vector.

This is a joint work with Ujan Gangopadhyay, part of which constituted his M.Stat. project at Indian Statistical Institute.

Daichi Mochihashi, ISM

The Infinite Tree Hidden Markov Model

Hidden Markov models (HMMs) are widely used in statistics, computer science and other fields, including time series analysis, and can be regarded as a time-series extension of a mixture model. Discrete HMMs generally assume a discrete hidden state for each time frame of the observation; a hidden state sequence would look like “7 4 2 2 15 …”, for example. However, such a representation is not always satisfactory, because in many cases hidden states should be structured and hierarchical: each word of a language has a part-of-speech such as “noun/name” or “verb/transitive/past”, and the state of a chemical plant could be classified as “normal but potentially dangerous” or “disordered but safe”.

Inducing a latent hierarchy of states from time-series data in an unsupervised fashion is a difficult task, because the tree structure is completely unknown a priori.

In this talk, I will present a nonparametric Bayesian solution to this problem that enables learning of tree-structured hidden states of potentially infinite width and depth. By defining the transitions between states through tree-structured hierarchical random measures, this model, the Infinite Tree HMM, can be regarded as an infinite-tree extension of the hidden Markov model and is also related to diffusion priors.

Hsuan-Yu Chen, ISSAS

Applications of Bioinformatics in Precision Medicine

The aim of precision medicine is to develop patient-tailored treatment or management. To date, microarrays, next-generation sequencers, and LC-MS/MS can be used to measure changes at different molecular levels, such as the genome, transcriptome, and proteome, enabling researchers to design patient-tailored treatment or management based on a patient's molecular background. However, analyzing such large molecular datasets and linking them to clinical information raise challenges in precision medicine. In addition, the heterogeneity of genomic backgrounds across different races is an important issue. Hence, the development of bioinformatics approaches can overcome these challenges. In this talk, for the post-diagnosis setting, the integration of biological pathways and prognosis will be introduced. For disease screening, a whole-genome sequencing approach for identifying a germline risk allele of lung adenocarcinoma will be demonstrated. For druggable target identification, a web-based tool for transcription factor analysis will also be introduced. Furthermore, analyses of the genomic backgrounds of different races and diseases will be discussed. Finally, pharmacoeconomic issues of targeted therapies will also be mentioned.

Saurabh Ghosh, ISI

Effect of Population Stratification on Powers of Association Tests for Quantitative Traits

The effect of population stratification on tests of genetic association for both binary and quantitative traits has traditionally been studied in the context of false positive rates. It is well known that a test based on a sample comprising genetically or phenotypically heterogeneous subpopulations is susceptible to an inflated rate of false positives. However, this adverse effect on the false positive rate pertains only to population-based association tests, not to family-based tests of transmission disequilibrium. On the other hand, the effects of population stratification on the false negative rates of association tests based on either study design have remained largely unexplored. Our aim is to investigate, both analytically and empirically, the possible marginal and joint effects of genetic and phenotypic heterogeneities on the false negative rates (and hence power) of both population-based and family-based tests for quantitative traits with controlled false positive rates. In light of the fact that both phenotypic and genetic heterogeneities are necessary to inflate the false positive rates of population-based tests (Haldar and Ghosh, 2012), we study only the marginal effects of phenotypic or genetic heterogeneity on the powers of three popular population-based association tests: ANOVA, linear regression with an additive allelic effect, and the Kruskal-Wallis test. Since the family-based design protects tests of association from inflated false positive rates, we evaluate both the marginal and joint effects of genetic and phenotypic heterogeneities on three model-free family-based tests for transmission disequilibrium: the logistic regression based method proposed by Waldman et al. (1999), FBAT (Lange et al., 2002), and TBAT, based on the classical trio design in a logistic regression framework proposed by us, which yields power comparable to that of FBAT. We also carry out extensive simulations under different genetic models to assess the extent of reduction in the powers of the different tests for different levels of stratification.

This is a joint work with Tanushree Haldar.

Masao Ueki, Kurume University, Japan

Rapid and accurate genetic predictive modelling for large-scale genetic studies

We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphism (SNP) data from genome-wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening used in the gene score method, our method continuously screens SNPs based on the output of a standard univariate analysis of the marginal association of each SNP. At the second stage, the predictive model is built by generalized ridge regression using the screened SNPs simultaneously, with SNP weights determined by the strength of the marginal association. Continuous SNP screening by smooth thresholding not only makes prediction stable but also leads to a closed-form expression for the generalized degrees of freedom (GDF). The GDF leads to Stein's unbiased risk estimate (SURE), which enables a data-dependent choice of the optimal SNP screening cutoff without cross-validation. Our method is very rapid because the computationally expensive genome-wide scan is required only once, in contrast to penalized regression methods such as the lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and shows performance comparable to, and sometimes better than, the lasso and elastic net, which are known to have good predictive ability but heavy computational cost. Application to whole-genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.
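
A minimal sketch, not the authors' implementation, of the two-stage idea: continuous SNP screening from marginal statistics followed by a weighted ridge fit on the surviving SNPs. The particular form of the soft weights, the cutoff tau and the ridge penalty lam below are illustrative assumptions; the paper chooses the cutoff via SURE rather than by hand.

```python
import numpy as np

def smooth_threshold_ridge(X, y, tau=2.0, lam=1.0):
    n, p = X.shape
    Xc, yc = X - X.mean(0), y - y.mean()
    # marginal z-like statistics: sqrt(n) times the sample correlations
    z = (Xc * yc[:, None]).sum(0) / (Xc.std(0) * yc.std() * n) * np.sqrt(n)
    w = np.maximum(0.0, 1.0 - (tau / (np.abs(z) + 1e-12)) ** 2)   # continuous screen in [0, 1)
    keep = w > 0                                                  # SNPs surviving the screen
    Xw = Xc[:, keep] * w[keep]                                    # weight columns by marginal strength
    beta = np.linalg.solve(Xw.T @ Xw + lam * np.eye(keep.sum()), Xw.T @ yc)
    return keep, w, beta
```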

This is a joint work with Gen Tamiya of Tohoku University, Japan.

Antar Bandyopadhyay, ISI

Generalized Pólya Urn Schemes With Negative But Linear Reinforcements

In this work, we consider a new type of urn scheme in which the selection probabilities are proportional to a weight function that is linear but decreasing in the proportions of the existing colours, leading to what we refer to as a negatively reinforced urn scheme. We establish the almost sure limit of the random configuration for any balanced replacement matrix \(R\). In particular, we show that the limiting configuration is uniform on the set of colours if and only if \(R\) is a doubly stochastic matrix. We further establish the almost sure limit of the vector of colour counts and prove central limit theorems for the random configuration as well as for the colour counts of this negatively reinforced urn process.
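
A minimal simulation sketch (illustrative, not from the talk) of a negatively reinforced urn: a colour is selected with probability proportional to a linear, decreasing weight of its current proportion, here \(w(x) = \theta - x\). The replacement matrix below is balanced and doubly stochastic, the case for which the stated result gives a uniform limiting configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])           # balanced (row sums 1) and doubly stochastic
counts = np.array([5.0, 1.0])        # initial configuration
theta = 1.0

for n in range(100000):
    prop = counts / counts.sum()
    w = theta - prop                  # decreasing linear weight of the proportions
    colour = rng.choice(2, p=w / w.sum())
    counts += R[colour]               # add the replacement row of the selected colour

print(counts / counts.sum())          # should be close to the uniform vector [0.5, 0.5]
```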

This is a joint work with Gursharn Kaur.

Chi-Lun Cheng, ISSAS

Two Problems in Measurement Error Models

Measurement error (errors-in-variables) models are alternatives to regression models: the latter assume the independent variables are measured without error, whereas the former assume the independent variables are measured with error. This talk discusses two problems in measurement error models. There is a vast literature on linear and nonlinear measurement error models; however, most papers concentrate on point estimation, and less is known about other statistical inference problems such as model/variable selection, diagnostics, etc. The first problem focuses on such issues. The second problem is more philosophical: if the measurement error is “too large”, we might not be able to obtain good inferences. But how large is too large? We address this issue in a preliminary way.

Shuhei Mano, ISM

Multiplicative measures on partitions and A-hypergeometric system

Multiplicative measures on partitions appear in various statistical settings, such as nonparametrics and sampling theory. They arise as sampling distributions from prior processes in Bayesian nonparametrics. The conditional measure is an algebraic exponential family whose normalization constant is the A-hypergeometric polynomial associated with the rational normal curve, or the associated partial Bell polynomial. The maximum likelihood estimators (MLEs) of the full and curved exponential families are studied in terms of the information geometry of the Newton polytopes. In particular, it is shown that the MLE does not exist for the full exponential family. Algebraic methods to evaluate the A-hypergeometric polynomials numerically are also provided.
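
As background for the normalization constant mentioned above, a minimal sketch (standard combinatorics, not the talk's algebraic methods) that computes the partial Bell polynomial \(B_{n,k}(x_1,\dots,x_{n-k+1})\) by its usual recurrence.

```python
# Recurrence: B_{n,k} = sum_i C(n-1, i-1) * x_i * B_{n-i, k-1},
# with B_{0,0} = 1 and B_{n,0} = B_{0,k} = 0 otherwise.
from math import comb
from functools import lru_cache

def partial_bell(n, k, x):
    """x is 1-indexed via x[i-1]; returns B_{n,k}(x_1, ..., x_{n-k+1})."""
    @lru_cache(maxsize=None)
    def B(n, k):
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0:
            return 0
        return sum(comb(n - 1, i - 1) * x[i - 1] * B(n - i, k - 1)
                   for i in range(1, n - k + 2))
    return B(n, k)

# With all x_i = 1, B_{n,k} reduces to a Stirling number of the second kind.
print(partial_bell(5, 2, [1] * 5))   # 15
```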

Junji Nakano, ISM

Summarizing aggregated symbolic data with categorical variables

In the recent “big data” era, huge amounts of individual data are available. They are often divided into naturally defined groups, and sometimes we are mainly interested in the differences among the groups rather than among individuals. Such groups can be characterised by several descriptive statistics calculated using information about the joint distribution. Symbolic data analysis is a method to handle such groups, mainly using information from the marginal distributions of the variables. We propose to use contingency tables between pairs of variables to describe a group when each individual record consists of several categorical variables, and call these aggregated symbolic data. We further propose to summarize them by a small number of statistics so that they can be visualized and interpreted.
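
A minimal sketch (illustrative toy data, not the authors' implementation) of building such aggregated symbolic data: for each group, the pairwise contingency tables of its categorical variables. The summary statistics and visualisation follow the authors' paper.

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "sex":    ["m", "f", "f", "m", "m", "f"],
    "smoker": ["y", "n", "y", "n", "n", "y"],
})

# One contingency table per (group, variable pair)
tables = {
    (g, v1, v2): pd.crosstab(sub[v1], sub[v2])
    for g, sub in df.groupby("group")
    for v1, v2 in combinations(["sex", "smoker"], 2)
}
print(tables[("A", "sex", "smoker")])
```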

This is a joint work with Nobuo Shimizu at the Institute of Statistical Mathematics, Japan, and Yoshikazu Yamamoto at Tokushima Bunri University, Japan.

Deepayan Sarkar, ISI

Blind Deconvolution using Natural Image Priors

Blurring of photographic images due to camera shake is quite common, and recovering the underlying image from such photographs is an interesting inference problem. Ignoring rotations, the blurring process can be modeled as a filtering or convolution of the underlying image with a “blur kernel” or “point spread function”, and the recovery problem is thus referred to as “deconvolution”. The problem is well studied when the blur kernel is known. However, blind deconvolution, where the blur kernel is unknown, is more difficult and in fact ill-posed unless additional assumptions are made. Considerable progress on this problem has been made during the last decade by making ‘natural’ assumptions about the unknown image in the form of a prior. In this talk, we will give an overview of the problem, summarize current approaches to solving it, and describe a generalization of the commonly used prior family.
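
A minimal sketch (toy values only) of the forward blur model the talk starts from: the observed image is the latent image convolved with a blur kernel plus noise. The image prior and the blind-deconvolution estimation itself follow the approaches surveyed in the talk.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
latent = rng.random((64, 64))                  # stand-in for the true image
kernel = np.ones((5, 5)) / 25.0                # a simple box blur kernel
blurred = convolve2d(latent, kernel, mode="same", boundary="symm") \
          + 0.01 * rng.standard_normal((64, 64))   # observed image
```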

This is a joint work with Kaustav Nandy.

Tso-Jung Yen, ISSAS

Solving Fused Group Lasso Problems via Block Splitting Algorithms

We propose a distributed optimization-based method for solving the fused group lasso problem, in which the penalty function is a sum of Euclidean distances between pairs of parameter vectors. As a result, the penalty function is not separable in these parameter vectors. To make the penalty function separable, we introduce a set of equality constraints that connect each parameter vector to a group of paired auxiliary variables. This separability allows us to solve the fused group lasso problem with an iterative algorithm in which most tasks can be carried out independently in parallel. We evaluate the performance of the parallel algorithm by carrying out fused group lasso estimation for regression models on simulated data sets. Our results show that the parallel algorithm has a massive advantage over its non-parallel counterpart in terms of computational time and memory usage. In addition, with additional steps in each iteration, the parallel algorithm can obtain parameter values almost identical to those obtained by the non-parallel algorithm.
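
A minimal sketch (illustrative values) of the coupling penalty described above: a sum of Euclidean distances over pairs of parameter vectors. The block-splitting algorithm in the talk decouples it by introducing auxiliary copies of each vector, one per pair, constrained to equal the original.

```python
import numpy as np

def fused_group_penalty(beta, pairs, lam=1.0):
    """beta: dict of parameter vectors; pairs: list of (i, j) index pairs."""
    return lam * sum(np.linalg.norm(beta[i] - beta[j]) for i, j in pairs)

beta = {0: np.array([1.0, 2.0]), 1: np.array([1.5, 1.0]), 2: np.array([0.0, 0.0])}
print(fused_group_penalty(beta, pairs=[(0, 1), (1, 2)]))
```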

Yasmeen Akhtar, ISSAS

Generalized covering designs: a new class of experimental designs bridging covering arrays and orthogonal arrays

Orthogonal arrays are well known for their applications in designing fractional factorial experiments. Their existence is restricted because all tuples must appear an equal number of times, but this constraint guarantees equivalent variation estimation for all tuples. On the other hand, covering arrays are an important class of designs in software testing. Their existence is less restrictive because all tuples only need to appear at least once, but tuples that appear only once provide no measure of variation and thus cannot resist outliers. In this talk, we introduce a new class of experimental designs, namely “Generalized Covering Designs (GCDs)”, which fill the gap between orthogonal arrays and covering arrays by requiring all tuples to appear at least \(\lambda\) times, where \(\lambda\) is a user-defined parameter. We theoretically study the properties of GCDs, and develop a systematic method to construct families of GCDs with minimum run sizes under different numbers of factors, numbers of levels, strengths and values of \(\lambda\).
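
A minimal sketch (illustrative, not the authors' construction method) that checks the defining coverage property stated above: in every set of `strength` columns, every level combination appears at least \(\lambda\) times.

```python
from itertools import combinations, product
from collections import Counter
import numpy as np

def is_generalized_covering(array, levels, strength, lam):
    array = np.asarray(array)
    for cols in combinations(range(array.shape[1]), strength):
        counts = Counter(map(tuple, array[:, cols]))
        if any(counts[t] < lam for t in product(range(levels), repeat=strength)):
            return False
    return True

# A full factorial in 3 two-level factors covers every pair of levels twice.
full = np.array(list(product([0, 1], repeat=3)))
print(is_generalized_covering(full, levels=2, strength=2, lam=2))   # True
```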

This is a joint work with Frederick Kin Hing Phoa.

Yi-Hau Chen, ISSAS

Joint regression analysis of marginal quantile and quantile association in longitudinal studies, with application to adolescent body mass index data

This work proposes joint regression analysis of the marginal quantiles of longitudinal/clustered outcomes as well as the association between pairs of outcomes, where the association measures the tendency of concordance between pairs of outcomes with respect to their marginal quantiles. The motivation comes from a longitudinal adolescent body mass index (BMI) study in which both the marginal quantile regression of BMI and the tendency that an adolescent with BMI above the 75th population quantile at some age would still have BMI above the 75th population quantile at a later age are of interest. The new procedure generalizes the ‘alternating logistic regressions’ to marginal quantile regression, and extends the ‘quantile association regression’ to the general analysis of longitudinal and clustered data. A novel bivariate induced smoothing technique is proposed for stable and efficient computation. The application to the longitudinal adolescent BMI study reveals the practical utility of our proposal.

This is a joint work with Drs. Chi-Chuan Yang (Academia Sinica) and Hsing-Yi Chang (National Health Research Institutes).

Shonosuke Sugasawa, ISM

Flexibly Transformed Empirical Best Prediction in Finite Population

For estimating area-specific parameters in a finite population, the empirical best prediction approach is an attractive tool. The key assumption in the approach is that the transformed response variables are normally distributed. However, the use of a known (specified) transformation suffers from possible misspecification of the transformation, which might lead to biased and inefficient prediction. To overcome this difficulty, we propose using a family of transformations and develop a new approach called flexibly transformed empirical best prediction. We suggest a simple estimating method for the transformation parameters based on the profile likelihood, which achieves consistency under reasonable assumptions on the transformation functions. Through simulation and empirical studies, we evaluate the performance of the proposed method together with some existing methods.

This is a joint work with Tatsuya Kubokawa.

Posters

Mansi Garg, ISI

On U-statistics based on associated random variables

Association is a notion of positive dependence between random variables. Associated random variables often occur in models based on monotonic transformations, and hence have several applications in reliability and survival analysis. We discuss the limiting behavior of U-statistics when the underlying sample consists of stationary associated random variables.

Partha Pratim Ghosh, ISI

Characterization of Extreme Copulas

This project aims to characterize the set of extreme points of \(n\)-dimensional copulas (\(n > 1\)) and seeks to represent a copula as a limit of such extreme points. We state and prove results discovered in the course of the project. It is found that for a copula to be an extreme point in the set of all \(n\)-dimensional copulas, it must induce a measure that is singular with respect to Lebesgue measure. We also present a construction of a subset of \(n\)-dimensional extreme copulas using permutation functions such that any \(n\)-dimensional copula is a limit point of this subset with respect to the \(d_\infty\) metric (weak convergence). Such results have natural applications, as illustrated. The project concludes by proving a few more results on the characterization of extreme copulas in terms of measure-preserving functions and by giving a sufficient condition for a copula to be an extreme point.

Zhongliang Guo, ISM

Statistical significance of conditional main effects in molecular properties

Statistical models have been used to predict the physical properties of molecules. Molecular structures are represented by bit vectors, often called molecular fingerprints, according to the presence or absence of chemical fragments. Datasets containing the molecular fingerprints and properties are used to train the models. However, interaction effects have been neglected so far, since the number of combinations of chemical fragments increases exponentially. In this study, we consider the conditional main effect (CME) as an alternative to conventional interaction effects. A CME is defined as the effect of a specific fragment conditioned on the presence of some other fragments in a molecule. We aim to design a multiple testing method that efficiently identifies statistically significant CMEs while controlling the false discovery rate. In the poster, I will show preliminary results with comparisons to the Benjamini-Hochberg procedure and Storey's procedure.
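
A minimal sketch (textbook form, shown only as background for the baseline mentioned above) of the Benjamini-Hochberg step-up procedure: reject the hypotheses with the \(k\) smallest p-values, where \(k\) is the largest index with \(p_{(k)} \le kq/m\).

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresh = (np.arange(1, m + 1) / m) * q          # k * q / m for k = 1, ..., m
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                      # reject the k smallest p-values
    return rejected

print(benjamini_hochberg([0.001, 0.008, 0.04, 0.2, 0.6], q=0.05))
```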

This is a joint work with Ryo Yoshida of the Institute of Statistical Mathematics, Japan.

Jayant Jha, ISI

Regression on a Unit Sphere

We discuss the regression of a point on the surface of a unit sphere in \(d\) dimensions given a point on the surface of a unit sphere in \(p\) dimensions, where \(p\) is possibly different from \(d\). Point projection is combined with rotation and a linear transformation to form the regression link function. The identifiability of the model is proved, and parameter estimation is then discussed. Simulation studies and data analyses are used to illustrate the model.

This is a joint work with Prof. Atanu Biswas of Indian Statistical Institute, Kolkata.

Gursharn Kaur, ISI

Negatively Reinforced Urn Models

In this work we consider general negatively reinforced urn models with finitely many colours. We call an urn scheme negatively reinforced if the selection probability of a colour is proportional to a weight function \(w\) that is decreasing. Under some assumptions on \(w\), we obtain almost sure convergence of the random configuration of the urn for a general replacement matrix \(R\). We show that, depending on the function \(w\) and the replacement matrix \(R\), the limit may be a constant or a random variable.

This is a joint work with Antar Bandyopadhyay.

Kaustav Nandy, ISI

Estimation of Kernel Size from Blurred Images

Recovering the latent image from a blurred image is an interesting inference problem. A blurred image is usually assumed to be a convolution of the true latent image and an unknown "blur kernel" or "point spread function". In most existing work, the dimension of the blur kernel is assumed to be known, while in practice it is not. In our work we try to estimate the blur kernel size from a blurred image so that it can be used subsequently in the image deconvolution.