ISI-ISM-ISSAS Joint Conference, 2014
Chun-houh Chen, ISSAS
Exploratory Data Analysis of Interval-valued Symbolic Data with Matrix Visualization
Symbolic data analysis (SDA) has gained popularity over the past few years because of its potential for handling data of a dependent and hierarchical nature. Among the many methods for analyzing symbolic data, exploratory data analysis (EDA; Tukey, 1977) with graphical presentation is an important one. Recent developments in graphical and visualization tools for SDA include zoom stars, closed shapes, and parallel coordinate plots. Other studies project high-dimensional symbolic data into lower-dimensional spaces using symbolic data versions of principal component analysis, multidimensional scaling, and self-organizing maps. Most graphical and visualization approaches for exploring symbolic data structure inherit the advantages of their counterparts for conventional (non-symbolic) data, but also their disadvantages. Here we introduce matrix visualization (MV) for visualizing and clustering symbolic data, using interval-valued symbolic data as an example; it is by far the most popular symbolic data type in the literature and the one most commonly encountered in practice. Many MV techniques for visualizing and clustering conventional data are adapted to symbolic data, and several techniques are newly developed for symbolic data. Various examples of data with simple to complex structures are used to illustrate the proposed methods.
Kei Kobayashi, ISM
Hypothesis Testing for the Difference of Dendrograms
In this talk, we propose a novel type of
permutation tests for dendrogram data with respect to two metrics
for measuring difference between dendrograms. First the Frobenius
norm is used and consistency and efficiency of the permutation tests
are proved. Next the geodesic distance on a dendrogram space is
used. We use the uniqueness of geodesics on a dendrogram space. The
proposed permutation tests are applied to data analysis of mental
lexicons of English words. The difference of mental lexicons between
native and non-native English speakers is tested for each word
class.
This is joint work with Mitsuru Orita of Kumamoto
University.
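As a rough illustration of the first (Frobenius norm) variant, the following minimal sketch runs a two-sample permutation test. It assumes that each dendrogram has already been encoded as a square matrix (for example, a cophenetic distance matrix) and that the test statistic is the Frobenius norm of the difference between the two group-mean matrices. Both choices, and the function name `frobenius_permutation_test`, are assumptions made for illustration; the abstract does not specify the statistic used in the talk.

```python
import numpy as np


def frobenius_permutation_test(group_a, group_b, n_perm=999, seed=0):
    """Two-sample permutation test on stacks of equally sized square matrices.

    group_a, group_b: array-likes of shape (n_a, m, m) and (n_b, m, m).
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled = np.concatenate([a, b], axis=0)
    n_a = a.shape[0]

    def statistic(x, y):
        # Frobenius norm of the difference between the group-mean matrices.
        return np.linalg.norm(x.mean(axis=0) - y.mean(axis=0), ord="fro")

    observed = statistic(a, b)
    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        idx = rng.permutation(pooled.shape[0])
        if statistic(pooled[idx[:n_a]], pooled[idx[n_a:]]) >= observed:
            count += 1
    # The "+1" terms count the observed labelling as one of the permutations.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

A small p-value indicates that the observed difference between the two groups is larger than most differences obtained under random relabelling.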
Ayanendranath Basu, ISI
Some Recent Advances in Density-Based Minimum Distance Inference
Density-based minimum distance methods have a natural
resistance against model mis-specifications and outliers, and are popular
tools in parametric inference. The density power divergence proposal
(Basu et al. 1998, Jones et al. 2001) presented a class of useful robust
alternatives to the minimum disparity estimation approach. A comprehensive
description of this method is provided in Basu et al. (2011). In the
present talk we will describe some recent developments in the area of
minimum divergence estimation based on the spirit of density power
downweighting, discuss how they extend the scope of inference beyond the
density power divergence, and in the process demonstrate the limitation of
the influence function as a measure of local robustness.
Hsien-Kuei Hwang, ISSAS
Riccati differential equations in applied probability
Riccati equations of the form
\[
y'(z) = a(z) y^2(z) + b(z) y(z) + g(z)
\]
have appeared sporadically in the applied probability literature.
In this talk, I will give a brief review and then propose a general
approach to the asymptotics of their coefficients, which will then
be helpful in establishing the limiting properties of the random
variables in question. New applications to variants of seating
arrangement problems will also be indicated.
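For orientation, a standard fact about such equations (not necessarily the approach taken in the talk) is that a Riccati equation can be linearized. Assuming \(a(z) \neq 0\) and introducing an auxiliary function \(u\), which is notation added here for illustration, the substitution
\[
y(z) = -\frac{u'(z)}{a(z)\,u(z)}
\]
turns the equation above into the second-order linear equation
\[
u''(z) - \left( b(z) + \frac{a'(z)}{a(z)} \right) u'(z) + a(z)\,g(z)\,u(z) = 0 ,
\]
so that properties of \(y\) can be studied through the ratio \(u'/u\).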
Debleena Thacker, ISI
Pólya Urn Schemes with Infinitely Many Colors
In this talk we introduce a new type of urn
model with countably infinitely many colors indexed by an
appropriate infinite set. We mainly consider the indexing set of
colors to be the \(d\)-dimensional integer lattice and consider
balanced replacement schemes associated with bounded increment
random walks on it. We prove central and local limit theorems for
the expected configuration of the urn and show that irrespective of
the null recurrent or transient behavior of the underlying random
walks, the configurations have asymptotic Gaussian distribution
after appropriate centering and scaling. We show that the order of
any non-zero centering is always \({\mathcal O}\left(\log n\right)\)
and the scaling is \({\mathcal O}\left(\sqrt{\log n}\right)\). The
rate of convergence for the central limit theorem at time \(n\) will
be shown to be of the order \({\mathcal O}\left(\frac{1}{\sqrt{\log
n}}\right)\) and bounds similar to the classical Berry-Esseen bound
will be derived. Further we show that for the expected configuration
a large deviation principle (LDP) holds with a good rate
function and speed \(\log n\).
Joint work with Antar Bandyopadhyay.
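The following minimal simulation sketches one randomized reading of such an urn in dimension \(d = 1\). It assumes that the urn starts with a single ball of color 0 and that, at each step, a ball is drawn uniformly at random, returned, and joined by one new ball whose color is the drawn color plus an independent bounded increment. The function name `simulate_urn`, the starting configuration, and the default increments are illustrative assumptions; the abstract does not spell out the replacement rule.

```python
import numpy as np


def simulate_urn(n_steps, increments=(-1, 1), seed=0):
    """Simulate an urn whose colors are integers (the d = 1 lattice).

    Returns an array whose k-th entry is the color of the k-th ball,
    with the initial ball (color 0) at index 0.
    """
    rng = np.random.default_rng(seed)
    colors = np.zeros(n_steps + 1, dtype=np.int64)
    for n in range(1, n_steps + 1):
        # Draw uniformly among the n balls currently in the urn.
        drawn = colors[rng.integers(0, n)]
        # Add one ball: the drawn color shifted by a bounded random-walk step.
        colors[n] = drawn + rng.choice(increments)
    return colors
```

A histogram of the returned colors, centered and scaled on the \(\sqrt{\log n}\) scale, gives an informal picture of the kind of Gaussian limit described above.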
Siva Athreya, ISI
Dense graph limits under respondent-driven sampling
We consider certain respondent-driven sampling
procedures on dense graphs. We show that if the sequence of the
vertex-sets is ergodic then the limiting graph can be expressed in
terms of the original dense graph via a transformation related to
the invariant measure of the ergodic sequence. For specific sampling
procedures we describe the transformation explicitly.
Joint work with Adrian Röllin.
Koji Tsukuda, ISM
On \(L^2\) Space Approach to Detect a Parameter Change in an Ergodic Diffusion Process Model
In this presentation, testing a change of drift
parameters in an ergodic diffusion process model is discussed. For
this problem, past studies chose \( \ell_\infty \) space as the
framework of weak convergence of the proposed \(\sup\)-type test
statistics, that is, Kolmogorov-Smirnov type statistics. On the
other hand, we develop an approach based on limit theorems in an
\(L^2\) space and propose a weighted integral type test statistic,
that is, an Anderson-Darling type statistic, which is expected to have
better power in many cases.
This is joint work with Prof. Y. Nishiyama (Institute of
Statistical Mathematics).
Arvind Ayyer, IISc Bangalore
Connections between Exclusion Processes and Multiclass Queues
Motivated by problems in nonequilibrium
statistical physics, we consider a totally asymmetric multispecies
exclusion process on a finite one-dimensional lattice with periodic
boundary conditions. In the 1990s, physicists found a way to obtain
the stationary distribution by a technique known as the "matrix
ansatz". In 2006, P. Ferrari and J. Martin explicitly constructed
the stationary distribution by using ideas from queueing theory. I
will review both approaches to the proof and describe a
generalization to the partially asymmetric version.
This is joint work with C. Arita, K. Mallick and S. Prolhac.
Frederick K. H. Phoa, ISSAS
Construction of 2-level and 3-level Definitive Screening Designs
Definitive screening (DS) designs have attracted considerable attention from researchers in the design of experiments because of their good design properties and run-size economy. This paper investigates the structure of both 2-level and 3-level DS designs and suggests theoretically driven approaches to constructing these DS designs for any run size. These constructions are generally applicable for any number of factors. The constructed 3-level DS designs are T-optimal, and many of them are D-optimal as well; the rest have high D-efficiencies. A similar situation holds for 2-level DS designs when D-, A- and T-optimalities are considered. The part on 3-level DS designs is joint work with Professor Dennis Lin of Pennsylvania State University. The part on 2-level DS designs is joint work with Professor William Li of the University of Minnesota.
Satoshi Kuriki, ISM
Optimal experimental designs for Fourier and polynomial regressions that minimize volume of tube
Simultaneous confidence bands of a nonlinear
regression are constructed by evaluating the volume of a tube about
a curve or manifold defined as the trajectory of the regression basis
vector (Naiman, 1986). In this talk, we consider optimal
experimental designs that minimize the volume of the tube, that is, that
attain the narrowest confidence band. In the cases of Fourier and
polynomial regressions, the problems are formalized as a
minimization problem over the cone of Hankel positive definite
matrices, where the objective function to minimize is the volume of
the tube expressed as elliptic functions. We show that there exists a
group that leaves our problem invariant, and demonstrate that the
minimization can be achieved by choosing a cross-section of
orbits.
This is joint work with Henry Wynn of the London School of Economics, UK.
Siuli Mukhopadhyay, IIT Bombay
Generalized Multinomial Models
In this talk a family of link functions for the multinomial response model is proposed. The link family includes the multicategorical logistic link as one of its members. Conditions for the local orthogonality of the link and the regression parameters are given. It is shown that local orthogonality of the parameters in a neighbourhood makes the link family location and scale invariant. Simulation studies and a numerical example based on a combination drug study are used to illustrate the proposed parametric link family.
Chen-Hung Kao, ISSAS
Mapping quantitative trait loci under selective genotyping
The selective genotyping approach has been known as a cost-effective strategy to reduce genotyping work and still have the ability to maintain efficiency in detecting quantitative trait loci (QTL). This approach is to select individuals with extreme (high and low) phenotypic values for genotyping and keep the remaining individuals ungenotyped in the entire sample. In this talk, the current and our proposed statistical methods for mapping QTL using the data from the selective genotyping experiment are presented and discussed. The issues in determining critical thresholds for claiming QTL detection under selective genotyping are also discussed. Simulated examples are used for illustration.
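The selection step itself is easy to sketch. The snippet below marks the individuals in the lower and upper tails of the phenotype distribution for genotyping and leaves the rest ungenotyped. The function name `select_extremes` and the per-tail fraction are illustrative assumptions, and the sketch covers only the sampling design, not the QTL mapping methods discussed in the talk.

```python
import numpy as np


def select_extremes(phenotypes, tail_fraction=0.25):
    """Return a boolean mask marking individuals chosen for genotyping.

    The lowest and highest `tail_fraction` of phenotypic values are
    selected; everyone in between stays ungenotyped.
    """
    y = np.asarray(phenotypes, dtype=float)
    k = int(round(tail_fraction * y.size))  # number selected per tail
    order = np.argsort(y)
    selected = np.zeros(y.size, dtype=bool)
    selected[order[:k]] = True            # low-phenotype tail
    selected[order[y.size - k:]] = True   # high-phenotype tail
    return selected
```

Because the genotyped subsample is no longer a random sample of the population, the downstream analysis has to account for the selection, which is the issue the abstract raises about critical thresholds.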
Xiaoling Dou, ISM
Functional Clustering of Mouse Ultrasonic Vocalization Data
Mouse ultrasonic vocalizations (USVs) are studied in various fields of science. However, background noise and varied USV patterns in observed signals make fully automatic analysis difficult. We propose a series of methods to cluster nonharmonic mouse USV data automatically. The procedure includes noise reduction, detection of USV calls, transformation of USV calls into functions, and functional clustering. The proposed methods are shown to be useful with two data sets taken from laboratory mice.
Saurabh Ghosh, ISI
Integrating Multiple Phenotypes For Association Mapping
Most clinical end-point traits are governed by a
set of quantitative and qualitative precursors and a single
precursor is unlikely to explain the variation in the end-point
trait completely. Thus, it may be a prudent strategy to analyze a
multivariate phenotype vector possibly comprising both quantitative
as well as qualitative precursors for association mapping of a
clinical end-point trait. The major statistical challenge in the
analyses of multivariate phenotypes lies in the modelling of the
vector of phenotypes, particularly in the presence of both
quantitative and binary traits in the multivariate phenotype
vector.
For population-based data, we propose a novel Binomial regression
approach that models the likelihood of the number of minor alleles
at a SNP conditional on the vector of multivariate phenotype using a
logistic link function. For family-based data comprising informative
trios, we propose a logistic regression method that models the
transmission probability of a marker allele from a heterozygous
parent conditioned on the multivariate phenotype vector and the
allele transmitted by the other parent. In both the approaches, the
test for association is based on all the regression coefficients.
We carry out extensive simulations under a wide spectrum of genetic
models and probability distributions of the multivariate phenotype
vector to evaluate the powers of our test procedures. We apply the
proposed population-based method to analyze a multivariate phenotype
comprising homocysteine levels, Vitamin B12 levels and affection
status in a study on Coronary Artery Disease and the family-based
method to analyze a vector of four endophenotypes associated with
alcoholism: the maximum number of drinks in a 24 hour period, Beta 2
EEG Waves, externalizing symptoms and the COGA diagnosis trait in
the Collaborative Study on the Genetics of Alcoholism (COGA)
project.
Jing-Shiang Hwang, ISSAS
A stepwise regression algorithm for high-dimensional variable selection
We propose a new stepwise regression algorithm with a simple stopping rule for the identification of influential predictors and interactions among a huge number of variables in various statistical models. As in conventional stepwise regression, at each forward selection step a variable is included in the current model if the test statistic of the enlarged model with the predictor against the current model has the minimum p-value among all the candidates and that p-value is smaller than a predetermined threshold. Instead of using conventional information-type criteria, the threshold is determined by a lower percentile of the beta distribution. We conducted extensive simulation studies to evaluate the performance of the proposed algorithm for various statistical models and found it very competitive and robust compared to several popular high-dimensional variable selection methods.
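The abstract does not say which beta distribution is used. One natural reading, offered here purely as an assumption, rests on the fact that the minimum of m independent uniform p-values follows a Beta(1, m) distribution, whose lower percentile has a closed form. The sketch below computes that percentile; the function name `min_pvalue_threshold` and the default level are hypothetical.

```python
def min_pvalue_threshold(n_candidates, level=0.05):
    """Lower `level` quantile of the Beta(1, n_candidates) distribution.

    Beta(1, m) has CDF 1 - (1 - x)**m, so the quantile at `level`
    is 1 - (1 - level)**(1 / m).
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    return 1.0 - (1.0 - level) ** (1.0 / n_candidates)
```

Under this reading, a forward step would admit the best candidate only if its p-value falls below `min_pvalue_threshold(m)`, where m is the number of candidates still available, so the threshold tightens as the pool of candidates grows.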
Arindam Chatterjee, ISI
Inference using Adaptive Lasso based residuals
We study a linear model with a large number of
covariates. It is shown that under suitable sparsity assumptions,
the residuals based on the Adaptive Lasso estimator can provide
asymptotically valid inference procedures for the underlying unknown
error distribution function.
This is in contrast to existing procedures based on the least
squares estimator, which is known to fail when the number of
covariates is large compared to the sample size.
(Joint work with S. Gupta and S. N. Lahiri.)
Shota Katayama, ISM
Lasso Penalized Model Selection Criteria for High-Dimensional Multivariate Linear Regression Analysis
Model selection criteria for multivariate linear regression analysis that identify relevant predictors play an important role in biometrics, marketing research, engineering, econometrics and many other related research fields. Recently, high-dimensional data in which the sample size is comparable with, or larger than, the dimension of the multiple responses have often appeared in these applications, and classical model selection criteria are not applicable to such data. In this talk, we provide two model selection criteria that accommodate high dimensionality by using a Lasso-penalized likelihood function. The consistency property is also shown under the framework in which the dimension of the multiple responses goes to infinity while the maximum size of the candidate models is of smaller order than the sample size.
Hironori Fujisawa, ISM
Affine Invariant Divergence With Empirical Estimability And Its Applications
In statistical inference, divergences play an
important role. An estimator of a parameter can be obtained as the
minimizer of a divergence. In this talk, we focus on an invariant
divergence under affine transformation of data, and then we obtain
an explicit class of divergences with empirical estimability. It is
proved that this class is uniquely determined under some conditions,
including affine invariance and empirical estimability. A definition
of cross entropy is extended to deal with a broader class of
divergence. We also investigate the relation to the Bregman
divergence.
This is joint work with Takafumi Kanamori of Nagoya
University.
Hsin-Cheng Huang, ISSAS
Regularized Principal Component Analysis for Spatial Data
We consider nonstationary spatial modeling using empirical orthogonal functions (EOFs) based on data observed at p spatial locations with n repeated measurements. Traditionally, EOFs are obtained using approaches related to principal component analysis. However, when data are noisy or n is small, the leading eigenfunctions produced by these methods may lack any spatial structure and have poor physical interpretation. To obtain more precise estimates of eigenfunctions and the spatial covariance function, we propose a regularization approach incorporating smoothness and sparseness of eigenfunctions, which can be applied even when data are observed at irregularly spaced locations. The resulting optimization problem is solved using the alternating direction method of multipliers. Some numerical examples are provided to demonstrate the effectiveness of the proposed method.
Shinsuke Koyama, ISM
Information Gain on Variable Neuronal Firing
The question of how much information can be
theoretically gained from variable neuronal firing rate with respect
to constant mean firing rate is investigated. For this purpose, we
employ the Kullback-Leibler divergence as a measure of information
gain. We first give a statistical interpretation of this information
in terms of detectability of rate variation: the lower bound of
detectable rate variation, below which the temporal variation of
firing rate is undetectable with a Bayesian decoder, is entirely
determined by this information.
We show that the information depends not only on the variation of
firing rates (i.e., signals), but also significantly on the
dispersion properties of neuronal firing described by the shape of
interspike interval (ISI) distribution (i.e., noise properties). It
is shown that under certain conditions, the gamma distribution
attains the theoretical lower bound of the information among all ISI
distributions when the coefficient of variation of ISIs is
given.
On the basis of these theoretical investigations, we propose a
practical method for estimating the information from spike trains,
and apply this method to biological spike data recorded from a
cortical area.
Chen-Hsiang Yeang, ISSAS
Development of nonstandard personalized medicine strategies for cancers with heterogeneous subclones
Cancers are heterogeneous and genetically unstable. Current practice of personalized medicine tailors therapy to heterogeneity between cancers of the same organ type. However, it does not yet systematically address heterogeneity at the single-cell level within a single individual’s cancer or the dynamic nature of cancer due to genetic and epigenetic change as well as transient functional changes. We have developed a mathematical model of personalized cancer therapy incorporating genetic evolutionary dynamics and single-cell heterogeneity, and have examined simulated clinical outcomes. Analyses of an illustrative case and a virtual clinical trial of over 3 million evaluable “patients” demonstrate that augmented (and sometimes counterintuitive) nonstandard personalized medicine strategies may lead to superior patient outcomes compared with the current personalized medicine approach. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression, generally focusing on the average, static, and current properties of the sample. Nonstandard strategies also consider minor subclones, dynamics, and predicted future tumor states. Our methods allow systematic study and evaluation of nonstandard personalized medicine strategies. These findings may, in turn, suggest global adjustments and enhancements to translational oncology research paradigms.
Masaya Saito, ISM
Estimation of outer-regional effect on 2009/2010 epidemic in Japan
The influenza epidemic in the 2009/2010 season in Japan
was dominated by a single strain, the 2009pdm strain. According to
the sentinel observation data of Japan, a single epidemic wave made
the global trend, but multiple small waves superposed on the global
trend are identified. In addition, synchronized abrupt changes in
cases are also observed in several prefectures. In this talk, an
influence on the epidemic in each prefecture from the outside areas
is evaluated by comparing the data with solutions of the SIR model
with a stochastic term.
This is joint work with Seiya Imoto, Rui Yamaguchi, and Satoru Miyano
from the Institute of Medical Science, University of Tokyo, and Tomoyuki
Higuchi from the Institute of Statistical Mathematics, Japan.
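To make "the SIR model with a stochastic term" concrete, the sketch below integrates one possible stochastic SIR model with the Euler-Maruyama scheme. It assumes that the noise enters through the transmission term, with population fractions S, I, R and parameters `beta`, `gamma`, `sigma`. The form of the noise, the parameter names, and the clipping step are all illustrative assumptions; the abstract does not give the model used in the talk.

```python
import numpy as np


def simulate_stochastic_sir(beta, gamma, sigma, i0=1e-3, t_max=100.0,
                            dt=0.05, seed=0):
    """Euler-Maruyama simulation of an SIR model with noisy transmission.

    Assumed model (fractions of the population):
        dS = -beta*S*I dt - sigma*S*I dW
        dI = (beta*S*I - gamma*I) dt + sigma*S*I dW
        dR = gamma*I dt
    """
    rng = np.random.default_rng(seed)
    n = int(round(t_max / dt))
    s = np.empty(n + 1)
    i = np.empty(n + 1)
    r = np.empty(n + 1)
    s[0], i[0], r[0] = 1.0 - i0, i0, 0.0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment
        new_infections = beta * s[k] * i[k] * dt + sigma * s[k] * i[k] * dw
        # Crude safeguard (an assumption): keep the flow between 0 and S.
        new_infections = min(max(new_infections, 0.0), s[k])
        recoveries = gamma * i[k] * dt
        s[k + 1] = s[k] - new_infections
        i[k + 1] = i[k] + new_infections - recoveries
        r[k + 1] = r[k] + recoveries
    return s, i, r
```

Comparing such simulated trajectories with the sentinel data for a prefecture, for instance through a state-space filter, is one way a mismatch attributable to outside influence could be quantified.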