[Turing-Southampton] Statistics lectureship talks
Woods D.C.
D.Woods at soton.ac.uk
Fri Aug 23 18:03:15 BST 2019
Dear all
Apologies for any cross-posting.
On Monday 2 September, lectureship candidates in Statistics will give presentations in the Ketley Room (Level 4 of B54). All are welcome to attend.
Schedule, titles and abstracts are below.
Best wishes
Dave
—
Schedule and titles
09.00 Piao Chen (via Skype)
Estimation of field reliability based on aggregate lifetime data
09.30 Kim May Lee
To add or not to add a new arm to an on-going clinical trial: a decision-theoretic framework
10.00 Reza Drikvandi
LassoPCR: A novel method for analysis of high dimensional data
10.30 Break
11.00 Axel Finke (via Skype)
Limit theorems for sequential MCMC methods
11.30 Chao Zheng
Revisiting Huber’s M-estimation: a tuning-free approach
12.00 Chieh-Hsi (Jessie) Wu
Revealing across-site heterogeneity of nucleotide substitution patterns using Dirichlet process mixture model in a phylogenetic inference
Abstracts
Piao Chen
Estimation of field reliability based on aggregate lifetime data
Many large organizations have developed ambitious programs to build reliability databases by collecting field failure data from a large variety of components. To make the database concise, the component lifetime data are recorded in an aggregate way in these databases. The data format is different from traditional lifetime data and the statistical inference is challenging. In this talk, we propose a general parametric estimation framework for one particular type of the aggregate data, i.e., the failure-censored aggregate data, where each data point is a summation of a series of collective failures representing the cumulative operating time of one component position from system commencement to the last component replacement. We use two common lifetime distributions, i.e., the gamma distribution and the inverse Gaussian distribution, to model the component lifetime. We develop point and interval estimation procedures for the model parameters and the lifetime quantiles. Through extensive simulations, we show that the proposed interval estimation methods uniformly outperform the other competing methods. Finally, we illustrate the proposed models and the inference methods by using a real aggregate dataset.
Kim May Lee
To add or not to add a new arm to an on-going clinical trial: a decision-theoretic framework
Clinical trials are expensive investments that aim to evaluate the efficacy of new treatments on patients. When there are several new treatments available for testing at different times, a platform trial approach can be considered. This trial approach allows for adding new arms to an existing trial that has similar objectives and settings. This feature of adding arms is appealing to practitioners because of the efficiencies from running one trial instead of several: this can lead to the process of testing new treatments being shortened.
This talk will explore the decision of whether or not to add a new treatment arm to an on-going study within a two-stage trial setting. When a new treatment becomes available, the decision of always opening a new arm in an existing trial may reduce the resources for testing the initial treatments and prolong the overall duration to establish the treatment benefits. Conversely, the decision to not add may reduce the chances of patients getting better treatments earlier. I will illustrate a decision-theoretic framework that provides an optimal decision based on the observed data from the initial stage.
Reza Drikvandi
LassoPCR: A novel method for analysis of high dimensional data
High dimensional data are rapidly growing in many domains, for example, in microarray gene expression studies, fMRI data analysis, large-scale healthcare analytics, text/image analysis, natural language processing and astronomy, to name but a few. In the last two decades regularisation approaches have become the methods of choice for analysing high dimensional data. However, obtaining accurate estimates and predictions as well as reliable statistical inference remains a major challenge in high dimensional situations. In this talk we introduce a novel method, called LassoPCR, to overcome this challenge in high dimensional linear regression models. The proposed method benefits from an effective amalgamation of regularisation methods such as lasso and dimensionality reduction techniques such as PCA and sparse PCA. Specifically, the LassoPCR uses all the selected and unselected covariates obtained from lasso variable selection to construct a model that provides more accurate parameter estimates and predictions. We develop a likelihood-based method for parameter estimation in the LassoPCR model, which also allows us to conduct individual and simultaneous statistical inference on regression parameters. We establish the asymptotic properties of the LassoPCR, and evaluate and compare its finite-sample performance with existing regularisation methods using simulations and real data analysis. The LassoPCR also enables us to construct a test for sparsity in the data. The LassoPCR idea can be extended to generalised regression models, especially to high dimensional logistic regression for classification purposes.
Axel Finke
Limit theorems for sequential MCMC methods
Sequential Monte Carlo (SMC) methods a.k.a. "particle filters" are algorithms which can be used to approximate expectations w.r.t. a sequence of probability distributions as well as their normalising constants. This makes them a powerful tool not just for object-tracking applications but also more generally for (Bayesian) inference, model comparison, rare-event estimation or even optimisation.
The main focus of my talk will be on an extension of SMC methods termed "sequential MCMC" methods which have recently become popular in the engineering literature because they demonstrate superior empirical performance to standard SMC methods in some applications. I will present some convergence guarantees for sequential MCMC methods and explain under which conditions they are preferable to standard SMC methods.
This is joint work with Arnaud Doucet (Oxford) and Adam M. Johansen (Warwick & The Alan Turing Institute).
Chao Zheng
Revisiting Huber’s M-estimation: a tuning-free approach
The robustification parameter, which balances bias and robustness, has played a critical role in the construction of sub-Gaussian estimators for heavy-tailed data. Although it can be tuned by cross-validation in traditional practice, in large scale statistical problems such as high dimensional regression and multiple testing, the number of robustification parameters scales with the size of the problem so that cross-validation can be computationally unaffordable. In this talk, I will introduce a new data-driven principle to select the robustification parameter for Huber-type sub-Gaussian estimators in three fundamental problems: mean estimation, linear regression and sparse regression in high dimensions. The proposal is guided by non-asymptotic deviation analysis, and is conceptually different from cross-validation which relies on the mean squared error to assess the fit. The promising performance of the proposed methods, apart from the theoretical justifications, is further illustrated with numerical experiments.
Chieh-Hsi (Jessie) Wu
Revealing across-site heterogeneity of nucleotide substitution patterns using Dirichlet process mixture model in a phylogenetic inference
Phylogenetics studies the ancestral relationships among a group of individuals descended from a common ancestor and the relationships are typically modelled in the form of a tree-like structure. Usually, we only observe data at the leaves of a phylogenetic tree; therefore, when inferring the phylogeny from a nucleotide sequence alignment, we use a nucleotide substitution model to explain the molecular process that gives rise to the genetic diversity observed in the individuals of interest. Specifically, the nucleotide substitution model specifies the nucleotide substitution pattern—the relative exchange frequencies between a pair of nucleotides. There are many substitution models to choose from; however, they do not accommodate the variation in the substitution pattern across sites (positions in nucleotide sequences), leading to potential model-misspecification. Although there are existing tools that aim to partition the alignment to account for such spatial variation, they do not quantify the associated uncertainty.
We solve the problem of estimating across-site heterogeneity in the substitution pattern by employing Dirichlet process mixture (DPM) models to cluster the sites in an alignment. Within each cluster, we apply Bayesian model selection over a set of standard nucleotide substitution models; the variation in substitution models across clusters accounts for the heterogeneity in the substitution pattern across sites. Additionally, the DPM model estimates the clustering directly in the Bayesian framework, so we can automatically quantify the uncertainty in our estimate of the variation.
In an analysis on an RNA virus dataset, our method demonstrates that nucleotide sequences can display a higher level of heterogeneity than the conventional partitioning scheme by codon positions. Furthermore, applying a particular case of our method to analyse linguistic data has produced new findings on the dynamics of language evolution.