I currently work as a Principal Data Scientist at SEEK. Our team provides an experimentation platform that is used to test the performance of new AI tools and services being developed at SEEK.
Previously, I worked as an academic biostatistician at several Australian and UK universities; my full CV can be found here.
I have interests in Bayesian inference, experimental design, survival analysis, and models for longitudinal data from cohort studies. I also enjoy statistical and probabilistic programming. I am a regular user of R and Python. I have contributed to open-source software packages for R and Stan, and to Python, CI/CD, and cloud-based infrastructure used internally at SEEK.
My PhD – awarded by Monash University in 2018 – was entitled “Joint longitudinal and time-to-event models: development, implementation and applications in health research” and was supervised by Prof Rory Wolfe (primary), Dr Margarita Moreno-Betancur, and Dr Michael Crowther.
When not busy crunching data, I have a keen interest in hiking and travelling around some of the most epic spots on the globe.
PhD in Biostatistics, 2018
Monash University, Australia
MSc in Medical Statistics, 2015
University of Leicester, UK
BSc(Hons) in Statistics, 2007
University of Otago, NZ
A full list of my peer-reviewed publications can be found here and my PhD thesis can be found here.
A full list of my talks can be found here.
simsurv is an R package for simulating survival (i.e. time-to-event) data. The user can simulate survival times from standard parametric survival distributions (exponential, Weibull, Gompertz), 2-component mixture distributions, or a user-defined hazard or log hazard function. The latter two features are likely what separates the simsurv package from other packages available for simulating survival data in R. The package implements the methods described in Crowther and Lambert (2013) and is modelled on the survsim package available in the Stata software.
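As a rough illustration of what simulating from a standard parametric survival distribution involves, here is a minimal Python sketch of inverse-transform sampling under a Weibull proportional hazards model. It is not simsurv's code or API: the function name, the parameter names (lam, gamma, beta), and the single binary covariate are all assumptions made for this example, and the parameterisation of the hazard is stated in the docstring.

```python
import math
import random


def simulate_weibull_times(n, lam, gamma, beta, seed=0):
    """Draw n event times from a Weibull proportional hazards model.

    Assumed hazard: h(t | x) = gamma * lam * t**(gamma - 1) * exp(beta * x),
    which gives S(t | x) = exp(-lam * exp(beta * x) * t**gamma).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)       # hypothetical binary covariate (e.g. treatment arm)
        u = 1.0 - rng.random()      # uniform on (0, 1], so log(u) is always defined
        # Inverse transform: solve S(t | x) = u for t.
        t = (-math.log(u) / (lam * math.exp(beta * x))) ** (1.0 / gamma)
        rows.append((x, t))
    return rows


if __name__ == "__main__":
    for x, t in simulate_weibull_times(5, lam=0.1, gamma=1.5, beta=-0.5):
        print(x, round(t, 3))
```

A closed-form inversion like this only works when the cumulative hazard can be inverted analytically. A user-defined hazard generally needs numerical integration and root-finding instead, which this sketch does not attempt.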
rstanarm is an extensive R package for Bayesian applied regression modelling. It is written and maintained by Ben Goodrich and Jonah Gabry. I have contributed code for fitting multivariate mixed models (the stan_mvmer modelling function), joint longitudinal and time-to-event models (the stan_jm modelling function), and time-to-event models themselves (the stan_surv modelling function), as well as a number of post-estimation functions for obtaining predictions and diagnostics for the fitted models. Note that, at the time you read this, the stan_surv modelling function may not yet be available in the CRAN version of rstanarm, in which case see the installation instructions here.
simjm is an R package that allows the user to simulate data from a shared parameter joint model for longitudinal and time-to-event data. The shared parameter joint model from which the simulated data are generated is based on the model formulation described for the stan_jm modelling function in the rstanarm R package. The shared parameter joint model can be univariate (i.e. one longitudinal marker) or multivariate (i.e. more than one longitudinal marker), and a variety of parameterisations are allowed for the association structure between the longitudinal and event submodels.
devr2 is a Stata module that can be used to calculate a deviance-based R-squared measure for models estimated using Stata’s glm command. The measure is based on the method of Cameron and Windmeijer (1997). The module can be easily installed from within your Stata session; simply type ssc install devr2 into the Command window.
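To give a sense of what a deviance-based R-squared measures, here is a minimal Python sketch for the Poisson case only, where the measure can be written as one minus the ratio of the fitted model's deviance to the deviance of an intercept-only model. This is an illustration under that assumption, not a port of devr2; the function names are hypothetical and the fitted means are taken as given rather than estimated.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def deviance_r2(y, mu_fitted):
    """1 - D(fitted) / D(null), where the null model predicts the sample mean."""
    mu_null = sum(y) / len(y)
    d_model = poisson_deviance(y, mu_fitted)
    d_null = poisson_deviance(y, [mu_null] * len(y))
    return 1.0 - d_model / d_null


if __name__ == "__main__":
    counts = [1, 2, 3, 4]
    fitted = [1.2, 1.9, 3.1, 3.8]   # hypothetical fitted means from some Poisson model
    print(round(deviance_r2(counts, fitted), 4))
```

A model that reproduces the observed counts exactly scores 1, and one that does no better than the sample mean scores 0.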