Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models
The renewal model uses the incidence observed over the course of an epidemic to estimate the underlying time-varying effective reproductive number, R(t). The skyline model infers the time-varying effective population size, N(t), that is responsible for the shape of an observed phylogeny of sequences sampled from an infected population. Although the two models solve different epidemiological problems, the bias and precision of their estimates both depend on p-dimensional piecewise-constant descriptions of their variables of interest. At large p, estimates can detect rapid changes but are noisy; at small p, inference is precise but lacks temporal resolution. Surprisingly, no transparent, principled approach for optimally selecting p exists for either model. Usually, p is set heuristically or controlled obscurely by complex algorithms. We present an easily computable and interpretable method for choosing p, based on the minimum description length (MDL) formalism of information theory. Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. As a result, our method optimises p so that R(t) and N(t) estimates properly adapt to the available data. It also outperforms the comparable Akaike and Bayesian information criteria on several model classification problems. Our approach requires some knowledge of the parameter space and exposes the similarities between renewal and skyline models.
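The trade-off in p can be made concrete with a minimal sketch for the renewal model. It assumes a Poisson observation model, I_t ~ Poisson(R_t Λ_t), in which Λ_t is the past incidence weighted by a known generation-time distribution w, and it splits the time series into p contiguous groups of near-equal size. The abstract does not give the form of the MDL criterion, so the selection step below uses the Bayesian information criterion, one of the baselines named above, purely as a stand-in. All function names, the grouping rule, and the toy data are hypothetical and chosen for illustration only.

```python
import math


def total_infectiousness(incidence, w):
    """Lambda_t = sum_{s>=1} w[s-1] * I_{t-s}: past cases weighted by the
    (assumed known) generation-time distribution w."""
    return [
        sum(w[s - 1] * incidence[t - s] for s in range(1, min(t, len(w)) + 1))
        for t in range(len(incidence))
    ]


def fit_piecewise_r(incidence, lam, p):
    """Maximum-likelihood fit of p piecewise-constant R values, assuming
    I_t ~ Poisson(R_t * Lambda_t). Returns (r_hat, log_likelihood, n_used)."""
    # Time points with Lambda_t = 0 carry no information about R and are skipped.
    usable = [t for t in range(len(incidence)) if lam[t] > 0]
    n = len(usable)
    if not 1 <= p <= n:
        raise ValueError("p must lie between 1 and the number of usable time points")
    bounds = [j * n // p for j in range(p + 1)]  # p contiguous, near-equal groups
    r_hat, loglik = [], 0.0
    for j in range(p):
        group = usable[bounds[j]:bounds[j + 1]]
        # Poisson MLE within a group: total cases / total infectiousness.
        r = sum(incidence[t] for t in group) / sum(lam[t] for t in group)
        r_hat.append(r)
        for t in group:
            mean = r * lam[t]
            if incidence[t] > 0:  # then r > 0, so the log is defined
                loglik += incidence[t] * math.log(mean)
            loglik -= mean + math.lgamma(incidence[t] + 1)
    return r_hat, loglik, n


def select_p_by_bic(incidence, w, candidates):
    """Return the candidate p with the smallest BIC = -2 logL + p log(n).
    BIC is a stand-in here; it is not the MDL criterion of the paper."""
    lam = total_infectiousness(incidence, w)
    best_p, best_bic = None, math.inf
    for p in candidates:
        _, loglik, n = fit_piecewise_r(incidence, lam, p)
        bic = -2.0 * loglik + p * math.log(n)
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p


# Made-up toy incidence curve (growth, then decline) and generation-time weights.
toy_incidence = [2, 3, 5, 8, 12, 18, 25, 30, 28, 22, 16, 11, 8, 5, 4, 3]
toy_w = [0.2, 0.5, 0.3]
chosen_p = select_p_by_bic(toy_incidence, toy_w, candidates=[1, 2, 4, 7])
print("p selected by BIC on the toy data:", chosen_p)
```

In this sketch the log-likelihood can only improve when a partition is refined, so the penalty term is what stops p from growing without limit. A criterion of the kind described above would replace the p log(n) penalty with one that also reflects how the parameters interact.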