NEURAL FRAILTY MACHINE: BEYOND PROPORTIONAL HAZARD ASSUMPTION IN NEURAL SURVIVAL REGRESSIONS

Abstract

We present the neural frailty machine (NFM), a powerful and flexible neural modeling framework for survival regression. The NFM framework uses the classical idea of multiplicative frailty in survival analysis to capture unobserved heterogeneity among individuals, while leveraging the strong approximation power of neural architectures to handle nonlinear covariate dependence. We derive two concrete models under the framework, extending neural proportional hazard models and nonparametric hazard regression models, respectively. Both models allow efficient training under the likelihood objective. Theoretically, for both proposed models, we establish statistical guarantees of neural function approximation with respect to the nonparametric components by characterizing their rates of convergence. Empirically, we provide synthetic experiments that verify our theoretical statements. We also conduct experimental evaluations on six benchmark datasets of different scales, showing that the proposed NFM models outperform state-of-the-art survival models in predictive performance.

1. INTRODUCTION

Regression analysis of time-to-event data (Kalbfleisch & Prentice, 2002) has long been among the most important modeling tools for clinical studies and has witnessed growing interest in areas like corporate finance (Duffie et al., 2009), recommendation systems (Jing & Smola, 2017), and computational advertising (Wu et al., 2015). The key feature that differentiates time-to-event data from other types of data is that they are often incompletely observed, with the most prevalent form of incompleteness being the right censoring mechanism (Kalbfleisch & Prentice, 2002). Under right censoring, the duration time of a sampled subject is sometimes only known to be larger than the observation time instead of being recorded precisely. It is well known in the survival analysis community that even in the case of linear regression, naively discarding the censored observations produces estimates that are statistically biased (Buckley & James, 1979) and loses sample efficiency when the censoring proportion is high. Cox's proportional hazard (CoxPH) model (Cox, 1972), trained via the convex objective of negative partial likelihood (Cox, 1975), is the de facto choice for modeling right-censored time-to-event data (hereafter abbreviated as censored data without ambiguity). The model is semiparametric (Bickel et al., 1993) in the sense that the baseline hazard function needs no parametric assumptions. The original formulation of the CoxPH model assumes a linear covariate effect and therefore has limited flexibility, since the truth is not necessarily linear. Subsequent studies extended the CoxPH model to nonlinear variants using ideas from nonparametric regression (Huang, 1999; Cai et al., 2007; 2008), ensemble learning (Ishwaran et al., 2008), and neural networks (Faraggi & Simon, 1995; Katzman et al., 2018).
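To make the CoxPH objective concrete, the negative partial log-likelihood under the model $\lambda(t \mid x) = \lambda_0(t)\exp(x^\top\beta)$ (without ties) is $-\sum_{i:\,\delta_i=1}\bigl[x_i^\top\beta - \log\sum_{j:\,t_j \ge t_i}\exp(x_j^\top\beta)\bigr]$, where $\delta_i$ indicates an uncensored event. A minimal sketch of this objective in numpy (illustrative only, with no handling of tied event times; the function name and interface are our own, not from the paper) is:

```python
import numpy as np

def neg_partial_likelihood(beta, X, times, events):
    """Cox negative partial log-likelihood (Breslow form, no tie handling).

    beta   : (p,) regression coefficients
    X      : (n, p) covariate matrix
    times  : (n,) observed times (event or censoring)
    events : (n,) 1 if the event was observed, 0 if right-censored
    """
    eta = X @ beta                      # linear risk scores x_i' beta
    order = np.argsort(-times)          # sort subjects by descending time
    eta = eta[order]
    events_sorted = events[order]
    # After descending sort, the risk set {j : t_j >= t_i} is exactly the
    # prefix up to i, so a running log-sum-exp gives log sum_j exp(eta_j).
    log_risk = np.logaddexp.accumulate(eta)
    # Only uncensored subjects contribute terms to the partial likelihood.
    return -np.sum((eta - log_risk)[events_sorted == 1])
```

Censored subjects still appear inside the risk sets (the `log_risk` prefix sums), which is how the partial likelihood uses them without requiring their exact event times.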
While such extensions allow a more flexible nonlinear dependence structure on the covariates, their learning objectives are still derived under the proportional hazards (PH) assumption, which has been shown to be inadequate in many real-world scenarios (Gray, 2000). The most notable case is its failure to model the phenomenon of crossing hazards (Stablein & Koutrouvelis, 1985). It is thus of significant interest to explore extensions of CoxPH that allow both nonlinear covariate dependence and relaxation of the PH assumption.

