NEURAL FRAILTY MACHINE: BEYOND PROPORTIONAL HAZARD ASSUMPTION IN NEURAL SURVIVAL REGRESSIONS

Abstract

We present the neural frailty machine (NFM), a powerful and flexible neural modeling framework for survival regressions. The NFM framework utilizes the classical idea of multiplicative frailty in survival analysis to capture unobserved heterogeneity among individuals, while leveraging the strong approximation power of neural architectures to handle nonlinear covariate dependence. Two concrete models are derived under the framework, extending neural proportional hazard models and nonparametric hazard regression models, respectively. Both models allow efficient training under the likelihood objective. Theoretically, for both proposed models, we establish statistical guarantees of neural function approximation with respect to the nonparametric components by characterizing their rates of convergence. Empirically, we provide synthetic experiments that verify our theoretical statements. We also conduct experimental evaluations on 6 benchmark datasets of different scales, showing that the proposed NFM models outperform state-of-the-art survival models in terms of predictive performance.

1. INTRODUCTION

Regression analysis of time-to-event data (Kalbfleisch & Prentice, 2002) has been among the most important modeling tools for clinical studies and has witnessed growing interest in areas like corporate finance (Duffie et al., 2009), recommendation systems (Jing & Smola, 2017), and computational advertising (Wu et al., 2015). The key feature that differentiates time-to-event data from other types of data is that they are often incompletely observed, the most prevalent form of incompleteness being right censoring (Kalbfleisch & Prentice, 2002). Under right censoring, the duration time of a sampled subject is sometimes known only to exceed the observation time rather than being recorded precisely. It is well known in the survival analysis community that, even in the case of linear regression, naively discarding the censored observations produces statistically biased estimates (Buckley & James, 1979) and also loses sample efficiency when the censoring proportion is high. Cox's proportional hazard (CoxPH) model (Cox, 1972), fitted with the convex objective of negative partial likelihood (Cox, 1975), is the de facto choice for modeling right censored time-to-event data (hereafter abbreviated as censored data where no ambiguity arises). The model is semiparametric (Bickel et al., 1993) in the sense that the baseline hazard function requires no parametric assumptions. The original formulation of the CoxPH model assumes a linear form and therefore has limited flexibility, since the truth is not necessarily linear. Subsequent studies extended the CoxPH model to nonlinear variants using ideas from nonparametric regression (Huang, 1999; Cai et al., 2007; 2008), ensemble learning (Ishwaran et al., 2008), and neural networks (Faraggi & Simon, 1995; Katzman et al., 2018).
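To make the partial likelihood objective concrete, the following is a minimal Python sketch of the Cox negative log partial likelihood, $-\sum_{i:\delta_i=1}\big[\eta_i - \log\sum_{j: t_j \ge t_i} e^{\eta_j}\big]$, for given risk scores $\eta_i$ (e.g. $\beta^\top x_i$ in the linear model, or a network output in neural variants). It assumes no tied event times (no Breslow or Efron correction), and the function name `cox_neg_log_partial_likelihood` is hypothetical, not taken from any library or from this paper. Censored subjects are not discarded: they contribute through the risk sets of earlier events.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Negative Cox log partial likelihood, assuming no tied event times.

    times:       observed times t_i (event or censoring time)
    events:      delta_i = 1 if the event was observed, 0 if right censored
    risk_scores: eta_i, e.g. beta^T x_i or the output of a neural network
    """
    loss = 0.0
    for t_i, d_i, eta_i in zip(times, events, risk_scores):
        if d_i != 1:
            continue  # censored subjects appear only inside risk sets
        # Risk set R(t_i): every subject still under observation at time t_i.
        risk_set = [eta_j for t_j, eta_j in zip(times, risk_scores) if t_j >= t_i]
        # Log-sum-exp with a max shift for numerical stability.
        m = max(risk_set)
        log_denom = m + math.log(sum(math.exp(e - m) for e in risk_set))
        loss -= eta_i - log_denom
    return loss


# Toy example: the second subject is right censored at t = 2.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
print(cox_neg_log_partial_likelihood(times, events, [0.0, 0.0, 0.0]))  # log(3) ≈ 1.0986
```

Only differences between scores inside each risk set matter, so adding the same constant to every $\eta_i$ leaves the loss unchanged; this is the sense in which the baseline hazard drops out of the objective and needs no parametric form.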
While such extensions allowed a more flexible nonlinear dependence on the covariates, their learning objectives were still derived under the proportional hazards (PH) assumption, which has been shown to be inadequate in many real-world scenarios (Gray, 2000). The most notable case is the inability to model crossing hazards (Stablein & Koutrouvelis, 1985): under PH, the hazard ratio between any two subjects is constant over time, so their hazard curves can never cross. It is thus of significant interest to explore extensions of CoxPH that both allow nonlinear dependence on covariates and relax the PH assumption. Frailty models (Wienke, 2010; Duchateau & Janssen, 2007) are among the most important research topics in modern survival analysis, in that they provide a principled way of extending the CoxPH model by incorporating a multiplicative random effect to capture unobserved heterogeneity. The resulting parameterization contains many useful variants of CoxPH, such as the proportional odds model (Bennett, 1983), under specific choices of frailty families. While the theory of frailty models has been well established (Murphy, 1994; 1995; Parner, 1998; Kosorok et al., 2004), most of these works focused on the linear case. Recent developments in applying neural approaches to survival analysis (Katzman et al., 2018; Kvamme et al., 2019; Tang et al., 2022; Rindt et al., 2022) have shown promising empirical predictive performance, but most of them lack theoretical discussions. Therefore, it is of significant interest to build more powerful frailty models by adopting techniques from modern deep learning (Goodfellow et al., 2016) with provable statistical guarantees. In this paper, we present a general framework for neural extensions of frailty models called the neural frailty machine (NFM). Two concrete neural architectures are derived under the framework. The first one adopts the proportional frailty assumption, allowing an intuitive interpretation as a neural CoxPH model with a multiplicative random effect.
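Before turning to the second architecture, it may help to see how a frailty family recovers a CoxPH variant in the classical linear setting. The derivation below is the standard gamma frailty calculation, shown as an illustration rather than as the NFM construction itself; it assumes a conditional hazard $Z\,\lambda(t)\,e^{\beta^\top x}$ with frailty mean $\mathbb{E}[Z]=1$, variance $\theta$, and cumulative baseline hazard $\Lambda(t)=\int_0^t \lambda(s)\,ds$.

```latex
\begin{aligned}
S(t \mid x) &= \mathbb{E}_Z\!\left[\exp\{-Z\,\Lambda(t)\,e^{\beta^\top x}\}\right]
             = \mathcal{L}_Z\!\left(\Lambda(t)\,e^{\beta^\top x}\right),\\
Z \sim \mathrm{Gamma}(\text{shape } 1/\theta,\ \text{rate } 1/\theta)
  &\;\Rightarrow\; \mathcal{L}_Z(s) = (1 + \theta s)^{-1/\theta},
  \qquad h(t \mid x) = \frac{\lambda(t)\,e^{\beta^\top x}}{1 + \theta\,\Lambda(t)\,e^{\beta^\top x}},\\
\theta = 1
  &\;\Rightarrow\; S(t \mid x) = \frac{1}{1 + \Lambda(t)\,e^{\beta^\top x}},
  \qquad \frac{1 - S(t \mid x)}{S(t \mid x)} = \Lambda(t)\,e^{\beta^\top x}.
\end{aligned}
```

With $\theta = 1$ the odds of failure by time $t$ for two covariate values differ by the factor $e^{\beta^\top(x_1 - x_2)}$, constant in $t$, which is the proportional odds model; the marginal hazard ratio, by contrast, varies with $t$, so PH no longer holds marginally. Letting $\theta \to 0$ gives $\mathcal{L}_Z(s) \to e^{-s}$ and recovers the CoxPH survival function $\exp\{-\Lambda(t)\,e^{\beta^\top x}\}$.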
The second one further relaxes the proportional frailty assumption and can be viewed as an extension of nonparametric hazard regression (NHR) (Cox & O'Sullivan, 1990; Kooperberg et al., 1995), sometimes referred to as a "fully neural" model in the context of neural survival analysis (Omi et al., 2019). We summarize our contributions as follows.
• We propose the neural frailty machine (NFM) framework as a principled way of incorporating unobserved heterogeneity into neural survival regression models. The framework includes many commonly used survival regression models as special cases.
• We derive two model architectures based on the NFM framework that extend neural CoxPH models and neural NHR models. Both models allow stochastic training and scale to large datasets.
• We show theoretical guarantees for the two proposed models by characterizing the rates of convergence of the proposed nonparametric function estimators. The proof technique differs from previous theoretical studies on neural survival analysis and is applicable to many other types of neural survival models.
• We conduct extensive studies on various benchmark datasets at different scales. Under standard performance metrics, both models are empirically shown to perform competitively, matching or outperforming state-of-the-art neural survival models.

2.2. BEYOND COXPH IN SURVIVAL ANALYSIS

In linear survival modeling, there are standard alternatives to CoxPH such as the accelerated failure time (AFT) model (Buckley & James, 1979; Ying, 1993), the extended hazard regression model (Etezadi-Amoli & Ciampi, 1987), and the family of linear transformation models (Zeng & Lin,

Gijbels, 1996). The empirical success of tree-based models inspired subsequent developments like Ishwaran et al. (2008) that equip tree-based models such as gradient boosting trees and random forests with losses in the form of negative log partial likelihood. Early developments of neural survival analysis (Faraggi & Simon, 1995) adopted similar extension strategies and obtained neural versions of the partial likelihood. Later attempts (Katzman et al., 2018) suggested using the successful practice of stochastic training, which is believed to be at the heart of the empirical success of modern neural methods (Hardt et al., 2016). However, stochastic training under the partial likelihood objective is highly non-trivial, as the gradients of mini-batch versions of the log partial likelihood (Katzman et al., 2018) are not valid stochastic gradients, i.e. unbiased estimates of the gradient, of the full-sample log partial likelihood (Tang et al., 2022).
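The failure of naive mini-batching can be checked on a toy example. The sketch below assumes a scalar linear predictor $\beta x$, no tied times, and mini-batches drawn uniformly among all subsets of a fixed size, with risk sets restricted to the batch. It compares the per-sample score (gradient of the log partial likelihood) on the full sample with the average per-sample score over every possible mini-batch. The data and the helper names `cox_score` and `expected_minibatch_score` are hypothetical; this is a simplified illustration, not code from Katzman et al. (2018) or Tang et al. (2022).

```python
import itertools
import math
from typing import Sequence


def cox_score(beta: float, times: Sequence[float], events: Sequence[int],
              x: Sequence[float]) -> float:
    """d/d(beta) of the Cox log partial likelihood for the linear predictor
    beta * x, assuming no tied event times."""
    score = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i != 1:
            continue  # censored subjects enter only through risk sets
        # Covariates of the risk set R(t_i) and their exp(beta * x_j) weights.
        risk_x = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk_x]
        weighted_mean = sum(w * x_j for w, x_j in zip(weights, risk_x)) / sum(weights)
        score += x_i - weighted_mean
    return score


def expected_minibatch_score(beta: float, times: Sequence[float],
                             events: Sequence[int], x: Sequence[float],
                             batch_size: int) -> float:
    """Per-sample score of the within-batch partial likelihood, averaged over
    all subsets of size batch_size (i.e. uniform mini-batch sampling)."""
    batches = list(itertools.combinations(range(len(times)), batch_size))
    total = 0.0
    for idx in batches:
        sub_t = [times[k] for k in idx]
        sub_d = [events[k] for k in idx]
        sub_x = [x[k] for k in idx]
        total += cox_score(beta, sub_t, sub_d, sub_x) / batch_size
    return total / len(batches)


# Toy data: three uncensored subjects, chosen so that the full-sample score
# vanishes at beta = 0.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
x = [0.0, 5.0, 1.0]

full_score = cox_score(0.0, times, events, x) / len(times)
mb_score = expected_minibatch_score(0.0, times, events, x, batch_size=2)
print(full_score)  # 0.0
print(mb_score)    # -1/6 ≈ -0.1667
```

Because the full-sample log partial likelihood is concave in $\beta$, a zero score means $\beta = 0$ is its maximizer, yet the averaged mini-batch score there is $-1/6$, so gradient steps built from within-batch risk sets drift away from the full-sample optimum. The bias arises because each batch replaces $R(t_i)$ by its intersection with the batch, which changes the log-sum-exp normalizer nonlinearly.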

