INTERPRETABLE (META)FACTORIZATION OF CLINICAL QUESTIONNAIRES TO IDENTIFY GENERAL DIMENSIONS OF PSYCHOPATHOLOGY

Abstract

Psychiatry research seeks to understand manifestations of psychopathology in behavior in terms of a small number of latent constructs. These are usually inferred from questionnaire data using factor analysis. The resulting factors, and their relationship to the original questions, are not necessarily interpretable. Furthermore, this approach does not provide a way to separate the effects of confounds from those of constructs, and it requires explicit imputation of missing data. Finally, there is no clear way to integrate multiple sets of constructs estimated from different questionnaires. An important question is whether there is a universal, compact set of constructs that would span all the psychopathology issues listed across those questionnaires. We propose a new matrix factorization method designed for questionnaires, which promotes interpretability through bound and sparsity constraints. We provide an optimization procedure with theoretical convergence guarantees, and validate automated methods for detecting latent dimensionality on synthetic data. We first demonstrate the method on a commonly used general-purpose questionnaire. We then show that it can be used to extract a broad set of 15 psychopathology factors spanning 21 questionnaires from the Healthy Brain Network study. We show that our method preserves diagnostic information relative to competing methods, even as it imposes more constraints. Finally, we demonstrate that it can be used to define a short, general questionnaire that allows recovery of those 15 meta-factors, using data more efficiently than other methods.

1. INTRODUCTION

Standardized questionnaires are a common tool in psychiatric practice and research, for purposes ranging from screening to diagnosis or quantification of severity. A typical questionnaire comprises questions (usually referred to as items) reflecting the degree to which particular symptoms or behavioral issues are present in study participants. Items are chosen as evidence for the presence of latent constructs giving rise to the psychiatric problems observed. For many common disorders, there is a practical consensus on constructs. In that case, a questionnaire may be organized so that subsets of the items can be added up to yield a subscale score quantifying the presence of the respective construct. Otherwise, the goal may be to discover constructs through factor analysis. The factor analysis of a questionnaire matrix (#participants × #items) expresses it as the product of a factor matrix (#participants × #factors) and a loading matrix (#factors × #items). The method assumes that answers to items may be correlated, and can therefore be explained in terms of a smaller number of factors. The method yields two real-valued matrices, with uncorrelated columns in the factor matrix. The number of factors needs to be specified a priori, or estimated from data. This solution is often subjected to rotation so that, after transformation, each factor has non-zero loadings on few variables, and each variable has a high loading on a single factor, if possible. The factor values for each participant can then be viewed as a succinct representation of that participant. Interpreting what construct a factor may represent is done by considering its loadings over all the items. Ideally, if very few items have a non-zero loading, it will be easy to associate the factor with them. In practice, however, the loadings could be an arbitrary linear combination of items, with positive and negative weights. Factors are real-valued, and neither their magnitude nor their sign are

