CAREER: TRANSFER LEARNING FOR ECONOMIC PREDICTION OF LABOR DATA

Abstract

Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although modern machine learning methods offer promise for such problems, these survey datasets are too small to take advantage of them. In recent years large datasets of online resumes have also become available, providing data about the career trajectories of millions of individuals. However, standard econometric models cannot take advantage of their scale or incorporate them into the analysis of survey data. To this end we develop CAREER, a transformer-based model that uses transfer learning to learn representations of job sequences. CAREER is first fit to large, passively-collected resume data and then fine-tuned to smaller, better-curated datasets for economic inferences. We fit CAREER to a dataset of 24 million job sequences from resumes, and fine-tune its representations on longitudinal survey datasets. We find that CAREER forms accurate predictions of job sequences, achieving state-of-the-art predictive performance on three widely-used economics datasets. We further find that CAREER can be used to form good predictions of other downstream variables; incorporating CAREER into a wage model provides better predictions than the econometric models currently in use.

1. INTRODUCTION

A variety of economic analyses rely on models for predicting an individual's future occupations. These models are crucial for estimating important economic quantities, such as gender or racial differences in unemployment (Hall, 1972; Fairlie & Sundstrom, 1999); they underpin causal analyses and decompositions that rely on simulating counterfactual occupations for individuals (Brown et al., 1980; Schubert et al., 2021); and they inform policy, by forecasting occupations with rising or declining market shares.
These analyses typically involve fitting predictive models to longitudinal surveys that follow a cohort of individuals during their working career (Panel Study of Income Dynamics, 2021; Bureau of Labor Statistics, 2019a). Such surveys have been carefully collected to represent national demographics, ensuring that the economic analyses can generalize to larger populations. But these datasets are also small, usually containing only thousands of workers, because maintaining them requires regularly interviewing each individual. Consequently, economists use simple sequential models, where a worker's next occupation depends on their history only through the most recent occupation (Hall, 1972) or a few summary statistics about the past (Blau & Riphahn, 1999).
In recent years, however, much larger datasets of online resumes have also become available. In contrast to longitudinal surveys, these passively-collected datasets are not typically used directly for economic inferences because they contain noisy observations and they are missing important economic variables such as demographics and wage. However, they provide occupation sequences of millions of individuals, potentially expanding the scope of insights that can be obtained from analyses on downstream survey datasets. The simple econometric models currently in use cannot incorporate the complex patterns embedded in these larger datasets into the analysis of survey data.
To this end, we develop CAREER, a neural sequence model of occupation trajectories. CAREER is designed to be pretrained on large-scale resume data and then fine-tuned to small and better-curated survey data for economic prediction. Its architecture is based on the transformer language model (Vaswani et al., 2017), for which pretraining and fine-tuning have proven to be an effective paradigm for many NLP tasks (Devlin et al., 2019; Lewis et al., 2019). CAREER extends this transformer-based transfer learning approach to modeling sequences of occupations, rather than text. We will show that CAREER's representations provide effective predictions of occupations on survey datasets used for economic analysis, and can be used as inputs to economic models for other downstream applications.
To study this model empirically, we pretrain CAREER on a dataset of 24 million resumes provided by Zippia, a career planning company. We then fine-tune CAREER's representations of job sequences to make predictions on three widely-used economic datasets: the National Longitudinal Survey of Youth 1979 (NLSY79), another cohort from the same survey (NLSY97), and the Panel Study of Income Dynamics (PSID). In contrast to resume data, these well-curated datasets are representative of the larger population. It is with these survey datasets that economists make inferences, ensuring their analyses generalize.
In this study, we find that CAREER outperforms standard econometric models for predicting and forecasting occupations, achieving state-of-the-art performance on the three widely-used survey datasets. We further find that CAREER can be used to form good predictions of other downstream variables; incorporating CAREER into a wage model provides better predictions than the econometric models currently in use. We release code so that practitioners can train CAREER on their own datasets.
In summary, we demonstrate that CAREER can leverage large-scale resume data to make accurate predictions on important datasets from economics. Thus CAREER ties together economic models for understanding career trajectories with transformer-based methods for transfer learning. (See Section 3 for details of related work.) A flexible predictive model like CAREER expands the scope of analyses that can be performed by economists and policy-makers.

2. CAREER

Given an individual's career history, what is the probability distribution of their occupation in the next timestep? We go over a class of models for predicting occupations before introducing CAREER, one such model based on transformers and transfer learning.

2.1. OCCUPATION MODELS

Consider an individual worker. This person's career can be defined as a series of timesteps. Here, we use a timestep of one year. At each timestep, this individual works in a job: it could be the same job as the previous timestep, or a different job. (Note we use the terms "occupation" and "job" synonymously.) We consider "unemployed" and "out-of-labor-force" to be special types of jobs. Define an occupation model to be a probability distribution over sequences of jobs. An occupation model predicts a worker's job at each timestep as a function of all previous jobs and other observed characteristics of the worker.
More formally, define an individual's career to be a sequence $(y_1, \dots, y_T)$, where each $y_t \in \{1, \dots, J\}$ indexes one of $J$ occupations at time $t$. Occupations are categorical; one example of a sequence could be ("cashier", "salesperson", ..., "sales manager"). At each timestep, an individual is also associated with $C$ observed covariates $x_t = \{x_{tc}\}_{c=1}^{C}$. Covariates are also categorical, with $x_{tc} \in \{1, \dots, N_c\}$. For example, if $c$ corresponds to the most recent educational degree, $x_{tc}$ could be "high school diploma" or "bachelors", and $N_c$ is the number of types of educational degrees.¹ Define $\mathbf{y}_t = (y_1, \dots, y_t)$ to index all jobs that have occurred up to time $t$, with the analogous definition for $\mathbf{x}_t$. At each timestep, an occupation model predicts an individual's job in the next timestep, $p(y_t \mid \mathbf{y}_{t-1}, \mathbf{x}_t)$. This distribution conditions on covariates from the same timestep because these are "pre-transition." For example, an individual's most recent educational degree is available to the model as it predicts their next job.
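To make the definition concrete, the following is a minimal sketch of the simplest kind of occupation model: a first-order Markov model, in which the next job depends on the history only through the most recent job, as in the simple sequential models described in the introduction. It is not CAREER. The sketch ignores covariates, indexes jobs from 0 rather than 1, and uses add-alpha smoothing; the function names (`fit_markov`, `log_likelihood`), the smoothing choice, and the toy careers are all assumptions made for illustration.

```python
import math


def fit_markov(careers, num_jobs, alpha=1.0):
    """Estimate p(y_t = j | y_{t-1} = i) from job sequences.

    careers:  list of careers, each a list of job indices in {0, ..., num_jobs - 1}
              (0-based here; the text indexes jobs from 1 to J).
    alpha:    add-alpha smoothing constant (an illustrative assumption).
    Returns a num_jobs x num_jobs nested list whose rows sum to 1.
    """
    counts = [[0.0] * num_jobs for _ in range(num_jobs)]
    for seq in careers:
        # Count every observed transition from one timestep to the next.
        for prev_job, next_job in zip(seq, seq[1:]):
            counts[prev_job][next_job] += 1.0

    probs = []
    for row in counts:
        total = sum(row) + alpha * num_jobs
        probs.append([(c + alpha) / total for c in row])
    return probs


def log_likelihood(seq, probs):
    """Log-probability of a career under the model, conditioning on the first job."""
    return sum(math.log(probs[p][n]) for p, n in zip(seq, seq[1:]))


# Hypothetical toy data: 0 = "cashier", 1 = "salesperson", 2 = "sales manager".
careers = [
    [0, 0, 1, 1, 2],
    [0, 1, 1, 2, 2],
    [0, 0, 0, 1],
]
probs = fit_markov(careers, num_jobs=3)

# Predictive distribution over next year's job for someone who is now a cashier.
next_job_given_cashier = probs[0]
ll = log_likelihood([0, 1, 2], probs)
```

Each row of `probs` is one predictive distribution over the next job. A model that conditions on the full history and on covariates replaces this lookup table with a function of all previous jobs and the current covariates, but it is scored the same way, by summing log-probabilities of each observed job given what came before.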



¹ Some covariates may not evolve over time. We encode them as time-varying without loss of generality.

