SIMPLEKT: A SIMPLE BUT TOUGH-TO-BEAT BASE-LINE FOR KNOWLEDGE TRACING

Abstract

Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interactions with intelligent tutoring systems. Recently, many works have proposed specialized methods for applying deep neural networks to KT from different perspectives, such as model architecture and adversarial augmentation, making the overall algorithms and systems increasingly complex. Furthermore, due to the lack of a standardized evaluation protocol (Liu et al., 2022), there are no widely agreed-upon KT baselines, and published experimental comparisons have become inconsistent and self-contradictory, e.g., the reported AUC scores of DKT on ASSISTments2009 range from 0.721 to 0.821 (Minn et al., 2018; Yeung & Yeung, 2018). Therefore, in this paper, we provide a simple but strong baseline method for the KT task, named SIMPLEKT. Inspired by the Rasch model in psychometrics, we explicitly model question-specific variations to capture the individual differences among questions covering the same set of knowledge components, where a knowledge component generalizes terms such as the concepts or skills needed for learners to accomplish steps in a task or a problem. Furthermore, instead of using sophisticated representations to capture student forgetting behaviors, we use the ordinary dot-product attention function to extract the time-aware information embedded in the student learning interactions. Extensive experiments show that this simple baseline consistently ranks in the top 3 in terms of AUC scores and achieves 57 wins, 3 ties, and 16 losses against 12 DLKT baseline methods on 7 public datasets from different domains. We believe this work serves as a strong baseline for future KT research. Code is available at https://github.com/pykt-team/pykt-toolkit.

1. INTRODUCTION

Knowledge tracing (KT) is a sequential prediction task that aims to predict students' outcomes on questions by modeling their mastery of knowledge, i.e., knowledge states, as they interact with learning platforms such as massive open online courses and intelligent tutoring systems, as shown in Figure 1. Solving the KT problem may help teachers detect students who need further attention, or recommend personalized learning materials to students. KT has been studied since the 1990s, when Corbett and Anderson, to the best of our knowledge, were the first to estimate students' current knowledge with regard to each individual knowledge component (KC) (Corbett & Anderson, 1994). A KC is a description of a mental structure or process that a learner uses, alone or in combination with other KCs, to accomplish steps in a task or a problem; it generalizes everyday terms such as concept, principle, fact, or skill. Since then, many attempts have been made to solve the KT problem, such as probabilistic graphical models (Käser et al., 2017) and factor analysis based models (Cen et al., 2006; Lavoué et al., 2018; Thai-Nghe et al., 2012). Recently, with the rapid development of deep neural networks, many deep learning based knowledge tracing (DLKT) models have been developed, such as auto-regressive deep sequential KT models (Piech et al., 2015; Yeung & Yeung, 2018; Chen et al., 2018; Wang et al., 2019; Guo et al., 2021; Long et al., 2021; Chen et al., 2023), memory-augmented KT models (Zhang et al., 2017; Abdelrahman & Wang, 2019; Yeung, 2019), attention based KT models (Pandey & Karypis, 2019; Pandey & Srivastava, 2020; Choi et al., 2020; Ghosh et al., 2020; Pu et al., 2020), and graph based KT models (Nakagawa et al., 2019; Yang et al., 2020; Tong et al., 2020).
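Concretely, KT can be read as a sequential binary prediction problem. The toy sketch below (the data format names and the trivial scoring-rate predictor are our own illustration, not any published system) shows the input/output contract: a history of (question, KCs, response) interactions goes in, and a probability that the next question is answered correctly comes out.

```python
from typing import List, Tuple

# One interaction: (question id, ids of the KCs the question covers, 0/1 response).
Interaction = Tuple[int, List[int], int]

def predict_next_correct(history: List[Interaction], next_question: int) -> float:
    """Trivial baseline: the student's overall past correct rate.

    A real DLKT model replaces this with a learned network conditioned on
    next_question; this stand-in only illustrates the KT task interface.
    """
    if not history:
        return 0.5  # no evidence yet: uninformative prior
    correct = sum(r for _, _, r in history)
    return correct / len(history)

# q1 answered right, q3 wrong, q5 right; predict the outcome on q6.
history = [(1, [2], 1), (3, [2, 4], 0), (5, [1], 1)]
p = predict_next_correct(history, next_question=6)
print(round(p, 3))  # 2 correct out of 3 interactions -> 0.667
```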
[Figure 1: Graphical illustration of the knowledge tracing task. Check and cross marks denote questions answered correctly and incorrectly; for three students S1-S3, six questions q1-q6 are mapped to four KCs c1-c4 (illustrative KCs: Integer Distributive Law, Multi-digit Calculation, Mixed Basic Operations, Common Factor in Integer), together with individual scoring rates at the KC and question levels.]

Although DLKT approaches have constituted new paradigms for the KT problem and achieved promising results, recently developed DLKT models have become more and more complex and resemble each other with very limited nuances from a methodological perspective: applying different neural components to capture student forgetting behaviors (Ghosh et al., 2020; Nagatani et al., 2019), recency effects (Zhang et al., 2021), and various auxiliary information including relations between questions and KCs (Tong et al., 2020; Pandey & Srivastava, 2020; Liu et al., 2021; Yang et al., 2020), question text content (Liu et al., 2019; Wang et al., 2020a), question difficulty levels (Liu et al., 2021; Shen et al., 2022), and students' learning abilities (Shen et al., 2020). Furthermore, published DLKT baseline results diverge surprisingly. For example, the reported AUC scores of DKT and AKT on ASSISTments2009 range from 0.721 to 0.821 (Minn et al., 2018; Yeung & Yeung, 2018) and from 0.747 to 0.835 (Ghosh et al., 2020; Wang et al., 2021), respectively. As another example, DKT on ASSISTments2009 is recognized as one of the best baselines by Ghosh et al. (2020), while Long et al. (2022) and Zhang et al. (2021) showed that its performance is below average. Recent survey studies by Sarsa et al. (2022) and Liu et al. (2022) summarized these inconsistencies in baseline results and showed evidence that variations in hyper-parameters and data pre-processing procedures contribute significantly to the prediction performance of DLKT models. Specifically, Sarsa et al. (2022) empirically found that even simple baselines with little predictive value may outperform DLKT models with sophisticated neural components. Liu et al. (2022) built a standardized DLKT benchmark platform and showed that the improvement of many DLKT approaches is minimal compared to the very first DLKT model proposed by Piech et al. (2015).

Therefore, in this paper, we propose SIMPLEKT, a simple but tough-to-beat KT baseline that is easy to implement, computationally friendly, and robust across a wide range of KT datasets from different domains. Motivated by the Rasch model, a classic yet powerful model in psychometrics (also known as the one-parameter logistic item response theory model), the proposed SIMPLEKT approach captures the individual differences among questions covering the same set of KCs by representing each question's embedding as an additive combination of the average of its corresponding KCs' embeddings and a question-specific variation.

Furthermore, different from many existing models that try to capture the various relations and information mentioned above, SIMPLEKT is purely based on the attention mechanism and uses the ordinary dot-product attention function to capture the contextual information embedded in the student learning interactions. To comprehensively and systematically evaluate the performance of SIMPLEKT, we use the publicly available PYKT benchmark (https://www.pykt.org/) implementation to guarantee valid and reproducible comparisons against 12 DLKT methods on 7 popular datasets across different domains.
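To make the two core ingredients concrete, the following is a minimal NumPy sketch of (i) the Rasch-inspired question embedding, i.e., the mean of the covered KCs' embeddings plus a question-specific variation, and (ii) ordinary scaled dot-product attention with a causal mask over past interactions. All dimensions, variable names, and random inputs here are illustrative assumptions, not the released pykt implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # toy embedding dimension (our choice)
kc_emb = rng.normal(size=(5, d))   # embeddings for 5 KCs

# (i) Rasch-inspired question embedding: average of the covered KCs'
# embeddings plus an additive question-specific variation vector.
covered_kcs = [1, 3]               # this question covers KCs c2 and c4
variation = rng.normal(size=d)     # question-specific variation (learned in practice)
q_emb = kc_emb[covered_kcs].mean(axis=0) + variation

# (ii) Ordinary scaled dot-product attention with a causal
# (lower-triangular) mask so each position only attends to earlier
# interactions in the student's sequence.
def causal_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

X = rng.normal(size=(4, d))        # 4 past interaction embeddings
out = causal_attention(X, X, X)
print(out.shape)                   # (4, 8): one contextual vector per position
```

Because of the causal mask, the first output row attends only to itself, so it equals the first interaction embedding; later rows mix in progressively more history.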

