PLAY TO GRADE: GRADING INTERACTIVE CODING GAMES AS CLASSIFYING MARKOV DECISION PROCESS

Abstract

Contemporary coding education often presents students with the task of developing programs that involve user interaction and complex dynamic systems, such as mouse-based games. While pedagogically compelling, such student programs require dynamic user input and are therefore difficult to grade with unit tests. In this paper we formalize the challenge of grading interactive programs as a task of classifying Markov Decision Processes (MDPs). Each student's program fully specifies an MDP in which an agent must operate and decide, under reasonable generalization, whether the dynamics and reward model of the input MDP conform to a set of latent MDPs. We demonstrate that by experiencing a handful of latent MDPs millions of times, we can use the agent to sample trajectories from the input MDP and use a classifier to determine membership. Our method drastically reduces the amount of data needed to train an automatic grading system for interactive coding assignments and presents a challenge to state-of-the-art reinforcement learning generalization methods. Together with Code.org, we curated a dataset of 700k student submissions, one of the largest datasets of anonymized student submissions to a single assignment. This Code.org assignment previously had no means of automatically providing correctness feedback to students, so this contribution could lead to a meaningful improvement in the educational experience.

1. INTRODUCTION

The rise of online coding education platforms has accelerated the trend of democratizing high-quality computer science education for millions of students each year. Corbett (2001) suggests that providing feedback to students can have an enormous impact on helping them learn efficiently and effectively. Unfortunately, contemporary coding education has a clear limitation: students can receive automatic feedback only up until they start writing interactive programs. When a student authors a program that requires user interaction, e.g. one where a user interacts with the student's program using a mouse or by clicking buttons, it becomes exceedingly difficult to grade automatically. Even for well-defined challenges, if the user has any creative discretion, or the problem involves any randomness, the task of automatically assessing the work is daunting. Yet creating more open-ended assignments can be particularly motivating and engaging for students, and helps them practice key skills that will be needed in commercial projects.

Generating feedback on interactive programs from humans is more laborious than it might seem. Though the most common student solution to an assignment may be submitted many thousands of times, even in introductory computer science education the probability distribution of homework submissions follows a very heavy-tailed Zipf distribution, the statistical distribution of natural language. This makes grading exceptionally hard for contemporary AI (Wu et al., 2019) as well as for massive crowd-sourced human efforts (Code.org, 2014). While code as text has proved difficult to grade, actually running student code is a promising path forward (Yan et al., 2019). We formulate the grading-via-playing task as classifying whether an ungraded student program, a new Markov Decision Process (MDP), belongs to a latent class of correct MDPs (representing correct programming solutions to the assignment).
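The heavy-tailed nature of submissions can be illustrated with a small simulation. The sketch below (all names and parameters are illustrative, not from the paper) draws 700k submissions from a Zipfian distribution over candidate solutions: a handful of common solutions account for a large share of submissions, while tens of thousands of submissions are entirely unique, which is why memorizing graded examples does not scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zipf's law: the probability of the rank-r solution is proportional to 1/r.
ranks = np.arange(1, 100_001)
probs = 1.0 / ranks
probs /= probs.sum()

# Simulate 700k submissions spread over the candidate solutions.
counts = rng.multinomial(700_000, probs)

# A small head of common solutions covers much of the data...
top100_share = counts[:100].sum() / 700_000

# ...but a long tail of solutions is submitted exactly once.
singletons = int((counts == 1).sum())

print(f"top-100 solutions cover {top100_share:.0%} of submissions")
print(f"{singletons} solutions appear exactly once")
```

Under these assumed parameters, the 100 most common solutions cover well under half of the submissions, and tens of thousands of submissions are one-off programs that a lookup-based grader would never have seen.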
Given a discrete set of environments E = {e_n = (S_n, A, R_n, P_n) : n = 1, 2, 3, ...}, we can partition them into a subset of correct environments (those whose dynamics and reward model conform to the latent correct MDPs) and its complement.
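The classification protocol described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `rollout`, `grade`, and all type aliases are hypothetical names, and the environment is reduced to a bare step function. A trained agent plays the MDP specified by a student program, and a classifier labels the program based on the sampled trajectories.

```python
from typing import Callable, List, Tuple

State = int
Action = int
# A trajectory is a list of (state, action, reward, next_state) tuples.
Trajectory = List[Tuple[State, Action, float, State]]

def rollout(env_step: Callable[[State, Action], Tuple[State, float, bool]],
            policy: Callable[[State], Action],
            init_state: State, max_steps: int = 50) -> Trajectory:
    """Sample one trajectory from the MDP defined by a student program."""
    traj, s = [], init_state
    for _ in range(max_steps):
        a = policy(s)
        s2, r, done = env_step(s, a)
        traj.append((s, a, r, s2))
        s = s2
        if done:
            break
    return traj

def grade(env_step: Callable[[State, Action], Tuple[State, float, bool]],
          policy: Callable[[State], Action],
          classify: Callable[[List[Trajectory]], bool],
          n_rollouts: int = 10) -> bool:
    """Label a program correct iff its sampled trajectories are judged to
    come from the latent class of correct MDPs."""
    trajs = [rollout(env_step, policy, 0) for _ in range(n_rollouts)]
    return classify(trajs)
```

In the full method, `policy` is an agent trained by experiencing a handful of latent MDPs many times, and `classify` is a learned model over trajectory features rather than the hand-written rule a toy example would use.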

