I would recommend implementing this infrastructure in a dependently typed language, such as Agda. In a language like Agda, there is a close correspondence between programs and proofs. You can write programs in Agda to animate the operational semantics, and you can also write proofs in Agda for basic properties of the operational semantics.
There are various directions that this project could take. Contact Sam Staton (ss368 at cam.ac.uk) if interested.
See also: the bisimulation checking project that I proposed a few years ago.