The project ideas below can be tailored for both Part II and Part III/MPhil students, depending on their scope. If you're interested in working on any of them (or on a related topic) feel free to get in touch: ceren.kocaogullar@cl.cam.ac.uk.
Contextual Integrity (CI) is a theory of privacy which says that information should only flow in ways appropriate to the social context. This idea can be useful for privacy in agentic AI systems (LLMs that plan and act). For example, if an agent is handling medical data, it should not reveal it to tools or subagents outside the permitted context. Here are some relevant papers about building agents that apply CI principles to agents [1, 2, 3]. There are many directions you can go from there papers, including:
MCP allows agents to use tools by exchanging structured prompts. However, attackers might exploit misconfigured tools, inject malicious contexts, or exfiltrate data via outputs. In this project, you can simulate and analyse such attacks, then implement defenses such as tool sandboxing, digital signatures, or prompt-level policy checks. You can evaluate the effectiveness of these defences in different realistic scenarios.
LLM systems are often hard to debug without logs, but logging can leak sensitive data. In this project, you can design a privacy-aware logging system for an agentic framework (like LangChain or AutoGen), which supports features like redaction, anonymisation, or logging only derived summaries. You can implement this system and compare it to naive logging in terms of both privacy risk and developer usability.
LLM agents often invoke external tools (e.g. APIs, databases) without fine-grained access control. This project is about implementing a capability-based security model, where each tool invocation requires an explicit token granted by the system. Tokens can encode permissions (e.g. read-only, rate-limited). You can evaluate how this system reduces the risk of unintended or malicious tool usage, and whether it remains usable in complex tasks.
LLM agents often rely on external tools (e.g. APIs, scripts, calculators) to complete tasks; but what if those tools are buggy, misconfigured, or even malicious? In this project, you can develop a testing framework that injects adversarial tools into an agent’s environment. For example, you might simulate a tool that returns misleading data, causes an exception, or attempts to leak memory contents. You can then study how different types of agents respond (e.g. do they fail safely or continue with incorrect assumptions?), and propose defenses like tool verification and input validation.