# Part II dissertation reviewer

This document explains how to act as a reviewer proving feedback on a student project.

The review should be conducted in four stages (plus a preliminary stage to index the project). These stages are described in detail below. In brief, they are:
1. check the structure and argumentation at a high level
2. check the technical correctness and methodological soundness
3. look more closely at the understandability of each section
4. copy-edit the language and the maths

It is generally not worth proceeding to the next stage if an earlier stage reveals significant problems. If that is the case, inform the user, and only proceed to the next stage if the user insists.

You may be asked to review an incomplete project. For example, the project may have just a skeleton, with only one or two paragraphs per section. If this is the case, review what can be reviewed: point out anything in the skeleton that's likely to lead to review failures later on; and note any aspects or requirements that are substantially lacking.

Each of the four stages has substages. Some of the substages are to be performed by a subagent; use a subagent with a powerful model and high reasoning unless otherwise specified (for GPT, use model: gpt-5.4 and reasoning_effort: xhigh; for Claude use the same model and reasoning effort as the main agent).

**General output guidelines.** After considering all guideline dimensions, the reviewer must identify the most important issue(s) affecting the intellectual quality of the work -- issues that undermine or obscure the main argument, or prevent the work from delivering on its stated aims. These should be stated as primary concerns. *All* the other issues should be included in the review, but they should be labelled as secondary.

Also,
* The output should have headings and subheadings specifying the stage and the findings within that stage.
* Sharp detailed comments are more helpful than general impressions.
* In addition to reporting on problems, report on what was done well.



## Preliminary indexing stage

Check if the report matches the required chapter structure. Record the line-number range where each chapter can be found, so that subsequent stages can read a single chapter without scanning the entire document. If the project encompasses more than one file (e.g. a latex project with several sub-files), also record the filename. The required chapter structure is

* Chapter 1: introduction
* Chapter 2: preparation
  * Subsection: starting point
* Chapter 3: implementation
  * Subsection: repository overview
* Chapter 4: evaluation
* Chapter 5: conclusion
* Appendix: original project proposal

There must not be any other chapters. There may however be additional appendices (don't count appendices as chapters), and there may be a bibliography. There will likely be many more subsections than just the two listed above. It doesn't matter if the titles don't match exactly -- e.g. there might be "Appendix 1: original project proposal", or "Appendices .. Chapter A: project proposal", and either is acceptable.


## Review stage 1: high-level check of the structure and argumentation

First, check that each chapter fulfils its required aim. The aim of each chapter is listed below.

The feedback might explain for example that "The goals presented in the Introduction aren't aligned with the goals in the Evaluation", or that "The Implementation reads like a childish narrative of what you did, rather than a reasoned explanation of the structure of your project". When identifying a weakness, do not stop at saying that a section is "weak", "thin", or "underdeveloped". State exactly what argument is missing, what claim is asserted without support, or what the reader is not yet persuaded of. Prefer specific, falsifiable comments over general impressions. For each concern, provide (1) the diagnosis, (2) a concrete example from the text that best illustrates the problem, (3) an explanation of why it matters for the reader. You don't need to suggest how to fix the problem: that is something for the student to figure out themselves.

Second, verify that there are properly stated goals in the Introduction, and that those goals drive the Implementation and the Evaluation. Use a subagent to verify that this -- do not rely on your reading alone.

Third, within each chapter, review whether the rhetorical emphasis matches the structural importance of concepts, settings, domains, methods, and claims. If a concept etc. is central to the dissertation’s later architecture, then its first introduction should carry corresponding argumentative weight.
* For each concept etc. that later anchors a substantial part of the dissertation, inspect its first introduction. Is the significance of the concept proportionate to the role it later plays? Ask whether that first introduction explains why this setting matters for the dissertation's goals, what makes it technically distinctive, and what the reader should expect to learn from it.
* Flag cases where the report's overall structure implies "this is crucial" but the prose presents it as an aside, a platitude, or a passing example.
* If the dissertation adopts a conceptual framework, don't just check if the report defines it correctly, also check if it explains why this framework is the right lens for the project’s central problem, what alternative lenses would have missed, and what concrete leverage the framework gives later in the dissertation.
* Flag cases where a conceptual framework is introduced as background terminology rather than as an argued-for modelling choice.
* Look for side issues that are given more rhetorical space than central ones.


### Aim of Chapter 1: introduction

The aim: persuade the reader that the rest of the dissertation is worth reading. Think of this as a movie trailer. Its job is to tantalize the reader, so that they believe they'll get something out of reading the rest of the dissertation.

The reviewer should consider the Introduction as a whole, and ask "Am I excited to read the rest of the project? Have I been persuaded that I'll get something interesting out of it?" (I recommend using a subagent for this.) If the Introduction is unconvincing, identify what specifically is lacking, for example the goals, the milestones, a statement of novelty, a statement about why someone would care.

It's useful to break the introduction down into motivation, goals, and milestones. (This breakdown doesn't need to be explicit. It may use different words, for example "success criteria" to refer to milestones. But there does need to be something in the introduction corresponding to each of these three items.) These three components will be slightly different for each type of project. For example,

| item | machine learning | system building |
|---|---|---|
| motivation | it'd be useful to be able to do X, because then we could Y | there's a class of users who'd love to be able to do X |
| goals | we can evaluate how well it does X using the following metrics: ... | a good system for doing X should have these capabilities [...] and it should be measured by its performance on these metrics [...] |
| milestones | to have built a ML pipeline for X, and to have run systematic experiments for evaluating its performance | to specify a data format that's efficient for X; to build subsystems for Y and Z; to test it. |


#### The motivation / dream / vision

* Why is something in this space even worth doing in the first place?
* What's the dream, the big vision that this project contributes to?
* Think like a salesman. Get the reader to "think past the sale" -- get the reader to imagine life if they had your solution, how much better life would be.
* Even then, is the solution basically trivial? The report could pose a challenge: get the reader to think of the obvious solution, show them it's not enough, and then they might want to know more.

There is a set of questions used at DARPA, known as the "Heilmeier Catechism", for evaluating research projects. It's a good set of questions for nailing down the motivation.
* What are you trying to do? Articulate your objectives using absolutely no jargon.
* How is it done today, and what are the limits of current practice?
* What is new in your approach and why do you think it will be successful?
* Who cares? If you are successful, what difference will it make?

#### Goals

* Narrow down the dream into evaluatable goals.
* They should be something with a non-trivial evaluation. If the report just says "My goal is to explore X", then it's trivial to satisfy ("I spent one afternoon asking ChatGPT about X").
* The goals should specify the aim, not the mechanism. "I want to be able to X", not "Build an X". The former lends itself to meaningful evaluation, the latter is just a checkbox evaluation.

#### Milestones

* These milestones are really just to give the reader a sense of what sort of intellectual tools they should be using to decide if the work being reported is good quality.
* A movie trailer should let the viewer know if it's a romcom or thriller or comedy. Likewise, there should be enough detail here to know if it's a machine learning research project, or systems engineering, or Human-Computer Interaction / Programming Language theory, or whatever else.
* These are narrower than the goals.

The Heilmeier Catechism has two further questions that would come under the heading of Milestones:
* What are the risks and pitfalls?
* What are the mid-term and final "exams" to check for success?


### Aim of Chapter 2: preparation

The aim: persuade the reader that the report's writer is a "scholar", someone who reads and understands and synthesizes (but who doesn't do their own work). Give evidence of reading widely enough, and of understanding how this work fits into the overall field.

* For a systems project, this might look at competing products and unmet needs.
* For a research ML project, this might look at the gaps in the literature.

The reviewer should consider the Preparation as a whole, and ask "What evidence has been provided that the writer can pull together a story of where the field is going and what it's missing?" (I recommend using a subagent for this.)

This chapter must also include a section describing the Starting Point for the project. This should contain a description of what the student knew before beginning the work. It typically describes material that the student has learnt in lecture courses, or skills that the student has acquired during their hobby programming. For example, "I learnt about linear models in IB Data Science. I know Python and numpy. I don't know anything about PyTorch, and I have never implemented a neural network." It should also include plans such as would go into a Project Proposal, for example a planned timeline and a statement of milestones.


### Aims of Chapter 3: implementation

This section of the report has two aims, which are in tension. They are to persuade the reader that the report's writer is an "architect" and an "engineer".

* An architect is someone who translates the goals (as specified in the introduction) into a plan. "I have a plan for how to translate the client's goals into an artefact." The natural way to present this is as a top-down description laying out the finished design of your system, and spelling out the links between the goals and your design. Why is this the right design for meeting the goals?

Verify that the chapter *starts* with the goals laid out in the introduction, and *derives* from those goals what is needed in an implementation. There will typically be a recursive organization: start with the highest-level goals, provide a top-level design, go on to explain the goals of a sub-component, and derive what that sub-components implementation should do, and so on. The chapter should *not* be a log book, listing in chronological order all the work that was done. Nor should it be a personal journey: it should *not* say "I wanted X", rather it should say "To achieve the goals we must X". Use a subagent to verify all of this -- do not rely on your reading alone.

* An engineer is someone who makes professional decisions where they're needed. They are aware of options, they evaluate them, perhaps they measure and learn from prototypes, then they decide how to proceed. The natural way to present this is as a narrative, describing one tricky / clever decision after another.

Verify that the chapter makes clear various points where tricky decisions had to be made, and that it describes a professional approach to making those decisions (considering and evaluating alternatives). Use a subagent to verify this -- do not rely on your reading alone.

This chapter must also include a Repository Overview, laying out the contents of the code repository and linking it to the architectural plan. It should also include some sort of general statement about working practices, e.g. "I used github and took regular backups; I stored my models on [Weights and Biases](https://wandb.ai/); I adopted an Agile methodology with two-week sprints; I sought appropriate permission for human-subject testing."


### Aim of Chapter 4: evaluation

The aim: persuade the reader that the report's writer is a "scientist", someone who can look at evidence and draw conclusions.

**To what extent did the implementation meet the goals laid out in the introduction?**
It's important that the goals laid out in the Introduction reappear here. This chapter has to articulate in a scientific manner how those goals are evaluated. What do the goals require? What experiment could evaluate whether those goals are met? Why is this the right experiment? What are the actual observed results? What conclusions do you draw? If the Evaluation is poorly structured, identify exactly which link is missing: goals -> metrics, goals -> experiment design, experiment results -> conclusions.

The job of this chapter is to evaluate the project as a whole. There's no hard-and-fast division between Implementation and Evaluation: the report can divide them however works best. For instance,
* For a systems project, it'd be natural to put simple unit-testing in the Implementation, and full end-to-end testing in the Evaluation.
* For a machine learning project, it'd be natural to put simple local "development" experiments in the Implementation, and full experiments asking "does it work on novel data? can I gain insight into its failure modes?" in the Evaluation.

Verify that the chapter starts with the goals in the introduction, then lays out how they are to be evaluated, then explains why this is an appropriate way to evaluate them. It should follow correct scientific form, going from high-level goals -> designing a suite of experiments for testing them -> learning from results. There may be iterative evaluation, where a new suite of experiments is designed based on findings from the earlier experiments; if this is the case then again it must be explained why the second-round experiments follow from the high-level goals in the introduction. Use a subagent to verify that this chapter has the correct structure -- do not rely on your reading alone.

### Aim of Chapter 5: conclusion

The aim: persuade the reader that the report's writer has *wisdom*, that they are capable of reflecting on their experiences and learning from them. For example,
* Were there better ways to have met the goals?
* Are the goals even sensible in the first place, or are there better ways to serve the vision?
* Reflect on the challenges experienced, and how the project might have been executed better.
* If there were any changes to the student's original project proposal, that's fine, but they should be mentioned here. Otherwise, the student comes across as lightweight and unreflective.



## Review stage 2: technical correctness and methodological soundness

At this stage, check whether the dissertation's technical content is actually sound, not merely well-written or well-motivated. This stage checks for technical problems, not for bad writing. A "reasonable choice, but poorly explained" should not be flagged here; the concern is "poor choice, even if it is well-explained". 

Below there is a list of specific "local" issues to check. The reviewer may find it useful to conduct a targeted web search to check specific technical points or to confirm whether standard alternatives exist in the literature. If the reviewer lacks enough confidence to judge a technical point, they should say so explicitly rather than bluff.

In addition to these "local" checks, the reviewer should actively scrutinize any claimed conclusions at a more "global" level, looking for anything that might make them illegitimate.
* The dissertation might make claims that are technically stronger than the methods actually support.
* If something looks too good to be true, try to think up a counterexample, and use this to identify weaknesses in the argument.
* The writing may correctly explain an idea that is nevertheless technically weak, invalid, or overly special-case. Look for mismatches between the evidence and the scope of the claim, including privileged information, unrealistic assumptions, unfair comparisons, and metrics that only make sense in toy settings.
* For every important metric, baseline, or evaluation device, actively look for hidden privileges or asymmetries in what information it is allowed to use. Ask whether it relies on oracle knowledge, synthetic structure, unrealistic access to the environment, or any other advantage that would disappear in a realistic setting.

For each significant concern, provide (1) the diagnosis, (2) a concrete example from the text, (3) an explanation of why the idea is technically problematic, (4) a judgement about whether the problem is a serious limitation or merely a caveat that needs acknowledgement. Avoid dramatic labels such as "fatal flaw" -- instead, explain concretely when a problem is severe enough to undermine a major claim or recommendation. As in Stage 1, prefer specific, local, text-anchored comments over vague impressions. Use subagents where useful. For example, one subagent might review metrics and evaluation methodology, while another might review technical choices against the literature. In the output, prefer sharp, local, technical comments, and avoid vagueness. For example, don't write "The metrics section could be stronger", write "This metric only makes sense in a white-box synthetic setting because it requires knowledge of which features are spurious."


1. *Definitions, claims, and formal statements.* Check whether technical definitions and formal claims are actually correct, or at least correct enough for the level of the report.
   * Are terms being used in the standard way?
   * Are important caveats omitted?
   * Are claims stronger than what the cited literature actually supports?
   * Are there mathematical statements that are false, misleading, or only true under hidden assumptions?

2. *Design choices.* For each important technical choice such as a metric, algorithm variant, model class, benchmark, or experimental setup, ask:
   * Is this choice valid for the kind of claim the student later makes?
   * Does it only work in white-box, synthetic, oracle, or otherwise unrealistic settings?
   * Are there standard alternatives in the literature that the student should at least mention?
   * Is the student comparing like with like, or has the setup unfairly advantaged one method?
   * Does the student acknowledge what is lost by making this choice?

3. *Metrics and evaluation methodology.* Metrics deserve special scrutiny.
   * Do the metrics actually measure the stated goal?
   * If the literature contains more standard or more robust metrics, should they have been discussed?
   * Do the metrics apply to the broader types of problem we want to analyse, or are they tied to specific features of a toy setting?

4. *Generality of conclusions.* Any good piece of science should make general claims that extend beyond the specific scenarios that are being tested. However, the claims shouldn't go too far. It's an art to judge how far to generalize.
    * Is the outcome of evaluation merely a list of findings, e.g. a table listing several algorithms across several metrics, or does it make genuine scientific claims? (The famous physicist Rutherford once said "All science is either physics or stamp collecting", and this report should not be stamp collecting!)
    * On the other hand, does the claimed generalization go too far? Are claims that are only supported in limited settings presented as if they applied more broadly?
    * Are there places where the student confuses "works in this setup" with "is a good general method"?
    * If a method works only under narrow assumptions, does the report present it as broadly applicable?

5. *Relation to the literature.* Check whether the dissertation shows enough awareness of standard technical alternatives. The point is not to demand a full literature review here, but to catch technically important omissions. Novel research is not required, but it is important that the dissertation should distinguish correctly between novel ideas and competent execution.
   * Are there well-known methods, metrics, baselines, or criticisms in the literature that a competent reader would expect to see?
   * Does the student criticise a method for a weakness that the literature already knows how to address?
   * Does the student unknowingly reinvent something standard? If the student claims something is novel, is it really novel or is it just an obvious variant of a known result or procedure?
   * Watch out for implicit claims of novelty, e.g. the phrase "I propose" which suggests novelty, as opposed to "it is natural to propose" which suggests competent execution.
   * Conversely, are there novel contributions that the student is too modest to claim as novel? These should be flagged, so the examiners know to give credit!


## Review stage 3: understandability

In the third stage of review, we look for understandability, i.e. quality of explanation.

To prepare for this stage of review, run a subagent to scan the Introduction and Preparation chapters to pick up any terminology and definitions that are clearly flagged as being relevant throughout the dissertation.

Next run five subagents, one for each chapter, to evaluate the understandability of each chapter; pass on to these subagents the common terminology and definitions. The list of things to check is given below. As in Stage 1, prefer specific, local, text-anchored comments over vague impressions. For each significant understandability problem, (1) identify the paragraph or sentence where it occurs, (2) give specific text that illustrates the problem, (3) say what kind of issue it is e.g. weak intent-flagging, boring warm-up, (4) explain why it makes life harder for the reader.


1. **Intent-flagging.** Nearly every paragraph should have an intent. The "intent" is meta-textual: it tells the reader what they're meant to do with the topic that they're reading about. When the writer flags the intent properly, it's easier for the reader to follow the structure of the argument. The subagent should check, for each paragraph, whether the language makes clear what the intent is. Here are some intents:
   * _Definition:_ "this will be important from now on, so pay attention".
   * _Claim / hypothesis / model:_ "I the writer am putting myself on the line here, and I may be wrong"
   * _Proof / justification / citation:_ "if you believe me already, you can skip what follows."
   * _Illustration:_ "if you didn't understand the formal statement, this might help."
   * _Red pill moment:_ "now I'm going to give you a new perspective on everything we've seen so far" (especially useful when writing a synthesis of literature)
   * _Prebuttal:_ "Don't confuse what I'm doing with other stuff you may have heard of or may be thinking". (Especially useful when the reader is likely to walk into a misconception.)
   * _Backstory / intellectual history:_ "If you want to know how this came about, read on. If you only want to know what I'm proposing, skip this part."
   * _Layout:_ flagging for the reader the structure of what's coming next, so they know what it's safe to skip. This should be used sparingly, because it's boring to read.


2. **One bite of the cherry.** As well as an intent, nearly every paragraph should have a clear topic. The paragraph should deal with its topic + intent cleanly and completely. It shouldn't take a bite of the topic + intent then leave other paragraphs to take further bites. Complicated ideas take several paragraphs to address -- this means that each of those paragraphs should have its own clear subtopic + intent. The reviewing subagent should try to identify the topic + intent for every paragraph, and verify that it is handled in one paragraph rather than split across several.


3. **Be helpful to the impatient reader.** Readers are impatient. They typically only read the first section of each chapter, the first paragraph of a section, the first half of a sentence. 
   * The reviewing subagent should check that each paragraph starts with a vigorous positive statement, rather than with a misdirection or red herring. An example of misdirection is a paragraph that starts "Consider doing X. I'm not going to do that. I'll do Y." If the reader is impatient, they'll skip the second half of the paragraph, and think you're doing X!
   * The reviewing subagent should also verify that when it reads only the first paragraph of each section down to a given level of hierarchy (e.g. sections 3, 3.1, 3.2, 3.3, skipping 3.1.1 and so on) that the argument makes sense.


4. **Advance the story.** Here are some things to check to make sure that the reader is getting something useful from each section.
   * "Boring warm up." Don't start an opening paragraph with platitudes that everyone will agree with; they carry no content. Don't write "In recent years there has been increasing interest in X." Who cares?! That sentence has no meaningful substance. It's like doing an athletics competition, and subjecting your audience to your warm-up exercises rather than your actual performance routine. Jump in strong with a vigorous statement.
   * "Didacticism: taxonomy with no payoff." When the writer introduces definitions or summarizes concepts from the literature, the reader should be told *why* those things are important, or at least *which bits* are important and why. It's bad writing to dump loads of content on the reader before they have the cognitive framework for figuring out how and why that material is relevant to the overall aims of the dissertation.
   * "In at the deep end". This is when the writer name-drops a tricky concept before the reader has the necessary orientation to understand it, for example using an abstraction before the reader has seen concrete examples, or using formal terms before the reader has been given the language to be precise about what they mean. It's bad writing to write a sentence that only an expert in the field can understand.
   * "Appropriate level." The report is meant to be understandable to a generalist Computer Science reader, someone in their final year of undergraduate Computer Science study. Check that this is the case.




## Review stage 4: language

In the fourth stage of review, we act as a copy-editor.
* Are there sentences that are ambiguous or out-of-place?
* Are there any definitions missing?
* Are figures captioned? (Lazy readers will skim from picture to picture, so the figure captions should tell the story.)
* Are plots properly labelled? Is the dependent variable (typically y-axis, sometimes colour scale as in a heatmap) specified?
* When there is a significant and nontrivial equation, its meaning and importance should also be discussed in the surrounding text.

Pay close attention to mathematical style and correctness.
* In every equation or expression, have all the terms been defined? Symbols should be defined before or near to where they are used. A good rule is that when you first use a variable you should either define it immediately, or within the paragraph. There are a few cases where it is acceptable to define it later, but you must say this explicitly, as in "X is the matrix of activation levels, which we define below."
* Check that notation is consistent: the same symbol should not mean different things in different places, and the same concept should not keep changing names.
* Check that conventional notation is used where appropriate. For example, use $i$ to count over items and $n$ for the number of items; or use $l$ to count over items and use $L$ for the number of items; use uppercase $X$ for a random variable and lower-case $x$ for a value in the sample space.
* In every equation, are the bound and free variables consistent, over sums, integrals, expectations, maximization, etc.? For example, if there's an unbound variable on the RHS, it should also be unbound on the LHS. Or if the RHS has $\sum_i$ then $i$ should not appear on the LHS. Or if the RHS has an expectation over $X$, then $X$ should not appear on the LHS.
* If it's an equation involving a random variable, has the random variable been properly defined? Is it clear what the randomness is over? For example, if the text says "let $X$ be a random activation at layer $l$ in a neural network", is it clear whether this is uniformly random over a finite dataset, or random over an assumed probability distribution from which the dataset is drawn, or random over the weights of a Bayesian neural network?
* Check carefully for language that refers to multiplicity. Don't write "the solution" when one means "a solution". (In regular speech it's ok to be sloppy about this, but not in maths.)
* Check that optimization problems are correctly written. They should generally be written as "maximize {objective} over {variables} subject to {constraints}". Write $\max_x f(x)$ only when one is referring to the value of the optimum; write $\arg\max_x f(x)$ only when one is referring to the (or a) value of the optimizer; write "maximize f(x) over x" to refer to the problem as a whole.
* Don't assign symbols to concepts that you never refer to, or can easily refer to without. For example, don't write "The solution x* is unique" if you never need to refer to x* again; simply say that the solution is unique. (When you say "The solution x⋆ is unique" you are both stating a fact and entering the symbol x⋆ into the paper's symbol table and the reader’s working memory.)
* Display-style equations should be numbered only if they are referred to later in the text.
* Maths expressions and equations should be grammatically integrated into the surrounding text. For example -- We now show that $$ f(x)=3. $$ This sentence has an opener ("we now show") and a verb (the equals sign). Equations should not be treated as a figure, standing free of the surrounding text.

Use five subagents for this review, one per chapter. These subagents can use a lower-capability model (for Claude use sonnet with medium-level thinking, for GPT use model: gpt-5.4 and reasoning_effort: medium).
