| Optimising Compilers
Principal lecturer: Mr Tom Stuart
2006-07
Taken by: Part II
 
 The slides for each lecture are available as individual downloads below;
alternatively, download them all as one giant
document (3.5M PDF, 644 pages), which is also available in a
tree-preserving 8-up format for printing
(3.5M PDF, 81 pages). 
Lecture 1: Introduction (352k PDF, 37 pages)
                 
Structure of an optimising compiler
Why optimise?
Optimisation = Analysis + Transformation
3-address code
Flowgraphs
Basic blocks
Types of analysis
Locating basic blocks
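
Locating basic blocks can be sketched as follows. This is a minimal illustration, not the lecture's own algorithm: the `(op, target)` instruction encoding and the names `find_leaders` and `basic_blocks` are assumptions made for the example. An instruction starts a block if it is the first instruction, the target of a branch, or the instruction immediately after a branch.

```python
def find_leaders(instrs):
    """Return the sorted indices of instructions that start a basic block.

    instrs is a list of (op, target) pairs (a hypothetical encoding):
    target is an instruction index for "jump" and "cjump" (conditional
    jump) and None for every other op.
    """
    if not instrs:
        return []
    leaders = {0}                      # the first instruction is a leader
    for i, (op, target) in enumerate(instrs):
        if op in ("jump", "cjump"):
            leaders.add(target)        # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)     # so does the instruction after a branch
    return sorted(leaders)


def basic_blocks(instrs):
    """Split instrs into half-open (start, end) index ranges, one per block."""
    leaders = find_leaders(instrs)
    ends = leaders[1:] + [len(instrs)]
    return list(zip(leaders, ends))


code = [("add", None), ("cjump", 3), ("sub", None), ("mul", None), ("jump", 0)]
blocks = basic_blocks(code)            # [(0, 2), (2, 3), (3, 5)]
```

Each block then becomes one node of the flowgraph, with edges following the branches and fall-throughs between blocks.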
 Lecture 2: Unreachable-code & -procedure elimination (263k PDF, 53 pages)
         
Control-flow analysis operates on the control structure of a program (flowgraphs and call graphs)
Unreachable-code elimination is an intra-procedural optimisation which reduces code size
Unreachable-procedure elimination is a similar, interprocedural optimisation making use of the program's call graph
Analyses for both optimisations must be imprecise in order to guarantee safety
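
Both eliminations amount to a reachability search over a graph. The sketch below assumes the graph is given as a dictionary from node name to a list of successors; that representation and the function names are illustrative choices, not taken from the slides. Every edge is treated as one that might be taken, which is the safe (imprecise) assumption: a node is only removed if no path at all leads to it.

```python
def reachable(graph, entry):
    """Return the set of nodes reachable from entry by following edges."""
    seen = set()
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        if node in seen:
            continue
        seen.add(node)
        worklist.extend(graph.get(node, ()))
    return seen


def eliminate_unreachable(graph, entry):
    """Return a copy of graph containing only nodes reachable from entry."""
    keep = reachable(graph, entry)
    return {n: list(succs) for n, succs in graph.items() if n in keep}


flowgraph = {"entry": ["a"], "a": ["exit"], "dead": ["exit"], "exit": []}
pruned = eliminate_unreachable(flowgraph, "entry")   # "dead" is removed
```

Passing a call graph and the name of the main procedure instead of a flowgraph and its entry node gives the interprocedural variant.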
 Lecture 3: Live variable analysis (497k PDF, 45 pages)
           
Data-flow analysis collects information about how data moves through a program
Variable liveness is a data-flow property
Live variable analysis (LVA) is a backwards data-flow analysis for determining variable liveness
LVA may be expressed as a pair of complementary data-flow equations, which can be combined
A simple iterative algorithm can be used to find the smallest solution to the LVA data-flow equations
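
A minimal sketch of such an iterative algorithm, assuming the combined equation takes the usual form live(n) = ((union of live(s) over successors s of n) minus def(n)) union ref(n). The dictionary-based flowgraph and the names `succ`, `ref` and `defs` are assumptions for this example. Starting every set empty and only ever growing them is what yields the smallest solution.

```python
def live_variables(succ, ref, defs):
    """Iterate the LVA equation to a fixed point.

    succ maps each node to its successors; ref and defs map each node to
    the variables it reads and writes. Returns the variables live on
    entry to each node.
    """
    live = {n: set() for n in succ}    # start empty: smallest solution
    changed = True
    while changed:
        changed = False
        for n in succ:
            out = set()
            for s in succ[n]:
                out |= live[s]         # live on exit from n
            new = (out - defs[n]) | ref[n]
            if new != live[n]:
                live[n] = new
                changed = True
    return live


# 1: i = 0   2: s = s + i   3: i = i + 1   4: if i < n goto 2   5: return s
succ = {1: [2], 2: [3], 3: [4], 4: [2, 5], 5: []}
ref = {1: set(), 2: {"s", "i"}, 3: {"i"}, 4: {"i", "n"}, 5: {"s"}}
defs = {1: {"i"}, 2: {"s"}, 3: {"i"}, 4: set(), 5: set()}
live = live_variables(succ, ref, defs)
```

Here `s` is live on entry to node 1 because node 2 reads it before anything assigns it.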
 Lecture 4: Available expression analysis (453k PDF, 58 pages)
           
Expression availability is a data-flow property
Available expression analysis (AVAIL) is a forwards data-flow analysis for determining expression availability
AVAIL may be expressed as a pair of complementary data-flow equations, which may be combined
A simple iterative algorithm can be used to find the largest solution to the AVAIL data-flow equations
AVAIL and LVA are both instances (among others) of the same data-flow analysis framework
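
The same iteration scheme, run forwards with intersection in place of union, can be sketched for AVAIL. The equation assumed here is avail(n) = intersection over predecessors p of ((avail(p) minus kill(p)) union gen(p)), with nothing available at a node that has no predecessors, and gen(p) taken to hold only expressions that p computes and does not itself invalidate. The exact formulation on the slides may differ. Starting from the full set of expressions and only shrinking gives the largest solution.

```python
def available_expressions(pred, gen, kill, universe):
    """Iterate the AVAIL equation to a fixed point.

    pred maps each node to its predecessors; gen and kill map each node to
    the expressions it computes and invalidates; universe is the set of
    all expressions in the program. Returns the expressions available on
    entry to each node.
    """
    avail = {n: (set(universe) if pred[n] else set()) for n in pred}
    changed = True
    while changed:
        changed = False
        for n in pred:
            if not pred[n]:
                continue               # entry node: nothing is available
            new = set(universe)
            for p in pred[n]:
                new &= (avail[p] - kill[p]) | gen[p]
            if new != avail[n]:
                avail[n] = new
                changed = True
    return avail


# 1: x = a + b   2: branch to 3 or 4   3: a = 0   4: y = a + b   5: z = a + b
pred = {1: [], 2: [1], 3: [2], 4: [2], 5: [3, 4]}
gen = {1: {"a+b"}, 2: set(), 3: set(), 4: {"a+b"}, 5: {"a+b"}}
kill = {1: set(), 2: set(), 3: {"a+b"}, 4: set(), 5: set()}
avail = available_expressions(pred, gen, kill, {"a+b"})
```

In this example `a+b` is available at node 4 but not at node 5, because the path through node 3 redefines `a`.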
 Lecture 5: Data-flow anomalies and clash graphs (162k PDF, 40 pages)
           
Data-flow analysis is helpful in locating (and sometimes correcting) data-flow anomalies
LVA allows us to identify dead code and possible uses of uninitialised variables
Write-write anomalies can be identified with a similar analysis
Imprecision may lead to overzealous warnings
LVA allows us to construct a clash graph
 Lecture 6: Register allocation (477k PDF, 45 pages)
           
A register allocation phase is required to assign each virtual register to a physical one during compilation
Registers may be allocated by colouring the vertices of a clash graph
When the number of physical registers is limited, some virtual registers may be spilled to memory
Non-orthogonal instructions may be handled with additional MOVs and new edges on the clash graph
Procedure calling standards are also handled this way
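
Colouring with spilling can be sketched with a simple greedy heuristic: repeatedly remove the vertex with the fewest remaining clashes, then colour the vertices in reverse order of removal, spilling any vertex for which no register is free. This is one common heuristic, not necessarily the one presented in the lecture, and the clash-graph encoding (a dictionary of neighbour sets keyed by register name) is an assumption of the example. A real allocator would also insert spill code and re-run the analysis.

```python
def colour(clash, k):
    """Try to colour a clash graph with k physical registers.

    clash maps each virtual register to the set of virtual registers it
    clashes with (the relation must be symmetric). Returns a pair
    (assignment, spilled): a dict from virtual register to a physical
    register number in range(k), and a list of spilled registers.
    """
    graph = {v: set(ns) for v, ns in clash.items()}
    stack = []
    while graph:
        # Remove the vertex with the fewest remaining neighbours
        # (ties broken by name so the result is deterministic).
        v = min(graph, key=lambda u: (len(graph[u]), u))
        for n in graph[v]:
            graph[n].discard(v)
        del graph[v]
        stack.append(v)

    assignment, spilled = {}, []
    while stack:
        v = stack.pop()
        used = {assignment[n] for n in clash[v] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)          # no register left: spill to memory
    return assignment, spilled


clash = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
regs3, spilled3 = colour(clash, 3)     # three registers: nothing spilled
regs2, spilled2 = colour(clash, 2)     # two registers: one spill needed
```

The triangle a, b, c needs three colours, so with only two registers one of its vertices has to be spilled.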
 Lecture 7: Redundancy elimination (352k PDF, 37 pages)
           
Some optimisations exist to reduce or remove redundancy in programs
One such optimisation, common-subexpression elimination, is enabled by AVAIL
Copy propagation makes CSE practical
Other code motion optimisations can also help to reduce redundancy
The optimisations work together to improve code
 Lecture 8: Static single-assignment; strength reduction (189k PDF, 35 pages)
               
Live range splitting reduces register pressure
In SSA form, each variable is assigned to only once
SSA uses Φ-functions to handle control-flow merges
SSA aids register allocation and many optimisations
Optimal ordering of compiler phases is difficult
Algebraic identities enable code improvements
Strength reduction uses them to improve loops
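
As a small worked example of strength reduction, the two hypothetical functions below show a loop before and after the transformation, written out by hand rather than produced by a compiler. The identity used is (i + 1) * 4 = i * 4 + 4, which lets the multiplication inside the loop be replaced by a running addition.

```python
def before(n):
    """Original loop: one multiplication per iteration."""
    out = []
    for i in range(n):
        out.append(i * 4)
    return out


def after(n):
    """Strength-reduced loop: the multiplication becomes an addition."""
    out = []
    t = 0                      # invariant at the top of the body: t == i * 4
    for i in range(n):
        out.append(t)
        t += 4                 # (i + 1) * 4 == i * 4 + 4
    return out
```

Both functions compute the same list; only the cost of each iteration changes.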
 Lecture 9: Abstract interpretation (101k PDF, 27 pages)
           
Abstractions are manageably simple models of unmanageably complex reality
Abstract interpretation is a general technique for executing simplified versions of computations
For example, the sign of an arithmetic result can sometimes be determined without doing any arithmetic
Abstractions are approximate, but must be safe
Data-flow analysis is a form of abstract interpretation
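
The sign example can be sketched as a tiny abstract interpreter over integers. The four abstract values and the function names below are choices made for this illustration. Multiplication is exact on signs; addition sometimes has to answer "unknown", which is approximate but still safe.

```python
NEG, ZERO, POS, TOP = "-", "0", "+", "?"   # TOP means "sign unknown"


def alpha(n):
    """Abstract an integer to its sign."""
    return NEG if n < 0 else ZERO if n == 0 else POS


def abs_mul(a, b):
    """Abstract multiplication on signs."""
    if ZERO in (a, b):
        return ZERO            # zero times anything is zero
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG


def abs_add(a, b):
    """Abstract addition on signs."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:
        return a               # same sign on both sides is preserved
    return TOP                 # e.g. positive plus negative: cannot tell


product_sign = abs_mul(alpha(-1515), alpha(37))   # "-", no multiplication done
sum_sign = abs_add(alpha(-1515), alpha(37))       # "?", safe but imprecise
```

Safety here means the abstract result is never wrong about the concrete one: it either matches the true sign or reports unknown.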
 Lecture 10: Strictness analysis (215k PDF, 44 pages)
           
Functional languages can use CBV or CBN evaluation
CBV is more efficient but can only be used in place of CBN if termination behaviour is unaffected
Strictness shows dependencies of termination
Abstract interpretation may be used to perform strictness analysis of user-defined functions
The resulting strictness functions tell us when it is safe to use CBV in place of CBN
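
A minimal sketch of strictness analysis by abstract interpretation, under the usual two-point abstraction: 0 means "definitely does not terminate" and 1 means "may terminate". Built-in arithmetic and comparison are abstracted as conjunction, the conditional as p and (a or b), and constants as 1. The recursive function `plus` and all Python names here are assumptions for the example. The strictness function of a recursive definition is found by iterating from the everywhere-0 function until nothing changes.

```python
from itertools import product


def fixpoint(body, arity):
    """Least fixed point of an abstract function body over {0, 1}.

    body(rec, *args) computes one unfolding, calling rec for recursive
    calls. The result is a table from argument tuples to 0 or 1.
    """
    table = {args: 0 for args in product((0, 1), repeat=arity)}
    while True:
        new = {args: body(lambda *a: table[a], *args) for args in table}
        if new == table:
            return table
        table = new


def strict_in(table, i, arity):
    """A function is strict in argument i if 0 there forces the result 0."""
    args = tuple(0 if j == i else 1 for j in range(arity))
    return table[args] == 0


def plus_body(rec, x, y):
    # plus(x, y) = if x == 0 then y else plus(x - 1, y + 1)
    test = x & 1                       # abstraction of (x == 0)
    else_branch = rec(x & 1, y & 1)    # abstraction of plus(x - 1, y + 1)
    return test & (y | else_branch)    # abstraction of the conditional


def const_body(rec, x, y):
    # k(x, y) = x  (ignores its second argument)
    return x


plus_sharp = fixpoint(plus_body, 2)
k_sharp = fixpoint(const_body, 2)
```

Under these assumptions `plus` comes out strict in both arguments, so both may be passed by value, while `k` is strict only in its first argument.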
 Lecture 11: Constraint-based analysis (168k PDF, 42 pages)
           
Many analyses can be formulated using constraints
0CFA is a constraint-based analysis
Inequality constraints are generated from the syntax of a program
A minimal solution to the constraints provides a safe approximation to dynamic control-flow behaviour
Polyvariant (as in 1CFA) and polymorphic approaches may improve precision
 Lecture 12: Inference-based analysis (173k PDF, 21 pages)
           
Inference-based analysis is another useful framework
Inference rules are used to produce judgements about programs and their properties
Type systems are the best-known example
Richer properties give more detailed information
An inference system used for analysis has an associated safety condition
 Lecture 13: Effect systems (277k PDF, 37 pages)
             
Effect systems are a form of inference-based analysis
Side-effects occur when expressions are evaluated
Function types must be annotated to account for latent effects
A type system may be modified to produce judgements about both types and effects
Subtyping may be required to handle annotated types
Different effect structures may give more information
 Lecture 14: Instruction scheduling (364k PDF, 45 pages)
           
Instruction pipelines allow a processor to work on executing several instructions at once
Pipeline hazards cause stalls and impede optimal throughput, even when feed-forwarding is used
Instructions may be reordered to avoid stalls
Dependencies between instructions limit reordering
Static scheduling heuristics may be used to achieve near-optimal scheduling with an O(n²) algorithm
 Lecture 15: Register allocation vs. instruction scheduling; legality of reverse engineering (216k PDF, 29 pages)
           
Register allocation makes scheduling harder by creating extra dependencies between instructions
Less aggressive register allocation may be desirable
Some processors allocate and schedule dynamically
Reverse engineering is used to extract source code and specifications from executable code
Existing copyright legislation may permit limited reverse engineering for interoperability purposes
 Lecture 16: Decompilation (270k PDF, 35 pages)
           
Decompilation is another application of program analysis and transformation
Compilation discards lots of information about programs, some of which can be recovered
Loops can be identified by using dominator trees
Other control structures can also be recovered
Types can be partially reconstructed with constraint-based analysis |