2000-2004, 2010+ Publications: Dr. D J Greaves.
This list is not complete. Publications were being placed in the online Computer Laboratory Publications Database, but I have now resumed adding all recent ones to this page.
System Specification, Hardware and Software Synthesis
- 2015: `Layering RTL, SAFL, Handel-C and Bluespec Constructs on Chisel HCL.' DJ Greaves. At Memocode'15, The 13th ACM-IEEE International Conference on Formal Methods and Models for System Design at the University of Texas at Austin September 21-23, 2015. Chisel is a hardware construction language that supports a simplistic level of transactional programming via its Decoupled I/O primitives. In this paper we describe extensions that layer popular design paradigms on the Chisel substrate. We include RTL, SAFL-style functional hardware description, Handel-C message passing and Bluespec rules. We then briefly discuss interworking between these design styles. Paper PDF. Slides PDF.
- 2015: `Fine-grained Energy/Power Instrumentation for Software-level Efficiency Optimization' DJ Greaves, M Puzovic, AM Zaidi, K McDonald-Maier, Andrew Hopkins. At Forum on Description Languages, FDL'15, Barcelona, September 2015. Abstract: In the pursuit of both increased energy-efficiency, as well as high-performance, architects are constructing increasingly complex Systems-on-Chip with a variety of processor cores and DMA controllers. This complexity makes software implementation and optimization difficult, particularly when multiple independent applications may be running concurrently on such a heterogeneous platform. In order to take full advantage of the underlying system, increased visibility into the interaction between the software and hardware is needed. This paper proposes on-line and off-line fine-grained instrumentation of SoC components in hardware (e.g. as part of the debug & trace infrastructure) in order to enable improvements and optimization for energy efficiency to be undertaken at higher levels of abstraction, i.e. the programmer and runtime scheduler. Energy counters are incorporated for each component that keep track of energy use. These counters are indexed by customer number tags, that are used to distinguish between the transactions executed on any given component by client applications running in a multitasking SoC environment. The contents of the counters for each augmented component, correlated with the appropriate consumer-numbers, are extracted from a running SoC under test via existing debug & trace interfaces like GDBserver, JTAG and various proprietary trace probes. In addition, auxiliary processing on-chip computes local and global energy figures and offers them through a 4-layer abstraction stack so that programmer-level fine-grained energy measurement is made available. Both the O/S scheduler and programmers can adapt their policies and coding styles for their desired energy/performance tradeoff. PAPER PDF. SLIDES PDF.
- 2015: `A toolchain for safety-critical embedded programming using FPGAs' J Kimmitt, DJ Greaves, G Wilson. At INDIN 2015 IEEE International Conference on Industrial Informatics 22-24 July 2015, Cambridge, UK.
- 2014. `A New Dataflow Compiler IR for Accelerating Control-Intensive Code in Spatial Hardware' A M Zaidi and DJ Greaves. At IPDPS'14. Abstract: While custom (and reconfigurable) computing can provide orders-of-magnitude improvements in energy efficiency and performance for many numeric, data-parallel applications, performance on non-numeric, sequential code is often worse than what is achievable using conventional superscalar processors. This work attempts to address the problem of improving sequential performance in custom hardware by (a) switching from a statically scheduled to a dynamically scheduled (dataflow) execution model, and (b) developing a new compiler IR for high-level synthesis that enables aggressive exposition of ILP even in the presence of complex control flow. This new IR is directly implemented as a static dataflow graph in hardware by our prototype high-level synthesis tool-chain, and shows an average speedup of 1.13× over equivalent hardware generated using LegUp, an existing HLS tool. In addition, our new IR allows us to further trade area and energy for performance, increasing the average speedup to 1.55×, through loop unrolling, with a peak speedup of 4.05×. Our custom hardware is able to approach the sequential cycle counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25× the energy of an in-order Altera Nios IIf processor. A longer version will also shortly appear in IEEE Transactions on Reconfigurable Technology and Systems. PDF. DOI 10.1109/IPDPSW.2014.18.
- 2013. `Achieving Superscalar Performance without Superscalar Overheads – A Dataflow Compiler IR for Custom Computing' A M Zaidi and DJ Greaves. ICCSW'13, Imperial College, London. ISBN 978-3-939897-63-7. The difficulty of effectively parallelizing code for multicore processors, combined with the end of threshold voltage scaling has resulted in the problem of ‘Dark Silicon’, severely limiting performance scaling despite Moore’s Law. To address dark silicon, not only must we drastically improve the energy efficiency of computation, but due to Amdahl’s Law, we must do so without compromising sequential performance... PDF.
- 2012. `TLM POWER3: Power Estimation Methodology for SystemC TLM 2.0' DJ Greaves & MM Yasin. At FDL'12 Forum on Specification & Design Languages. Vienna. September 2012. 6 pages. We report on a SystemC add-on library which extends every SystemC module with non-functional data regarding power consumption and physical layout and which accumulates and estimates dynamic energy usage. It supports both phase/mode power modelling and energy-per-transaction logging for TLM (transactional-level modelling). Wiring energy is computed by counting bit-level activity within the TLM generic payload. Each leaf component can also register its physical dimensions to facilitate a wire length estimator that traverses the SystemC model hierarchy using either full placement or Rent's rule estimators. It also supports dynamic voltage islands and inter-chip wiring, where each transaction can consume energy according to the current supply voltage of the relevant islands and the nature of the interconnect. We report on basic performance from some SPLASH-2 benchmarks running on a modelled OpenRISC quad-core platform. Paper: Full Text PDF. Slides: SLIDES PDF. Also appears in extended form in `Models, Methods, and Tools for Complex Chip Design: Selected Contributions from FDL 2012' Editor Jan Haase, Lecture Notes in Electrical Engineering (Springer) 2013.
- 2012. `A Novel Design Flow for Fault-Tolerant Computing' JR Kimmitt, G Wilson & DJ Greaves. At IEEE CEEC'12 (Best Paper Award) Essex, UK, September 2012. 6 pages. PDF. This paper presents a new hardware synthesis flow, which generates an output verifiable in a field-programmable gate array. It demonstrates the relevance of fault-tolerant synthesis as required by demanding, sustainable, safety-critical applications. Although general-purpose in capability, the technique is particularly applicable for modern processor implementations, where the consequences for undetected errors are usually catastrophic.
- 2012. `Deadlock Avoidance and Combinational Balancing for High-Level Synthesis'. DJ Greaves. At Compiling Complete Programs into Circuits Workshop (CCPC 2012) 4th March 2012, London. The Bluespec and Kiwi tool chains project systems of communicating processes into hardware circuits. When a number of processes are composed, two problems commonly arise at the system level: deadlock and excessive combinational delay. Both problems are emergent as the system grows and are best solved using a global pass of the whole assembly, rather than by systematic modification to components before composition. SLIDES PDF.
- 2011-2017: Kiwi High-Level Synthesis (HLS) Kiwi Scientific Acceleration User Manual: KiwiC (pdf) DJ Greaves and Satnam Singh. Kiwi was a collaborative project between the University of Cambridge Computer Laboratory and Microsoft Research Limited, headed by David Greaves (UoCCL) and Satnam Singh (MRL). From 2013 onwards, the Kiwi system was further developed at the Computer Laboratory using a logic synthesis library called HPR-L/S. Kiwi is developing a methodology for algorithm acceleration using parallel programming and the C# language. Specifically, Kiwi consists of a run-time library for hardware FPGA execution of algorithms expressed within C# and a compiler, KiwiC, that converts .NET bytecode into Verilog RTL for further compilation for FPGA execution. In the future, custom domain-specific front ends that generate .NET bytecode can be used.
- 2011. `Distributing C# Methods and Threads over Ethernet-connected FPGAs using Kiwi'. DJ Greaves, S Singh. The Kiwi system achieves co-design by allowing nominated regions of C# programs to be targeted at FPGAs while the remainder executes on unmodified .NET and Mono virtual machines. Using C# attributes, certain methods are identified for separate compilation and collections of methods are mapped to multiple FPGAs or to workstations connected to a common Ethernet switch. Individual methods become RPC-callable entities for the top-level C# thread running on one workstation, while server threads may run continually on other workstations or FPGAs. We illustrate the concept using minimal modifications to an Adobe Photoshop plug-in where the processing for each colour channel is farmed over the Ethernet to one or three remote entities which may each be either an FPGA or a workstation. At MEMOCODE 2011 PDF (9 pages). (PDF).
- 2010. `Synthesis of Glue Logic, Transactors, Multiplexors and Serialisors from Protocol Specifications.' Whitepaper PDF. Slides PDF. At Forum on Description Languages 2010, FDL'10, Southampton, UK. Local Materials.
- 2004. `Automated Hardware Synthesis from Formal Specification using SAT solvers' DJ Greaves. RSP 2004, Geneva. Paper PDF, slides PPT.
- 2009: `Synthesis of a Parallel Smith-Waterman Sequence Alignment Kernel into FPGA Hardware', S Singh, DJ Greaves, and S Sanyal. At Many-Core and Reconfigurable Supercomputing Conference 2009 (MRSC09), Berlin, IEEE Computer Society, March 2009. This paper addresses the issue of making FPGA-based co-processors accessible to systems biologists who do not have an extensive knowledge of digital circuit design or hardware description languages like VHDL or Verilog. Our approach allows a software engineer to model the compute intensive core of some algorithm as a multi-threaded program which can then be automatically synthesized into a digital circuit. Key aspects of our approach include the ability to control the quality of results by adjusting the model to instantiate different numbers of threads and adjust how information flows between threads, as well as the ability to program, debug and verify using regular software compilers and integrated design environments (IDEs). We illustrate our approach using the kernel of the Smith-Waterman sequence alignment algorithm. This kernel is expressed as a parallel C# program which is automatically compiled into an FPGA implementation. We describe how to represent the parallel architecture of the desired circuit using multi-threaded code which models the key architectural aspects of the circuits. We argue that descriptions in a regular, high-level language that can be automatically compiled into circuits make hardware-based accelerators more accessible to software engineers. MSR LINK. PDF.
- 2004. `Automated Hardware Synthesis from Formal Specification using Symbolic Model Checking.' System and circuit design can be considered as planning problems, where resources are deployed in time and space to meet a given goal. At the lowest level, systems and circuits are finite state automata that must meet their design specification in all reachable operating states, but these constraints may be relaxed during a system start-up phase. In this paper we show how symbolic model checking can be used to synthesise hardware systems that conform to such specifications. We include an example each of gate-level synchronous logic, gate-level asynchronous logic and higher-level component module integration. We also demonstrate how to synthesise protocols that are guaranteed to recover from errors. Submitted to ????.
- 2003. `High-Level Hardware Specifications' Keynote Talk. DJ Greaves. 3rd U.K. ACM SIGDA Workshop www.sigda.org.uk, PDF.
- 2003. `Using RTL-to-C++ translation for large SoC concurrent engineering: a case study.' W Stoye, N Richards, D Greaves and J Green. DesignCon 2003, Santa Clara Convention Center, CA, January 27-30, 2003. Also appeared as: `Using RTL-to-C++ translation for concurrent engineering.' William Stoye, David Greaves, Neil Richards, James Green. In IEE Electronic Systems and Software, Feb 2003. Link. Link2.
- 2000. VTOC Verilog to C Compiler. This paper was presented at RSP 2000 in Paris: PDF
- 1995. `The CSYN Verilog Compiler and Other Tools' DJ Greaves. At the International Workshop on Field Programmable Logic. FPL'95. In LNCS Volume 975, pp 198-207. ISBN 3540 60294-1. Oxford, September 1995. PDF.
- 2002. `Performance of a Home LAN over HomePNA 2.0 Stations' S Suh and DJ Greaves. 8th IEEE International Conference on Communication Systems (ICCS 2002), 25-28 Nov. 2002, pp 661-668, vol. 2.
- 2002. `Application of a multi-carrier modulation to a nondeterministic MAC based home LAN and its effect on performance.' S Suh and DJ Greaves. 8th IEEE International Conference on Communication Systems (ICCS 2002), Nov 2002
Reliable Scripting and Embedded Systems
- 2008. `Using a .NET Checkability Profile to Limit Interactions between Embedded Controllers'. DJ Greaves, D Gordon, A Alvi & T Omitola. At SENSORCOMM 2008. Within a closed domain (such as a railway train, chemical production line, vehicle or home of the future), concurrent applications running in embedded controller units (ECUs) and on servers share many common sensors, actuators and feedback paths through the physical part of the domain, while having to abide by common, basic liveness and consistency rules to ensure proper operation of that domain. This paper suggests that all ECUs must export a summary of their behaviour using a restricted subset of .NET bytecode and that the programming constructs used by all participating controllers must abide within a common upper bound so that automated formal checking of the domain as a whole is possible. The upper bound is defined as a Checkability Profile. We describe the ROM and RAM costs of implementing this approach in one of our prototypes: a CD/DVD player for the home of the future. PDF. PPT.
- 2006. `Using Simple Pushlogic'. DJ Greaves, D Gordon. At WEBIST 06. PDF.
- 2006. `A LOGICAL APPROACH TO HOME AUTOMATION.' A Alvi, DJ Greaves. 2nd IEE International Conference on Intelligent Environments, July, 2006, Athens (http://conferences.iee.org/ie06).
- 2003. `Agile Programming For Mobile Adaptive Systems.' T Omitola and DJ Greaves. Presented at the ANWIRE Workshop at FMOODS 2003, Paris.
- 2003. `Towards Autonomous Dependable Systems'. T Omitola and DJ Greaves. Presented at Cabernet Workshop, November 2003.
- 2002. `Control Software for Home Automation'. DJ Greaves. Accepted for `International Workshop on Smart Appliances and Wearable Computing (IWSAWC 2002)', July 2nd, Vienna 2002. PDF.
- 2001. `Internet Access to Home Area Network'. Umar Saif, Daniel Gordon, David Greaves. IEEE Internet Computing, Vol. 5, No. 1, pp 54-63, January/February 2001.
- 2001. `Context-aware Adaptation in UbiqtOS: A Java-based Embedded Operating System for Ubiquitous Computing'. Umar Saif, David Greaves. 5-minute presentation + poster at ACM SOSP 2001, Banff, Canada. We present a bottom-up design of an extensible, Java-based embedded operating system that lends itself to secure, efficient, context-driven adaptation. A universal, eternal substrate is embedded in resources to enable effective participation in the system.
- 2001. "Communication Primitives for Ubiquitous Systems or RPC Considered Harmful". Umar Saif, David J. Greaves. Proceedings of 21st International Conference of Distributed Computing Systems (Workshop on Smart Appliances and Wearable Computing), 2001.
- 1998. `A revised 51 Mbps scrambler proposal'. ATM Forum Contribution AF 98-0305. 19-Apr-98. DJ Greaves (Virata), A Quayle, R Dearnaley (BT Labs). PDF.
- 1998. `Warren: A Protocol for control of ATM hardware'. ATM Forum Contribution AF 98-0008. Feb-98. DJ Greaves and RJ Bradbury. PDF.
- 1997. Invited talk and paper: `ATM in the Home and the Home Area Network', DJG. 5 pages. At IEE Colloquium on ATM in Professional and Consumer Applications, Savoy Place, May 1997, ATM in Professional and Consumer Applications (Digest No: 1997/113). PDF0, PDF1.
- 1996. `ATM for Video and Audio on Demand'. Despite the massive amount of work we did on the Cambridge Interactive TV Trial, I only have one publication to show. This was presented to the Audio Engineering Society, in 1996 and they did not really believe the message in the paper. However, their 2000 conference was almost entirely dedicated to audio in packets. AES Paper (PDF) AES Paper Abstract (ascii).
Some pre-1995 Papers
- 1991-2. "The Cambridge Backbone Network, An Overview and Preliminary Performance." `Preliminary Performance Results for the CBN Half-Duplex VME Stations', DJG & K Zielinski. At IFIP WG 6.4 workshop `Local Communication Systems' Palma, 1992. Proceedings pub North Holland, eds. G Pujolle & R Puigjaner, pp 391-406. In `Computer Networks and ISDN Systems' Vol 25 No 10, pp 1127-1133, May 1993. PDF. Similar paper: Second International Conference on Local Communications Systems: LAN and PBX, June 26-28, 1991, Palma. PDF.
- 1990. "The double-slot slotted ring protocol (DSR)." DJG. Proc Telecommunications Symposium (ITS'90) pp 238-242, Rio de Janeiro Sept 1990. DOI 10.1109/ITS.1990.175605. PDF.
- 1989. "Cambridge HSLAN Protocol Review." Presented at IFIP WG6.1 Workshop, IBM Ruschlikon. In `Protocols for High-Speed Networks', eds. H Rudin and R Williamson, pp 257-268, Elsevier Science, ISBN 0444885366, August 1989. Original PDF, Final PDF.
- 1985. "DigiPoly Digital Polyphonic Keyboard". David J Greaves. Wireless World, No. 1593. Sept 1985-Jan 1986. PDF.
- 1985. "Pulse Generator for 63XX PROM Programming". David J Greaves. Half-page. Electronic Engineering Magazine, April 1985. PDF.
- 1984. "GPIB Combiner". David J Greaves. Wireless World, No. 1579, pp 54-57, 62. April 1984. PDF.