Feb 2018 - SocEx3

Assessed Exercise 3 - ACS P35 Mini-Project or Group Mini-Project Task

There may be two or more groups or one big group. It's up to you.

----------------------------------------------------------------------------------------

Exercise 3a (5 Marks Credit) is the rough specification of the main practical work you will complete in Exercise 3.

Exercise 3b (25 Marks Credit) is making it work.

Exercise 3c (0 Marks Credit) is approval of what you will do for Exercise 4a, which will generally be a detailed analysis of Exercise 3.

Note: Most people will use their Exercise 3 group mini-project for their evaluation in Exercise 4a. But Exercise 4a can instead be done using a different implementation or even a different accelerator if you wish.

----------------------------------------------------------------------------------------

P35 Exercise 3a - Specification of an Accelerator and Functional Model

3.a1 - choose an application task amenable to hardware acceleration (the suggested tasks are big data Bloom filtering and AES encryption)
3.a2 - get a baseline version of it working in C on a Unix workstation without acceleration
3.a3 - port the C version to run on the Prazor simulator without acceleration
3.a4 - port the C version to the Zynq ARM without acceleration
3.a5 - create an outline design of some sort of hardware accelerator (keep it simple)
3.a6 - write low-level documentation for the accelerator giving its programming model
3.a7 - give an analytic formula that provides a rough estimate of expected performance or speedup

Write up all of the above in a brief report.

For Exercise 3a you do not need to make anything work on Prazor or the Zynq cards except for the unaccelerated baseline application. And you do not need to make any measurements.

For all work in 3a you may work alone or together with others or use resources found elsewhere. Just make sure you clearly show who did what.
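As an illustration of the kind of formula 3.a7 asks for, here is a minimal sketch of an Amdahl-style estimate with an added offload overhead term: T_accel = T*(1 - f) + T*f/k + T_overhead, and speedup = T / T_accel. The structure, field names and the choice of a single overhead term are assumptions made for this example, not part of the exercise; your own formula should reflect your accelerator's actual data movement and control costs.

```c
#include <assert.h>

/* Hypothetical parameter set for a rough offload model. None of these
 * names come from the exercise sheet. All times are in seconds. */
struct offload_model {
    double t_total;    /* baseline run time of the whole application      */
    double f_accel;    /* fraction of t_total that is offloaded (0..1)    */
    double k_accel;    /* speedup of the offloaded part in hardware (> 0) */
    double t_overhead; /* added time for setup and data transfer (>= 0)   */
};

/* Estimated accelerated run time: the software part that is left alone,
 * plus the offloaded part divided by k_accel, plus the offload overhead. */
double estimated_time(const struct offload_model *m)
{
    assert(m->t_total > 0.0 && m->k_accel > 0.0);
    assert(m->f_accel >= 0.0 && m->f_accel <= 1.0);
    assert(m->t_overhead >= 0.0);
    return m->t_total * (1.0 - m->f_accel)
         + m->t_total * m->f_accel / m->k_accel
         + m->t_overhead;
}

/* Estimated overall speedup relative to the unaccelerated baseline. */
double estimated_speedup(const struct offload_model *m)
{
    return m->t_total / estimated_time(m);
}
```

With k_accel very large the estimate tends to T / (T*(1 - f) + T_overhead), which shows how the unaccelerated fraction and the transfer overhead bound the achievable speedup.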
----------------------------------------------------------------------------------------

Exercise 3b - Initial Accelerator Implementation - Get something working!

3.b0 - create both a SystemC TLM model and a synthesisable RTL implementation of the accelerator ('create' here covers both manual coding and HLS)
3.b1 - install the TLM version in the Prazor virtual platform and adapt the 3.a3 output to use it
3.b2 - install the RTL version on the Zynq FPGA and adapt the 3.a4 output to use it
3.b3 - make a very brief evaluation of the performance of your 3.b2 implementation, reporting 2 or 3 measurements
3.b4 - make sure you have energy figures being reported in both the 3.b1 and 3.b2 experiments

For all work in 3b you may work alone or together with others.

Write up all of the above in a report. Make sure you clearly show who did what. Where you include text or diagrams in your report that were prepared by others, make this clear. Emphasise your contribution to the work.

----------------------------------------------------------------------------------------

Exercise 3c - Evaluation Specification (but not the actual evaluation).

3.c0 - Define a benchmarking workload suite for your accelerator, explaining why it is representative or what span metrics could be computed for it. Note: this is a non-issue for the AES task.
3.c1 - Briefly compare your results for performance between 3.a7, 3.b1 and 3.b2 and say where you think any discrepancies might have come from (you can explore the discrepancies properly in Exercise 4a).
3.c2 - Do 3.c1 for energy instead of performance.
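For the comparisons in 3.c1 and 3.c2, one simple way to state a discrepancy is as a signed relative difference between two figures. The sketch below assumes you have one number per source (the 3.a7 estimate, the 3.b1 Prazor figure and the 3.b2 Zynq figure) in the same unit; the function and field names are made up for this example.

```c
#include <assert.h>

/* Signed relative difference of `value` from `reference`.
 * For example, 0.25 means `value` is 25% above `reference`. */
double relative_discrepancy(double reference, double value)
{
    assert(reference != 0.0);
    return (value - reference) / reference;
}

/* Hypothetical container for the three pairwise comparisons. */
struct comparison {
    double model_vs_tlm;  /* 3.a7 estimate versus 3.b1 (Prazor TLM) */
    double model_vs_fpga; /* 3.a7 estimate versus 3.b2 (Zynq FPGA)  */
    double tlm_vs_fpga;   /* 3.b1 (Prazor TLM) versus 3.b2 (Zynq)   */
};

/* All three inputs must be the same quantity in the same unit,
 * e.g. run time in seconds or energy in joules. */
struct comparison compare_results(double estimate, double tlm, double fpga)
{
    struct comparison c;
    c.model_vs_tlm  = relative_discrepancy(estimate, tlm);
    c.model_vs_fpga = relative_discrepancy(estimate, fpga);
    c.tlm_vs_fpga   = relative_discrepancy(tlm, fpga);
    return c;
}
```

The same function can be called once with run times and once with energy figures; for energy there is no 3.a7 estimate unless you chose to derive one, in which case only the tlm_vs_fpga field is meaningful.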
Note: in order to generate data points for comparison, and for crosschecking predicted and measured performance, consider adjusting the following parameters:

    Dataset block and burst size
    gcc optimisation level: -O0 versus -O2
    L1 and/or L2 cache disable
    CPU clock frequency adjustment
    Use one or both ARM cores

----------------------------------------------------------------------------------------

Timetable

    Exercise 3a         5/3/18   12:00
    Exercise 3b & 3c    16/3/18  12:00

===================================================================================

Notes arising:

Marking scheme note: The marks for Exercise 3 will be allocated proportionally to how much practical work an individual has undertaken and the complexity of the work. Little credit shall be allocated for the quality of the write-up in Exercise 3. Where people have worked in groups, as recommended for Exercise 3, this is seen as a sensible use of time that deserves credit, and so certain items of practical work should sensibly count towards the score of more than one individual.

The original marking scheme for the P35 module, as reflected on Moodle, was

    Exercise    Marks
    3a            0
    3b           20
    4a           30
    4b           40

But as some people have noticed, this has been changed to

    Exercise    Original    Revised
    3a              0           5
    3b             20          25
    4a             30          30
    4b             40          30

The explanation for this is that the 20 marks for Exercise 3 were not sufficient to reflect the different levels of sophistication in different implementations. And in previous years, more material was lectured and so there was more to examine in 4b.

I've used the term 'span metric' to describe features of a data set that relate to how much of the possible test space the data set covers. For instance, if the accelerator processes a graph structure, graphs have many possible metrics, including connectivity, diameter, number of edges, etc.

The 'TLM' model in 3.b1 may be an RTL model wrapped up with transactors as a TLM model to interact with the TLM structure of Prazor. Verilator will create SystemC from RTL.
Or, if you use HLS to generate your accelerator, a SystemC output is available from some HLS tools, such as Kiwi.

END