Two Benchmark Tests for BCPL Style Coroutines

This directory contains various implementations of two benchmark
programs to test the efficiency of BCPL style coroutines when
implemented in a variety of different languages.

Download cobench.tgz (or cobench.zip) and un-tar (or un-zip) it to
form the directory Cobench. Then enter one of the directories bcpl,
natbcpl, mcpl, java, c or smlnj (New Jersey SML) and type the
command: make. This should compile and run the benchmark, provided
you are on a Pentium machine running Linux. You may find you have to
re-install the BCPL Cintcode system and you may have to edit one or
more of the Makefiles.

There are directories for each language and implementation style.
They are currently as follows:

bcpl     The definitive BCPL implementation run using Cintcode
natbcpl  BCPL compiled (very naively) into machine code
c        Implementation in C using a coroutine library involving
         setjmp, longjmp and two instructions of inline assembler.
nat-c    Implementation in C using a hand-written coroutine library
         in assembler (for the Pentium).
java     Implementation in Java using a coroutine library based on
         threads using wait, notify and synchronisation.
mcpl     An interpretive implementation in MCPL
smlnj    An implementation in Standard ML using New Jersey
         continuations.
haskell  An untested approximation to the benchmark in Haskell.

All these implementations can be run under Linux (provided the
relevant compilers have been installed). They are typically run
using make as follows:

make         Same as make help
make help    Display the help information
make bench   Run cobench
make bencht  Run cobench with tracing
make sim     Run cosim
make simt    Run cosim with tracing
make test    Run cotest
make clean   Delete any files that can be reconstructed

The program cotest tests whether the BCPL style coroutines have been
implemented correctly.

************************ cobench ************************

This benchmark program creates a source coroutine, 500 copy
coroutines and a sink coroutine. It then causes the source coroutine
to transmit 10,000 numbers through all the copy coroutines to be
thrown away by the sink coroutine. The numbers are sent one at a
time, and all transmission from one coroutine to the next uses a
coroutine implementation of Occam channels. Reading and writing are
done (in the BCPL version) using coread and cowrite, respectively.
These are defined as follows:

AND coread(ptr) = VALOF
{ LET cptr = !ptr
  TEST cptr
  THEN { !ptr := 0              // Clear the channel word
         RESULTIS resumeco(cptr, currco)
       }
  ELSE { !ptr := currco         // Set channel word to this coroutine
         RESULTIS cowait()      // Wait for value from cowrite
       }
}

AND cowrite(ptr, val) BE
{ LET cptr = !ptr
  TEST cptr
  THEN { !ptr := 0
         callco(cptr, val)      // Send val to coread
       }
  ELSE { !ptr := currco
         callco(cowait(), val)
       }
}

In both functions, ptr points to a channel word which holds zero
(=FALSE) if no coroutine is waiting on the channel. The first
coroutine to call coread (or cowrite) puts its own identity into the
channel word and then waits. When the second coroutine reaches the
corresponding cowrite (or coread) call, it can see the identity of
the other coroutine that is waiting. If the writing coroutine arrives
at the communication point second, it picks the identity of the
reading coroutine out of the channel word, clears the channel word,
and then transmits the value to the waiting read coroutine using a
simple call of callco. If the reading coroutine arrives at the
communication point second, it gives its own identity to the waiting
write coroutine and then waits for the value to be sent. A little
thought will show you why coread uses resumeco to transmit its own
identity to the write coroutine.
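To show how these two primitives are typically strung together, here
is a small sketch (not taken from the actual cobench source) of the
kind of body one of the 500 copy coroutines might have. It assumes
each copy coroutine is created with createco and is handed a small
vector containing pointers to its input and output channel words; the
names copyfn, args, inptr and outptr are illustrative only.

// Illustrative sketch only -- not the code used by cobench.
// A possible body for a copy coroutine: read a number from the
// input channel and pass it on to the output channel, forever.

LET copyfn(args) BE
{ LET inptr  = args!0          // Pointer to the input channel word
  LET outptr = args!1          // Pointer to the output channel word

  cowrite(outptr, coread(inptr)) REPEAT
}

The source and sink coroutines would contain similar loops, the
source generating the 10,000 numbers with cowrite and the sink
discarding whatever coread returns.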
If you are not familiar with BCPL coroutines, you can read about them
in the BCPL manual that is available via my home page.

The BCPL Cintcode version of cobench executes 257,569,294 Cintcode
instructions, changing coroutines 10,024,518 times. (These figures
were obtained using the stats command.)

This directory currently holds four implementations of the benchmark:
bcpl, natbcpl, java and c. They take the following times when run on
a 1 GHz Pentium III, unless otherwise stated.

BCPL Cintcode byte stream interpreter:                      4.78 secs
BCPL Cintcode byte stream interpreter on a 2.4GHz
  AMD64, under Cygwin:                                      1.79 secs
BCPL Native 386 code (unoptimised):                         0.81 secs
BCPL under Cintpos using an interpreter implemented
  in C and all debugging aids turned on:                      10 secs
MCPL Mintcode byte stream interpreter:                     10.51 secs
Java j2sdk1.4.0 JIT (using threads):                      183.60 secs
C using setjmp and longjmp + inline assembler:              1.36 secs
C using colib written in assembler:                         0.61 secs
New Jersey SML using continuations:                         4.37 secs
BCPL Cintcode on a 75MHz SH3 (HP 620LX Handheld):         115.52 secs

The first of the C versions, using setjmp, longjmp and inline
assembler, was written by Jeremy Singer.

************************ cosim ************************

This benchmark is a discrete event simulator that simulates messages
passing through a network of nodes. The nodes are numbered 1 to 500
and are each initially given a message to process. The time to
process a message is randomly distributed between 0 and 999 units of
simulated time. Once a message has been processed it is sent to
another node selected at random. The transmission delay is assumed to
be ABS(i-j) where i and j are the source and destination node
numbers. Nodes can only process one message at a time and so messages
may have to be queued. Initially each node is assumed to have just
received a message. The simulation starts at time 0 and finishes at
time 1,000,000. The result, 501907, is the total count of messages
that have been completely processed by the finishing time. (A sketch
of the coroutine structure this description suggests is given at the
end of this file.)

This benchmark executes 435,363,350 Cintcode instructions and
performs 2,510,520 coroutine changes. The timing results are as
follows:

BCPL Cintcode byte stream interpreter:                      9.02 secs
BCPL Cintcode byte stream interpreter on a 2.4GHz
  AMD64, under Cygwin:                                      3.29 secs
BCPL Native 386 code (unoptimised):                         1.11 secs
BCPL under Cintpos using an interpreter implemented
  in C and all debugging aids turned on:                      17 secs
MCPL Mintcode byte stream interpreter:                     15.39 secs
Java j2sdk1.4.0 JIT (using threads, wait and notify):      57.39 secs
C using setjmp and longjmp + inline assembler:              0.80 secs
C using colib written in assembler:                         0.58 secs
New Jersey SML using continuations:                         4.08 secs
BCPL Cintcode on a 75MHz SH3 (HP 620LX Handheld):         121.22 secs

Martin Richards
23 February 2007
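******************* cosim structure (sketch) *******************

The following sketch is not taken from the cosim source; it is just
one plausible way of organising the simulator described above, using
one coroutine per node and a central scheduler that keeps an event
queue ordered by simulated time. The names nodefn, holdfor, sendto,
msgcount and simtime are assumptions made for this illustration, and
randno(upb) is assumed to behave like the library routine of that
name, returning a random number in the range 1 to upb.

// Illustrative sketch only -- not the code used by cosim.
// One plausible body for a node coroutine, assuming:
//   * a scheduler coroutine removes the earliest event from the queue,
//     advances simtime, and calls callco on the node concerned,
//   * holdfor(t) queues a wake-up event t units later and returns
//     control to the scheduler,
//   * sendto(j, t) queues the arrival of a message at node j at time t,
//   * msgcount accumulates the result printed by the benchmark.

LET nodefn(i) BE
{ // Each node starts with one message just received.
  { LET ptime = randno(1000) - 1       // Processing time 0..999
    LET j = ?                          // Destination, chosen later
    holdfor(ptime)                     // Busy processing the message
    msgcount := msgcount + 1           // Message completely processed
    j := randno(500)                   // Pick a destination node at random
    sendto(j, simtime + ABS(i - j))    // It arrives after the transmission delay
    cowait()                           // Wait for the next message to arrive
  } REPEAT
}

Under these assumptions the scheduler's main loop would repeatedly
take the earliest event from the queue, advance simtime to its time,
and callco the corresponding node coroutine, stopping once simtime
reaches 1,000,000.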