<p>The gcc C compiler defaults to generating Thumb-2 machine code and using hardware floating point.  The Thumb modes
and hardware floating point do not currently work on the simulator (they are being debugged and should work in the near
future).  In the meantime, running the same binary code on both systems
requires coercing gcc to use only the old ARM32 mode with software floating point.  The relevant flags are:
<pre>                                                                                                                                        
   -marm -mfloat-abi=soft                                                                                                                    
</pre>                                                                                                                                       
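<p>A cheap safeguard against forgetting these flags is a preprocessor check that refuses to compile otherwise.  The sketch below assumes gcc and relies on its predefined macros <code>__arm__</code>, <code>__thumb__</code> (defined when generating Thumb code) and <code>__SOFTFP__</code> (defined under <code>-mfloat-abi=soft</code>); the function name <code>build_config</code> is made up for illustration.  Compile it with, for example, <code>gcc -marm -mfloat-abi=soft -c check.c</code>.
<pre>
/* Sketch: fail the build if the ARM32 / soft-float flags were not given.
   Assumes gcc's predefined macros; build_config is a hypothetical name. */
#if defined(__arm__)                 /* only meaningful for 32-bit ARM targets */
#  if defined(__thumb__)
#    error "Built in Thumb mode: add -marm"
#  endif
#  if !defined(__SOFTFP__)
#    error "Not built with -mfloat-abi=soft"
#  endif
#endif

/* Report which configuration this file was compiled for. */
const char *build_config(void)
{
#if defined(__arm__)
    return "arm32, soft-float";      /* reachable only if the checks above passed */
#else
    return "not an ARM target";      /* e.g. a native workstation build */
#endif
}
</pre>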
                                                                                                                                             
<p>With these flags, code compiled on the workstation or on the real card is interchangeable, and .o files can
be freely copied back and forth using scp.
                                                                                                                                             
<p>However, note that the libraries installed on the Parallella cards, libc and libgcc, also use Thumb mode.  So, for detailed
performance comparison (and to avoid linker errors about VFP register arguments), you
should avoid these and instead use your own compiled versions of them too (or the ones linked
in the detailed documentation at the bottom of this section).
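<p>If you are unsure whether an object file or library was built for the hard-float ABI, <code>readelf -h</code> prints its ELF header, and the Flags line says "soft-float ABI" or "hard-float ABI".  The sketch below does the same check by hand on the first 52 bytes of a file.  The constants come from the ELF and ARM ELF ABI specifications; the function name <code>arm_float_abi</code> is hypothetical.  It only screens the float ABI flag: Thumb versus ARM is chosen per function and is not visible in the header, and the linker's own VFP check, as far as we know, compares build attributes (shown by <code>readelf -A</code>), so treat this as a quick first look.
<pre>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Values from the ELF and ARM ELF ABI specifications. */
#define EM_ARM_MACHINE     40u          /* e_machine for 32-bit ARM */
#define EF_ARM_FLOAT_SOFT  0x00000200u  /* EF_ARM_ABI_FLOAT_SOFT */
#define EF_ARM_FLOAT_HARD  0x00000400u  /* EF_ARM_ABI_FLOAT_HARD */

/* Little-endian readers for the header fields. */
static uint32_t rd16(const unsigned char *p) { return (uint32_t)p[0] | ((uint32_t)p[1] &lt;&lt; 8); }
static uint32_t rd32(const unsigned char *p) { return rd16(p) | (rd16(p + 2) &lt;&lt; 16); }

/* Returns 1 for the hard-float ABI (VFP register arguments), 0 for the
   soft-float ABI, -1 if this is not a little-endian ELF32 ARM header or
   neither flag is set.  hypothetical helper, not a library API. */
int arm_float_abi(const unsigned char *hdr, size_t len)
{
    if (len &lt; 52) return -1;                              /* ELF32 header size */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') return -1;
    if (hdr[4] != 1 || hdr[5] != 1) return -1;            /* ELFCLASS32, little-endian */
    if (rd16(hdr + 18) != EM_ARM_MACHINE) return -1;      /* e_machine */
    uint32_t flags = rd32(hdr + 36);                      /* e_flags */
    if (flags &amp; EF_ARM_FLOAT_HARD) return 1;
    if (flags &amp; EF_ARM_FLOAT_SOFT) return 0;
    return -1;
}
</pre>
<p>A real check would fread the first 52 bytes of the .o file into a buffer and pass them in; for .a archives, <code>readelf -h</code> itself walks the members for you.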
                                                                                                                                             
<p>The differences between your application binaries for running bare
metal on the simulator and for running on Linux on the real card
are mainly to do with console I/O.  The I/O paths are very different, so
I/O performance should not be compared between the two.  The binaries are also incompatible: you need to replace the system calls
used on the real card with direct calls to the UART device driver, as used on the simulator.
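<p>To make the idea of an I/O shim concrete, here is a minimal sketch in which the application only ever calls one console routine and the backend is swapped underneath it.  The names <code>con_putc</code> and <code>con_puts</code> and the UART address are hypothetical placeholders, not the interface of uart64_driver.o or bareliftshim.o.  In the real setup each backend would be a separate .o chosen at link time; the sketch uses <code>#ifdef BARE_METAL</code> only to keep everything in one file.
<pre>
#ifndef BARE_METAL
#include &lt;unistd.h&gt;   /* write(), STDOUT_FILENO for the Linux shim */
#endif

/* Hypothetical console interface that the application is written against. */
int con_putc(char c);

/* Application side: identical code whichever backend is linked in. */
int con_puts(const char *s)
{
    int n = 0;
    while (*s)
        n += con_putc(*s++);
    return n;                       /* characters successfully sent */
}

#ifdef BARE_METAL
/* Simulator-style shim: store to an assumed memory-mapped UART transmit
   register.  The address is a placeholder, not the simulator's real UART. */
#define UART_TX ((volatile unsigned int *)0x10000000u)
int con_putc(char c) { *UART_TX = (unsigned char)c; return 1; }
#else
/* Linux shim: route each character through the write() system call. */
int con_putc(char c) { return write(STDOUT_FILENO, &amp;c, 1) == 1; }
#endif
</pre>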
                                                                                                               
                                                                                                                            
<p>The best way to redirect the I/O is to link the same application and library .o files with slightly
different I/O shims, as described in the detailed documentation.  Essentially, you replace uart64_driver.o with bareliftshim.o.
Link using the ld program (not gcc acting as a linker), since this gives you complete control over
which kickoff (startup) code and libraries are included.  Always check what you have built by disassembling it with objdump -d:
<pre>
        ld -o a.out bareliftcrt.o bareliftshim.o dfsin.o prlibc.o $(LIBGCC)
        objdump -d ./a.out > dis
        ./a.out
</pre>
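<p>When checking the disassembly, one thing worth looking for is stray Thumb code, for example pulled in from the installed libraries.  In objdump -d output for 32-bit ARM, an ARM instruction is shown as one 8-hex-digit word, while Thumb instructions are shown as 4-hex-digit halfwords (a 32-bit Thumb-2 instruction appears as two such halfwords separated by a space).  The sketch below classifies one line of the dis file on that basis.  <code>classify_dis_line</code> is a hypothetical helper, and the test is only a heuristic: literal-pool <code>.word</code> entries also show 8 digits.
<pre>
#include &lt;ctype.h&gt;

/* Classify one line of `objdump -d` output for a 32-bit ARM binary:
   1 = Thumb encoding (4 hex digits, e.g. "b580" or "f7ff ef8c"),
   0 = ARM (A32) encoding (8 hex digits, e.g. "e92d4800"),
  -1 = anything else (section headers, symbol labels, blank lines). */
int classify_dis_line(const char *line)
{
    const char *p = line;
    while (*p == ' ' || *p == '\t') p++;
    if (!isxdigit((unsigned char)*p)) return -1;
    while (isxdigit((unsigned char)*p)) p++;            /* instruction address */
    if (*p++ != ':') return -1;                         /* labels end in "&gt;:" */
    while (*p == ' ' || *p == '\t') p++;
    int digits = 0;
    while (isxdigit((unsigned char)p[digits])) digits++;  /* encoding field */
    if (digits == 8) return 0;
    if (digits == 4) return 1;
    return -1;
}
</pre>
<p>To scan the whole file, read dis line by line with fgets and count the lines for which classify_dis_line returns 1; a nonzero count suggests that something in the link was not built with -marm.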
                                                                                                                                             
<p>Once the Thumb modes work on the simulator, we can compare energy and performance with and without them.
                                                                                             

END
