==============================================================================
 FireBench V0.5 by Michael Kübel 16/06/2012
==============================================================================

Usage
-----
To change the screen mode, edit the !RunImage file and set the parameters
"screen_x%" and "screen_y%" to your desired resolution.

History
-------
V0.1  28/02/98 - first internal version
V0.2  29/11/98 - speed improvements (Never use the screen as a buffer !!!)
                 of about 15% without VRAM Caching
                 of about 30% with    VRAM Caching
               - plots everything, also the 2 ugly bottom lines
               - now calculates 2000 frames instead of 1000, because a
                 StrongARM is too fast at calculating 1000 frames :-)
               - first release
V0.3  17/06/00 - bugs fixed, no impact on results
V0.4  15/09/10 - screen mode adapted to 800*600 / frames raised to 16000
                 due to BeagleBoard
V0.5  16/06/12 - ported to 24bit screen mode for Raspberry Pi compatibility,
                 should support any 24bit resolution from 800*600 to
                 1920*1200, fire size changed from 320x100 to 512x256

Info
----
I wrote this to check how fast my fire routine can calculate a fire and
to test the impact of some code optimizations.

(Please note that the results differ a bit each time you run !FireBench,
 by about +/- 0.02 sec. or so)

As you can see in the results, the StrongARM is only about 4 times faster
than my ARM 610 machine. This must be because of the huge memory access of
this algorithm.

If you've got any different results (ARM710, more/less MHz, ...) send them
to me !

VRAM Caching
------------
VRAM Caching helps a lot, as you can see. To enable it, start the !Run App
in the dir 'AVCache.AutoVCache'. It starts the AutoVCache module from
Torsten Karwoth. This module enables VRAM Caching in SingleTask mode. For
more information, look at the help files provided by Torsten.

In RISC OS 4, VRAM Caching is done by default, so you don't have to use
AutoVCache.

Some technical notes...
-----------------------
This is a fire simulation which calculates each new pixel from the eight
surrounding pixels and stores the new pixel as you can see here...

   XXX  X:old pixels => sum all 8  => div by 8 => store new one (O) at  XOX
   X X                                                                  X X
   XXX                                                                  XXX

Due to the overlapping pixels, a second buffer is needed, and some random
hotspots have to be placed in the base line.
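
The calculation above can be sketched in a few lines of Python. This is
only an illustration under assumptions, not the ARM code from !RunImage:
the names make_buffer, seed_hotspots and step, the tiny buffer size and the
hotspot count are all made up here, and storing the result one row above
the centre is my reading of the diagram.

```python
import random

WIDTH, HEIGHT = 16, 8          # tiny illustrative size, not the real 512x256

def make_buffer():
    """Return a HEIGHT x WIDTH buffer of cold (zero) pixels."""
    return [[0] * WIDTH for _ in range(HEIGHT)]

def seed_hotspots(buf, rng, count=4, heat=255):
    """Clear the base (bottom) line and place some random hotspots in it."""
    base = HEIGHT - 1
    for x in range(WIDTH):
        buf[base][x] = 0
    for _ in range(count):
        buf[base][rng.randrange(WIDTH)] = heat

def step(old, new):
    """Sum the 8 neighbours of (x, y) in 'old', divide by 8, and store the
    result one row higher in 'new', so the fire rises.
    Reading and writing the same buffer would mix old and new pixels,
    which is why a second buffer is used.
    In this sketch the border columns and row HEIGHT-2 are never written."""
    for y in range(1, HEIGHT - 1):
        for x in range(1, WIDTH - 1):
            total = (old[y - 1][x - 1] + old[y - 1][x] + old[y - 1][x + 1]
                     + old[y][x - 1] + old[y][x + 1]
                     + old[y + 1][x - 1] + old[y + 1][x] + old[y + 1][x + 1])
            new[y - 1][x] = total >> 3     # div by 8
```

A frame loop would call seed_hotspots on the source buffer, then step, and
then exchange the two buffers.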

I optimized everything as much as I could. No STRB's/LDRB's in the fire
calculation routine. I use every free ARM register and so I first read
all pixels I need WORDwise, build some often used sums, calculate 4 new
ones and then store all with one STR.
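
To illustrate the word-wise idea, here is a rough Python sketch. It is not
the original ARM routine: the function names, the little-endian byte order
and the particular shared sums (top+bottom per column, full column) are
assumptions chosen for illustration. It reads three words per row,
reuses the column sums for four neighbouring pixels, and writes the four
results back as a single 32-bit word.

```python
def unpack4(word):
    """Split a 32-bit word into four 8-bit pixels (lowest byte first)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def pack4(pixels):
    """Combine four 8-bit pixels into one 32-bit word (lowest byte first)."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word

def load_word(row, offset):
    """Word-wise read of 4 pixels, standing in for one LDR."""
    return int.from_bytes(row[offset:offset + 4], "little")

def six_pixels(row, x):
    """Pixels x-1 .. x+4 taken from three word reads (x a multiple of 4)."""
    left = load_word(row, x - 4)
    centre = load_word(row, x)
    right = load_word(row, x + 4)
    return [left >> 24] + unpack4(centre) + [right & 0xFF]

def update_four(top_row, mid_row, bot_row, new_row, x):
    """Calculate new pixels x .. x+3 and store them with one word write."""
    top = six_pixels(top_row, x)
    mid = six_pixels(mid_row, x)
    bot = six_pixels(bot_row, x)
    c = [t + b for t, b in zip(top, bot)]     # top+bottom, reused 4 times
    d = [ci + m for ci, m in zip(c, mid)]     # full column sums, reused
    out = [(d[i - 1] + c[i] + d[i + 1]) >> 3 for i in range(1, 5)]
    new_row[x:x + 4] = pack4(out).to_bytes(4, "little")  # one STR
```

Each output pixel is the sum of the left full column, the middle column
without its centre, and the right full column, which is exactly the eight
neighbours.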

If you've done further speed improvements please let me know !!!

Of course a fire calculated from 4 pixels can be done much faster, but I
think it doesn't look that good...
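
For comparison, a 4-pixel fire could look roughly like the Python sketch
below. Which four pixels such a variant uses is not stated here, so the
choice (the three pixels below plus the one two rows below) and the name
step4 are assumptions borrowed from common fire effects.

```python
def step4(old, new):
    """Cheaper variant: average only 4 source pixels per new pixel.
    Assumed neighbourhood: the three pixels in the row below and the
    pixel two rows below."""
    height, width = len(old), len(old[0])
    for y in range(height - 2):
        for x in range(1, width - 1):
            total = (old[y + 1][x - 1] + old[y + 1][x] + old[y + 1][x + 1]
                     + old[y + 2][x])
            new[y][x] = total >> 2     # div by 4
```

It needs four loads and three additions per pixel instead of eight loads
and seven additions.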

Why is it faster than release V0.1 ?
------------------------------------
- VRAM caching

- In release V0.1 I used the Screen as the second buffer and then copied
  everything back to the memory buffer. As the StrongARM normally can't
  cache the VRAM this was very slow, so I set up a second buffer in the
  RAM and now the routine copies from the buffer to the screen.

  Additionally I only had to switch the buffer addresses after each frame
  to make it work.

- I also noticed that when VRAM caching is switched on, it's slightly faster
  to plot only 8 Words each time, not 11 as I did before. I think this must
  be because of the StrongARM's 4 write back buffer.
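
The buffer handling described above might be sketched like this in Python.
It is a simplified illustration, not the original code: the names plot and
run_frames, the update callback and the flat 8-bit screen are assumptions.
Both buffers live in RAM, the screen is only ever written, and after each
frame the two buffers just exchange roles.

```python
def plot(buffer, screen, words_per_burst=8):
    """Copy the finished frame to the screen in bursts of a few words.
    The burst size of 8 words mirrors the note above."""
    step = words_per_burst * 4          # 4 bytes per word
    for off in range(0, len(buffer), step):
        screen[off:off + step] = buffer[off:off + step]

def run_frames(frames, size, update, screen):
    """Run 'frames' frames with two RAM buffers and return the newest one.
    'update(src, dst)' is a hypothetical callback that calculates one
    frame from src into dst."""
    src = bytearray(size)
    dst = bytearray(size)
    for _ in range(frames):
        update(src, dst)                # RAM to RAM, screen is never read
        plot(dst, screen)               # one-way copy to the screen
        src, dst = dst, src             # switch buffer roles, no copying
    return src
```

Swapping the two references costs nothing per frame, whereas the V0.1
scheme had to read everything back from the uncached screen memory.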

Contacting me
-------------
 Please don't hesitate to report any bugs, ideas, or whatever to me:

 e-mail: michael.kuebel@googlemail.com
