MiKuSite 
x86 Stuff
I just like fractals ;-)
Kümmel Mandelbrot Benchmark V 0.53I-32b-MT (07/06/2009) (SSE4.1 version added)
Based on the SSE2 version, I created an SSE4.1 version that basically just makes use of the 'PTEST' instruction. This gives a performance gain of 2 %, at least on an Intel i7; on Intel C2D/C2Q it seems to be slower. Nothing else changed, so I kept the version index.
Download
KMB V 0.53I-32b-MT Results (listed in order of SSE2 or SSE4.1 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Physical CPU cores)
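A minimal sketch of the efficiency formula, for anyone who wants to recompute it from their own results. It assumes that "Iterations" is the iteration rate reported by the benchmark and that 1000 is a plain scaling factor; the function and parameter names are made up for this illustration and are not taken from the KMB source.

```python
def efficiency(result, clock_mhz, physical_cores):
    """Return 1000 * result / MHz / physical cores.

    `result` is assumed to be the iteration rate reported by the
    benchmark (the exact unit is an assumption of this sketch).
    """
    if clock_mhz <= 0 or physical_cores <= 0:
        raise ValueError("clock_mhz and physical_cores must be positive")
    return 1000.0 * result / clock_mhz / physical_cores


if __name__ == "__main__":
    # Hypothetical numbers, not a measured result.
    print(efficiency(3000.0, 3000.0, 4))
```

Dividing by clock and core count is what makes CPUs of different speed and size comparable in one list.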
Kümmel Mandelbrot Benchmark V 0.53I-32b-MT (11/01/2009)
Logical CPU core detection is now implemented. It also sets the maximum number of threads created (previously always 16 threads, which can cause some overhead). Furthermore, the result message box now displays the number of logical CPU cores and the CPU brand. The Windows code that caused problems on some machines was also corrected. The iteration code remains unchanged for SSE2 and FPU.
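The idea of deriving the thread count from the detected logical cores can be sketched as follows. This is not the KMB code (which is x86 assembler); the function name is hypothetical, and keeping 16 as an upper limit is an assumption based on the older versions.

```python
import os

MAX_THREADS = 16  # fixed thread count of older versions; kept here as an assumed cap


def worker_thread_count(logical_cores=None):
    """Pick the number of worker threads from the logical core count."""
    if logical_cores is None:
        # os.cpu_count() may return None if the count cannot be determined.
        logical_cores = os.cpu_count() or 1
    return max(1, min(logical_cores, MAX_THREADS))


if __name__ == "__main__":
    print(worker_thread_count())
```

Creating only as many threads as there are logical cores avoids the scheduling overhead of 16 threads on a machine with fewer cores.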
Download
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT (01/06/2008)
Another improvement of both the FPU version and the SSE2 version. The FPU version now has a loop that iterates 3 points with 3 independent instruction streams and 3-fold loop unrolling. The SSE2 version was also extended to 3-fold loop unrolling, which was beneficial for Core 2 Duo and AMD CPUs but not for the Pentium M. To support the latter, a special version (which is in fact V0.53H_SSE2) is included. Depending on the type of CPU, the speedup is up to 40 % for the FPU version and up to 14 % for the SSE2 version, compared to the previous version.
Download
KMB V 0.53H-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53G-32b-MT (13/04/2008)
After neglecting the FPU version for so long, I made the effort to apply all the lessons learned from the SSE2 optimization and created a much faster FPU version with a 2-point iteration loop, unrolled once. It is about 40 to 90 % faster than the original version. The SSE2 version remains unchanged.
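The structure of a 2-point iteration loop can be sketched like this. It only shows the idea of two independent calculation chains in one loop; in Python this gives no speedup, whereas in assembler the CPU can overlap the latencies of the two chains. All names and the escape radius of 2 are assumptions of this sketch, not taken from the KMB source.

```python
def iterate_pair(c0, c1, max_iter=1000):
    """Iterate two Mandelbrot points in one loop; return both iteration counts."""
    (cr0, ci0), (cr1, ci1) = c0, c1
    zr0 = zi0 = zr1 = zi1 = 0.0
    n0 = n1 = 0
    done0 = done1 = False
    for _ in range(max_iter):
        # Chain 0: depends only on zr0, zi0.
        if not done0:
            a0, b0 = zr0 * zr0, zi0 * zi0
            if a0 + b0 > 4.0:
                done0 = True
            else:
                zi0 = 2.0 * zr0 * zi0 + ci0
                zr0 = a0 - b0 + cr0
                n0 += 1
        # Chain 1: depends only on zr1, zi1.
        if not done1:
            a1, b1 = zr1 * zr1, zi1 * zi1
            if a1 + b1 > 4.0:
                done1 = True
            else:
                zi1 = 2.0 * zr1 * zi1 + ci1
                zr1 = a1 - b1 + cr1
                n1 += 1
        if done0 and done1:
            break
    return n0, n1


if __name__ == "__main__":
    print(iterate_pair((0.0, 0.0), (3.0, 0.0), max_iter=100))
```

Note that the loop only ends when both points are finished, so a point that diverges early leaves its chain idle for the rest of the loop.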
Download
KMB V 0.53G-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53F-32b-MT (02/04/2008)
The SSE2 version now finally uses 6 exits from the inner loop, so no iterations are wasted waiting for other points to diverge or reach the maximum iteration count. The iteration counters are now kept as integers, which avoids the memory accesses that SSE2 counters would have required. Overall it is about 20 % faster than the last version. I also doubled the duration of the benchmark to get more stable results. The FPU version remains unchanged; I hope to enhance it one day too, as there are some possibilities there as well.
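One way to read "no iterations are wasted" is that each point gets its own exit from the loop and its slot is refilled with the next point right away. The sketch below illustrates that reading with plain integer counters; the refill strategy, the slot count, and all names are assumptions, since the text above only states that the inner loop has 6 exits.

```python
from collections import deque


def iterate_with_refill(points, max_iter=1000, slots=2):
    """Return the iteration count per point; finished slots are refilled at once."""
    pending = deque(enumerate(points))
    counts = [0] * len(points)
    active = []  # each slot: [index, cr, ci, zr, zi, n] with n an integer counter

    def refill():
        while len(active) < slots and pending:
            i, (cr, ci) = pending.popleft()
            active.append([i, cr, ci, 0.0, 0.0, 0])

    refill()
    while active:
        for slot in active[:]:
            i, cr, ci, zr, zi, n = slot
            zr2, zi2 = zr * zr, zi * zi
            if zr2 + zi2 > 4.0 or n == max_iter:
                # Separate exit for this point: store its count, free the slot.
                counts[i] = n
                active.remove(slot)
            else:
                slot[3] = zr2 - zi2 + cr
                slot[4] = 2.0 * zr * zi + ci
                slot[5] = n + 1
        refill()
    return counts


if __name__ == "__main__":
    print(iterate_with_refill([(0.0, 0.0), (3.0, 0.0), (2.0, 0.0)], max_iter=50))
```

Compared with iterating a fixed group of points until all of them are done, every pass through the loop here does useful work for every slot.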
Download
KMB V 0.53F-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53E-32b-MT (17/01/2008)
This version seems to be the best for Intel and AMD, except for the Pentium M. One can't get it right for everyone, but for now I favour this version with respect to Phenom and Core 2 Duo.
Download
KMB V 0.53E-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53D-32b-MT (13/01/2008)
Another major improvement, now at about double the speed of the original version. Thanks to Xorpd! I now also include an AMD-optimized version, to be fairer to each company's CPU features. There's still room for some more optimizations, but the limit is closer now, I guess.
Download
KMB V 0.53D-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53C-32b-MT (04/01/2008)
A much improved version of my old V 0.53 MT. The improvements only affect the SSE2 version; the FPU version stays the same. The code is mainly influenced by Xorpd!'s 64-bit OS versions (see below) and speeds up calculations on an Intel Core 2 Duo by 70 % compared to the old version. Yet there are still plenty of options for further speedup.
Download
FASM - 'INCLUDE' for Kümmel Mandelbrot Benchmark V 0.53C-32b-MT (04/01/2008)
If you have problems compiling the source code of KMB, this INCLUDE directory for FASM might help. Note: this one is different from the INCLUDE for the old KMB V0.53 version. One day I'll try to make it work with the standard INCLUDE delivered with FASM.
Download  
KMB V 0.53C-32b-MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Xorpd! made an x64 OS version of my KMB V 0.53 MT that achieves amazing speedups! See the results of the KMB V 0.57 MT here:
(Efficiency is calculated as 1000 Iterations / MHz / Core)
Kümmel Mandelbrot Benchmark V 0.53 MT (11/07/2006)
A benchmark designed to measure the double-precision floating-point power of x86 CPUs by calculating a bunch of fractals with either the standard FPU or the SSE2 unit. Version 0.52 now supports multi-threading with up to 16 threads instead of the 2 threads of the previous version, giving a huge benefit on multi-CPU systems. Some results are published below. More info in the 'Read Me'. Send me some more results if you like! Written in x86 assembler (FASM format).
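For readers who don't know the calculation being benchmarked: a minimal sketch of the standard Mandelbrot escape-time iteration in double precision. This is the textbook loop, not the KMB assembler code; the function name, the default iteration limit, and the escape radius of 2 are assumptions of this sketch.

```python
def mandelbrot_iterations(cr, ci, max_iter=1000):
    """Count iterations of z = z*z + c until |z| > 2 or max_iter is reached."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:  # |z|^2 > 4 means the point has diverged
            return n
        zi = 2.0 * zr * zi + ci
        zr = zr2 - zi2 + cr
    return max_iter


if __name__ == "__main__":
    print(mandelbrot_iterations(0.0, 0.0, max_iter=100))
    print(mandelbrot_iterations(3.0, 0.0))
```

Each iteration costs only a handful of multiplications and additions on doubles, which is why the loop is a reasonable probe of raw floating-point throughput.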
Download (18 KByte)
FASM - 'INCLUDE' for Kümmel Mandelbrot Benchmark
If you have problems compiling the source code of KMB, this INCLUDE directory for FASM might help.
Download (97 KByte)  
KMB V 0.53 MT Results (listed in order of SSE2 performance):
(Efficiency is calculated as 1000 Iterations / MHz / Core)
FRAC! V 1.0 (02/06/1997)
A fully menu-driven fractal application for DOS, including different algorithms, 3D display, colour rotation, and loading and saving of the data. It still works on XP, but incredibly slowly. Written in C and compiled with the ancient Turbo C.
Download (56 KByte)