64 Bit Raspberry Pi 3 Benchmarks via SUSE

I have a number of benchmarks that are often used in reviews of new Raspberry Pis and have also been run on everything from DOS to Windows 10, Linux distros including openSUSE, and Android, many with 32 bit and 64 bit compilations. I am converting these to run on the Raspberry Pi 3 at 64 bits via SUSE and openSUSE. I won’t bore you with details (unless you want me to), but these can be found at:

https://www.raspberrypi.org/forums/viewtopic.php?p=1095254#p1095254

I have encountered a problem that prevents realistic performance from being demonstrated, apparently caused by the CPU clock alternating between 1200 and 600 MHz, even when idling. I monitored MHz and CPU temperature via the watch command, with 1 second sampling. One way of avoiding the MHz variation was to change the watch sampling rate to 0.1 seconds - why?
As established in the above RPi topic, a workaround is to change force_turbo=0 to force_turbo=1 in boot config.txt (Turbo mode: 0 = enable dynamic freq/voltage, 1 = always max). Always running at 1200 MHz is good for benchmarking but not for real work. Under Raspbian, the CPU normally runs at 1200 MHz for CPU bound programs and 600 MHz when idling.

Another issue is to find out what happens when I run my MP stress tests. These cause the RPi 3 to overheat, but the CPU MHz is throttled to minimise the effects. The clock speed is identified using the command vcgencmd measure_clock arm, with variable steps, but that does not appear to be available on SUSE. How is this overheating avoided with SUSE - by running continuously at 600 MHz?

Hi and welcome to the Forum :slight_smile:
Why not use cpupower?


cpupower -c all frequency-info
cpupower -c all frequency-set -g performance
cpupower -c all frequency-info
cpupower -c all frequency-set -g ondemand

Thanks, that works, but after setting performance all cores run permanently at 1200 MHz. With openSUSE 11.3 on a PC, normal operation is mainly at a low frequency when idle (e.g. 800 MHz), then at full speed (like 3000 MHz) when a benchmark is running, but only on the core being used, dropping back to the lower MHz when finished. The RPi 3 with Raspbian works in the same way, at 600 or 1200 MHz, except that all cores run at the same frequency. Reminder - with the RPi 3 and SUSE, normal operation is that it switches to the lower frequency when a CPU bound program is running.

Hi
I don’t see a huge temperature increase… but I don’t run a desktop environment either…

Ondemand Test;

http://www.imagebam.com/image/aed7c4525875188

Performance Test;

http://www.imagebam.com/image/952b11525875193

I have compiled my RPi floating point and integer CPU stress tests, at 64 bits for openSUSE, and run them on my RPi 3. The purpose was to see what happens when the system is running with the CPU frequency governor settings of performance and on demand, bearing in mind that the first converted single core benchmarks demonstrated unexpectedly slow performance with the latter setting. Details of the tests are in the following, with benchmarks and source code in the tar.gz file.

http://www.roylongbottom.org.uk/SUSE%20RPi3%20Stress%20Tests.htm
http://www.roylongbottom.org.uk/SuseRpi3Stress.tar.gz

The first tests were with the RPi 3 fitted in a FLIRC case, where 15 minute 32 bit tests under Raspbian showed no degradation in performance, with a limited increase in CPU temperature. The next ones had a copper heat sink on the CPU and no case, where the original tests demonstrated CPU MHz throttling and slower performance, with increases in temperature.

With the FLIRC case, the whole aluminium case acts as a heatsink.

A summary of openSUSE results is below, showing average speeds per core for multi-core tests, with floating point test results in MFLOPS and integer tests in MB/second.

Single vs Multi Core - With the performance setting, up to 10% degradation per core could be expected. With on demand, even an overheated MP core can be faster than a cold single core.

Performance vs On Demand - Multi core performance is essentially the same but single core OD speeds can be too slow.

FLIRC case - Note the constant performance over the 15 minute MP tests.

Copper Heatsink - CPU speed throttling kicks in at over 80°C CPU temperature, recorded as ongoing small reductions in MHz and slower measured performance.

                         Per Core Average Speeds

                       On Demand      Performance     OD/Perf
                       MFLOPS MB/sec  MFLOPS MB/sec   MFL MB/s

 Single core 1 pass      1996   2278    3832   2768   52%  82%

 FLIRC Case
 4 cores first 2 passes  3623   2465    3644   2528   99%  98%
 4 cores last  2 passes  3591   2476    3645   2618   99%  95%

 Copper Heatsink
 4 cores first 2 passes  3603   2485    3463   2477  104% 100%
 4 cores last  2 passes  3152   2017    3104   1975  102% 102%

The above htm report includes 8 graphs, with 2 represented below, indicating variable recordings with the on demand setting and fairly constant speeds using the performance option, with limited increases in CPU temperature.

http://www.roylongbottom.org.uk/HotSUSEPi.gif

Hi
Now you need to look at kernel tweaks :wink: Or compare the kernel configurations between openSUSE and SLES (Free for a year) for aarch64.


zcat /proc/config.gz > current_kernel_config
less current_kernel_config

Hi Malcolm

> Now you need to look at kernel tweaks Or compare the kernel configurations between openSUSE and SLES (Free for a year) for aarch64

Thanks, but I have never considered using kernel tweaks and prefer to stick to “as is”, unless a simple run time option is available. I already have a copy of SLES and have repeated the stress tests; it produced the same loading effects as openSUSE.

Following are average speeds of stressintPi64, running 6 minute tests at 8 KB, using 1, 2 and 4 cores. With the performance setting, except for a little degradation due to heat effects with 4 threads, MB/second results were the same. On the other hand, the default on demand option produced better performance per core as the load increased. That seems to be the wrong way round.


             On Demand     Performance
 Program   Total Average  Total  Average
  Copies  MB/sec  MB/sec  MB/sec  MB/sec
 
     1      2079    2079    2758    2758
     2      4651    2325    5519    2759
     4     10811    2703   10806    2701

I have had quite a few crashes using the SUSE operating systems, but it might be my Raspberry Pi 3 and/or the sort of programs I am running, as they also occurred (less frequently) using Raspbian. Unfortunately, sometimes the SD cards cannot boot afterwards. This happened with SLES. Although it is no real hassle to produce another copy of the system, the new copy could not install SUSE software/updates and the old registration code was not recognised. But, so far, I can still run my benchmarks using SLES.

I know it’s a bit off topic, but can you guys please advise whether Leap 42.2 for the RPi is mature enough for daily usage, or is it still more of a proof of concept? I have been using Raspbian for several months and I’m quite disappointed with it. I would love to switch to openSUSE, because that’s what I use for everything else, but I don’t want to end up with something even worse than Raspbian.

I understand it all depends on the intended use etc, but I just want to know your subjective opinion - is openSUSE for Arm similar in maturity and stability to the x86 version?

Hi
Had no issues with openSUSE Leap 42.2, Tumbleweed (I like the xfce on this) or the free (1 year subscription) SLES 12 SP2 version, which I tend to use for command line only and playing with the GPIO. I had to modify wiringPi code as well as create fake cpu info.

Thank you, I think I will give it a try.

I have compiled and run the first set of 64 bit benchmarks. Full details and results are in the following, with benchmarks and source codes in the tar.gz file:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz

The Classic Benchmarks are the first programs that set standards of performance for computers in the 1970s and 1980s. They are Whetstone, Dhrystone, Linpack and Livermore Loops. Improvements indicated relate to comparisons of gcc-6 64 bit versions and 32 bit compilations from gcc 4.8 via Raspbian.

**Whetstone** - This includes simple test loops that do not benefit from advanced instructions. There was a 40% improvement in overall performance, due to a limited number of dominant tests using functions such as COS and EXP.

**Dhrystone** - rated in VAX MIPS, AKA DMIPS, produced a 43% improvement, but this benchmark is susceptible to over-optimisation.

**Linpack** - double and single precision versions (DP and SP), with results reported in MFLOPS. Speed improvements, over the 32 bit version, were around 1.9 times DP and 2.5 times SP. There is also a version that uses NEON intrinsic functions, where the 32 bit and 64 bit builds are compiled as different varieties of vector instructions, with only a 10% improvement.

**Livermore Loops** - has 24 test kernels, where 64 bit performance increased by between 1.02 and 2.88 times. The official average was 34% faster, at 279 MFLOPS. This is 21 times faster than the Cray 1 supercomputer, on which this benchmark confirmed the original selection.

**Memory tests** - These measure cache and RAM speeds with results in MB/second. As could be expected, RAM speeds were generally quite similar for particular test functions.

**MemSpeed** - Nine tests measure speeds using floating point (FP) and integer calculations. Cache based improvements were 1.64 to 2.60 times for DP FP, 1.17 to 1.55 for SP FP and 1.03 to 1.23 for integer.

**BusSpeed** - This reads data via loops with 64 AND instructions, attempting to measure maximum data transfer speeds. It includes variable address increments to identify burst reading and to provide a means of estimating bus speeds. The main differences were on L1 cache data, where average burst speeds were 38% faster but reading all data was slower. This is surprising, as the 64 bit disassembly indicates that far more registers were used, with fewer load instructions, and the same type of AND instructions.

**NeonSpeed** - All floating point data is single precision. The source code carries out the same calculations using normal arithmetic and more complicated NEON intrinsic functions, the latter being compiled as different types of vector instructions, with no real average 64 bit improvement. The normal SP calculations were slightly faster.

Latest programs converted were my Fast Fourier Transform benchmarks, which showed some 64 bit performance improvements. Source code and execution files are included in the above. These execute FFTs sized 1K to 1024K, the larger ones depending on RAM speeds. Using Raspbian (32 bit), SUSE and another Linux distro (64 bit), the short FFTs, with execution times of less than 0.5 milliseconds, produced inconsistent running times (like sometimes half speed). This was only with “on demand” MHz settings. SUSE also produced longer periods of poor performance, as observed through random slow results on other benchmarks. To investigate this, I produced another test that executes 30 1K sized FFTs 500 times, with 32 bit and 64 bit compilations (these will be included in the tar.gz file). Example results are below.


       RPi 3 500 x 30 1K Single Precision FFT milliseconds
. 
                   32 Bit Raspbian On Demand
.
  12.9  12.2   7.4   6.0   6.0   6.4   6.0   6.0   6.0   6.0
   6.1   6.0   6.0   6.0   6.0   6.0   6.1   6.1   6.0   6.2
   6.2   6.0   6.0   6.1   6.0   6.0   6.0   6.0   6.1   6.0
   6.2   6.0   6.0   7.0   6.1   6.0   6.0   6.0   6.1   6.0
   6.2   6.1   6.0   6.0   6.2   6.0   6.0   6.0   6.0   7.2
 To
   6.5   6.3   6.1   6.2   6.1   6.1   6.1   6.1   6.1   6.1
   6.5   6.3   6.1   6.1   6.1   6.1   6.1   6.1   6.1   6.1
   6.4   6.2   6.1   6.1   6.2   6.1   6.1   6.1   6.1   6.1
.
                   64 Bit Other Linux On Demand
.
  17.5  15.4  11.8   8.6   5.4   5.4   5.4   5.4   5.4   5.4
   5.5   5.8   6.0   5.4   5.5   5.4   5.5   5.4   5.4   5.4
   5.5   5.6   6.1   5.4   5.5   5.4   5.5   5.5   5.4   5.4
 To
   5.7   6.9   5.7   5.4   5.4   5.4   5.5   5.4   5.4   5.4
   5.8   6.8   5.8   5.6   5.4   5.4   5.4   5.5   5.4   5.4
   5.7   6.4   5.7   5.5   5.4   5.4   5.5   5.4   5.4   5.4
.
                 64 Bit OpenSUSE On Demand
.
  12.1  12.5   8.9   5.3   5.3   5.3   5.3   5.3   5.3   5.3
   5.3   5.7   5.3   5.3   5.3   5.3   5.3   5.3   5.3   5.3
   5.3   5.6   5.3   5.3   5.3   5.3   5.3   5.3   5.3   5.3
 To
   7.9  11.7  10.7  10.6  10.6  10.6  10.6  10.6  10.6  10.7
  11.6  11.2  10.6  10.7  10.6  10.6  10.6  10.6  10.6  10.6
  11.7  11.5  10.6  10.6  10.6  10.6  10.6  10.6  10.6  10.6
  11.8  11.1  10.6  10.6  10.7  10.6  10.6  10.7  10.6  10.6
 To
   5.5   6.0   5.8   5.3   5.3   5.3   5.3   5.3   5.3   5.3
   5.5   5.9   5.7   5.3   5.3   5.3   5.3   5.3   5.3   5.3
   5.5   6.0   5.8   5.3   5.3   5.3   5.3   5.3   5.3   5.4


MultiThreading Benchmarks

Most of my multithreading benchmarks run using 1, 2, 4 and 8 threads. Many have tests that use approximately 12 KB, 120 KB and 12 MB, to use both caches and RAM. The first set attempts to measure maximum MFLOPS, with two test procedures, one with two floating point operations per data word and the other with 32. The latter includes a mixture of multiplications and additions, coded to enable SIMD operation. In this case, using single precision numbers, four at a time, plus linked multiply and add, a top end CPU can execute eight operations per clock cycle per core. It is not clear what the potential maximum MFLOPS is on an ARM Cortex-A53, but eight per core is mentioned. The same benchmark code obtained a maximum of 24 MFLOPS/MHz on a top end quad core Intel CPU, via Linux - see the following:

http://www.roylongbottom.org.uk/linux%20multithreading%20benchmarks.htm#anchor6

Following shows the format of the MP-MFLOPS benchmarks, with the best 64 bit Raspberry Pi 3 results. Note the performance increases using more threads, except when limited by RAM speed. These benchmarks carry out a fixed number of test passes, with each thread carrying out the same calculations on different sections of data. Numeric results produced (x100000) are output to show that all data has been used.

 MP-MFLOPS NEON Intrinsics 64 Bit Tue Feb 28 15:37:39 2017
    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      697     725     420    2640    2544    2441
 2T     1452    1420     348    5135    5258    4430
 4T     1438    2679     343   10113    9905    5370
 8T     1914    2533     358    9332   10124    6041
 Results x 100000, 12345 indicates ERRORS
 1T    76406   97075   99969   66015   95363   99951
 2T    76406   97075   99969   66015   95363   99951
 4T    76406   97075   99969   66015   95363   99951
 8T    76406   97075   99969   66015   95363   99951
         End of test Tue Feb 28 15:37:43 2017

Benchmarks appropriate for comparison of 32 and 64 bit versions are single and double precision versions compiled for normal floating point, plus one using NEON intrinsic functions that are clearly suitable for SIMD operation and are converted to different types of vector operation.
64 bit/32 bit speed comparisons are below. Single precision MP-MFLOPS has the highest gain, through using vector instructions instead of scalar. With compiled intrinsics, the systems use different forms of vector instructions.

 Average 64 bit performance gains

         2 Ops/Word              32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800

 MF SP   4.31    3.87    1.24    2.19    2.35    2.04
 MF DP   2.45    1.71    0.83    1.92    1.92    1.42
 Intrin  1.81    1.84    0.82    1.67    1.75    1.08

There is also an OpenMP benchmark that carries out the same calculations, but also with 8 calculations per data word. OpenSUSE uses all available CPU cores, so, for comparison purposes, a version without the MP directive is also provided. Results identify MP gains of up to 3.89 times at 64 bits. The 64 bit version produces some speeds similar to the 32 bit compilation, but was faster by 2.47 to 2.80 times using 32 floating point operations per word, in the MP tests.

As usual, benchmarks, source codes, details and results are in:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm

More 64 Bit MultiThreading Benchmarks

The other MP benchmarks, included in the tar.gz file, demonstrate some MP and 64 bit performance gains, with others identifying that multithreading provided little or no benefit and, sometimes, much worse performance.
**MP-Whetstone** - Multiple threads each run the eight test functions at the same time, but with some dedicated variables. MP performance is good, but the simple test functions are not appropriate for more advanced instructions at 64 bits, so relative 32 bit performance is between 0.48 and 2.08.
**MP-Dhrystone** - This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread, but there are numerous other variables that are shared. The latter reduces performance gains via multiple threads and, in some cases, these can be slower than using a single thread. Here, some quad core improvements are shown as up to 2.5 times faster than a single core. The single core 64 bit/32 bit speed ratio was 1.50, reducing to 1.10 using four threads.
**MP-Linpack** - The original Linpack benchmark operates on double precision floating point 100x100 matrices. This one runs on 100x100, 500x500 and 1000x1000 single precision matrices using 0, 1, 2 and 4 separate threads, mainly via NEON intrinsic functions that are compiled into different forms of vector instructions. The benchmark was produced to demonstrate that the original Linpack code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to implement their own linear equation solver for this purpose. At 100x100, data is in L2 cache; the others depend more on RAM speed. The critical daxpy function is affected by numerous thread create and join directives, even when using one thread. This leads to slow and constant performance across all the threaded tests - see example below. The 32 bit version produced slightly slower speeds.

 Linpack Single Precision MultiThreaded Benchmark
  64 Bit NEON Intrinsics, Wed Mar  8 11:36:25 2017

   MFLOPS 0 to 4 Threads, N 100, 500, 1000

 Threads      None        1        2        4

 N  100     552.47   112.73   105.19   105.31 
 N  500     442.32   303.75   303.64   305.03 
 N 1000     353.88   315.96   309.15   308.31 

**MP-BusSpeed** - This runs integer read only tests using caches and RAM, each thread accessing the same data but with staggered starting points. It includes tests with variable address increments, to identify burst reading and bus speeds. The main “Read All” test is intended to identify maximum RAM speed. The benchmark demonstrated some appropriate MP performance gains, but slow 64 bit speeds, with the 32 bit version being 2.5 times faster on cache based data. The reason is that the latter compiled the arithmetic as 16 four-way NEON operations, compared with 64 scalar instructions.

**MP-RandMem** - The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. The read only L1 cache based tests demonstrated MP gains of 3.6 times, with the 64 bit version 43% faster than the 32 bit variety. Read/write tests produced no multithreading performance improvement, and the latest benchmark appeared to be somewhat slower than the 32 bit version.

OpenGL GLUT Benchmark

This was produced for use on Linux based PCs. It has four tests using coloured or textured simple objects, then a wireframe and textured complex kitchen structure. It can be run from a script file specifying different window sizes and a command to disable VSYNC, enabling speeds greater than 60 FPS to be demonstrated. The benchmark, source code and details are in the following:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm#anchor19a

In 2012, I approved a request from a Quality Engineer at Canonical to use this OpenGL benchmark in the testing framework of the Unity desktop software. One reason probably was that a test can be run for extended periods as a stress test.
Below are results from a Raspberry Pi 3, using the experimental desktop GL driver, and the new 64 bit version. The latter included tests at a smaller window size, to show that maximum speed was not limited by VSYNC. It can be seen that, using smaller windows, the 32 bit version was significantly faster running simple coloured objects, with the 64 bit benchmark being ahead with complex structures. Then performance became close up to 1024 x 768, with the later program falling over with a full screen display (config setting?). Note that this benchmark would not run on some Leap installations.

 ######################### RPi 3 Original #########################

 GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    308.4    182.1     82.6     52.3     21.6     13.7
   640   480    129.5    119.6     74.6     49.2     21.6     13.8
  1024   768     54.8     52.2     43.7     39.2     21.4     13.6
  1920  1080     21.5     17.9     20.3     19.6     20.6     13.4

 ########################## RPi 3 SUSE ###########################

 GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 19:03:25 2017

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen

  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS
   160   120     87.1     76.3     64.3     46.9     24.3     15.6
   320   240     59.2     54.7     53.7     43.9     25.6     15.6
   640   480     33.4     31.7     31.0     27.6     24.4     15.3
  1024   768     17.5     17.5     17.7     17.0     16.2     14.1
  1920  1080      8.2      8.3      9.0      9.3      8.4      7.6

JavaDraw Benchmarks

The benchmark uses small to rather excessive numbers of simple objects to measure drawing performance in Frames Per Second (FPS). Five tests draw on a background of continuously changing colour shades. Benchmarks, further details and results can be obtained via the above links.

Results below include all sorts of issues: the original system did not run well after the new OpenGL GLUT driver was installed, and openSUSE performance depended on the particular distribution.


 ##################### RPi 3 JavaDraw FPS ######################

                    PNG     PNG    +Sweep   +200    +320   +4000
                  Bitmaps Bitmaps Gradient Small    Long   Small
                      1       2   Circles Circles  Lines  Circles

  Pi 2  900 MHz     44.4    56.8    57.3    55.0    38.6    25.2

  Pi 3  Original    55.0    69.5    70.0    67.7    46.4    29.5
  Pi 3 +GLUT Driver  2.9     3.2     7.3     8.1     7.5     7.0

  Pi 3  OpenSUSE     8.6    10.9    10.7    10.1     7.9     3.6
  Pi 3  OpenSUSE    22.8    32.1    32.3    27.7    15.3     6.2
 

Java Whetstone Benchmark

Details and results are also included in the above files. Excluding two tests, where each was much faster, the average 64 bit speed was nearly twice as fast.


64 Bit I/O Benchmarks
My DriveSpeed and LanSpeed programs have now been recompiled as **DriveSpeed64** and **LanSpeed64**, with benchmarks, source codes, details and results in the tar.gz and htm files quoted earlier. The code for these is identical, except that DriveSpeed opens files to use direct I/O, avoiding caching. LanSpeed normally runs without using local caching. The benchmarks measure writing and reading speeds of relatively large files, random access and numerous small files.
There might be a tuning parameter, but DriveSpeed64 produced errors using the installed openSUSE, where direct I/O did not appear to be available. It did run using SUSE SLES, producing the results shown below. In this case, random access and small file test results were not as expected.

#################### DriveSpeed64 SUSE SLES ####################

   DriveSpeed RasPi 64 Bit 1.1 Mon Apr  3 23:40:21 2017
 
 Current Directory Path: /home/roy/driveLANSUSE
 Total MB   29465, Free MB   27495, Used MB    1970
                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3
    8    10.26    15.50     7.78    47.27    51.62    48.91
  16    10.58    13.86    10.14    54.05    55.50    45.78

 Cached
   8   520.96   586.68   601.25   709.43   709.23   706.46

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.005    0.004    0.004    16.91    20.31    22.13

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.25     0.36     1.06   252.55   403.28   621.47
 ms/file    16.10    23.00    15.43     0.02     0.02     0.03    0.029

                End of test Mon Apr  3 23:40:59 2017
 
 >>>>>>>>>>>>>>>>>>> Comparison with 32 Bit Version <<<<<<<<<<<<<<<<<<<
  Large Files > Faster SD card reflected, reading > twice as fast
  Random      > Writing exceptionally slow, reading far too fast, data cached? 
  Small Files > Writing exceptionally slow, reading far too fast, data cached?

DriveSpeed can also be used for testing USB connected drives. This produced errors using flash drives and USB connected SD cards. It did run on the latter, via a different 64 bit OS, but only on a btrfs formatted partition.

**LAN** access could only be used via openSUSE, following installation of additional facilities. Samba for SUSE SLES could not be downloaded following a necessary reinstallation of the system. openSUSE results are below, from accessing a Windows based PC.


Cannot insert table in CODE see:
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm#anchor22a

LanSpeed64 was also successfully run targeting the main and USB drives that would not run DriveSpeed64, identifying speeds when data was cached, and suggesting that the earlier failures were due to trying to open files (as the programs do) to force direct I/O. Details are available in the aforementioned htm report.
