I have recently installed perf, a performance tool for Linux, to monitor cache misses. I installed it with zypper. Unfortunately, perf does not show me any cache miss rate. Are there special kernel settings that need to be turned on before perf can run cache performance analysis? I would appreciate your help.
In general, is using perf on openSUSE a good idea? If not, what tool should I use for cache analysis? Valgrind is the standard option, but it is relatively heavyweight, and I am looking for something simple like perf.
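A quick way to narrow down the kernel-settings question is to check whether the kernel exposes perf events at all and whether perf lists the generic cache counters. The sketch below assumes a Linux system with /proc mounted. `check_perf_cache_support` is a hypothetical helper name, and whether `perf list hw` shows a given event may depend on the CPU and the perf version.

```shell
#!/bin/sh
# Sketch: quick checks for whether perf can count hardware cache events here.
# Assumptions: Linux with procfs; perf may or may not be installed.
check_perf_cache_support() {
    f=/proc/sys/kernel/perf_event_paranoid
    if [ -r "$f" ]; then
        # Higher values restrict unprivileged access to the counters.
        echo "perf_event_paranoid=$(cat "$f")"
    else
        echo "perf_event_paranoid=missing (kernel may lack CONFIG_PERF_EVENTS)"
    fi

    if ! command -v perf >/dev/null 2>&1; then
        echo "perf: not installed"
        return 0
    fi

    # Generic hardware events known to perf; look for the two cache counters.
    if perf list hw 2>/dev/null | grep -qE 'cache-(misses|references)'; then
        echo "perf: generic cache events listed"
    else
        echo "perf: generic cache events NOT listed"
    fi
}

check_perf_cache_support
```

Since `perf_event_paranoid` mainly restricts unprivileged users, it is unlikely to matter when perf is run with `sudo`.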
For example, running a simple example from the perf website gives:
    john@linux-f44g:~/Downloads/test> sudo perf stat dd if=/dev/zero of=/dev/null count=1000000
    1000000+0 records in
    1000000+0 records out
    512000000 bytes (512 MB) copied, 0.565965 s, 905 MB/s

     Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

            565.860129 task-clock (msec)         #    0.998 CPUs utilized
                    22 context-switches          #    0.039 K/sec
                     0 cpu-migrations            #    0.000 K/sec
                    78 page-faults               #    0.138 K/sec
         1,210,511,822 cycles                    #    2.139 GHz                     [83.24%]
           408,072,883 stalled-cycles-frontend   #   33.71% frontend cycles idle    [83.29%]
           279,548,413 stalled-cycles-backend    #   23.09% backend cycles idle     [66.82%]
         1,967,969,705 instructions              #    1.63  insns per cycle
                                                 #    0.21  stalled cycles per insn [83.41%]
           397,736,050 branches                  #  702.888 M/sec                   [83.42%]
             4,011,374 branch-misses             #    1.01% of all branches         [83.24%]

           0.567221491 seconds time elapsed
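One thing worth trying is to request the cache events explicitly with `-e` instead of relying on perf's default event list, which can differ between perf versions. The sketch below assumes this perf supports CSV output (`-x,`) and writing counts to a file (`-o`). The parsed field layout `value,unit,event,...` is an assumption based on recent perf versions (older ones may differ), and `run_cache_stat` and `parse_cache_csv` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: count the two generic cache events explicitly for one command.
# Assumption: perf stat supports -x (CSV) and -o (output file); the CSV
# layout "value,unit,event,..." is taken from recent perf versions.

# Reads perf stat CSV on stdin and prints "event=value" for the cache events.
parse_cache_csv() {
    awk -F, '
        /^#/ || NF < 3 { next }   # skip header comments and blank lines
        $3 ~ /^cache-(references|misses)/ { print $3 "=" $1 }
    '
}

# Hypothetical helper: run_cache_stat <command> [args...]
run_cache_stat() {
    csv=$(mktemp) || return 1
    perf stat -x, -o "$csv" -e cache-references,cache-misses -- "$@"
    rc=$?
    parse_cache_csv < "$csv"
    rm -f "$csv"
    return "$rc"
}
```

Without the helpers, the equivalent one-off command is `sudo perf stat -e cache-references,cache-misses dd if=/dev/zero of=/dev/null count=1000000`. When an explicitly requested event cannot be counted, perf reports it as `<not supported>` instead of silently leaving it out, which helps tell a missing default from missing hardware support.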
Note that the cache miss rate is missing from these statistics. According to the perf tutorial (https://perf.wiki.kernel.org/index.php/Tutorial), the same command should produce output like this:
    perf stat -B dd if=/dev/zero of=/dev/null count=1000000
    1000000+0 records in
    1000000+0 records out
    512000000 bytes (512 MB) copied, 0.956217 s, 535 MB/s

     Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

                5,099 cache-misses             #      0.005 M/sec (scaled from 66.58%)
              235,384 cache-references         #      0.246 M/sec (scaled from 66.56%)
            9,281,660 branch-misses            #      3.858 %     (scaled from 33.50%)
          240,609,766 branches                 #    251.559 M/sec (scaled from 33.66%)
        1,403,561,257 instructions             #      0.679 IPC   (scaled from 50.23%)
        2,066,201,729 cycles                   #   2160.227 M/sec (scaled from 66.67%)
                  217 page-faults              #      0.000 M/sec
                    3 CPU-migrations           #      0.000 M/sec
                   83 context-switches         #      0.000 M/sec
           956.474238 task-clock-msecs         #      0.999 CPUs

          0.957617512 seconds time elapsed
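When both counters are reported, the miss rate is cache-misses divided by cache-references. For the tutorial's numbers that is 5,099 / 235,384, or about 2.17%. A minimal sketch of that arithmetic, with `miss_rate` as a hypothetical helper that expects plain integers without thousands separators:

```shell
#!/bin/sh
# Hypothetical helper: miss rate as a percentage of cache references.
# $1 = cache-references, $2 = cache-misses (plain integers, no commas).
miss_rate() {
    awk -v refs="$1" -v miss="$2" 'BEGIN {
        if (refs <= 0) { print "n/a"; exit 1 }
        printf "%.2f\n", 100 * miss / refs
    }'
}

# Numbers taken from the tutorial output quoted above; prints 2.17
miss_rate 235384 5099
```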