Linux: SystemTap script to analyze the cache behavior of functions
I would like to profile the cache behavior of a kernel module with SystemTap (number of cache references, cache misses, etc.). There is an example script online that shows how SystemTap can be used to read perf events and counters, including cache-related ones: https://sourceware.org/systemtap/examples/profiling/perf.stp
This sample script works by default for a process:
probe perf.hw.cache_references.process("/usr/bin/find").counter("find_insns") {}
I replaced the process keyword with module, and the path to the executable with the name of my kernel module:
probe perf.hw.cache_references.module(module_name).counter("find_insns") {}
I'm pretty sure my module has debug info, but when I run the script I get:
semantic error: while resolving probe point: identifier 'perf' @ perf.stp:14:7
        source: probe perf.hw.instructions.module(module_name).counter("find_insns") {}
Any ideas what might be wrong?
Edit:
Okay, I realized that perf counters are bound to processes, not modules (explained here: https://sourceware.org/systemtap/man/stapprobes.3stap.html). Therefore I changed the probe to:
probe perf.hw.cache_references.process(path_to_binary).counter("find_insns") {}
Now, as the sample script suggests, I have:
probe module(module_name).function(func_name) {
  # save counter values on entrance
  ...
}
but when I run it, I get:
semantic error: perf counter 'find_insns' not defined
semantic error: while resolving probe point: identifier 'module' @ perf.stp:26:7
        source: probe module(module_name).function(func_name)
Edit 2:
So here is the complete script:
#! /usr/bin/env stap
# usage: stap perf.stp <path-to-binary> <module-name> <function-name>

global cycles_per_insn
global branch_per_insn
global cacheref_per_insn
global insns
global cycles
global branches
global cacherefs
global insn
global cachemisses
global miss_per_insn

probe perf.hw.instructions.process(@1).counter("find_insns") {}
probe perf.hw.cpu_cycles.process(@1).counter("find_cycles") {}
probe perf.hw.branch_instructions.process(@1).counter("find_branches") {}
probe perf.hw.cache_references.process(@1).counter("find_cache_refs") {}
probe perf.hw.cache_misses.process(@1).counter("find_cache_misses") {}

probe module(@2).function(@3) {
  # save counter values on entrance
  insn["find_insns"] = @perf("find_insns")
  insns <<< (insn["find_insns"])
  insn["find_cycles"] = @perf("find_cycles")
  cycles <<< insn["find_cycles"]
  insn["find_branches"] = @perf("find_branches")
  branches <<< insn["find_branches"]
  insn["find_cache_refs"] = @perf("find_cache_refs")
  cacherefs <<< insn["find_cache_refs"]
  insn["find_cache_misses"] = @perf("find_cache_misses")
  cachemisses <<< insn["find_cache_misses"]
}

probe module(@2).function(@3).return {
  dividend = (@perf("find_cycles") - insn["find_cycles"])
  divisor = (@perf("find_insns") - insn["find_insns"])
  q = dividend / divisor
  if (q > 0)
    cycles_per_insn <<< q

  dividend = (@perf("find_branches") - insn["find_branches"])
  q = dividend / divisor
  if (q > 0)
    branch_per_insn <<< q

  # cache references (the original reused find_cycles here)
  dividend = (@perf("find_cache_refs") - insn["find_cache_refs"])
  q = dividend / divisor
  if (q > 0)
    cacheref_per_insn <<< q

  dividend = (@perf("find_cache_misses") - insn["find_cache_misses"])
  q = dividend / divisor
  if (q > 0)
    miss_per_insn <<< q
}

probe end {
  if (@count(cycles_per_insn)) {
    printf ("cycles per insn\n\n")
    print (@hist_log(cycles_per_insn))
  }
  if (@count(branch_per_insn)) {
    printf ("\nbranches per insn\n\n")
    print (@hist_log(branch_per_insn))
  }
  if (@count(cacheref_per_insn)) {
    printf ("\ncache refs per insn\n\n")
    print (@hist_log(cacheref_per_insn))
  }
  if (@count(miss_per_insn)) {
    printf ("\ncache misses per insn\n\n")
    print (@hist_log(miss_per_insn))
  }
}
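For comparison, here is a minimal, untested sketch of the pattern the sample script relies on: the perf counters and the function probe are both bound to the same user-space process, so it measures a function in that binary, not code inside a kernel module. It assumes the binary passed as @1 has debug info for the function passed as @2; the counter names and the per-1000-instruction scaling are illustrative choices, not part of the original script. It keeps entry values per thread (keyed by tid()) instead of in one shared array, scales the ratios by 1000 because SystemTap arithmetic is integer-only (a plain cache-references-per-instruction quotient would almost always truncate to 0 and be dropped by a q > 0 test), and skips calls with a zero instruction delta, since division by zero is a runtime error.

```stap
#! /usr/bin/env stap
# Sketch (untested): cache behavior of a user-space function.
# usage: stap cache.stp <path-to-binary> <function-name>
# Assumes <path-to-binary> has debug info for <function-name>.
# Recursive calls overwrite the saved entry values (not handled).

global entry_insns, entry_refs, entry_misses   # per-thread values at entry
global refs_per_kinsn, misses_per_kinsn        # aggregates for the report

# perf counters must be bound to a process, not a module.
probe perf.hw.instructions.process(@1).counter("insns") {}
probe perf.hw.cache_references.process(@1).counter("refs") {}
probe perf.hw.cache_misses.process(@1).counter("misses") {}

probe process(@1).function(@2) {
  t = tid()
  entry_insns[t] = @perf("insns")
  entry_refs[t] = @perf("refs")
  entry_misses[t] = @perf("misses")
}

probe process(@1).function(@2).return {
  t = tid()
  if (!(t in entry_insns))
    next                  # entry was not seen for this thread
  d_insns = @perf("insns") - entry_insns[t]
  d_refs = @perf("refs") - entry_refs[t]
  d_misses = @perf("misses") - entry_misses[t]
  delete entry_insns[t]
  delete entry_refs[t]
  delete entry_misses[t]
  if (d_insns <= 0)
    next                  # avoid division by zero
  # Integer-only arithmetic: record events per 1000 instructions.
  refs_per_kinsn <<< (d_refs * 1000 / d_insns)
  misses_per_kinsn <<< (d_misses * 1000 / d_insns)
}

probe end {
  if (@count(refs_per_kinsn)) {
    printf("cache refs per 1000 insns\n\n")
    print(@hist_log(refs_per_kinsn))
  }
  if (@count(misses_per_kinsn)) {
    printf("\ncache misses per 1000 insns\n\n")
    print(@hist_log(misses_per_kinsn))
  }
}
```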
SystemTap can't read hardware perfctr values from kernel probes, because Linux doesn't provide a suitable (e.g., atomic) internal API for safely reading those values from arbitrary contexts. The perf...process probes work because that context is not atomic: the SystemTap probe handler can block safely.
I cannot answer the detailed question 2 (?) about the scripts you last experimented with, because they're not complete.