linux - SystemTap script to analyze the cache behavior of functions


I would like to profile the cache behavior of a kernel module with SystemTap (number of cache references, number of cache misses, etc.). There is an example script online that shows how SystemTap can be used to read perf events and counters, including cache-related ones: https://sourceware.org/systemtap/examples/profiling/perf.stp

By default, this sample script works on a process:

probe perf.hw.cache_references.process("/usr/bin/find").counter("find_insns") {}  

I replaced the process keyword with module, and the path to the executable with the name of my kernel module:

probe perf.hw.cache_references.module(module_name).counter("find_insns") {}  

I'm pretty sure my module has debug info, but when running the script I get:

semantic error: while resolving probe point: identifier 'perf' @ perf.stp:14:7
        source: probe perf.hw.instructions.module(module_name).counter("find_insns") {}

Any ideas what might be wrong?

edit:

Okay, I realized that perf counters can be bound to processes but not to modules (explained here: https://sourceware.org/systemtap/man/stapprobes.3stap.html). Therefore I changed it back to:

probe perf.hw.cache_references.process(path_to_binary).counter("find_insns") {}  

Now, as the sample script suggests, I have:

probe module(module_name).function(func_name) {
  # save counter values on entrance
  ...
}

But when running it, I get:

semantic error: perf counter 'find_insns' not defined
semantic error: while resolving probe point: identifier 'module' @ perf.stp:26:7
        source: probe module(module_name).function(func_name)

edit2:

So here is the complete script:

#! /usr/bin/env stap

# usage: stap perf.stp <path-to-binary> <module-name> <function-name>

global cycles_per_insn
global branch_per_insn
global cacheref_per_insn
global insns
global cycles
global branches
global cacherefs
global insn
global cachemisses
global miss_per_insn

probe perf.hw.instructions.process(@1).counter("find_insns") {}
probe perf.hw.cpu_cycles.process(@1).counter("find_cycles") {}
probe perf.hw.branch_instructions.process(@1).counter("find_branches") {}
probe perf.hw.cache_references.process(@1).counter("find_cache_refs") {}
probe perf.hw.cache_misses.process(@1).counter("find_cache_misses") {}

probe module(@2).function(@3) {
  insn["find_insns"] = @perf("find_insns")
  insns <<< (insn["find_insns"])
  insn["find_cycles"] = @perf("find_cycles")
  cycles <<< insn["find_cycles"]
  insn["find_branches"] = @perf("find_branches")
  branches <<< insn["find_branches"]
  insn["find_cache_refs"] = @perf("find_cache_refs")
  cacherefs <<< insn["find_cache_refs"]
  insn["find_cache_misses"] = @perf("find_cache_misses")
  cachemisses <<< insn["find_cache_misses"]
}

probe module(@2).function(@3).return {
  dividend = (@perf("find_cycles") - insn["find_cycles"])
  divisor = (@perf("find_insns") - insn["find_insns"])
  q = dividend / divisor
  if (q > 0)
    cycles_per_insn <<< q

  dividend = (@perf("find_branches") - insn["find_branches"])
  q = dividend / divisor
  if (q > 0)
    branch_per_insn <<< q

  dividend = (@perf("find_cache_refs") - insn["find_cache_refs"])
  q = dividend / divisor
  if (q > 0)
    cacheref_per_insn <<< q

  dividend = (@perf("find_cache_misses") - insn["find_cache_misses"])
  q = dividend / divisor
  if (q > 0)
    miss_per_insn <<< q
}

probe end {
  if (@count(cycles_per_insn)) {
    printf ("cycles per insn\n\n")
    print (@hist_log(cycles_per_insn))
  }
  if (@count(branch_per_insn)) {
    printf ("\nbranches per insn\n\n")
    print (@hist_log(branch_per_insn))
  }
  if (@count(cacheref_per_insn)) {
    printf ("cache refs per insn\n\n")
    print (@hist_log(cacheref_per_insn))
  }
  if (@count(miss_per_insn)) {
    printf ("cache misses per insn\n\n")
    print (@hist_log(miss_per_insn))
  }
}

SystemTap can't read hardware perfctr values from kernel probes, because Linux doesn't provide a suitable (e.g., atomic) internal API for safely reading those values from all contexts. The perf...process probes work only because that context is not atomic: the SystemTap probe handler can block safely.

I cannot answer your detailed question about the 2 (?) scripts you last experimented with, because they're not complete.
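For comparison, here is a minimal sketch of the pattern that should work: the counters are declared with perf...process probes and read with @perf from process(...).function probes on the same executable. It only measures user-space functions, so it does not solve the kernel-module case. The counter names (cache_refs, cache_misses), the per-thread arrays, the two script arguments, and the file name perf_user.stp are assumptions made for illustration, not part of the original script.

```systemtap
#! /usr/bin/env stap

# usage (hypothetical file name): stap perf_user.stp <path-to-binary> <function-name>

# Counter values saved at function entry, keyed by thread id.
global start_refs, start_misses
# Per-call deltas, collected as statistics aggregates.
global refs_per_call, misses_per_call

# Declare per-process hardware counters; the empty handlers are intentional.
probe perf.hw.cache_references.process(@1).counter("cache_refs") {}
probe perf.hw.cache_misses.process(@1).counter("cache_misses") {}

# Function entry in the same executable: snapshot both counters.
probe process(@1).function(@2) {
  t = tid()
  start_refs[t] = @perf("cache_refs")
  start_misses[t] = @perf("cache_misses")
}

# Function return: record the difference since entry.
probe process(@1).function(@2).return {
  t = tid()
  if (t in start_refs) {
    refs_per_call <<< @perf("cache_refs") - start_refs[t]
    misses_per_call <<< @perf("cache_misses") - start_misses[t]
    delete start_refs[t]
    delete start_misses[t]
  }
}

probe end {
  if (@count(refs_per_call)) {
    printf("cache references per call\n\n")
    print(@hist_log(refs_per_call))
  }
  if (@count(misses_per_call)) {
    printf("\ncache misses per call\n\n")
    print(@hist_log(misses_per_call))
  }
}
```

The sketch records raw per-call deltas rather than per-instruction ratios, because SystemTap arithmetic is integer-only and a ratio such as misses per instruction would usually round down to 0. Recursive calls overwrite the saved entry values, so the numbers are only meaningful for non-recursive functions.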

