This patch series adds support for using 'perf script' as a CoreSight
trace disassembler. For this purpose it adds a new python script that
parses CoreSight tracing events and uses the command 'objdump' to
produce disassembled lines, generating a readable program execution
flow for reviewing tracing data.

Patch 0001 is a fix to generate samples for the start packet and
exception packets.

Patch 0002 is a prerequisite that adds addr into the sample dict, so
the python script can use this value to analyze instruction ranges.

Patch 0003 adds the python script for the trace disassembler.

Patch 0004 adds documentation that explains the python script usage
and gives an example.
This patch series has been rebased on acme git tree [1] with the commit
19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
on Hikey (ARM64 octa CA53 cores).
In this version the script has no dependency on the ARM64 platform and
is expected to support ARM32 as well, but I lack an ARM32 platform to
test on, so this series is upstreamed for ARM64 first.
This patch series first targets 'per-thread' recording of tracing data,
but we also need to verify that the script can disassemble CPU wide
tracing data and kernel panic kdump tracing data. I have verified that
this patch series works with kernel panic kdump tracing data. Because
Mathieu is working on CPU wide tracing, we need to retest CPU wide
tracing and kdump tracing after that work lands to ensure the python
script handles all cases.
You are very welcome to test the script in this patch series; your
test results and suggestions are very valuable for improving the script
to cover more cases.
Changes from v2:
* Synced with Rob on handling the CS_ETM_TRACE_ON packet and refined
patch 0001 according to the discussion;
* Minor cleanups and fixes in patch 0003 for the python script: remove
the 'svc' check.
Changes from v1:
* Following Mike's and Rob's suggestion, add the fix to generate samples
for the start packet and exception packets.
* Simplify the python script by removing the exception prediction
algorithm; we can rely on sane exception packets for the disassembler.
Leo Yan (4):
perf cs-etm: Generate branch sample for missed packets
perf script python: Add addr into perf sample dict
perf script python: Add script for CoreSight trace disassembler
coresight: Document for CoreSight trace disassembler
Documentation/trace/coresight.txt | 52 +++++
tools/perf/scripts/python/arm-cs-trace-disasm.py | 235 +++++++++++++++++++++
tools/perf/util/cs-etm.c | 93 ++++++--
.../util/scripting-engines/trace-event-python.c | 2 +
4 files changed, 362 insertions(+), 20 deletions(-)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
--
2.7.4
ARM CoreSight auxtrace uses 'sample->addr' to record the target address
for branch instructions, so the data of 'sample->addr' is required for
tracing data analysis.
This commit adds 'sample->addr' to the perf sample dict, so that python
scripts can use it when parsing events.
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/scripting-engines/trace-event-python.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 10dd5fc..7f8afac 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
PyLong_FromUnsignedLongLong(sample->period));
pydict_set_item_string_decref(dict_sample, "phys_addr",
PyLong_FromUnsignedLongLong(sample->phys_addr));
+ pydict_set_item_string_decref(dict_sample, "addr",
+ PyLong_FromUnsignedLongLong(sample->addr));
set_sample_read_in_dict(dict_sample, sample, evsel);
pydict_set_item_string_decref(dict, "sample", dict_sample);
--
2.7.4
This commit adds a python script that parses CoreSight tracing events
and uses the command 'objdump' to produce disassembled lines, so we can
generate a readable program execution flow for reviewing tracing data.

The script receives CoreSight tracing packets in the format below:
                +------------+------------+------------+
  packet(n):    |    addr    |     ip     |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |     ip     |    cpu     |
                +------------+------------+------------+
packet::ip is the address of the branch instruction that ends the
current instruction range, and packet::addr is the branch target, i.e.
the start address of the next instruction range. So the range that
begins in packet(n) starts at packet(n)::addr and stops at
packet(n+1)::ip. As a result, two consecutive packets must be combined
to generate the instruction range; this is the rationale for the
script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
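
The pairing rule above can be sketched in a few lines of Python. This is
only an illustrative model, not the script in this patch: the helper
name instruction_ranges and the sample addresses are made up, and it is
written for Python 3. The extra 4 bytes mirror what the script adds so
that objdump's stop address still includes the final branch instruction.

```python
# Sketch only: pair consecutive branch samples into instruction ranges.
# Each sample is modelled as a dict with 'addr' (branch target) and 'ip'
# (address of the branch instruction), mirroring the packet fields above.

A64_INSTR_SIZE = 4  # assumption: fixed-size A64 instructions


def instruction_ranges(samples):
    """Yield (start, stop) pairs, one per pair of consecutive samples.

    'stop' is sample(n+1)::ip plus one instruction, so a disassembler
    given the range still prints the branch that ends it.
    """
    prev_addr = None
    for sample in samples:
        if prev_addr is not None:
            yield prev_addr, sample["ip"] + A64_INSTR_SIZE
        prev_addr = sample["addr"]


# Hypothetical samples, for illustration only.
samples = [
    {"addr": 0x1000, "ip": 0x0},     # first packet: only 'addr' is used
    {"addr": 0x2000, "ip": 0x100c},  # branch at 0x100c jumps to 0x2000
    {"addr": 0x3000, "ip": 0x2004},  # branch at 0x2004 jumps to 0x3000
]
ranges = list(instruction_ranges(samples))
for start, stop in ranges:
    print("[ %#x .. %#x ]" % (start, stop))
```

Note that the first sample produces no range on its own; it only
provides the start address for the range closed by the second sample.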
Credits to Tor Jeremiassen, who wrote the script skeleton and provided
the ideas for reading the symbol file according to build-id, creating
the memory map for dso, and basic packet handling. Mathieu Poirier
contributed fixes for build-id and memory map bugs. The detailed
development history of this script can be found in [1]. Based on Tor's
and Mathieu's work, the script's sample handling is updated for the
corrected sample format. Another minor enhancement is support for the
case without build-id: the script can parse kernel symbols when the
vmlinux file path is given with option '-k'.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf/scripts/python/cs-trace-disasm.py
Co-authored-by: Tor Jeremiassen <[email protected]>
Co-authored-by: Mathieu Poirier <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/scripts/python/arm-cs-trace-disasm.py | 235 +++++++++++++++++++++++
1 file changed, 235 insertions(+)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
new file mode 100644
index 0000000..1239ab4
--- /dev/null
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -0,0 +1,235 @@
+# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassembler
+# SPDX-License-Identifier: GPL-2.0
+#
+# Tor Jeremiassen <[email protected]> is original author who wrote script
+# skeleton, Mathieu Poirier <[email protected]> contributed
+# fixes for build-id and memory map; Leo Yan <[email protected]>
+# updated the packet parsing with new samples format.
+
+import os
+import sys
+import re
+from subprocess import *
+from optparse import OptionParser, make_option
+
+# Command line parsing
+
+option_list = [
+	# formatting options for the bottom entry of the stack
+	make_option("-k", "--vmlinux", dest="vmlinux_name",
+		    help="Set path to vmlinux file"),
+	make_option("-d", "--objdump", dest="objdump_name",
+		    help="Set path to objdump executable file"),
+	make_option("-v", "--verbose", dest="verbose",
+		    action="store_true", default=False,
+		    help="Enable debugging log")
+]
+
+parser = OptionParser(option_list=option_list)
+(options, args) = parser.parse_args()
+
+if (options.objdump_name == None):
+	sys.exit("No objdump executable file specified - use -d or --objdump option")
+
+# Initialize global dicts and regular expression
+
+build_ids = dict()
+mmaps = dict()
+disasm_cache = dict()
+cpu_data = dict()
+disasm_re = re.compile("^\s*([0-9a-fA-F]+):")
+disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s\<.*\>:")
+cache_size = 32*1024
+prev_cpu = -1
+
+def parse_buildid():
+	global build_ids
+
+	buildid_regex = "([a-fA-F0-9]+)[ \t]([^ \n]+)"
+	buildid_re = re.compile(buildid_regex)
+
+	results = check_output(["perf", "buildid-list"]).split('\n');
+	for line in results:
+		m = buildid_re.search(line)
+		if (m == None):
+			continue;
+
+		id_name = m.group(2)
+		id_num = m.group(1)
+
+		if (id_name == "[kernel.kallsyms]") :
+			append = "/kallsyms"
+		elif (id_name == "[vdso]") :
+			append = "/vdso"
+		else:
+			append = "/elf"
+
+		build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
+					"/" + id_name + "/" + id_num + append;
+		# Replace duplicate slash chars to single slash char
+		build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
+
+	if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)):
+		print "kallsyms cannot be used to dump assembler"
+
+	# Set vmlinux path to replace kallsyms file, if without buildid we still
+	# can use vmlinux to parse kernel symbols
+	if ((options.vmlinux_name != None)):
+		build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+
+def parse_mmap():
+	global mmaps
+
+	# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
+	mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
+	mmap_re = re.compile(mmap_regex)
+
+	results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n')
+	for line in results:
+		m = mmap_re.search(line)
+		if (m != None):
+			if (m.group(3) == '[kernel.kallsyms]_text'):
+				dso = '[kernel.kallsyms]'
+			else:
+				dso = m.group(3)
+
+			start = int(m.group(1),0)
+			end = int(m.group(1),0) + int(m.group(2),0)
+			mmaps[dso] = [start, end]
+
+def find_dso_mmap(addr):
+	global mmaps
+
+	for key, value in mmaps.items():
+		if (addr >= value[0] and addr < value[1]):
+			return key
+
+	return None
+
+def read_disam(dso, start_addr, stop_addr):
+	global mmaps
+	global build_ids
+
+	addr_range = start_addr + ":" + stop_addr;
+
+	# Don't let the cache get too big, clear it when it hits max size
+	if (len(disasm_cache) > cache_size):
+		disasm_cache.clear();
+
+	try:
+		disasm_output = disasm_cache[addr_range];
+	except:
+		try:
+			fname = build_ids[dso];
+		except KeyError:
+			sys.exit("cannot find symbol file for " + dso)
+
+		disasm = [ options.objdump_name, "-d", "-z",
+			   "--start-address="+start_addr,
+			   "--stop-address="+stop_addr, fname ]
+
+		disasm_output = check_output(disasm).split('\n')
+		disasm_cache[addr_range] = disasm_output;
+
+	return disasm_output
+
+def dump_disam(dso, start_addr, stop_addr):
+	for line in read_disam(dso, start_addr, stop_addr):
+		m = disasm_func_re.search(line)
+		if (m != None):
+			print "\t",line
+			continue
+
+		m = disasm_re.search(line)
+		if (m == None):
+			continue;
+
+		print "\t",line
+
+def dump_packet(sample):
+	print "Packet = { cpu: %d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
+	      "pid: %d tid: %d period: %d time: %d }" % \
+	      (sample['cpu'], sample['addr'], sample['phys_addr'], \
+	       sample['ip'], sample['pid'], sample['tid'], \
+	       sample['period'], sample['time'])
+
+def trace_begin():
+	print 'ARM CoreSight Trace Data Assembler Dump'
+	parse_buildid()
+	parse_mmap()
+
+def trace_end():
+	print 'End'
+
+def trace_unhandled(event_name, context, event_fields_dict):
+	print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+
+def process_event(param_dict):
+	global cache_size
+	global options
+	global prev_cpu
+
+	sample = param_dict["sample"]
+
+	if (options.verbose == True):
+		dump_packet(sample)
+
+	# If period doesn't equal to 1, this packet is for instruction sample
+	# packet, we need drop this synthetic packet.
+	if (sample['period'] != 1):
+		print "Skip synthetic instruction sample"
+		return
+
+	cpu = format(sample['cpu'], "d");
+
+	# Initialize CPU data if it's empty, and directly return back
+	# if this is the first tracing event for this CPU.
+	if (cpu_data.get(str(cpu) + 'addr') == None):
+		cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
+		prev_cpu = cpu
+		return
+
+	# The format for packet is:
+	#
+	#                 +------------+------------+------------+
+	#  sample_prev:   |    addr    |     ip     |    cpu     |
+	#                 +------------+------------+------------+
+	#  sample_next:   |    addr    |     ip     |    cpu     |
+	#                 +------------+------------+------------+
+	#
+	# We need to combine the two continuous packets to get the instruction
+	# range for sample_prev::cpu:
+	#
+	#     [ sample_prev::addr .. sample_next::ip ]
+	#
+	# For this purpose, sample_prev::addr is stored into cpu_data structure
+	# and read back for 'start_addr' when the new packet comes, and we need
+	# to use sample_next::ip to calculate 'stop_addr', adding an extra 4 to
+	# 'stop_addr' is for the sake of objdump so the final assembler dump can
+	# include last instruction for sample_next::ip.
+
+	start_addr = cpu_data[str(prev_cpu) + 'addr']
+	stop_addr = format(sample['ip'] + 4, "#x")
+
+	# Record for previous sample packet
+	cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
+	prev_cpu = cpu
+
+	# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
+	if (int(start_addr, 0) == 0 and int(stop_addr, 0) == 4):
+		print "CPU%s: CS_ETM_TRACE_ON packet is inserted" % cpu
+		return
+
+	# Sanity checking dso for start_addr and stop_addr
+	prev_dso = find_dso_mmap(int(start_addr, 0))
+	next_dso = find_dso_mmap(int(stop_addr, 0))
+
+	# If cannot find dso so cannot dump assembler, bail out
+	if (prev_dso == None or next_dso == None):
+		print "Address range [ %s .. %s ]: failed to find dso" % (start_addr, stop_addr)
+		return
+	elif (prev_dso != next_dso):
+		print "Address range [ %s .. %s ]: isn't in same dso" % (start_addr, stop_addr)
+		return
+
+	dump_disam(prev_dso, start_addr, stop_addr)
--
2.7.4
This commit documents the CoreSight trace disassembler usage and gives
an example of it.
Signed-off-by: Leo Yan <[email protected]>
---
Documentation/trace/coresight.txt | 52 +++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 6f0120c..b8f2359 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -381,3 +381,55 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
$ taskset -c 2 ./sort_autofdo
Bubble sorting array of 30000 elements
5806 ms
+
+
+Tracing data disassembler
+-------------------------
+
+'perf script' supports using a script to parse tracing packets and relies on
+'objdump' for disassembled lines; this converts tracing data into a readable
+program execution flow that makes the tracing data easy to review.
+
+The CoreSight trace disassembler script is located at
+tools/perf/scripts/python/arm-cs-trace-disasm.py. This script supports the
+options below:
+
+	-d, --objdump: Set path to objdump executable; this option is
+	               mandatory.
+	-k, --vmlinux: Set path to vmlinux file.
+	-v, --verbose: Enable debugging log; with this option enabled the
+	               script dumps every event's data.
+
+Below is an example of using the python script to dump the CoreSight trace
+disassembly:
+
+ $ perf script -s arm-cs-trace-disasm.py -i perf.data \
+ -F cpu,event,ip,addr,sym -- -d objdump -k ./vmlinux > cs-disasm.log
+
+Below is an example of the disassembler log:
+
+ARM CoreSight Trace Data Assembler Dump
+ ffff000008a5f2dc <etm4_enable_hw+0x344>:
+ ffff000008a5f2dc: 340000a0 cbz w0, ffff000008a5f2f0 <etm4_enable_hw+0x358>
+ ffff000008a5f2f0 <etm4_enable_hw+0x358>:
+ ffff000008a5f2f0: f9400260 ldr x0, [x19]
+ ffff000008a5f2f4: d5033f9f dsb sy
+ ffff000008a5f2f8: 913ec000 add x0, x0, #0xfb0
+ ffff000008a5f2fc: b900001f str wzr, [x0]
+ ffff000008a5f300: f9400bf3 ldr x19, [sp, #16]
+ ffff000008a5f304: a8c27bfd ldp x29, x30, [sp], #32
+ ffff000008a5f308: d65f03c0 ret
+ ffff000008a5fa18 <etm4_enable+0x1b0>:
+ ffff000008a5fa18: 14000025 b ffff000008a5faac <etm4_enable+0x244>
+ ffff000008a5faac <etm4_enable+0x244>:
+ ffff000008a5faac: b9406261 ldr w1, [x19, #96]
+ ffff000008a5fab0: 52800015 mov w21, #0x0 // #0
+ ffff000008a5fab4: f901ca61 str x1, [x19, #912]
+ ffff000008a5fab8: 2a1503e0 mov w0, w21
+ ffff000008a5fabc: 3940e261 ldrb w1, [x19, #56]
+ ffff000008a5fac0: f901ce61 str x1, [x19, #920]
+ ffff000008a5fac4: a94153f3 ldp x19, x20, [sp, #16]
+ ffff000008a5fac8: a9425bf5 ldp x21, x22, [sp, #32]
+ ffff000008a5facc: a94363f7 ldp x23, x24, [sp, #48]
+ ffff000008a5fad0: a8c47bfd ldp x29, x30, [sp], #64
+ ffff000008a5fad4: d65f03c0 ret
--
2.7.4
Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the sample generation flow from CoreSight trace to
match the correct format so the perf report tool can display the
samples properly.

But the change has a side effect on branch packet handling: it only
generates branch samples when the previous packet's flag
'last_instr_taken_branch' is true. As a result, branch samples are not
generated for the following three kinds of packets:

- The start tracing packet at the beginning of tracing data;
- The exception handling packet;
- If a CS_ETM_TRACE_ON packet is inserted, it is not handled for
  branch samples either. The CS_ETM_TRACE_ON packet itself indicates
  that there is a discontinuity in the trace; in addition, proper
  branch samples are not generated for the packets before and after
  the CS_ETM_TRACE_ON packet.

This patch adds branch sample handling for the above three kinds of
packets:

- In function cs_etm__sample(), check if 'prev_packet->sample_type' is
  zero and in this case generate a branch sample for the start tracing
  packet; furthermore, we also need to handle the condition where
  prev_packet::end_addr is zero in cs_etm__last_executed_instr();
- In function cs_etm__sample(), check if 'prev_packet->exc' is true and
  generate a branch sample for the exception handling packet;
- If a CS_ETM_TRACE_ON packet arrives, we first generate a branch
  sample in the function cs_etm__flush(); this saves complete info for
  the previous CS_ETM_RANGE packet just before the CS_ETM_TRACE_ON
  packet. We also generate a branch sample for the new CS_ETM_RANGE
  packet after the CS_ETM_TRACE_ON packet. This has two purposes: the
  first is to save the info for the new CS_ETM_RANGE packet, the
  second is to save the CS_ETM_TRACE_ON packet info so we have a hint
  of a discontinuity in the trace.

For the CS_ETM_TRACE_ON packet, the fields 'packet->start_addr' and
'packet->end_addr' equal 0xdeadbeefdeadbeefUL, which is emitted in the
decoder layer as a dummy value. This patch converts these values to
zero to make them more readable; this is accomplished by the functions
cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
latter is a new function introduced by this patch.
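
The rules described above can be modelled in a few lines of Python. This
is a sketch for readers, not the C code in this patch: the dict-based
packet, the helper names and the numeric values chosen for CS_ETM_RANGE
and CS_ETM_TRACE_ON are assumptions made only for illustration.

```python
# Python model of the branch sample rules in this commit message.
# Constant values are assumptions, chosen only so the model runs.
CS_ETM_RANGE = 1 << 0
CS_ETM_TRACE_ON = 1 << 1
DUMMY_ADDR = 0xdeadbeefdeadbeef  # dummy value emitted by the decoder layer
A64_INSTR_SIZE = 4


def last_executed_instr(end_addr):
    """Map a packet end address to the 'from' address of a branch sample."""
    # Start tracing packet (end_addr == 0) and CS_ETM_TRACE_ON both give 0.
    if end_addr == 0 or end_addr == DUMMY_ADDR:
        return 0
    # end_addr is exclusive, so step back one fixed-size instruction.
    return end_addr - A64_INSTR_SIZE


def first_executed_instr(start_addr):
    """Map a packet start address to the 'to' address of a branch sample."""
    return 0 if start_addr == DUMMY_ADDR else start_addr


def should_generate_branch_sample(prev):
    """Decide from the previous packet whether a branch sample is emitted."""
    if prev["sample_type"] in (0, CS_ETM_TRACE_ON):
        return True  # start tracing packet or trace discontinuity
    if prev["exc"]:
        return True  # exception handling packet
    # Normal case: a range packet whose last instruction was a taken branch.
    return (prev["sample_type"] == CS_ETM_RANGE
            and bool(prev["last_instr_taken_branch"]))
```

With this model, a CS_ETM_TRACE_ON packet shows up in the branch samples
as an address of 0 on one side, which is what the python script later
uses as the hint for a discontinuity.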
Reviewed-by: Robert Walker <[email protected]>
Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
1 file changed, 73 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..8418173 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
{
/*
+ * The packet is the start tracing packet if the end_addr is zero,
+ * returns 0 for this case.
+ */
+ if (!packet->end_addr)
+ return 0;
+
+ /*
+ * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
+ * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
+ */
+ if (packet->end_addr == 0xdeadbeefdeadbeefUL)
+ return 0;
+
+ /*
* The packet records the execution range with an exclusive end address
*
* A64 instructions are constant size, so the last executed
@@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
return packet->end_addr - A64_INSTR_SIZE;
}
+static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
+{
+ /*
+ * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
+ * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
+ */
+ if (packet->start_addr == 0xdeadbeefdeadbeefUL)
+ return 0;
+
+ return packet->start_addr;
+}
+
static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
{
/*
@@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
be = &bs->entries[etmq->last_branch_pos];
be->from = cs_etm__last_executed_instr(etmq->prev_packet);
- be->to = etmq->packet->start_addr;
+ be->to = cs_etm__first_executed_instr(etmq->packet);
/* No support for mispredict */
be->flags.mispred = 0;
be->flags.predicted = 1;
@@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
sample.pid = etmq->pid;
sample.tid = etmq->tid;
- sample.addr = etmq->packet->start_addr;
+ sample.addr = cs_etm__first_executed_instr(etmq->packet);
sample.id = etmq->etm->branches_id;
sample.stream_id = etmq->etm->branches_id;
sample.period = 1;
@@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}
- if (etm->sample_branches &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
- etmq->prev_packet->last_instr_taken_branch) {
- ret = cs_etm__synth_branch_sample(etmq);
- if (ret)
- return ret;
+ if (etm->sample_branches && etmq->prev_packet) {
+ bool generate_sample = false;
+
+ /* Generate sample for start tracing packet */
+ if (etmq->prev_packet->sample_type == 0 ||
+ etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+ generate_sample = true;
+
+ /* Generate sample for exception packet */
+ if (etmq->prev_packet->exc == true)
+ generate_sample = true;
+
+ /* Generate sample for normal branch packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch)
+ generate_sample = true;
+
+ if (generate_sample) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
+ return ret;
+ }
}
if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
static int cs_etm__flush(struct cs_etm_queue *etmq)
{
int err = 0;
+ struct cs_etm_auxtrace *etm = etmq->etm;
struct cs_etm_packet *tmp;
- if (etmq->etm->synth_opts.last_branch &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ if (!etmq->prev_packet)
+ return 0;
+
+ if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
+ return 0;
+
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
err = cs_etm__synth_instruction_sample(
etmq, addr,
etmq->period_instructions);
+ if (err)
+ return err;
etmq->period_instructions = 0;
+ }
- /*
- * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
- * the next incoming packet.
- */
- tmp = etmq->packet;
- etmq->packet = etmq->prev_packet;
- etmq->prev_packet = tmp;
+ if (etm->sample_branches) {
+ err = cs_etm__synth_branch_sample(etmq);
+ if (err)
+ return err;
}
- return err;
+ /*
+ * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+ * the next incoming packet.
+ */
+ tmp = etmq->packet;
+ etmq->packet = etmq->prev_packet;
+ etmq->prev_packet = tmp;
+ return 0;
}
static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
--
2.7.4
Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
> This patch series is to support for using 'perf script' for CoreSight
> trace disassembler, for this purpose this patch series adds a new
> python script to parse CoreSight tracing event and use command 'objdump'
> for disassembled lines, finally this can generate readable program
> execution flow for reviewing tracing data.
>
> Patch 0001 is one fixing patch to generate samples for the start packet
> and exception packets.
>
> Patch 0002 is the prerequisite to add addr into sample dict, so this
> value can be used by python script to analyze instruction range.
>
> Patch 0003 is to add python script for trace disassembler.
>
> Patch 0004 is to add doc to explain python script usage and give
> example for it.
>
> This patch series has been rebased on acme git tree [1] with the commit
> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
> on Hikey (ARM64 octa CA53 cores).
Thanks, applied to perf/core.
- Arnaldo
On 28 May 2018 at 14:03, Arnaldo Carvalho de Melo <[email protected]> wrote:
> Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
>> This patch series is to support for using 'perf script' for CoreSight
>> trace disassembler, for this purpose this patch series adds a new
>> python script to parse CoreSight tracing event and use command 'objdump'
>> for disassembled lines, finally this can generate readable program
>> execution flow for reviewing tracing data.
>>
>> Patch 0001 is one fixing patch to generate samples for the start packet
>> and exception packets.
>>
>> Patch 0002 is the prerequisite to add addr into sample dict, so this
>> value can be used by python script to analyze instruction range.
>>
>> Patch 0003 is to add python script for trace disassembler.
>>
>> Patch 0004 is to add doc to explain python script usage and give
>> example for it.
>>
>> This patch series has been rebased on acme git tree [1] with the commit
>> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
>> on Hikey (ARM64 octa CA53 cores).
>
> Thanks, applied to perf/core.
Please hold off on that Arnaldo - I'm currently reviewing the set and
I think some things can be improved.
Thanks,
Mathieu
>
> - Arnaldo
Leo and/or Robert,
On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> traces") reworks the samples generation flow from CoreSight trace to
> match the correct format so Perf report tool can display the samples
> properly.
>
> But the change has side effect for branch packet handling, it only
> generate branch samples by checking previous packet flag
> 'last_instr_taken_branch' is true, this results in below three kinds
> packets are missed to generate branch samples:
>
> - The start tracing packet at the beginning of tracing data;
> - The exception handling packet;
> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
> for branch samples. CS_ETM_TRACE_ON packet itself can give the info
> that there have a discontinuity in the trace, on the other hand we
> also miss to generate proper branch sample for packets before and
> after CS_ETM_TRACE_ON packet.
>
> This patch is to add branch sample handling for up three kinds packets:
>
> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
> zero and in this case it generates branch sample for the start tracing
> packet; furthermore, we also need to handle the condition for
> prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>
> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
> generate branch sample for exception handling packet;
>
> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
> branch sample in the function cs_etm__flush(), this can save complete
> info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
> packet. We also generate branch sample for the new CS_ETM_RANGE
> packet after CS_ETM_TRACE_ON packet, this have two purposes, the
> first one purpose is to save the info for the new CS_ETM_RANGE packet,
> the second purpose is to save CS_ETM_TRACE_ON packet info so we can
> have hint for a discontinuity in the trace.
>
> For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
> 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
> the decoder layer as dummy value. This patch is to convert these
> values to zeros for more readable; this is accomplished by functions
> cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
> later one is a new function introduced by this patch.
>
> Reviewed-by: Robert Walker <[email protected]>
> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 73 insertions(+), 20 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 822ba91..8418173 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> {
> /*
> + * The packet is the start tracing packet if the end_addr is zero,
> + * returns 0 for this case.
> + */
> + if (!packet->end_addr)
> + return 0;
What is considered to be the "start tracing packet"? Right now the only two
kinds of packets inserted in the decoder packet buffer queue are INST_RANGE and
TRACE_ON. How can we hit a condition where packet->end_addr == 0?
> +
> + /*
> + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> + */
> + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
> + return 0;
As with the above, I find triggering on addresses to be brittle and hard
to maintain in the long run. Packets all have a sample_type field that should
be used in cases like this one. That way we know exactly the condition that is
targeted.
While working on this set, please spin off another patch that defines
CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replaces all the cases where the
numeral is used. That way we stop using the hard coded value.
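
One possible reading of this suggestion, sketched in Python rather than C
so it can be run standalone. The dict-based packet and the numeric values
of CS_ETM_RANGE and CS_ETM_TRACE_ON are assumptions for illustration;
CS_ETM_INVAL_ADDR is the name proposed above.

```python
# Sketch: key the decision on the packet type, not on a magic address.
CS_ETM_RANGE = 1 << 0      # assumed value, for illustration only
CS_ETM_TRACE_ON = 1 << 1   # assumed value, for illustration only
CS_ETM_INVAL_ADDR = 0xdeadbeefdeadbeef  # single named definition
A64_INSTR_SIZE = 4


def last_executed_instr_by_type(packet):
    """Return the last executed address, or 0 for a TRACE_ON packet."""
    if packet["sample_type"] == CS_ETM_TRACE_ON:
        return 0
    return packet["end_addr"] - A64_INSTR_SIZE


trace_on = {"sample_type": CS_ETM_TRACE_ON, "end_addr": CS_ETM_INVAL_ADDR}
in_range = {"sample_type": CS_ETM_RANGE, "end_addr": 0x1010}
```

The targeted condition is then visible at the call site, and the dummy
address only appears in one named definition.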
> +
> + /*
> * The packet records the execution range with an exclusive end address
> *
> * A64 instructions are constant size, so the last executed
> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> return packet->end_addr - A64_INSTR_SIZE;
> }
>
> +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> +{
> + /*
> + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> + */
> + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> + return 0;
Same comment as above.
> +
> + return packet->start_addr;
> +}
> +
> static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
> {
> /*
> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>
> be = &bs->entries[etmq->last_branch_pos];
> be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> - be->to = etmq->packet->start_addr;
> + be->to = cs_etm__first_executed_instr(etmq->packet);
> /* No support for mispredict */
> be->flags.mispred = 0;
> be->flags.predicted = 1;
> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
> sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
> sample.pid = etmq->pid;
> sample.tid = etmq->tid;
> - sample.addr = etmq->packet->start_addr;
> + sample.addr = cs_etm__first_executed_instr(etmq->packet);
> sample.id = etmq->etm->branches_id;
> sample.stream_id = etmq->etm->branches_id;
> sample.period = 1;
> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> etmq->period_instructions = instrs_over;
> }
>
> - if (etm->sample_branches &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> - etmq->prev_packet->last_instr_taken_branch) {
> - ret = cs_etm__synth_branch_sample(etmq);
> - if (ret)
> - return ret;
> + if (etm->sample_branches && etmq->prev_packet) {
> + bool generate_sample = false;
> +
> + /* Generate sample for start tracing packet */
> + if (etmq->prev_packet->sample_type == 0 ||
What kind of packet is sample_type == 0 ?
> + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> + generate_sample = true;
> +
> + /* Generate sample for exception packet */
> + if (etmq->prev_packet->exc == true)
> + generate_sample = true;
Please don't do that. Exception packets have a type of their own and can be
added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
are. Moreover exception packets contain an address that, if I'm reading the
documentation properly, can be used to keep track of instructions that were
executed between the last address of the previous range packet and the address
executed just before the exception occurred. Mike and Rob will have to confirm
this as the decoder may be doing all that hard work for us.
> +
> + /* Generate sample for normal branch packet */
> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> + etmq->prev_packet->last_instr_taken_branch)
> + generate_sample = true;
> +
> + if (generate_sample) {
> + ret = cs_etm__synth_branch_sample(etmq);
> + if (ret)
> + return ret;
> + }
> }
>
> if (etm->sample_branches || etm->synth_opts.last_branch) {
> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> static int cs_etm__flush(struct cs_etm_queue *etmq)
> {
> int err = 0;
> + struct cs_etm_auxtrace *etm = etmq->etm;
> struct cs_etm_packet *tmp;
>
> - if (etmq->etm->synth_opts.last_branch &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> + if (!etmq->prev_packet)
> + return 0;
> +
> + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> + return 0;
> +
> + if (etmq->etm->synth_opts.last_branch) {
If you add:
if (!etmq->etm->synth_opts.last_branch)
return 0;
You can avoid indenting the whole block.
> /*
> * Generate a last branch event for the branches left in the
> * circular buffer at the end of the trace.
> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> err = cs_etm__synth_instruction_sample(
> etmq, addr,
> etmq->period_instructions);
> + if (err)
> + return err;
> etmq->period_instructions = 0;
> + }
>
> - /*
> - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> - * the next incoming packet.
> - */
> - tmp = etmq->packet;
> - etmq->packet = etmq->prev_packet;
> - etmq->prev_packet = tmp;
> + if (etm->sample_branches) {
> + err = cs_etm__synth_branch_sample(etmq);
> + if (err)
> + return err;
> }
>
> - return err;
> + /*
> + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> + * the next incoming packet.
> + */
> + tmp = etmq->packet;
> + etmq->packet = etmq->prev_packet;
> + etmq->prev_packet = tmp;
Robert, I remember noticing that when you first submitted the code but forgot to
go back to it. What is the point of swapping the packets? I understand
etmq->prev_packet = etmq->packet;
But not
etmq->packet = tmp;
After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
is called, which is always right after either cs_etm__sample() or
cs_etm__flush().
Thanks,
Mathieu
> + return 0;
> }
>
> static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
> --
> 2.7.4
>
Hi Mathieu,
On Mon, May 28, 2018 at 04:13:47PM -0600, Mathieu Poirier wrote:
> Leo and/or Robert,
>
> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
> > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> > traces") reworks the samples generation flow from CoreSight trace to
> > match the correct format so Perf report tool can display the samples
> > properly.
> >
> > But the change has side effect for branch packet handling, it only
> > generate branch samples by checking previous packet flag
> > 'last_instr_taken_branch' is true, this results in below three kinds
> > packets are missed to generate branch samples:
> >
> > - The start tracing packet at the beginning of tracing data;
> > - The exception handling packet;
> > - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
> > for branch samples. CS_ETM_TRACE_ON packet itself can give the info
> > that there have a discontinuity in the trace, on the other hand we
> > also miss to generate proper branch sample for packets before and
> > after CS_ETM_TRACE_ON packet.
> >
> > This patch is to add branch sample handling for up three kinds packets:
> >
> > - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
> > zero and in this case it generates branch sample for the start tracing
> > packet; furthermore, we also need to handle the condition for
> > prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
> >
> > - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
> > generate branch sample for exception handling packet;
> >
> > - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
> > branch sample in the function cs_etm__flush(), this can save complete
> > info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
> > packet. We also generate branch sample for the new CS_ETM_RANGE
> > packet after CS_ETM_TRACE_ON packet, this have two purposes, the
> > first one purpose is to save the info for the new CS_ETM_RANGE packet,
> > the second purpose is to save CS_ETM_TRACE_ON packet info so we can
> > have hint for a discontinuity in the trace.
> >
> > For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
> > 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
> > the decoder layer as dummy value. This patch is to convert these
> > values to zeros for more readable; this is accomplished by functions
> > cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
> > later one is a new function introduced by this patch.
> >
> > Reviewed-by: Robert Walker <[email protected]>
> > Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> > Signed-off-by: Leo Yan <[email protected]>
> > ---
> > tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 73 insertions(+), 20 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 822ba91..8418173 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> > static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> > {
> > /*
> > + * The packet is the start tracing packet if the end_addr is zero,
> > + * returns 0 for this case.
> > + */
> > + if (!packet->end_addr)
> > + return 0;
>
> What is considered to be the "start tracing packet"? Right now the only two
> kind of packets inserted in the decoder packet buffer queue are INST_RANGE and
> TRACE_ON. How can we hit a condition where packet->end-addr == 0?
When the first CS_ETM_RANGE packet arrives, etmq->prev_packet still holds
the zeroed buffer allocated in cs_etm__alloc_queue(), so
etmq->prev_packet->end_addr is zero:
    etmq->prev_packet = zalloc(szp);
As you mentioned, we should only have two kinds of packets, CS_ETM_RANGE
and CS_ETM_TRACE_ON. Currently cs_etm__flush() skips the first
CS_ETM_TRACE_ON packet. We could instead refine cs_etm__flush() to handle
that first CS_ETM_TRACE_ON packet; after that every packet would be either
CS_ETM_RANGE or CS_ETM_TRACE_ON and there would be no chance to hit
'packet->end_addr == 0'.
Does this make sense to you?
--- Packet dump when the first packet arrives ---
cs_etm__flush: prev_packet: sample_type=0 exc=0 exc_ret=0 cpu=0 start_addr=0x0 end_addr=0x0 last_instr_taken_branch=0
cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=1 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=0
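To illustrate why the first prev_packet looks like the dump above, here is a
minimal stand-alone sketch. It is not the perf code: the struct is a trimmed,
hypothetical stand-in for struct cs_etm_packet, the enum values are assumed
(the dump only suggests that TRACE_ON is 2), and calloc() stands in for
zalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values; the dump above only suggests TRACE_ON == 2. */
enum sketch_sample_type {
	SKETCH_RANGE    = 1 << 0,
	SKETCH_TRACE_ON = 1 << 1,
};

/* Trimmed, hypothetical stand-in for struct cs_etm_packet. */
struct sketch_packet {
	int sample_type;
	uint64_t start_addr;
	uint64_t end_addr;
	bool exc;
};

/* Stand-in for zalloc(): returns zero-filled memory, or NULL on failure. */
static struct sketch_packet *sketch_alloc_packet(void)
{
	return calloc(1, sizeof(struct sketch_packet));
}

/*
 * True while the packet is still the zeroed placeholder, i.e. no decoded
 * packet has been stored in it yet.
 */
static bool sketch_is_placeholder(const struct sketch_packet *p)
{
	assert(p != NULL);
	return p->sample_type == 0;
}
```

With this layout a freshly allocated packet has sample_type == 0 and
end_addr == 0, which matches the prev_packet line in the dump.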
> > +
> > + /*
> > + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> > + */
> > + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
> > + return 0;
>
> As it is with the above, I find triggering on addresses to be brittle and hard
> to maintain on the long run. Packets all have a sample_type field that should
> be used in cases like this one. That way we know exactly the condition that is
> targeted.
Will do this.
> While working on this set, please spin-off another patch that defines
> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
> numeral is used. That way we stop using the hard coded value.
Will do this.
As this patch is now big and its logic more complex, I am considering
splitting it into smaller patches:
- Define CS_ETM_INVAL_ADDR;
- Fix for CS_ETM_TRACE_ON packet;
- Fix for exception packet;
Does this make sense to you? My concern is that this is a fix, so I am
not sure whether splitting it will cause trouble when applying the patches
to stable kernels ...
> > +
> > + /*
> > * The packet records the execution range with an exclusive end address
> > *
> > * A64 instructions are constant size, so the last executed
> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> > return packet->end_addr - A64_INSTR_SIZE;
> > }
> >
> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> > +{
> > + /*
> > + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> > + */
> > + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> > + return 0;
>
> Same comment as above.
Will do this.
> > +
> > + return packet->start_addr;
> > +}
> > +
> > static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
> > {
> > /*
> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
> >
> > be = &bs->entries[etmq->last_branch_pos];
> > be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> > - be->to = etmq->packet->start_addr;
> > + be->to = cs_etm__first_executed_instr(etmq->packet);
> > /* No support for mispredict */
> > be->flags.mispred = 0;
> > be->flags.predicted = 1;
> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
> > sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
> > sample.pid = etmq->pid;
> > sample.tid = etmq->tid;
> > - sample.addr = etmq->packet->start_addr;
> > + sample.addr = cs_etm__first_executed_instr(etmq->packet);
> > sample.id = etmq->etm->branches_id;
> > sample.stream_id = etmq->etm->branches_id;
> > sample.period = 1;
> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > etmq->period_instructions = instrs_over;
> > }
> >
> > - if (etm->sample_branches &&
> > - etmq->prev_packet &&
> > - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > - etmq->prev_packet->last_instr_taken_branch) {
> > - ret = cs_etm__synth_branch_sample(etmq);
> > - if (ret)
> > - return ret;
> > + if (etm->sample_branches && etmq->prev_packet) {
> > + bool generate_sample = false;
> > +
> > + /* Generate sample for start tracing packet */
> > + if (etmq->prev_packet->sample_type == 0 ||
>
> What kind of packet is sample_type == 0 ?
As explained above, sample_type == 0 is the packet that is initialized in
the function cs_etm__alloc_queue().
> > + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> > + generate_sample = true;
> > +
> > + /* Generate sample for exception packet */
> > + if (etmq->prev_packet->exc == true)
> > + generate_sample = true;
>
> Please don't do that. Exception packets have a type of their own and can be
> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
> are. Moreover, exception packets contain an address that, if I'm reading the
> documentation properly, can be used to keep track of instructions that were
> executed between the last address of the previous range packet and the address
> executed just before the exception occurred. Mike and Rob will have to confirm
> this as the decoder may be doing all that hard work for us.
Sure, I will wait for Rob and Mike to confirm this.
On my side, I dumped the packets: the exception packet isn't passed to the
cs-etm.c layer; the decoder layer only sets the flag
'packet->exc = true' when an exception packet arrives [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n364
> > +
> > + /* Generate sample for normal branch packet */
> > + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > + etmq->prev_packet->last_instr_taken_branch)
> > + generate_sample = true;
> > +
> > + if (generate_sample) {
> > + ret = cs_etm__synth_branch_sample(etmq);
> > + if (ret)
> > + return ret;
> > + }
> > }
> >
> > if (etm->sample_branches || etm->synth_opts.last_branch) {
> > @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > static int cs_etm__flush(struct cs_etm_queue *etmq)
> > {
> > int err = 0;
> > + struct cs_etm_auxtrace *etm = etmq->etm;
> > struct cs_etm_packet *tmp;
> >
> > - if (etmq->etm->synth_opts.last_branch &&
> > - etmq->prev_packet &&
> > - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> > + if (!etmq->prev_packet)
> > + return 0;
> > +
> > + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> > + return 0;
> > +
> > + if (etmq->etm->synth_opts.last_branch) {
>
> If you add:
>
> if (!etmq->etm->synth_opts.last_branch)
> return 0;
>
> You can avoid indenting the whole block.
No, we cannot do that here. Besides the condition for
'etmq->etm->synth_opts.last_branch', we also need to handle
'etm->sample_branches'. These two conditions are separate and are
decided by different command parameters of 'perf script'.
> > /*
> > * Generate a last branch event for the branches left in the
> > * circular buffer at the end of the trace.
> > @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> > err = cs_etm__synth_instruction_sample(
> > etmq, addr,
> > etmq->period_instructions);
> > + if (err)
> > + return err;
> > etmq->period_instructions = 0;
> > + }
> >
> > - /*
> > - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > - * the next incoming packet.
> > - */
> > - tmp = etmq->packet;
> > - etmq->packet = etmq->prev_packet;
> > - etmq->prev_packet = tmp;
> > + if (etm->sample_branches) {
> > + err = cs_etm__synth_branch_sample(etmq);
> > + if (err)
> > + return err;
> > }
> >
> > - return err;
> > + /*
> > + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > + * the next incoming packet.
> > + */
> > + tmp = etmq->packet;
> > + etmq->packet = etmq->prev_packet;
> > + etmq->prev_packet = tmp;
>
> Robert, I remember noticing that when you first submitted the code but forgot to
> go back to it. What is the point of swapping the packets? I understand
>
> etmq->prev_packet = etmq->packet;
>
> But not
>
> etmq->packet = tmp;
>
> After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
> is called, which is always right after either cs_etm__sample() or
> cs_etm__flush().
Yeah, I have the same question :)
Thanks for the suggestions and the review.
> Thanks,
> Mathieu
>
>
>
> > + return 0;
> > }
> >
> > static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
> > --
> > 2.7.4
> >
Em Mon, May 28, 2018 at 03:53:42PM -0600, Mathieu Poirier escreveu:
> On 28 May 2018 at 14:03, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
> >> This patch series is to support for using 'perf script' for CoreSight
> >> trace disassembler, for this purpose this patch series adds a new
> >> python script to parse CoreSight tracing event and use command 'objdump'
> >> for disassembled lines, finally this can generate readable program
> >> execution flow for reviewing tracing data.
> >>
> >> Patch 0001 is one fixing patch to generate samples for the start packet
> >> and exception packets.
> >>
> >> Patch 0002 is the prerequisite to add addr into sample dict, so this
> >> value can be used by python script to analyze instruction range.
> >>
> >> Patch 0003 is to add python script for trace disassembler.
> >>
> >> Patch 0004 is to add doc to explain python script usage and give
> >> example for it.
> >>
> >> This patch series has been rebased on acme git tree [1] with the commit
> >> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
> >> on Hikey (ARM64 octa CA53 cores).
> >
> > Thanks, applied to perf/core.
>
> Please hold off on that Arnaldo - I'm currently reviewing the set and
> I think some things can be improved.
Ok, I dropped all but the one adding sample->addr to the python
dictionary, that is ok to cherry pick.
- Arnaldo
On 28 May 2018 at 18:25, Leo Yan <[email protected]> wrote:
> Hi Mathieu,
>
> On Mon, May 28, 2018 at 04:13:47PM -0600, Mathieu Poirier wrote:
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>> > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>> > traces") reworks the samples generation flow from CoreSight trace to
>> > match the correct format so Perf report tool can display the samples
>> > properly.
>> >
>> > But the change has side effect for branch packet handling, it only
>> > generate branch samples by checking previous packet flag
>> > 'last_instr_taken_branch' is true, this results in below three kinds
>> > packets are missed to generate branch samples:
>> >
>> > - The start tracing packet at the beginning of tracing data;
>> > - The exception handling packet;
>> > - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>> > for branch samples. CS_ETM_TRACE_ON packet itself can give the info
>> > that there have a discontinuity in the trace, on the other hand we
>> > also miss to generate proper branch sample for packets before and
>> > after CS_ETM_TRACE_ON packet.
>> >
>> > This patch is to add branch sample handling for up three kinds packets:
>> >
>> > - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>> > zero and in this case it generates branch sample for the start tracing
>> > packet; furthermore, we also need to handle the condition for
>> > prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>> >
>> > - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>> > generate branch sample for exception handling packet;
>> >
>> > - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>> > branch sample in the function cs_etm__flush(), this can save complete
>> > info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>> > packet. We also generate branch sample for the new CS_ETM_RANGE
>> > packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>> > first one purpose is to save the info for the new CS_ETM_RANGE packet,
>> > the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>> > have hint for a discontinuity in the trace.
>> >
>> > For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>> > 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>> > the decoder layer as dummy value. This patch is to convert these
>> > values to zeros for more readable; this is accomplished by functions
>> > cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
>> > later one is a new function introduced by this patch.
>> >
>> > Reviewed-by: Robert Walker <[email protected]>
>> > Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
>> > Signed-off-by: Leo Yan <[email protected]>
>> > ---
>> > tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
>> > 1 file changed, 73 insertions(+), 20 deletions(-)
>> >
>> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> > index 822ba91..8418173 100644
>> > --- a/tools/perf/util/cs-etm.c
>> > +++ b/tools/perf/util/cs-etm.c
>> > @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>> > static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> > {
>> > /*
>> > + * The packet is the start tracing packet if the end_addr is zero,
>> > + * returns 0 for this case.
>> > + */
>> > + if (!packet->end_addr)
>> > + return 0;
>>
>> What is considered to be the "start tracing packet"? Right now the only two
>> kind of packets inserted in the decoder packet buffer queue are INST_RANGE and
>> TRACE_ON. How can we hit a condition where packet->end-addr == 0?
>
> When the first CS_ETM_RANGE packet is coming, etmq->prev_packet is
> initialized by the function cs_etm__alloc_queue(), so
> etmq->prev_packet->end_addr is zero:
>
> etmq->prev_packet = zalloc(szp);
>
> As you mentioned, we should only have two kind of packets for
> CS_ETM_RANGE and CS_ETM_TRACE_ON. Currently we skip to handle the
> first CS_ETM_TRACE_ON packet in function cs_etm__flush(), we also can
> refine the function cs_etm__flush() to handle the first coming
> CS_ETM_TRACE_ON packet, after that all packets will be CS_ETM_RANGE
> and CS_ETM_TRACE_ON and have no chance to hit 'packet->end_addr = 0'.
>
> Does this make sense for you?
That is the right way to handle this condition and it gives us a
reliable state machine.
>
> --- Packet dumping when first packet coming ---
> cs_etm__flush: prev_packet: sample_type=0 exc=0 exc_ret=0 cpu=0 start_addr=0x0 end_addr=0x0 last_instr_taken_branch=0
> cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=1 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=0
>
>> > +
>> > + /*
>> > + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> > + */
>> > + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>> > + return 0;
>>
>> As it is with the above, I find triggering on addresses to be brittle and hard
>> to maintain on the long run. Packets all have a sample_type field that should
>> be used in cases like this one. That way we know exactly the condition that is
>> targeted.
>
> Will do this.
>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used. That way we stop using the hard coded value.
>
> Will do this.
Much appreciated.
>
> As now this patch is big with more complex logic, so I consider to
> split it into small patches:
>
> - Define CS_ETM_INVAL_ADDR;
> - Fix for CS_ETM_TRACE_ON packet;
> - Fix for exception packet;
>
> Does this make sense for you? I have concern that this patch is a
> fixing patch, so not sure after splitting patches will introduce
> trouble for applying them for other stable kernels ...
Reverse the order:
- Fix for CS_ETM_TRACE_ON packet;
- Fix for exception packet;
- Define CS_ETM_INVAL_ADDR;
But you may not need to - see next comment.
>
>> > +
>> > + /*
>> > * The packet records the execution range with an exclusive end address
>> > *
>> > * A64 instructions are constant size, so the last executed
>> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> > return packet->end_addr - A64_INSTR_SIZE;
>> > }
>> >
>> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>> > +{
>> > + /*
>> > + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> > + */
>> > + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>> > + return 0;
>>
>> Same comment as above.
>
> Will do this.
>
>> > +
>> > + return packet->start_addr;
>> > +}
>> > +
>> > static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>> > {
>> > /*
>> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>> >
>> > be = &bs->entries[etmq->last_branch_pos];
>> > be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>> > - be->to = etmq->packet->start_addr;
>> > + be->to = cs_etm__first_executed_instr(etmq->packet);
>> > /* No support for mispredict */
>> > be->flags.mispred = 0;
>> > be->flags.predicted = 1;
>> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>> > sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>> > sample.pid = etmq->pid;
>> > sample.tid = etmq->tid;
>> > - sample.addr = etmq->packet->start_addr;
>> > + sample.addr = cs_etm__first_executed_instr(etmq->packet);
>> > sample.id = etmq->etm->branches_id;
>> > sample.stream_id = etmq->etm->branches_id;
>> > sample.period = 1;
>> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> > etmq->period_instructions = instrs_over;
>> > }
>> >
>> > - if (etm->sample_branches &&
>> > - etmq->prev_packet &&
>> > - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> > - etmq->prev_packet->last_instr_taken_branch) {
>> > - ret = cs_etm__synth_branch_sample(etmq);
>> > - if (ret)
>> > - return ret;
>> > + if (etm->sample_branches && etmq->prev_packet) {
>> > + bool generate_sample = false;
>> > +
>> > + /* Generate sample for start tracing packet */
>> > + if (etmq->prev_packet->sample_type == 0 ||
>>
>> What kind of packet is sample_type == 0 ?
>
> Just as explained above, sample_type == 0 is the packet which
> initialized in the function cs_etm__alloc_queue().
>
>> > + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>> > + generate_sample = true;
>> > +
>> > + /* Generate sample for exception packet */
>> > + if (etmq->prev_packet->exc == true)
>> > + generate_sample = true;
>>
>> Please don't do that. Exception packets have a type of their own and can be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
>> are. Moreover, exception packets contain an address that, if I'm reading the
>> documentation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the address
>> executed just before the exception occurred. Mike and Rob will have to confirm
>> this as the decoder may be doing all that hard work for us.
>
> Sure, will wait for Rob and Mike to confirm for this.
>
> At my side, I dump the packet, the exception packet isn't passed to
> cs-etm.c layer, the decoder layer only sets the flag
> 'packet->exc = true' when exception packet is coming [1].
That's because we didn't need the information. Now that we do, add a
function that inserts a packet in the decoder packet queue and deal with
the new packet type in the main decoder loop [2]. At that point your work
may no longer be eligible for stable, and I think that is fine. Robert's
work was an enhancement over mine and yours is an enhancement over his.
[2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n364
>
>> > +
>> > + /* Generate sample for normal branch packet */
>> > + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> > + etmq->prev_packet->last_instr_taken_branch)
>> > + generate_sample = true;
>> > +
>> > + if (generate_sample) {
>> > + ret = cs_etm__synth_branch_sample(etmq);
>> > + if (ret)
>> > + return ret;
>> > + }
>> > }
>> >
>> > if (etm->sample_branches || etm->synth_opts.last_branch) {
>> > @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> > static int cs_etm__flush(struct cs_etm_queue *etmq)
>> > {
>> > int err = 0;
>> > + struct cs_etm_auxtrace *etm = etmq->etm;
>> > struct cs_etm_packet *tmp;
>> >
>> > - if (etmq->etm->synth_opts.last_branch &&
>> > - etmq->prev_packet &&
>> > - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>> > + if (!etmq->prev_packet)
>> > + return 0;
>> > +
>> > + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>> > + return 0;
>> > +
>> > + if (etmq->etm->synth_opts.last_branch) {
>>
>> If you add:
>>
>> if (!etmq->etm->synth_opts.last_branch)
>> return 0;
>>
>> You can avoid indenting the whole block.
>
> No, here we cannot do like this. Except we need to handle the
> condition for 'etmq->etm->synth_opts.last_branch', we also need to
> handle 'etm->sample_branches'. These two conditions are separate and
> decide by different command parameters from 'perf script'.
Pardon me - I didn't see the addition of the new '}' just below.
>
>> > /*
>> > * Generate a last branch event for the branches left in the
>> > * circular buffer at the end of the trace.
>> > @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>> > err = cs_etm__synth_instruction_sample(
>> > etmq, addr,
>> > etmq->period_instructions);
>> > + if (err)
>> > + return err;
>> > etmq->period_instructions = 0;
>> > + }
>> >
>> > - /*
>> > - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> > - * the next incoming packet.
>> > - */
>> > - tmp = etmq->packet;
>> > - etmq->packet = etmq->prev_packet;
>> > - etmq->prev_packet = tmp;
>> > + if (etm->sample_branches) {
>> > + err = cs_etm__synth_branch_sample(etmq);
>> > + if (err)
>> > + return err;
>> > }
>> >
>> > - return err;
>> > + /*
>> > + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> > + * the next incoming packet.
>> > + */
>> > + tmp = etmq->packet;
>> > + etmq->packet = etmq->prev_packet;
>> > + etmq->prev_packet = tmp;
>>
>> Robert, I remember noticing that when you first submitted the code but forgot to
>> go back to it. What is the point of swapping the packets? I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
>> is called, which is always right after either cs_etm__sample() or
>> cs_etm__flush().
>
> Yeah, I have the same question for this :)
>
> Thanks for suggestions and reviewing.
>
>> Thanks,
>> Mathieu
>>
>>
>>
>> > + return 0;
>> > }
>> >
>> > static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>> > --
>> > 2.7.4
>> >
Hi Mathieu,
On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:
[...]
> > As now this patch is big with more complex logic, so I consider to
> > split it into small patches:
> >
> > - Define CS_ETM_INVAL_ADDR;
> > - Fix for CS_ETM_TRACE_ON packet;
> > - Fix for exception packet;
> >
> > Does this make sense for you? I have concern that this patch is a
> > fixing patch, so not sure after splitting patches will introduce
> > trouble for applying them for other stable kernels ...
>
> Reverse the order:
>
> - Fix for CS_ETM_TRACE_ON packet;
> - Fix for exception packet;
> - Define CS_ETM_INVAL_ADDR;
>
> But you may not need to - see next comment.
From the context of the discussion, I think 'you may not need to' here
refers to my concern about applying the patches to stable kernels, so I
should treat this patch series as an enhancement and need not worry much
about stable kernels.
On the other hand, your suggestion could also mean there is no need to
split it into small patches (though I guess that would be a
misunderstanding of your meaning).
Could you clarify which you meant?
> >> > +
> >> > + /*
> >> > * The packet records the execution range with an exclusive end address
> >> > *
> >> > * A64 instructions are constant size, so the last executed
> >> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> >> > return packet->end_addr - A64_INSTR_SIZE;
> >> > }
> >> >
> >> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> >> > +{
> >> > + /*
> >> > + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> >> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> >> > + */
> >> > + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> >> > + return 0;
> >>
> >> Same comment as above.
> >
> > Will do this.
> >
> >> > +
> >> > + return packet->start_addr;
> >> > +}
> >> > +
> >> > static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
> >> > {
> >> > /*
> >> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
> >> >
> >> > be = &bs->entries[etmq->last_branch_pos];
> >> > be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> >> > - be->to = etmq->packet->start_addr;
> >> > + be->to = cs_etm__first_executed_instr(etmq->packet);
> >> > /* No support for mispredict */
> >> > be->flags.mispred = 0;
> >> > be->flags.predicted = 1;
> >> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
> >> > sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
> >> > sample.pid = etmq->pid;
> >> > sample.tid = etmq->tid;
> >> > - sample.addr = etmq->packet->start_addr;
> >> > + sample.addr = cs_etm__first_executed_instr(etmq->packet);
> >> > sample.id = etmq->etm->branches_id;
> >> > sample.stream_id = etmq->etm->branches_id;
> >> > sample.period = 1;
> >> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> >> > etmq->period_instructions = instrs_over;
> >> > }
> >> >
> >> > - if (etm->sample_branches &&
> >> > - etmq->prev_packet &&
> >> > - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> >> > - etmq->prev_packet->last_instr_taken_branch) {
> >> > - ret = cs_etm__synth_branch_sample(etmq);
> >> > - if (ret)
> >> > - return ret;
> >> > + if (etm->sample_branches && etmq->prev_packet) {
> >> > + bool generate_sample = false;
> >> > +
> >> > + /* Generate sample for start tracing packet */
> >> > + if (etmq->prev_packet->sample_type == 0 ||
> >>
> >> What kind of packet is sample_type == 0 ?
> >
> > Just as explained above, sample_type == 0 is the packet which
> > initialized in the function cs_etm__alloc_queue().
> >
> >> > + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> >> > + generate_sample = true;
> >> > +
> >> > + /* Generate sample for exception packet */
> >> > + if (etmq->prev_packet->exc == true)
> >> > + generate_sample = true;
> >>
> >> Please don't do that. Exception packets have a type of their own and can be
> >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
> >> are. Moreover, exception packets contain an address that, if I'm reading the
> >> documentation properly, can be used to keep track of instructions that were
> >> executed between the last address of the previous range packet and the address
> >> executed just before the exception occurred. Mike and Rob will have to confirm
> >> this as the decoder may be doing all that hard work for us.
> >
> > Sure, will wait for Rob and Mike to confirm for this.
> >
> > At my side, I dump the packet, the exception packet isn't passed to
> > cs-etm.c layer, the decoder layer only sets the flag
> > 'packet->exc = true' when exception packet is coming [1].
>
> That's because we didn't need the information. Now that we do, add a
> function that will insert a packet in the decoder packet queue and
> deal with the new packet type in the main decoder loop [2]. At that
> point your work may not be eligible for stable anymore and I think it
> is fine. Robert's work was an enhancement over mine and yours is an
> enhancement over his.
>
> [2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999
Agreed, I will look into the exception packet and try to add a new packet
type for it.
[...]
Thanks,
Leo Yan
On 28/05/18 23:13, Mathieu Poirier wrote:
> Leo and/or Robert,
>
> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>> traces") reworks the samples generation flow from CoreSight trace to
>> match the correct format so Perf report tool can display the samples
>> properly.
>>
>> But the change has side effect for branch packet handling, it only
>> generate branch samples by checking previous packet flag
>> 'last_instr_taken_branch' is true, this results in below three kinds
>> packets are missed to generate branch samples:
>>
>> - The start tracing packet at the beginning of tracing data;
>> - The exception handling packet;
>> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>> for branch samples. CS_ETM_TRACE_ON packet itself can give the info
>> that there have a discontinuity in the trace, on the other hand we
>> also miss to generate proper branch sample for packets before and
>> after CS_ETM_TRACE_ON packet.
>>
>> This patch is to add branch sample handling for up three kinds packets:
>>
>> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>> zero and in this case it generates branch sample for the start tracing
>> packet; furthermore, we also need to handle the condition for
>> prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>>
>> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>> generate branch sample for exception handling packet;
>>
>> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>> branch sample in the function cs_etm__flush(), this can save complete
>> info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>> packet. We also generate branch sample for the new CS_ETM_RANGE
>> packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>> first one purpose is to save the info for the new CS_ETM_RANGE packet,
>> the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>> have hint for a discontinuity in the trace.
>>
>> For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>> 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>> the decoder layer as dummy value. This patch is to convert these
>> values to zeros for more readable; this is accomplished by functions
>> cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
>> later one is a new function introduced by this patch.
>>
>> Reviewed-by: Robert Walker <[email protected]>
>> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
>> Signed-off-by: Leo Yan <[email protected]>
>> ---
>> tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 73 insertions(+), 20 deletions(-)
>>
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 822ba91..8418173 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> {
>> /*
>> + * The packet is the start tracing packet if the end_addr is zero,
>> + * returns 0 for this case.
>> + */
>> + if (!packet->end_addr)
>> + return 0;
>
> What is considered to be the "start tracing packet"? Right now the only two
> kind of packets inserted in the decoder packet buffer queue are INST_RANGE and
> TRACE_ON. How can we hit a condition where packet->end-addr == 0?
>
>
>> +
>> + /*
>> + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> + */
>> + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>> + return 0;
>
> As it is with the above, I find triggering on addresses to be brittle and hard
> to maintain on the long run. Packets all have a sample_type field that should
> be used in cases like this one. That way we know exactly the condition that is
> targeted.
>
> While working on this set, please spin-off another patch that defines
> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
> numeral is used. That way we stop using the hard coded value.
>
>> +
>> + /*
>> * The packet records the execution range with an exclusive end address
>> *
>> * A64 instructions are constant size, so the last executed
>> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> return packet->end_addr - A64_INSTR_SIZE;
>> }
>>
>> +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>> +{
>> + /*
>> + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> + */
>> + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>> + return 0;
>
> Same comment as above.
>
>> +
>> + return packet->start_addr;
>> +}
>> +
>> static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>> {
>> /*
>> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>>
>> be = &bs->entries[etmq->last_branch_pos];
>> be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>> - be->to = etmq->packet->start_addr;
>> + be->to = cs_etm__first_executed_instr(etmq->packet);
>> /* No support for mispredict */
>> be->flags.mispred = 0;
>> be->flags.predicted = 1;
>> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>> sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>> sample.pid = etmq->pid;
>> sample.tid = etmq->tid;
>> - sample.addr = etmq->packet->start_addr;
>> + sample.addr = cs_etm__first_executed_instr(etmq->packet);
>> sample.id = etmq->etm->branches_id;
>> sample.stream_id = etmq->etm->branches_id;
>> sample.period = 1;
>> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> etmq->period_instructions = instrs_over;
>> }
>>
>> - if (etm->sample_branches &&
>> - etmq->prev_packet &&
>> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> - etmq->prev_packet->last_instr_taken_branch) {
>> - ret = cs_etm__synth_branch_sample(etmq);
>> - if (ret)
>> - return ret;
>> + if (etm->sample_branches && etmq->prev_packet) {
>> + bool generate_sample = false;
>> +
>> + /* Generate sample for start tracing packet */
>> + if (etmq->prev_packet->sample_type == 0 ||
>
> What kind of packet is sample_type == 0 ?
>
>> + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>> + generate_sample = true;
>> +
>> + /* Generate sample for exception packet */
>> + if (etmq->prev_packet->exc == true)
>> + generate_sample = true;
>
> Please don't do that. Exception packets have a type of their own and can be
> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
> are. Moreover, exception packets contain an address that, if I'm reading the
> documentation properly, can be used to keep track of instructions that were
> executed between the last address of the previous range packet and the address
> executed just before the exception occurred. Mike and Rob will have to confirm
> this as the decoder may be doing all that hard work for us.
>
>> +
>> + /* Generate sample for normal branch packet */
>> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> + etmq->prev_packet->last_instr_taken_branch)
>> + generate_sample = true;
>> +
>> + if (generate_sample) {
>> + ret = cs_etm__synth_branch_sample(etmq);
>> + if (ret)
>> + return ret;
>> + }
>> }
>>
>> if (etm->sample_branches || etm->synth_opts.last_branch) {
>> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> static int cs_etm__flush(struct cs_etm_queue *etmq)
>> {
>> int err = 0;
>> + struct cs_etm_auxtrace *etm = etmq->etm;
>> struct cs_etm_packet *tmp;
>>
>> - if (etmq->etm->synth_opts.last_branch &&
>> - etmq->prev_packet &&
>> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>> + if (!etmq->prev_packet)
>> + return 0;
>> +
>> + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>> + return 0;
>> +
>> + if (etmq->etm->synth_opts.last_branch) {
>
> If you add:
>
> if (!etmq->etm->synth_opts.last_branch)
> return 0;
>
> You can avoid indenting the whole block.
>
>> /*
>> * Generate a last branch event for the branches left in the
>> * circular buffer at the end of the trace.
>> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>> err = cs_etm__synth_instruction_sample(
>> etmq, addr,
>> etmq->period_instructions);
>> + if (err)
>> + return err;
>> etmq->period_instructions = 0;
>> + }
>>
>> - /*
>> - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> - * the next incoming packet.
>> - */
>> - tmp = etmq->packet;
>> - etmq->packet = etmq->prev_packet;
>> - etmq->prev_packet = tmp;
>> + if (etm->sample_branches) {
>> + err = cs_etm__synth_branch_sample(etmq);
>> + if (err)
>> + return err;
>> }
>>
>> - return err;
>> + /*
>> + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> + * the next incoming packet.
>> + */
>> + tmp = etmq->packet;
>> + etmq->packet = etmq->prev_packet;
>> + etmq->prev_packet = tmp;
>
> Robert, I remember noticing that when you first submitted the code but forgot to
> go back to it. What is the point of swapping the packets? I understand
>
> etmq->prev_packet = etmq->packet;
>
> But not
>
> etmq->packet = tmp;
>
> After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
> is called, which is always right after either cs_etm__sample() or
> cs_etm__flush().
>
This is code I inherited from the original versions of these patches,
but it works because:
- etmq->packet and etmq->prev_packet are pointers to struct
  cs_etm_packet allocated by zalloc() in cs_etm__alloc_queue()
- cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet
  and copies the contents of the first packet from the queue into the
  passed location with:
    *packet = decoder->packet_buffer[decoder->head]
So the swap code is only swapping the pointers over, not the contents of
the packets.
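A minimal sketch of that behaviour, using hypothetical cut-down stand-ins
for struct cs_etm_packet, struct cs_etm_queue and
cs_etm_decoder__get_packet() (the real ones carry more fields and state).
Because the getter copies into whatever buffer the packet pointer refers
to, swapping the two pointers preserves the previous packet without
copying it:

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the perf structures. */
struct packet {
	uint64_t start_addr;
	uint64_t end_addr;
};

struct queue {
	struct packet *packet;      /* buffer the next packet is copied into */
	struct packet *prev_packet; /* buffer holding the previous packet */
};

/* Stand-in for the getter: copies contents into the caller's buffer. */
static void get_packet(const struct packet *src, struct packet *dst)
{
	*dst = *src;
}

/* Swap the two buffer pointers; no packet contents are copied. */
static void swap_packets(struct queue *q)
{
	struct packet *tmp = q->packet;

	q->packet = q->prev_packet;
	q->prev_packet = tmp;
}
```

After the swap, the next get_packet() call overwrites the buffer that used
to hold the older packet, while prev_packet still refers to the packet
that was just processed.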
Regards
Rob
> Thanks,
> Mathieu
>
>
>
>> + return 0;
>> }
>>
>> static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>> --
>> 2.7.4
>>
On 29 May 2018 at 18:28, Leo Yan <[email protected]> wrote:
> Hi Mathieu,
>
> On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:
>
> [...]
>
>> > As now this patch is big with more complex logic, so I consider to
>> > split it into small patches:
>> >
>> > - Define CS_ETM_INVAL_ADDR;
>> > - Fix for CS_ETM_TRACE_ON packet;
>> > - Fix for exception packet;
>> >
>> > Does this make sense for you? I have concern that this patch is a
>> > fixing patch, so not sure after splitting patches will introduce
>> > trouble for applying them for other stable kernels ...
>>
>> Reverse the order:
>>
>> - Fix for CS_ETM_TRACE_ON packet;
>> - Fix for exception packet;
>> - Define CS_ETM_INVAL_ADDR;
>>
>> But you may not need to - see next comment.
>
> From the discussion context, I think here 'you may not need to' is
> referring to my concern for applying patches on stable kernel, so I
> should take this patch series as an enhancement and don't need to
> consider much for stable kernel.
Yes, that is what I meant.
>
> On the other hand, your suggestion is possible to mean 'not need
> to' split into small patches (though I guess this is misunderstanding
> for your meaning).
>
> Could you clarify which is your meaning?
>
>> >> > +
>> >> > + /*
>> >> > * The packet records the execution range with an exclusive end address
>> >> > *
>> >> > * A64 instructions are constant size, so the last executed
>> >> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> >> > return packet->end_addr - A64_INSTR_SIZE;
>> >> > }
>> >> >
>> >> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>> >> > +{
>> >> > + /*
>> >> > + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>> >> > + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> >> > + */
>> >> > + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>> >> > + return 0;
>> >>
>> >> Same comment as above.
>> >
>> > Will do this.
>> >
>> >> > +
>> >> > + return packet->start_addr;
>> >> > +}
>> >> > +
>> >> > static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>> >> > {
>> >> > /*
>> >> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>> >> >
>> >> > be = &bs->entries[etmq->last_branch_pos];
>> >> > be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>> >> > - be->to = etmq->packet->start_addr;
>> >> > + be->to = cs_etm__first_executed_instr(etmq->packet);
>> >> > /* No support for mispredict */
>> >> > be->flags.mispred = 0;
>> >> > be->flags.predicted = 1;
>> >> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>> >> > sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>> >> > sample.pid = etmq->pid;
>> >> > sample.tid = etmq->tid;
>> >> > - sample.addr = etmq->packet->start_addr;
>> >> > + sample.addr = cs_etm__first_executed_instr(etmq->packet);
>> >> > sample.id = etmq->etm->branches_id;
>> >> > sample.stream_id = etmq->etm->branches_id;
>> >> > sample.period = 1;
>> >> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> >> > etmq->period_instructions = instrs_over;
>> >> > }
>> >> >
>> >> > - if (etm->sample_branches &&
>> >> > - etmq->prev_packet &&
>> >> > - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> >> > - etmq->prev_packet->last_instr_taken_branch) {
>> >> > - ret = cs_etm__synth_branch_sample(etmq);
>> >> > - if (ret)
>> >> > - return ret;
>> >> > + if (etm->sample_branches && etmq->prev_packet) {
>> >> > + bool generate_sample = false;
>> >> > +
>> >> > + /* Generate sample for start tracing packet */
>> >> > + if (etmq->prev_packet->sample_type == 0 ||
>> >>
>> >> What kind of packet is sample_type == 0 ?
>> >
>> > Just as explained above, sample_type == 0 is the packet which
>> > initialized in the function cs_etm__alloc_queue().
>> >
>> >> > + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>> >> > + generate_sample = true;
>> >> > +
>> >> > + /* Generate sample for exception packet */
>> >> > + if (etmq->prev_packet->exc == true)
>> >> > + generate_sample = true;
>> >>
>> >> Please don't do that. Exception packets have a type of their own and can be
>> >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
>> >> are. Moreover, exception packets contain an address that, if I'm reading the
>> >> documentation properly, can be used to keep track of instructions that were
>> >> executed between the last address of the previous range packet and the address
>> >> executed just before the exception occurred. Mike and Rob will have to confirm
>> >> this as the decoder may be doing all that hard work for us.
>> >
>> > Sure, will wait for Rob and Mike to confirm for this.
>> >
>> > At my side, I dump the packet, the exception packet isn't passed to
>> > cs-etm.c layer, the decoder layer only sets the flag
>> > 'packet->exc = true' when exception packet is coming [1].
>>
>> That's because we didn't need the information. Now that we do, add a
>> function that will insert a packet in the decoder packet queue and
>> deal with the new packet type in the main decoder loop [2]. At that
>> point your work may not be eligible for stable anymore and I think it
>> is fine. Robert's work was an enhancement over mine and yours is an
>> enhancement over his.
>>
>> [2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999
>
> Agree, will look into for exception packet and try to add new packet
> type for this.
>
> [...]
>
> Thanks,
> Leo Yan
On Wed, May 30, 2018 at 08:45:46AM -0600, Mathieu Poirier wrote:
> On 29 May 2018 at 18:28, Leo Yan <[email protected]> wrote:
> > Hi Mathieu,
> >
> > On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:
> >
> > [...]
> >
> >> > As now this patch is big with more complex logic, so I consider to
> >> > split it into small patches:
> >> >
> >> > - Define CS_ETM_INVAL_ADDR;
> >> > - Fix for CS_ETM_TRACE_ON packet;
> >> > - Fix for exception packet;
> >> >
> >> > Does this make sense for you? I have concern that this patch is a
> >> > fixing patch, so not sure after splitting patches will introduce
> >> > trouble for applying them for other stable kernels ...
> >>
> >> Reverse the order:
> >>
> >> - Fix for CS_ETM_TRACE_ON packet;
> >> - Fix for exception packet;
> >> - Define CS_ETM_INVAL_ADDR;
> >>
> >> But you may not need to - see next comment.
> >
> > From the discussion context, I think here 'you may not need to' is
> > referring to my concern for applying patches on stable kernel, so I
> > should take this patch series as an enhancement and don't need to
> > consider much for stable kernel.
>
> Yes, that is what I meant.
Thanks for the confirmation; I will send a new patch series according to
the discussion.
[...]
Thanks,
Leo Yan
Hi Leo,
On 30 May 2018 at 10:45, Robert Walker <[email protected]> wrote:
>
>
> On 28/05/18 23:13, Mathieu Poirier wrote:
>>
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>>>
>>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces") reworks the samples generation flow from CoreSight trace to
>>> match the correct format so Perf report tool can display the samples
>>> properly.
>>>
>>> But the change has side effect for branch packet handling, it only
>>> generate branch samples by checking previous packet flag
>>> 'last_instr_taken_branch' is true, this results in below three kinds
>>> packets are missed to generate branch samples:
>>>
>>> - The start tracing packet at the beginning of tracing data;
>>> - The exception handling packet;
>>> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>>> for branch samples. CS_ETM_TRACE_ON packet itself can give the info
>>> that there have a discontinuity in the trace, on the other hand we
>>> also miss to generate proper branch sample for packets before and
>>> after CS_ETM_TRACE_ON packet.
>>>
>>> This patch is to add branch sample handling for up three kinds packets:
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>>> zero and in this case it generates branch sample for the start tracing
>>> packet; furthermore, we also need to handle the condition for
>>> prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>>> generate branch sample for exception handling packet;
>>>
>>> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>>> branch sample in the function cs_etm__flush(), this can save complete
>>> info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>>> packet. We also generate branch sample for the new CS_ETM_RANGE
>>> packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>>> first one purpose is to save the info for the new CS_ETM_RANGE packet,
>>> the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>>> have hint for a discontinuity in the trace.
>>>
>>> For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>>> 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>>> the decoder layer as dummy value. This patch is to convert these
>>> values to zeros for more readable; this is accomplished by functions
>>> cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
>>> later one is a new function introduced by this patch.
>>>
>>> Reviewed-by: Robert Walker <[email protected]>
>>> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces")
>>> Signed-off-by: Leo Yan <[email protected]>
>>> ---
>>> tools/perf/util/cs-etm.c | 93
>>> +++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 73 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 822ba91..8418173 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -495,6 +495,20 @@ static inline void
>>> cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>>> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet
>>> *packet)
>>> {
>>> /*
>>> + * The packet is the start tracing packet if the end_addr is
>>> zero,
>>> + * returns 0 for this case.
>>> + */
>>> + if (!packet->end_addr)
>>> + return 0;
>>
>>
>> What is considered to be the "start tracing packet"? Right now the only
>> two
>> kind of packets inserted in the decoder packet buffer queue are INST_RANGE
>> and
>> TRACE_ON. How can we hit a condition where packet->end-addr == 0?
>>
>>
>>> +
>>> + /*
>>> + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> + */
>>> + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>>> + return 0;
>>
>>
>> As it is with the above, I find triggering on addresses to be brittle and
>> hard
>> to maintain on the long run. Packets all have a sample_type field that
>> should
>> be used in cases like this one. That way we know exactly the condition
>> that is
>> targeted.
>>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used. That way we stop using the hard coded value.
>>
>>> +
>>> + /*
>>> * The packet records the execution range with an exclusive end
>>> address
>>> *
>>> * A64 instructions are constant size, so the last executed
>>> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct
>>> cs_etm_packet *packet)
>>> return packet->end_addr - A64_INSTR_SIZE;
>>> }
>>> +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet
>>> *packet)
>>> +{
>>> + /*
>>> + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> + */
>>> + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>>> + return 0;
>>
>>
>> Same comment as above.
>>
>>> +
>>> + return packet->start_addr;
>>> +}
>>> +
>>> static inline u64 cs_etm__instr_count(const struct cs_etm_packet
>>> *packet)
>>> {
>>> /*
>>> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct
>>> cs_etm_queue *etmq)
>>> be = &bs->entries[etmq->last_branch_pos];
>>> be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>>> - be->to = etmq->packet->start_addr;
>>> + be->to = cs_etm__first_executed_instr(etmq->packet);
>>> /* No support for mispredict */
>>> be->flags.mispred = 0;
>>> be->flags.predicted = 1;
>>> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct
>>> cs_etm_queue *etmq)
>>> sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>>> sample.pid = etmq->pid;
>>> sample.tid = etmq->tid;
>>> - sample.addr = etmq->packet->start_addr;
>>> + sample.addr = cs_etm__first_executed_instr(etmq->packet);
>>> sample.id = etmq->etm->branches_id;
>>> sample.stream_id = etmq->etm->branches_id;
>>> sample.period = 1;
>>> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>> etmq->period_instructions = instrs_over;
>>> }
>>> - if (etm->sample_branches &&
>>> - etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> - etmq->prev_packet->last_instr_taken_branch) {
>>> - ret = cs_etm__synth_branch_sample(etmq);
>>> - if (ret)
>>> - return ret;
>>> + if (etm->sample_branches && etmq->prev_packet) {
>>> + bool generate_sample = false;
>>> +
>>> + /* Generate sample for start tracing packet */
>>> + if (etmq->prev_packet->sample_type == 0 ||
>>
>>
>> What kind of packet is sample_type == 0 ?
>>
>>> + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>>> + generate_sample = true;
>>> +
>>> + /* Generate sample for exception packet */
>>> + if (etmq->prev_packet->exc == true)
>>> + generate_sample = true;
>>
>>
>> Please don't do that. Exception packets have a type of their own and can
>> be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
>> packets
>> are. Moreover, exception packets contain an address that, if I'm reading
>> the
>> documentation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the
>> address
>> executed just before the exception occurred. Mike and Rob will have to
>> confirm
>> this as the decoder may be doing all that hard work for us.
>>
Clarification on the exception packets:
The OpenCSD output exception packet gives you the exception number
and, optionally, the preferred return address. Whether this address is
present depends a lot on the underlying protocol; it will normally
be there with ETMv4.
Exceptions are marked differently in the underlying protocols; the
OCSD packets abstract away these differences.
Consider the code:
  0x1000: <some instructions>
  0x1100: BR 0x2000
  ....
  0x2000: <some instructions>
  0x2020: BZ r4
Without an exception this would result in the packets:
  OCSD_RANGE(0x1000, 0x1104, Last instr type=Br, taken)
      // recall that range packets have start addr inclusive, end addr
      // exclusive.
  OCSD_RANGE(0x2000, 0x2024, Last instr type=Br, <taken / not taken -
      depends on condition>)
Now consider an exception occurring before the BR 0x2000.
This will result in:
  OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
  OCSD_EXCEPTION(IRQ, ret-addr 0x1100)
  OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type=Br, taken)
      // this is more likely to have multiple ranges / branches before
      // any return, but simplified here.
  OCSD_EXCEPTION_RETURN()
      // present if exception returns are explicitly marked in the
      // underlying trace - may not always be, depending on circumstances.
  OCSD_RANGE(0x1100, 0x1104, Last instr type=Br, taken)
      // continue on with a short range - just the branch
  OCSD_RANGE(0x2000, 0x2024, Last instr type=Br, <taken / not taken -
      depends on condition>)
Now consider the exception occurring after the BR, but before any
other instructions are executed:
  OCSD_RANGE(0x1000, 0x1104, Last instr type=Br, taken)
      // recall that range packets have start addr inclusive, end addr
      // exclusive.
  OCSD_EXCEPTION(IRQ, ret-addr 0x2000)
      // here the preferred return address is actually the target of the
      // branch.
  OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type=Br, taken)
      // this is more likely to have multiple ranges / branches before
      // any return, but simplified here.
  OCSD_RANGE(0x2000, 0x2024, Last instr type=Br, <taken / not taken -
      depends on condition>)
So in general it is possible to arrive in the IRQ_START range with the
previous packet having been either a taken branch, a not taken branch,
or not a branch.
Care must be taken, whether for AutoFDO or normal trace disassembly, not
to assume that having the last range packet as a taken branch means
that the next range packet is the target, if there is an intervening
exception packet.
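As a minimal sketch of that caution (the enum, struct and function names
below are hypothetical, not the OpenCSD or perf API, and fixed-size A64
instructions are assumed), a consumer could record a branch edge between
two range packets only when no exception packet sits between them:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet model; not the OpenCSD or perf types. */
enum pkt_type { PKT_RANGE, PKT_EXCEPTION, PKT_EXCEPTION_RET };

struct pkt {
	enum pkt_type type;
	uint64_t start;   /* inclusive, range packets only */
	uint64_t end;     /* exclusive, range packets only */
	bool last_taken;  /* last instruction is a taken branch */
};

#define INSTR_SIZE 4 /* assumption: fixed-size A64 instructions */

/*
 * Return true and fill from/to only if pkts[i] is a range packet ending
 * in a taken branch and the very next packet is also a range packet.
 */
static bool direct_branch_edge(const struct pkt *pkts, size_t n, size_t i,
			       uint64_t *from, uint64_t *to)
{
	if (i + 1 >= n)
		return false;
	if (pkts[i].type != PKT_RANGE || !pkts[i].last_taken)
		return false;
	/* An intervening exception means the next range is not the target. */
	if (pkts[i + 1].type != PKT_RANGE)
		return false;
	*from = pkts[i].end - INSTR_SIZE;
	*to = pkts[i + 1].start;
	return true;
}
```

How the edge into the exception handler itself should be reported is a
separate decision that this sketch deliberately leaves out.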
Regards
Mike
>>> +
>>> + /* Generate sample for normal branch packet */
>>> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> + etmq->prev_packet->last_instr_taken_branch)
>>> + generate_sample = true;
>>> +
>>> + if (generate_sample) {
>>> + ret = cs_etm__synth_branch_sample(etmq);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> }
>>> if (etm->sample_branches || etm->synth_opts.last_branch) {
>>> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>> static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> {
>>> int err = 0;
>>> + struct cs_etm_auxtrace *etm = etmq->etm;
>>> struct cs_etm_packet *tmp;
>>> - if (etmq->etm->synth_opts.last_branch &&
>>> - etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>>> + if (!etmq->prev_packet)
>>> + return 0;
>>> +
>>> + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>>> + return 0;
>>> +
>>> + if (etmq->etm->synth_opts.last_branch) {
>>
>>
>> If you add:
>>
>> if (!etmq->etm->synth_opts.last_branch)
>> return 0;
>>
>> You can avoid indenting the whole block.
>>
>>> /*
>>> * Generate a last branch event for the branches left in
>>> the
>>> * circular buffer at the end of the trace.
>>> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> err = cs_etm__synth_instruction_sample(
>>> etmq, addr,
>>> etmq->period_instructions);
>>> + if (err)
>>> + return err;
>>> etmq->period_instructions = 0;
>>> + }
>>> - /*
>>> - * Swap PACKET with PREV_PACKET: PACKET becomes
>>> PREV_PACKET for
>>> - * the next incoming packet.
>>> - */
>>> - tmp = etmq->packet;
>>> - etmq->packet = etmq->prev_packet;
>>> - etmq->prev_packet = tmp;
>>> + if (etm->sample_branches) {
>>> + err = cs_etm__synth_branch_sample(etmq);
>>> + if (err)
>>> + return err;
>>> }
>>> - return err;
>>> + /*
>>> + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>>> + * the next incoming packet.
>>> + */
>>> + tmp = etmq->packet;
>>> + etmq->packet = etmq->prev_packet;
>>> + etmq->prev_packet = tmp;
>>
>>
>> Robert, I remember noticing that when you first submitted the code but
>> forgot to
>> go back to it. What is the point of swapping the packets? I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as
>> cs_etm_decoder__get_packet()
>> is called, which is always right after either cs_etm__sample() or
>> cs_etm__flush().
>>
>
> This is code I inherited from the original versions of these patches, but it
> works because:
> - etmq->packet and etmq->prev_packet are pointers to struct cs_etm_packet
> allocated by zalloc() in cs_etm__alloc_queue()
> - cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet and
> copies the contents of the first packet from the queue into the passed
> location with:
> *packet = decoder->packet_buffer[decoder->head]
>
> So the swap code is only swapping the pointers over, not the contents of the
> packets.
>
> Regards
>
> Rob
>
>
>
>> Thanks,
>> Mathieu
>>
>>
>>
>>> + return 0;
>>> }
>>> static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>>> --
>>> 2.7.4
>>>
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
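Rob's point about the swap can be illustrated with a small standalone sketch. It is not the perf code: `struct packet`, `struct queue`, `get_packet()` and `swap_packets()` are hypothetical stand-ins for `struct cs_etm_packet`, `struct cs_etm_queue`, `cs_etm_decoder__get_packet()` and the swap in `cs_etm__flush()`. It assumes, as described above, that the decoder copies packet contents into the buffer it is handed. With only `prev_packet = packet`, both pointers would alias one buffer and the next copy would overwrite the previous packet; swapping keeps two distinct buffers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cs_etm_packet (only two fields). */
struct packet {
	unsigned long start_addr;
	unsigned long end_addr;
};

/* Hypothetical stand-in for struct cs_etm_queue. */
struct queue {
	struct packet *packet;
	struct packet *prev_packet;
};

/* Mimics the decoder: copy the packet contents into the caller's buffer. */
static void get_packet(struct packet *dst, const struct packet *src)
{
	*dst = *src;
}

/* Exchange the two buffer pointers; the packet contents are not copied. */
static void swap_packets(struct queue *q)
{
	struct packet *tmp = q->packet;

	q->packet = q->prev_packet;
	q->prev_packet = tmp;
}

/* Returns 0 on success, -1 if allocation fails. */
static int swap_demo(void)
{
	const struct packet first = { 0x1000, 0x1104 };
	const struct packet second = { 0x2000, 0x2024 };
	struct queue q;

	q.packet = calloc(1, sizeof(*q.packet));
	q.prev_packet = calloc(1, sizeof(*q.prev_packet));
	if (!q.packet || !q.prev_packet) {
		free(q.packet);
		free(q.prev_packet);
		return -1;
	}

	get_packet(q.packet, &first);
	swap_packets(&q);		/* 'first' is now reachable via prev_packet */
	get_packet(q.packet, &second);	/* overwrites the spare buffer only */

	assert(q.packet != q.prev_packet);
	assert(q.prev_packet->start_addr == 0x1000);
	assert(q.packet->start_addr == 0x2000);

	free(q.packet);
	free(q.prev_packet);
	return 0;
}
```

Calling `swap_demo()` from a test harness exercises the sequence "decode, swap, decode" and checks that the earlier packet survives the second decode.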
Hi Mike,
On Wed, May 30, 2018 at 04:04:34PM +0100, Mike Leach wrote:
[...]
> >>> + /* Generate sample for exception packet */
> >>> + if (etmq->prev_packet->exc == true)
> >>> + generate_sample = true;
> >>
> >>
> >> Please don't do that. Exception packets have a type of their own and can
> >> be added to the decoder packet queue the same way INST_RANGE and TRACE_ON
> >> packets are. Moreover, exception packets contain an address that, if I'm
> >> reading the documentation properly, can be used to keep track of
> >> instructions that were executed between the last address of the previous
> >> range packet and the address executed just before the exception occurred.
> >> Mike and Rob will have to confirm this as the decoder may be doing all
> >> that hard work for us.
> >>
>
> Clarification on the exception packets:
>
> The OpenCSD output exception packet gives you the exception number,
> and optionally the preferred return address. Whether this address is
> present depends a lot on the underlying protocol; it will normally
> be there with ETMv4.
> Exceptions are marked differently in the underlying protocol; the
> OCSD packets abstract away these differences.
>
> Consider the code:
>
> 0x1000: <some instructions>
> 0x1100: BR 0x2000
> ....
> 0x2000: <some instructions>
> 0x2020: BZ r4
>
> Without an exception this would result in the packets:
>
> OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken) // recall that
> range packets have start addr inclusive, end addr exclusive.
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>)
>
> Now consider an exception occurring before the BR 0x2000.
>
> This will result in:
> OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
> OCSD_EXCEPTION(IRQ, ret-addr 0x1100)
> OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type=Br, taken) //
> this is more likely to have multiple ranges / branches before any
> return, but simplified here.
> OCSD_EXCEPTION_RETURN() // present if exception returns are
> explicitly marked in the underlying trace; may not always be, depending
> on circumstances.
> OCSD_RANGE(0x1100,0x1104, Last instr type=Br, taken) // continue on with
> a short range - just the branch
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>)
>
> Now consider the exception occurring after the BR, but before any
> other instructions are executed.
>
> OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken) // recall that
> range packets have start addr inclusive, end addr exclusive.
> OCSD_EXCEPTION(IRQ, ret-addr 0x2000) // here the preferred return
> address is actually the target of the branch.
> OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type=Br, taken) //
> this is more likely to have multiple ranges / branches before any
> return, but simplified here.
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>)
>
> So in general it is possible to arrive in the IRQ_START range with the
> previous packet having been either a taken branch, a not taken branch,
> or not a branch.
> Care must be taken, whether for AutoFDO or normal trace disassembly, not
> to assume that having the last range packet as a taken branch means
> that the next range packet is the target if there is an intervening
> exception packet.
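The start-inclusive, end-exclusive convention for range packets described above can be sketched as follows. This is only an illustration under stated assumptions: `struct range`, `range_last_instr()` and `range_instr_count()` are hypothetical names, and the code assumes fixed 4-byte A64 instructions, which would not hold for Thumb code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every instruction is 4 bytes (A64). */
#define A64_INSTR_SIZE 4

/* Hypothetical range packet: start is inclusive, end is exclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Address of the last instruction executed inside the range. */
static uint64_t range_last_instr(const struct range *r)
{
	return r->end - A64_INSTR_SIZE;
}

/* Number of instructions covered by the range. */
static uint64_t range_instr_count(const struct range *r)
{
	return (r->end - r->start) / A64_INSTR_SIZE;
}
```

For the packet OCSD_RANGE(0x1000,0x1104) in the example, the last executed instruction is the branch at 0x1100; for the truncated OCSD_RANGE(0x1000,0x1100) produced when the exception arrives first, the range stops one instruction short of the branch.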
Thanks a lot for the detailed explanation.
IIUC, AutoFDO will not have such an issue because every range packet will
be handled for it. On the other hand, as you point out, the branch samples
(and their consumer, the trace disassembler) are very dependent on the flag
'last_instr_taken_branch'.
According to your explanation, I think we should consider the branch as
taken in the following situations:
- The newly arriving packet is an exception packet (for both exception
entry and exit packets);
- The previous packet is an exception packet;
- The previous packet is a normal range packet with
'last_instr_taken_branch' = true;
So I'd like to use the function below to demonstrate my understanding of
exception packet handling. I will also send out a new patch to support
exception packets for review.
If you have any concerns or I have missed anything, please let me know.
static bool cs_etm__is_taken_branch(struct cs_etm_packet *prev_packet,
struct cs_etm_packet *packet)
{
/* The branch is taken for normal range packet with taken branch flag */
if (prev_packet->sample_type == CS_ETM_RANGE &&
prev_packet->last_instr_taken_branch)
return true;
/* The branch is taken if previous packet is exception packet */
if (prev_packet->sample_type == CS_ETM_EXCEPTION ||
prev_packet->sample_type == CS_ETM_EXCEPTION_RET)
return true;
/* The branch is taken for an intervening exception packet */
if (packet->sample_type == CS_ETM_EXCEPTION ||
packet->sample_type == CS_ETM_EXCEPTION_RET)
return true;
return false;
}
[...]
Thanks,
Leo Yan
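To check the proposed rule against Mike's packet sequences, here is a minimal standalone sketch. It is not the perf implementation: `enum pkt_type`, `struct pkt` and `is_taken_branch()` are hypothetical stand-ins for the sample types, `struct cs_etm_packet` and the function proposed above, reduced to the two fields the rule reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CS_ETM_* sample types. */
enum pkt_type {
	PKT_RANGE,
	PKT_EXCEPTION,
	PKT_EXCEPTION_RET,
};

/* Hypothetical stand-in for struct cs_etm_packet. */
struct pkt {
	enum pkt_type type;
	bool last_instr_taken_branch;
};

/* Same three rules as the proposal, on the reduced types. */
static bool is_taken_branch(const struct pkt *prev, const struct pkt *cur)
{
	/* Range packet that ended with a taken branch. */
	if (prev->type == PKT_RANGE && prev->last_instr_taken_branch)
		return true;

	/* Previous packet is an exception entry or exit. */
	if (prev->type == PKT_EXCEPTION || prev->type == PKT_EXCEPTION_RET)
		return true;

	/* Incoming packet is an exception entry or exit. */
	if (cur->type == PKT_EXCEPTION || cur->type == PKT_EXCEPTION_RET)
		return true;

	return false;
}
```

With this rule, the "exception before the BR" sequence reports a change of flow at the OCSD_RANGE(0x1000, 0x1100, Other) to exception transition even though the range did not end in a branch, while two consecutive range packets with no taken branch between them do not.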
On Wed, May 30, 2018 at 11:39:00PM +0800, Leo Yan wrote:
[...]
Just to clarify: I forgot to mention that I introduce two extra sample
types, CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET; one is for the exception
entry packet and the other is for the exception exit packet. If this is
hard to understand, you could hold off until the new patch is out for review.
Thanks,
Leo Yan
Commit-ID: 943f32a0e8a4ea513dc68b00720a6c65842135e8
Gitweb: https://git.kernel.org/tip/943f32a0e8a4ea513dc68b00720a6c65842135e8
Author: Leo Yan <[email protected]>
AuthorDate: Mon, 28 May 2018 16:45:01 +0800
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Wed, 30 May 2018 15:39:31 -0300
perf script python: Add addr into perf sample dict
ARM CoreSight auxtrace uses 'sample->addr' to record the target address
for branch instructions, so the data of 'sample->addr' is required for
tracing data analysis.
This commit adds 'sample->addr' to the perf sample dict, so that
python scripts can use it when parsing events.
Signed-off-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Tor Jeremiassen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/scripting-engines/trace-event-python.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 10dd5fce082b..7f8afacd08ee 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
PyLong_FromUnsignedLongLong(sample->period));
pydict_set_item_string_decref(dict_sample, "phys_addr",
PyLong_FromUnsignedLongLong(sample->phys_addr));
+ pydict_set_item_string_decref(dict_sample, "addr",
+ PyLong_FromUnsignedLongLong(sample->addr));
set_sample_read_in_dict(dict_sample, sample, evsel);
pydict_set_item_string_decref(dict, "sample", dict_sample);