2018-05-21 08:55:42

by Leo Yan

[permalink] [raw]
Subject: [RFT v2 0/4] Perf script: Add python script for CoreSight trace disassembler

This patch series is to support for using 'perf script' for CoreSight
trace disassembler, for this purpose this patch series adds a new
python script to parse CoreSight tracing event and use command 'objdump'
for disassembled lines, finally this can generate readable program
execution flow for reviewing tracing data.

Patch 0001 is one fixing patch to generate samples for the start packet
and exception packets.

Patch 0002 is the prerequisite to add addr into sample dict, so this
value can be used by python script to analyze instruction range.

Patch 0003 is to add python script for trace disassembler.

Patch 0004 is to add doc to explain python script usage and give
example for it.

This patch series has been rebased on acme git tree [1] with the last
commit 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and
tested on Hikey (ARM64 octa CA53 cores).

In this version the script has no dependency on ARM64 platform and is
expected to support ARM32 platform, but I am lacking ARM32 platform for
testing on it, so firstly upstream to support ARM64 platform.

This patch series is firstly to support 'per-thread' recording tracing
data, but we also need to verify the script can dump trace disassembler
CPU wide tracing and kernel panic kdump tracing data. I also verified
this patch series which can work with kernel panic kdump tracing data,
because Mathieu is working on CPU wide tracing related work, so after
this we need to retest for CPU wide tracing and kdump tracing to ensure
the python script can handle well for all cases.

You are very welcome to test the script in this patch series, your
testing result and suggestion are very valuable to perfect this script
to cover more cases.

Changes from v1:
* According to Mike and Rob suggestion, add the fixing to generate samples
for the start packet and exception packets.
* Simplify the python script to remove the exception prediction algorithm,
we can rely on the sane exception packets for disassembler.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git


Leo Yan (4):
perf cs-etm: Generate sample for missed packets
perf script python: Add addr into perf sample dict
perf script python: Add script for CoreSight trace disassembler
coresight: Document for CoreSight trace disassembler

Documentation/trace/coresight.txt | 52 +++++
tools/perf/scripts/python/arm-cs-trace-disasm.py | 234 +++++++++++++++++++++
tools/perf/util/cs-etm.c | 35 ++-
.../util/scripting-engines/trace-event-python.c | 2 +
4 files changed, 316 insertions(+), 7 deletions(-)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py

--
2.7.4



2018-05-21 08:54:47

by Leo Yan

[permalink] [raw]
Subject: [RFT v2 4/4] coresight: Document for CoreSight trace disassembler

This commit documents CoreSight trace disassembler usage and gives
example for it.

Signed-off-by: Leo Yan <[email protected]>
---
Documentation/trace/coresight.txt | 52 +++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 6f0120c..b8f2359 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -381,3 +381,55 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
$ taskset -c 2 ./sort_autofdo
Bubble sorting array of 30000 elements
5806 ms
+
+
+Tracing data disassembler
+-------------------------
+
+'perf script' supports to use script to parse tracing packet and rely on
+'objdump' for disassembled lines, this can convert tracing data to readable
+program execution flow for easily reviewing tracing data.
+
+The CoreSight trace disassembler is located in the folder:
+tools/perf/scripts/python/arm-cs-trace-disasm.py. This script support below
+options:
+
+ -d, --objdump: Set path to objdump executable, this option is
+ mandatory.
+ -k, --vmlinux: Set path to vmlinux file.
+ -v, --verbose: Enable debugging log, after enable this option the
+ script dumps every event data.
+
+Below is one example for using python script to dump CoreSight trace
+disassembler:
+
+ $ perf script -s arm-cs-trace-disasm.py -i perf.data \
+ -F cpu,event,ip,addr,sym -- -d objdump -k ./vmlinux > cs-disasm.log
+
+Below is one example for the disassembler log:
+
+ARM CoreSight Trace Data Assembler Dump
+ ffff000008a5f2dc <etm4_enable_hw+0x344>:
+ ffff000008a5f2dc: 340000a0 cbz w0, ffff000008a5f2f0 <etm4_enable_hw+0x358>
+ ffff000008a5f2f0 <etm4_enable_hw+0x358>:
+ ffff000008a5f2f0: f9400260 ldr x0, [x19]
+ ffff000008a5f2f4: d5033f9f dsb sy
+ ffff000008a5f2f8: 913ec000 add x0, x0, #0xfb0
+ ffff000008a5f2fc: b900001f str wzr, [x0]
+ ffff000008a5f300: f9400bf3 ldr x19, [sp, #16]
+ ffff000008a5f304: a8c27bfd ldp x29, x30, [sp], #32
+ ffff000008a5f308: d65f03c0 ret
+ ffff000008a5fa18 <etm4_enable+0x1b0>:
+ ffff000008a5fa18: 14000025 b ffff000008a5faac <etm4_enable+0x244>
+ ffff000008a5faac <etm4_enable+0x244>:
+ ffff000008a5faac: b9406261 ldr w1, [x19, #96]
+ ffff000008a5fab0: 52800015 mov w21, #0x0 // #0
+ ffff000008a5fab4: f901ca61 str x1, [x19, #912]
+ ffff000008a5fab8: 2a1503e0 mov w0, w21
+ ffff000008a5fabc: 3940e261 ldrb w1, [x19, #56]
+ ffff000008a5fac0: f901ce61 str x1, [x19, #920]
+ ffff000008a5fac4: a94153f3 ldp x19, x20, [sp, #16]
+ ffff000008a5fac8: a9425bf5 ldp x21, x22, [sp, #32]
+ ffff000008a5facc: a94363f7 ldp x23, x24, [sp, #48]
+ ffff000008a5fad0: a8c47bfd ldp x29, x30, [sp], #64
+ ffff000008a5fad4: d65f03c0 ret
--
2.7.4


2018-05-21 08:55:21

by Leo Yan

[permalink] [raw]
Subject: [RFT v2 3/4] perf script python: Add script for CoreSight trace disassembler

This commit adds python script to parse CoreSight tracing event and
use command 'objdump' for disassembled lines, finally we can generate
readable program execution flow for reviewing tracing data.

The script receives CoreSight tracing packet with below format:

+------------+------------+------------+
packet(n): | addr | ip | cpu |
+------------+------------+------------+
packet(n+1): | addr | ip | cpu |
+------------+------------+------------+

packet::ip is the last address of current branch instruction and
packet::addr presents the start address of the next coming branch
instruction. So for one branch instruction which starts in packet(n),
its execution flow starts from packet(n)::addr and it stops at
packet(n+1)::ip. As results we need to combine the two continuous
packets to generate the instruction range, this is the rationale for the
script implementation:

[ sample(n)::addr .. sample(n+1)::ip ]

Credits to Tor Jeremiassen who have written the script skeleton and
provides the ideas for reading symbol file according to build-id,
creating memory map for dso and basic packet handling. Mathieu Poirier
contributed fixes for build-id and memory map bugs. The detailed
development history for this script you can find from [1]. Based on Tor
and Mathieu work, the script is updated samples handling for the
corrected sample format. Another minor enhancement is to support for
without build-id case, the script can parse kernel symbols with option
'-k' for vmlinux file path.

[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf/scripts/python/cs-trace-disasm.py

Co-authored-by: Tor Jeremiassen <[email protected]>
Co-authored-by: Mathieu Poirier <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/scripts/python/arm-cs-trace-disasm.py | 234 +++++++++++++++++++++++
1 file changed, 234 insertions(+)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py

diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
new file mode 100644
index 0000000..58de36f
--- /dev/null
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -0,0 +1,234 @@
+# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember
+# SPDX-License-Identifier: GPL-2.0
+#
+# Tor Jeremiassen <[email protected]> is original author who wrote script
+# skeleton, Mathieu Poirier <[email protected]> contributed
+# fixes for build-id and memory map; Leo Yan <[email protected]>
+# updated the packet parsing with new samples format.
+
+import os
+import sys
+import re
+from subprocess import *
+from optparse import OptionParser, make_option
+
+# Command line parsing
+
+option_list = [
+ # formatting options for the bottom entry of the stack
+ make_option("-k", "--vmlinux", dest="vmlinux_name",
+ help="Set path to vmlinux file"),
+ make_option("-d", "--objdump", dest="objdump_name",
+ help="Set path to objdump executable file"),
+ make_option("-v", "--verbose", dest="verbose",
+ action="store_true", default=False,
+ help="Enable debugging log")
+]
+
+parser = OptionParser(option_list=option_list)
+(options, args) = parser.parse_args()
+
+if (options.objdump_name == None):
+ sys.exit("No objdump executable file specified - use -d or --objdump option")
+
+# Initialize global dicts and regular expression
+
+build_ids = dict()
+mmaps = dict()
+disasm_cache = dict()
+cpu_data = dict()
+disasm_re = re.compile("^\s*([0-9a-fA-F]+):")
+disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s\<.*\>:")
+cache_size = 32*1024
+prev_cpu = -1
+
+def parse_buildid():
+ global build_ids
+
+ buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)"
+ buildid_re = re.compile(buildid_regex)
+
+ results = check_output(["perf", "buildid-list"]).split('\n');
+ for line in results:
+ m = buildid_re.search(line)
+ if (m == None):
+ continue;
+
+ id_name = m.group(2)
+ id_num = m.group(1)
+
+ if (id_name == "[kernel.kallsyms]") :
+ append = "/kallsyms"
+ elif (id_name == "[vdso]") :
+ append = "/vdso"
+ else:
+ append = "/elf"
+
+ build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
+ "/" + id_name + "/" + id_num + append;
+ # Replace duplicate slash chars to single slash char
+ build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
+
+ if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)):
+ print "kallsyms cannot be used to dump assembler"
+
+ # Set vmlinux path to replace kallsyms file, if without buildid we still
+ # can use vmlinux to prase kernel symbols
+ if ((options.vmlinux_name != None)):
+ build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+
+def parse_mmap():
+ global mmaps
+
+ # Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
+ mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
+ mmap_re = re.compile(mmap_regex)
+
+ results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n')
+ for line in results:
+ m = mmap_re.search(line)
+ if (m != None):
+ if (m.group(3) == '[kernel.kallsyms]_text'):
+ dso = '[kernel.kallsyms]'
+ else:
+ dso = m.group(3)
+
+ start = int(m.group(1),0)
+ end = int(m.group(1),0) + int(m.group(2),0)
+ mmaps[dso] = [start, end]
+
+def find_dso_mmap(addr):
+ global mmaps
+
+ for key, value in mmaps.items():
+ if (addr >= value[0] and addr < value[1]):
+ return key
+
+ return None
+
+def read_disam(dso, start_addr, stop_addr):
+ global mmaps
+ global build_ids
+
+ addr_range = start_addr + ":" + stop_addr;
+
+ # Don't let the cache get too big, clear it when it hits max size
+ if (len(disasm_cache) > cache_size):
+ disasm_cache.clear();
+
+ try:
+ disasm_output = disasm_cache[addr_range];
+ except:
+ try:
+ fname = build_ids[dso];
+ except KeyError:
+ sys.exit("cannot find symbol file for " + dso)
+
+ disasm = [ options.objdump_name, "-d", "-z",
+ "--start-address="+start_addr,
+ "--stop-address="+stop_addr, fname ]
+
+ disasm_output = check_output(disasm).split('\n')
+ disasm_cache[addr_range] = disasm_output;
+
+ return disasm_output
+
+def dump_disam(dso, start_addr, stop_addr, check_svc):
+ for line in read_disam(dso, start_addr, stop_addr):
+ m = disasm_func_re.search(line)
+ if (m != None):
+ print "\t",line
+ continue
+
+ m = disasm_re.search(line)
+ if (m == None):
+ continue;
+
+ print "\t",line
+
+ if ((check_svc == True) and "svc" in line):
+ return
+
+def dump_packet(sample):
+ print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
+ "pid: %d tid: %d period: %d time: %d }" % \
+ (sample['cpu'], sample['addr'], sample['phys_addr'], \
+ sample['ip'], sample['pid'], sample['tid'], \
+ sample['period'], sample['time'])
+
+def trace_begin():
+ print 'ARM CoreSight Trace Data Assembler Dump'
+ parse_buildid()
+ parse_mmap()
+
+def trace_end():
+ print 'End'
+
+def trace_unhandled(event_name, context, event_fields_dict):
+ print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+
+def process_event(param_dict):
+ global cache_size
+ global options
+ global prev_cpu
+
+ sample = param_dict["sample"]
+
+ if (options.verbose == True):
+ dump_packet(sample)
+
+ # If period doesn't equal to 1, this packet is for instruction sample
+ # packet, we need drop this synthetic packet.
+ if (sample['period'] != 1):
+ print "Skip synthetic instruction sample"
+ return
+
+ cpu = format(sample['cpu'], "d");
+
+ # Initialize CPU data if it's empty, and directly return back
+ # if this is the first tracing event for this CPU.
+ if (cpu_data.get(str(cpu) + 'addr') == None):
+ cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
+ prev_cpu = cpu
+ return
+
+ # The format for packet is:
+ #
+ # +------------+------------+------------+
+ # sample_prev: | addr | ip | cpu |
+ # +------------+------------+------------+
+ # sample_next: | addr | ip | cpu |
+ # +------------+------------+------------+
+ #
+ # We need to combine the two continuous packets to get the instruction
+ # range for sample_prev::cpu:
+ #
+ # [ sample_prev::addr .. sample_next::ip ]
+ #
+ # For this purose, sample_prev::addr is stored into cpu_data structure
+ # and read back for 'start_addr' when the new packet comes, and we need
+ # to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for
+ # 'stop_addr' is for the sake of objdump so the final assembler dump can
+ # include last instruction for sample_next::ip.
+
+ start_addr = cpu_data[str(prev_cpu) + 'addr']
+ stop_addr = format(sample['ip'] + 4, "#x")
+
+ # Sanity checking dso for start_addr and stop_addr
+ prev_dso = find_dso_mmap(int(start_addr, 0))
+ next_dso = find_dso_mmap(int(stop_addr, 0))
+
+ # If cannot find dso so cannot dump assembler, bail out
+ if (prev_dso == None or next_dso == None):
+ print "Address range [ %s .. %s ]: failed to find dso" % (start_addr, stop_addr)
+ prev_cpu = cpu
+ return
+ elif (prev_dso != next_dso):
+ print "Address range [ %s .. %s ]: isn't in same dso" % (start_addr, stop_addr)
+ prev_cpu = cpu
+ return
+
+ dump_disam(prev_dso, start_addr, stop_addr, False)
+
+ cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
+ prev_cpu = cpu
--
2.7.4


2018-05-21 08:55:44

by Leo Yan

[permalink] [raw]
Subject: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the samples generation flow from CoreSight trace to
match the correct format so Perf report tool can display the samples
properly. But the change has side effect for packet handling, it only
generate samples when 'prev_packet->last_instr_taken_branch' is true,
this results in the start tracing packet and exception packets are
dropped.

This patch checks extra two conditions for complete samples:

- If 'prev_packet->sample_type' is zero we can use this condition to
get to know this is the start tracing packet; for this case, the start
packet's end_addr is zero as well so we need to handle it in the
function cs_etm__last_executed_instr();

- If 'prev_packet->exc' is true, we can know the previous packet is
exception handling packet so need to generate sample for exception
flow.

Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Cc: Mike Leach <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/cs-etm.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..378953b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
{
/*
+ * The packet is the start tracing packet if the end_addr is zero,
+ * returns 0 for this case.
+ */
+ if (!packet->end_addr)
+ return 0;
+
+ /*
* The packet records the execution range with an exclusive end address
*
* A64 instructions are constant size, so the last executed
@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}

- if (etm->sample_branches &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
- etmq->prev_packet->last_instr_taken_branch) {
- ret = cs_etm__synth_branch_sample(etmq);
- if (ret)
- return ret;
+ if (etm->sample_branches && etmq->prev_packet) {
+ bool generate_sample = false;
+
+ /* Generate sample for start tracing packet */
+ if (etmq->prev_packet->sample_type == 0)
+ generate_sample = true;
+
+ /* Generate sample for exception packet */
+ if (etmq->prev_packet->exc == true)
+ generate_sample = true;
+
+ /* Generate sample for normal branch packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch)
+ generate_sample = true;
+
+ if (generate_sample) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
+ return ret;
+ }
}

if (etm->sample_branches || etm->synth_opts.last_branch) {
--
2.7.4


2018-05-21 08:55:46

by Leo Yan

[permalink] [raw]
Subject: [RFT v2 2/4] perf script python: Add addr into perf sample dict

ARM CoreSight auxtrace uses 'sample->addr' to record the target address
for branch instructions, so the data of 'sample->addr' is required for
tracing data analysis.

This commit collects data of 'sample->addr' into perf sample dict,
finally can be used for python script for parsing event.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/scripting-engines/trace-event-python.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 10dd5fc..7f8afac 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
PyLong_FromUnsignedLongLong(sample->period));
pydict_set_item_string_decref(dict_sample, "phys_addr",
PyLong_FromUnsignedLongLong(sample->phys_addr));
+ pydict_set_item_string_decref(dict_sample, "addr",
+ PyLong_FromUnsignedLongLong(sample->addr));
set_sample_read_in_dict(dict_sample, sample, evsel);
pydict_set_item_string_decref(dict, "sample", dict_sample);

--
2.7.4


2018-05-21 11:29:06

by Robert Walker

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Leo,

On 21/05/18 09:52, Leo Yan wrote:
> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> traces") reworks the samples generation flow from CoreSight trace to
> match the correct format so Perf report tool can display the samples
> properly. But the change has side effect for packet handling, it only
> generate samples when 'prev_packet->last_instr_taken_branch' is true,
> this results in the start tracing packet and exception packets are
> dropped.
>
> This patch checks extra two conditions for complete samples:
>
> - If 'prev_packet->sample_type' is zero we can use this condition to
> get to know this is the start tracing packet; for this case, the start
> packet's end_addr is zero as well so we need to handle it in the
> function cs_etm__last_executed_instr();
>

I think you also need to add something in to handle discontinuities in
trace - for example it is possible to configure the ETM to only trace
execution in specific code regions or to trace a few cycles every so
often. In these cases, prev_packet->sample_type will not be zero, but
whatever the previous packet was. You will get a CS_ETM_TRACE_ON packet
in such cases, generated by an I_TRACE_ON element in the trace stream.
You also get this on exception return.

However, you should also keep the test for prev_packet->sample_type == 0
as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
wrapped.

Regards

Rob

> - If 'prev_packet->exc' is true, we can know the previous packet is
> exception handling packet so need to generate sample for exception
> flow.
>
> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> Cc: Mike Leach <[email protected]>
> Cc: Robert Walker <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/cs-etm.c | 35 ++++++++++++++++++++++++++++-------
> 1 file changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 822ba91..378953b 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> {
> /*
> + * The packet is the start tracing packet if the end_addr is zero,
> + * returns 0 for this case.
> + */
> + if (!packet->end_addr)
> + return 0;
> +
> + /*
> * The packet records the execution range with an exclusive end address
> *
> * A64 instructions are constant size, so the last executed
> @@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> etmq->period_instructions = instrs_over;
> }
>
> - if (etm->sample_branches &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> - etmq->prev_packet->last_instr_taken_branch) {
> - ret = cs_etm__synth_branch_sample(etmq);
> - if (ret)
> - return ret;
> + if (etm->sample_branches && etmq->prev_packet) {
> + bool generate_sample = false;
> +
> + /* Generate sample for start tracing packet */
> + if (etmq->prev_packet->sample_type == 0)
> + generate_sample = true;
> +
> + /* Generate sample for exception packet */
> + if (etmq->prev_packet->exc == true)
> + generate_sample = true;
> +
> + /* Generate sample for normal branch packet */
> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> + etmq->prev_packet->last_instr_taken_branch)
> + generate_sample = true;
> +
> + if (generate_sample) {
> + ret = cs_etm__synth_branch_sample(etmq);
> + if (ret)
> + return ret;
> + }
> }
>
> if (etm->sample_branches || etm->synth_opts.last_branch) {
>

2018-05-22 08:40:08

by Leo Yan

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Rob,

On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:
> Hi Leo,
>
> On 21/05/18 09:52, Leo Yan wrote:
> >Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> >traces") reworks the samples generation flow from CoreSight trace to
> >match the correct format so Perf report tool can display the samples
> >properly. But the change has side effect for packet handling, it only
> >generate samples when 'prev_packet->last_instr_taken_branch' is true,
> >this results in the start tracing packet and exception packets are
> >dropped.
> >
> >This patch checks extra two conditions for complete samples:
> >
> >- If 'prev_packet->sample_type' is zero we can use this condition to
> > get to know this is the start tracing packet; for this case, the start
> > packet's end_addr is zero as well so we need to handle it in the
> > function cs_etm__last_executed_instr();
> >
>
> I think you also need to add something in to handle discontinuities in
> trace - for example it is possible to configure the ETM to only trace
> execution in specific code regions or to trace a few cycles every so
> often. In these cases, prev_packet->sample_type will not be zero, but
> whatever the previous packet was. You will get a CS_ETM_TRACE_ON packet in
> such cases, generated by an I_TRACE_ON element in the trace stream.
> You also get this on exception return.
>
> However, you should also keep the test for prev_packet->sample_type == 0
> as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
> wrapped.

Thanks for reviewing. Let's dig more detailed into this issue,
especially for handling packet CS_ETM_TRACE_ON, I'd like divide into two
sub cases.

- The first case is for using python script:

I use python script to analyze packets with below command:
./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux

What I observe is after we pass python script with parameter '-s
arm-cs-trace-disasm.py', then instruction tracing options
'--itrace=ril128' isn't really used; the perf tool creates another
new process for launch python script and re-enter cmd_script()
function, but at the second time when invoke cmd_script() for python
script execution the option '--itrace=ril128' is dropped and all
parameters are only valid defined by the python script.

As result, I can the variable 'etmq->etm->synth_opts.last_branch' is
always FALSE for running python script. So all CS_ETM_TRACE_ON
packets will be ignored in the function cs_etm__flush().

Even the CS_ETM_TRACE_ON packets are missed to handle, the program
flow still can work well. The reason is without the interference by
CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can smoothly create
instruction range by ignore the middle CS_ETM_TRACE_ON packet.

Please see below example, in this example there have 3 packets, the
first one packet is CS_ETM_RANGE packet which is labelled with
'PACKET_1', the first one packet can properly generate branch sample
data with previous packet as expected; the second packet is
PACKET_2 which is CS_ETM_TRACE_ON, but
'etmq->etm->synth_opts.last_branch' is false so function
cs_etm__flush() doesn't handle it and skip the swap operation
"etmq->prev_packet = tmp"; the third packet is PACKET_3, which is
CS_ETM_RANGE packet and we can see it's smoontly to create
continous instruction range between PACKET_1 and PACKET_3.

cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f79c end_addr=0xffff000008a5f7bc last_instr_taken_branch=1
PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
cs_etm__synth_branch_sample: ip=0xffff000008a5f7b8 addr=0xffff000008a5f858 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=1 flags=0 cpumode=2

cs_etm__flush: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
PACKET_2: cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=2 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=1

cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
PACKET_3: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=2 start_addr=0xffff000008be7528 end_addr=0xffff000008be7538 last_instr_taken_branch=1
cs_etm__synth_branch_sample: ip=0xffff000008a5f860 addr=0xffff000008be7528 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=2 flags=0 cpumode=2

So seems to me, the CS_ETM_TRACE_ON packet doesn't introduce trouble
for the program flow analysis if we can handle all CS_ETM_RANGE
packets and without handling CS_ETM_TRACE_ON packet for branch
samples.

- The second case is for --itrace option without python script:
./perf script --itrace=ril -F cpu,event,ip,addr,sym -k ./vmlinux

In this case, the flag 'etmq->etm->synth_opts.last_branch' is true
so CS_ETM_TRACE_ON packet will be handled; but I can observe the
CS_ETM_RANGE packet in etmq->prev_packet isn't handled in the
function cs_etm__flush() for branch sample, so actually we miss some
branch sample for this case.

So I think we also need handle CS_ETM_RANGE packet in function
cs_etm__flush() to generate branch samples. But this has side
effect, we introduce the extra track for CS_ETM_TRACE_ON packet for
branch samples, so we will see one branch range like:
[ 0xdeadbeefdeadbeef .. 0xdeadbeefdeadbeef ].

Please reivew below change is okay for you? Thanks a lot for
suggestions.

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..37d3722 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
{
/*
+ * The packet is the start tracing packet if the end_addr is zero,
+ * returns 0 for this case.
+ */
+ if (!packet->end_addr)
+ return 0;
+
+ /*
* The packet records the execution range with an exclusive end address
*
* A64 instructions are constant size, so the last executed
@@ -897,13 +904,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}

- if (etm->sample_branches &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
- etmq->prev_packet->last_instr_taken_branch) {
- ret = cs_etm__synth_branch_sample(etmq);
- if (ret)
- return ret;
+ if (etm->sample_branches && etmq->prev_packet) {
+ bool generate_sample = false;
+
+ /* Generate sample for start tracing packet */
+ if (etmq->prev_packet->sample_type == 0 ||
+ etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+ generate_sample = true;
+
+ /* Generate sample for exception packet */
+ if (etmq->prev_packet->exc == true)
+ generate_sample = true;
+
+ /* Generate sample for normal branch packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch)
+ generate_sample = true;
+
+ if (generate_sample) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
+ return ret;
+ }
}

if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -921,12 +943,17 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)

static int cs_etm__flush(struct cs_etm_queue *etmq)
{
+ struct cs_etm_auxtrace *etm = etmq->etm;
int err = 0;
struct cs_etm_packet *tmp;

- if (etmq->etm->synth_opts.last_branch &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ if (!etmq->prev_packet)
+ return 0;
+
+ if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
+ return 0;
+
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -939,18 +966,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
err = cs_etm__synth_instruction_sample(
etmq, addr,
etmq->period_instructions);
+ if (err)
+ return err;
etmq->period_instructions = 0;
+ }

- /*
- * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
- * the next incoming packet.
- */
- tmp = etmq->packet;
- etmq->packet = etmq->prev_packet;
- etmq->prev_packet = tmp;
+ if (etm->sample_branches) {
+ err = cs_etm__synth_branch_sample(etmq);
+ if (err)
+ return err;
}

- return err;
+ /*
+ * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+ * the next incoming packet.
+ */
+ tmp = etmq->packet;
+ etmq->packet = etmq->prev_packet;
+ etmq->prev_packet = tmp;
+ return 0;
}

static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
--
2.7.4

2018-05-22 09:54:33

by Leo Yan

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:

[...]

Rather than the patch I posted in my previous email, I think below new
patch is more reasonable for me.

In the below change, 'etmq->prev_packet' is only used to store the
previous CS_ETM_RANGE packet, we don't need to save CS_ETM_TRACE_ON
packet into 'etmq->prev_packet'.

On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
to indicate if need to generate instruction sample or not. If it's
non-zero, then generate instruction sample and
'etmq->period_instructions' will be cleared; so next time if there
have more tracing CS_ETM_TRACE_ON packet, it can skip to generate
instruction sample due 'etmq->period_instructions' is zero.

How about you think for this?

Thanks,
Leo Yan


diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..dd354ad 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
{
/*
+ * The packet is the start tracing packet if the end_addr is zero,
+ * returns 0 for this case.
+ */
+ if (!packet->end_addr)
+ return 0;
+
+ /*
* The packet records the execution range with an exclusive end address
*
* A64 instructions are constant size, so the last executed
@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}

- if (etm->sample_branches &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
- etmq->prev_packet->last_instr_taken_branch) {
- ret = cs_etm__synth_branch_sample(etmq);
- if (ret)
- return ret;
+ if (etm->sample_branches && etmq->prev_packet) {
+ bool generate_sample = false;
+
+ /* Generate sample for start tracing packet */
+ if (etmq->prev_packet->sample_type == 0)
+ generate_sample = true;
+
+ /* Generate sample for exception packet */
+ if (etmq->prev_packet->exc == true)
+ generate_sample = true;
+
+ /* Generate sample for normal branch packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch)
+ generate_sample = true;
+
+ if (generate_sample) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
+ return ret;
+ }
}

if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -922,11 +943,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
static int cs_etm__flush(struct cs_etm_queue *etmq)
{
int err = 0;
- struct cs_etm_packet *tmp;

if (etmq->etm->synth_opts.last_branch &&
etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->period_instructions) {
+
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -940,14 +962,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
etmq, addr,
etmq->period_instructions);
etmq->period_instructions = 0;
-
- /*
- * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
- * the next incoming packet.
- */
- tmp = etmq->packet;
- etmq->packet = etmq->prev_packet;
- etmq->prev_packet = tmp;
}

return err;
--
2.7.4

2018-05-23 11:21:53

by Robert Walker

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Leo,

On 22/05/18 10:52, Leo Yan wrote:
> On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:
>
> [...]
>
> Rather than the patch I posted in my previous email, I think below new
> patch is more reasonable for me.
>
> In the below change, 'etmq->prev_packet' is only used to store the
> previous CS_ETM_RANGE packet, we don't need to save CS_ETM_TRACE_ON
> packet into 'etmq->prev_packet'.
>
> On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
> to indicate if need to generate instruction sample or not. If it's
> non-zero, then generate instruction sample and
> 'etmq->period_instructions' will be cleared; so next time if there
> have more tracing CS_ETM_TRACE_ON packet, it can skip to generate
> instruction sample due 'etmq->period_instructions' is zero.
>
> How about you think for this?
>
> Thanks,
> Leo Yan
>

I don't think this covers the cases where CS_ETM_TRACE_ON is used to
indicate a discontinuity in the trace. For example, there is work in
progress to configure the ETM so that it only traces a few thousand
cycles with a gap of many thousands of cycles between each chunk of
trace - this can be used to sample program execution in the form of
instruction events with branch stacks for feedback directed optimization
(AutoFDO).

In this case, the raw trace is something like:

...
I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007E7B886908;
I_ATOM_F3 : Atom format 3.; EEN
I_ATOM_F1 : Atom format 1.; E
# Trace stops here

# Some time passes, and then trace is turned on again
I_TRACE_ON : Trace On.
I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.;
Addr=0x00000057224322F4; Ctxt: AArch64,EL0, NS;
I_ATOM_F3 : Atom format 3.; ENN
I_ATOM_F5 : Atom format 5.; ENENE
...

This results in the following packets from the decoder:

CS_ETM_RANGE: [0x7e7b886908-0x7e7b886930] br
CS_ETM_RANGE: [0x7e7b88699c-0x7e7b8869a4] br
CS_ETM_RANGE: [0x7e7b8869d8-0x7e7b8869f0]
CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
CS_ETM_TRACE_ON
CS_ETM_RANGE: [0x57224322f4-0x5722432304] br
CS_ETM_RANGE: [0x57224320e8-0x57224320ec]
CS_ETM_RANGE: [0x57224320ec-0x57224320f8]
CS_ETM_RANGE: [0x57224320f8-0x572243212c] br
CS_ETM_RANGE: [0x5722439b80-0x5722439bec]
CS_ETM_RANGE: [0x5722439bec-0x5722439c14] br
CS_ETM_RANGE: [0x5722437c30-0x5722437c6c]
CS_ETM_RANGE: [0x5722437c6c-0x5722437c7c] br

Without handling the CS_ETM_TRACE_ON, this would be interpreted as a
branch from 0x7e7b8869f8 to 0x57224322f4, when there is actually a gap
of many thousand instructions between these.

I think this patch will break the branch stacks - by removing the
prev_packet swap from cs_etm__flush(), the next time a CS_ETM_RANGE
packet is handled, cs_etm__sample() will see prev_packet contains the
last CS_ETM_RANGE from the previous block of trace, causing an erroneous
call to cs_etm__update_last_branch_rb(). In the example above, the
branch stack will contain an erroneous branch from 0x7e7b8869f8 to
0x57224322f4.

I think what you need to do is add a check for the previous packet being
a CS_ETM_TRACE_ON when determining the generate_sample value.

Regards

Rob

>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 822ba91..dd354ad 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> {
> /*
> + * The packet is the start tracing packet if the end_addr is zero,
> + * returns 0 for this case.
> + */
> + if (!packet->end_addr)
> + return 0;
> +
> + /*
> * The packet records the execution range with an exclusive end address
> *
> * A64 instructions are constant size, so the last executed
> @@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> etmq->period_instructions = instrs_over;
> }
>
> - if (etm->sample_branches &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> - etmq->prev_packet->last_instr_taken_branch) {
> - ret = cs_etm__synth_branch_sample(etmq);
> - if (ret)
> - return ret;
> + if (etm->sample_branches && etmq->prev_packet) {
> + bool generate_sample = false;
> +
> + /* Generate sample for start tracing packet */
> + if (etmq->prev_packet->sample_type == 0)
> + generate_sample = true;

Also check for etmq->prev_packet->sample_type == CS_ETM_TRACE_ON here
and set generate_sample = true.

> +
> + /* Generate sample for exception packet */
> + if (etmq->prev_packet->exc == true)
> + generate_sample = true;
> +
> + /* Generate sample for normal branch packet */
> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> + etmq->prev_packet->last_instr_taken_branch)
> + generate_sample = true;
> +
> + if (generate_sample) {
> + ret = cs_etm__synth_branch_sample(etmq);
> + if (ret)
> + return ret;
> + }
> }
>
> if (etm->sample_branches || etm->synth_opts.last_branch) {
> @@ -922,11 +943,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> static int cs_etm__flush(struct cs_etm_queue *etmq)
> {
> int err = 0;
> - struct cs_etm_packet *tmp;
>
> if (etmq->etm->synth_opts.last_branch &&
> etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> + etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> + etmq->period_instructions) {
> +

I don't think this is needed.

> /*
> * Generate a last branch event for the branches left in the
> * circular buffer at the end of the trace.
> @@ -940,14 +962,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> etmq, addr,
> etmq->period_instructions);
> etmq->period_instructions = 0;
> -
> - /*
> - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> - * the next incoming packet.
> - */
> - tmp = etmq->packet;
> - etmq->packet = etmq->prev_packet;
> - etmq->prev_packet = tmp;

This should not be changed as discussed above.

> }
>
> return err;
>

2018-05-23 13:23:55

by Leo Yan

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Rob,

On Wed, May 23, 2018 at 12:21:18PM +0100, Robert Walker wrote:
> Hi Leo,
>
> On 22/05/18 10:52, Leo Yan wrote:
> >On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:
> >
> >[...]
> >
> >Rather than the patch I posted in my previous email, I think below new
> >patch is more reasonable for me.
> >
> >In the below change, 'etmq->prev_packet' is only used to store the
> >previous CS_ETM_RANGE packet, we don't need to save CS_ETM_TRACE_ON
> >packet into 'etmq->prev_packet'.
> >
> >On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
> >to indicate if need to generate instruction sample or not. If it's
> >non-zero, then generate instruction sample and
> >'etmq->period_instructions' will be cleared; so next time if there
> >have more tracing CS_ETM_TRACE_ON packet, it can skip to generate
> >instruction sample due 'etmq->period_instructions' is zero.
> >
> >How about you think for this?
> >
> >Thanks,
> >Leo Yan
> >
>
> I don't think this covers the cases where CS_ETM_TRACE_ON is used to
> indicate a discontinuity in the trace. For example, there is work in
> progress to configure the ETM so that it only traces a few thousand cycles
> with a gap of many thousands of cycles between each chunk of trace - this
> can be used to sample program execution in the form of instruction events
> with branch stacks for feedback directed optimization (AutoFDO).
>
> In this case, the raw trace is something like:
>
> ...
> I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007E7B886908;
> I_ATOM_F3 : Atom format 3.; EEN
> I_ATOM_F1 : Atom format 1.; E
> # Trace stops here
>
> # Some time passes, and then trace is turned on again
> I_TRACE_ON : Trace On.
> I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.;
> Addr=0x00000057224322F4; Ctxt: AArch64,EL0, NS;
> I_ATOM_F3 : Atom format 3.; ENN
> I_ATOM_F5 : Atom format 5.; ENENE
> ...
>
> This results in the following packets from the decoder:
>
> CS_ETM_RANGE: [0x7e7b886908-0x7e7b886930] br
> CS_ETM_RANGE: [0x7e7b88699c-0x7e7b8869a4] br
> CS_ETM_RANGE: [0x7e7b8869d8-0x7e7b8869f0]
> CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
> CS_ETM_TRACE_ON
> CS_ETM_RANGE: [0x57224322f4-0x5722432304] br
> CS_ETM_RANGE: [0x57224320e8-0x57224320ec]
> CS_ETM_RANGE: [0x57224320ec-0x57224320f8]
> CS_ETM_RANGE: [0x57224320f8-0x572243212c] br
> CS_ETM_RANGE: [0x5722439b80-0x5722439bec]
> CS_ETM_RANGE: [0x5722439bec-0x5722439c14] br
> CS_ETM_RANGE: [0x5722437c30-0x5722437c6c]
> CS_ETM_RANGE: [0x5722437c6c-0x5722437c7c] br
>
> Without handling the CS_ETM_TRACE_ON, this would be interpreted as a branch
> from 0x7e7b8869f8 to 0x57224322f4, when there is actually a gap of many
> thousand instructions between these.
>
> I think this patch will break the branch stacks - by removing the
> prev_packet swap from cs_etm__flush(), the next time a CS_ETM_RANGE packet
> is handled, cs_etm__sample() will see prev_packet contains the last
> CS_ETM_RANGE from the previous block of trace, causing an erroneous call to
> cs_etm__update_last_branch_rb(). In the example above, the branch stack
> will contain an erroneous branch from 0x7e7b8869f8 to 0x57224322f4.
>
> I think what you need to do is add a check for the previous packet being a
> CS_ETM_TRACE_ON when determining the generate_sample value.

I still can see there have hole for packets handling with your
suggestion, let's focus on below three packets:

CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
CS_ETM_TRACE_ON: [0xdeadbeefdeadbeef-0xdeadbeefdeadbeef]
CS_ETM_RANGE: [0x57224322f4-0x5722432304] br

When the CS_ETM_TRACE_ON packet is coming, cs_etm__flush() doesn't
handle for 'etmq->prev_packet' to generate branch sample, this results
in we miss the info for 0x7e7b8869fc, and with packet swapping
'etmq->prev_packet' is assigned to CS_ETM_TRACE_ON packet.

When the last CS_ETM_RANGE packet is coming, cs_etm__sample() will
combine the values from CS_ETM_TRACE_ON packet and the last
CS_ETM_RANGE packet to generate branch sample packet; at the end
we get below sample packets:

packet(n): sample::addr=0x7e7b8869f0
packet(n+1): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4

So I think we also need to generate branch sample, and we can get
below results:

packet(n): sample::addr=0x7e7b8869f0
packet(n+1): sample::ip=0x7e7b8869f8 sample::addr=0xdeadbeefdeadbeef
packet(n+2): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4

So we also can rely on this to get to know there have one address
range is [0xdeadbeefdeadbeef..0xdeadbeefdeadbeeb] to indicate there
have a discontinuity in the trace.

> Regards
>
> Rob
>
> >
> >diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> >index 822ba91..dd354ad 100644
> >--- a/tools/perf/util/cs-etm.c
> >+++ b/tools/perf/util/cs-etm.c
> >@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> > static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> > {
> > /*
> >+ * The packet is the start tracing packet if the end_addr is zero,
> >+ * returns 0 for this case.
> >+ */
> >+ if (!packet->end_addr)
> >+ return 0;
> >+
> >+ /*
> > * The packet records the execution range with an exclusive end address
> > *
> > * A64 instructions are constant size, so the last executed
> >@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > etmq->period_instructions = instrs_over;
> > }
> >- if (etm->sample_branches &&
> >- etmq->prev_packet &&
> >- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> >- etmq->prev_packet->last_instr_taken_branch) {
> >- ret = cs_etm__synth_branch_sample(etmq);
> >- if (ret)
> >- return ret;
> >+ if (etm->sample_branches && etmq->prev_packet) {
> >+ bool generate_sample = false;
> >+
> >+ /* Generate sample for start tracing packet */
> >+ if (etmq->prev_packet->sample_type == 0)
> >+ generate_sample = true;
>
> Also check for etmq->prev_packet->sample_type == CS_ETM_TRACE_ON here and
> set generate_sample = true.

Agree, will add this.

> >+
> >+ /* Generate sample for exception packet */
> >+ if (etmq->prev_packet->exc == true)
> >+ generate_sample = true;
> >+
> >+ /* Generate sample for normal branch packet */
> >+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> >+ etmq->prev_packet->last_instr_taken_branch)
> >+ generate_sample = true;
> >+
> >+ if (generate_sample) {
> >+ ret = cs_etm__synth_branch_sample(etmq);
> >+ if (ret)
> >+ return ret;
> >+ }
> > }
> > if (etm->sample_branches || etm->synth_opts.last_branch) {
> >@@ -922,11 +943,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > static int cs_etm__flush(struct cs_etm_queue *etmq)
> > {
> > int err = 0;
> >- struct cs_etm_packet *tmp;
> > if (etmq->etm->synth_opts.last_branch &&
> > etmq->prev_packet &&
> >- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> >+ etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> >+ etmq->period_instructions) {
> >+
>
> I don't think this is needed.

Okay, I will keep this.

> > /*
> > * Generate a last branch event for the branches left in the
> > * circular buffer at the end of the trace.
> >@@ -940,14 +962,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> > etmq, addr,
> > etmq->period_instructions);
> > etmq->period_instructions = 0;
> >-
> >- /*
> >- * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> >- * the next incoming packet.
> >- */
> >- tmp = etmq->packet;
> >- etmq->packet = etmq->prev_packet;
> >- etmq->prev_packet = tmp;
>
> This should not be changed as discussed above.

Okay, will keep this. But I suggest we add some change like below:

+ if (etm->sample_branches) {
+ err = cs_etm__synth_branch_sample(etmq);
+ if (err)
+ return err;
+ }

If so, could you review my posted another patch for this?
http://archive.armlinux.org.uk/lurker/message/20180522.083920.184f1f78.en.html

Thanks,
Leo Yan

> > }
> > return err;
> >

2018-05-25 13:58:08

by Robert Walker

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Leo,


On 23/05/18 14:22, Leo Yan wrote:
> Hi Rob,
>
> On Wed, May 23, 2018 at 12:21:18PM +0100, Robert Walker wrote:
>> Hi Leo,
>>
>> On 22/05/18 10:52, Leo Yan wrote:
>>> On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:
>>>
>>> [...]
>>>
>>> Rather than the patch I posted in my previous email, I think below new
>>> patch is more reasonable for me.
>>>
>>> In the below change, 'etmq->prev_packet' is only used to store the
>>> previous CS_ETM_RANGE packet, we don't need to save CS_ETM_TRACE_ON
>>> packet into 'etmq->prev_packet'.
>>>
>>> On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
>>> to indicate if need to generate instruction sample or not. If it's
>>> non-zero, then generate instruction sample and
>>> 'etmq->period_instructions' will be cleared; so next time if there
>>> have more tracing CS_ETM_TRACE_ON packet, it can skip to generate
>>> instruction sample due 'etmq->period_instructions' is zero.
>>>
>>> How about you think for this?
>>>
>>> Thanks,
>>> Leo Yan
>>>
>> I don't think this covers the cases where CS_ETM_TRACE_ON is used to
>> indicate a discontinuity in the trace. For example, there is work in
>> progress to configure the ETM so that it only traces a few thousand cycles
>> with a gap of many thousands of cycles between each chunk of trace - this
>> can be used to sample program execution in the form of instruction events
>> with branch stacks for feedback directed optimization (AutoFDO).
>>
>> In this case, the raw trace is something like:
>>
>> ...
>> I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007E7B886908;
>> I_ATOM_F3 : Atom format 3.; EEN
>> I_ATOM_F1 : Atom format 1.; E
>> # Trace stops here
>>
>> # Some time passes, and then trace is turned on again
>> I_TRACE_ON : Trace On.
>> I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.;
>> Addr=0x00000057224322F4; Ctxt: AArch64,EL0, NS;
>> I_ATOM_F3 : Atom format 3.; ENN
>> I_ATOM_F5 : Atom format 5.; ENENE
>> ...
>>
>> This results in the following packets from the decoder:
>>
>> CS_ETM_RANGE: [0x7e7b886908-0x7e7b886930] br
>> CS_ETM_RANGE: [0x7e7b88699c-0x7e7b8869a4] br
>> CS_ETM_RANGE: [0x7e7b8869d8-0x7e7b8869f0]
>> CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
>> CS_ETM_TRACE_ON
>> CS_ETM_RANGE: [0x57224322f4-0x5722432304] br
>> CS_ETM_RANGE: [0x57224320e8-0x57224320ec]
>> CS_ETM_RANGE: [0x57224320ec-0x57224320f8]
>> CS_ETM_RANGE: [0x57224320f8-0x572243212c] br
>> CS_ETM_RANGE: [0x5722439b80-0x5722439bec]
>> CS_ETM_RANGE: [0x5722439bec-0x5722439c14] br
>> CS_ETM_RANGE: [0x5722437c30-0x5722437c6c]
>> CS_ETM_RANGE: [0x5722437c6c-0x5722437c7c] br
>>
>> Without handling the CS_ETM_TRACE_ON, this would be interpreted as a branch
>> from 0x7e7b8869f8 to 0x57224322f4, when there is actually a gap of many
>> thousand instructions between these.
>>
>> I think this patch will break the branch stacks - by removing the
>> prev_packet swap from cs_etm__flush(), the next time a CS_ETM_RANGE packet
>> is handled, cs_etm__sample() will see prev_packet contains the last
>> CS_ETM_RANGE from the previous block of trace, causing an erroneous call to
>> cs_etm__update_last_branch_rb(). In the example above, the branch stack
>> will contain an erroneous branch from 0x7e7b8869f8 to 0x57224322f4.
>>
>> I think what you need to do is add a check for the previous packet being a
>> CS_ETM_TRACE_ON when determining the generate_sample value.
> I still can see there have hole for packets handling with your
> suggestion, let's focus on below three packets:
>
> CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
> CS_ETM_TRACE_ON: [0xdeadbeefdeadbeef-0xdeadbeefdeadbeef]
> CS_ETM_RANGE: [0x57224322f4-0x5722432304] br
>
> When the CS_ETM_TRACE_ON packet is coming, cs_etm__flush() doesn't
> handle for 'etmq->prev_packet' to generate branch sample, this results
> in we miss the info for 0x7e7b8869fc, and with packet swapping
> 'etmq->prev_packet' is assigned to CS_ETM_TRACE_ON packet.
>
> When the last CS_ETM_RANGE packet is coming, cs_etm__sample() will
> combine the values from CS_ETM_TRACE_ON packet and the last
> CS_ETM_RANGE packet to generate branch sample packet; at the end
> we get below sample packets:
>
> packet(n): sample::addr=0x7e7b8869f0
> packet(n+1): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4
>
> So I think we also need to generate branch sample, and we can get
> below results:
>
> packet(n): sample::addr=0x7e7b8869f0
> packet(n+1): sample::ip=0x7e7b8869f8 sample::addr=0xdeadbeefdeadbeef
> packet(n+2): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4
>
> So we also can rely on this to get to know there have one address
> range is [0xdeadbeefdeadbeef..0xdeadbeefdeadbeeb] to indicate there
> have a discontinuity in the trace.
Yes, I agree you need the extra branch sample from cs_etm__flush().

With a discontinuity in trace, I get output from perf script like this:

branches:u:        59ee6e2e08 sqlite3VdbeExec (speedtest1) =>      
59ee6e2e64 sqlite3VdbeExec (spe
branches:u:        59ee6e2e7c sqlite3VdbeExec (speedtest1) =>      
59ee6e2eec sqlite3VdbeExec (spe
branches:u:        59ee6e2efc sqlite3VdbeExec (speedtest1) =>      
59ee6e2f14 sqlite3VdbeExec (spe
branches:u:        59ee6e2f3c sqlite3VdbeExec (speedtest1) =>
deadbeefdeadbeef [unknown] ([unknown])
branches:u:  deadbeefdeadbeeb [unknown] ([unknown]) => 769949daa0 memcpy
(/system/lib64/libc.so)
branches:u:        769949dacc memcpy (/system/lib64/libc.so) =>      
59ee6f0664 insertCell (speedtest1)
branches:u:        59ee6f0664 insertCell (speedtest1) => 59ee6f0684
insertCell (speedtest1)
branches:u:        59ee6f06a4 insertCell (speedtest1) => 59ee6a4d50
memmove@plt (speedtest1)
branches:u:        59ee6a4d5c memmove@plt (speedtest1) => 769949ebf8
memmove (/system/lib64/libc.so)

Showing there is a break in trace between 59ee6e2f3c and 769949daa0. 
The deadbeefdeadbeef addresses are a bit ugly - these are just dummy
values emitted in the decoder layer - maybe these should be changed to
0.  Or you could add a new sample type (i.e. not branch) to indicate the
start / end of trace, with only the valid address.

With this change, it becomes the same as the patch from your previous mail.

Regards

Rob

>> Regards
>>
>> Rob
>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 822ba91..dd354ad 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>>> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>>> {
>>> /*
>>> + * The packet is the start tracing packet if the end_addr is zero,
>>> + * returns 0 for this case.
>>> + */
>>> + if (!packet->end_addr)
>>> + return 0;
>>> +
>>> + /*
>>> * The packet records the execution range with an exclusive end address
>>> *
>>> * A64 instructions are constant size, so the last executed
>>> @@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>>> etmq->period_instructions = instrs_over;
>>> }
>>> - if (etm->sample_branches &&
>>> - etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> - etmq->prev_packet->last_instr_taken_branch) {
>>> - ret = cs_etm__synth_branch_sample(etmq);
>>> - if (ret)
>>> - return ret;
>>> + if (etm->sample_branches && etmq->prev_packet) {
>>> + bool generate_sample = false;
>>> +
>>> + /* Generate sample for start tracing packet */
>>> + if (etmq->prev_packet->sample_type == 0)
>>> + generate_sample = true;
>> Also check for etmq->prev_packet->sample_type == CS_ETM_TRACE_ON here and
>> set generate_sample = true.
> Agree, will add this.
>
>>> +
>>> + /* Generate sample for exception packet */
>>> + if (etmq->prev_packet->exc == true)
>>> + generate_sample = true;
>>> +
>>> + /* Generate sample for normal branch packet */
>>> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> + etmq->prev_packet->last_instr_taken_branch)
>>> + generate_sample = true;
>>> +
>>> + if (generate_sample) {
>>> + ret = cs_etm__synth_branch_sample(etmq);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> }
>>> if (etm->sample_branches || etm->synth_opts.last_branch) {
>>> @@ -922,11 +943,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>>> static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> {
>>> int err = 0;
>>> - struct cs_etm_packet *tmp;
>>> if (etmq->etm->synth_opts.last_branch &&
>>> etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>>> + etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> + etmq->period_instructions) {
>>> +
>> I don't think this is needed.
> Okay, I will keep this.
>
>>> /*
>>> * Generate a last branch event for the branches left in the
>>> * circular buffer at the end of the trace.
>>> @@ -940,14 +962,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> etmq, addr,
>>> etmq->period_instructions);
>>> etmq->period_instructions = 0;
>>> -
>>> - /*
>>> - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>>> - * the next incoming packet.
>>> - */
>>> - tmp = etmq->packet;
>>> - etmq->packet = etmq->prev_packet;
>>> - etmq->prev_packet = tmp;
>> This should not be changed as discussed above.
> Okay, will keep this. But I suggest we add some change like below:
>
> + if (etm->sample_branches) {
> + err = cs_etm__synth_branch_sample(etmq);
> + if (err)
> + return err;
> + }
>
> If so, could you review my posted another patch for this?
> http://archive.armlinux.org.uk/lurker/message/20180522.083920.184f1f78.en.html
WIll do - with these changes, it is the same as your original patch.
> Thanks,
> Leo Yan
>
>>> }
>>> return err;
>>>


2018-05-25 14:05:48

by Robert Walker

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Leo,

Following the discussions from your reply to this with a simplified
patch, this version of the patch works better as you also need to emit a
branch sample when handling a CS_ETM_TRACE_ON packet to indicate the end
of a block of trace.

This patch does not break the output from perf inject to generate
instruction samples for AutoFDO, so I am happy with that.

Regards

Rob

Reviewed-by: Robert Walker <[email protected]>


On 22/05/18 09:39, Leo Yan wrote:
> Hi Rob,
>
> On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:
>> Hi Leo,
>>
>> On 21/05/18 09:52, Leo Yan wrote:
>>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces") reworks the samples generation flow from CoreSight trace to
>>> match the correct format so Perf report tool can display the samples
>>> properly. But the change has side effect for packet handling, it only
>>> generate samples when 'prev_packet->last_instr_taken_branch' is true,
>>> this results in the start tracing packet and exception packets are
>>> dropped.
>>>
>>> This patch checks extra two conditions for complete samples:
>>>
>>> - If 'prev_packet->sample_type' is zero we can use this condition to
>>> get to know this is the start tracing packet; for this case, the start
>>> packet's end_addr is zero as well so we need to handle it in the
>>> function cs_etm__last_executed_instr();
>>>
>> I think you also need to add something in to handle discontinuities in
>> trace - for example it is possible to configure the ETM to only trace
>> execution in specific code regions or to trace a few cycles every so
>> often. In these cases, prev_packet->sample_type will not be zero, but
>> whatever the previous packet was. You will get a CS_ETM_TRACE_ON packet in
>> such cases, generated by an I_TRACE_ON element in the trace stream.
>> You also get this on exception return.
>>
>> However, you should also keep the test for prev_packet->sample_type == 0
>> as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
>> wrapped.
> Thanks for reviewing. Let's dig more detailed into this issue,
> especially for handling packet CS_ETM_TRACE_ON, I'd like divide into two
> sub cases.
>
> - The first case is for using python script:
>
> I use python script to analyze packets with below command:
> ./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux
>
> What I observe is after we pass python script with parameter '-s
> arm-cs-trace-disasm.py', then instruction tracing options
> '--itrace=ril128' isn't really used; the perf tool creates another
> new process for launch python script and re-enter cmd_script()
> function, but at the second time when invoke cmd_script() for python
> script execution the option '--itrace=ril128' is dropped and all
> parameters are only valid defined by the python script.
>
> As result, I can the variable 'etmq->etm->synth_opts.last_branch' is
> always FALSE for running python script. So all CS_ETM_TRACE_ON
> packets will be ignored in the function cs_etm__flush().
>
> Even the CS_ETM_TRACE_ON packets are missed to handle, the program
> flow still can work well. The reason is without the interference by
> CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can smoothly create
> instruction range by ignore the middle CS_ETM_TRACE_ON packet.
>
> Please see below example, in this example there have 3 packets, the
> first one packet is CS_ETM_RANGE packet which is labelled with
> 'PACKET_1', the first one packet can properly generate branch sample
> data with previous packet as expected; the second packet is
> PACKET_2 which is CS_ETM_TRACE_ON, but
> 'etmq->etm->synth_opts.last_branch' is false so function
> cs_etm__flush() doesn't handle it and skip the swap operation
> "etmq->prev_packet = tmp"; the third packet is PACKET_3, which is
> CS_ETM_RANGE packet and we can see it's smoontly to create
> continous instruction range between PACKET_1 and PACKET_3.
>
> cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f79c end_addr=0xffff000008a5f7bc last_instr_taken_branch=1
> PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> cs_etm__synth_branch_sample: ip=0xffff000008a5f7b8 addr=0xffff000008a5f858 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=1 flags=0 cpumode=2
>
> cs_etm__flush: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> PACKET_2: cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=2 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=1
>
> cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> PACKET_3: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=2 start_addr=0xffff000008be7528 end_addr=0xffff000008be7538 last_instr_taken_branch=1
> cs_etm__synth_branch_sample: ip=0xffff000008a5f860 addr=0xffff000008be7528 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=2 flags=0 cpumode=2
>
> So seems to me, the CS_ETM_TRACE_ON packet doesn't introduce trouble
> for the program flow analysis if we can handle all CS_ETM_RANGE
> packets and without handling CS_ETM_TRACE_ON packet for branch
> samples.
>
> - The second case is for --itrace option without python script:
> ./perf script --itrace=ril -F cpu,event,ip,addr,sym -k ./vmlinux
>
> In this case, the flag 'etmq->etm->synth_opts.last_branch' is true
> so CS_ETM_TRACE_ON packet will be handled; but I can observe the
> CS_ETM_RANGE packet in etmq->prev_packet isn't handled in the
> function cs_etm__flush() for branch sample, so actually we miss some
> branch sample for this case.
>
> So I think we also need handle CS_ETM_RANGE packet in function
> cs_etm__flush() to generate branch samples. But this has side
> effect, we introduce the extra track for CS_ETM_TRACE_ON packet for
> branch samples, so we will see one branch range like:
> [ 0xdeadbeefdeadbeef .. 0xdeadbeefdeadbeef ].
>
> Please reivew below change is okay for you? Thanks a lot for
> suggestions.
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 822ba91..37d3722 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> {
> /*
> + * The packet is the start tracing packet if the end_addr is zero,
> + * returns 0 for this case.
> + */
> + if (!packet->end_addr)
> + return 0;
> +
> + /*
> * The packet records the execution range with an exclusive end address
> *
> * A64 instructions are constant size, so the last executed
> @@ -897,13 +904,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> etmq->period_instructions = instrs_over;
> }
>
> - if (etm->sample_branches &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> - etmq->prev_packet->last_instr_taken_branch) {
> - ret = cs_etm__synth_branch_sample(etmq);
> - if (ret)
> - return ret;
> + if (etm->sample_branches && etmq->prev_packet) {
> + bool generate_sample = false;
> +
> + /* Generate sample for start tracing packet */
> + if (etmq->prev_packet->sample_type == 0 ||
> + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> + generate_sample = true;
> +
> + /* Generate sample for exception packet */
> + if (etmq->prev_packet->exc == true)
> + generate_sample = true;
> +
> + /* Generate sample for normal branch packet */
> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> + etmq->prev_packet->last_instr_taken_branch)
> + generate_sample = true;
> +
> + if (generate_sample) {
> + ret = cs_etm__synth_branch_sample(etmq);
> + if (ret)
> + return ret;
> + }
> }
>
> if (etm->sample_branches || etm->synth_opts.last_branch) {
> @@ -921,12 +943,17 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>
> static int cs_etm__flush(struct cs_etm_queue *etmq)
> {
> + struct cs_etm_auxtrace *etm = etmq->etm;
> int err = 0;
> struct cs_etm_packet *tmp;
>
> - if (etmq->etm->synth_opts.last_branch &&
> - etmq->prev_packet &&
> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> + if (!etmq->prev_packet)
> + return 0;
> +
> + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> + return 0;
> +
> + if (etmq->etm->synth_opts.last_branch) {
> /*
> * Generate a last branch event for the branches left in the
> * circular buffer at the end of the trace.
> @@ -939,18 +966,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> err = cs_etm__synth_instruction_sample(
> etmq, addr,
> etmq->period_instructions);
> + if (err)
> + return err;
> etmq->period_instructions = 0;
> + }
>
> - /*
> - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> - * the next incoming packet.
> - */
> - tmp = etmq->packet;
> - etmq->packet = etmq->prev_packet;
> - etmq->prev_packet = tmp;
> + if (etm->sample_branches) {
> + err = cs_etm__synth_branch_sample(etmq);
> + if (err)
> + return err;
> }
>
> - return err;
> + /*
> + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> + * the next incoming packet.
> + */
> + tmp = etmq->packet;
> + etmq->packet = etmq->prev_packet;
> + etmq->prev_packet = tmp;
> + return 0;
> }
>
> static int cs_etm__run_decoder(struct cs_etm_queue *etmq)


2018-05-25 15:29:27

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Em Fri, May 25, 2018 at 03:03:47PM +0100, Robert Walker escreveu:
> Hi Leo,
>
> Following the discussions from your reply to this with a simplified patch,
> this version of the patch works better as you also need to emit a branch
> sample when handling a CS_ETM_TRACE_ON packet to indicate the end of a block
> of trace.
>
> This patch does not break the output from perf inject to generate
> instruction samples for AutoFDO, so I am happy with that.
>
> Regards
>
> Rob
>
> Reviewed-by: Robert Walker <[email protected]>

So, Leo, can you please resubmit, bumping the v2 to v3 (or the latest
one, I haven't fully reread this thread) add this "Reviewed-by: Robert"
tag and any other that people may have provided, so that I can merge it?

Thanks,

- Arnaldo

>
> On 22/05/18 09:39, Leo Yan wrote:
> > Hi Rob,
> >
> > On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:
> > > Hi Leo,
> > >
> > > On 21/05/18 09:52, Leo Yan wrote:
> > > > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> > > > traces") reworks the samples generation flow from CoreSight trace to
> > > > match the correct format so Perf report tool can display the samples
> > > > properly. But the change has side effect for packet handling, it only
> > > > generate samples when 'prev_packet->last_instr_taken_branch' is true,
> > > > this results in the start tracing packet and exception packets are
> > > > dropped.
> > > >
> > > > This patch checks extra two conditions for complete samples:
> > > >
> > > > - If 'prev_packet->sample_type' is zero we can use this condition to
> > > > get to know this is the start tracing packet; for this case, the start
> > > > packet's end_addr is zero as well so we need to handle it in the
> > > > function cs_etm__last_executed_instr();
> > > >
> > > I think you also need to add something in to handle discontinuities in
> > > trace - for example it is possible to configure the ETM to only trace
> > > execution in specific code regions or to trace a few cycles every so
> > > often. In these cases, prev_packet->sample_type will not be zero, but
> > > whatever the previous packet was. You will get a CS_ETM_TRACE_ON packet in
> > > such cases, generated by an I_TRACE_ON element in the trace stream.
> > > You also get this on exception return.
> > >
> > > However, you should also keep the test for prev_packet->sample_type == 0
> > > as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
> > > wrapped.
> > Thanks for reviewing. Let's dig more detailed into this issue,
> > especially for handling packet CS_ETM_TRACE_ON, I'd like divide into two
> > sub cases.
> >
> > - The first case is for using python script:
> >
> > I use python script to analyze packets with below command:
> > ./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux
> >
> > What I observe is after we pass python script with parameter '-s
> > arm-cs-trace-disasm.py', then instruction tracing options
> > '--itrace=ril128' isn't really used; the perf tool creates another
> > new process for launch python script and re-enter cmd_script()
> > function, but at the second time when invoke cmd_script() for python
> > script execution the option '--itrace=ril128' is dropped and all
> > parameters are only valid defined by the python script.
> >
> > As result, I can the variable 'etmq->etm->synth_opts.last_branch' is
> > always FALSE for running python script. So all CS_ETM_TRACE_ON
> > packets will be ignored in the function cs_etm__flush().
> >
> > Even the CS_ETM_TRACE_ON packets are missed to handle, the program
> > flow still can work well. The reason is without the interference by
> > CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can smoothly create
> > instruction range by ignore the middle CS_ETM_TRACE_ON packet.
> >
> > Please see below example, in this example there have 3 packets, the
> > first one packet is CS_ETM_RANGE packet which is labelled with
> > 'PACKET_1', the first one packet can properly generate branch sample
> > data with previous packet as expected; the second packet is
> > PACKET_2 which is CS_ETM_TRACE_ON, but
> > 'etmq->etm->synth_opts.last_branch' is false so function
> > cs_etm__flush() doesn't handle it and skip the swap operation
> > "etmq->prev_packet = tmp"; the third packet is PACKET_3, which is
> > CS_ETM_RANGE packet and we can see it's smoontly to create
> > continous instruction range between PACKET_1 and PACKET_3.
> >
> > cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f79c end_addr=0xffff000008a5f7bc last_instr_taken_branch=1
> > PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> > cs_etm__synth_branch_sample: ip=0xffff000008a5f7b8 addr=0xffff000008a5f858 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=1 flags=0 cpumode=2
> >
> > cs_etm__flush: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> > PACKET_2: cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=2 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=1
> >
> > cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
> > PACKET_3: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=2 start_addr=0xffff000008be7528 end_addr=0xffff000008be7538 last_instr_taken_branch=1
> > cs_etm__synth_branch_sample: ip=0xffff000008a5f860 addr=0xffff000008be7528 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=2 flags=0 cpumode=2
> >
> > So seems to me, the CS_ETM_TRACE_ON packet doesn't introduce trouble
> > for the program flow analysis if we can handle all CS_ETM_RANGE
> > packets and without handling CS_ETM_TRACE_ON packet for branch
> > samples.
> >
> > - The second case is for --itrace option without python script:
> > ./perf script --itrace=ril -F cpu,event,ip,addr,sym -k ./vmlinux
> >
> > In this case, the flag 'etmq->etm->synth_opts.last_branch' is true
> > so CS_ETM_TRACE_ON packet will be handled; but I can observe the
> > CS_ETM_RANGE packet in etmq->prev_packet isn't handled in the
> > function cs_etm__flush() for branch sample, so actually we miss some
> > branch sample for this case.
> >
> > So I think we also need handle CS_ETM_RANGE packet in function
> > cs_etm__flush() to generate branch samples. But this has side
> > effect, we introduce the extra track for CS_ETM_TRACE_ON packet for
> > branch samples, so we will see one branch range like:
> > [ 0xdeadbeefdeadbeef .. 0xdeadbeefdeadbeef ].
> >
> > Please reivew below change is okay for you? Thanks a lot for
> > suggestions.
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 822ba91..37d3722 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> > static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> > {
> > /*
> > + * The packet is the start tracing packet if the end_addr is zero,
> > + * returns 0 for this case.
> > + */
> > + if (!packet->end_addr)
> > + return 0;
> > +
> > + /*
> > * The packet records the execution range with an exclusive end address
> > *
> > * A64 instructions are constant size, so the last executed
> > @@ -897,13 +904,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > etmq->period_instructions = instrs_over;
> > }
> > - if (etm->sample_branches &&
> > - etmq->prev_packet &&
> > - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > - etmq->prev_packet->last_instr_taken_branch) {
> > - ret = cs_etm__synth_branch_sample(etmq);
> > - if (ret)
> > - return ret;
> > + if (etm->sample_branches && etmq->prev_packet) {
> > + bool generate_sample = false;
> > +
> > + /* Generate sample for start tracing packet */
> > + if (etmq->prev_packet->sample_type == 0 ||
> > + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> > + generate_sample = true;
> > +
> > + /* Generate sample for exception packet */
> > + if (etmq->prev_packet->exc == true)
> > + generate_sample = true;
> > +
> > + /* Generate sample for normal branch packet */
> > + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > + etmq->prev_packet->last_instr_taken_branch)
> > + generate_sample = true;
> > +
> > + if (generate_sample) {
> > + ret = cs_etm__synth_branch_sample(etmq);
> > + if (ret)
> > + return ret;
> > + }
> > }
> > if (etm->sample_branches || etm->synth_opts.last_branch) {
> > @@ -921,12 +943,17 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> > static int cs_etm__flush(struct cs_etm_queue *etmq)
> > {
> > + struct cs_etm_auxtrace *etm = etmq->etm;
> > int err = 0;
> > struct cs_etm_packet *tmp;
> > - if (etmq->etm->synth_opts.last_branch &&
> > - etmq->prev_packet &&
> > - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> > + if (!etmq->prev_packet)
> > + return 0;
> > +
> > + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> > + return 0;
> > +
> > + if (etmq->etm->synth_opts.last_branch) {
> > /*
> > * Generate a last branch event for the branches left in the
> > * circular buffer at the end of the trace.
> > @@ -939,18 +966,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> > err = cs_etm__synth_instruction_sample(
> > etmq, addr,
> > etmq->period_instructions);
> > + if (err)
> > + return err;
> > etmq->period_instructions = 0;
> > + }
> > - /*
> > - * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > - * the next incoming packet.
> > - */
> > - tmp = etmq->packet;
> > - etmq->packet = etmq->prev_packet;
> > - etmq->prev_packet = tmp;
> > + if (etm->sample_branches) {
> > + err = cs_etm__synth_branch_sample(etmq);
> > + if (err)
> > + return err;
> > }
> > - return err;
> > + /*
> > + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > + * the next incoming packet.
> > + */
> > + tmp = etmq->packet;
> > + etmq->packet = etmq->prev_packet;
> > + etmq->prev_packet = tmp;
> > + return 0;
> > }
> > static int cs_etm__run_decoder(struct cs_etm_queue *etmq)

2018-05-25 15:56:23

by Leo Yan

[permalink] [raw]
Subject: Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

Hi Arnaldo, Rob,

On Fri, May 25, 2018 at 12:27:13PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 25, 2018 at 03:03:47PM +0100, Robert Walker escreveu:
> > Hi Leo,
> >
> > Following the discussions from your reply to this with a simplified patch,
> > this version of the patch works better as you also need to emit a branch
> > sample when handling a CS_ETM_TRACE_ON packet to indicate the end of a block
> > of trace.

I also will follow the suggestion as Rob mentioned in another email:
"The deadbeefdeadbeef addresses are a bit ugly - these are just dummy
values emitted in the decoder layer - maybe these should be changed
to 0."

> > This patch does not break the output from perf inject to generate
> > instruction samples for AutoFDO, so I am happy with that.

Thanks for confirmation.

> > Regards
> >
> > Rob
> >
> > Reviewed-by: Robert Walker <[email protected]>
>
> So, Leo, can you please resubmit, bumping the v2 to v3 (or the latest
> one, I haven't fully reread this thread) add this "Reviewed-by: Robert"
> tag and any other that people may have provided, so that I can merge it?

Sure! I will respin the v3 patch series by following up Rob's
suggestion and add Rob's review tag.

BTW, I'd like to get ack from Mathieu as well. Mathieu is working on
CPU wide tracing, so I talked with Mathieu he will review the patch
series if has conflict with CPU wide tracing.

[...]

Thanks,
Leo Yan