2023-03-06 21:13:49

by Yang Shi

[permalink] [raw]
Subject: [BUG] perf: No samples found when using kcore + coresight

Hi,

I'm seeking some help regarding perf record --kcore + coresight. When
I tracing with perf record --kcore -e cs_etm ... , perf report shows
"The perf.data/data data has no samples!".

I tried other combinations:
1. perf record --kcore (w/o using coresight): works
2. perf record -e cs_etm ... : works
3. Manually copy kcore with perf buildid-cache, then run perf record
-e cs_etm... : works

So just "perf record --kcore -e cs_etm ..." doesn't work.

I'm using v6.2 kernel and the perf is built from the same kernel
source with OpenCSD 1.4. Also attached the sample file generated by
perf.

Any suggestion is appreciated.

Thanks,
Yang


Attachments:
perf.data.broken (164.07 kB)

2023-03-07 10:33:41

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 06/03/2023 21:13, Yang Shi wrote:
> Hi,
>
> I'm seeking some help regarding perf record --kcore + coresight. When
> I tracing with perf record --kcore -e cs_etm ... , perf report shows
> "The perf.data/data data has no samples!".
>
> I tried other combinations:
> 1. perf record --kcore (w/o using coresight): works
> 2. perf record -e cs_etm ... : works
> 3. Manually copy kcore with perf buildid-cache, then run perf record
> -e cs_etm... : works
>
> So just "perf record --kcore -e cs_etm ..." doesn't work.
>
> I'm using v6.2 kernel and the perf is built from the same kernel
> source with OpenCSD 1.4. Also attached the sample file generated by
> perf.
>
> Any suggestion is appreciated.
>
> Thanks,
> Yang

Hi Yang,

I don't see any issue with this command and I still get samples:

perf record -e cs_etm// --kcore -- true

You might want to try running both record and report in verbose and
stdio mode (-vvv --stdio) to see if you see any warnings.

I can't think of any way --kcore would cause an issue because all it
does is save kcore into the .debug cache rather than affecting the
actual perf.data file.

Are you running perf report in the same place the recording was made? Or
on another system?

James

2023-03-07 19:54:11

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

Hi James,

Thanks for the prompt response. Please see the inline replies.

On Tue, Mar 7, 2023 at 2:32 AM James Clark <[email protected]> wrote:
>
>
>
> On 06/03/2023 21:13, Yang Shi wrote:
> > Hi,
> >
> > I'm seeking some help regarding perf record --kcore + coresight. When
> > I tracing with perf record --kcore -e cs_etm ... , perf report shows
> > "The perf.data/data data has no samples!".
> >
> > I tried other combinations:
> > 1. perf record --kcore (w/o using coresight): works
> > 2. perf record -e cs_etm ... : works
> > 3. Manually copy kcore with perf buildid-cache, then run perf record
> > -e cs_etm... : works
> >
> > So just "perf record --kcore -e cs_etm ..." doesn't work.
> >
> > I'm using v6.2 kernel and the perf is built from the same kernel
> > source with OpenCSD 1.4. Also attached the sample file generated by
> > perf.
> >
> > Any suggestion is appreciated.
> >
> > Thanks,
> > Yang
>
> Hi Yang,
>
> I don't see any issue with this command and I still get samples:
>
> perf record -e cs_etm// --kcore -- true

This command works for me. But the below command doesn't work:

perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread -- taskset
--cpu-list 1 uname

>
> You might want to try running both record and report in verbose and
> stdio mode (-vvv --stdio) to see if you see any warnings.
>
> I can't think of any way --kcore would cause an issue because all it
> does is save kcore into the .debug cache rather than affecting the
> actual perf.data file.

Yes, this is what I thought. But digging into the code seems it may
add a new dummy event. Please see the below data header dump.

>
> Are you running perf report in the same place the recording was made? Or
> on another system?

Yes.

I did some further investigation. The below is the dump for both good
and bad data files.

Good (copy kcore manually):
# captured on : Tue Mar 7 11:19:14 2023
# header version : 1
# data offset : 1576
# data size : 26736
# feat offset : 28312
# hostname : fedora
# os release : 6.2.0-coresight+
# perf version : 6.2.g3424af22c4fc
# arch : aarch64
# nrcpus online : 128
# nrcpus avail : 128
# cpuid : 0x00000000c00fac30
# total memory : 2108862504 kB
# cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
--per-thread -- taskset --cpu-list 1 true
# event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
= 128, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID|LO
ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
# event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
config = 0x9, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID|
LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
= 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
1, mmap2 = 1, comm_exec
= 1, context_switch = 1, bpf_event = 1
# event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
2267, 2268, 2269, 2270, 2271
, 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
2294, 2295, 2296, 2297, 2298, 2
299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
2322, 2323, 2324, 2325, 2326
, 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
2349, 2350, 2351, 2352, 2353, 2
354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
}, type = 1, size = 128
, config = 0x9, { sample_period, sample_freq } = 1, sample_type =
IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
exclude_hv = 1, sample_id_all =
1, exclude_guest = 1, ksymbol = 1, text_poke = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
11, kprobe = 6
# contains AUX area data (e.g. instruction trace)
# CACHE info available, use -I to display
# time of first sample : 18446744073.709551
# time of last sample : 18446744073.709551
# sample duration : 0.000 ms
# MEM_TOPOLOGY info available, use -I to display
# armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
bus_slots=0x00000000, bus_width=0x00000000
# missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY

Bad:
# captured on : Tue Mar 7 11:19:14 2023
# header version : 1
# data offset : 1576
# data size : 26736
# feat offset : 28312
# hostname : fedora
# os release : 6.2.0-coresight+
# perf version : 6.2.g3424af22c4fc
# arch : aarch64
# nrcpus online : 128
# nrcpus avail : 128
# cpuid : 0x00000000c00fac30
# total memory : 2108862504 kB
# cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
--per-thread -- taskset --cpu-list 1 true
# event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
= 128, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID|LO
ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
# event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
config = 0x9, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID|
LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
= 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
1, mmap2 = 1, comm_exec
= 1, context_switch = 1, bpf_event = 1
# event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
2267, 2268, 2269, 2270, 2271
, 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
2294, 2295, 2296, 2297, 2298, 2
299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
2322, 2323, 2324, 2325, 2326
, 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
2349, 2350, 2351, 2352, 2353, 2
354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
}, type = 1, size = 128
, config = 0x9, { sample_period, sample_freq } = 1, sample_type =
IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
exclude_hv = 1, sample_id_all =
1, exclude_guest = 1, ksymbol = 1, text_poke = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
11, kprobe = 6
# contains AUX area data (e.g. instruction trace)
# CACHE info available, use -I to display
# time of first sample : 18446744073.709551
# time of last sample : 18446744073.709551
# sample duration : 0.000 ms
# MEM_TOPOLOGY info available, use -I to display
# armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
bus_slots=0x00000000, bus_width=0x00000000
# missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY


Dumping raw events could show the events from the bad data file. But
it has zero samples after event collapse.

The only difference is --kcore inserted a new text_poke dummy event.
It seems coresight also inserted a dummy event with my command but
your command didn't. So it seems like the two dummy events confused
event collapse.

The text_poke dummy event is added by commit
f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
text_poke events with --kcore option"). If I reverted this commit,
then it works. But I'm not sure whether this is the right fix or real
root cause or not. Or coresight shouldn't insert its own dummy event?

>
> James

2023-03-08 19:56:57

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Tue, Mar 7, 2023 at 11:45 AM Yang Shi <[email protected]> wrote:
>
> Hi James,
>
> Thanks for the prompt response. Please see the inline replies.
>
> On Tue, Mar 7, 2023 at 2:32 AM James Clark <[email protected]> wrote:
> >
> >
> >
> > On 06/03/2023 21:13, Yang Shi wrote:
> > > Hi,
> > >
> > > I'm seeking some help regarding perf record --kcore + coresight. When
> > > I tracing with perf record --kcore -e cs_etm ... , perf report shows
> > > "The perf.data/data data has no samples!".
> > >
> > > I tried other combinations:
> > > 1. perf record --kcore (w/o using coresight): works
> > > 2. perf record -e cs_etm ... : works
> > > 3. Manually copy kcore with perf buildid-cache, then run perf record
> > > -e cs_etm... : works
> > >
> > > So just "perf record --kcore -e cs_etm ..." doesn't work.
> > >
> > > I'm using v6.2 kernel and the perf is built from the same kernel
> > > source with OpenCSD 1.4. Also attached the sample file generated by
> > > perf.
> > >
> > > Any suggestion is appreciated.
> > >
> > > Thanks,
> > > Yang
> >
> > Hi Yang,
> >
> > I don't see any issue with this command and I still get samples:
> >
> > perf record -e cs_etm// --kcore -- true
>
> This command works for me. But the below command doesn't work:
>
> perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread -- taskset
> --cpu-list 1 uname
>
> >
> > You might want to try running both record and report in verbose and
> > stdio mode (-vvv --stdio) to see if you see any warnings.
> >
> > I can't think of any way --kcore would cause an issue because all it
> > does is save kcore into the .debug cache rather than affecting the
> > actual perf.data file.
>
> Yes, this is what I thought. But digging into the code seems it may
> add a new dummy event. Please see the below data header dump.
>
> >
> > Are you running perf report in the same place the recording was made? Or
> > on another system?
>
> Yes.
>
> I did some further investigation. The below is the dump for both good
> and bad data files.
>
> Good (copy kcore manually):
> # captured on : Tue Mar 7 11:19:14 2023
> # header version : 1
> # data offset : 1576
> # data size : 26736
> # feat offset : 28312
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 6.2.g3424af22c4fc
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
> --per-thread -- taskset --cpu-list 1 true
> # event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|LO
> ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
> 1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|
> LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
> = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
> 1, mmap2 = 1, comm_exec
> = 1, context_switch = 1, bpf_event = 1
> # event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
> 2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
> 2267, 2268, 2269, 2270, 2271
> , 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
> 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
> 2294, 2295, 2296, 2297, 2298, 2
> 299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
> 2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
> 2322, 2323, 2324, 2325, 2326
> , 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
> 2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
> 2349, 2350, 2351, 2352, 2353, 2
> 354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
> 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
> }, type = 1, size = 128
> , config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
> exclude_hv = 1, sample_id_all =
> 1, exclude_guest = 1, ksymbol = 1, text_poke = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
> bus_slots=0x00000000, bus_width=0x00000000
> # missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>
> Bad:
> # captured on : Tue Mar 7 11:19:14 2023
> # header version : 1
> # data offset : 1576
> # data size : 26736
> # feat offset : 28312
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 6.2.g3424af22c4fc
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
> --per-thread -- taskset --cpu-list 1 true
> # event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|LO
> ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
> 1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|
> LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
> = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
> 1, mmap2 = 1, comm_exec
> = 1, context_switch = 1, bpf_event = 1
> # event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
> 2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
> 2267, 2268, 2269, 2270, 2271
> , 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
> 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
> 2294, 2295, 2296, 2297, 2298, 2
> 299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
> 2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
> 2322, 2323, 2324, 2325, 2326
> , 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
> 2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
> 2349, 2350, 2351, 2352, 2353, 2
> 354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
> 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
> }, type = 1, size = 128
> , config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
> exclude_hv = 1, sample_id_all =
> 1, exclude_guest = 1, ksymbol = 1, text_poke = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
> bus_slots=0x00000000, bus_width=0x00000000
> # missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>
>
> Dumping raw events could show the events from the bad data file. But
> it has zero samples after event collapse.
>
> The only difference is --kcore inserted a new text_poke dummy event.
> It seems coresight also inserted a dummy event with my command but
> your command didn't. So it seems like the two dummy events confused
> event collapse.
>
> The text_poke dummy event is added by commit
> f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> text_poke events with --kcore option"). If I reverted this commit,
> then it works. But I'm not sure whether this is the right fix or real
> root cause or not. Or coresight shouldn't insert its own dummy event?

It seems like coresight needs to insert the dummy event if
full_auxtrace is on IIUC. So it sounds like event collapse can't
handle such a case? I also tried v5.19 (before "perf record: Always
get text_poke events with --kcore option", which was merged in v6.0),
it works. So it seems like a regression.

>
> >
> > James

2023-03-09 11:39:04

by Leo Yan

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

Hi Yang,

On Wed, Mar 08, 2023 at 11:56:38AM -0800, Yang Shi wrote:

[...]

> > Dumping raw events could show the events from the bad data file. But
> > it has zero samples after event collapse.
> >
> > The only difference is --kcore inserted a new text_poke dummy event.
> > It seems coresight also inserted a dummy event with my command but
> > your command didn't. So it seems like the two dummy events confused
> > event collapse.
> >
> > The text_poke dummy event is added by commit
> > f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> > text_poke events with --kcore option"). If I reverted this commit,
> > then it works. But I'm not sure whether this is the right fix or real
> > root cause or not. Or coresight shouldn't insert its own dummy event?
>
> It seems like coresight needs to insert the dummy event if
> full_auxtrace is on IIUC. So it sounds like event collapse can't
> handle such a case?

I am struggling to understand the meaning "event collapse" :)

I reviewed your shared dump, the bad and good perf data both contain the
dummy event with 'text_poke = 1'. Could you confirm the shared dump in
your previous email is correct or not?

> I also tried v5.19 (before "perf record: Always
> get text_poke events with --kcore option", which was merged in v6.0),
> it works. So it seems like a regression.

Yeah, we need to fix it. I am not sure the Linux kernel for Arm64
supports text poke or not (kernel needs some specific handling when
alter instructions), the kernel change is the prerequisites.

On the other hand, in the current code cs-etm misses to handle the
event PERF_RECORD_TEXT_POKE in the function cs_etm__process_event().
This might be the cause for the failure.

Do you mind to share the bad perf.data file with James and me?

Thanks,
Leo

2023-03-09 18:06:58

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Thu, Mar 9, 2023 at 3:39 AM Leo Yan <[email protected]> wrote:
>
> Hi Yang,
>
> On Wed, Mar 08, 2023 at 11:56:38AM -0800, Yang Shi wrote:
>
> [...]
>
> > > Dumping raw events could show the events from the bad data file. But
> > > it has zero samples after event collapse.
> > >
> > > The only difference is --kcore inserted a new text_poke dummy event.
> > > It seems coresight also inserted a dummy event with my command but
> > > your command didn't. So it seems like the two dummy events confused
> > > event collapse.
> > >
> > > The text_poke dummy event is added by commit
> > > f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> > > text_poke events with --kcore option"). If I reverted this commit,
> > > then it works. But I'm not sure whether this is the right fix or real
> > > root cause or not. Or coresight shouldn't insert its own dummy event?
> >
> > It seems like coresight needs to insert the dummy event if
> > full_auxtrace is on IIUC. So it sounds like event collapse can't
> > handle such a case?
>
> I am struggling to understand the meaning "event collapse" :)

I mean report__collapse_hists(). Since dumping raw events is fine, so
it seems like report__collapse_hists() returns 0 samples after
collapse.

>
> I reviewed your shared dump, the bad and good perf data both contain the
> dummy event with 'text_poke = 1'. Could you confirm the shared dump in
> your previous email is correct or not?

Oops, sorry. I pasted the wrong log. The good one looks like
(generated by v5.19):

# captured on : Wed Mar 8 18:02:58 2023
# header version : 1
# data offset : 408
# data size : 22640
# feat offset : 23048
# hostname : fedora
# os release : 6.2.0-coresight+
# perf version : 5.19.g3d7cb6b04c3f
# arch : aarch64
# nrcpus online : 128
# nrcpus avail : 128
# cpuid : 0x00000000c00fac30
# total memory : 2108862504 kB
# cmdline : /home/yshi/linux/tools/perf/perf record -e
cs_etm/@tmc_etf63/k --kcore --per-thread -- taskset --cpu-list 1 uname
# event : name = cs_etm/@tmc_etf63/k, , id = { 3832 }, type = 9, size
= 128, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID, d
isabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec = 1,
sample_id_all = 1, { bp_len, config2 } = 0x12792918
# event : name = dummy:u, , id = { 3833 }, type = 1, size = 128,
config = 0x9, { sample_period, sample_freq } = 1, sample_type =
IP|TID|IDENTIFIER, read_format = ID,
disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1,
enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
mmap2 = 1, comm_exec = 1,
context_switch = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
11, kprobe = 6
# contains AUX area data (e.g. instruction trace)
# CACHE info available, use -I to display
# time of first sample : 18446744073.709551
# time of last sample : 18446744073.709551
# sample duration : 0.000 ms
# MEM_TOPOLOGY info available, use -I to display
# missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC STAT
CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
HYBRID_CPU_PMU_CAPS

>
> > I also tried v5.19 (before "perf record: Always
> > get text_poke events with --kcore option", which was merged in v6.0),
> > it works. So it seems like a regression.
>
> Yeah, we need to fix it. I am not sure the Linux kernel for Arm64
> supports text poke or not (kernel needs some specific handling when
> alter instructions), the kernel change is the prerequisites.
>
> On the other hand, in the current code cs-etm misses to handle the
> event PERF_RECORD_TEXT_POKE in the function cs_etm__process_event().
> This might be the cause for the failure.
>
> Do you mind to share the bad perf.data file with James and me?

Please check the attachment out. Thanks for looking into this problem.

>
> Thanks,
> Leo


Attachments:
perf.data.broken (164.07 kB)

2023-03-13 12:14:31

by Leo Yan

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Thu, Mar 09, 2023 at 10:06:41AM -0800, Yang Shi wrote:

[...]

> > I reviewed your shared dump, the bad and good perf data both contain the
> > dummy event with 'text_poke = 1'. Could you confirm the shared dump in
> > your previous email is correct or not?
>
> Oops, sorry. I pasted the wrong log. The good one looks like
> (generated by v5.19):
>
> # captured on : Wed Mar 8 18:02:58 2023
> # header version : 1
> # data offset : 408
> # data size : 22640
> # feat offset : 23048
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 5.19.g3d7cb6b04c3f
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /home/yshi/linux/tools/perf/perf record -e
> cs_etm/@tmc_etf63/k --kcore --per-thread -- taskset --cpu-list 1 uname
> # event : name = cs_etm/@tmc_etf63/k, , id = { 3832 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID, d
> isabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec = 1,
> sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 3833 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID,
> disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1,
> enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
> mmap2 = 1, comm_exec = 1,
> context_switch = 1, ksymbol = 1, bpf_event = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> HYBRID_CPU_PMU_CAPS

Thanks for confirmation.

Just a quick summary, here we have two issues:

- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 uname",

perf doesn't enable "text poke" attribution.

- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 true (in your previous email), or ...
perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread \
-- taskset --cpu-list 1 uname (in your shared perf data file),

perf enables "text poke" attribution, in this case, perf fails to decode
Arm CoreSight trace data.

[...]

> > Do you mind to share the bad perf.data file with James and me?
>
> Please check the attachment out. Thanks for looking into this problem.

Thank you for sharing the data. We will look into it.

Leo

2023-03-13 18:16:01

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Mon, Mar 13, 2023 at 5:14 AM Leo Yan <[email protected]> wrote:
>
> On Thu, Mar 09, 2023 at 10:06:41AM -0800, Yang Shi wrote:
>
> [...]
>
> > > I reviewed your shared dump, the bad and good perf data both contain the
> > > dummy event with 'text_poke = 1'. Could you confirm the shared dump in
> > > your previous email is correct or not?
> >
> > Oops, sorry. I pasted the wrong log. The good one looks like
> > (generated by v5.19):
> >
> > # captured on : Wed Mar 8 18:02:58 2023
> > # header version : 1
> > # data offset : 408
> > # data size : 22640
> > # feat offset : 23048
> > # hostname : fedora
> > # os release : 6.2.0-coresight+
> > # perf version : 5.19.g3d7cb6b04c3f
> > # arch : aarch64
> > # nrcpus online : 128
> > # nrcpus avail : 128
> > # cpuid : 0x00000000c00fac30
> > # total memory : 2108862504 kB
> > # cmdline : /home/yshi/linux/tools/perf/perf record -e
> > cs_etm/@tmc_etf63/k --kcore --per-thread -- taskset --cpu-list 1 uname
> > # event : name = cs_etm/@tmc_etf63/k, , id = { 3832 }, type = 9, size
> > = 128, { sample_period, sample_freq } = 1, sample_type =
> > IP|TID|IDENTIFIER, read_format = ID, d
> > isabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec = 1,
> > sample_id_all = 1, { bp_len, config2 } = 0x12792918
> > # event : name = dummy:u, , id = { 3833 }, type = 1, size = 128,
> > config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> > IP|TID|IDENTIFIER, read_format = ID,
> > disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1,
> > enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
> > mmap2 = 1, comm_exec = 1,
> > context_switch = 1, ksymbol = 1, bpf_event = 1
> > # CPU_TOPOLOGY info available, use -I to display
> > # NUMA_TOPOLOGY info available, use -I to display
> > # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> > uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> > 11, kprobe = 6
> > # contains AUX area data (e.g. instruction trace)
> > # CACHE info available, use -I to display
> > # time of first sample : 18446744073.709551
> > # time of last sample : 18446744073.709551
> > # sample duration : 0.000 ms
> > # MEM_TOPOLOGY info available, use -I to display
> > # missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC STAT
> > CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> > HYBRID_CPU_PMU_CAPS
>
> Thanks for confirmation.
>
> Just a quick summary, here we have two issues:
>
> - With command:
> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> -- taskset --cpu-list 1 uname",
>
> perf doesn't enable "text poke" attribution.

No, it enables "text poke" and perf fails to decode coresight trace
data too. It doesn't matter whether "--kcore" is after or before "-e
cs/etm/@tmc_etf63/k".

>
> - With command:
> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> -- taskset --cpu-list 1 true (in your previous email), or ...
> perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread \
> -- taskset --cpu-list 1 uname (in your shared perf data file),
>
> perf enables "text poke" attribution, in this case, perf fails to decode
> Arm CoreSight trace data.

Yes.

>
> [...]
>
> > > Do you mind to share the bad perf.data file with James and me?
> >
> > Please check the attachment out. Thanks for looking into this problem.
>
> Thank you for sharing the data. We will look into it.

Thank you.

>
> Leo

2023-03-14 00:36:37

by Leo Yan

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:

[...]

> > Just a quick summary, here we have two issues:
> >
> > - With command:
> > perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> > -- taskset --cpu-list 1 uname",
> >
> > perf doesn't enable "text poke" attribution.
>
> No, it enables "text poke" and perf fails to decode coresight trace
> data too. It doesn't matter whether "--kcore" is after or before "-e
> cs/etm/@tmc_etf63/k".

Understand now. Thanks for correction, if so we can ignore this one.

Leo

2023-03-28 00:55:25

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

Hi Leo,

Just follow up on this one. Any update?

Thanks,
Yang

On Mon, Mar 13, 2023 at 5:36 PM Leo Yan <[email protected]> wrote:
>
> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>
> [...]
>
> > > Just a quick summary, here we have two issues:
> > >
> > > - With command:
> > > perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> > > -- taskset --cpu-list 1 uname",
> > >
> > > perf doesn't enable "text poke" attribution.
> >
> > No, it enables "text poke" and perf fails to decode coresight trace
> > data too. It doesn't matter whether "--kcore" is after or before "-e
> > cs/etm/@tmc_etf63/k".
>
> Understand now. Thanks for correction, if so we can ignore this one.
>
> Leo

2023-03-28 09:08:03

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 28/03/2023 01:53, Yang Shi wrote:
> Hi Leo,
>
> Just follow up on this one. Any update?
>
Hi Yang,

Sorry no update on this yet from me. I was just finishing off
"coresight: Fix CTI module refcount leak by making it a helper device"
which I hope to post in the next day or two and then I will start on this.

James

> Thanks,
> Yang
>
> On Mon, Mar 13, 2023 at 5:36 PM Leo Yan <[email protected]> wrote:
>>
>> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>>
>> [...]
>>
>>>> Just a quick summary, here we have two issues:
>>>>
>>>> - With command:
>>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>>> -- taskset --cpu-list 1 uname",
>>>>
>>>> perf doesn't enable "text poke" attribution.
>>>
>>> No, it enables "text poke" and perf fails to decode coresight trace
>>> data too. It doesn't matter whether "--kcore" is after or before "-e
>>> cs/etm/@tmc_etf63/k".
>>
>> Understand now. Thanks for correction, if so we can ignore this one.
>>
>> Leo

2023-03-28 16:19:34

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 3/28/23 1:59 AM, James Clark wrote:
>
> On 28/03/2023 01:53, Yang Shi wrote:
>> Hi Leo,
>>
>> Just follow up on this one. Any update?
>>
> Hi Yang,
>
> Sorry no update on this yet from me. I was just finishing off
> "coresight: Fix CTI module refcount leak by making it a helper device"
> which I hope to post in the next day or two and then I will start on this.

Hi James,

Thank you for the update.

>
> James
>
>> Thanks,
>> Yang
>>
>> On Mon, Mar 13, 2023 at 5:36 PM Leo Yan <[email protected]> wrote:
>>> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>>>
>>> [...]
>>>
>>>>> Just a quick summary, here we have two issues:
>>>>>
>>>>> - With command:
>>>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>>>> -- taskset --cpu-list 1 uname",
>>>>>
>>>>> perf doesn't enable "text poke" attribution.
>>>> No, it enables "text poke" and perf fails to decode coresight trace
>>>> data too. It doesn't matter whether "--kcore" is after or before "-e
>>>> cs/etm/@tmc_etf63/k".
>>> Understand now. Thanks for correction, if so we can ignore this one.
>>>
>>> Leo

2023-03-29 16:23:17

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 14/03/2023 00:36, Leo Yan wrote:
> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>
> [...]
>
>>> Just a quick summary, here we have two issues:
>>>
>>> - With command:
>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>> -- taskset --cpu-list 1 uname",
>>>
>>> perf doesn't enable "text poke" attribution.
>>
>> No, it enables "text poke" and perf fails to decode coresight trace
>> data too. It doesn't matter whether "--kcore" is after or before "-e
>> cs/etm/@tmc_etf63/k".
>
> Understand now. Thanks for correction, if so we can ignore this one.
>
> Leo

To me it looks like it's only --per-thread and --kcore together that
cause the issue. I can't see if that was mentioned previously in this
thread.

If it is --per-thread that's causing the issue then I think I have an
idea why it might be. There are some assumptions and different paths
taken in decoding in that mode that aren't correct. It causes some other
issues to do with ordering and timestamps as well and I wanted to fix it
previously. I wouldn't say that the text-poke change has caused a
regression, as decoding in this mode was always a bit buggy.

Maybe this is another reason to fix it properly.

2023-03-29 16:35:17

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 14/03/2023 00:36, Leo Yan wrote:
> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>
> [...]
>
>>> Just a quick summary, here we have two issues:
>>>
>>> - With command:
>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>> -- taskset --cpu-list 1 uname",
>>>
>>> perf doesn't enable "text poke" attribution.
>>
>> No, it enables "text poke" and perf fails to decode coresight trace
>> data too. It doesn't matter whether "--kcore" is after or before "-e
>> cs/etm/@tmc_etf63/k".
>
> Understand now. Thanks for correction, if so we can ignore this one.
>
> Leo

To me it looks like it's only --per-thread and --kcore together that
cause the issue. I can't see if that was mentioned previously in this
thread.

If it is --per-thread that's causing the issue then I think I have an
idea why it might be. There are some assumptions and different paths
taken in decoding in that mode that aren't correct. It causes some other
issues to do with ordering and timestamps as well and I wanted to fix it
previously. I wouldn't say that the text-poke change has caused a
regression, as decoding in this mode was always a bit buggy.

Maybe this is another reason to fix it properly.

2023-03-29 23:26:48

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Wed, Mar 29, 2023 at 9:08 AM James Clark <[email protected]> wrote:
>
>
>
> On 14/03/2023 00:36, Leo Yan wrote:
> > On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
> >
> > [...]
> >
> >>> Just a quick summary, here we have two issues:
> >>>
> >>> - With command:
> >>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> >>> -- taskset --cpu-list 1 uname",
> >>>
> >>> perf doesn't enable "text poke" attribution.
> >>
> >> No, it enables "text poke" and perf fails to decode coresight trace
> >> data too. It doesn't matter whether "--kcore" is after or before "-e
> >> cs/etm/@tmc_etf63/k".
> >
> > Understand now. Thanks for correction, if so we can ignore this one.
> >
> > Leo
>
> To me it looks like it's only --per-thread and --kcore together that
> cause the issue. I can't see if that was mentioned previously in this
> thread.

If "--pre-thread" is not passed in, perf record failed with "failed to
mmap with 12 (Cannot allocate memory)". Sorry for not mentioning this
in the first place. I was quite focused on --kcore and didn't realize
they may be related.

>
> If it is --per-thread that's causing the issue then I think I have an
> idea why it might be. There are some assumptions and different paths
> taken in decoding in that mode that aren't correct. It causes some other
> issues to do with ordering and timestamps as well and I wanted to fix it
> previously. I wouldn't say that the text-poke change has caused a
> regression, as decoding in this mode was always a bit buggy.
>
> Maybe this is another reason to fix it properly.

2023-03-30 08:21:12

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 30/03/2023 00:25, Yang Shi wrote:
> On Wed, Mar 29, 2023 at 9:08 AM James Clark <[email protected]> wrote:
>>
>>
>>
>> On 14/03/2023 00:36, Leo Yan wrote:
>>> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>>>
>>> [...]
>>>
>>>>> Just a quick summary, here we have two issues:
>>>>>
>>>>> - With command:
>>>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>>>> -- taskset --cpu-list 1 uname",
>>>>>
>>>>> perf doesn't enable "text poke" attribution.
>>>>
>>>> No, it enables "text poke" and perf fails to decode coresight trace
>>>> data too. It doesn't matter whether "--kcore" is after or before "-e
>>>> cs/etm/@tmc_etf63/k".
>>>
>>> Understand now. Thanks for correction, if so we can ignore this one.
>>>
>>> Leo
>>
>> To me it looks like it's only --per-thread and --kcore together that
>> cause the issue. I can't see if that was mentioned previously in this
>> thread.
>
> If "--pre-thread" is not passed in, perf record failed with "failed to
> mmap with 12 (Cannot allocate memory)". Sorry for not mentioning this
> in the first place. I was quite focused on --kcore and didn't realize
> they may be related.

That's unrelated. That's because you have specified a sink and without
--per-thread it tries to open the event on all cores. If the sink can't
be reached from all cores it will fail to open. You can make it work
without --per-thread if you limit it to a valid core like this, although
I don't know which ones exactly would be valid for your system:


perf record -e cs_etm/@tmc_etf63/k --kcore -C 0 \
-- taskset --cpu-list 1 uname

>
>>
>> If it is --per-thread that's causing the issue then I think I have an
>> idea why it might be. There are some assumptions and different paths
>> taken in decoding in that mode that aren't correct. It causes some other
>> issues to do with ordering and timestamps as well and I wanted to fix it
>> previously. I wouldn't say that the text-poke change has caused a
>> regression, as decoding in this mode was always a bit buggy.
>>
>> Maybe this is another reason to fix it properly.

2023-03-30 10:44:58

by James Clark

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight



On 29/03/2023 17:11, James Clark wrote:
>
>
> On 14/03/2023 00:36, Leo Yan wrote:
>> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
>>
>> [...]
>>
>>>> Just a quick summary, here we have two issues:
>>>>
>>>> - With command:
>>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
>>>> -- taskset --cpu-list 1 uname",
>>>>
>>>> perf doesn't enable "text poke" attribution.
>>>
>>> No, it enables "text poke" and perf fails to decode coresight trace
>>> data too. It doesn't matter whether "--kcore" is after or before "-e
>>> cs/etm/@tmc_etf63/k".
>>
>> Understand now. Thanks for correction, if so we can ignore this one.
>>
>> Leo
>
> To me it looks like it's only --per-thread and --kcore together that
> cause the issue. I can't see if that was mentioned previously in this
> thread.
>
> If it is --per-thread that's causing the issue then I think I have an
> idea why it might be. There are some assumptions and different paths
> taken in decoding in that mode that aren't correct. It causes some other
> issues to do with ordering and timestamps as well and I wanted to fix it
> previously. I wouldn't say that the text-poke change has caused a
> regression, as decoding in this mode was always a bit buggy.
>
> Maybe this is another reason to fix it properly.

Hi Yang,

I found a temporary workaround to the issue. Whenever you use
--per-thread mode, force timeless mode when decoding with --itrace=Z and
it works the same as it did before. For example:

perf report --itrace=Zi1000i

The reason is that the new text poke event was added with timestamps
enabled. The Coresight decoder slightly incorrectly assumed that no
timestamps == per-thread mode so when it sees the new event it goes into
the wrong decode mode.

The assumption was never really correct so I will make a proper fix. But
in the mean time the workaround should be fine.

2023-03-30 19:55:34

by Yang Shi

[permalink] [raw]
Subject: Re: [BUG] perf: No samples found when using kcore + coresight

On Thu, Mar 30, 2023 at 3:36 AM James Clark <[email protected]> wrote:
>
>
>
> On 29/03/2023 17:11, James Clark wrote:
> >
> >
> > On 14/03/2023 00:36, Leo Yan wrote:
> >> On Mon, Mar 13, 2023 at 11:15:44AM -0700, Yang Shi wrote:
> >>
> >> [...]
> >>
> >>>> Just a quick summary, here we have two issues:
> >>>>
> >>>> - With command:
> >>>> perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
> >>>> -- taskset --cpu-list 1 uname",
> >>>>
> >>>> perf doesn't enable "text poke" attribution.
> >>>
> >>> No, it enables "text poke" and perf fails to decode coresight trace
> >>> data too. It doesn't matter whether "--kcore" is after or before "-e
> >>> cs/etm/@tmc_etf63/k".
> >>
> >> Understand now. Thanks for correction, if so we can ignore this one.
> >>
> >> Leo
> >
> > To me it looks like it's only --per-thread and --kcore together that
> > cause the issue. I can't see if that was mentioned previously in this
> > thread.
> >
> > If it is --per-thread that's causing the issue then I think I have an
> > idea why it might be. There are some assumptions and different paths
> > taken in decoding in that mode that aren't correct. It causes some other
> > issues to do with ordering and timestamps as well and I wanted to fix it
> > previously. I wouldn't say that the text-poke change has caused a
> > regression, as decoding in this mode was always a bit buggy.
> >
> > Maybe this is another reason to fix it properly.
>
> Hi Yang,
>
> I found a temporary workaround to the issue. Whenever you use
> --per-thread mode, force timeless mode when decoding with --itrace=Z and
> it works the same as it did before. For example:
>
> perf report --itrace=Zi1000i
>
> The reason is that the new text poke event was added with timestamps
> enabled. The Coresight decoder slightly incorrectly assumed that no
> timestamps == per-thread mode so when it sees the new event it goes into
> the wrong decode mode.
>
> The assumption was never really correct so I will make a proper fix. But
> in the mean time the workaround should be fine.

Hi James,

Thanks for figuring out the workaround. I tested both (using "-C"
instead of "--per-thread" for record or using "--itrace=Z" for report)
works. I think the workaround is fine before we have a proper fix.

>