Commit 7736627b865d ("perf stat: Use affinity for closing file
descriptors") will use FD(evsel, cpu, thread) to read and write
file descriptors xyarray. For a kernel PMU event, this leads to
serious memory corruption and perf crash.
I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr
with the total number of CPUs. so xyarray which is allocated by
evlist->core.cpus->nr will get overflow. This leads to various
segmentation faults in perf tool for kernel PMU events, eg:
./perf stat -e bus_cycles sleep 1
*** Error in `./perf': free(): invalid next size (fast): 0x00000000401e6370 ***
Aborted (core dumped)
Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
tools/perf/util/evlist.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c0768c6..3022152 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1226,10 +1226,14 @@ void evlist__close(struct evlist *evlist)
int cpu, i;
/*
- * With perf record core.cpus is usually NULL.
+ * With perf record core.cpus is usually NULL;
+ * For Kernel PMU event x, "perf stat -e x" will set evlist->core.cpus->nr to
+ * 1 while evsel has cpus->nr which contains all CPUs. evsel__cpu_iter_skip()
+ * will be false, memory corruption will happen if we use affinity to close
+ * file descriptors;
* Use the old method to handle this for now.
*/
- if (!evlist->core.cpus) {
+ if (!evlist->core.cpus || evlist->core.cpus->nr == 1) {
evlist__for_each_entry_reverse(evlist, evsel)
evsel__close(evsel);
return;
--
2.7.4
On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> Commit 7736627b865d ("perf stat: Use affinity for closing file
> descriptors") will use FD(evsel, cpu, thread) to read and write
> file descriptors xyarray. For a kernel PMU event, this leads to
> serious memory corruption and perf crash.
> I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr
> with the total number of CPUs. so xyarray which is allocated by
> evlist->core.cpus->nr will get overflow. This leads to various
> segmentation faults in perf tool for kernel PMU events, eg:
> ./perf stat -e bus_cycles sleep 1
> *** Error in `./perf': free(): invalid next size (fast): 0x00000000401e6370 ***
> Aborted (core dumped)
Thanks.
I believe there is already a patch queued for this.
The problem seems to only happen on ARM64.
-Andi
> -----Original Message-----
> From: Andi Kleen [mailto:[email protected]]
> Sent: Friday, October 2, 2020 12:07 PM
> To: Song Bao Hua (Barry Song) <[email protected]>
> Cc: [email protected]; Linuxarm <[email protected]>; Peter
> Zijlstra <[email protected]>; Ingo Molnar <[email protected]>; Arnaldo
> Carvalho de Melo <[email protected]>; Mark Rutland
> <[email protected]>; Alexander Shishkin
> <[email protected]>; Jiri Olsa <[email protected]>;
> Namhyung Kim <[email protected]>; Adrian Hunter
> <[email protected]>; Alexey Budankov
> <[email protected]>
> Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
>
> On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > descriptors xyarray. For a kernel PMU event, this leads to serious
> > memory corruption and perf crash.
> > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > the total number of CPUs. so xyarray which is allocated by
> > evlist->core.cpus->nr will get overflow. This leads to various
> > segmentation faults in perf tool for kernel PMU events, eg:
> > ./perf stat -e bus_cycles sleep 1
> > *** Error in `./perf': free(): invalid next size (fast):
> > 0x00000000401e6370 *** Aborted (core dumped)
>
> Thanks.
>
> I believe there is already a patch queued for this.
Andi, thanks! Could you share the link or the commit ID? I'd like to take a look at the fix.
I could still reproduce this issue in the latest linus' tree and I didn't find any commit
related to this issue in linux-next and tip/perf/core.
>
> The problem seems to only happen on ARM64.
My platform which has this issue is really ARM64.
Thanks
Barry
Hello,
On Fri, Oct 2, 2020 at 12:02 PM Song Bao Hua (Barry Song)
<[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Andi Kleen [mailto:[email protected]]
> > Sent: Friday, October 2, 2020 12:07 PM
> > To: Song Bao Hua (Barry Song) <[email protected]>
> > Cc: [email protected]; Linuxarm <[email protected]>; Peter
> > Zijlstra <[email protected]>; Ingo Molnar <[email protected]>; Arnaldo
> > Carvalho de Melo <[email protected]>; Mark Rutland
> > <[email protected]>; Alexander Shishkin
> > <[email protected]>; Jiri Olsa <[email protected]>;
> > Namhyung Kim <[email protected]>; Adrian Hunter
> > <[email protected]>; Alexey Budankov
> > <[email protected]>
> > Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
> >
> > On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > > descriptors xyarray. For a kernel PMU event, this leads to serious
> > > memory corruption and perf crash.
> > > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > > the total number of CPUs. so xyarray which is allocated by
> > > evlist->core.cpus->nr will get overflow. This leads to various
> > > segmentation faults in perf tool for kernel PMU events, eg:
> > > ./perf stat -e bus_cycles sleep 1
> > > *** Error in `./perf': free(): invalid next size (fast):
> > > 0x00000000401e6370 *** Aborted (core dumped)
> >
> > Thanks.
> >
> > I believe there is already a patch queued for this.
>
> Andi, thanks! Could you share the link or the commit ID? I'd like to take a look at the fix.
> I could still reproduce this issue in the latest linus' tree and I didn't find any commit
> related to this issue in linux-next and tip/perf/core.
I think Andi was referring to this discussion which is not merged yet:
https://lore.kernel.org/lkml/[email protected]/
I suggested a patch at the end. Can you please try it?
Thanks
Namhyung
> -----Original Message-----
> From: Namhyung Kim [mailto:[email protected]]
> Sent: Tuesday, October 6, 2020 2:26 PM
> To: Song Bao Hua (Barry Song) <[email protected]>
> Cc: Andi Kleen <[email protected]>; [email protected]; Linuxarm
> <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar
> <[email protected]>; Arnaldo Carvalho de Melo <[email protected]>; Mark
> Rutland <[email protected]>; Alexander Shishkin
> <[email protected]>; Jiri Olsa <[email protected]>; Adrian
> Hunter <[email protected]>; Alexey Budankov
> <[email protected]>
> Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
>
> Hello,
>
> On Fri, Oct 2, 2020 at 12:02 PM Song Bao Hua (Barry Song)
> <[email protected]> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Andi Kleen [mailto:[email protected]]
> > > Sent: Friday, October 2, 2020 12:07 PM
> > > To: Song Bao Hua (Barry Song) <[email protected]>
> > > Cc: [email protected]; Linuxarm <[email protected]>; Peter
> > > Zijlstra <[email protected]>; Ingo Molnar <[email protected]>;
> Arnaldo
> > > Carvalho de Melo <[email protected]>; Mark Rutland
> > > <[email protected]>; Alexander Shishkin
> > > <[email protected]>; Jiri Olsa <[email protected]>;
> > > Namhyung Kim <[email protected]>; Adrian Hunter
> > > <[email protected]>; Alexey Budankov
> > > <[email protected]>
> > > Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU
> event
> > >
> > > On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > > > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > > > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > > > descriptors xyarray. For a kernel PMU event, this leads to serious
> > > > memory corruption and perf crash.
> > > > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > > > the total number of CPUs. so xyarray which is allocated by
> > > > evlist->core.cpus->nr will get overflow. This leads to various
> > > > segmentation faults in perf tool for kernel PMU events, eg:
> > > > ./perf stat -e bus_cycles sleep 1
> > > > *** Error in `./perf': free(): invalid next size (fast):
> > > > 0x00000000401e6370 *** Aborted (core dumped)
> > >
> > > Thanks.
> > >
> > > I believe there is already a patch queued for this.
> >
> > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> look at the fix.
> > I could still reproduce this issue in the latest linus' tree and I didn't find any
> commit
> > related to this issue in linux-next and tip/perf/core.
>
> I think Andi was referring to this discussion which is not merged yet:
>
> https://lore.kernel.org/lkml/[email protected]
> m/
>
> I suggested a patch at the end. Can you please try it?
I tried the patch you suggested.
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 2208444ecb44..cfcdbd7be066 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
if (!evsel->own_cpus || evlist->has_user_cpus) {
perf_cpu_map__put(evsel->cpus);
evsel->cpus = perf_cpu_map__get(evlist->cpus);
+ } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
+ perf_cpu_map__put(evsel->cpus);
+ evsel->cpus = perf_cpu_map__get(evlist->cpus);
} else if (evsel->cpus != evsel->own_cpus) {
perf_cpu_map__put(evsel->cpus);
evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log.
Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.
Thanks
Barry
On Tue, Oct 06, 2020 at 06:39:44AM +0000, Song Bao Hua (Barry Song) wrote:
SNIP
> > > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> > look at the fix.
> > > I could still reproduce this issue in the latest linus' tree and I didn't find any
> > commit
> > > related to this issue in linux-next and tip/perf/core.
> >
> > I think Andi was referring to this discussion which is not merged yet:
> >
> > https://lore.kernel.org/lkml/[email protected]
> > m/
> >
> > I suggested a patch at the end. Can you please try it?
>
> I tried the patch you suggested.
>
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 2208444ecb44..cfcdbd7be066 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> if (!evsel->own_cpus || evlist->has_user_cpus) {
> perf_cpu_map__put(evsel->cpus);
> evsel->cpus = perf_cpu_map__get(evlist->cpus);
> + } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> + perf_cpu_map__put(evsel->cpus);
> + evsel->cpus = perf_cpu_map__get(evlist->cpus);
> } else if (evsel->cpus != evsel->own_cpus) {
> perf_cpu_map__put(evsel->cpus);
> evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
>
> it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log.
> Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.
awesome.. Namhyung, could you please send full patch?
thanks,
jirka
Hello,
On Tue, Oct 6, 2020 at 8:11 PM Jiri Olsa <[email protected]> wrote:
>
> On Tue, Oct 06, 2020 at 06:39:44AM +0000, Song Bao Hua (Barry Song) wrote:
>
> SNIP
>
> > > > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> > > look at the fix.
> > > > I could still reproduce this issue in the latest linus' tree and I didn't find any
> > > commit
> > > > related to this issue in linux-next and tip/perf/core.
> > >
> > > I think Andi was referring to this discussion which is not merged yet:
> > >
> > > https://lore.kernel.org/lkml/[email protected]
> > > m/
> > >
> > > I suggested a patch at the end. Can you please try it?
> >
> > I tried the patch you suggested.
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 2208444ecb44..cfcdbd7be066 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> > if (!evsel->own_cpus || evlist->has_user_cpus) {
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > + } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > + perf_cpu_map__put(evsel->cpus);
> > + evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > } else if (evsel->cpus != evsel->own_cpus) {
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> >
> > it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log.
> > Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> > Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.
>
> awesome.. Namhyung, could you please send full patch?
Sure, I'll do!
Thanks
Namhyung