2017-03-15 06:57:33

by Stephane Eranian

Subject: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

This patch significantly improves the execution time of
perf_event__synthesize_mmap_events() when running perf record
on systems where processes have lots of threads. It so happens
that cat /proc/pid/maps uses an O(N^2) algorithm to generate
each map line in the maps file: if you have 1000 threads, then you
necessarily have 1000 stacks, and for each vma you need to check
whether it corresponds to a thread's stack. With a large number of
threads, this can take a very long time. I have seen latencies well
over 10 minutes.
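
To illustrate the quadratic behavior, here is a simplified sketch with
made-up types, not the actual fs/proc code: for every vma that gets
printed, every thread of the group is examined to see whether its saved
stack pointer falls inside that vma.

struct vma  { unsigned long start, end; };
struct task { unsigned long sp; int tid; };

/* Called once per vma while the maps file is being generated. */
static int vma_is_thread_stack(const struct vma *vma,
			       const struct task *threads, int nr_threads)
{
	int i;

	for (i = 0; i < nr_threads; i++)	/* O(nr_threads) per vma */
		if (threads[i].sp >= vma->start && threads[i].sp < vma->end)
			return threads[i].tid;	/* printed as "[stack:TID]" */
	return 0;
}
/* The whole dump is therefore O(nr_vmas * nr_threads), and with 1000
 * threads there are also at least 1000 stack vmas, hence the
 * minutes-long latencies. */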

As of today, perf does not use the fact that a mapping is a stack,
therefore we can work around the issue by using /proc/pid/tasks/pid/maps.
This entry does not try to map a vma to a stack and is thus much
faster, with no loss of functionality.

The proc-map-timeout logic is kept in case the user still wants some
upper limit.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/event.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4ea7ce7..b137566 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -255,8 +255,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
if (machine__is_default_guest(machine))
return 0;

- snprintf(filename, sizeof(filename), "%s/proc/%d/maps",
- machine->root_dir, pid);
+ snprintf(filename, sizeof(filename), "%s/proc/%d/tasks/%d/maps",
+ machine->root_dir, pid, pid);

fp = fopen(filename, "r");
if (fp == NULL) {
--
2.5.0


2017-03-15 11:33:56

by Jiri Olsa

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

On Tue, Mar 14, 2017 at 11:57:21PM -0700, Stephane Eranian wrote:
> This patch significantly improves the execution time of
> perf_event__synthesize_mmap_events() when running perf record
> on systems where processes have lots of threads. It so happens
> that cat /proc/pid/maps uses an O(N^2) algorithm to generate
> each map line in the maps file: if you have 1000 threads, then you
> necessarily have 1000 stacks, and for each vma you need to check
> whether it corresponds to a thread's stack. With a large number of
> threads, this can take a very long time. I have seen latencies well
> over 10 minutes.
>
> As of today, perf does not use the fact that a mapping is a stack,
> therefore we can work around the issue by using /proc/pid/tasks/pid/maps.
> This entry does not try to map a vma to a stack and is thus much
> faster, with no loss of functionality.
>
> The proc-map-timeout logic is kept in case the user still wants some
> upper limit.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> tools/perf/util/event.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 4ea7ce7..b137566 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -255,8 +255,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> if (machine__is_default_guest(machine))
> return 0;
>
> - snprintf(filename, sizeof(filename), "%s/proc/%d/maps",
> - machine->root_dir, pid);
> + snprintf(filename, sizeof(filename), "%s/proc/%d/tasks/%d/maps",
> + machine->root_dir, pid, pid);
>
> fp = fopen(filename, "r");
> if (fp == NULL) {
> --
> 2.5.0
>

nice..

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

2017-03-15 13:52:30

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

On Tue, Mar 14, 2017 at 11:57:21PM -0700, Stephane Eranian wrote:
> This patch significantly improves the execution time of
> perf_event__synthesize_mmap_events() when running perf record
> on systems where processes have lots of threads. It so happens
> that cat /proc/pid/maps uses an O(N^2) algorithm to generate
> each map line in the maps file: if you have 1000 threads, then you
> necessarily have 1000 stacks, and for each vma you need to check
> whether it corresponds to a thread's stack. With a large number of
> threads, this can take a very long time. I have seen latencies well
> over 10 minutes.
>
> As of today, perf does not use the fact that a mapping is a stack,
> therefore we can work around the issue by using /proc/pid/tasks/pid/maps.
> This entry does not try to map a vma to a stack and is thus much
> faster, with no loss of functionality.
>
> The proc-map-timeout logic is kept in case the user still wants some
> upper limit.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> tools/perf/util/event.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 4ea7ce7..b137566 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -255,8 +255,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> if (machine__is_default_guest(machine))
> return 0;
>
> - snprintf(filename, sizeof(filename), "%s/proc/%d/maps",
> - machine->root_dir, pid);
> + snprintf(filename, sizeof(filename), "%s/proc/%d/tasks/%d/maps",
> + machine->root_dir, pid, pid);

Humm...

[root@jouet ~]# ps -C firefox -L
PID LWP TTY TIME CMD
25023 25023 tty2 01:32:10 firefox
25023 25048 tty2 00:00:00 gmain
25023 25049 tty2 00:00:00 gdbus
25023 25056 tty2 00:00:00 Gecko_IOThread
25023 25058 tty2 00:00:00 Link Monitor
25023 25059 tty2 00:03:52 Socket Thread
25023 25060 tty2 00:00:00 JS Watchdog
25023 25061 tty2 00:00:12 JS Helper
25023 25062 tty2 00:00:12 JS Helper
25023 25063 tty2 00:00:13 JS Helper
25023 25064 tty2 00:00:12 JS Helper
25023 25065 tty2 00:00:13 JS Helper
25023 25066 tty2 00:00:12 JS Helper
25023 25067 tty2 00:00:13 JS Helper
25023 25068 tty2 00:00:13 JS Helper
25023 25069 tty2 00:00:00 Hang Monitor
25023 25070 tty2 00:00:00 BgHangManager
25023 25117 tty2 00:00:12 Cache2 I/O
25023 25118 tty2 00:05:37 Timer
25023 25146 tty2 00:00:00 DataStorage
25023 25150 tty2 00:00:00 GMPThread
25023 25152 tty2 00:04:12 Compositor
25023 25156 tty2 00:00:02 ImgDecoder #1
25023 25157 tty2 00:00:02 ImgDecoder #2
25023 25158 tty2 00:00:02 ImgDecoder #3
25023 25159 tty2 00:00:00 ImageIO
25023 25160 tty2 00:02:26 SoftwareVsyncTh
25023 25202 tty2 00:00:00 HTML5 Parser
25023 25204 tty2 00:00:00 IPDL Background
25023 25212 tty2 00:00:22 DOM Worker
25023 25219 tty2 00:00:23 URL Classifier
25023 25220 tty2 00:00:05 ImageBridgeChil
25023 25221 tty2 00:00:04 mozStorage #1
25023 25222 tty2 00:00:00 Proxy R~olution
25023 25223 tty2 00:00:00 DataStorage
25023 25234 tty2 00:00:00 Cache I/O
25023 25235 tty2 00:00:01 mozStorage #2
25023 25236 tty2 00:00:00 mozStorage #3
25023 25252 tty2 00:00:20 DOM Worker
25023 25256 tty2 00:00:00 mozStorage #4
25023 25257 tty2 00:00:00 localStorage DB
25023 25260 tty2 00:00:00 mozStorage #5
25023 2066 tty2 00:00:19 DOM Worker
25023 5988 tty2 00:00:00 threaded-ml
25023 6757 tty2 00:00:00 firefox
25023 6999 tty2 00:00:00 mozStorage #6
25023 19239 tty2 00:00:02 mozStorage #7
25023 12038 tty2 00:00:05 threaded-ml
25023 19301 tty2 00:00:00 firefox
[root@jouet ~]#

[root@jouet ~]# file /proc/25023/tasks
/proc/25023/tasks: cannot open `/proc/25023/tasks' (No such file or directory)
[root@jouet ~]#

But...

[root@jouet ~]# file /proc/25023/task
/proc/25023/task: directory
[root@jouet ~]# wc -l /proc/25023/task/25023/maps
1200 /proc/25023/task/25023/maps
[root@jouet ~]# wc -l /proc/25023/task/25048/maps
1201 /proc/25023/task/25048/maps
[root@jouet ~]# wc -l /proc/25048/task/25048/maps
1201 /proc/25048/task/25048/maps
[root@jouet ~]#

[acme@jouet linux]$ grep \"tasks\" fs/proc/*.c
[acme@jouet linux]$ grep \"task\" fs/proc/*.c
fs/proc/base.c: DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
fs/proc/base.c: name.name = "task";
[acme@jouet linux]$

The two paths, /proc/PID/maps and /proc/PID/task/TID/maps, end up
mapping to these two sets of file operations, respectively:

const struct file_operations proc_pid_maps_operations = {
	.open		= pid_maps_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= proc_map_release,
};

const struct file_operations proc_tid_maps_operations = {
	.open		= tid_maps_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= proc_map_release,
};


Which ends up going down to:

show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)

with that is_pid respectively true and false, but then show_map_vma() doesn't
use 'is_pid' at all :-\

/me scratches head, what am I missing?

That is explained by the following commit, which appeared circa v4.9-rc2.
So, fixing up the "tasks" -> "tasks" we end up with something safe and
that avoids this by now non-existing problem on older kernels, ok?

commit b18cb64ead400c01bf1580eeba330ace51f8087d
Author: Andy Lutomirski <[email protected]>
Date: Fri Sep 30 10:58:57 2016 -0700

fs/proc: Stop trying to report thread stacks

This reverts more of:

b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps")

... which was partially reverted by:

65376df58217 ("proc: revert /proc/<pid>/maps [stack:TID] annotation")

Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.

In current kernels, /proc/PID/maps (or /proc/TID/maps even for
threads) shows "[stack]" for VMAs in the mm's stack address range.

In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
target thread's stack's VMA. This is racy, probably returns garbage
and, on arches with CONFIG_THREAD_INFO_IN_TASK=y, is also crash-prone:
KSTK_ESP is not safe to use on tasks that aren't known to be running
ordinary process-context kernel code.

This patch removes the difference and just shows "[stack]" for VMAs
in the mm's stack range. This is IMO much more sensible -- the
actual "stack" address really is treated specially by the VM code,
and the current thread stack isn't even well-defined for programs
that frequently switch stacks on their own.

Reported-by: Jann Horn <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
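
For reference, after that commit the "[stack]" decision is just a range
check against the mm's stack, with no per-thread walk at all. Roughly
(a sketch, not verbatim fs/proc/task_mmu.c):

static int is_stack(struct vm_area_struct *vma)
{
	/* A vma is reported as "[stack]" iff it contains the mm's
	 * start_stack; no attempt to guess per-thread stacks. */
	return vma->vm_start <= vma->vm_mm->start_stack &&
	       vma->vm_end >= vma->vm_mm->start_stack;
}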


>
> fp = fopen(filename, "r");
> if (fp == NULL) {
> --
> 2.5.0

2017-03-15 13:58:49

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

On Wed, Mar 15, 2017 at 10:50:59AM -0300, Arnaldo Carvalho de Melo wrote:
> So, fixing up the "tasks" -> "tasks" we end up with something safe and
> that avoids this by now

"tasks" -> "task", grrr

- Arnaldo

2017-03-15 17:08:49

by Stephane Eranian

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

On Wed, Mar 15, 2017 at 6:58 AM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
> On Wed, Mar 15, 2017 at 10:50:59AM -0300, Arnaldo Carvalho de Melo wrote:
>> So, fixing up the "tasks" -> "tasks" we end up with something safe and
>> that avoids this by now
>
> "tasks" -> "task", grrr
>
Oops, yeah, sorry about that.
Let me submit a V2 with the fix.
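
The change is just s/tasks/task/ in the path, i.e. something like:

	snprintf(filename, sizeof(filename), "%s/proc/%d/task/%d/maps",
		 machine->root_dir, pid, pid);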

2017-03-15 17:45:04

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

On Wed, Mar 15, 2017 at 10:08:27AM -0700, Stephane Eranian wrote:
> On Wed, Mar 15, 2017 at 6:58 AM, Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> > On Wed, Mar 15, 2017 at 10:50:59AM -0300, Arnaldo Carvalho de Melo wrote:
> >> So, fixing up the "tasks" -> "tasks" we end up with something safe and
> >> that avoids this by now
> >
> > "tasks" -> "task", grrr
> >
> Oops, yeah, sorry about that.
> Let me submit a V2 with the fix.

Ok, but what about the other observations? What kernel are you
using?

I think this will not make any difference for any kernel >= 4.9, which
isn't to say the patch shouldn't be applied; I'm just curious.

- Arnaldo

2017-03-16 02:34:37

by Stephane Eranian

Subject: Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

Arnaldo,

On Wed, Mar 15, 2017 at 10:44 AM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Wed, Mar 15, 2017 at 10:08:27AM -0700, Stephane Eranian wrote:
> > On Wed, Mar 15, 2017 at 6:58 AM, Arnaldo Carvalho de Melo
> > <[email protected]> wrote:
> > > On Wed, Mar 15, 2017 at 10:50:59AM -0300, Arnaldo Carvalho de Melo wrote:
> > >> So, fixing up the "tasks" -> "tasks" we end up with something safe and
> > >> that avoids this by now
> > >
> > > "tasks" -> "task", grrr
> > >
> > Oops, yeah, sorry about that.
> > Let me submit a V2 with the fix.
>
> Ok, but what about the other observations, what is the kernel you're
> using?
>
Ok, yeah, I was using an older kernel which still had [stack:tid] annotations.
With a more recent kernel, it seems this optimization is not needed.
I guess it would only be useful to people running a new perf on an older
kernel.

>
> I think this will not make any difference for any kernel >= 4.9, which
> isn't to say the patch shouldn't be applied, I'm just curious.
>
correct, see above.

>
> - Arnaldo