From: "Liang, Kan" <kan.liang@intel.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "dsahern@gmail.com" <dsahern@gmail.com>,
        "Huang, Ying" <ying.huang@intel.com>,
        "andi@firstfloor.org" <andi@firstfloor.org>
Subject: RE: [PATCH V4 1/2] perf,tools: add time out to force stop proc map
 processing
Thread-Topic: [PATCH V4 1/2] perf,tools: add time out to force stop proc map
 processing
Thread-Index: AQHQqUFEQP8tuAF300ebtd8pXARQEJ2z2LoAgATXscA=
Date: Mon, 22 Jun 2015 18:01:58 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0770187F7B7@SHSMSX103.ccr.corp.intel.com>
References: <1434549071-25611-1-git-send-email-kan.liang@intel.com>
 <20150619214118.GD28405@kernel.org>
In-Reply-To: <20150619214118.GD28405@kernel.org>
Accept-Language: zh-CN, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5088
Lines: 117


> Em Wed, Jun 17, 2015 at 09:51:10AM -0400, kan.liang@intel.com escreveu:
> > From: Kan Liang <kan.liang@intel.com>
> > System wide sampling like 'perf top' or 'perf record -a' read all
> > threads /proc/xxx/maps before sampling. If there are any threads which
> > generating a keeping growing huge maps, perf will do infinite loop
> > during synthesizing. Nothing will be sampled.
> >
> > This patch fixes this issue by adding per-thread timeout to force stop
> > this kind of endless proc map processing.
> > PERF_RECORD_MISC_PROC_MAP_PARSE_TIME_OUT is introduced to
> indicate
> > that the mmap record are truncated by time out. User will get warning
> > notification when truncated mmap records are detected.
> 
> I am applying the patch, we indeed need to keep it in the last
> PERF_RECORD_MMAP2 we synthesized for a proc mmap file, i.e. it has to
> be stored in the perf.data file, so that later, at 'report' time, possibly in
> another machine, that situation can be reported to the user.
> 
> But, the warning is only being showm for tools that process events via
> perf_session__process_events(), like 'perf report'.
> 
> 'top' and 'trace', for instance, don't do that, so the users will not be warned
> in that case.

The users can be warned right away by pr_warning when time out is detected.
I tested both 'top' and 'trace'.

# perf trace
Reading /proc/18099/maps time out. You may want to increase the time limit
by --proc-map-timeout
Reading /proc/18100/maps time out. You may want to increase the time limit 
by --proc-map-timeout

For 'top', the error message will be printed at the bottom of the screen.
The user may not notice it with default --tui mode, only if there is very small
number of threads time out.  (My test case has 64 threads time out. The error
messages last 32 seconds, the user must notice that.) 

So the problem is only for some case of 'top'.
> 
> To fix this we need to make machine__process_mmap2_event() notice
> that PERF_RECORD_MISC_PROC_MAP_PARSE_TIME_OUT is set, and
> bump a per-struct machine counter, something like machine-
> >stats.nr_truncated_mmaps.

machine__process_mmap2_event also be called by 'record', 'report', etc.
Since it's an issue only for 'top'.
How about update nr_truncated_mmaps in machine__process_event?

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5cfc3aa..95a1dc5 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -952,6 +952,16 @@ static int __cmd_top(struct perf_top *top)

        machine__synthesize_threads(&top->session->machines.host, &opts->target,
                                    top->evlist->threads, false, opts->proc_map_timeout);
+
+       if (top->session->machines.host.nr_truncated_mmaps)
+               ui__warning("%d map information files for pre-existing threads were\n"
+                           "not processed, if there are samples for addresses they\n"
+                           "will not be resolved, you may find out which are these\n"
+                           "threads by running with --stdio\n"
+                           "The time limit to process proc map is too short?\n"
+                           "Increase it by --proc-map-timeout\n",
+                           top->session->machines.host.nr_truncated_mmaps);
+
        ret = perf_top__start_counters(top);
        if (ret)
                goto out_delete;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 968cea8..7d960ff 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1438,6 +1438,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
        case PERF_RECORD_MMAP:
                ret = machine__process_mmap_event(machine, event, sample); break;
        case PERF_RECORD_MMAP2:
+               if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
+                       machine->nr_truncated_mmaps++;
                ret = machine__process_mmap2_event(machine, event, sample); break;
        case PERF_RECORD_FORK:
                ret = machine__process_fork_event(machine, event, sample); break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 4333a3a..af74955 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -38,6 +38,7 @@ struct machine {
        struct map_groups kmaps;
        struct map        *vmlinux_maps[MAP__NR_TYPES];
        u64               kernel_start;
+       u32               nr_truncated_mmaps;
        symbol_filter_t   symbol_filter;
        pid_t             *current_tid;
        union { /* Tool specific area */

Thanks,
Kan
> 
> The callchain is:
> 
>   builtin-top.c
>     machine__synthesize_threads()
>       __machine__synthesize_threads()
>         perf_event__process()
>           machine__process_event()
>             machine__process_mmap2_event()
> 
> At some point even a event_stats may make sense to be added to struct
> machine :-)
> 
> - Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/