Subject: Re: [PATCH 4/5] perf top: switch to overwrite mode
From: "Wangnan (F)"
Date: Wed, 6 Dec 2017 09:58:51 +0800
In-Reply-To: <1512513555-118798-5-git-send-email-kan.liang@intel.com>
References: <1512513555-118798-1-git-send-email-kan.liang@intel.com> <1512513555-118798-5-git-send-email-kan.liang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017/12/6 6:39, kan.liang@intel.com wrote:
> From: Kan Liang
>
> perf_top__mmap_read has a severe performance issue on
> Knights Landing/Mill when monitoring a heavily loaded system. It can
> take several minutes to finish, which is unacceptable.
>
> Currently, perf top uses non-overwrite mode. In non-overwrite mode, it
> tries to read everything in the ring buffer and doesn't pause the
> ring buffer. When lots of samples are delivered persistently, the
> processing time can become very long. Also, the latest samples can be
> lost when the ring buffer is full.
>
> Overwrite mode takes a snapshot of the system by pausing the ring
> buffer, which significantly reduces the processing time. Overwrite
> mode also always keeps the latest samples. Given the real-time
> requirement of perf top, overwrite mode is the better fit.
>
> Actually, perf top originally used overwrite mode. It was changed to
> non-overwrite mode in commit 93fc64f14472 ("perf top: Switch to non
> overwrite mode"). It's better to change it back to overwrite mode.
>
> Some records are lost in overwrite mode because of pausing the ring
> buffer, but this has little impact on the accuracy of the snapshot and
> can be tolerated, so the lost-events check is removed.
>
> Unconditionally wait 100 ms before each snapshot. This also reduces
> the overhead of pausing the ring buffer, especially on a lightly
> loaded system.
>
> Signed-off-by: Kan Liang
> ---
>  tools/perf/builtin-top.c       | 30 ++++++++++++++----------------
>  tools/perf/ui/browsers/hists.c | 12 +++++++++---
>  2 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 540461f..721d786 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -283,16 +283,6 @@ static void perf_top__print_sym_table(struct perf_top *top)
>
>  	printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
>
> -	if (hists->stats.nr_lost_warned !=
> -	    hists->stats.nr_events[PERF_RECORD_LOST]) {
> -		hists->stats.nr_lost_warned =
> -			hists->stats.nr_events[PERF_RECORD_LOST];
> -		color_fprintf(stdout, PERF_COLOR_RED,
> -			      "WARNING: LOST %d chunks, Check IO/CPU overload",
> -			      hists->stats.nr_lost_warned);
> -		++printed;
> -	}
> -
>  	if (top->sym_filter_entry) {
>  		perf_top__show_details(top);
>  		return;
> @@ -807,14 +797,19 @@ static void perf_event__process_sample(struct perf_tool *tool,
>
>  static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
>  {
> +	struct perf_mmap *md = &top->evlist->overwrite_mmap[idx];
>  	struct perf_sample sample;
>  	struct perf_evsel *evsel;
>  	struct perf_session *session = top->session;
>  	union perf_event *event;
>  	struct machine *machine;
> +	unsigned long size;
> +	u64 end, start;
>  	int ret;
>
> -	while ((event = perf_evlist__mmap_read(top->evlist, idx)) != NULL) {
> +	perf_mmap__read_catchup(md, true, &start, &end, &size);
> +
> +	while ((event = perf_mmap__read_backward(md, &start, end)) != NULL) {
>  		ret = perf_evlist__parse_sample(top->evlist, event, &sample);
>  		if (ret) {
>  			pr_err("Can't parse sample, err = %d\n", ret);
> @@ -869,16 +864,21 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
>  		} else
>  			++session->evlist->stats.nr_unknown_events;
> next_event:
> -		perf_evlist__mmap_consume(top->evlist, idx);
> +		perf_mmap__consume(md, true);
>  	}
> +
> +	perf_mmap__read_done(md);
>  }
>
>  static void perf_top__mmap_read(struct perf_top *top)
>  {
>  	int i;
>
> +	perf_evlist__toggle_bkw_mmap(top->evlist, BKW_MMAP_DATA_PENDING);
>  	for (i = 0; i < top->evlist->nr_mmaps; i++)
>  		perf_top__mmap_read_idx(top, i);
> +	perf_evlist__toggle_bkw_mmap(top->evlist, BKW_MMAP_EMPTY);
> +	perf_evlist__toggle_bkw_mmap(top->evlist, BKW_MMAP_RUNNING);
>  }
>
>  static int perf_top__start_counters(struct perf_top *top)
> @@ -1029,12 +1029,9 @@ static int __cmd_top(struct perf_top *top)
>  	}
>
>  	while (!done) {
> -		u64 hits = top->samples;
> -
>  		perf_top__mmap_read(top);
>
> -		if (hits == top->samples)
> -			ret = perf_evlist__poll(top->evlist, 100);
> +		ret = perf_evlist__poll(top->evlist, 100);
>
>  		if (resize) {
>  			perf_top__resize(top);
> @@ -1127,6 +1124,7 @@ int cmd_top(int argc, const char **argv)
>  			.uses_mmap = true,
>  		},
>  		.proc_map_timeout = 500,
> +		.overwrite = 1,

This breaks old kernels:

 # ./perf top
 Error:
 Reading from overwrite event is not supported by this kernel.
 # uname -r
 3.0.13-0.27-default

We need a way to fall back to the normal ring buffer when the kernel
doesn't support the backward ring buffer.

Thank you.