Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753871AbcJDRL0 (ORCPT ); Tue, 4 Oct 2016 13:11:26 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:41617 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752924AbcJDRLZ (ORCPT ); Tue, 4 Oct 2016 13:11:25 -0400 DMARC-Filter: OpenDMARC Filter v1.3.1 smtp.codeaurora.org 2EBFA6155B Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=pass smtp.mailfrom=joonwoop@codeaurora.org Subject: Re: [PATCH] perf sched: kill time stamp discrepancy between script and latency To: Namhyung Kim References: <1475115926-9602-1-git-send-email-joonwoop@codeaurora.org> <20161001051531.GA28673@danjae.aot.lge.com> <9fdbfc0f-4b56-331e-0b9d-ba715228ec26@codeaurora.org> <20161004120417.GA9845@danjae.aot.lge.com> Cc: Peter Zijlstra , Arnaldo Carvalho de Melo , Ingo Molnar , Alexander Shishkin , Steven Rostedt , linux-kernel@vger.kernel.org From: Joonwoo Park Message-ID: <1a924e35-9e37-df19-3a3d-6296e7d91820@codeaurora.org> Date: Tue, 4 Oct 2016 10:11:21 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20161004120417.GA9845@danjae.aot.lge.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6654 Lines: 174 On 10/04/2016 05:04 AM, Namhyung Kim wrote: > On Mon, Oct 03, 2016 at 03:04:48PM -0700, Joonwoo Park wrote: >> >> >> On 09/30/2016 10:15 PM, Namhyung Kim wrote: >>> Hi Joonwoo, >>> >>> On Wed, Sep 28, 2016 at 07:25:26PM -0700, Joonwoo Park wrote: >>>> Perf sched latency is handy to find out the maximum sched latency and >>>> the time stamp of the event. After running sched latency, if a found >>>> latency looks suspicious it's quite reasonable to run perf script >>>> subsequently and search with the time stamp given by perf sched latency >>>> to continue further debugging. However, at present, it's possible the >>>> time stamp given by perf sched latency cannot be found in the trace >>>> output by perf script because perf sched latency converts the event >>>> time from ns to ms as double float and prints it with printf which >>>> does banker's rounding as opposed to perf script doesn't. >>>> >>>> For example: >>>> >>>> 0x750ff0 [0x80]: event: 9 >>>> >>>> 2 1858303049520 0x750ff0 [0x80]: PERF_RECORD_SAMPLE(IP, 0x1): 15281/15281: 0xffffffff8162a63a period: 1 addr: 0 >>>> ... thread: hackbench:15281 >>>> >>>> >>>> $ perf sched -i perf.data latency | grep hackbench >>>> hackbench:(401) + 3539.283 ms | 23347 | avg: 7.286 ms | max: 829.998 ms | max at: 1858.303050 s >>>> >>>> $ perf script -i perf.data | grep "1858\.303050" >>>> >>>> $ perf script -i perf.data | grep "1858\.303049" >>>> hackbench 15281 [002] 1858.303049: sched:sched_switch: prev_comm=hackbench prev_pid=15281 prev_prio=120 prev_state=D ==> next_comm=hackbench next_pid=15603 next_prio=120 >>>> >>>> Fix perf latency to print out time stamp without rounding to avoid such >>>> discrepancy. >>>> >>>> Cc: Peter Zijlstra >>>> Cc: Ingo Molnar >>>> Cc: Arnaldo Carvalho de Melo >>>> Cc: Alexander Shishkin >>>> Cc: Steven Rostedt >>>> Cc: Namhyung Kim >>>> Cc: linux-kernel@vger.kernel.org >>>> Signed-off-by: Joonwoo Park >>>> --- >>>> >>>> I was tempted to get rid of all u64 to double casting in the function >>>> output_lat_thread but didn't because there is no data loss as of >>>> today. Double float gives at least 15 significant decimal digits >>>> precision while the function requires only 14 significant digits precision. >>>> >>>> $ python -c "print(len(str(int(0xffffffffffffffff / 1e6))))" >>>> 14 >>>> >>>> tools/lib/traceevent/event-parse.h | 1 + >>>> tools/perf/builtin-sched.c | 12 ++++++++++-- >>>> 2 files changed, 11 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h >>>> index 9ffde37..f42703c 100644 >>>> --- a/tools/lib/traceevent/event-parse.h >>>> +++ b/tools/lib/traceevent/event-parse.h >>>> @@ -174,6 +174,7 @@ struct pevent_plugin_option { >>>> >>>> #define NSECS_PER_SEC 1000000000ULL >>>> #define NSECS_PER_USEC 1000ULL >>>> +#define MSECS_PER_SEC 1000ULL >>>> >>>> enum format_flags { >>>> FIELD_IS_ARRAY = 1, >>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c >>>> index afa0576..e5cf51a 100644 >>>> --- a/tools/perf/builtin-sched.c >>>> +++ b/tools/perf/builtin-sched.c >>>> @@ -1190,6 +1190,7 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_ >>>> int i; >>>> int ret; >>>> u64 avg; >>>> + u64 max_lat_at_sec, max_lat_at_msec; >>> >>> Isn't it usec rathen than msec? :) >> >> It's to contain three decimal digits which are msecs when 'max_lat_at' is >> expressed in sec. >> For example when max_lat_at = 1858303049520 which is 1858.3030495199998 sec, >> max_lat_at_msec is meant to be 303. > > ??? But didn't you want to print it as 303049? > > Looking at the code, the 'max_lat' is a latency which was printed in > msec (%9.3f ms) but the 'max_lat_at' is a timestamp which was printed > in sec (%13.6f s). This is confusing but they use effectively same > unit (usec) by using different number of digit after the period. > >> >> I must admit variable's name is bit misleading. Maybe just secs, msecs are >> better? >> Also just noticed u64 isn't needed for msecs. Will size down. > > IIUC you wanted usec for timestamp rather than msec, aren't you? Ugh. I was a dummy. My bad. Will fix this. > >> >>> >>>> >>>> if (!work_list->nb_atoms) >>>> return; >>>> @@ -1212,11 +1213,18 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_ >>>> >>>> avg = work_list->total_lat / work_list->nb_atoms; >>>> >>>> - printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13.6f s\n", >>>> + /* >>>> + * Avoid round up with printf to prevent event time discrepency >>>> + * between sched script and latency. >>>> + */ >>>> + max_lat_at_sec = work_list->max_lat_at / NSECS_PER_SEC; >>>> + max_lat_at_msec = (work_list->max_lat_at - >>>> + max_lat_at_sec * NSECS_PER_SEC) / MSECS_PER_SEC; >>>> + printf("+%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %6lu.%06lu s\n", >>> >>> Maybe you'd better to be in sync with the script code: >>> >>> if (PRINT_FIELD(TIME)) { >>> nsecs = sample->time; >>> secs = nsecs / NSECS_PER_SEC; >>> nsecs -= secs * NSECS_PER_SEC; >>> usecs = nsecs / NSECS_PER_USEC; >>> if (nanosecs) >>> printf("%5lu.%09llu: ", secs, nsecs); >>> else >>> printf("%5lu.%06lu: ", secs, usecs); >>> } >> >> Apart from variable name, I'm not quite sure what to sync because sched >> doesn't print in nsecs. >> Maybe you just wanted for variable names in sync rather than logic? > > You found inconsistency between the two, and wanted to show same time. > So wouldn't it be better factor out the code and reuse it from the > both callsite? If you don't need nsec, you could make it an option. > > But this is just a general suggestion and maybe it's not worth doing > here since the code is small.. So I won't insist on it strongly. I see what you're suggesting. Will try to factor out the common part in the next revision. Thanks, Joonwoo > > Thanks > Namhyung > >>> >>> >>>> (double)work_list->total_runtime / 1e6, >>>> work_list->nb_atoms, (double)avg / 1e6, >>>> (double)work_list->max_lat / 1e6, >>>> - (double)work_list->max_lat_at / 1e9); >>>> + max_lat_at_sec, max_lat_at_msec); >>>> } >>>> >>>> static int pid_cmp(struct work_atoms *l, struct work_atoms *r) >>>> -- >>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, >>>> hosted by The Linux Foundation >>>>