Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754944AbbKRHbg (ORCPT ); Wed, 18 Nov 2015 02:31:36 -0500 Received: from szxga02-in.huawei.com ([119.145.14.65]:21693 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751852AbbKRHbe (ORCPT ); Wed, 18 Nov 2015 02:31:34 -0500 Message-ID: <564C283E.6070306@huawei.com> Date: Wed, 18 Nov 2015 15:26:54 +0800 From: "Wangnan (F)" User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Namhyung Kim CC: Jiri Olsa , Arnaldo Carvalho de Melo , lkml , David Ahern , "Peter Zijlstra" , Ingo Molnar , Milian Wolff Subject: Re: [PATCH 2/3] perf tools: Add callchain order support for libunwind DWARF unwinder References: <1447772739-18471-1-git-send-email-jolsa@kernel.org> <1447772739-18471-3-git-send-email-jolsa@kernel.org> <564BFAD4.8060909@huawei.com> <20151118054114.GQ7062@sejong> In-Reply-To: <20151118054114.GQ7062@sejong> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.111.66.109] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.564C2948.0018,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 5108a053352c28b01c2f0c177b4b4355 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5288 Lines: 159 On 2015/11/18 13:41, Namhyung Kim wrote: > On Wed, Nov 18, 2015 at 12:13:08PM +0800, Wangnan (F) wrote: >> >> On 2015/11/17 23:05, Jiri Olsa wrote: >>> From: Jiri Olsa >>> >>> As reported by Milian, currently for DWARF unwind (both libdw >>> and libunwind) we display callchain in callee order only. >>> >>> Adding the support to follow callchain order setup to libunwind >>> DWARF unwinder, so we could get following output for report: >>> >>> $ perf record --call-graph dwarf ls >>> ... >>> $ perf report --no-children --stdio >>> >>> 39.26% ls libc-2.21.so [.] __strcoll_l >>> | >>> ---__strcoll_l >>> mpsort_with_tmp >>> mpsort_with_tmp >>> sort_files >>> main >>> __libc_start_main >>> _start >>> 0 >>> >>> $ perf report -g caller --no-children --stdio >>> ... >>> 39.26% ls libc-2.21.so [.] __strcoll_l >>> | >>> ---0 >>> _start >>> __libc_start_main >>> main >>> sort_files >>> mpsort_with_tmp >>> mpsort_with_tmp >>> __strcoll_l >>> >>> Reported-by: Milian Wolff >>> Based-on-patch-by: Milian Wolff >>> Link: http://lkml.kernel.org/n/tip-lmtbeqm403f3luw4jkjevsi5@git.kernel.org >>> Signed-off-by: Jiri Olsa >>> --- >>> tools/perf/util/unwind-libunwind.c | 47 ++++++++++++++++++++++++-------------- >>> 1 file changed, 30 insertions(+), 17 deletions(-) >>> >>> diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c >>> index 0ae8844fe7a6..705e1c19f1ea 100644 >>> --- a/tools/perf/util/unwind-libunwind.c >>> +++ b/tools/perf/util/unwind-libunwind.c >> [SNIP] >> >>> - unw_get_reg(&c, UNW_REG_IP, &ip); >>> - ret = ip ? entry(ip, ui->thread, cb, arg) : 0; >> In original code if ip == 0 entry() won't be called. >> >>> + if (callchain_param.order == ORDER_CALLER) >>> + j = max_stack - i - 1; >>> + ret = entry(ips[j], ui->thread, cb, arg); >> But in new code event if ips[j] == 0 an entry will be built, which causes >> a behavior changes user noticable: >> >> Before this patch: >> >> >> # perf report --no-children --stdio --call-graph=callee >> ... >> 3.38% a.out a.out [.] funcc >> | >> ---funcc >> | >> --2.70%-- funcb >> funca >> main >> __libc_start_main >> _start >> >> After this patch: >> >> # perf report --no-children --stdio --call-graph=callee >> ... >> 3.38% a.out a.out [.] funcc >> | >> ---funcc >> | >> |--2.70%-- funcb >> | funca >> | main >> | __libc_start_main >> | _start >> | >> --0.68%-- 0 >> >> >> I'm not sure whether we can regard this behavior changing as a bugfix? I >> think >> there may be some reason the original code explicitly avoid creating an '0' >> entry. > I think callchain value being 0 is an error or marker for the end of > callchain. So it'd be better avoiding 0 entry. > > But unfortunately, we have many 0 entries (and broken callchain after > them) with fp recording on optimized binaries. I think we should omit > those callchains. > > Maybe something like this? > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > index 5ef90be2a249..22642c5719ab 100644 > --- a/tools/perf/util/machine.c > +++ b/tools/perf/util/machine.c > @@ -1850,6 +1850,15 @@ static int thread__resolve_callchain_sample(struct thread *thread, > #endif > ip = chain->ips[j]; > > + /* callchain value inside zero page means it's broken, stop */ > + if (ip < 4096) { > + if (callchain_param.order == ORDER_CALLER) { > + callchain_cursor_reset(&callchain_cursor); > + continue; > + } else > + break; > + } > + > err = add_callchain_ip(thread, parent, root_al, &cpumode, ip); > > if (err) Then we totally get rid of 0 entries, but how can we explain the sum of overhead of different branches? Is it possible to explicitly tell user the place where perf failed to unwind call stack? For example: 3.38% a.out a.out [.] funcc | ---funcc | |--2.70%-- funcb | funca | main | __libc_start_main | _start | --0.68%-- (unwind failure) Thank you. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/