From: "Liang, Kan"
To: Mark Rutland
CC: "peterz@infradead.org", "mingo@redhat.com", "acme@kernel.org", "linux-kernel@vger.kernel.org", "alexander.shishkin@linux.intel.com", "tglx@linutronix.de", "namhyung@kernel.org", "jolsa@kernel.org", "Hunter, Adrian", "wangnan0@huawei.com", "andi@firstfloor.org"
Subject: RE: [PATCH 02/14] perf/x86: output NMI overhead
Date: Thu, 24 Nov 2016 19:40:21 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07750CA2D9D@SHSMSX103.ccr.corp.intel.com>
References: <1479894292-16277-1-git-send-email-kan.liang@intel.com> <1479894292-16277-3-git-send-email-kan.liang@intel.com> <20161124161712.GA2444@remoulade>
In-Reply-To: <20161124161712.GA2444@remoulade>

> > @@ -1492,8 +1507,10 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> >      start_clock = sched_clock();
> >      ret = x86_pmu.handle_irq(regs);
> >      finish_clock = sched_clock();
> > +    clock = finish_clock - start_clock;
> >
> > -    perf_sample_event_took(finish_clock - start_clock);
> > +    perf_caculate_nmi_overhead(clock);
> > +    perf_sample_event_took(clock);
>
> Ah, so it's the *sampling* overhead, not the NMI overhead.
>
> This doesn't take into account the cost of entering/exiting the handler,
> which could be larger than the sampling overhead (e.g. if the PMU is
> connected through chained interrupt controllers).
>
> > enum perf_record_overhead_type {
> > +    PERF_NMI_OVERHEAD    = 0,
>
> As above, it may be worth calling this PERF_SAMPLE_OVERHEAD; this

I think PERF_NMI stands for the NMI overhead in the perf part.
PERF_SAMPLE_OVERHEAD looks too generic to me. It sounds like the sum of
all overheads in sampling, whereas we collect the overhead at different
stages of sampling: NMI handler, multiplexing, side-band events...

> doesn't count the entire cost of the NMI, and other architectures may want
> to implement this, yet don't have NMI.
>

I can change it to PERF_X86_NMI_OVERHEAD if you think that is clearer.
Other architectures can implement their own overhead types and simply
ignore the NMI one.
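To illustrate the measurement window discussed above, here is a minimal user-space sketch, not the patch's implementation. It times only the handler body, as the start_clock/finish_clock pair in the quoted hunk does, so the cost of entering and exiting the handler falls outside the window. The names overhead_entry, now_ns, overhead_account, timed_handler and dummy_handler are hypothetical, and clock_gettime() is assumed here as a stand-in for sched_clock().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical accumulator; not the record layout used by the patch. */
struct overhead_entry {
    uint64_t nr;    /* number of handler invocations accounted */
    uint64_t time;  /* accumulated handler time in nanoseconds */
};

/* Monotonic time in ns; stands in for sched_clock() in user space. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Add one measured interval to the running totals. */
static void overhead_account(struct overhead_entry *e, uint64_t delta_ns)
{
    assert(e != NULL);
    e->nr++;
    e->time += delta_ns;
}

/*
 * Time only the handler body. Whatever it costs to reach this function
 * and to return from it is not part of the measured interval.
 */
static int timed_handler(struct overhead_entry *e,
                         int (*handler)(void *), void *arg)
{
    uint64_t start = now_ns();
    int ret = handler(arg);
    uint64_t finish = now_ns();

    overhead_account(e, finish - start);
    return ret;
}

/* Placeholder handler that reports one handled event. */
static int dummy_handler(void *arg)
{
    (void)arg;
    return 1;
}
```

Calling timed_handler(&entry, dummy_handler, NULL) in a loop accumulates the per-call time in entry.time and the call count in entry.nr. Capturing entry and exit cost would require taking the timestamps outside the handler.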
> > @@ -1872,7 +1873,7 @@ __perf_remove_from_context(struct perf_event *event,
> >  {
> >      unsigned long flags = (unsigned long)info;
> >
> > -    event_sched_out(event, cpuctx, ctx);
> > +    event_sched_out(event, cpuctx, ctx, false);
> >      if (flags & DETACH_GROUP)
> >          perf_group_detach(event);
> >      list_del_event(event, ctx);
> > @@ -1918,9 +1919,9 @@ static void __perf_event_disable(struct perf_event *event,
> >      update_cgrp_time_from_event(event);
> >      update_group_times(event);
> >      if (event == event->group_leader)
> > -        group_sched_out(event, cpuctx, ctx);
> > +        group_sched_out(event, cpuctx, ctx, true);
> >      else
> > -        event_sched_out(event, cpuctx, ctx);
> > +        event_sched_out(event, cpuctx, ctx, true);
>
> Why does this differ from __perf_remove_from_context()?
>

Both of them are called when removing an event, so I think we only need
to log the overhead in one place. I just ran some tests, and it looks
like __perf_remove_from_context is called after __perf_event_disable.
I will log the overhead in __perf_remove_from_context in the next version.

> What's the policy for when we do or do not measure overhead?

Currently, it's enabled all the time. Jirka suggested that I make it
configurable, and I will do that in the next version. I would still
prefer to keep it enabled by default since, based on my tests, it
doesn't bring additional overhead.

Thanks,
Kan