Date: Fri, 25 Nov 2016 08:26:01 +0900
From: Namhyung Kim <namhyung@kernel.org>
To: "Liang, Kan" <kan.liang@intel.com>
CC: Mark Rutland <mark.rutland@arm.com>,
        "peterz@infradead.org" <peterz@infradead.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "acme@kernel.org" <acme@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "alexander.shishkin@linux.intel.com" 
        <alexander.shishkin@linux.intel.com>,
        "tglx@linutronix.de" <tglx@linutronix.de>,
        "jolsa@kernel.org" <jolsa@kernel.org>,
        "Hunter, Adrian" <adrian.hunter@intel.com>,
        "wangnan0@huawei.com" <wangnan0@huawei.com>,
        "andi@firstfloor.org" <andi@firstfloor.org>
Subject: Re: [PATCH 02/14] perf/x86: output NMI overhead
Message-ID: <20161124232601.GB28557@sejong>
References: <1479894292-16277-1-git-send-email-kan.liang@intel.com>
 <1479894292-16277-3-git-send-email-kan.liang@intel.com>
 <20161124161712.GA2444@remoulade>
 <37D7C6CF3E00A74B8858931C1DB2F07750CA2D9D@SHSMSX103.ccr.corp.intel.com>
MIME-Version: 1.0
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F07750CA2D9D@SHSMSX103.ccr.corp.intel.com>

On Thu, Nov 24, 2016 at 07:40:21PM +0000, Liang, Kan wrote:
> 
> 
> > > @@ -1492,8 +1507,10 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> > >  	start_clock = sched_clock();
> > >  	ret = x86_pmu.handle_irq(regs);
> > >  	finish_clock = sched_clock();
> > > +	clock = finish_clock - start_clock;
> > >
> > > -	perf_sample_event_took(finish_clock - start_clock);
> > > +	perf_caculate_nmi_overhead(clock);
> > > +	perf_sample_event_took(clock);
> > 
> > Ah, so it's the *sampling* overhead, not the NMI overhead.
> > 
> > This doesn't take into account the cost of entering/exiting the handler,
> > which could be larger than the sampling overhead (e.g. if the PMU is
> > connected through chained interrupt controllers).
> > 
> > >  enum perf_record_overhead_type {
> > > +	PERF_NMI_OVERHEAD	= 0,
> > 
> > As above, it may be worth calling this PERF_SAMPLE_OVERHEAD; this
> 
> I think PERF_NMI stands for the NMI overhead in the perf part.
> 
> PERF_SAMPLE_OVERHEAD looks too generic, I think.
> It sounds like the sum of all overheads in sampling.
> After all, we collect the overhead at different stages of sampling:
> NMI handler, multiplexing, side-band events...
> 
> 
> > doesn't count the entire cost of the NMI, and other architectures may want
> > to implement this, yet don't have NMI.
> >
> 
> I think I can change it to PERF_X86_NMI_OVERHEAD, if you think it's clearer.
> Other architectures can implement their own overhead types and
> just ignore the NMI one.

I think it'd be better to make it arch-agnostic if possible.  What
about PERF_PMU_OVERHEAD or PERF_PMU_SAMPLE_OVERHEAD?
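
A rough userspace sketch of the arch-agnostic naming, for illustration
only.  The enum name PERF_PMU_SAMPLE_OVERHEAD is one of the two
suggestions above; PERF_OVERHEAD_MAX, struct perf_overhead_entry, the
overhead[] array and perf_log_overhead() are hypothetical names, not
taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical arch-agnostic overhead types.  Nothing here refers to
 * NMIs, so an architecture without NMI could reuse the same type.
 */
enum perf_record_overhead_type {
	PERF_PMU_SAMPLE_OVERHEAD = 0,	/* time spent in the PMU interrupt handler */
	PERF_OVERHEAD_MAX,		/* hypothetical sentinel, not in the patch */
};

/* Hypothetical per-type accumulator. */
struct perf_overhead_entry {
	uint64_t nr;	/* number of measured handler invocations */
	uint64_t time;	/* accumulated handler time, in ns */
};

static struct perf_overhead_entry overhead[PERF_OVERHEAD_MAX];

/* Account one measured interval (finish_clock - start_clock) to @type. */
static void perf_log_overhead(enum perf_record_overhead_type type,
			      uint64_t clock)
{
	assert(type < PERF_OVERHEAD_MAX);
	overhead[type].nr++;
	overhead[type].time += clock;
}
```

An arch interrupt handler would then pass its own measured interval with
PERF_PMU_SAMPLE_OVERHEAD, whether or not the PMU interrupt is an NMI.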

Thanks,
Namhyung

> 
> 
>  
> > > @@ -1872,7 +1873,7 @@ __perf_remove_from_context(struct perf_event *event,
> > >  {
> > >  	unsigned long flags = (unsigned long)info;
> > >
> > > -	event_sched_out(event, cpuctx, ctx);
> > > +	event_sched_out(event, cpuctx, ctx, false);
> > >  	if (flags & DETACH_GROUP)
> > >  		perf_group_detach(event);
> > >  	list_del_event(event, ctx);
> > > @@ -1918,9 +1919,9 @@ static void __perf_event_disable(struct perf_event *event,
> > >  	update_cgrp_time_from_event(event);
> > >  	update_group_times(event);
> > >  	if (event == event->group_leader)
> > > -		group_sched_out(event, cpuctx, ctx);
> > > +		group_sched_out(event, cpuctx, ctx, true);
> > >  	else
> > > -		event_sched_out(event, cpuctx, ctx);
> > > +		event_sched_out(event, cpuctx, ctx, true);
> > 
> > Why does this differ from __perf_remove_from_context()?
> > 
> 
> Both of them are called when removing an event, so I think we only
> need to log the overhead in one place.
> 
> I just ran some tests. It looks like __perf_remove_from_context is called
> after __perf_event_disable.
> I think I will log the overhead in __perf_remove_from_context in the next
> version.
> 
> 
> > What's the policy for when we do or do not measure overhead?
> 
> Currently, it's enabled all the time.
> Jirka suggested making it configurable. I will do that in the next version.
> For the next version, I still prefer to keep it enabled by default, since
> it doesn't bring additional overhead based on my tests.
> 
> Thanks,
> Kan