Date: Fri, 23 Jun 2017 23:46:50 +0200 (CEST)
From: Thomas Gleixner <tglx@linutronix.de>
To: Kan Liang <kan.liang@intel.com>
cc: peterz@infradead.org, mingo@redhat.com, eranian@google.com,
        linux-kernel@vger.kernel.org, alexander.shishkin@linux.intel.com,
        acme@redhat.com, jolsa@redhat.com, torvalds@linux-foundation.org,
        vincent.weaver@maine.edu, ak@linux.intel.com
Subject: Re: [PATCH V2 1/2] perf/x86/intel: enable CPU ref_cycles for GP
 counter
In-Reply-To: <1497029283-3332-1-git-send-email-kan.liang@intel.com>
Message-ID: <alpine.DEB.2.20.1706232315230.2234@nanos>
References: <1497029283-3332-1-git-send-email-kan.liang@intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2847
Lines: 76

On Fri, 9 Jun 2017, kan.liang@intel.com wrote:
> User visible RDPMC issue
> It is impossible to transparently handle the case, which reading
> ref-cycles by RDPMC instruction in user space.
> ref_cycles_factor_for_rdpmc is exposed for RDPMC user.
> For the few users who want to read ref-cycles by RDPMC, they have to
> correct the result by multipling ref_cycles_factor_for_rdpmc manually,
> if GP counter is used.

That's not how it works. You cannot nilly willy define what existing users
have to do and thereby break existing setups and use cases.

It does not matter how many patches you send to pmu tools, PAPI and
whatever. It'll take ages until they permeate into distros, but until that
happens users are stranded.

The _existing_ user space interface is also used by power users directly
and we are not going to break that just because you think there are just a
few users and they should accomodate.

If you want to move the watchdog NMI to ref cycles, then this needs to be
on a opt in basis via kernel command line and perhaps a config option
which is default n.

>  __init int intel_pmu_init(void)
>  {
>  	union cpuid10_edx edx;
> @@ -3775,7 +3839,11 @@ __init int intel_pmu_init(void)
>  
>  		intel_pmu_pebs_data_source_nhm();
>  		x86_add_quirk(intel_nehalem_quirk);
> -
> +		x86_pmu.ref_cycles_factor = nhm_get_ref_cycles_factor();
> +		if (x86_pmu.ref_cycles_factor) {

Why would this be 0? 

> +			x86_pmu.ref_cycles_rep = nhm_ref_cycles_rep;

What's the point of this function pointer? All implementations use the same
callback and I don't see any reason why this would change in the forseeable
future. If there is a reason you better document it.

The function can be called directly w/o indirection.. And the conditionals
can look at ref_cycles_factor to figure out whether it can be done or not.

And with that you can spare the whole exercise of sprinkling the same
factor_attrs thingy all over the place. You simply can check
ref_cycles_factor in the attribute setup code and conditionally merge
intel_ref_cycles_factor_attrs.

> +			x86_pmu.factor_attrs = intel_ref_cycles_factor_attrs;
> +		}
>  		pr_cont("Nehalem events, ");
>  		break;
>  
> @@ -3835,6 +3903,11 @@ __init int intel_pmu_init(void)
>  		x86_pmu.lbr_pt_coexist = true;
>  		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>  		x86_pmu.cpu_events = glm_events_attrs;
> +		x86_pmu.ref_cycles_factor = glm_get_ref_cycles_factor();
> +		if (x86_pmu.ref_cycles_factor) {

Copy and paste "programming" ... The factor is hard coded '1' as you
documented also in the changelog. So why do you need an extra function to
retrieve 1 and a sanity check whether the function actually returned 1?

> +			x86_pmu.ref_cycles_rep = nhm_ref_cycles_rep;
> +			x86_pmu.factor_attrs = intel_ref_cycles_factor_attrs;
> +		}
>  		pr_cont("Goldmont events, ");
>  		break;

Thanks,

	tglx