Received: by 2002:a25:ef43:0:0:0:0:0 with SMTP id w3csp92392ybm; Tue, 26 May 2020 11:34:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxY9U3aeIshd1dGPi5VndPPemgIhFq/gjRu3PmukQq1qBMdVaMGXh2KGKHrFdORsbyJFnZE X-Received: by 2002:a17:906:f44:: with SMTP id h4mr2315348ejj.38.1590518044715; Tue, 26 May 2020 11:34:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590518044; cv=none; d=google.com; s=arc-20160816; b=vl1HjAoyzZm2OWOBvEtwk3vce6/WA5Uoybi5UEuVruJlSSw6iSWXT7iTJK0HzcZsGm H8Sw6J7AZjYf9Kwq96rod/tYSjAoKa0IWLuG2Ho2KAtiX6f93LdI+HF1BCFeBuWdt1Rg rO25PIprULeZNAbg1CnmfLVKZhH/0F/+v5NdDX2w863WNKG+Aj1CqESv6ycvKsQQPXlb XclHw8VSyPXG+MgiaFs5loBn2uOgrb0TwEhzNVrVM127SH2cLHxzEWVbAAxhvqFFslG+ l7AqaYZWt/Tz6Bx7EWsrmgreLYO1KPlbfX4UqGBmmXCZ7X+tKp+rJK4kpv7BXDRMHGeg chYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from:ironport-sdr:ironport-sdr; bh=rTmHdFCUvHCtmmfPH0kPsggynIChePeC4M6Hcs1O5vM=; b=V7pe7jxeMHdg5QAtJOrznJc1LkOLJ6ohWlUPAZ51FJJ289SPG9jITWf73eUErh4WDi T3hZL9NNDMi94gLJuZ5pB3yFEDWsOk6O9PIoE1StX784Qqm21db9ORKyxb7DkT/1rqki mdjkL5wiTSoUt6iI7QNnLgufbMRRaE1Hh3yT/KInOw5mbfh3yx5DyqNc04hk0Uk6rmPW mS8/NXUHV3kTBZUhHDgYpDIX4yEptxdAAd1Gw0tLAVRqloSssLBv1c+C14rOMSbKGg+y B45tmAbrcNEJyZnXuaHRDLrnyymZ2ON2gzR3uYJfhDv7oom/z979myZFBdh0dq1YST9Q Ogmg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id lz8si371647ejb.721.2020.05.26.11.33.40; Tue, 26 May 2020 11:34:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388761AbgEZS3e (ORCPT + 99 others); Tue, 26 May 2020 14:29:34 -0400 Received: from mga05.intel.com ([192.55.52.43]:29498 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387399AbgEZS3d (ORCPT ); Tue, 26 May 2020 14:29:33 -0400 IronPort-SDR: 6JvKWslHFxzY6xC33lNmEhkJEBVDbY0WmgWSo92TmqLJslRwqvBc4H7WVe3L7gTmQBolx43RKZ v072yFc2lz/w== X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 May 2020 11:29:02 -0700 IronPort-SDR: HEMx0RmOfY0I6ir8uYfKkspz4iaciu8zoN6YYVK+qBE3zJ8GCkEyLvKMRSONGhuuymtbdl74nc r+vqQYA/XOQA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,437,1583222400"; d="asc'?scan'208";a="375782186" Received: from sbensado-mobl1.ger.corp.intel.com (HELO otredad) ([10.252.58.193]) by fmsmga001.fm.intel.com with ESMTP; 26 May 2020 11:28:57 -0700 From: Francisco Jerez To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , "Rafael J. Wysocki" , Linux PM , LKML , Len Brown , Srinivas Pandruvada , Peter Zijlstra , Giovanni Gherdovich , Doug Smythies Subject: Re: [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled In-Reply-To: References: <3169564.ZRsPWhXyMD@kreacher> <87mu5wre1v.fsf@intel.com> <87a71vraus.fsf@intel.com> Date: Tue, 26 May 2020 11:29:09 -0700 Message-ID: <874ks2r1my.fsf@intel.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --==-=-= Content-Type: multipart/mixed; boundary="=-=-=" --=-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline "Rafael J. Wysocki" writes: > to On Mon, May 25, 2020 at 10:57 PM Francisco Jerez > wrote: >> >> "Rafael J. Wysocki" writes: >> >> > On Mon, May 25, 2020 at 3:39 AM Francisco Jerez >> > wrote: >> >> >> >> "Rafael J. Wysocki" writes: >> >> >> >> > From: Rafael J. Wysocki >> >> > >> >> > Allow intel_pstate to work in the passive mode with HWP enabled and >> >> > make it translate the target frequency supplied by the cpufreq >> >> > governor in use into an EPP value to be written to the HWP request >> >> > MSR (high frequencies are mapped to low EPP values that mean more >> >> > performance-oriented HWP operation) as a hint for the HWP algorithm >> >> > in the processor, so as to prevent it and the CPU scheduler from >> >> > working against each other at least when the schedutil governor is >> >> > in use. >> >> > >> >> > Signed-off-by: Rafael J. Wysocki >> >> > --- >> >> > >> >> > This is a prototype not intended for production use (based on linux-next). >> >> > >> >> > Please test it if you can (on HWP systems, of course) and let me know the >> >> > results. >> >> > >> >> > The INTEL_CPUFREQ_TRANSITION_DELAY_HWP value has been guessed and it very well >> >> > may turn out to be either too high or too low for the general use, which is one >> >> > reason why getting as much testing coverage as possible is key here. >> >> > >> >> > If you can play with different INTEL_CPUFREQ_TRANSITION_DELAY_HWP values, >> >> > please do so and let me know the conclusions. >> >> > >> >> > Cheers, >> >> > Rafael >> >> > >> >> > --- >> >> > drivers/cpufreq/intel_pstate.c | 169 +++++++++++++++++++++++++++++++---------- >> >> > 1 file changed, 131 insertions(+), 38 deletions(-) >> >> > >> >> > Index: linux-pm/drivers/cpufreq/intel_pstate.c >> >> > =================================================================== >> >> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c >> >> > +++ linux-pm/drivers/cpufreq/intel_pstate.c >> >> > @@ -36,6 +36,7 @@ >> >> > #define INTEL_PSTATE_SAMPLING_INTERVAL (10 * NSEC_PER_MSEC) >> >> > >> >> > #define INTEL_CPUFREQ_TRANSITION_LATENCY 20000 >> >> > +#define INTEL_CPUFREQ_TRANSITION_DELAY_HWP 5000 >> >> > #define INTEL_CPUFREQ_TRANSITION_DELAY 500 >> >> > >> >> > #ifdef CONFIG_ACPI >> >> > @@ -95,6 +96,8 @@ static inline int32_t percent_ext_fp(int >> >> > return div_ext_fp(percent, 100); >> >> > } >> >> > >> >> > +#define HWP_EPP_TO_BYTE(x) (((u64)x >> 24) & 0xFF) >> >> > + >> >> > /** >> >> > * struct sample - Store performance sample >> >> > * @core_avg_perf: Ratio of APERF/MPERF which is the actual average >> >> > @@ -2175,7 +2178,10 @@ static int intel_pstate_verify_policy(st >> >> > >> >> > static void intel_cpufreq_stop_cpu(struct cpufreq_policy *policy) >> >> > { >> >> > - intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]); >> >> > + if (hwp_active) >> >> > + intel_pstate_hwp_force_min_perf(policy->cpu); >> >> > + else >> >> > + intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]); >> >> > } >> >> > >> >> > static void intel_pstate_stop_cpu(struct cpufreq_policy *policy) >> >> > @@ -2183,12 +2189,10 @@ static void intel_pstate_stop_cpu(struct >> >> > pr_debug("CPU %d exiting\n", policy->cpu); >> >> > >> >> > intel_pstate_clear_update_util_hook(policy->cpu); >> >> > - if (hwp_active) { >> >> > + if (hwp_active) >> >> > intel_pstate_hwp_save_state(policy); >> >> > - intel_pstate_hwp_force_min_perf(policy->cpu); >> >> > - } else { >> >> > - intel_cpufreq_stop_cpu(policy); >> >> > - } >> >> > + >> >> > + intel_cpufreq_stop_cpu(policy); >> >> > } >> >> > >> >> > static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) >> >> > @@ -2296,7 +2300,8 @@ static int intel_cpufreq_verify_policy(s >> >> > #define INTEL_PSTATE_TRACE_TARGET 10 >> >> > #define INTEL_PSTATE_TRACE_FAST_SWITCH 90 >> >> > >> >> > -static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, int old_pstate) >> >> > +static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, >> >> > + int from, int to) >> >> > { >> >> > struct sample *sample; >> >> > >> >> > @@ -2309,8 +2314,8 @@ static void intel_cpufreq_trace(struct c >> >> > sample = &cpu->sample; >> >> > trace_pstate_sample(trace_type, >> >> > 0, >> >> > - old_pstate, >> >> > - cpu->pstate.current_pstate, >> >> > + from, >> >> > + to, >> >> > sample->mperf, >> >> > sample->aperf, >> >> > sample->tsc, >> >> > @@ -2318,40 +2323,110 @@ static void intel_cpufreq_trace(struct c >> >> > fp_toint(cpu->iowait_boost * 100)); >> >> > } >> >> > >> >> > -static int intel_cpufreq_target(struct cpufreq_policy *policy, >> >> > - unsigned int target_freq, >> >> > - unsigned int relation) >> >> > +static void intel_cpufreq_update_hwp_request(struct cpudata *cpu, u8 new_epp) >> >> > { >> >> > - struct cpudata *cpu = all_cpu_data[policy->cpu]; >> >> > - struct cpufreq_freqs freqs; >> >> > - int target_pstate, old_pstate; >> >> > + u64 value, prev; >> >> > >> >> > - update_turbo_state(); >> >> > + prev = READ_ONCE(cpu->hwp_req_cached); >> >> > + value = prev; >> >> > >> >> > - freqs.old = policy->cur; >> >> > - freqs.new = target_freq; >> >> > + /* >> >> > + * The entire MSR needs to be updated in order to update the EPP field >> >> > + * in it, so opportunistically update the min and max too if needed. >> >> > + */ >> >> > + value &= ~HWP_MIN_PERF(~0L); >> >> > + value |= HWP_MIN_PERF(cpu->min_perf_ratio); >> >> > + >> >> > + value &= ~HWP_MAX_PERF(~0L); >> >> > + value |= HWP_MAX_PERF(cpu->max_perf_ratio); >> >> > + >> >> > + if (boot_cpu_has(X86_FEATURE_HWP_EPP)) { >> >> > + intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_TARGET, >> >> > + HWP_EPP_TO_BYTE(prev), new_epp); >> >> > + >> >> > + value &= ~GENMASK_ULL(31, 24); >> >> > + value |= HWP_ENERGY_PERF_PREFERENCE(new_epp); >> >> > + } >> >> > + >> >> > + if (value != prev) { >> >> > + WRITE_ONCE(cpu->hwp_req_cached, value); >> >> > + wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value); >> >> > + } >> >> > +} >> >> > + >> >> > +/** >> >> > + * intel_cpufreq_adjust_hwp_request - Adjust the HWP reuqest register. >> >> > + * @cpu: Target CPU. >> >> > + * @max_freq: Maximum frequency to consider. >> >> > + * @target_freq: Target frequency selected by the governor. >> >> > + * >> >> > + * Translate the target frequency into a new EPP value to be written into the >> >> > + * HWP request MSR of @cpu as a hint for the HW-driven P-state selection. >> >> > + * >> >> > + * The purpose of this is to avoid situations in which the kernel and the HWP >> >> > + * algorithm work against each other by giving a hint about the expectations of >> >> > + * the former to the latter. >> >> > + * >> >> > + * The mapping betweeen the target frequencies and the hint values need not be >> >> > + * exact, but it must be monotonic, so that higher target frequencies always >> >> > + * indicate more performance-oriented P-state selection. >> >> > + */ >> >> > +static void intel_cpufreq_adjust_hwp_request(struct cpudata *cpu, s64 max_freq, >> >> > + unsigned int target_freq) >> >> > +{ >> >> > + s64 epp_fp = div_fp(255 * (max_freq - target_freq), max_freq); >> >> > + >> >> > + intel_cpufreq_update_hwp_request(cpu, fp_toint(epp_fp)); >> >> > +} >> >> > + >> >> >> >> Hey Rafael, I'm building a kernel with this in order to give it a try on >> >> my system, but I'm skeptical that translating the target frequency to an >> >> EPP value will work reliably. AFAIA the EPP value only has an indirect >> >> influence on the processor's performance by adjusting the trade-off >> >> between its responsiveness (rather than the actual clock frequency which >> >> it will sustain in the long run) and its energy usage, in a largely >> >> unspecified and non-linear way (non-linear like the effect of switching >> >> CPU energy optimization features on and off, or like its effect on the >> >> energy balancing behavior of the processor which can have a paradoxical >> >> effect on performance). >> >> >> >> I doubt that the scheduling-based CPU utilization is a good predictor >> >> for the optimal trade-off between those two variables. >> > >> > While I agree that this is not perfect, there barely is anything else >> > that can be used for this purpose. >> > >> > Using the desired field or trying to adjust the limits relatively >> > often would basically cause the P-state selection to be driven >> > entirely by the kernel which simply doesn't know certain things only >> > known to the P-unit, so it is reasonable to leave some level of >> > control to the latter, so as to allow it to use the information known >> > to it only. >> > >> > However, if it is allowed to do whatever it likes without any hints >> > from the kernel, it may very well go against the scheduler's decisions >> > which is not going to be optimal. >> > >> > I'm simply not sure if there is any other way to give such hints to it >> > that through the EPP. >> > >> >> Why not HWP_MIN_PERF? That would leave the HWP quite some room for >> maneuvering (the whole HWP_MIN_PERF-HWP_MAX_PERF P-state range, it's not >> like P-state selection would be entirely driven by the kernel), while >> avoiding every failure scenario I was describing in my previous reply. > > Actually, I have been thinking about the HWP min as an alternative > that may be worth evaluating. > > However, I would rather set the HWP min to something like 80% if the > cpufreq request. > > Let me cut an alternative patch. Heh... Good. While you're at it, it would be nice to make sure we have a way for the governor to control both the minimum and maximum P-state in some efficient way, rather than it simply providing a single target, with [1] in mind. [1] https://lwn.net/Articles/818827/ --=-=-=-- --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEAREIAB0WIQST8OekYz69PM20/4aDmTidfVK/WwUCXs1f9QAKCRCDmTidfVK/ W1EmAP9wqhC2wCS7v7W29lEWnoDf9nF0ze6Miuqp/Oak6bSfGAD/SfjfpguqUC3a 7TQlvs6LxFD6tq7CUXtEsRGBV5WH8uA= =zsUH -----END PGP SIGNATURE----- --==-=-=--