In-Reply-To: <20180531225143.34270-2-srinivas.pandruvada@linux.intel.com>
References: <20180531225143.34270-1-srinivas.pandruvada@linux.intel.com>
 <20180531225143.34270-2-srinivas.pandruvada@linux.intel.com>
From: "Rafael J. Wysocki"
Date: Tue, 5 Jun 2018 11:27:11 +0200
Subject: Re: [RFC/RFT] [PATCH v3 1/4] cpufreq: intel_pstate: Add HWP boost utility and sched util hooks
To: Srinivas Pandruvada
Cc: Len Brown, "Rafael J. Wysocki", Peter Zijlstra, Mel Gorman, Linux PM,
 Linux Kernel Mailing List, Juri Lelli, Viresh Kumar
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 1, 2018 at 12:51 AM, Srinivas Pandruvada wrote:
> Added two utility functions to HWP boost up gradually and boost down to
> the default cached HWP request values.
>
> Boost up:
> Boost up updates the HWP request minimum value in steps. This minimum
> value can go up to the HWP request maximum value, depending on how
> frequently this boost up function is called. At most, boost up takes
> three steps to reach the maximum, depending on the current HWP request
> levels and HWP capabilities. For example, if the current settings are:
> If P0 (Turbo max) = P1 (Guaranteed max) = min
>         No boost at all.
> If P0 (Turbo max) > P1 (Guaranteed max) = min
>         Should result in one level boost only for P0.
> If P0 (Turbo max) = P1 (Guaranteed max) > min
>         Should result in two level boost:
>         (min + p1)/2 and P1.
> If P0 (Turbo max) > P1 (Guaranteed max) > min
>         Should result in three level boost:
>         (min + p1)/2, P1 and P0.
> We don't set any level between P0 and P1 as there is no guarantee that
> they will be honored.
>
> Boost down:
> After the system is idle for a hold time of 3ms, the HWP request is
> reset to the default value from HWP init or the user-modified one via
> sysfs.
>
> Caching of HWP Request and Capabilities
> Store the HWP request value last set via MSR_HWP_REQUEST and the value
> read from MSR_HWP_CAPABILITIES. This avoids reading the MSRs in the
> boost utility functions.
>
> The limits calculated by these boost utility functions are based on the
> latest HWP request value, which can be modified by the setpolicy()
> callback. So if user space modifies the minimum perf value, that will
> be accounted for every time boost up is called. There can be contention
> with the user-modified minimum perf; in that case the user value takes
> precedence. For example, just before the HWP_REQUEST MSR is updated
> from the setpolicy() callback, the boost up function may be called via
> the scheduler tick callback. Here the cached MSR value is already the
> latest and the limits are updated based on the latest user limits, but
> on return the MSR write callback called from the setpolicy() callback
> will update the HWP_REQUEST value. This will be used until the next
> time the boost up function is called.
>
> In addition, add a variable to control HWP dynamic boosting. When HWP
> dynamic boost is active, set the HWP-specific update util hook. The
> contents of the utility hooks will be filled in by subsequent patches.
>
> Signed-off-by: Srinivas Pandruvada
> ---
>  drivers/cpufreq/intel_pstate.c | 99 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 95 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 17e566afbb41..80bf61ae4b1f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -221,6 +221,9 @@ struct global_params {
>   *                      preference/bias
>   * @epp_saved:          Saved EPP/EPB during system suspend or CPU offline
>   *                      operation
> + * @hwp_req_cached:     Cached value of the last HWP Request MSR
> + * @hwp_cap_cached:     Cached value of the last HWP Capabilities MSR
> + * @hwp_boost_min:      Last HWP boosted min performance
>   *
>   * This structure stores per CPU instance data for all CPUs.
>   */
> @@ -253,6 +256,9 @@ struct cpudata {
>         s16 epp_policy;
>         s16 epp_default;
>         s16 epp_saved;
> +       u64 hwp_req_cached;
> +       u64 hwp_cap_cached;
> +       int hwp_boost_min;

Why int? That's a register value, so maybe u32?
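
For illustration, hwp_boost_min only ever holds the 8-bit minimum
performance level that ends up in bits 7:0 of MSR_HWP_REQUEST, so an
unsigned type would mirror the register field (a sketch of the
suggestion, not a tested change):

        /*
         * MSR_HWP_REQUEST keeps the minimum performance level in
         * bits 7:0 and the maximum in bits 15:8; hwp_boost_min only
         * ever stores one of these 8-bit levels.
         */
        u32 hwp_boost_min;
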
>  };
>
>  static struct cpudata **all_cpu_data;
> @@ -285,6 +291,7 @@ static struct pstate_funcs pstate_funcs __read_mostly;
>
>  static int hwp_active __read_mostly;
>  static bool per_cpu_limits __read_mostly;
> +static bool hwp_boost __read_mostly;
>
>  static struct cpufreq_driver *intel_pstate_driver __read_mostly;
>
> @@ -689,6 +696,7 @@ static void intel_pstate_get_hwp_max(unsigned int cpu, int *phy_max,
>         u64 cap;
>
>         rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap);
> +       WRITE_ONCE(all_cpu_data[cpu]->hwp_cap_cached, cap);
>         if (global.no_turbo)
>                 *current_max = HWP_GUARANTEED_PERF(cap);
>         else
> @@ -763,6 +771,7 @@ static void intel_pstate_hwp_set(unsigned int cpu)
>                 intel_pstate_set_epb(cpu, epp);
>         }
>  skip_epp:
> +       WRITE_ONCE(cpu_data->hwp_req_cached, value);
>         wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
>  }
>
> @@ -1381,6 +1390,81 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
>         intel_pstate_set_min_pstate(cpu);
>  }
>
> +/*
> + * Long hold time will keep high perf limits for long time,
> + * which negatively impacts perf/watt for some workloads,
> + * like specpower. 3ms is based on experiements on some
> + * workoads.
> + */
> +static int hwp_boost_hold_time_ms = 3;
> +
> +static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
> +{
> +       u64 hwp_req = READ_ONCE(cpu->hwp_req_cached);
> +       int max_limit = (hwp_req & 0xff00) >> 8;
> +       int min_limit = (hwp_req & 0xff);
> +       int boost_level1;
> +
> +       /*
> +        * Cases to consider (User changes via sysfs or boot time):
> +        * If, P0 (Turbo max) = P1 (Guaranteed max) = min:
> +        *       No boost, return.
> +        * If, P0 (Turbo max) > P1 (Guaranteed max) = min:
> +        *       Should result in one level boost only for P0.
> +        * If, P0 (Turbo max) = P1 (Guaranteed max) > min:
> +        *       Should result in two level boost:
> +        *       (min + p1)/2 and P1.
> +        * If, P0 (Turbo max) > P1 (Guaranteed max) > min:
> +        *       Should result in three level boost:
> +        *       (min + p1)/2, P1 and P0.
> +        */
> +
> +       /* If max and min are equal or already at max, nothing to boost */
> +       if (max_limit == min_limit || cpu->hwp_boost_min >= max_limit)
> +               return;
> +
> +       if (!cpu->hwp_boost_min)
> +               cpu->hwp_boost_min = min_limit;
> +
> +       /* level at half way mark between min and guranteed */
> +       boost_level1 = (HWP_GUARANTEED_PERF(cpu->hwp_cap_cached) + min_limit) >> 1;
> +
> +       if (cpu->hwp_boost_min < boost_level1)
> +               cpu->hwp_boost_min = boost_level1;
> +       else if (cpu->hwp_boost_min < HWP_GUARANTEED_PERF(cpu->hwp_cap_cached))
> +               cpu->hwp_boost_min = HWP_GUARANTEED_PERF(cpu->hwp_cap_cached);
> +       else if (cpu->hwp_boost_min == HWP_GUARANTEED_PERF(cpu->hwp_cap_cached) &&
> +                max_limit != HWP_GUARANTEED_PERF(cpu->hwp_cap_cached))
> +               cpu->hwp_boost_min = max_limit;
> +       else
> +               return;
> +
> +       hwp_req = (hwp_req & ~GENMASK_ULL(7, 0)) | cpu->hwp_boost_min;
> +       wrmsrl(MSR_HWP_REQUEST, hwp_req);
> +       cpu->last_update = cpu->sample.time;
> +}
> +
> +static inline void intel_pstate_hwp_boost_down(struct cpudata *cpu)
> +{
> +       if (cpu->hwp_boost_min) {
> +               bool expired;
> +
> +               /* Check if we are idle for hold time to boost down */
> +               expired = time_after64(cpu->sample.time, cpu->last_update +
> +                                      (hwp_boost_hold_time_ms * NSEC_PER_MSEC));

It would be good to avoid the multiplication here as it will just add
overhead for no value.
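
For example, the hold time could be kept directly in nanoseconds, so the
boost-down path only does a comparison (a sketch of one way to do it; the
hwp_boost_hold_time_ns name is an illustration, not something from the
posted patch):

        /*
         * Precompute the hold time in nanoseconds once, instead of
         * multiplying by NSEC_PER_MSEC on every boost-down check.
         */
        static u64 hwp_boost_hold_time_ns = 3 * NSEC_PER_MSEC;

        /* Check if we are idle for hold time to boost down */
        expired = time_after64(cpu->sample.time, cpu->last_update +
                               hwp_boost_hold_time_ns);
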
> +               if (expired) {
> +                       wrmsrl(MSR_HWP_REQUEST, cpu->hwp_req_cached);
> +                       cpu->hwp_boost_min = 0;
> +               }
> +       }
> +       cpu->last_update = cpu->sample.time;
> +}
> +
> +static inline void intel_pstate_update_util_hwp(struct update_util_data *data,
> +                                               u64 time, unsigned int flags)
> +{
> +}
> +
>  static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
>  {
>         struct sample *sample = &cpu->sample;
> @@ -1684,7 +1768,7 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
>  {
>         struct cpudata *cpu = all_cpu_data[cpu_num];
>
> -       if (hwp_active)
> +       if (hwp_active && !hwp_boost)
>                 return;
>
>         if (cpu->update_util_set)
> @@ -1692,8 +1776,12 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
>
>         /* Prevent intel_pstate_update_util() from using stale data. */
>         cpu->sample.time = 0;
> -       cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
> -                                    intel_pstate_update_util);
> +       if (hwp_active)
> +               cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
> +                                            intel_pstate_update_util_hwp);
> +       else
> +               cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
> +                                            intel_pstate_update_util);

You can use the ternary operator in the third arg of
cpufreq_add_update_util_hook().

>         cpu->update_util_set = true;
>  }
>
> @@ -1805,8 +1893,11 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>                 intel_pstate_set_update_util_hook(policy->cpu);
>         }
>
> -       if (hwp_active)
> +       if (hwp_active) {
> +               if (!hwp_boost)
> +                       intel_pstate_clear_update_util_hook(policy->cpu);

This can be called unconditionally as the cpu_data->update_util_set
check will make it return immediately anyway AFAICS.

>                 intel_pstate_hwp_set(policy->cpu);
> +       }
>
>         mutex_unlock(&intel_pstate_limits_lock);
>
> --
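
For reference, the ternary form suggested above for registering the hook
would look roughly like this (a sketch against the posted patch, not a
tested change):

        /*
         * Pick the HWP-specific callback when HWP is active (and hence
         * boosting is in use at this point), otherwise keep the regular
         * intel_pstate_update_util() handler.
         */
        cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
                                     hwp_active ?
                                     intel_pstate_update_util_hwp :
                                     intel_pstate_update_util);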