Received-SPF: pass (google.com: domain of linux-kernel+bounces-16-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1;
Precedence: bulk
MIME-Version: 1.0
References: <20231212072617.14756-1-lihuisong@huawei.com>
In-Reply-To: <20231212072617.14756-1-lihuisong@huawei.com>
From: "Rafael J. Wysocki" <rafael@kernel.org>
Date: Thu, 14 Dec 2023 20:31:17 +0100
Message-ID: <CAJZ5v0jwW0=8cNvC-Vu_o+pEHFpN9nrPD4LXCpmSTgQBTHODgg@mail.gmail.com>
Subject: Re: [PATCH] cpufreq: CPPC: Resolve the large frequency discrepancy
 from cpuinfo_cur_freq
To: Huisong Li <lihuisong@huawei.com>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, 
	linux-arm-kernel@lists.infradead.org, rafael@kernel.org, 
	beata.michalska@arm.com, sumitg@nvidia.com, ionela.voinescu@arm.com, 
	zengheng4@huawei.com, yang@os.amperecomputing.com, will@kernel.org, 
	sudeep.holla@arm.com, liuyonglong@huawei.com, zhanjie9@hisilicon.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, Dec 12, 2023 at 8:26=E2=80=AFAM Huisong Li <lihuisong@huawei.com> w=
rote:
>
> Many developers found that the cpu current frequency is greater than
> the maximum frequency of the platform, please see [1], [2] and [3].
>
> In the scenarios with high memory access pressure, the patch [1] has
> proved the significant latency of cpc_read() which is used to obtain
> delivered and reference performance counter cause an absurd frequency.
> The sampling interval for this counters is very critical and is expected
> to be equal. However, the different latency of cpc_read() has a direct
> impact on their sampling interval.
>
> This patch adds a interface, cpc_read_arch_counters_on_cpu, to read
> delivered and reference performance counter together. According to my
> test[4], the discrepancy of cpu current frequency in the scenarios with
> high memory access pressure is lower than 0.2% by stress-ng application.
>
> [1] https://lore.kernel.org/all/20231025093847.3740104-4-zengheng4@huawei=
.com/
> [2] https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecompu=
ting.com/
> [3] https://lore.kernel.org/all/20230418113459.12860-7-sumitg@nvidia.com/
>
> [4] My local test:
> The testing platform enable SMT and include 128 logical CPU in total,
> and CPU base frequency is 2.7GHz. Reading "cpuinfo_cur_freq" for each
> physical core on platform during the high memory access pressure from
> stress-ng, and the output is as follows:
>   0: 2699133     2: 2699942     4: 2698189     6: 2704347
>   8: 2704009    10: 2696277    12: 2702016    14: 2701388
>  16: 2700358    18: 2696741    20: 2700091    22: 2700122
>  24: 2701713    26: 2702025    28: 2699816    30: 2700121
>  32: 2700000    34: 2699788    36: 2698884    38: 2699109
>  40: 2704494    42: 2698350    44: 2699997    46: 2701023
>  48: 2703448    50: 2699501    52: 2700000    54: 2699999
>  56: 2702645    58: 2696923    60: 2697718    62: 2700547
>  64: 2700313    66: 2700000    68: 2699904    70: 2699259
>  72: 2699511    74: 2700644    76: 2702201    78: 2700000
>  80: 2700776    82: 2700364    84: 2702674    86: 2700255
>  88: 2699886    90: 2700359    92: 2699662    94: 2696188
>  96: 2705454    98: 2699260   100: 2701097   102: 2699630
> 104: 2700463   106: 2698408   108: 2697766   110: 2701181
> 112: 2699166   114: 2701804   116: 2701907   118: 2701973
> 120: 2699584   122: 2700474   124: 2700768   126: 2701963
>
> Signed-off-by: Huisong Li <lihuisong@huawei.com>

First off, please Cc ACPI-related patches to linux-acpi.

> ---
>  arch/arm64/kernel/topology.c | 43 ++++++++++++++++++++++++++++++++++--
>  drivers/acpi/cppc_acpi.c     | 22 +++++++++++++++---
>  include/acpi/cppc_acpi.h     |  5 +++++
>  3 files changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 7d37e458e2f5..c3122154d738 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -299,6 +299,11 @@ core_initcall(init_amu_fie);
>  #ifdef CONFIG_ACPI_CPPC_LIB
>  #include <acpi/cppc_acpi.h>
>
> +struct amu_counters {
> +       u64 corecnt;
> +       u64 constcnt;
> +};
> +
>  static void cpu_read_corecnt(void *val)
>  {
>         /*
> @@ -322,8 +327,27 @@ static void cpu_read_constcnt(void *val)
>                       0UL : read_constcnt();
>  }
>
> +static void cpu_read_amu_counters(void *data)
> +{
> +       struct amu_counters *cnt =3D (struct amu_counters *)data;
> +
> +       /*
> +        * The running time of the this_cpu_has_cap() might have a couple=
 of
> +        * microseconds and is significantly increased to tens of microse=
conds.
> +        * But AMU core and constant counter need to be read togeter with=
out any
> +        * time interval to reduce the calculation discrepancy using this=
 counters.
> +        */
> +       if (this_cpu_has_cap(ARM64_WORKAROUND_2457168)) {
> +               cnt->corecnt =3D read_corecnt();

This statement is present in both branches, so can it be moved before the i=
f ()?

> +               cnt->constcnt =3D 0;
> +       } else {
> +               cnt->corecnt =3D read_corecnt();
> +               cnt->constcnt =3D read_constcnt();
> +       }
> +}
> +
>  static inline
> -int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
> +int counters_read_on_cpu(int cpu, smp_call_func_t func, void *data)
>  {
>         /*
>          * Abort call on counterless CPU or when interrupts are
> @@ -335,7 +359,7 @@ int counters_read_on_cpu(int cpu, smp_call_func_t fun=
c, u64 *val)
>         if (WARN_ON_ONCE(irqs_disabled()))
>                 return -EPERM;
>
> -       smp_call_function_single(cpu, func, val, 1);
> +       smp_call_function_single(cpu, func, data, 1);
>
>         return 0;
>  }
> @@ -364,6 +388,21 @@ bool cpc_ffh_supported(void)
>         return true;
>  }
>
> +int cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered, u64 *referenc=
e)
> +{
> +       struct amu_counters cnts =3D {0};
> +       int ret;
> +
> +       ret =3D counters_read_on_cpu(cpu, cpu_read_amu_counters, &cnts);
> +       if (ret)
> +               return ret;
> +
> +       *delivered =3D cnts.corecnt;
> +       *reference =3D cnts.constcnt;
> +
> +       return 0;
> +}
> +
>  int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
>  {
>         int ret =3D -EOPNOTSUPP;
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 7ff269a78c20..f303fabd7cfe 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1299,6 +1299,11 @@ bool cppc_perf_ctrs_in_pcc(void)
>  }
>  EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
>
> +int __weak cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered, u64 *r=
eference)
> +{
> +       return 0;
> +}
> +
>  /**
>   * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
>   * @cpunum: CPU from which to read counters.
> @@ -1313,7 +1318,8 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf=
_fb_ctrs *perf_fb_ctrs)
>                 *ref_perf_reg, *ctr_wrap_reg;
>         int pcc_ss_id =3D per_cpu(cpu_pcc_subspace_idx, cpunum);
>         struct cppc_pcc_data *pcc_ss_data =3D NULL;
> -       u64 delivered, reference, ref_perf, ctr_wrap_time;
> +       u64 delivered =3D 0, reference =3D 0;
> +       u64 ref_perf, ctr_wrap_time;
>         int ret =3D 0, regs_in_pcc =3D 0;
>
>         if (!cpc_desc) {
> @@ -1350,8 +1356,18 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_per=
f_fb_ctrs *perf_fb_ctrs)
>                 }
>         }
>
> -       cpc_read(cpunum, delivered_reg, &delivered);
> -       cpc_read(cpunum, reference_reg, &reference);
> +       if (cpc_ffh_supported()) {
> +               ret =3D cpc_read_arch_counters_on_cpu(cpunum, &delivered,=
 &reference);
> +               if (ret) {
> +                       pr_debug("read arch counters failed, ret=3D%d.\n"=
, ret);
> +                       ret =3D 0;
> +               }
> +       }

The above is surely not applicable to every platform using CPPC.  Also
it looks like in the ARM64_WORKAROUND_2457168 enabled case it is just
pointless overhead, because "reference" is always going to be 0 here
then.

Please clean that up.

> +       if (!delivered || !reference) {
> +               cpc_read(cpunum, delivered_reg, &delivered);
> +               cpc_read(cpunum, reference_reg, &reference);
> +       }
> +
>         cpc_read(cpunum, ref_perf_reg, &ref_perf);
>
>         /*
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 6126c977ece0..07d4fd82d499 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -152,6 +152,7 @@ extern bool cpc_ffh_supported(void);
>  extern bool cpc_supported_by_cpu(void);
>  extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
>  extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
> +extern int cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered, u64 *r=
eference);
>  extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
>  extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls=
, bool enable);
>  extern int cppc_get_auto_sel_caps(int cpunum, struct cppc_perf_caps *per=
f_caps);
> @@ -209,6 +210,10 @@ static inline int cpc_write_ffh(int cpunum, struct c=
pc_reg *reg, u64 val)
>  {
>         return -ENOTSUPP;
>  }
> +static inline int cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered,=
 u64 *reference)
> +{
> +       return -EOPNOTSUPP;
> +}
>  static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *per=
f_ctrls, bool enable)
>  {
>         return -ENOTSUPP;
> --