2019-07-10 18:40:37

by Janakarajan Natarajan

[permalink] [raw]
Subject: [PATCHv3 0/6] CPPC optional registers AMD support

CPPC (Collaborative Processor Performance Control) offers optional
registers which can be used to tune the system based on energy and/or
performance requirements.

Newer AMD processors (>= Family 17h) add support for a subset of these
optional CPPC registers, based on ACPI v6.1.

The following are the supported CPPC registers for which sysfs entries
are created:
* enable (NEW)
* max_perf (NEW)
* min_perf (NEW)
* energy_perf
* lowest_perf
* nominal_perf
* desired_perf (NEW)
* feedback_ctrs
* auto_sel_enable (NEW)
* lowest_nonlinear_perf

First, update cppc_acpi to create sysfs entries only when the optional
registers are known to be supported.

Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
to access the newly supported registers through sysfs entries found in
/sys/devices/system/cpu/cpu<num>/amd_cpufreq/.

This new CPUFreq driver can only be used by providing a module parameter,
amd_cpufreq.cppc_enable=1.

The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
allow the userspace to:
* Tweak the values to fit its workload.
* Apply a profile from AMD's optimization guides.

Profiles will be documented in the performance/optimization guides.

Note:
* AMD systems will not have a policy applied in the kernel at this time.

TODO:
* Create a linux userspace tool that will help users generate a CPPC profile
for their target workload.
* Create a general CPPC policy in the kernel.

v1->v2:
* Add macro to ensure BUFFER only registers have BUFFER type.
* Add support macro to make the right check based on register type.
* Remove support checks for registers which are mandatory.

v2->v3:
* Introduce new amd-cpufreq driver which will have priority over acpi-cpufreq.
* Move new sysfs entries creation to amd-cpufreq.

Janakarajan Natarajan (3):
acpi/cppc: Add macros for CPPC register checks
acpi/cppc: Ensure only supported CPPC sysfs entries are created
drivers/cpufreq: Add a CPUFreq driver for AMD processors (Fam17h and
later)

Yazen Ghannam (3):
acpi/cppc: Rework cppc_set_perf() to use cppc_regs index
acpi/cppc: Add support for optional CPPC registers
acpi/cppc: Add support for CPPC Enable register

drivers/acpi/cppc_acpi.c | 244 ++++++++++++++++++++++++++++-----
drivers/cpufreq/Kconfig.x86 | 14 ++
drivers/cpufreq/Makefile | 4 +-
drivers/cpufreq/amd-cpufreq.c | 233 +++++++++++++++++++++++++++++++
drivers/cpufreq/cppc_cpufreq.c | 6 +-
include/acpi/cppc_acpi.h | 11 +-
6 files changed, 474 insertions(+), 38 deletions(-)
create mode 100644 drivers/cpufreq/amd-cpufreq.c

--
2.17.1


2019-07-10 18:40:38

by Janakarajan Natarajan

[permalink] [raw]
Subject: [PATCHv3 1/6] acpi/cppc: Add macros for CPPC register checks

Introduce two macros to help with checking the support for optional CPPC
registers.

CPC_SUP_BUFFER_ONLY ensures that an expected BUFFER only register has a
register type of ACPI_TYPE_BUFFER and is not NULL.

REG_SUPPORTED decides which check to perform based the expected type of
the CPPC register.

Signed-off-by: Janakarajan Natarajan <[email protected]>
---
drivers/acpi/cppc_acpi.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 15f103d7532b..c43de65531ae 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -111,6 +111,14 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
#define CPC_SUPPORTED(cpc) ((cpc)->type == ACPI_TYPE_INTEGER ? \
!!(cpc)->cpc_entry.int_value : \
!IS_NULL_REG(&(cpc)->cpc_entry.reg))
+
+/*
+ * Evaluates to True if an optional cpc field is supported and is
+ * BUFFER only
+ */
+#define CPC_SUP_BUFFER_ONLY(cpc) ((cpc)->type == ACPI_TYPE_BUFFER && \
+ !IS_NULL_REG(&(cpc)->cpc_entry.reg))
+
/*
* Arbitrary Retries in case the remote processor is slow to respond
* to PCC commands. Keeping it high enough to cover emulators where
@@ -705,6 +713,26 @@ static bool is_cppc_supported(int revision, int num_ent)
* }
*/

+static bool is_buf_only(int reg_idx)
+{
+ switch (reg_idx) {
+ case HIGHEST_PERF:
+ case NOMINAL_PERF:
+ case LOW_NON_LINEAR_PERF:
+ case LOWEST_PERF:
+ case CTR_WRAP_TIME:
+ case AUTO_SEL_ENABLE:
+ case REFERENCE_PERF:
+ return false;
+ default:
+ return true;
+ }
+}
+
+#define REG_SUPPORTED(cpc, idx) (is_buf_only(idx) ? \
+ CPC_SUP_BUFFER_ONLY(&cpc->cpc_regs[idx]) : \
+ CPC_SUPPORTED(&cpc->cpc_regs[idx]))
+
/**
* acpi_cppc_processor_probe - Search for per CPU _CPC objects.
* @pr: Ptr to acpi_processor containing this CPU's logical ID.
--
2.17.1

2019-07-10 18:40:46

by Janakarajan Natarajan

[permalink] [raw]
Subject: [PATCHv3 5/6] acpi/cppc: Add support for CPPC Enable register

From: Yazen Ghannam <[email protected]>

To enable CPPC on a processor, the OS should write a value "1" to the
CPPC Enable register. Add support for this register.

Since we have a new variable "enable" in cppc_perf_ctrls, rename it
and the associated functions i.e. cppc_perf_ctrls->cppc_ctrls and
cppc_get_perf()->cppc_get_ctrls().

Signed-off-by: Yazen Ghannam <[email protected]>
[ carved out into a patch, cleaned up, productized ]
Signed-off-by: Janakarajan Natarajan <[email protected]>
---
drivers/acpi/cppc_acpi.c | 44 ++++++++++++++++++++++++----------------
include/acpi/cppc_acpi.h | 10 +++++----
2 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index b24e54263efb..3199433e3f71 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1343,12 +1343,12 @@ EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
/**
* cppc_set_reg - Set the CPUs control register.
* @cpu: CPU for which to set the register.
- * @perf_ctrls: ptr to cppc_perf_ctrls. See cppc_acpi.h
+ * @ctrls: ptr to cppc_ctrls. See cppc_acpi.h
* @reg_idx: Index of the register being accessed
*
* Return: 0 for success, -ERRNO otherwise.
*/
-int cppc_set_reg(int cpu, struct cppc_perf_ctrls *perf_ctrls,
+int cppc_set_reg(int cpu, struct cppc_ctrls *ctrls,
enum cppc_regs reg_idx)
{
struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
@@ -1364,20 +1364,23 @@ int cppc_set_reg(int cpu, struct cppc_perf_ctrls *perf_ctrls,
}

switch (reg_idx) {
+ case ENABLE:
+ value = ctrls->enable;
+ break;
case DESIRED_PERF:
- value = perf_ctrls->desired_perf;
+ value = ctrls->desired_perf;
break;
case MAX_PERF:
- value = perf_ctrls->max_perf;
+ value = ctrls->max_perf;
break;
case MIN_PERF:
- value = perf_ctrls->min_perf;
+ value = ctrls->min_perf;
break;
case ENERGY_PERF:
- value = perf_ctrls->energy_perf;
+ value = ctrls->energy_perf;
break;
case AUTO_SEL_ENABLE:
- value = perf_ctrls->auto_sel_enable;
+ value = ctrls->auto_sel_enable;
break;
default:
pr_debug("CPC register index #%d not writeable\n", reg_idx);
@@ -1485,13 +1488,14 @@ int cppc_set_reg(int cpu, struct cppc_perf_ctrls *perf_ctrls,
}
EXPORT_SYMBOL_GPL(cppc_set_reg);

-int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
+int cppc_get_ctrls(int cpu, struct cppc_ctrls *ctrls)
{
struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
struct cpc_register_resource *desired_reg, *max_reg, *min_reg;
struct cpc_register_resource *energy_reg, *auto_sel_enable_reg;
+ struct cpc_register_resource *enable_reg;
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
- u64 desired, max, min, energy, auto_sel_enable;
+ u64 desired, max, min, energy, auto_sel_enable, enable;
struct cppc_pcc_data *pcc_ss_data = NULL;
int ret = 0, regs_in_pcc = 0;

@@ -1500,6 +1504,7 @@ int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
return -ENODEV;
}

+ enable_reg = &cpc_desc->cpc_regs[ENABLE];
desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF];
max_reg = &cpc_desc->cpc_regs[MAX_PERF];
min_reg = &cpc_desc->cpc_regs[MIN_PERF];
@@ -1509,7 +1514,7 @@ int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
/* Check if any of the perf registers are in PCC */
if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(max_reg) ||
CPC_IN_PCC(min_reg) || CPC_IN_PCC(energy_reg) ||
- CPC_IN_PCC(auto_sel_enable_reg)) {
+ CPC_IN_PCC(auto_sel_enable_reg) || CPC_IN_PCC(enable_reg)) {
pcc_ss_data = pcc_data[pcc_ss_id];
down_write(&pcc_ss_data->pcc_lock);
regs_in_pcc = 1;
@@ -1521,10 +1526,14 @@ int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
}
}

- /* desired_perf is the only mandatory value in perf_ctrls */
+ /* desired_perf is the only mandatory value in ctrls */
if (cpc_read(cpu, desired_reg, &desired))
ret = -EFAULT;

+ if (CPC_SUP_BUFFER_ONLY(enable_reg) &&
+ cpc_read(cpu, enable_reg, &enable))
+ ret = -EFAULT;
+
if (CPC_SUP_BUFFER_ONLY(max_reg) && cpc_read(cpu, max_reg, &max))
ret = -EFAULT;

@@ -1540,11 +1549,12 @@ int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
ret = -EFAULT;

if (!ret) {
- perf_ctrls->desired_perf = desired;
- perf_ctrls->max_perf = max;
- perf_ctrls->min_perf = min;
- perf_ctrls->energy_perf = energy;
- perf_ctrls->auto_sel_enable = auto_sel_enable;
+ ctrls->enable = enable;
+ ctrls->desired_perf = desired;
+ ctrls->max_perf = max;
+ ctrls->min_perf = min;
+ ctrls->energy_perf = energy;
+ ctrls->auto_sel_enable = auto_sel_enable;
}

out_err:
@@ -1552,7 +1562,7 @@ int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
up_write(&pcc_ss_data->pcc_lock);
return ret;
}
-EXPORT_SYMBOL_GPL(cppc_get_perf);
+EXPORT_SYMBOL_GPL(cppc_get_ctrls);

/**
* cppc_get_transition_latency - returns frequency transition latency in ns
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 80720b246c51..e6cd2a487874 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -109,7 +109,8 @@ struct cppc_perf_caps {
u32 nominal_freq;
};

-struct cppc_perf_ctrls {
+struct cppc_ctrls {
+ bool enable;
u32 max_perf;
u32 min_perf;
u32 desired_perf;
@@ -128,17 +129,18 @@ struct cppc_perf_fb_ctrs {
struct cppc_cpudata {
int cpu;
struct cppc_perf_caps perf_caps;
- struct cppc_perf_ctrls perf_ctrls;
+ struct cppc_ctrls ctrls;
struct cppc_perf_fb_ctrs perf_fb_ctrs;
struct cpufreq_policy *cur_policy;
unsigned int shared_type;
cpumask_var_t shared_cpu_map;
};

+extern int cppc_get_enable(int cpu);
extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
-extern int cppc_set_reg(int cpu, struct cppc_perf_ctrls *perf_ctrls, enum cppc_regs reg_idx);
-extern int cppc_get_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
+extern int cppc_set_reg(int cpu, struct cppc_ctrls *ctrls, enum cppc_regs reg_idx);
+extern int cppc_get_ctrls(int cpu, struct cppc_ctrls *ctrls);
extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
extern int acpi_get_psd_map(struct cppc_cpudata **);
extern unsigned int cppc_get_transition_latency(int cpu);
--
2.17.1

2019-07-10 18:42:01

by Janakarajan Natarajan

[permalink] [raw]
Subject: [PATCHv3 2/6] acpi/cppc: Ensure only supported CPPC sysfs entries are created

Add attributes only for registers that are supported by the platform.
This prevents unsupported, optional registers from having sysfs entries
created.

Signed-off-by: Janakarajan Natarajan <[email protected]>
---
drivers/acpi/cppc_acpi.c | 82 +++++++++++++++++++++++++++++++++-------
1 file changed, 68 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index c43de65531ae..53a9dc9960b6 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -183,22 +183,8 @@ static ssize_t show_feedback_ctrs(struct kobject *kobj,
}
define_one_cppc_ro(feedback_ctrs);

-static struct attribute *cppc_attrs[] = {
- &feedback_ctrs.attr,
- &reference_perf.attr,
- &wraparound_time.attr,
- &highest_perf.attr,
- &lowest_perf.attr,
- &lowest_nonlinear_perf.attr,
- &nominal_perf.attr,
- &nominal_freq.attr,
- &lowest_freq.attr,
- NULL
-};
-
static struct kobj_type cppc_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
- .default_attrs = cppc_attrs,
};

static int check_pcc_chan(int pcc_ss_id, bool chk_err_bit)
@@ -733,6 +719,69 @@ static bool is_buf_only(int reg_idx)
CPC_SUP_BUFFER_ONLY(&cpc->cpc_regs[idx]) : \
CPC_SUPPORTED(&cpc->cpc_regs[idx]))

+static int is_mandatory_reg(int reg_idx)
+{
+ switch (reg_idx) {
+ case HIGHEST_PERF:
+ case NOMINAL_PERF:
+ case LOW_NON_LINEAR_PERF:
+ case LOWEST_PERF:
+ case REFERENCE_CTR:
+ case DELIVERED_CTR:
+ return 1;
+ }
+
+ return 0;
+}
+
+#define MANDATORY_REG_CNT 6
+
+static int set_cppc_attrs(struct cpc_desc *cpc, int entries)
+{
+ int i, attr_i = 0, opt_reg_cnt;
+ static struct attribute **cppc_attrs;
+
+ cppc_attrs = kcalloc(entries, sizeof(*cppc_attrs), GFP_KERNEL);
+ if (!cppc_attrs)
+ return -ENOMEM;
+
+ /* Set optional regs */
+ opt_reg_cnt = entries - MANDATORY_REG_CNT;
+ for (i = 0; i < MAX_CPC_REG_ENT && attr_i < opt_reg_cnt; i++) {
+ if (is_mandatory_reg(i) || !REG_SUPPORTED(cpc, i))
+ continue;
+
+ switch (i) {
+ case NOMINAL_FREQ:
+ cppc_attrs[attr_i++] = &nominal_freq.attr;
+ break;
+ case LOWEST_FREQ:
+ cppc_attrs[attr_i++] = &lowest_freq.attr;
+ break;
+ case REFERENCE_PERF:
+ cppc_attrs[attr_i++] = &reference_perf.attr;
+ break;
+ case CTR_WRAP_TIME:
+ cppc_attrs[attr_i++] = &wraparound_time.attr;
+ break;
+ }
+ }
+
+ /* Set mandatory regs */
+ cppc_attrs[attr_i++] = &highest_perf.attr;
+ cppc_attrs[attr_i++] = &nominal_perf.attr;
+ cppc_attrs[attr_i++] = &lowest_nonlinear_perf.attr;
+ cppc_attrs[attr_i++] = &lowest_perf.attr;
+
+ /* Set feedback_ctr sysfs entry */
+ cppc_attrs[attr_i] = &feedback_ctrs.attr;
+
+ /* Set kobj_type member */
+ cppc_ktype.default_attrs = cppc_attrs;
+
+ return 0;
+}
+
/**
* acpi_cppc_processor_probe - Search for per CPU _CPC objects.
* @pr: Ptr to acpi_processor containing this CPU's logical ID.
@@ -887,6 +936,10 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
/* Plug PSD data into this CPU's CPC descriptor. */
per_cpu(cpc_desc_ptr, pr->id) = cpc_ptr;

+ ret = set_cppc_attrs(cpc_ptr, num_ent - 2);
+ if (ret)
+ goto out_free;
+
ret = kobject_init_and_add(&cpc_ptr->kobj, &cppc_ktype, &cpu_dev->kobj,
"acpi_cppc");
if (ret) {
@@ -948,6 +1001,7 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
iounmap(addr);
}

+ kfree(cppc_ktype.default_attrs);
kobject_put(&cpc_ptr->kobj);
kfree(cpc_ptr);
}
--
2.17.1

2019-07-13 10:47:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCHv3 0/6] CPPC optional registers AMD support

On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> CPPC (Collaborative Processor Performance Control) offers optional
> registers which can be used to tune the system based on energy and/or
> performance requirements.
>
> Newer AMD processors (>= Family 17h) add support for a subset of these
> optional CPPC registers, based on ACPI v6.1.
>
> The following are the supported CPPC registers for which sysfs entries
> are created:
> * enable (NEW)
> * max_perf (NEW)
> * min_perf (NEW)
> * energy_perf
> * lowest_perf
> * nominal_perf
> * desired_perf (NEW)
> * feedback_ctrs
> * auto_sel_enable (NEW)
> * lowest_nonlinear_perf
>
> First, update cppc_acpi to create sysfs entries only when the optional
> registers are known to be supported.
>
> Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> to access the newly supported registers through sysfs entries found in
> /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
>
> This new CPUFreq driver can only be used by providing a module parameter,
> amd_cpufreq.cppc_enable=1.
>
> The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> allow the userspace to:
> * Tweak the values to fit its workload.
> * Apply a profile from AMD's optimization guides.

So in general I think it is a huge mistake to expose all that to
userspace. Before you know it, there's tools that actually rely on it,
and then inhibit the kernel from doing anything sane with it.

> Profiles will be documented in the performance/optimization guides.

I don't think userspace can really do anything sane with this; it lacks
much if not all useful information.

> Note:
> * AMD systems will not have a policy applied in the kernel at this time.

And why the heck not? We're trying to move all cpufreq into the
scheduler and have only a single governor, namely schedutil -- yes,
we're still stuck with legacy, and we're still working on performance
parity in some cases, but I really hope to get rid of all other cpufreq
governors eventually.

And if you look at schedutil (schedutil_cpu_util in specific) then
you'll see it is already prepared for CPPC and currently only held back
by the generic cpufreq interface.

It currently only sets desired freq, it has information for
min/guaranteed, and once we get thermal intergrated we might have
sensible data for max freq too.

> TODO:
> * Create a linux userspace tool that will help users generate a CPPC profile
> for their target workload.

Basically a big fat NAK for this approach to cpufreq.

> * Create a general CPPC policy in the kernel.

We already have that, sorta.

2019-07-15 17:59:24

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCHv3 0/6] CPPC optional registers AMD support

> -----Original Message-----
> From: Peter Zijlstra <[email protected]>
> Sent: Saturday, July 13, 2019 5:46 AM
> To: Natarajan, Janakarajan <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; Rafael J . Wysocki
> <[email protected]>; Len Brown <[email protected]>; Viresh Kumar <[email protected]>; Robert Moore
> <[email protected]>; Erik Schmauss <[email protected]>; Ghannam, Yazen <[email protected]>
> Subject: Re: [PATCHv3 0/6] CPPC optional registers AMD support
>
> On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> > CPPC (Collaborative Processor Performance Control) offers optional
> > registers which can be used to tune the system based on energy and/or
> > performance requirements.
> >
> > Newer AMD processors (>= Family 17h) add support for a subset of these
> > optional CPPC registers, based on ACPI v6.1.
> >
> > The following are the supported CPPC registers for which sysfs entries
> > are created:
> > * enable (NEW)
> > * max_perf (NEW)
> > * min_perf (NEW)
> > * energy_perf
> > * lowest_perf
> > * nominal_perf
> > * desired_perf (NEW)
> > * feedback_ctrs
> > * auto_sel_enable (NEW)
> > * lowest_nonlinear_perf
> >
> > First, update cppc_acpi to create sysfs entries only when the optional
> > registers are known to be supported.
> >
> > Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> > to access the newly supported registers through sysfs entries found in
> > /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
> >
> > This new CPUFreq driver can only be used by providing a module parameter,
> > amd_cpufreq.cppc_enable=1.
> >
> > The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> > allow the userspace to:
> > * Tweak the values to fit its workload.
> > * Apply a profile from AMD's optimization guides.
>
> So in general I think it is a huge mistake to expose all that to
> userspace. Before you know it, there's tools that actually rely on it,
> and then inhibit the kernel from doing anything sane with it.
>

Okay, makes sense.

Is there any way to expose a sysfs interface and make it explicitly "experimental"? Maybe putting it in Documentation/ABI/testing/?

Or do you think it's just not worth it?

> > Profiles will be documented in the performance/optimization guides.
>
> I don't think userspace can really do anything sane with this; it lacks
> much if not all useful information.
>
> > Note:
> > * AMD systems will not have a policy applied in the kernel at this time.
>
> And why the heck not? We're trying to move all cpufreq into the
> scheduler and have only a single governor, namely schedutil -- yes,
> we're still stuck with legacy, and we're still working on performance
> parity in some cases, but I really hope to get rid of all other cpufreq
> governors eventually.
>

Because this is new to AMD systems, we didn't want to enforce a default policy.

We figured that exposing the CPPC interface would be a good way to decouple policy from the kernel and let users experiment/tune their systems, like using the userspace governor. And if some pattern emerged then we could make that a default policy in the kernel (for AMD or in general).

But you're saying we should focus more on working with the schedutil governor, correct? Do you think there's still a use for a userspace governor?

> And if you look at schedutil (schedutil_cpu_util in specific) then
> you'll see it is already prepared for CPPC and currently only held back
> by the generic cpufreq interface.
>
> It currently only sets desired freq, it has information for
> min/guaranteed, and once we get thermal intergrated we might have
> sensible data for max freq too.
>

Will do.

> > TODO:
> > * Create a linux userspace tool that will help users generate a CPPC profile
> > for their target workload.
>
> Basically a big fat NAK for this approach to cpufreq.
>

Is that for exposing the sysfs interface, having a stub driver, or both?

Would it be better to have a cpufreq driver that implements some policy rather than just providing the sysfs interface?

> > * Create a general CPPC policy in the kernel.
>
> We already have that, sorta.

Right, but it seems to still be focused on CPU frequency rather than abstract performance like how CPPC is defined.

This is another reason for exposing the CPPC interface directly. We'll give users the ability to interact with the platform, using CPPC, without having to follow the CPUFREQ paradigm.

Do you think this is doable? Or should we always have some kernel interaction because of the scheduler, etc.?

Thanks,
Yazen