2013-04-02 18:11:57

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V3 0/2] cpufreq: ondemand: add AMD specific powersave bias

This patchset adds AMD specific powersave bias function to the ondemand
governor; which can be used to help ondemand governor make more power conscious
frequency change decisions based on feedback from hardware (availble on AMD
Family 16h and above).

Hardware feedback tells software how "sensitive" to frequency changes the
workloads are. CPU-bound workloads will be more sensitive -- they will
perform better as frequency increases. Memory/IO-bound workloads will be less
sensitive -- they will not necessarily perform better as frequnecy increases.

This patchset was compared against ondemand governor without powersave bias
and did not show any performance degradation on CPU-bound workloads such as
kernbench and unixbench. While saving power on Memory-bound workloads such as
stream.

V3:
* Added to CPUID bit to cpufeature.h
* Added MODULE_DEVICE_TABLE to autoload this driver.
* Other small changes per feedback from:
https://lkml.org/lkml/2013/4/2/349

V2:
* Added proper include files to amd_freq_sensitivity.c
* Only register powersave_bias_target function pointer and not the entire
od_ops.

Jacob Shin (2):
cpufreq: ondemand: allow custom powersave_bias_target handler to be
registered
cpufreq: AMD "frequency sensitivity feedback" powersave bias for
ondemand governor

arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/scattered.c | 3 +-
drivers/cpufreq/Kconfig.x86 | 17 ++++
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 150 ++++++++++++++++++++++++++++++++
drivers/cpufreq/cpufreq_governor.h | 3 +
drivers/cpufreq/cpufreq_ondemand.c | 32 +++++--
7 files changed, 198 insertions(+), 9 deletions(-)
create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c

--
1.7.9.5


2013-04-02 18:12:06

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V3 1/2] cpufreq: ondemand: allow custom powersave_bias_target handler to be registered

This allows for another [arch specific] driver to hook into existing
powersave bias function of the ondemand governor. i.e. This allows AMD
specific powersave bias function (in a separate AMD specific driver)
to aid ondemand governor's frequency transition deicisions.

Signed-off-by: Jacob Shin <[email protected]>
---
drivers/cpufreq/cpufreq_governor.h | 3 +++
drivers/cpufreq/cpufreq_ondemand.c | 32 ++++++++++++++++++++++++--------
2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 6593769..f52bf17 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -263,4 +263,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
struct common_dbs_data *cdata, unsigned int event);
void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
unsigned int delay, bool all_cpus);
+void od_register_powersave_bias_handler(unsigned int (*f)
+ (struct cpufreq_policy *, unsigned int, unsigned int));
+void od_unregister_powersave_bias_handler(void);
#endif /* _CPUFREQ_GOVERNOR_H */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 1471478..e43611d 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -40,6 +40,8 @@

static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);

+static struct od_ops od_ops;
+
#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
static struct cpufreq_governor cpufreq_gov_ondemand;
#endif
@@ -80,7 +82,7 @@ static int should_io_be_busy(void)
* Returns the freq_hi to be used right now and will set freq_hi_jiffies,
* freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
*/
-static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
+static unsigned int generic_powersave_bias_target(struct cpufreq_policy *policy,
unsigned int freq_next, unsigned int relation)
{
unsigned int freq_req, freq_reduc, freq_avg;
@@ -145,7 +147,8 @@ static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
struct od_dbs_tuners *od_tuners = dbs_data->tuners;

if (od_tuners->powersave_bias)
- freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
+ freq = od_ops.powersave_bias_target(p, freq,
+ CPUFREQ_RELATION_H);
else if (p->cur == p->max)
return;

@@ -205,12 +208,12 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
if (!od_tuners->powersave_bias) {
__cpufreq_driver_target(policy, freq_next,
CPUFREQ_RELATION_L);
- } else {
- int freq = powersave_bias_target(policy, freq_next,
- CPUFREQ_RELATION_L);
- __cpufreq_driver_target(policy, freq,
- CPUFREQ_RELATION_L);
+ return;
}
+
+ freq_next = od_ops.powersave_bias_target(policy, freq_next,
+ CPUFREQ_RELATION_L);
+ __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
}
}

@@ -557,7 +560,7 @@ define_get_cpu_dbs_routines(od_cpu_dbs_info);

static struct od_ops od_ops = {
.powersave_bias_init_cpu = ondemand_powersave_bias_init_cpu,
- .powersave_bias_target = powersave_bias_target,
+ .powersave_bias_target = generic_powersave_bias_target,
.freq_increase = dbs_freq_increase,
};

@@ -574,6 +577,19 @@ static struct common_dbs_data od_dbs_cdata = {
.exit = od_exit,
};

+void od_register_powersave_bias_handler(unsigned int (*f)
+ (struct cpufreq_policy *, unsigned int, unsigned int))
+{
+ od_ops.powersave_bias_target = f;
+}
+EXPORT_SYMBOL_GPL(od_register_powersave_bias_handler);
+
+void od_unregister_powersave_bias_handler(void)
+{
+ od_ops.powersave_bias_target = generic_powersave_bias_target;
+}
+EXPORT_SYMBOL_GPL(od_unregister_powersave_bias_handler);
+
static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy,
unsigned int event)
{
--
1.7.9.5

2013-04-02 18:27:09

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

Future AMD processors, starting with Family 16h, can provide software
with feedback on how the workload may respond to frequency change --
memory-bound workloads will not benefit from higher frequency, where
as compute-bound workloads will. This patch enables this "frequency
sensitivity feedback" to aid the ondemand governor to make better
frequency change decisions by hooking into the powersave bias.

Signed-off-by: Jacob Shin <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/scattered.c | 3 +-
drivers/cpufreq/Kconfig.x86 | 17 ++++
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 150 ++++++++++++++++++++++++++++++++
5 files changed, 171 insertions(+), 1 deletion(-)
create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 93fe929..9e22520 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -182,6 +182,7 @@
#define X86_FEATURE_PTS (7*32+ 6) /* Intel Package Thermal Status */
#define X86_FEATURE_DTHERM (7*32+ 7) /* Digital Thermal Sensor */
#define X86_FEATURE_HW_PSTATE (7*32+ 8) /* AMD HW-PState */
+#define X86_FEATURE_PROC_FEEDBACK (7*32+ 9) /* AMD ProcFeedbackInterface */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW (8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index ee8e9ab..d92b5da 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -39,8 +39,9 @@ void __cpuinit init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_XSAVEOPT, CR_EAX, 0, 0x0000000d, 1 },
- { X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },
+ { X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
+ { X86_FEATURE_PROC_FEEDBACK, CR_EDX,11, 0x80000007, 0 },
{ X86_FEATURE_NPT, CR_EDX, 0, 0x8000000a, 0 },
{ X86_FEATURE_LBRV, CR_EDX, 1, 0x8000000a, 0 },
{ X86_FEATURE_SVML, CR_EDX, 2, 0x8000000a, 0 },
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index d7dc0ed..018fced 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -129,6 +129,23 @@ config X86_POWERNOW_K8

For details, take a look at <file:Documentation/cpu-freq/>.

+config X86_AMD_FREQ_SENSITIVITY
+ tristate "AMD frequency sensitivity feedback powersave bias"
+ depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ && CPU_SUP_AMD
+ help
+ This adds AMD specific powersave bias function to the ondemand
+ governor; which can be used to help ondemand governor make more power
+ conscious frequency change decisions based on feedback from hardware
+ (availble on AMD Family 16h and above).
+
+ Hardware feedback tells software how "sensitive" to frequency changes
+ the CPUs' workloads are. CPU-bound workloads will be more sensitive
+ -- they will perform better as frequency increases. Memory/IO-bound
+ workloads will be less sensitive -- they will not necessarily perform
+ better as frequnecy increases.
+
+ If in doubt, say N.
+
config X86_GX_SUSPMOD
tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
depends on X86_32 && PCI
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 863fd18..01dfdaf 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
+obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o

##################################################################################
# ARM SoC drivers
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
new file mode 100644
index 0000000..e3e62d2
--- /dev/null
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -0,0 +1,150 @@
+/*
+ * amd_freq_sensitivity.c: AMD frequency sensitivity feedback powersave bias
+ * for the ondemand governor.
+ *
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Jacob Shin <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/percpu-defs.h>
+#include <linux/init.h>
+#include <linux/mod_devicetable.h>
+
+#include <asm/msr.h>
+#include <asm/cpufeature.h>
+
+#include "cpufreq_governor.h"
+
+#define MSR_AMD64_FREQ_SENSITIVITY_ACTUAL 0xc0010080
+#define MSR_AMD64_FREQ_SENSITIVITY_REFERENCE 0xc0010081
+#define CLASS_CODE_SHIFT 56
+#define CLASS_CODE_CORE_FREQ_SENSITIVITY 0x01
+#define POWERSAVE_BIAS_MAX 1000
+
+struct cpu_data_t {
+ u64 actual;
+ u64 reference;
+ unsigned int freq_prev;
+};
+
+static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
+
+static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
+ unsigned int freq_next,
+ unsigned int relation)
+{
+ int sensitivity;
+ long d_actual, d_reference;
+ struct msr actual, reference;
+ struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
+ struct dbs_data *od_data = policy->governor_data;
+ struct od_dbs_tuners *od_tuners = od_data->tuners;
+ struct od_cpu_dbs_info_s *od_info =
+ od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
+
+ if (!od_info->freq_table)
+ return freq_next;
+
+ rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_ACTUAL,
+ &actual.l, &actual.h);
+ rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_REFERENCE,
+ &reference.l, &reference.h);
+ actual.h &= 0x00ffffff;
+ reference.h &= 0x00ffffff;
+
+ /* counter wrapped around, so stay on current frequency */
+ if (actual.q < data->actual || reference.q < data->reference) {
+ freq_next = policy->cur;
+ goto out;
+ }
+
+ d_actual = actual.q - data->actual;
+ d_reference = reference.q - data->reference;
+
+ /* divide by 0, so stay on current frequency as well */
+ if (d_reference == 0) {
+ freq_next = policy->cur;
+ goto out;
+ }
+
+ sensitivity = POWERSAVE_BIAS_MAX -
+ (POWERSAVE_BIAS_MAX * (d_reference - d_actual) / d_reference);
+
+ clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
+
+ /* this workload is not CPU bound, so choose a lower freq */
+ if (sensitivity < od_tuners->powersave_bias) {
+ if (data->freq_prev == policy->cur)
+ freq_next = policy->cur;
+
+ if (freq_next > policy->cur)
+ freq_next = policy->cur;
+ else if (freq_next < policy->cur)
+ freq_next = policy->min;
+ else {
+ unsigned int index;
+
+ cpufreq_frequency_table_target(policy,
+ od_info->freq_table, policy->cur - 1,
+ CPUFREQ_RELATION_H, &index);
+ freq_next = od_info->freq_table[index].frequency;
+ }
+
+ data->freq_prev = freq_next;
+ } else
+ data->freq_prev = 0;
+
+out:
+ data->actual = actual.q;
+ data->reference = reference.q;
+ return freq_next;
+}
+
+static int __init amd_freq_sensitivity_init(void)
+{
+ int err;
+ u64 val;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return -ENODEV;
+
+ if (!static_cpu_has(X86_FEATURE_PROC_FEEDBACK))
+ return -ENODEV;
+
+ err = rdmsrl_safe(MSR_AMD64_FREQ_SENSITIVITY_ACTUAL, &val);
+
+ if (err)
+ return -ENODEV;
+
+ if ((val >> CLASS_CODE_SHIFT) != CLASS_CODE_CORE_FREQ_SENSITIVITY)
+ return -ENODEV;
+
+ od_register_powersave_bias_handler(amd_powersave_bias_target);
+ return 0;
+}
+module_init(amd_freq_sensitivity_init);
+
+static void __exit amd_freq_sensitivity_exit(void)
+{
+ od_unregister_powersave_bias_handler();
+}
+module_exit(amd_freq_sensitivity_exit);
+
+static const struct x86_cpu_id amd_freq_sensitivity_ids[] = {
+ X86_FEATURE_MATCH(X86_FEATURE_PROC_FEEDBACK),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, amd_freq_sensitivity_ids);
+
+MODULE_AUTHOR("Jacob Shin <[email protected]>");
+MODULE_DESCRIPTION("AMD frequency sensitivity feedback powersave bias for "
+ "the ondemand governor.");
+MODULE_LICENSE("GPL");
--
1.7.9.5

2013-04-02 19:23:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 01:11:44PM -0500, Jacob Shin wrote:
> Future AMD processors, starting with Family 16h, can provide software
> with feedback on how the workload may respond to frequency change --
> memory-bound workloads will not benefit from higher frequency, where
> as compute-bound workloads will. This patch enables this "frequency
> sensitivity feedback" to aid the ondemand governor to make better
> frequency change decisions by hooking into the powersave bias.
>
> Signed-off-by: Jacob Shin <[email protected]>
> ---

[ … ]

> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -129,6 +129,23 @@ config X86_POWERNOW_K8
>
> For details, take a look at <file:Documentation/cpu-freq/>.
>
> +config X86_AMD_FREQ_SENSITIVITY

/me is turning on his spell checker...

> + tristate "AMD frequency sensitivity feedback powersave bias"
> + depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ && CPU_SUP_AMD
> + help
> + This adds AMD specific powersave bias function to the ondemand

AMD-specific

> + governor; which can be used to help ondemand governor make more power

"... governor, which allows it to make more power-conscious frequency
change decisions based on ..."

> + conscious frequency change decisions based on feedback from hardware
> + (availble on AMD Family 16h and above).

s/availble/available/

> +
> + Hardware feedback tells software how "sensitive" to frequency changes
> + the CPUs' workloads are. CPU-bound workloads will be more sensitive
> + -- they will perform better as frequency increases. Memory/IO-bound
> + workloads will be less sensitive -- they will not necessarily perform
> + better as frequnecy increases.

s/frequnecy/frequency/

> +
> + If in doubt, say N.
> +
> config X86_GX_SUSPMOD
> tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
> depends on X86_32 && PCI
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 863fd18..01dfdaf 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
> obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
> obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
> obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
> +obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
>
> ##################################################################################
> # ARM SoC drivers
> diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
> new file mode 100644
> index 0000000..e3e62d2
> --- /dev/null
> +++ b/drivers/cpufreq/amd_freq_sensitivity.c
> @@ -0,0 +1,150 @@
> +/*
> + * amd_freq_sensitivity.c: AMD frequency sensitivity feedback powersave bias
> + * for the ondemand governor.
> + *
> + * Copyright (C) 2013 Advanced Micro Devices, Inc.
> + *
> + * Author: Jacob Shin <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/init.h>
> +#include <linux/mod_devicetable.h>
> +
> +#include <asm/msr.h>
> +#include <asm/cpufeature.h>
> +
> +#include "cpufreq_governor.h"
> +
> +#define MSR_AMD64_FREQ_SENSITIVITY_ACTUAL 0xc0010080
> +#define MSR_AMD64_FREQ_SENSITIVITY_REFERENCE 0xc0010081
> +#define CLASS_CODE_SHIFT 56
> +#define CLASS_CODE_CORE_FREQ_SENSITIVITY 0x01
> +#define POWERSAVE_BIAS_MAX 1000
> +
> +struct cpu_data_t {
> + u64 actual;
> + u64 reference;
> + unsigned int freq_prev;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
> +
> +static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
> + unsigned int freq_next,
> + unsigned int relation)
> +{
> + int sensitivity;
> + long d_actual, d_reference;
> + struct msr actual, reference;
> + struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
> + struct dbs_data *od_data = policy->governor_data;
> + struct od_dbs_tuners *od_tuners = od_data->tuners;
> + struct od_cpu_dbs_info_s *od_info =
> + od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
> +
> + if (!od_info->freq_table)
> + return freq_next;
> +
> + rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_ACTUAL,
> + &actual.l, &actual.h);
> + rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_REFERENCE,
> + &reference.l, &reference.h);
> + actual.h &= 0x00ffffff;
> + reference.h &= 0x00ffffff;
> +
> + /* counter wrapped around, so stay on current frequency */
> + if (actual.q < data->actual || reference.q < data->reference) {
> + freq_next = policy->cur;
> + goto out;
> + }
> +
> + d_actual = actual.q - data->actual;
> + d_reference = reference.q - data->reference;
> +
> + /* divide by 0, so stay on current frequency as well */
> + if (d_reference == 0) {
> + freq_next = policy->cur;
> + goto out;
> + }
> +
> + sensitivity = POWERSAVE_BIAS_MAX -
> + (POWERSAVE_BIAS_MAX * (d_reference - d_actual) / d_reference);
> +
> + clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
> +
> + /* this workload is not CPU bound, so choose a lower freq */
> + if (sensitivity < od_tuners->powersave_bias) {

Ok, I still didn't get an answer to that: don't we want to use this
feature by default, even without looking at ->powersave_bias? I mean,
with feedback from the hardware, we kinda know better than the user, no?

> + if (data->freq_prev == policy->cur)
> + freq_next = policy->cur;
> +
> + if (freq_next > policy->cur)
> + freq_next = policy->cur;
> + else if (freq_next < policy->cur)
> + freq_next = policy->min;
> + else {
> + unsigned int index;
> +
> + cpufreq_frequency_table_target(policy,
> + od_info->freq_table, policy->cur - 1,
> + CPUFREQ_RELATION_H, &index);
> + freq_next = od_info->freq_table[index].frequency;
> + }
> +
> + data->freq_prev = freq_next;
> + } else
> + data->freq_prev = 0;
> +
> +out:
> + data->actual = actual.q;
> + data->reference = reference.q;
> + return freq_next;
> +}
> +
> +static int __init amd_freq_sensitivity_init(void)
> +{
> + int err;
> + u64 val;
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> + return -ENODEV;
> +
> + if (!static_cpu_has(X86_FEATURE_PROC_FEEDBACK))
> + return -ENODEV;
> +
> + err = rdmsrl_safe(MSR_AMD64_FREQ_SENSITIVITY_ACTUAL, &val);
> +

extraneous newline.

> + if (err)
> + return -ENODEV;
> +
> + if ((val >> CLASS_CODE_SHIFT) != CLASS_CODE_CORE_FREQ_SENSITIVITY)
> + return -ENODEV;

If this CLASS_CODE_CORE_FREQ_SENSITIVITY is always going to be a
non-null value, you can simplify the check even more, as I proposed
earlier:

if (val >> CLASS_CODE_SHIFT)
...

and drop CLASS_CODE_CORE_FREQ_SENSITIVITY.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 19:24:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 1/2] cpufreq: ondemand: allow custom powersave_bias_target handler to be registered

On Tue, Apr 02, 2013 at 01:11:43PM -0500, Jacob Shin wrote:
> This allows for another [arch specific] driver to hook into existing
> powersave bias function of the ondemand governor. i.e. This allows AMD
> specific powersave bias function (in a separate AMD specific driver)
> to aid ondemand governor's frequency transition deicisions.
>
> Signed-off-by: Jacob Shin <[email protected]>

Acked-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 20:03:46

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 09:23:52PM +0200, Borislav Petkov wrote:
> On Tue, Apr 02, 2013 at 01:11:44PM -0500, Jacob Shin wrote:
> > Future AMD processors, starting with Family 16h, can provide software
> > with feedback on how the workload may respond to frequency change --
> > memory-bound workloads will not benefit from higher frequency, where
> > as compute-bound workloads will. This patch enables this "frequency
> > sensitivity feedback" to aid the ondemand governor to make better
> > frequency change decisions by hooking into the powersave bias.
> >
> > Signed-off-by: Jacob Shin <[email protected]>
> > ---
>
> [ … ]
>
> > --- a/drivers/cpufreq/Kconfig.x86
> > +++ b/drivers/cpufreq/Kconfig.x86
> > @@ -129,6 +129,23 @@ config X86_POWERNOW_K8
> >
> > For details, take a look at <file:Documentation/cpu-freq/>.
> >
> > +config X86_AMD_FREQ_SENSITIVITY
>
> /me is turning on his spell checker...

Yikes, sorry about that (*ashamed*), will remeber to run spellcheck
next time.

>
> > + tristate "AMD frequency sensitivity feedback powersave bias"
> > + depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ && CPU_SUP_AMD
> > + help
> > + This adds AMD specific powersave bias function to the ondemand
>
> AMD-specific
>
> > + governor; which can be used to help ondemand governor make more power
>
> "... governor, which allows it to make more power-conscious frequency
> change decisions based on ..."
>
> > + conscious frequency change decisions based on feedback from hardware
> > + (availble on AMD Family 16h and above).
>
> s/availble/available/
>
> > +
> > + Hardware feedback tells software how "sensitive" to frequency changes
> > + the CPUs' workloads are. CPU-bound workloads will be more sensitive
> > + -- they will perform better as frequency increases. Memory/IO-bound
> > + workloads will be less sensitive -- they will not necessarily perform
> > + better as frequnecy increases.
>
> s/frequnecy/frequency/
>
> > +
> > + If in doubt, say N.
> > +
> > config X86_GX_SUSPMOD
> > tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
> > depends on X86_32 && PCI
> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> > index 863fd18..01dfdaf 100644
> > --- a/drivers/cpufreq/Makefile
> > +++ b/drivers/cpufreq/Makefile
> > @@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
> > obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
> > obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
> > obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
> > +obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
> >
> > ##################################################################################
> > # ARM SoC drivers
> > diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
> > new file mode 100644
> > index 0000000..e3e62d2
> > --- /dev/null
> > +++ b/drivers/cpufreq/amd_freq_sensitivity.c
> > @@ -0,0 +1,150 @@
> > +/*
> > + * amd_freq_sensitivity.c: AMD frequency sensitivity feedback powersave bias
> > + * for the ondemand governor.
> > + *
> > + * Copyright (C) 2013 Advanced Micro Devices, Inc.
> > + *
> > + * Author: Jacob Shin <[email protected]>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/percpu-defs.h>
> > +#include <linux/init.h>
> > +#include <linux/mod_devicetable.h>
> > +
> > +#include <asm/msr.h>
> > +#include <asm/cpufeature.h>
> > +
> > +#include "cpufreq_governor.h"
> > +
> > +#define MSR_AMD64_FREQ_SENSITIVITY_ACTUAL 0xc0010080
> > +#define MSR_AMD64_FREQ_SENSITIVITY_REFERENCE 0xc0010081
> > +#define CLASS_CODE_SHIFT 56
> > +#define CLASS_CODE_CORE_FREQ_SENSITIVITY 0x01
> > +#define POWERSAVE_BIAS_MAX 1000
> > +
> > +struct cpu_data_t {
> > + u64 actual;
> > + u64 reference;
> > + unsigned int freq_prev;
> > +};
> > +
> > +static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
> > +
> > +static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
> > + unsigned int freq_next,
> > + unsigned int relation)
> > +{
> > + int sensitivity;
> > + long d_actual, d_reference;
> > + struct msr actual, reference;
> > + struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
> > + struct dbs_data *od_data = policy->governor_data;
> > + struct od_dbs_tuners *od_tuners = od_data->tuners;
> > + struct od_cpu_dbs_info_s *od_info =
> > + od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
> > +
> > + if (!od_info->freq_table)
> > + return freq_next;
> > +
> > + rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_ACTUAL,
> > + &actual.l, &actual.h);
> > + rdmsr_on_cpu(policy->cpu, MSR_AMD64_FREQ_SENSITIVITY_REFERENCE,
> > + &reference.l, &reference.h);
> > + actual.h &= 0x00ffffff;
> > + reference.h &= 0x00ffffff;
> > +
> > + /* counter wrapped around, so stay on current frequency */
> > + if (actual.q < data->actual || reference.q < data->reference) {
> > + freq_next = policy->cur;
> > + goto out;
> > + }
> > +
> > + d_actual = actual.q - data->actual;
> > + d_reference = reference.q - data->reference;
> > +
> > + /* divide by 0, so stay on current frequency as well */
> > + if (d_reference == 0) {
> > + freq_next = policy->cur;
> > + goto out;
> > + }
> > +
> > + sensitivity = POWERSAVE_BIAS_MAX -
> > + (POWERSAVE_BIAS_MAX * (d_reference - d_actual) / d_reference);
> > +
> > + clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
> > +
> > + /* this workload is not CPU bound, so choose a lower freq */
> > + if (sensitivity < od_tuners->powersave_bias) {
>
> Ok, I still didn't get an answer to that: don't we want to use this
> feature by default, even without looking at ->powersave_bias? I mean,
> with feedback from the hardware, we kinda know better than the user, no?

Well, so this powersave_bias also works as a tunable knob.

>From ondemand side, if /sys/../ondemand/powersave_bias is 0, then we
(AMD sensitivity) don't get called and you get the default ondemand
behavior.

Like existing powersave_bias, users can tune the value to whatever
they want, to get a specturum of less to more aggressive power savings
vs performance.

I thought tunable would be more flexible .. out in the field or what
not .. no?

>
> > + if (data->freq_prev == policy->cur)
> > + freq_next = policy->cur;
> > +
> > + if (freq_next > policy->cur)
> > + freq_next = policy->cur;
> > + else if (freq_next < policy->cur)
> > + freq_next = policy->min;
> > + else {
> > + unsigned int index;
> > +
> > + cpufreq_frequency_table_target(policy,
> > + od_info->freq_table, policy->cur - 1,
> > + CPUFREQ_RELATION_H, &index);
> > + freq_next = od_info->freq_table[index].frequency;
> > + }
> > +
> > + data->freq_prev = freq_next;
> > + } else
> > + data->freq_prev = 0;
> > +
> > +out:
> > + data->actual = actual.q;
> > + data->reference = reference.q;
> > + return freq_next;
> > +}
> > +
> > +static int __init amd_freq_sensitivity_init(void)
> > +{
> > + int err;
> > + u64 val;
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > + return -ENODEV;
> > +
> > + if (!static_cpu_has(X86_FEATURE_PROC_FEEDBACK))
> > + return -ENODEV;
> > +
> > + err = rdmsrl_safe(MSR_AMD64_FREQ_SENSITIVITY_ACTUAL, &val);
> > +
>
> extraneous newline.
>
> > + if (err)
> > + return -ENODEV;
> > +
> > + if ((val >> CLASS_CODE_SHIFT) != CLASS_CODE_CORE_FREQ_SENSITIVITY)
> > + return -ENODEV;
>
> If this CLASS_CODE_CORE_FREQ_SENSITIVITY is always going to be a
> non-null value, you can simplify the check even more, as I proposed
> earlier:
>
> if (val >> CLASS_CODE_SHIFT)
> ...
>
> and drop CLASS_CODE_CORE_FREQ_SENSITIVITY.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-02 20:51:58

by Thomas Renninger

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tuesday, April 02, 2013 03:03:37 PM Jacob Shin wrote:
> On Tue, Apr 02, 2013 at 09:23:52PM +0200, Borislav Petkov wrote:
> > On Tue, Apr 02, 2013 at 01:11:44PM -0500, Jacob Shin wrote:
> > > Future AMD processors, starting with Family 16h, can provide software
> > > with feedback on how the workload may respond to frequency change --
> > > memory-bound workloads will not benefit from higher frequency, where
> > > as compute-bound workloads will. This patch enables this "frequency
> > > sensitivity feedback" to aid the ondemand governor to make better
> > > frequency change decisions by hooking into the powersave bias.

I had a quick look at the specification of these registers.
So this seem to be designed and stay very cpufreq specific and other kernel
parts probably won't make use of it.
...
> > > +
> > > + /* this workload is not CPU bound, so choose a lower freq */
> > > + if (sensitivity < od_tuners->powersave_bias) {
> >
> > Ok, I still didn't get an answer to that: don't we want to use this
> > feature by default, even without looking at ->powersave_bias? I mean,
> > with feedback from the hardware, we kinda know better than the user, no?
>
> Well, so this powersave_bias also works as a tunable knob.
>
> From ondemand side, if /sys/../ondemand/powersave_bias is 0, then we
> (AMD sensitivity) don't get called and you get the default ondemand
> behavior.
>
> Like existing powersave_bias, users can tune the value to whatever
> they want, to get a specturum of less to more aggressive power savings
> vs performance.
I understand powersave_bias code to only be able to do a more
aggressive power saving way:
If you pass 900, a frequency of 90% (for example 900MHz instead of 1000MHz)
of the one ondemand typically would choose is taken.
powersave_bias values above 1000 (take higher frequencies than the ondemand
would take) are not allowed.

powersave_bias is undocumented in Documentation/cpu-freq/...
I guess its use-case is for people who want to get some percent more
power savings out of their laptop and do not care of the one or other
percent performance.
In fact I would like to get rid of this extra code and I expect nobody would
miss it.
I might miss a configuration tool where someone went through the code,
documented things and allows users to set powersave_bias values through
some /etc/* config files.
If so, please point me to it.

What your patch misses are some hints how and when to use this at all.
What value should a user write to powersave_bias tunable to activate your
stuff?
I guess it's also for laptop users to get some percent more battery out of
their platform and this with an even higher performance rate?
Server guys do not care for some percent of power, but they do care for
some percent of performance.

> I thought tunable would be more flexible .. out in the field or what
> not .. no?
Yep, if you want anyone to make use of this, it should better get embedded
in more general, at least general ondemand code.

Thomas

2013-04-02 20:53:01

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 03:03:37PM -0500, Jacob Shin wrote:
> Well, so this powersave_bias also works as a tunable knob.
>
> From ondemand side, if /sys/../ondemand/powersave_bias is 0, then we
> (AMD sensitivity) don't get called and you get the default ondemand
> behavior.
>
> Like existing powersave_bias, users can tune the value to whatever
> they want, to get a specturum of less to more aggressive power savings
> vs performance.
>
> I thought tunable would be more flexible .. out in the field or what
> not .. no?

Ok, yes, that is the default on current systems which don't have hw
feedback.

But, on hw with such counters, I think the default should be to use
the hw feedback feature so that hardware can already do more informed
decisions for users.

As Thomas said, I hardly doubt users even know about that knob. So if we
can make the freq sensitivity thing work out of the box and without user
intervention, then we should strive to do that, no?

IOW:

if (!powersave_bias) {
/* user hasn't touched knob */

if (HAS_FEEDBACK_INTERFACE)
od_ops.powersave_bias_target(...);

__cpufreq_driver_target(..)
else
od_ops.powersave_bias_target(..)
__cpufreq_driver_target(..)
}

The only change is that on hw feedback systems, you don't get the old
behavior with powersave_bias == 0. Question is, do you even want it all
that much but would rather leave the hw do much more informed decisions
than the ondemand governor.

Hmmm.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 21:01:40

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 10:51:51PM +0200, Thomas Renninger wrote:
> Yep, if you want anyone to make use of this, it should better get
> embedded in more general, at least general ondemand code.

Yeah, it all sounds like we want to enable this by default on systems
which support it. Maybe with an off-switch for people who want plain
ondemand decisions.

The remaining systems with ripped out powersave_bias would get plain
ondemand governor decisions. Provided, of course, nobody uses
powersave_bias and the functionality doesn't make any sense anyway.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-03 05:12:55

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 1/2] cpufreq: ondemand: allow custom powersave_bias_target handler to be registered

On 2 April 2013 23:41, Jacob Shin <[email protected]> wrote:
> This allows for another [arch specific] driver to hook into existing
> powersave bias function of the ondemand governor. i.e. This allows AMD
> specific powersave bias function (in a separate AMD specific driver)
> to aid ondemand governor's frequency transition deicisions.
>
> Signed-off-by: Jacob Shin <[email protected]>
> ---
> drivers/cpufreq/cpufreq_governor.h | 3 +++
> drivers/cpufreq/cpufreq_ondemand.c | 32 ++++++++++++++++++++++++--------
> 2 files changed, 27 insertions(+), 8 deletions(-)

Acked-by: Viresh Kumar <[email protected]>

2013-04-03 16:53:31

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 11:01:24PM +0200, Borislav Petkov wrote:
> On Tue, Apr 02, 2013 at 10:51:51PM +0200, Thomas Renninger wrote:
> > powersave_bias is undocumented in Documentation/cpu-freq/...
> > I guess its use-case is for people who want to get some percent more
> > power savings out of their laptop and do not care of the one or other
> > percent performance.
> > In fact I would like to get rid of this extra code and I expect nobody would
> > miss it.
> > I might miss a configuration tool where someone went through the code,
> > documented things and allows users to set powersave_bias values through
> > some /etc/* config files.
> > Yep, if you want anyone to make use of this, it should better get
> > embedded in more general, at least general ondemand code.
>
> Yeah, it all sounds like we want to enable this by default on systems
> which support it. Maybe with an off-switch for people who want plain
> ondemand decisions.
>
> The remaining systems with ripped out powersave_bias would get plain
> ondemand governor decisions. Provided, of course, nobody uses
> powersave_bias and the functionality doesn't make any sense anyway.

Rafael, any thoughts on removing powersave_bias altogether ?

If we remove it, then is it acceptable to add an alternate callback/
handler registration to ondemand governor to account for hardware
feedback ?

Or, if we don't want to remove powersave_bias,

Then Thomas, Boris, would it be acceptable if enable the frequency
feedback feature by default with a sane powersave_bias tunable value ?
And also add proper documentation for both vanila powersave_bias and
powersave_bias with AMD frequency sensitivity loaded to
Documentation/cpu-freq/ondemand ?

>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-03 17:05:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Wed, Apr 03, 2013 at 11:53:24AM -0500, Jacob Shin wrote:
> Then Thomas, Boris, would it be acceptable if enable the frequency
> feedback feature by default with a sane powersave_bias tunable value?
> And also add proper documentation for both vanila powersave_bias
> and powersave_bias with AMD frequency sensitivity loaded to
> Documentation/cpu-freq/ondemand ?

Yeah, this was what I was proposing, basically. The only question here
is, would anyone want to disable freq decisions on systems with hw
feedback? If yes, then you'd need to be able to disable the feedback
thing, maybe have a magic value for powersave_bias...

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-03 17:17:48

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Wed, Apr 03, 2013 at 07:04:56PM +0200, Borislav Petkov wrote:
> On Wed, Apr 03, 2013 at 11:53:24AM -0500, Jacob Shin wrote:
> > Then Thomas, Boris, would it be acceptable if enable the frequency
> > feedback feature by default with a sane powersave_bias tunable value?
> > And also add proper documentation for both vanila powersave_bias
> > and powersave_bias with AMD frequency sensitivity loaded to
> > Documentation/cpu-freq/ondemand ?
>
> Yeah, this was what I was proposing, basically. The only question here
> is, would anyone want to disable freq decisions on systems with hw
> feedback? If yes, then you'd need to be able to disable the feedback
> thing, maybe have a magic value for powersave_bias...

Writing 0 to powersave_bias or unloading the AMD driver could do that.

When the AMD driver loads, it will give a sane default value to
powersave_bias to enable it, when it unloads, it will put it back to 0

>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe cpufreq" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-04-03 17:30:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Wed, Apr 03, 2013 at 12:17:38PM -0500, Jacob Shin wrote:
> Writing 0 to powersave_bias or unloading the AMD driver could do that.
>
> When the AMD driver loads, it will give a sane default value to
> powersave_bias to enable it, when it unloads, it will put it back to
> 0.

... and on systems without hw feedback, it will keep powersave_bias to 0
by default, retaining the old behavior.

Yeah, sounds like a plan.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--