2013-03-28 18:24:25

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V2 0/2] cpufreq: ondemand: add AMD specific powersave bias

This patchset adds AMD specific powersave bias function to the ondemand
governor; which can be used to help ondemand governor make more power conscious
frequency change decisions based on feedback from hardware (availble on AMD
Family 16h and above).

Hardware feedback tells software how "sensitive" to frequency changes the
workloads are. CPU-bound workloads will be more sensitive -- they will
perform better as frequency increases. Memory/IO-bound workloads will be less
sensitive -- they will not necessarily perform better as frequnecy increases.

This patchset was compared against ondemand governor without powersave bias
and did not show any performance degradation on CPU-bound workloads such as
kernbench and unixbench. While saving power on Memory-bound workloads such as
stream.

This applies to linux-pm's linux-next branch, on top of Viresh's 'Implement
per policy instance of governor' V4 patchset:

https://lkml.org/lkml/2013/3/27/348

V2:
* Added proper include files to amd_freq_sensitivity.c
* Only register powersave_bias_target function pointer and not the entire
od_ops.

Jacob Shin (2):
cpufreq: ondemand: allow custom powersave_bias_target function to be
registered
cpufreq: AMD "frequency sensitivity feedback" powersave bias for
ondemand governor

arch/x86/include/uapi/asm/msr-index.h | 1 +
drivers/cpufreq/Kconfig.x86 | 10 +++
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
drivers/cpufreq/cpufreq_governor.h | 3 +
drivers/cpufreq/cpufreq_ondemand.c | 22 ++++-
6 files changed, 181 insertions(+), 3 deletions(-)
create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c

--
1.7.9.5


2013-03-28 18:24:22

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V2 1/2] cpufreq: ondemand: allow custom powersave_bias_target function to be registered

This allows for another [arch specific] driver to hook into existing
powersave bias function of the ondemand governor. i.e. This allows AMD
specific powersave bias function (in a separate AMD specific driver)
to aid ondemand governor's frequency transition deicisions.

Signed-off-by: Jacob Shin <[email protected]>
---
drivers/cpufreq/cpufreq_governor.h | 3 +++
drivers/cpufreq/cpufreq_ondemand.c | 22 +++++++++++++++++++---
2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index c83cabf..4b6808f 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -262,4 +262,7 @@ bool need_load_eval(struct cpu_dbs_common_info *cdbs,
unsigned int sampling_rate);
int cpufreq_governor_dbs(struct cpufreq_policy *policy,
struct common_dbs_data *cdata, unsigned int event);
+void od_register_powersave_bias_function(unsigned int (*f)
+ (struct cpufreq_policy *, unsigned int, unsigned int));
+void od_unregister_powersave_bias_function(void);
#endif /* _CPUFREQ_GOVERNER_H */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 15e80ee..36f0798 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -40,6 +40,8 @@

static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);

+static struct od_ops od_ops;
+
#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
static struct cpufreq_governor cpufreq_gov_ondemand;
#endif
@@ -145,7 +147,8 @@ static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
struct od_dbs_tuners *od_tuners = dbs_data->tuners;

if (od_tuners->powersave_bias)
- freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
+ freq = od_ops.powersave_bias_target(p, freq,
+ CPUFREQ_RELATION_H);
else if (p->cur == p->max)
return;

@@ -206,8 +209,8 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
__cpufreq_driver_target(policy, freq_next,
CPUFREQ_RELATION_L);
} else {
- int freq = powersave_bias_target(policy, freq_next,
- CPUFREQ_RELATION_L);
+ int freq = od_ops.powersave_bias_target(policy,
+ freq_next, CPUFREQ_RELATION_L);
__cpufreq_driver_target(policy, freq,
CPUFREQ_RELATION_L);
}
@@ -565,6 +568,19 @@ static struct common_dbs_data od_dbs_cdata = {
.exit = od_exit,
};

+void od_register_powersave_bias_function(unsigned int (*f)
+ (struct cpufreq_policy *, unsigned int, unsigned int))
+{
+ od_ops.powersave_bias_target = f;
+}
+EXPORT_SYMBOL_GPL(od_register_powersave_bias_function);
+
+void od_unregister_powersave_bias_function(void)
+{
+ od_ops.powersave_bias_target = powersave_bias_target;
+}
+EXPORT_SYMBOL_GPL(od_unregister_powersave_bias_function);
+
static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy,
unsigned int event)
{
--
1.7.9.5

2013-03-28 18:25:14

by Jacob Shin

[permalink] [raw]
Subject: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

Future AMD processors, starting with Family 16h, can provide software
with feedback on how the workload may respond to frequency change --
memory-bound workloads will not benefit from higher frequency, where
as compute-bound workloads will. This patch enables this "frequency
sensitivity feedback" to aid the ondemand governor to make better
frequency change decisions by hooking into the powersave bias.

Signed-off-by: Jacob Shin <[email protected]>
---
arch/x86/include/uapi/asm/msr-index.h | 1 +
drivers/cpufreq/Kconfig.x86 | 10 +++
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
4 files changed, 159 insertions(+)
create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 7a060f4..b2e6c49 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -173,6 +173,7 @@
#define MSR_AMD64_TSC_RATIO 0xc0000104
#define MSR_AMD64_NB_CFG 0xc001001f
#define MSR_AMD64_PATCH_LOADER 0xc0010020
+#define MSR_AMD64_FREQ_SENSITIVITY 0xc0010080
#define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
#define MSR_AMD64_OSVW_STATUS 0xc0010141
#define MSR_AMD64_DC_CFG 0xc0011022
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index d7dc0ed..6c714b0 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -129,6 +129,16 @@ config X86_POWERNOW_K8

For details, take a look at <file:Documentation/cpu-freq/>.

+config X86_AMD_FREQ_SENSITIVITY
+ tristate "AMD 'frequency sensitivity feedback' powersave bias"
+ depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ
+ help
+ This adds support for 'frequency sensitivity feedback' feature on
+ supported AMD processors, which hooks into the ondemand governor's
+ powersave bias to influence frequency change decisions.
+
+ If in doubt, say N.
+
config X86_GX_SUSPMOD
tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
depends on X86_32 && PCI
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 863fd18..01dfdaf 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
+obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o

##################################################################################
# ARM SoC drivers
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
new file mode 100644
index 0000000..997feb0
--- /dev/null
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -0,0 +1,147 @@
+/*
+ * amd_freq_sensitivity.c: AMD "frequency sensitivity feedback" powersave bias
+ * for ondemand governor.
+ *
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/percpu-defs.h>
+#include <linux/init.h>
+
+#include "cpufreq_governor.h"
+
+#define PROC_FEEDBACK_INTERFACE_SHIFT 11
+#define CLASS_CODE_SHIFT 56
+#define CLASS_CODE_MASK 0xff
+#define CLASS_CODE_CORE_FREQUENCY_SENSITIVITY 0x01
+
+static u32 msr_addr;
+
+struct cpu_data_t {
+ u64 actual;
+ u64 reference;
+ unsigned int freq_prev;
+};
+
+static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
+
+static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
+ unsigned int freq_next, unsigned int relation)
+{
+ int sensitivity;
+ long d_actual, d_reference;
+ struct msr actual, reference;
+ struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
+ struct dbs_data *od_data = policy->governor_data;
+ struct od_dbs_tuners *od_tuners = od_data->tuners;
+ struct od_cpu_dbs_info_s *od_info =
+ od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
+
+ rdmsr_on_cpu(policy->cpu, msr_addr, &actual.l, &actual.h);
+ rdmsr_on_cpu(policy->cpu, msr_addr + 1, &reference.l, &reference.h);
+ actual.h &= 0x00ffffff;
+ reference.h &= 0x00ffffff;
+
+ if (!od_info->freq_table)
+ goto out;
+
+ /* counter wrapped around, so push until next check */
+ if (actual.q < data->actual || reference.q < data->reference) {
+ freq_next = policy->cur;
+ goto out;
+ }
+
+ d_actual = actual.q - data->actual;
+ d_reference = reference.q - data->reference;
+
+ /* divide by 0, so push as well */
+ if (d_reference == 0) {
+ freq_next = policy->cur;
+ goto out;
+ }
+
+ sensitivity = 1000 - (1000 * (d_reference - d_actual) / d_reference);
+
+ if (sensitivity > 1000)
+ sensitivity = 1000;
+ else if (sensitivity < 0)
+ sensitivity = 0;
+
+ /* this workload is not CPU bound, so choose a lower freq */
+ if (sensitivity < od_tuners->powersave_bias) {
+ if (data->freq_prev == policy->cur)
+ freq_next = policy->cur;
+
+ if (freq_next > policy->cur)
+ freq_next = policy->cur;
+ else if (freq_next < policy->cur)
+ freq_next = policy->min;
+ else {
+ unsigned int index = 0;
+
+ cpufreq_frequency_table_target(policy,
+ od_info->freq_table, policy->cur - 1,
+ CPUFREQ_RELATION_H, &index);
+ freq_next = od_info->freq_table[index].frequency;
+ }
+
+ data->freq_prev = freq_next;
+ } else
+ data->freq_prev = 0;
+
+out:
+ data->actual = actual.q;
+ data->reference = reference.q;
+ return freq_next;
+}
+
+static int __init amd_freq_sensitivity_init(void)
+{
+ int i;
+ u32 eax, edx, dummy;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return -ENODEV;
+
+ cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
+
+ if (!(edx & (1 << PROC_FEEDBACK_INTERFACE_SHIFT)))
+ return -ENODEV;
+
+ for (i = 0; i < (eax & 0xf); i++) {
+ u64 val;
+ u32 addr = MSR_AMD64_FREQ_SENSITIVITY + (i * 2);
+
+ rdmsrl(addr, val);
+
+ if (((val >> CLASS_CODE_SHIFT) & CLASS_CODE_MASK)
+ == CLASS_CODE_CORE_FREQUENCY_SENSITIVITY) {
+ msr_addr = addr;
+ break;
+ }
+ }
+
+ if (!msr_addr)
+ return -ENODEV;
+
+ od_register_powersave_bias_function(amd_powersave_bias_target);
+ return 0;
+}
+module_init(amd_freq_sensitivity_init);
+
+static void __exit amd_freq_sensitivity_exit(void)
+{
+ od_unregister_powersave_bias_function();
+}
+module_exit(amd_freq_sensitivity_exit);
+
+MODULE_AUTHOR("Jacob Shin <[email protected]>");
+MODULE_DESCRIPTION("AMD 'frequency sensitivity feedback' powersave bias for "
+ "the ondemand governor.");
+MODULE_LICENSE("GPL");
--
1.7.9.5

2013-03-29 02:48:57

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V2 0/2] cpufreq: ondemand: add AMD specific powersave bias

On 28 March 2013 23:54, Jacob Shin <[email protected]> wrote:
> This patchset adds AMD specific powersave bias function to the ondemand
> governor; which can be used to help ondemand governor make more power conscious
> frequency change decisions based on feedback from hardware (availble on AMD
> Family 16h and above).
>
> Hardware feedback tells software how "sensitive" to frequency changes the
> workloads are. CPU-bound workloads will be more sensitive -- they will
> perform better as frequency increases. Memory/IO-bound workloads will be less
> sensitive -- they will not necessarily perform better as frequnecy increases.
>
> This patchset was compared against ondemand governor without powersave bias
> and did not show any performance degradation on CPU-bound workloads such as
> kernbench and unixbench. While saving power on Memory-bound workloads such as
> stream.
>
> This applies to linux-pm's linux-next branch, on top of Viresh's 'Implement
> per policy instance of governor' V4 patchset:
>
> https://lkml.org/lkml/2013/3/27/348
>
> V2:
> * Added proper include files to amd_freq_sensitivity.c
> * Only register powersave_bias_target function pointer and not the entire
> od_ops.
>
> Jacob Shin (2):
> cpufreq: ondemand: allow custom powersave_bias_target function to be
> registered
> cpufreq: AMD "frequency sensitivity feedback" powersave bias for
> ondemand governor
>
> arch/x86/include/uapi/asm/msr-index.h | 1 +
> drivers/cpufreq/Kconfig.x86 | 10 +++
> drivers/cpufreq/Makefile | 1 +
> drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
> drivers/cpufreq/cpufreq_governor.h | 3 +
> drivers/cpufreq/cpufreq_ondemand.c | 22 ++++-
> 6 files changed, 181 insertions(+), 3 deletions(-)
> create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c

Acked-by: Viresh Kumar <[email protected]>

2013-04-01 19:38:51

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V2 0/2] cpufreq: ondemand: add AMD specific powersave bias

On Fri, Mar 29, 2013 at 08:18:54AM +0530, Viresh Kumar wrote:
> On 28 March 2013 23:54, Jacob Shin <[email protected]> wrote:
> > This patchset adds AMD specific powersave bias function to the ondemand
> > governor; which can be used to help ondemand governor make more power conscious
> > frequency change decisions based on feedback from hardware (availble on AMD
> > Family 16h and above).
> >
> > Hardware feedback tells software how "sensitive" to frequency changes the
> > workloads are. CPU-bound workloads will be more sensitive -- they will
> > perform better as frequency increases. Memory/IO-bound workloads will be less
> > sensitive -- they will not necessarily perform better as frequnecy increases.
> >
> > This patchset was compared against ondemand governor without powersave bias
> > and did not show any performance degradation on CPU-bound workloads such as
> > kernbench and unixbench. While saving power on Memory-bound workloads such as
> > stream.
> >
> > This applies to linux-pm's linux-next branch, on top of Viresh's 'Implement
> > per policy instance of governor' V4 patchset:
> >
> > https://lkml.org/lkml/2013/3/27/348
> >
> > V2:
> > * Added proper include files to amd_freq_sensitivity.c
> > * Only register powersave_bias_target function pointer and not the entire
> > od_ops.
> >
> > Jacob Shin (2):
> > cpufreq: ondemand: allow custom powersave_bias_target function to be
> > registered
> > cpufreq: AMD "frequency sensitivity feedback" powersave bias for
> > ondemand governor
> >
> > arch/x86/include/uapi/asm/msr-index.h | 1 +
> > drivers/cpufreq/Kconfig.x86 | 10 +++
> > drivers/cpufreq/Makefile | 1 +
> > drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
> > drivers/cpufreq/cpufreq_governor.h | 3 +
> > drivers/cpufreq/cpufreq_ondemand.c | 22 ++++-
> > 6 files changed, 181 insertions(+), 3 deletions(-)
> > create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c
>
> Acked-by: Viresh Kumar <[email protected]>
>

Hi Rafael,

If this looks okay to you,
please commit to linux-next when you get the chance.

Thanks!

2013-04-01 20:36:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2 0/2] cpufreq: ondemand: add AMD specific powersave bias

On Monday, April 01, 2013 02:38:39 PM Jacob Shin wrote:
> On Fri, Mar 29, 2013 at 08:18:54AM +0530, Viresh Kumar wrote:
> > On 28 March 2013 23:54, Jacob Shin <[email protected]> wrote:
> > > This patchset adds AMD specific powersave bias function to the ondemand
> > > governor; which can be used to help ondemand governor make more power conscious
> > > frequency change decisions based on feedback from hardware (availble on AMD
> > > Family 16h and above).
> > >
> > > Hardware feedback tells software how "sensitive" to frequency changes the
> > > workloads are. CPU-bound workloads will be more sensitive -- they will
> > > perform better as frequency increases. Memory/IO-bound workloads will be less
> > > sensitive -- they will not necessarily perform better as frequnecy increases.
> > >
> > > This patchset was compared against ondemand governor without powersave bias
> > > and did not show any performance degradation on CPU-bound workloads such as
> > > kernbench and unixbench. While saving power on Memory-bound workloads such as
> > > stream.
> > >
> > > This applies to linux-pm's linux-next branch, on top of Viresh's 'Implement
> > > per policy instance of governor' V4 patchset:
> > >
> > > https://lkml.org/lkml/2013/3/27/348
> > >
> > > V2:
> > > * Added proper include files to amd_freq_sensitivity.c
> > > * Only register powersave_bias_target function pointer and not the entire
> > > od_ops.
> > >
> > > Jacob Shin (2):
> > > cpufreq: ondemand: allow custom powersave_bias_target function to be
> > > registered
> > > cpufreq: AMD "frequency sensitivity feedback" powersave bias for
> > > ondemand governor
> > >
> > > arch/x86/include/uapi/asm/msr-index.h | 1 +
> > > drivers/cpufreq/Kconfig.x86 | 10 +++
> > > drivers/cpufreq/Makefile | 1 +
> > > drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
> > > drivers/cpufreq/cpufreq_governor.h | 3 +
> > > drivers/cpufreq/cpufreq_ondemand.c | 22 ++++-
> > > 6 files changed, 181 insertions(+), 3 deletions(-)
> > > create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c
> >
> > Acked-by: Viresh Kumar <[email protected]>
> >
>
> Hi Rafael,
>
> If this looks okay to you,
> please commit to linux-next when you get the chance.

OK, I wonder what Boris thinks about it, though.

Boris, could you please have a look at this and let me know what you think?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-04-02 11:40:17

by Thomas Renninger

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Thursday, March 28, 2013 01:24:17 PM Jacob Shin wrote:
> Future AMD processors, starting with Family 16h, can provide software
> with feedback on how the workload may respond to frequency change --
> memory-bound workloads will not benefit from higher frequency, where
> as compute-bound workloads will. This patch enables this "frequency
> sensitivity feedback" to aid the ondemand governor to make better
> frequency change decisions by hooking into the powersave bias.
If I read this correctly, nothing changes even if the driver is loaded,
unless user modifies:
/sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
is this correct?

I wonder who should modify:
/sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
Even cpupower is not aware of this very specific tunable.

Also, are you sure cpufreq subsystem will be the only user
of this one?
Or could cpuidle or others also make use of this somewhen in the future?

Then this could more be done like:
drivers/cpufreq/mperf.c
And scheduler, cpuidle, cpufreq or whatever could use this as well.

Just some thinking:
I wonder how one could check/verify that the right thing is done
(by CPU and kernel). Ideally it would be nice to have the CPU register
appended to a cpufreq or cpuidle event trace.
But this very (AMD or X86 only?) specific data would not look nice there.
An arch placeholder value would be needed or similar?

...
> +}
> +
> +static int __init amd_freq_sensitivity_init(void)
> +{
> + int i;
> + u32 eax, edx, dummy;
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> + return -ENODEV;
> +
> + cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
If this really should be a separate module:
Does/will Intel have the same (feature/cpuid bit)?
Anyway, this should get a general AMD or X86 CPU capability flag.

Then you can also autoload this driver similar to how it's done in acpi-
cpufreq:
static const struct x86_cpu_id acpi_cpufreq_ids[] = {
X86_FEATURE_MATCH(X86_FEATURE_ACPI),
X86_FEATURE_MATCH(X86_FEATURE_HW_PSTATE),
{}
};
MODULE_DEVICE_TABLE(x86cpu, acpi_cpufreq_ids);

Thomas

2013-04-02 12:43:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] cpufreq: ondemand: allow custom powersave_bias_target function to be registered

On Thu, Mar 28, 2013 at 01:24:16PM -0500, Jacob Shin wrote:
> This allows for another [arch specific] driver to hook into existing
> powersave bias function of the ondemand governor. i.e. This allows AMD
> specific powersave bias function (in a separate AMD specific driver)
> to aid ondemand governor's frequency transition deicisions.
>
> Signed-off-by: Jacob Shin <[email protected]>
> ---
> drivers/cpufreq/cpufreq_governor.h | 3 +++
> drivers/cpufreq/cpufreq_ondemand.c | 22 +++++++++++++++++++---
> 2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> index c83cabf..4b6808f 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -262,4 +262,7 @@ bool need_load_eval(struct cpu_dbs_common_info *cdbs,
> unsigned int sampling_rate);
> int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> struct common_dbs_data *cdata, unsigned int event);
> +void od_register_powersave_bias_function(unsigned int (*f)
> + (struct cpufreq_policy *, unsigned int, unsigned int));
> +void od_unregister_powersave_bias_function(void);

We generally call those a "callback" or a "handler". I.e.,
od_register_powersave_bias_handler or something.

> #endif /* _CPUFREQ_GOVERNER_H */
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index 15e80ee..36f0798 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -40,6 +40,8 @@
>
> static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);
>
> +static struct od_ops od_ops;
> +
> #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
> static struct cpufreq_governor cpufreq_gov_ondemand;
> #endif
> @@ -145,7 +147,8 @@ static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
> struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>
> if (od_tuners->powersave_bias)
> - freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
> + freq = od_ops.powersave_bias_target(p, freq,
> + CPUFREQ_RELATION_H);
> else if (p->cur == p->max)
> return;
>
> @@ -206,8 +209,8 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
> __cpufreq_driver_target(policy, freq_next,
> CPUFREQ_RELATION_L);
> } else {
> - int freq = powersave_bias_target(policy, freq_next,
> - CPUFREQ_RELATION_L);
> + int freq = od_ops.powersave_bias_target(policy,
> + freq_next, CPUFREQ_RELATION_L);
> __cpufreq_driver_target(policy, freq,
> CPUFREQ_RELATION_L);
> }
> @@ -565,6 +568,19 @@ static struct common_dbs_data od_dbs_cdata = {
> .exit = od_exit,
> };
>
> +void od_register_powersave_bias_function(unsigned int (*f)
> + (struct cpufreq_policy *, unsigned int, unsigned int))
> +{
> + od_ops.powersave_bias_target = f;
> +}
> +EXPORT_SYMBOL_GPL(od_register_powersave_bias_function);
> +
> +void od_unregister_powersave_bias_function(void)
> +{
> + od_ops.powersave_bias_target = powersave_bias_target;

This is very confusing: we have ->powersave_bias_target and the default
powersave_bias_target in the ondemand governor. Can we call the default
one generic_powersave_bias_target or default_* or whatever.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 12:49:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 01:40:13PM +0200, Thomas Renninger wrote:
> On Thursday, March 28, 2013 01:24:17 PM Jacob Shin wrote:
> > Future AMD processors, starting with Family 16h, can provide software
> > with feedback on how the workload may respond to frequency change --
> > memory-bound workloads will not benefit from higher frequency, where
> > as compute-bound workloads will. This patch enables this "frequency
> > sensitivity feedback" to aid the ondemand governor to make better
> > frequency change decisions by hooking into the powersave bias.
> If I read this correctly, nothing changes even if the driver is loaded,
> unless user modifies:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> is this correct?
>
> I wonder who should modify:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> Even cpupower is not aware of this very specific tunable.
>
> Also, are you sure cpufreq subsystem will be the only user
> of this one?
> Or could cpuidle or others also make use of this somewhen in the future?

Yeah, I don't think this is supposed to work like that - more likely,
you want to use the freq sensitivity thing by default if the hardware
supports it.

So I think the od_tuners->powersave_bias check needs to be augmented
with a freq_sensitivity cpuid bit check...

> Then this could more be done like:
> drivers/cpufreq/mperf.c
> And scheduler, cpuidle, cpufreq or whatever could use this as well.
>
> Just some thinking:
> I wonder how one could check/verify that the right thing is done
> (by CPU and kernel). Ideally it would be nice to have the CPU register
> appended to a cpufreq or cpuidle event trace.
> But this very (AMD or X86 only?) specific data would not look nice there.
> An arch placeholder value would be needed or similar?

I actually wonder whether this should be a separate module but I
guess this is maybe the most agreeable way for adding vendor-specific
functionality to cpufreq.

> ...
> > +}
> > +
> > +static int __init amd_freq_sensitivity_init(void)
> > +{
> > + int i;
> > + u32 eax, edx, dummy;
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > + return -ENODEV;
> > +
> > + cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
> If this really should be a separate module:
> Does/will Intel have the same (feature/cpuid bit)?
> Anyway, this should get a general AMD or X86 CPU capability flag.
>
> Then you can also autoload this driver similar to how it's done in acpi-
> cpufreq:
> static const struct x86_cpu_id acpi_cpufreq_ids[] = {
> X86_FEATURE_MATCH(X86_FEATURE_ACPI),
> X86_FEATURE_MATCH(X86_FEATURE_HW_PSTATE),
> {}
> };
> MODULE_DEVICE_TABLE(x86cpu, acpi_cpufreq_ids);

Yes, this needs to be a cpu feature bit in cpufeature.h and be loaded
automatically.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 13:31:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] cpufreq: ondemand: allow custom powersave_bias_target function to be registered

On Thu, Mar 28, 2013 at 01:24:16PM -0500, Jacob Shin wrote:
> @@ -206,8 +209,8 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
> __cpufreq_driver_target(policy, freq_next,
> CPUFREQ_RELATION_L);
> } else {
> - int freq = powersave_bias_target(policy, freq_next,
> - CPUFREQ_RELATION_L);
> + int freq = od_ops.powersave_bias_target(policy,
> + freq_next, CPUFREQ_RELATION_L);
> __cpufreq_driver_target(policy, freq,
> CPUFREQ_RELATION_L);

Btw, one more thing: you can simplify this code a bit, while you're at
it:

if (!od_tuners->powersave_bias) {
__cpufreq_driver_target(policy, freq_next,
CPUFREQ_RELATION_L);
return;
}

freq_next = od_ops.powersave_bias_target(policy, freq_next, CPUFREQ_RELATION_L);
__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
}

and drop the local "int freq" too.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 13:43:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Thu, Mar 28, 2013 at 01:24:17PM -0500, Jacob Shin wrote:
> Future AMD processors, starting with Family 16h, can provide software
> with feedback on how the workload may respond to frequency change --
> memory-bound workloads will not benefit from higher frequency, where
> as compute-bound workloads will. This patch enables this "frequency
> sensitivity feedback" to aid the ondemand governor to make better
> frequency change decisions by hooking into the powersave bias.
>
> Signed-off-by: Jacob Shin <[email protected]>
> ---
> arch/x86/include/uapi/asm/msr-index.h | 1 +
> drivers/cpufreq/Kconfig.x86 | 10 +++
> drivers/cpufreq/Makefile | 1 +
> drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
> 4 files changed, 159 insertions(+)
> create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c
>
> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> index 7a060f4..b2e6c49 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -173,6 +173,7 @@
> #define MSR_AMD64_TSC_RATIO 0xc0000104
> #define MSR_AMD64_NB_CFG 0xc001001f
> #define MSR_AMD64_PATCH_LOADER 0xc0010020
> +#define MSR_AMD64_FREQ_SENSITIVITY 0xc0010080
> #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
> #define MSR_AMD64_OSVW_STATUS 0xc0010141
> #define MSR_AMD64_DC_CFG 0xc0011022

My guess is, this MSR won't be used outside of cpufreq so you probably
want to define it there, in amd_freq_sensitivity.c

> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index d7dc0ed..6c714b0 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -129,6 +129,16 @@ config X86_POWERNOW_K8
>
> For details, take a look at <file:Documentation/cpu-freq/>.
>
> +config X86_AMD_FREQ_SENSITIVITY
> + tristate "AMD 'frequency sensitivity feedback' powersave bias"

Why in ' '? Isn't that the final name?

> + depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ

depends on CPU_SUP_AMD

> + help
> + This adds support for 'frequency sensitivity feedback' feature on
> + supported AMD processors, which hooks into the ondemand governor's
> + powersave bias to influence frequency change decisions.

Your description about the feature in the 0/2 message is much better
than this one here. How about adding it here too?

> +
> + If in doubt, say N.
> +
> config X86_GX_SUSPMOD
> tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
> depends on X86_32 && PCI
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 863fd18..01dfdaf 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
> obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
> obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
> obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
> +obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
>
> ##################################################################################
> # ARM SoC drivers
> diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
> new file mode 100644
> index 0000000..997feb0
> --- /dev/null
> +++ b/drivers/cpufreq/amd_freq_sensitivity.c
> @@ -0,0 +1,147 @@
> +/*
> + * amd_freq_sensitivity.c: AMD "frequency sensitivity feedback" powersave bias
> + * for ondemand governor.
> + *
> + * Copyright (C) 2013 Advanced Micro Devices, Inc.

You probably want to leave an email address in here for contacting you
when it is b0rked. :-)

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/init.h>
> +
> +#include "cpufreq_governor.h"
> +
> +#define PROC_FEEDBACK_INTERFACE_SHIFT 11

Yeah, that's a bit cumbersome. Just define a normal x86 feature bit in
cpufeature.h and then you can use static_cpu_has below:

if (!static_cpu_has(AMD_PROC_FEEDBACK_INTERFACE))
return -ENODEV;


> +#define CLASS_CODE_SHIFT 56
> +#define CLASS_CODE_MASK 0xff
> +#define CLASS_CODE_CORE_FREQUENCY_SENSITIVITY 0x01
> +
> +static u32 msr_addr;
> +
> +struct cpu_data_t {
> + u64 actual;
> + u64 reference;
> + unsigned int freq_prev;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
> +
> +static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
> + unsigned int freq_next, unsigned int relation)

arg alignment.

> +{
> + int sensitivity;
> + long d_actual, d_reference;
> + struct msr actual, reference;
> + struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
> + struct dbs_data *od_data = policy->governor_data;
> + struct od_dbs_tuners *od_tuners = od_data->tuners;
> + struct od_cpu_dbs_info_s *od_info =
> + od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
> +
> + rdmsr_on_cpu(policy->cpu, msr_addr, &actual.l, &actual.h);
> + rdmsr_on_cpu(policy->cpu, msr_addr + 1, &reference.l, &reference.h);
> + actual.h &= 0x00ffffff;
> + reference.h &= 0x00ffffff;
> +
> + if (!od_info->freq_table)
> + goto out;

Ok, this check is definitely misplaced. So if we don't have
->freq_table, we can save us the MSR accesses above and simply return
freq_next, right? So basically you want to push the check before the MSR
accesses and do:

if (!od_info->freq_table)
return freq_next;

Or am I missing something?

> +
> + /* counter wrapped around, so push until next check */
> + if (actual.q < data->actual || reference.q < data->reference) {
> + freq_next = policy->cur;
> + goto out;
> + }
> +
> + d_actual = actual.q - data->actual;
> + d_reference = reference.q - data->reference;
> +
> + /* divide by 0, so push as well */

What do you mean by "push as well"? No change, right?

> + if (d_reference == 0) {
> + freq_next = policy->cur;
> + goto out;
> + }
> +
> + sensitivity = 1000 - (1000 * (d_reference - d_actual) / d_reference);

Ok, now this naked 1000 could very well use a define here like you do
for your other values you're using :-).

> + if (sensitivity > 1000)
> + sensitivity = 1000;
> + else if (sensitivity < 0)
> + sensitivity = 0;

clamp(sensitivity, 0, 1000);

> +
> + /* this workload is not CPU bound, so choose a lower freq */
> + if (sensitivity < od_tuners->powersave_bias) {

Yeah, this ->powersave_bias usage needs more discussion, as Thomas said
earlier.

> + if (data->freq_prev == policy->cur)
> + freq_next = policy->cur;
> +
> + if (freq_next > policy->cur)
> + freq_next = policy->cur;
> + else if (freq_next < policy->cur)
> + freq_next = policy->min;
> + else {
> + unsigned int index = 0;
> +
> + cpufreq_frequency_table_target(policy,
> + od_info->freq_table, policy->cur - 1,
> + CPUFREQ_RELATION_H, &index);
> + freq_next = od_info->freq_table[index].frequency;
> + }
> +
> + data->freq_prev = freq_next;
> + } else
> + data->freq_prev = 0;
> +
> +out:
> + data->actual = actual.q;
> + data->reference = reference.q;
> + return freq_next;
> +}
> +
> +static int __init amd_freq_sensitivity_init(void)
> +{
> + int i;
> + u32 eax, edx, dummy;
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> + return -ENODEV;
> +
> + cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
> +
> + if (!(edx & (1 << PROC_FEEDBACK_INTERFACE_SHIFT)))
> + return -ENODEV;
> +
> + for (i = 0; i < (eax & 0xf); i++) {
> + u64 val;
> + u32 addr = MSR_AMD64_FREQ_SENSITIVITY + (i * 2);
> +
> + rdmsrl(addr, val);

Pls use rdmsrl_safe variant for virtualization's sake and check its
retval befor using it below.

> +
> + if (((val >> CLASS_CODE_SHIFT) & CLASS_CODE_MASK)
> + == CLASS_CODE_CORE_FREQUENCY_SENSITIVITY) {
> + msr_addr = addr;
> + break;
> + }
> + }

What is this thing doing? There's a whole range of MSRs which can give
you freq feedback? Or only the two as it is done above for actual and
reference?

Also, you can simplify the check provided that bits [63:56] denote
support is present if they're not 0:

if (val >> CLASS_CODE_SHIFT)
msr_addr = addr;

> +
> + if (!msr_addr)
> + return -ENODEV;
> +
> + od_register_powersave_bias_function(amd_powersave_bias_target);
> + return 0;
> +}
> +module_init(amd_freq_sensitivity_init);
> +
> +static void __exit amd_freq_sensitivity_exit(void)
> +{
> + od_unregister_powersave_bias_function();
> +}
> +module_exit(amd_freq_sensitivity_exit);
> +
> +MODULE_AUTHOR("Jacob Shin <[email protected]>");
> +MODULE_DESCRIPTION("AMD 'frequency sensitivity feedback' powersave bias for "
> + "the ondemand governor.");
> +MODULE_LICENSE("GPL");

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-02 14:35:04

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 01:40:13PM +0200, Thomas Renninger wrote:
> On Thursday, March 28, 2013 01:24:17 PM Jacob Shin wrote:
> > Future AMD processors, starting with Family 16h, can provide software
> > with feedback on how the workload may respond to frequency change --
> > memory-bound workloads will not benefit from higher frequency, where
> > as compute-bound workloads will. This patch enables this "frequency
> > sensitivity feedback" to aid the ondemand governor to make better
> > frequency change decisions by hooking into the powersave bias.
> If I read this correctly, nothing changes even if the driver is loaded,
> unless user modifies:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> is this correct?

Hi, yes that is correct.

>
> I wonder who should modify:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> Even cpupower is not aware of this very specific tunable.

Hmm .. I had thought that end user or distros would already know about
it/use it since the powersave_bias tunable sysfs already exists.

I guess not?

>
> Also, are you sure cpufreq subsystem will be the only user
> of this one?
> Or could cpuidle or others also make use of this somewhen in the future?

I think so, since this register deals with how the workload is
affected by frequency changes -- cpufreq.

>
> Then this could more be done like:
> drivers/cpufreq/mperf.c
> And scheduler, cpuidle, cpufreq or whatever could use this as well.
>
> Just some thinking:
> I wonder how one could check/verify that the right thing is done
> (by CPU and kernel). Ideally it would be nice to have the CPU register
> appended to a cpufreq or cpuidle event trace.
> But this very (AMD or X86 only?) specific data would not look nice there.
> An arch placeholder value would be needed or similar?
>
> ...
> > +}
> > +
> > +static int __init amd_freq_sensitivity_init(void)
> > +{
> > + int i;
> > + u32 eax, edx, dummy;
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > + return -ENODEV;
> > +
> > + cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
> If this really should be a separate module:
> Does/will Intel have the same (feature/cpuid bit)?
> Anyway, this should get a general AMD or X86 CPU capability flag.

As far as I know, this is AMD specific, yes I'll add the AMD vendor
check.

>
> Then you can also autoload this driver similar to how it's done in acpi-
> cpufreq:
> static const struct x86_cpu_id acpi_cpufreq_ids[] = {
> X86_FEATURE_MATCH(X86_FEATURE_ACPI),
> X86_FEATURE_MATCH(X86_FEATURE_HW_PSTATE),
> {}
> };
> MODULE_DEVICE_TABLE(x86cpu, acpi_cpufreq_ids);

Okay.

Thanks.

>
> Thomas
>

2013-04-02 14:36:28

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] cpufreq: ondemand: allow custom powersave_bias_target function to be registered

On Tue, Apr 02, 2013 at 02:43:32PM +0200, Borislav Petkov wrote:
> On Thu, Mar 28, 2013 at 01:24:16PM -0500, Jacob Shin wrote:
> > This allows for another [arch specific] driver to hook into existing
> > powersave bias function of the ondemand governor. i.e. This allows AMD
> > specific powersave bias function (in a separate AMD specific driver)
> > to aid ondemand governor's frequency transition deicisions.
> >
> > Signed-off-by: Jacob Shin <[email protected]>
> > ---
> > drivers/cpufreq/cpufreq_governor.h | 3 +++
> > drivers/cpufreq/cpufreq_ondemand.c | 22 +++++++++++++++++++---
> > 2 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> > index c83cabf..4b6808f 100644
> > --- a/drivers/cpufreq/cpufreq_governor.h
> > +++ b/drivers/cpufreq/cpufreq_governor.h
> > @@ -262,4 +262,7 @@ bool need_load_eval(struct cpu_dbs_common_info *cdbs,
> > unsigned int sampling_rate);
> > int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> > struct common_dbs_data *cdata, unsigned int event);
> > +void od_register_powersave_bias_function(unsigned int (*f)
> > + (struct cpufreq_policy *, unsigned int, unsigned int));
> > +void od_unregister_powersave_bias_function(void);
>
> We generally call those a "callback" or a "handler". I.e.,
> od_register_powersave_bias_handler or something.

Okay, will change.

>
> > #endif /* _CPUFREQ_GOVERNER_H */
> > diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> > index 15e80ee..36f0798 100644
> > --- a/drivers/cpufreq/cpufreq_ondemand.c
> > +++ b/drivers/cpufreq/cpufreq_ondemand.c
> > @@ -40,6 +40,8 @@
> >
> > static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);
> >
> > +static struct od_ops od_ops;
> > +
> > #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
> > static struct cpufreq_governor cpufreq_gov_ondemand;
> > #endif
> > @@ -145,7 +147,8 @@ static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
> > struct od_dbs_tuners *od_tuners = dbs_data->tuners;
> >
> > if (od_tuners->powersave_bias)
> > - freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
> > + freq = od_ops.powersave_bias_target(p, freq,
> > + CPUFREQ_RELATION_H);
> > else if (p->cur == p->max)
> > return;
> >
> > @@ -206,8 +209,8 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
> > __cpufreq_driver_target(policy, freq_next,
> > CPUFREQ_RELATION_L);
> > } else {
> > - int freq = powersave_bias_target(policy, freq_next,
> > - CPUFREQ_RELATION_L);
> > + int freq = od_ops.powersave_bias_target(policy,
> > + freq_next, CPUFREQ_RELATION_L);
> > __cpufreq_driver_target(policy, freq,
> > CPUFREQ_RELATION_L);
> > }
> > @@ -565,6 +568,19 @@ static struct common_dbs_data od_dbs_cdata = {
> > .exit = od_exit,
> > };
> >
> > +void od_register_powersave_bias_function(unsigned int (*f)
> > + (struct cpufreq_policy *, unsigned int, unsigned int))
> > +{
> > + od_ops.powersave_bias_target = f;
> > +}
> > +EXPORT_SYMBOL_GPL(od_register_powersave_bias_function);
> > +
> > +void od_unregister_powersave_bias_function(void)
> > +{
> > + od_ops.powersave_bias_target = powersave_bias_target;
>
> This is very confusing: we have ->powersave_bias_target and the default
> powersave_bias_target in the ondemand governor. Can we call the default
> one generic_powersave_bias_target or default_* or whatever.

Okay, will do.

Thanks!

>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-02 17:19:00

by Jacob Shin

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

On Tue, Apr 02, 2013 at 03:42:55PM +0200, Borislav Petkov wrote:
> On Thu, Mar 28, 2013 at 01:24:17PM -0500, Jacob Shin wrote:
> > Future AMD processors, starting with Family 16h, can provide software
> > with feedback on how the workload may respond to frequency change --
> > memory-bound workloads will not benefit from higher frequency, where
> > as compute-bound workloads will. This patch enables this "frequency
> > sensitivity feedback" to aid the ondemand governor to make better
> > frequency change decisions by hooking into the powersave bias.
> >
> > Signed-off-by: Jacob Shin <[email protected]>
> > ---
> > arch/x86/include/uapi/asm/msr-index.h | 1 +
> > drivers/cpufreq/Kconfig.x86 | 10 +++
> > drivers/cpufreq/Makefile | 1 +
> > drivers/cpufreq/amd_freq_sensitivity.c | 147 ++++++++++++++++++++++++++++++++
> > 4 files changed, 159 insertions(+)
> > create mode 100644 drivers/cpufreq/amd_freq_sensitivity.c
> >
> > diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> > index 7a060f4..b2e6c49 100644
> > --- a/arch/x86/include/uapi/asm/msr-index.h
> > +++ b/arch/x86/include/uapi/asm/msr-index.h
> > @@ -173,6 +173,7 @@
> > #define MSR_AMD64_TSC_RATIO 0xc0000104
> > #define MSR_AMD64_NB_CFG 0xc001001f
> > #define MSR_AMD64_PATCH_LOADER 0xc0010020
> > +#define MSR_AMD64_FREQ_SENSITIVITY 0xc0010080
> > #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
> > #define MSR_AMD64_OSVW_STATUS 0xc0010141
> > #define MSR_AMD64_DC_CFG 0xc0011022
>
> My guess is, this MSR won't be used outside of cpufreq so you probably
> want to define it there, in amd_freq_sensitivity.c
>
> > diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> > index d7dc0ed..6c714b0 100644
> > --- a/drivers/cpufreq/Kconfig.x86
> > +++ b/drivers/cpufreq/Kconfig.x86
> > @@ -129,6 +129,16 @@ config X86_POWERNOW_K8
> >
> > For details, take a look at <file:Documentation/cpu-freq/>.
> >
> > +config X86_AMD_FREQ_SENSITIVITY
> > + tristate "AMD 'frequency sensitivity feedback' powersave bias"
>
> Why in ' '? Isn't that the final name?

You are right,

It does not need to be in quotes. I had first written this as its
own governor, and I was mimicking Kconfig entries of 'ondemand',
'performance' .. and so on.

>
> > + depends on CPU_FREQ_GOV_ONDEMAND && X86_ACPI_CPUFREQ
>
> depends on CPU_SUP_AMD
>
> > + help
> > + This adds support for 'frequency sensitivity feedback' feature on
> > + supported AMD processors, which hooks into the ondemand governor's
> > + powersave bias to influence frequency change decisions.
>
> Your description about the feature in the 0/2 message is much better
> than this one here. How about adding it here too?
>
> > +
> > + If in doubt, say N.
> > +
> > config X86_GX_SUSPMOD
> > tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
> > depends on X86_32 && PCI
> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> > index 863fd18..01dfdaf 100644
> > --- a/drivers/cpufreq/Makefile
> > +++ b/drivers/cpufreq/Makefile
> > @@ -41,6 +41,7 @@ obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
> > obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
> > obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
> > obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
> > +obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
> >
> > ##################################################################################
> > # ARM SoC drivers
> > diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
> > new file mode 100644
> > index 0000000..997feb0
> > --- /dev/null
> > +++ b/drivers/cpufreq/amd_freq_sensitivity.c
> > @@ -0,0 +1,147 @@
> > +/*
> > + * amd_freq_sensitivity.c: AMD "frequency sensitivity feedback" powersave bias
> > + * for ondemand governor.
> > + *
> > + * Copyright (C) 2013 Advanced Micro Devices, Inc.
>
> You probably want to leave an email address in here for contacting you
> when it is b0rked. :-)
>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/percpu-defs.h>
> > +#include <linux/init.h>
> > +
> > +#include "cpufreq_governor.h"
> > +
> > +#define PROC_FEEDBACK_INTERFACE_SHIFT 11
>
> Yeah, that's a bit cumbersome. Just define a normal x86 feature bit in
> cpufeature.h and then you can use static_cpu_has below:
>
> if (!static_cpu_has(AMD_PROC_FEEDBACK_INTERFACE))
> return -ENODEV;
>
>
> > +#define CLASS_CODE_SHIFT 56
> > +#define CLASS_CODE_MASK 0xff
> > +#define CLASS_CODE_CORE_FREQUENCY_SENSITIVITY 0x01
> > +
> > +static u32 msr_addr;
> > +
> > +struct cpu_data_t {
> > + u64 actual;
> > + u64 reference;
> > + unsigned int freq_prev;
> > +};
> > +
> > +static DEFINE_PER_CPU(struct cpu_data_t, cpu_data);
> > +
> > +static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
> > + unsigned int freq_next, unsigned int relation)
>
> arg alignment.
>
> > +{
> > + int sensitivity;
> > + long d_actual, d_reference;
> > + struct msr actual, reference;
> > + struct cpu_data_t *data = &per_cpu(cpu_data, policy->cpu);
> > + struct dbs_data *od_data = policy->governor_data;
> > + struct od_dbs_tuners *od_tuners = od_data->tuners;
> > + struct od_cpu_dbs_info_s *od_info =
> > + od_data->cdata->get_cpu_dbs_info_s(policy->cpu);
> > +
> > + rdmsr_on_cpu(policy->cpu, msr_addr, &actual.l, &actual.h);
> > + rdmsr_on_cpu(policy->cpu, msr_addr + 1, &reference.l, &reference.h);
> > + actual.h &= 0x00ffffff;
> > + reference.h &= 0x00ffffff;
> > +
> > + if (!od_info->freq_table)
> > + goto out;
>
> Ok, this check is definitely misplaced. So if we don't have
> ->freq_table, we can save us the MSR accesses above and simply return
> freq_next, right? So basically you want to push the check before the MSR
> accesses and do:
>
> if (!od_info->freq_table)
> return freq_next;

Yup, you are right, my oversight.

>
> Or am I missing something?
>
> > +
> > + /* counter wrapped around, so push until next check */
> > + if (actual.q < data->actual || reference.q < data->reference) {
> > + freq_next = policy->cur;
> > + goto out;
> > + }
> > +
> > + d_actual = actual.q - data->actual;
> > + d_reference = reference.q - data->reference;
> > +
> > + /* divide by 0, so push as well */
>
> What do you mean by "push as well"? No change, right?

Right, stay at the current frequency. I'll change the comments to be
more clear.

>
> > + if (d_reference == 0) {
> > + freq_next = policy->cur;
> > + goto out;
> > + }
> > +
> > + sensitivity = 1000 - (1000 * (d_reference - d_actual) / d_reference);
>
> Ok, now this naked 1000 could very well use a define here like you do
> for your other values you're using :-).
>
> > + if (sensitivity > 1000)
> > + sensitivity = 1000;
> > + else if (sensitivity < 0)
> > + sensitivity = 0;
>
> clamp(sensitivity, 0, 1000);
>
> > +
> > + /* this workload is not CPU bound, so choose a lower freq */
> > + if (sensitivity < od_tuners->powersave_bias) {
>
> Yeah, this ->powersave_bias usage needs more discussion, as Thomas said
> earlier.
>
> > + if (data->freq_prev == policy->cur)
> > + freq_next = policy->cur;
> > +
> > + if (freq_next > policy->cur)
> > + freq_next = policy->cur;
> > + else if (freq_next < policy->cur)
> > + freq_next = policy->min;
> > + else {
> > + unsigned int index = 0;
> > +
> > + cpufreq_frequency_table_target(policy,
> > + od_info->freq_table, policy->cur - 1,
> > + CPUFREQ_RELATION_H, &index);
> > + freq_next = od_info->freq_table[index].frequency;
> > + }
> > +
> > + data->freq_prev = freq_next;
> > + } else
> > + data->freq_prev = 0;
> > +
> > +out:
> > + data->actual = actual.q;
> > + data->reference = reference.q;
> > + return freq_next;
> > +}
> > +
> > +static int __init amd_freq_sensitivity_init(void)
> > +{
> > + int i;
> > + u32 eax, edx, dummy;
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > + return -ENODEV;
> > +
> > + cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
> > +
> > + if (!(edx & (1 << PROC_FEEDBACK_INTERFACE_SHIFT)))
> > + return -ENODEV;
> > +
> > + for (i = 0; i < (eax & 0xf); i++) {
> > + u64 val;
> > + u32 addr = MSR_AMD64_FREQ_SENSITIVITY + (i * 2);
> > +
> > + rdmsrl(addr, val);
>
> Pls use rdmsrl_safe variant for virtualization's sake and check its
> retval befor using it below.
>
> > +
> > + if (((val >> CLASS_CODE_SHIFT) & CLASS_CODE_MASK)
> > + == CLASS_CODE_CORE_FREQUENCY_SENSITIVITY) {
> > + msr_addr = addr;
> > + break;
> > + }
> > + }
>
> What is this thing doing? There's a whole range of MSRs which can give
> you freq feedback? Or only the two as it is done above for actual and
> reference?

Only the two that has to do with frequency sensitivity.

My initial reading of the manual was like this:

* CPUID feature bit says it supports "Processor Feedback Interface"
* Another CPUID field says how many "Number of Monitors" -- actual and
reference MSR register pairs.
* Finally, software can look at all the monitors and see what its
"Class Code" is and 0x01 is the frequency sensitivity.

So I was trying to future proof .. by looking at all possible monitors
and finding the frequency sensitivity.

But I think we can just simplify things and just assume frequency
sensitivity will always live at the defined MSR address. So I'll get
rid of the entire for loop.

>
> Also, you can simplify the check provided that bits [63:56] denote
> support is present if they're not 0:
>
> if (val >> CLASS_CODE_SHIFT)
> msr_addr = addr;
>
> > +
> > + if (!msr_addr)
> > + return -ENODEV;
> > +
> > + od_register_powersave_bias_function(amd_powersave_bias_target);
> > + return 0;
> > +}
> > +module_init(amd_freq_sensitivity_init);
> > +
> > +static void __exit amd_freq_sensitivity_exit(void)
> > +{
> > + od_unregister_powersave_bias_function();
> > +}
> > +module_exit(amd_freq_sensitivity_exit);
> > +
> > +MODULE_AUTHOR("Jacob Shin <[email protected]>");
> > +MODULE_DESCRIPTION("AMD 'frequency sensitivity feedback' powersave bias for "
> > + "the ondemand governor.");
> > +MODULE_LICENSE("GPL");
>
> Thanks.

Thanks for taking a look, I'll send out V3 soon with all your
suggested fixups.

>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>