From: Borislav Petkov <[email protected]>
Ok, here's the second version with all review comments addressed. I've
dropped the stable tag since this has been that way (and b0rked) since
forever so stable rules don't apply.
We can probably do single backports if distros want it.
Changelog:
-v1:
Once upon a time there was this microcode reloading interface
/sys/devices/system/cpu/cpuX/microcode/reload, where X is an online
cpu on the system, which allowed the loading of microcode in a per-cpu
manner.
This had problems, like ending up with different ucode revisions on an
otherwise homogeneous system, and required O(n^2) overhead when tracking
the minimum microcode revision per core.
So make this interface per-system so that it does microcode reloading on
the whole system only.
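I.e., a single system-wide write replaces poking every per-cpu node (cpu
numbers below are just an example):

$ echo 1 > /sys/devices/system/cpu/microcode/reload

instead of the old

$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
$ echo 1 > /sys/devices/system/cpu/cpu1/microcode/reload
...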
The individual commit messages have more info too.
The first patch has a stable tag which I'd like to see in stable, but
since it is not fixing a direct regression, I'd rather not push it
upstream now; instead, it should get tested in linux-next, go upstream
during the next merge window, and from there trickle slowly into stable.
The patches have been tested on all AMD families; it wouldn't hurt if they
saw some Intel testing too, although they should just work.
Holler if you see regressions/problems with it.
Thanks.
From: Borislav Petkov <[email protected]>
Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such an interface with which
we can end up with different microcode versions applied to different
cores of an otherwise homogeneous (wrt family, model, stepping) system.
So turn off the possibility of doing that per core and allow it only
system-wide.
This is a minimal fix which we'd like to see in stable too, thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:
$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...
and disable the interface on the other cores:
$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument
Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.
A more generic fix will follow.
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/microcode_core.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index fbdfc6917180..24b852b61be3 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -298,19 +298,31 @@ static ssize_t reload_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t size)
{
unsigned long val;
- int cpu = dev->id;
- ssize_t ret = 0;
+ int cpu;
+ ssize_t ret = 0, tmp_ret;
+
+ /* allow reload only from the BSP */
+ if (boot_cpu_data.cpu_index != dev->id)
+ return -EINVAL;
ret = kstrtoul(buf, 0, &val);
if (ret)
return ret;
- if (val == 1) {
- get_online_cpus();
- if (cpu_online(cpu))
- ret = reload_for_cpu(cpu);
- put_online_cpus();
+ if (val != 1)
+ return size;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ tmp_ret = reload_for_cpu(cpu);
+ if (tmp_ret != 0)
+ pr_warn("Error reloading microcode on CPU %d\n", cpu);
+
+ /* save retval of the first encountered reload error */
+ if (!ret)
+ ret = tmp_ret;
}
+ put_online_cpus();
if (!ret)
ret = size;
--
1.7.11.rc1
From: Borislav Petkov <[email protected]>
The reload interface should be per-system so that a full system ucode
reload happens (on each core) when doing
echo 1 > /sys/devices/system/cpu/microcode/reload
Move it to the cpu subsys directory instead of it being per-cpu.
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/microcode_core.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 24b852b61be3..947e4c64b1d8 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -301,10 +301,6 @@ static ssize_t reload_store(struct device *dev,
int cpu;
ssize_t ret = 0, tmp_ret;
- /* allow reload only from the BSP */
- if (boot_cpu_data.cpu_index != dev->id)
- return -EINVAL;
-
ret = kstrtoul(buf, 0, &val);
if (ret)
return ret;
@@ -351,7 +347,6 @@ static DEVICE_ATTR(version, 0400, version_show, NULL);
static DEVICE_ATTR(processor_flags, 0400, pf_show, NULL);
static struct attribute *mc_default_attrs[] = {
- &dev_attr_reload.attr,
&dev_attr_version.attr,
&dev_attr_processor_flags.attr,
NULL
@@ -528,6 +523,16 @@ static const struct x86_cpu_id microcode_id[] = {
MODULE_DEVICE_TABLE(x86cpu, microcode_id);
#endif
+static struct attribute *cpu_root_microcode_attrs[] = {
+ &dev_attr_reload.attr,
+ NULL
+};
+
+static struct attribute_group cpu_root_microcode_group = {
+ .name = "microcode",
+ .attrs = cpu_root_microcode_attrs,
+};
+
static int __init microcode_init(void)
{
struct cpuinfo_x86 *c = &cpu_data(0);
@@ -559,9 +564,17 @@ static int __init microcode_init(void)
if (error)
goto out_pdev;
+ error = sysfs_create_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
+ if (error) {
+ pr_err("Error creating microcode group!\n");
+ goto out_driver;
+ }
+
error = microcode_dev_init();
if (error)
- goto out_driver;
+ goto out_ucode_group;
register_syscore_ops(&mc_syscore_ops);
register_hotcpu_notifier(&mc_cpu_notifier);
@@ -571,7 +584,11 @@ static int __init microcode_init(void)
return 0;
-out_driver:
+ out_ucode_group:
+ sysfs_remove_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
+ out_driver:
get_online_cpus();
mutex_lock(&microcode_mutex);
@@ -580,7 +597,7 @@ out_driver:
mutex_unlock(&microcode_mutex);
put_online_cpus();
-out_pdev:
+ out_pdev:
platform_device_unregister(microcode_pdev);
return error;
@@ -596,6 +613,9 @@ static void __exit microcode_exit(void)
unregister_hotcpu_notifier(&mc_cpu_notifier);
unregister_syscore_ops(&mc_syscore_ops);
+ sysfs_remove_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
get_online_cpus();
mutex_lock(&microcode_mutex);
--
1.7.11.rc1
On Thu, 2012-06-21 at 14:07 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Microcode reloading in a per-core manner is a very bad idea for both
> major x86 vendors. And the thing is, we have such an interface with which
> we can end up with different microcode versions applied to different
> cores of an otherwise homogeneous (wrt family, model, stepping) system.
>
> So turn off the possibility of doing that per core and allow it only
> system-wide.
>
> This is a minimal fix which we'd like to see in stable too, thus the
> more-or-less arbitrary decision to allow system-wide reloading only on
> the BSP:
>
> $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
> ...
>
> and disable the interface on the other cores:
>
> $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
> -bash: echo: write error: Invalid argument
>
> Also, allowing the reload only from one CPU (the BSP in
> that case) doesn't allow the reload procedure to degenerate
> into an O(n^2) deal when triggering reloads from all
> /sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
> simultaneously.
>
> A more generic fix will follow.
>
> Cc: Henrique de Moraes Holschuh <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
On Thu, 2012-06-21 at 14:07 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> The reload interface should be per-system so that a full system ucode
> reload happens (on each core) when doing
>
> echo 1 > /sys/devices/system/cpu/microcode/reload
>
> Move it to the cpu subsys directory instead of it being per-cpu.
>
> Cc: Henrique de Moraes Holschuh <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
What about something like so on top of those two microcode_core patches?
Since we don't have to worry about per-cpu loads, we can do a single
callback after the entire machine is updated.
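Roughly, the resulting call chain looks like this (a sketch of the hunks
below, not literal code):

  reload_store()                       /* sysfs write handler */
      get_online_cpus()
      mutex_lock(&microcode_mutex)
      for_each_online_cpu(cpu)
          reload_for_cpu(cpu)          /* update every core first */
      perf_check_microcode()           /* single callback at the end */
          x86_pmu.check_microcode()    /* e.g. intel_snb_check_microcode() */
      mutex_unlock(&microcode_mutex)
      put_online_cpus()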
---
arch/x86/include/asm/perf_event.h | 2 +
arch/x86/kernel/cpu/perf_event.c | 21 +++++++++-----
arch/x86/kernel/cpu/perf_event.h | 4 ++
arch/x86/kernel/cpu/perf_event_intel.c | 49 ++++++++++++++++++++++++++++++---
arch/x86/kernel/microcode_core.c | 10 ++++--
5 files changed, 72 insertions(+), 14 deletions(-)
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -232,6 +232,7 @@ struct perf_guest_switch_msr {
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern void perf_check_microcode(void);
#else
static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
{
@@ -245,6 +246,7 @@ static inline void perf_get_x86_pmu_capa
}
static inline void perf_events_lapic_init(void) { }
+static inline void perf_check_microcode(void) { }
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -378,7 +378,7 @@ int x86_pmu_hw_config(struct perf_event
int precise = 0;
/* Support for constant skid */
- if (x86_pmu.pebs_active) {
+ if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
/* Support for IP fixup */
@@ -1649,13 +1649,20 @@ static void x86_pmu_flush_branch_stack(v
x86_pmu.flush_branch_stack();
}
+void perf_check_microcode(void)
+{
+ if (x86_pmu.check_microcode)
+ x86_pmu.check_microcode();
+}
+EXPORT_SYMBOL_GPL(perf_check_microcode);
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
- .attr_groups = x86_pmu_attr_groups,
+ .attr_groups = x86_pmu_attr_groups,
- .event_init = x86_pmu_event_init,
+ .event_init = x86_pmu_event_init,
.add = x86_pmu_add,
.del = x86_pmu_del,
@@ -1663,11 +1670,11 @@ static struct pmu pmu = {
.stop = x86_pmu_stop,
.read = x86_pmu_read,
- .start_txn = x86_pmu_start_txn,
- .cancel_txn = x86_pmu_cancel_txn,
- .commit_txn = x86_pmu_commit_txn,
+ .start_txn = x86_pmu_start_txn,
+ .cancel_txn = x86_pmu_cancel_txn,
+ .commit_txn = x86_pmu_commit_txn,
- .event_idx = x86_pmu_event_idx,
+ .event_idx = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
};
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -361,6 +361,8 @@ struct x86_pmu {
void (*cpu_starting)(int cpu);
void (*cpu_dying)(int cpu);
void (*cpu_dead)(int cpu);
+
+ void (*check_microcode)(void);
void (*flush_branch_stack)(void);
/*
@@ -373,7 +375,7 @@ struct x86_pmu {
* Intel DebugStore bits
*/
int bts, pebs;
- int bts_active, pebs_active;
+ int bts_active, pebs_active, pebs_broken;
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1714,11 +1714,54 @@ static __init void intel_clovertown_quir
x86_pmu.pebs_constraints = NULL;
}
+static int intel_snb_pebs_broken(int cpu)
+{
+ u32 rev = UINT_MAX; /* default to broken for unknown models */
+
+ switch (cpu_data(cpu).x86_model) {
+ case 42: /* SNB */
+ rev = 0x28;
+ break;
+
+ case 45: /* SNB-EP */
+ switch (cpu_data(cpu).x86_mask) {
+ case 6: rev = 0x618; break;
+ case 7: rev = 0x70c; break;
+ }
+ }
+
+ return (cpu_data(cpu).microcode < rev);
+}
+
+static void intel_snb_check_microcode(void)
+{
+ int pebs_broken = 0;
+ int cpu;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ pebs_broken |= intel_snb_pebs_broken(cpu);
+ put_online_cpus();
+
+ if (pebs_broken == x86_pmu.pebs_broken)
+ return;
+
+ /*
+ * Serialized by the microcode lock..
+ */
+ if (x86_pmu.pebs_broken) {
+ pr_info("PEBS enabled due to micro-code update\n");
+ x86_pmu.pebs_broken = 0;
+ } else {
+ pr_info("PEBS disabled due to CPU errata, please upgrade micro-code\n");
+ x86_pmu.pebs_broken = 1;
+ }
+}
+
static __init void intel_sandybridge_quirk(void)
{
- pr_warn("PEBS disabled due to CPU errata\n");
- x86_pmu.pebs = 0;
- x86_pmu.pebs_constraints = NULL;
+ x86_pmu.check_microcode = intel_snb_check_microcode;
+ intel_snb_check_microcode();
}
static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -87,6 +87,7 @@
#include <asm/microcode.h>
#include <asm/processor.h>
#include <asm/cpu_device_id.h>
+#include <asm/perf_event.h>
MODULE_DESCRIPTION("Microcode Update Driver");
MODULE_AUTHOR("Tigran Aivazian <[email protected]>");
@@ -277,7 +278,6 @@ static int reload_for_cpu(int cpu)
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
int err = 0;
- mutex_lock(&microcode_mutex);
if (uci->valid) {
enum ucode_state ustate;
@@ -288,7 +288,6 @@ static int reload_for_cpu(int cpu)
if (ustate == UCODE_ERROR)
err = -EINVAL;
}
- mutex_unlock(&microcode_mutex);
return err;
}
@@ -309,6 +308,7 @@ static ssize_t reload_store(struct devic
return size;
get_online_cpus();
+ mutex_lock(&microcode_mutex);
for_each_online_cpu(cpu) {
tmp_ret = reload_for_cpu(cpu);
if (tmp_ret != 0)
@@ -318,6 +318,9 @@ static ssize_t reload_store(struct devic
if (!ret)
ret = tmp_ret;
}
+ if (!ret)
+ perf_check_microcode();
+ mutex_unlock(&microcode_mutex);
put_online_cpus();
if (!ret)
@@ -557,7 +560,8 @@ static int __init microcode_init(void)
mutex_lock(&microcode_mutex);
error = subsys_interface_register(&mc_cpu_interface);
-
+ if (!error)
+ perf_check_microcode();
mutex_unlock(&microcode_mutex);
put_online_cpus();
On Fri, Jun 22, 2012 at 05:46:59PM +0200, Peter Zijlstra wrote:
> What about something like so on top of those two microcode_core patches?
>
> Since we don't have to worry about per-cpu loads, we can do a single
> callback after the entire machine is updated.
Looks simple and neat enough to me; some minor nitpicks below...
>
> ---
> arch/x86/include/asm/perf_event.h | 2 +
> arch/x86/kernel/cpu/perf_event.c | 21 +++++++++-----
> arch/x86/kernel/cpu/perf_event.h | 4 ++
> arch/x86/kernel/cpu/perf_event_intel.c | 49 ++++++++++++++++++++++++++++++---
> arch/x86/kernel/microcode_core.c | 10 ++++--
> 5 files changed, 72 insertions(+), 14 deletions(-)
>
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -232,6 +232,7 @@ struct perf_guest_switch_msr {
>
> extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
> extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
> +extern void perf_check_microcode(void);
> #else
> static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
> {
> @@ -245,6 +246,7 @@ static inline void perf_get_x86_pmu_capa
> }
>
> static inline void perf_events_lapic_init(void) { }
> +static inline void perf_check_microcode(void) { }
> #endif
>
> #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -378,7 +378,7 @@ int x86_pmu_hw_config(struct perf_event
> int precise = 0;
>
> /* Support for constant skid */
> - if (x86_pmu.pebs_active) {
> + if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
> precise++;
>
> /* Support for IP fixup */
> @@ -1649,13 +1649,20 @@ static void x86_pmu_flush_branch_stack(v
> x86_pmu.flush_branch_stack();
> }
>
> +void perf_check_microcode(void)
> +{
> + if (x86_pmu.check_microcode)
> + x86_pmu.check_microcode();
> +}
> +EXPORT_SYMBOL_GPL(perf_check_microcode);
Maybe we should call the after-ucode-has-been-updated callback something
like arch_verify_microcode_revision or something similar and move it to
generic x86 code so that other stuff can use it too, in the future...
Although I'm not aware of any other users right about now.
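Completely made-up sketch, none of these names exist yet - something like
a bare notifier chain in the microcode core, which perf (and future users)
would simply subscribe to:

#include <linux/notifier.h>

static BLOCKING_NOTIFIER_HEAD(ucode_notifier);

int register_ucode_notifier(struct notifier_block *nb)
{
        return blocking_notifier_chain_register(&ucode_notifier, nb);
}
EXPORT_SYMBOL_GPL(register_ucode_notifier);

/* would be called after each successful system-wide reload */
void arch_verify_microcode_revision(void)
{
        blocking_notifier_call_chain(&ucode_notifier, 0, NULL);
}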
> +
> static struct pmu pmu = {
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
>
> - .attr_groups = x86_pmu_attr_groups,
> + .attr_groups = x86_pmu_attr_groups,
>
> - .event_init = x86_pmu_event_init,
> + .event_init = x86_pmu_event_init,
>
> .add = x86_pmu_add,
> .del = x86_pmu_del,
> @@ -1663,11 +1670,11 @@ static struct pmu pmu = {
> .stop = x86_pmu_stop,
> .read = x86_pmu_read,
>
> - .start_txn = x86_pmu_start_txn,
> - .cancel_txn = x86_pmu_cancel_txn,
> - .commit_txn = x86_pmu_commit_txn,
> + .start_txn = x86_pmu_start_txn,
> + .cancel_txn = x86_pmu_cancel_txn,
> + .commit_txn = x86_pmu_commit_txn,
>
> - .event_idx = x86_pmu_event_idx,
> + .event_idx = x86_pmu_event_idx,
> .flush_branch_stack = x86_pmu_flush_branch_stack,
> };
>
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -361,6 +361,8 @@ struct x86_pmu {
> void (*cpu_starting)(int cpu);
> void (*cpu_dying)(int cpu);
> void (*cpu_dead)(int cpu);
> +
> + void (*check_microcode)(void);
> void (*flush_branch_stack)(void);
>
> /*
> @@ -373,7 +375,7 @@ struct x86_pmu {
> * Intel DebugStore bits
> */
> int bts, pebs;
> - int bts_active, pebs_active;
> + int bts_active, pebs_active, pebs_broken;
I know you don't like bools here but maybe make it a bitfield like the
one in perf_event_attr?
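Something along these lines, I mean (sketch):

        int     bts             :1,
                bts_active      :1,
                pebs            :1,
                pebs_active     :1,
                pebs_broken     :1;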
> int pebs_record_size;
> void (*drain_pebs)(struct pt_regs *regs);
> struct event_constraint *pebs_constraints;
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1714,11 +1714,54 @@ static __init void intel_clovertown_quir
> x86_pmu.pebs_constraints = NULL;
> }
>
> +static int intel_snb_pebs_broken(int cpu)
> +{
> + u32 rev = UINT_MAX; /* default to broken for unknown models */
> +
> + switch (cpu_data(cpu).x86_model) {
cpu_data(cpu) does a per_cpu access three times in this function; maybe
declare a local ptr which saves us the next two derefs (if the compiler
isn't already optimizing those, that is).
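I.e. something like this for the same function body (sketch):

        struct cpuinfo_x86 *c = &cpu_data(cpu);
        u32 rev = UINT_MAX; /* default to broken for unknown models */

        switch (c->x86_model) {
        case 42: /* SNB */
                rev = 0x28;
                break;

        case 45: /* SNB-EP */
                switch (c->x86_mask) {
                case 6: rev = 0x618; break;
                case 7: rev = 0x70c; break;
                }
        }

        return (c->microcode < rev);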
> + case 42: /* SNB */
> + rev = 0x28;
> + break;
> +
> + case 45: /* SNB-EP */
> + switch (cpu_data(cpu).x86_mask) {
> + case 6: rev = 0x618; break;
> + case 7: rev = 0x70c; break;
> + }
> + }
> +
> + return (cpu_data(cpu).microcode < rev);
> +}
> +
> +static void intel_snb_check_microcode(void)
> +{
> + int pebs_broken = 0;
> + int cpu;
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu)
> + pebs_broken |= intel_snb_pebs_broken(cpu);
if pebs_broken gets set on a cpu other than the last one, you could break
out of the loop early instead of iterating all the way to the last cpu.
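E.g. (sketch):

        for_each_online_cpu(cpu) {
                pebs_broken = intel_snb_pebs_broken(cpu);
                if (pebs_broken)
                        break;
        }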
> + put_online_cpus();
> +
> + if (pebs_broken == x86_pmu.pebs_broken)
This is strange, why not:
if (pebs_broken && x86_pmu.pebs_broken)
?
> + return;
> +
> + /*
> + * Serialized by the microcode lock..
> + */
> + if (x86_pmu.pebs_broken) {
> + pr_info("PEBS enabled due to micro-code update\n");
> + x86_pmu.pebs_broken = 0;
> + } else {
> + pr_info("PEBS disabled due to CPU errata, please upgrade micro-code\n");
s/micro-code/microcode/g
> + x86_pmu.pebs_broken = 1;
> + }
> +}
> +
> static __init void intel_sandybridge_quirk(void)
> {
> - pr_warn("PEBS disabled due to CPU errata\n");
> - x86_pmu.pebs = 0;
> - x86_pmu.pebs_constraints = NULL;
> + x86_pmu.check_microcode = intel_snb_check_microcode;
> + intel_snb_check_microcode();
> }
>
> static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -87,6 +87,7 @@
> #include <asm/microcode.h>
> #include <asm/processor.h>
> #include <asm/cpu_device_id.h>
> +#include <asm/perf_event.h>
>
> MODULE_DESCRIPTION("Microcode Update Driver");
> MODULE_AUTHOR("Tigran Aivazian <[email protected]>");
> @@ -277,7 +278,6 @@ static int reload_for_cpu(int cpu)
> struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
> int err = 0;
>
> - mutex_lock(&microcode_mutex);
> if (uci->valid) {
> enum ucode_state ustate;
>
> @@ -288,7 +288,6 @@ static int reload_for_cpu(int cpu)
> if (ustate == UCODE_ERROR)
> err = -EINVAL;
> }
> - mutex_unlock(&microcode_mutex);
>
> return err;
> }
> @@ -309,6 +308,7 @@ static ssize_t reload_store(struct devic
> return size;
>
> get_online_cpus();
> + mutex_lock(&microcode_mutex);
> for_each_online_cpu(cpu) {
> tmp_ret = reload_for_cpu(cpu);
> if (tmp_ret != 0)
> @@ -318,6 +318,9 @@ static ssize_t reload_store(struct devic
> if (!ret)
> ret = tmp_ret;
> }
> + if (!ret)
> + perf_check_microcode();
> + mutex_unlock(&microcode_mutex);
> put_online_cpus();
>
> if (!ret)
> @@ -557,7 +560,8 @@ static int __init microcode_init(void)
> mutex_lock(&microcode_mutex);
>
> error = subsys_interface_register(&mc_cpu_interface);
> -
> + if (!error)
> + perf_check_microcode();
> mutex_unlock(&microcode_mutex);
> put_online_cpus();
>
>
>
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, Jun 26, 2012 at 09:40:43PM +0200, Borislav Petkov wrote:
> > + put_online_cpus();
> > +
> > + if (pebs_broken == x86_pmu.pebs_broken)
>
> This is strange, why not:
>
> if (pebs_broken && x86_pmu.pebs_broken)
>
> ?
>
> > + return;
Ah, never mind, I see what you're doing here - the comparison catches a
state change in either direction.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-26 at 21:40 +0200, Borislav Petkov wrote:
>
> > +void perf_check_microcode(void)
> > +{
> > + if (x86_pmu.check_microcode)
> > + x86_pmu.check_microcode();
> > +}
> > +EXPORT_SYMBOL_GPL(perf_check_microcode);
>
> Maybe we should call the after-ucode-has-been-updated callback something
> like arch_verify_microcode_revision or something similar and move it to
> generic x86 code so that other stuff can use it too, in the future...
>
> Although I'm not aware of any other users right about now.
Like that notifier list I had earlier? Yeah we can do that if more users
show up.
> > @@ -373,7 +375,7 @@ struct x86_pmu {
> > * Intel DebugStore bits
> > */
> > int bts, pebs;
> > - int bts_active, pebs_active;
> > + int bts_active, pebs_active, pebs_broken;
>
> I know you don't like bools here but maybe make it a bitfield like the
> one in perf_event_attr?
I added another patch doing just that:
---
Subject: perf, x86: Save a few bytes in struct x86_pmu
From: Peter Zijlstra <[email protected]>
Date: Tue Jun 26 23:38:39 CEST 2012
All these are basically boolean flags; use a bitfield to save a few
bytes.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -374,8 +374,11 @@ struct x86_pmu {
/*
* Intel DebugStore bits
*/
- int bts, pebs;
- int bts_active, pebs_active, pebs_broken;
+ int bts :1,
+ bts_active :1,
+ pebs :1,
+ pebs_active :1,
+ pebs_broken :1;
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
---
> > int pebs_record_size;
> > void (*drain_pebs)(struct pt_regs *regs);
> > struct event_constraint *pebs_constraints;
> > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > @@ -1714,11 +1714,54 @@ static __init void intel_clovertown_quir
> > x86_pmu.pebs_constraints = NULL;
> > }
> >
> > +static int intel_snb_pebs_broken(int cpu)
> > +{
> > + u32 rev = UINT_MAX; /* default to broken for unknown models */
> > +
> > + switch (cpu_data(cpu).x86_model) {
>
> cpu_data(cpu) does a per_cpu access three times in this function; maybe
> declare a local ptr which saves us the next two derefs (if the compiler
> isn't already optimizing those, that is).
I was hoping for CSE optimization to do that for me.. it's not really a
performance-critical path anyway. I could change it if people think it
reads better with a regular: struct cpuinfo_x86 *c = &cpu_data(cpu);
> > + get_online_cpus();
> > + for_each_online_cpu(cpu)
> > + pebs_broken |= intel_snb_pebs_broken(cpu);
>
> if pebs_broken gets set on a cpu other than the last one, you could break
> out of the loop early instead of iterating all the way to the last cpu.
Right you are..
---
Subject: perf, x86: Add a microcode revision check for SNB-PEBS
From: Peter Zijlstra <[email protected]>
Date: Fri Jun 08 14:50:50 CEST 2012
Recent Intel microcode resolved the SNB-PEBS issues, so conditionally
enable PEBS on SNB hardware depending on the microcode revision.
Thanks to Stephane for figuring out the various microcode revisions.
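Worked example (current revision number made up): a model 42 SNB running
microcode revision 0x25 makes intel_snb_pebs_broken() return true since
0x25 < 0x28, so PEBS stays disabled; once a reload brings the revision to
0x28 or later, the next perf_check_microcode() call clears pebs_broken and
PEBS gets enabled again.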
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/perf_event.h | 2 +
arch/x86/kernel/cpu/perf_event.c | 21 +++++++++----
arch/x86/kernel/cpu/perf_event.h | 4 +-
arch/x86/kernel/cpu/perf_event_intel.c | 51 +++++++++++++++++++++++++++++++--
arch/x86/kernel/microcode_core.c | 10 ++++--
5 files changed, 74 insertions(+), 14 deletions(-)
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -232,6 +232,7 @@ struct perf_guest_switch_msr {
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern void perf_check_microcode(void);
#else
static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
{
@@ -245,6 +246,7 @@ static inline void perf_get_x86_pmu_capa
}
static inline void perf_events_lapic_init(void) { }
+static inline void perf_check_microcode(void) { }
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -379,7 +379,7 @@ int x86_pmu_hw_config(struct perf_event
int precise = 0;
/* Support for constant skid */
- if (x86_pmu.pebs_active) {
+ if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
/* Support for IP fixup */
@@ -1650,13 +1650,20 @@ static void x86_pmu_flush_branch_stack(v
x86_pmu.flush_branch_stack();
}
+void perf_check_microcode(void)
+{
+ if (x86_pmu.check_microcode)
+ x86_pmu.check_microcode();
+}
+EXPORT_SYMBOL_GPL(perf_check_microcode);
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
- .attr_groups = x86_pmu_attr_groups,
+ .attr_groups = x86_pmu_attr_groups,
- .event_init = x86_pmu_event_init,
+ .event_init = x86_pmu_event_init,
.add = x86_pmu_add,
.del = x86_pmu_del,
@@ -1664,11 +1671,11 @@ static struct pmu pmu = {
.stop = x86_pmu_stop,
.read = x86_pmu_read,
- .start_txn = x86_pmu_start_txn,
- .cancel_txn = x86_pmu_cancel_txn,
- .commit_txn = x86_pmu_commit_txn,
+ .start_txn = x86_pmu_start_txn,
+ .cancel_txn = x86_pmu_cancel_txn,
+ .commit_txn = x86_pmu_commit_txn,
- .event_idx = x86_pmu_event_idx,
+ .event_idx = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
};
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -361,6 +361,8 @@ struct x86_pmu {
void (*cpu_starting)(int cpu);
void (*cpu_dying)(int cpu);
void (*cpu_dead)(int cpu);
+
+ void (*check_microcode)(void);
void (*flush_branch_stack)(void);
/*
@@ -373,7 +375,7 @@ struct x86_pmu {
* Intel DebugStore bits
*/
int bts, pebs;
- int bts_active, pebs_active;
+ int bts_active, pebs_active, pebs_broken;
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1714,11 +1714,56 @@ static __init void intel_clovertown_quir
x86_pmu.pebs_constraints = NULL;
}
+static int intel_snb_pebs_broken(int cpu)
+{
+ u32 rev = UINT_MAX; /* default to broken for unknown models */
+
+ switch (cpu_data(cpu).x86_model) {
+ case 42: /* SNB */
+ rev = 0x28;
+ break;
+
+ case 45: /* SNB-EP */
+ switch (cpu_data(cpu).x86_mask) {
+ case 6: rev = 0x618; break;
+ case 7: rev = 0x70c; break;
+ }
+ }
+
+ return (cpu_data(cpu).microcode < rev);
+}
+
+static void intel_snb_check_microcode(void)
+{
+ int pebs_broken = 0;
+ int cpu;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ if ((pebs_broken = intel_snb_pebs_broken(cpu)))
+ break;
+ }
+ put_online_cpus();
+
+ if (pebs_broken == x86_pmu.pebs_broken)
+ return;
+
+ /*
+ * Serialized by the microcode lock..
+ */
+ if (x86_pmu.pebs_broken) {
+ pr_info("PEBS enabled due to microcode update\n");
+ x86_pmu.pebs_broken = 0;
+ } else {
+ pr_info("PEBS disabled due to CPU errata, please upgrade microcode\n");
+ x86_pmu.pebs_broken = 1;
+ }
+}
+
static __init void intel_sandybridge_quirk(void)
{
- pr_warn("PEBS disabled due to CPU errata\n");
- x86_pmu.pebs = 0;
- x86_pmu.pebs_constraints = NULL;
+ x86_pmu.check_microcode = intel_snb_check_microcode;
+ intel_snb_check_microcode();
}
static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -87,6 +87,7 @@
#include <asm/microcode.h>
#include <asm/processor.h>
#include <asm/cpu_device_id.h>
+#include <asm/perf_event.h>
MODULE_DESCRIPTION("Microcode Update Driver");
MODULE_AUTHOR("Tigran Aivazian <[email protected]>");
@@ -277,7 +278,6 @@ static int reload_for_cpu(int cpu)
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
int err = 0;
- mutex_lock(&microcode_mutex);
if (uci->valid) {
enum ucode_state ustate;
@@ -288,7 +288,6 @@ static int reload_for_cpu(int cpu)
if (ustate == UCODE_ERROR)
err = -EINVAL;
}
- mutex_unlock(&microcode_mutex);
return err;
}
@@ -309,6 +308,7 @@ static ssize_t reload_store(struct devic
return size;
get_online_cpus();
+ mutex_lock(&microcode_mutex);
for_each_online_cpu(cpu) {
tmp_ret = reload_for_cpu(cpu);
if (tmp_ret != 0)
@@ -318,6 +318,9 @@ static ssize_t reload_store(struct devic
if (!ret)
ret = tmp_ret;
}
+ if (!ret)
+ perf_check_microcode();
+ mutex_unlock(&microcode_mutex);
put_online_cpus();
if (!ret)
@@ -557,7 +560,8 @@ static int __init microcode_init(void)
mutex_lock(&microcode_mutex);
error = subsys_interface_register(&mc_cpu_interface);
-
+ if (!error)
+ perf_check_microcode();
mutex_unlock(&microcode_mutex);
put_online_cpus();
On Tue, Jun 26, 2012 at 11:46:18PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-06-26 at 21:40 +0200, Borislav Petkov wrote:
> >
> > > +void perf_check_microcode(void)
> > > +{
> > > + if (x86_pmu.check_microcode)
> > > + x86_pmu.check_microcode();
> > > +}
> > > +EXPORT_SYMBOL_GPL(perf_check_microcode);
> >
> > Maybe we should call the after-ucode-has-been-updated callback something
> > like arch_verify_microcode_revision or something similar and move it to
> > generic x86 code so that other stuff can use it too, in the future...
> >
> > Although I'm not aware of any other users right about now.
>
> Like that notifier list I had earlier? Yeah we can do that if more users
> show up.
Yeah, ok.
> > > @@ -373,7 +375,7 @@ struct x86_pmu {
> > > * Intel DebugStore bits
> > > */
> > > int bts, pebs;
> > > - int bts_active, pebs_active;
> > > + int bts_active, pebs_active, pebs_broken;
> >
> > I know you don't like bools here but maybe make it a bitfield like the
> > one in perf_event_attr?
>
> I added another patch doing just that:
>
> ---
> Subject: perf, x86: Save a few bytes in struct x86_pmu
> From: Peter Zijlstra <[email protected]>
> Date: Tue Jun 26 23:38:39 CEST 2012
>
> All these are basically boolean flags; use a bitfield to save a few
> bytes.
>
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event.h | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -374,8 +374,11 @@ struct x86_pmu {
> /*
> * Intel DebugStore bits
> */
> - int bts, pebs;
> - int bts_active, pebs_active, pebs_broken;
> + int bts :1,
> + bts_active :1,
> + pebs :1,
> + pebs_active :1,
> + pebs_broken :1;
> int pebs_record_size;
> void (*drain_pebs)(struct pt_regs *regs);
> struct event_constraint *pebs_constraints;
> ---
Yep.
>
>
> > > int pebs_record_size;
> > > void (*drain_pebs)(struct pt_regs *regs);
> > > struct event_constraint *pebs_constraints;
> > > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > > @@ -1714,11 +1714,54 @@ static __init void intel_clovertown_quir
> > > x86_pmu.pebs_constraints = NULL;
> > > }
> > >
> > > +static int intel_snb_pebs_broken(int cpu)
> > > +{
> > > + u32 rev = UINT_MAX; /* default to broken for unknown models */
> > > +
> > > + switch (cpu_data(cpu).x86_model) {
> >
> > cpu_data(cpu) does a per_cpu access three times in this function; maybe
> > declare a local ptr which saves us the next two derefs (if the compiler
> > isn't already optimizing those, that is).
>
> I was hoping for CSE optimization to do that for me.. it's not really a
> performance-critical path anyway. I could change it if people think it
> reads better with a regular: struct cpuinfo_x86 *c = &cpu_data(cpu);
Right, we'll maybe do this a couple of times tops during the system's
lifetime, ok.
> > > + get_online_cpus();
> > > + for_each_online_cpu(cpu)
> > > + pebs_broken |= intel_snb_pebs_broken(cpu);
> >
> > if pebs_broken gets set on a cpu other than the last one, you could break
> > out of the loop early instead of iterating all the way to the last cpu.
>
> Right you are..
>
>
> ---
> Subject: perf, x86: Add a microcode revision check for SNB-PEBS
> From: Peter Zijlstra <[email protected]>
> Date: Fri Jun 08 14:50:50 CEST 2012
>
> Recent Intel microcode resolved the SNB-PEBS issues, so conditionally
> enable PEBS on SNB hardware depending on the microcode revision.
>
> Thanks to Stephane for figuring out the various microcode revisions.
>
> Cc: Stephane Eranian <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
Right, looks ok to me.
If it is of any use:
Acked-by: Borislav Petkov <[email protected]>
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
Assuming there are no more objections to those, can we get them into,
say, tip:x86/microcode for wider linux-next testing?
Peter's stuff could go there too or through the perf tree - it's kinda
arbitrary though.
Thanks.
On Thu, Jun 21, 2012 at 02:07:15PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Ok, here's the second version with all review comments addressed. I've
> dropped the stable tag since this has been that way (and b0rked) since
> forever so stable rules don't apply.
>
> We can probably do single backports if distros want it.
>
>
>
> Changelog:
>
> -v1:
>
> Once upon a time there was this microcode reloading interface
> /sys/devices/system/cpu/cpuX/microcode/reload, where X is an online
> cpu on the system, which allowed the loading of microcode in a per-cpu
> manner.
>
> This had problems, like ending up with different ucode revisions on an
> otherwise homogeneous system, and required O(n^2) overhead when tracking
> the minimum microcode revision per core.
>
> So make this interface per-system so that it does microcode reloading on
> the whole system only.
>
> The individual commit messages have more info too.
>
> The first patch has a stable tag which I'd like to see in stable, but
> since it is not fixing a direct regression, I'd rather not push it
> upstream now; instead, it should get tested in linux-next, go upstream
> during the next merge window, and from there trickle slowly into stable.
>
> The patches have been tested on all AMD families; it wouldn't hurt if they
> saw some Intel testing too, although they should just work.
>
> Holler if you see regressions/problems with it.
>
> Thanks.
>
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551