This patch checks the microcode version before disabling
PEBS on SandyBridge model 42 (desktop, mobile), and 45 (SNB-EP).
PEBS was disabled for both models due to an erratum.
A workaround is implemented by micro-code 0x28. This patch checks
the microcode version and disables PEBS support if version < 0x28.
The check is done each time a PEBS event is created and NOT at boot
time because the micro-code update may only be done after the kernel
has booted.
Go to downloadcenter.intel.com to download microcode updates.
Need microcode update 6/6/2012 or later.
Signed-off-by: Stephane Eranian <[email protected]>
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 5ec146c..9c905bd 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1394,6 +1394,27 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
}
}
+static int check_pebs_quirks(void)
+{
+ int uversion = cpu_data(smp_processor_id()).microcode;
+ int model = cpu_data(smp_processor_id()).x86_model;
+
+ /* do not have PEBS to begin with */
+ if (!x86_pmu.pebs)
+ return 0;
+
+ /*
+ * check ucode version for SNB, SNB-EP
+ */
+ if ((model == 42 || model == 45) && uversion < 0x28) {
+ pr_warn("SandyBridge PEBS unavailable due to CPU erratum, "
+ " update microcode (was 0x%x, needs at least 0x28).\n",
+ uversion);
+ return -ENOTSUPP;
+ }
+ return 0;
+}
+
static int intel_pmu_hw_config(struct perf_event *event)
{
int ret = x86_pmu_hw_config(event);
@@ -1401,8 +1422,14 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;
- if (event->attr.precise_ip && x86_pmu.pebs_aliases)
- x86_pmu.pebs_aliases(event);
+ if (event->attr.precise_ip) {
+
+ if (check_pebs_quirks())
+ return -ENOTSUPP;
+
+ if (x86_pmu.pebs_aliases)
+ x86_pmu.pebs_aliases(event);
+ }
if (intel_pmu_needs_lbr_smpl(event)) {
ret = intel_pmu_setup_lbr_filter(event);
@@ -1714,13 +1741,6 @@ static __init void intel_clovertown_quirk(void)
x86_pmu.pebs_constraints = NULL;
}
-static __init void intel_sandybridge_quirk(void)
-{
- pr_warn("PEBS disabled due to CPU errata\n");
- x86_pmu.pebs = 0;
- x86_pmu.pebs_constraints = NULL;
-}
-
static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
{ PERF_COUNT_HW_CPU_CYCLES, "cpu cycles" },
{ PERF_COUNT_HW_INSTRUCTIONS, "instructions" },
@@ -1914,7 +1934,6 @@ __init int intel_pmu_init(void)
case 42: /* SandyBridge */
case 45: /* SandyBridge, "Romely-EP" */
- x86_add_quirk(intel_sandybridge_quirk);
case 58: /* IvyBridge */
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
On Thu, 2012-06-07 at 09:15 +0200, Stephane Eranian wrote:
> +static int check_pebs_quirks(void)
> +{
> + int uversion = cpu_data(smp_processor_id()).microcode;
> + int model = cpu_data(smp_processor_id()).x86_model;
> +
> + /* do not have PEBS to begin with */
> + if (!x86_pmu.pebs)
> + return 0;
> +
> + /*
> + * check ucode version for SNB, SNB-EP
> + */
> + if ((model == 42 || model == 45) && uversion < 0x28) {
> + pr_warn("SandyBridge PEBS unavailable due to CPU erratum, "
> + " update microcode (was 0x%x, needs at least 0x28).\n",
> + uversion);
> + return -ENOTSUPP;
> + }
> + return 0;
> +}
> +
> static int intel_pmu_hw_config(struct perf_event *event)
> {
> int ret = x86_pmu_hw_config(event);
> @@ -1401,8 +1422,14 @@ static int intel_pmu_hw_config(struct perf_event *event)
> if (ret)
> return ret;
>
> - if (event->attr.precise_ip && x86_pmu.pebs_aliases)
> - x86_pmu.pebs_aliases(event);
> + if (event->attr.precise_ip) {
> +
> + if (check_pebs_quirks())
> + return -ENOTSUPP;
This will only warn about the PEBS issue once you try and use a PEBS
counter. Shouldn't we do this on boot? IOW. put check_pebs_quirks()
inside the existing quirk code instead of here?
> +
> + if (x86_pmu.pebs_aliases)
> + x86_pmu.pebs_aliases(event);
> + }
>
> if (intel_pmu_needs_lbr_smpl(event)) {
> ret = intel_pmu_setup_lbr_filter(event);
> @@ -1714,13 +1741,6 @@ static __init void intel_clovertown_quirk(void)
> x86_pmu.pebs_constraints = NULL;
> }
On Thu, Jun 7, 2012 at 12:18 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2012-06-07 at 09:15 +0200, Stephane Eranian wrote:
>> +static int check_pebs_quirks(void)
>> +{
>> + int uversion = cpu_data(smp_processor_id()).microcode;
>> + int model = cpu_data(smp_processor_id()).x86_model;
>> +
>> + /* do not have PEBS to begin with */
>> + if (!x86_pmu.pebs)
>> + return 0;
>> +
>> + /*
>> + * check ucode version for SNB, SNB-EP
>> + */
>> + if ((model == 42 || model == 45) && uversion < 0x28) {
>> + pr_warn("SandyBridge PEBS unavailable due to CPU erratum, "
>> + " update microcode (was 0x%x, needs at least 0x28).\n",
>> + uversion);
>> + return -ENOTSUPP;
>> + }
>> + return 0;
>> +}
>> +
>> static int intel_pmu_hw_config(struct perf_event *event)
>> {
>> int ret = x86_pmu_hw_config(event);
>> @@ -1401,8 +1422,14 @@ static int intel_pmu_hw_config(struct perf_event *event)
>> if (ret)
>> return ret;
>>
>> - if (event->attr.precise_ip && x86_pmu.pebs_aliases)
>> - x86_pmu.pebs_aliases(event);
>> + if (event->attr.precise_ip) {
>> +
>> + if (check_pebs_quirks())
>> + return -ENOTSUPP;
>
> This will only warn about the PEBS issue once you try and use a PEBS
> counter. Shouldn't we do this on boot? IOW. put check_pebs_quirks()
> inside the existing quirk code instead of here?
>
The warning could also be done on boot. But the check has to be done
when the event is created. The other thing is that as it is now, if you
get an error, you check dmesg and it's obvious what is wrong. With
the boot warning, you'd have to look back all the way to the early boot
messages. Whatever you prefer, I am fine.
>> +
>> + if (x86_pmu.pebs_aliases)
>> + x86_pmu.pebs_aliases(event);
>> + }
>>
>> if (intel_pmu_needs_lbr_smpl(event)) {
>> ret = intel_pmu_setup_lbr_filter(event);
>> @@ -1714,13 +1741,6 @@ static __init void intel_clovertown_quirk(void)
>> x86_pmu.pebs_constraints = NULL;
>> }
On Thu, 2012-06-07 at 12:35 +0200, Stephane Eranian wrote:
> > This will only warn about the PEBS issue once you try and use a PEBS
> > counter. Shouldn't we do this on boot? IOW. put check_pebs_quirks()
> > inside the existing quirk code instead of here?
> >
> The warning could also be done on boot. But the check has to be done
> when the event is created.
Is this because of the ucode loader not having done its thing yet etc..?
> The other thing is that as it is now, if you
> get an error, you check dmesg and it's obvious what is wrong. With
> the boot warning, you'd have to look back all the way to the early boot
> messages. Whatever you prefer, I am fine.
We could do both I guess. But the event creation one needs a rate
limiter.
On Thu, Jun 7, 2012 at 12:45 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2012-06-07 at 12:35 +0200, Stephane Eranian wrote:
>> > This will only warn about the PEBS issue once you try and use a PEBS
>> > counter. Shouldn't we do this on boot? IOW. put check_pebs_quirks()
>> > inside the existing quirk code instead of here?
>> >
>> The warning could also be done on boot. But the check has to be done
>> when the event is created.
>
> Is this because of the ucode loader not having done its thing yet etc..?
>
>> The other thing is that as it is now, if you
>> get an error, you check dmesg and it's obvious what is wrong. With
>> the boot warning, you'd have to look back all the way to the early boot
>> messages. Whatever you prefer, I am fine.
>
> We could do both I guess. But the event creation one needs a rate
> limiter.
Let's keep things simple, move the warning on boot, keep the quiet check
on creation.
On Thu, 2012-06-07 at 12:48 +0200, Stephane Eranian wrote:
> >> The warning could also be done on boot. But the check has to be done
> >> when the event is created.
> >
> > Is this because of the ucode loader not having done its thing yet etc..?
Ah, its because you can update ucode at runtime of course.
A quick look at the ucode loader doesn't show it having a notifier
either, so then yeah we need to check on every event creation :/
Or we could put a hook in the ucode loader.
> + /*
> + * check ucode version for SNB, SNB-EP
> + */
> + if ((model == 42 || model == 45) && uversion < 0x28) {
You need to check for x86 == 6 here.
-Andi
> + pr_warn("SandyBridge PEBS unavailable due to CPU erratum, "
> + " update microcode (was 0x%x, needs at least 0x28).\n",
> + uversion);
> + return -ENOTSUPP;
> This will only warn about the PEBS issue once you try and use a PEBS
> counter. Shouldn't we do this on boot? IOW. put check_pebs_quirks()
At boot the new microcode may not be loaded yet. So you have to delay
the check.
-Andi
* Peter Zijlstra <[email protected]> wrote:
> On Thu, 2012-06-07 at 12:48 +0200, Stephane Eranian wrote:
> > >> The warning could also be done on boot. But the check has to be done
> > >> when the event is created.
> > >
> > > Is this because of the ucode loader not having done its thing yet etc..?
>
> Ah, its because you can update ucode at runtime of course.
>
> A quick look at the ucode loader doesn't show it having a
> notifier either, so then yeah we need to check on every event
> creation :/
>
> Or we could put a hook in the ucode loader.
I'd really suggest the latter - I doubt this will be our only
ucode dependent quirk, going forward ...
Thanks,
Ingo
On Fri, 2012-06-08 at 11:35 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, 2012-06-07 at 12:48 +0200, Stephane Eranian wrote:
> > > >> The warning could also be done on boot. But the check has to be done
> > > >> when the event is created.
> > > >
> > > > Is this because of the ucode loader not having done its thing yet etc..?
> >
> > Ah, its because you can update ucode at runtime of course.
> >
> > A quick look at the ucode loader doesn't show it having a
> > notifier either, so then yeah we need to check on every event
> > creation :/
> >
> > Or we could put a hook in the ucode loader.
>
> I'd really suggest the latter - I doubt this will be our only
> ucode dependent quirk, going forward ...
Yeah, am in the middle of writing that..
On Fri, Jun 8, 2012 at 12:00 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 11:35 +0200, Ingo Molnar wrote:
>> * Peter Zijlstra <[email protected]> wrote:
>>
>> > On Thu, 2012-06-07 at 12:48 +0200, Stephane Eranian wrote:
>> > > >> The warning could also be done on boot. But the check has to be done
>> > > >> when the event is created.
>> > > >
>> > > > Is this because of the ucode loader not having done its thing yet etc..?
>> >
>> > Ah, its because you can update ucode at runtime of course.
>> >
>> > A quick look at the ucode loader doesn't show it having a
>> > notifier either, so then yeah we need to check on every event
>> > creation :/
>> >
>> > Or we could put a hook in the ucode loader.
>>
>> I'd really suggest the latter - I doubt this will be our only
>> ucode dependent quirk, going forward ...
>
> Yeah, am in the middle of writing that..
I am fine with that. As I found out yesterday, the reality is that the
ucode version for SNB vs. SNB-EP differ, so with this mechanism
we could customize the callback per CPU model number more easily.
On Fri, 2012-06-08 at 12:00 +0200, Peter Zijlstra wrote:
> > > Or we could put a hook in the ucode loader.
> >
> > I'd really suggest the latter - I doubt this will be our only
> > ucode dependent quirk, going forward ...
>
> Yeah, am in the middle of writing that..
OK so the micro-code stuff is a complete trainwreck.
The very worst is that it does per-cpu micro-code updates, not machine
wide. This results in it being able to have different revisions on
different cpus. This in turn makes the below O(n^2) :/
The biggest problem is finding when the minimum revision changes, at
best this is a n log n sorting problem due to the per-cpu setup, but I
couldn't be arsed to implement a tree or anything fancy since it all
stinks anyway.
Why do we have per-cpu firmware anyway?
Anyway, I have the below in two patches, but I thought to first post the
entire mess since if anybody has a better idea I'm all ears.
It does work though..
---
arch/x86/include/asm/microcode.h | 12 ++++++
arch/x86/kernel/cpu/perf_event.c | 14 +++----
arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 63 ++++++++++++++++++++++++++++++--
arch/x86/kernel/microcode_amd.c | 1 +
arch/x86/kernel/microcode_core.c | 53 +++++++++++++++++++++++----
arch/x86/kernel/microcode_intel.c | 1 +
arch/x86/kernel/setup.c | 5 +++
8 files changed, 132 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 4ebe157..1cd2dc5 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -32,6 +32,7 @@ struct microcode_ops {
struct ucode_cpu_info {
struct cpu_signature cpu_sig;
+ int new_rev;
int valid;
void *mc;
};
@@ -63,4 +64,15 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
static inline void __exit exit_amd_microcode(void) {}
#endif
+extern struct blocking_notifier_head microcode_notifier;
+
+#define MICROCODE_CAN_UPDATE 0x01
+#define MICROCODE_UPDATED 0x02
+
+#define microcode_notifier(fn) \
+do { \
+ static struct notifier_block fn##_nb ={ .notifier_call = fn }; \
+ blocking_notifier_chain_register(µcode_notifier, &fn##_nb);\
+} while (0)
+
#endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 000a474..d3ef86c1 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -377,7 +377,7 @@ int x86_pmu_hw_config(struct perf_event *event)
int precise = 0;
/* Support for constant skid */
- if (x86_pmu.pebs_active) {
+ if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
/* Support for IP fixup */
@@ -1677,9 +1677,9 @@ static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
- .attr_groups = x86_pmu_attr_groups,
+ .attr_groups = x86_pmu_attr_groups,
- .event_init = x86_pmu_event_init,
+ .event_init = x86_pmu_event_init,
.add = x86_pmu_add,
.del = x86_pmu_del,
@@ -1687,11 +1687,11 @@ static struct pmu pmu = {
.stop = x86_pmu_stop,
.read = x86_pmu_read,
- .start_txn = x86_pmu_start_txn,
- .cancel_txn = x86_pmu_cancel_txn,
- .commit_txn = x86_pmu_commit_txn,
+ .start_txn = x86_pmu_start_txn,
+ .cancel_txn = x86_pmu_cancel_txn,
+ .commit_txn = x86_pmu_commit_txn,
- .event_idx = x86_pmu_event_idx,
+ .event_idx = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
};
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3df3de9..a5ecfe8 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -373,7 +373,7 @@ struct x86_pmu {
* Intel DebugStore bits
*/
int bts, pebs;
- int bts_active, pebs_active;
+ int bts_active, pebs_active, pebs_broken;
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 5ec146c..bde86cf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -15,6 +15,7 @@
#include <asm/hardirq.h>
#include <asm/apic.h>
+#include <asm/microcode.h>
#include "perf_event.h"
@@ -1714,11 +1715,67 @@ static __init void intel_clovertown_quirk(void)
x86_pmu.pebs_constraints = NULL;
}
+static const u32 snb_ucode_rev = 0x28;
+
+static void intel_snb_verify_ucode(void)
+{
+ u32 rev = UINT_MAX;
+ int pebs_broken = 0;
+ int cpu;
+
+ get_online_cpus();
+ /*
+ * Because the microcode loader is bloody stupid and allows different
+ * revisions per cpu and does strictly per-cpu loading, we now have to
+ * check all cpus to determine the minimally installed revision.
+ *
+ * This makes updating the microcode O(n^2) in the number of CPUs :/
+ */
+ for_each_online_cpu(cpu)
+ rev = min(cpu_data(cpu).microcode, rev);
+ put_online_cpus();
+
+ pebs_broken = (rev < snb_ucode_rev);
+
+ if (pebs_broken == x86_pmu.pebs_broken)
+ return;
+
+ /*
+ * Serialized by the microcode lock..
+ */
+ if (x86_pmu.pebs_broken) {
+ pr_info("PEBS enabled due to micro-code update\n");
+ x86_pmu.pebs_broken = 0;
+ } else {
+ pr_info("PEBS disabled due to CPU errata, "
+ "please upgrade micro-code to at least %x (current: %x)\n",
+ snb_ucode_rev, rev);
+ x86_pmu.pebs_broken = 1;
+ }
+}
+
+static int intel_snb_ucode_notifier(struct notifier_block *self,
+ unsigned long action, void *_uci)
+{
+ /*
+ * Since ucode cannot be down-graded, and no future ucode revision
+ * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
+ */
+
+ if (action == MICROCODE_UPDATED)
+ intel_snb_verify_ucode();
+
+ return NOTIFY_DONE;
+}
+
static __init void intel_sandybridge_quirk(void)
{
- pr_warn("PEBS disabled due to CPU errata\n");
- x86_pmu.pebs = 0;
- x86_pmu.pebs_constraints = NULL;
+ intel_snb_verify_ucode();
+ /*
+ * we're still single threaded, so while there's a hole here,
+ * you can't trigger it.
+ */
+ microcode_notifier(intel_snb_ucode_notifier);
}
static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c
index 8a2ce8f..265831e 100644
--- a/arch/x86/kernel/microcode_amd.c
+++ b/arch/x86/kernel/microcode_amd.c
@@ -294,6 +294,7 @@ generic_load_microcode(int cpu, const u8 *data, size_t size)
}
out_ok:
+ uci->new_rev = new_rev;
uci->mc = new_mc;
state = UCODE_OK;
pr_debug("CPU%d update ucode (0x%08x -> 0x%08x)\n",
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index fbdfc69..a6cad81 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -167,15 +167,43 @@ static void apply_microcode_local(void *arg)
ctx->err = microcode_ops->apply_microcode(smp_processor_id());
}
+/*
+ * Call the notifier chain before we update to verify we can indeed
+ * update to the desired revision.
+ */
+
+static int microcode_notifier_check(struct ucode_cpu_info *uci)
+{
+ int ret = blocking_notifier_call_chain(µcode_notifier,
+ MICROCODE_CAN_UPDATE, uci);
+ return notifier_to_errno(ret);
+}
+
+/*
+ * Call the notifier chain after we've updated to
+ */
+
+static void microcode_notifier_done(void)
+{
+ blocking_notifier_call_chain(µcode_notifier, MICROCODE_UPDATED, NULL);
+}
+
static int apply_microcode_on_target(int cpu)
{
struct apply_microcode_ctx ctx = { .err = 0 };
int ret;
+ ret = microcode_notifier_check(ucode_cpu_info + cpu);
+ if (ret)
+ return ret;
+
ret = smp_call_function_single(cpu, apply_microcode_local, &ctx, 1);
if (!ret)
ret = ctx.err;
+ if (!ret)
+ microcode_notifier_done();
+
return ret;
}
@@ -196,8 +224,11 @@ static int do_microcode_update(const void __user *buf, size_t size)
if (ustate == UCODE_ERROR) {
error = -1;
break;
- } else if (ustate == UCODE_OK)
- apply_microcode_on_target(cpu);
+ } else if (ustate == UCODE_OK) {
+ error = apply_microcode_on_target(cpu);
+ if (error)
+ break;
+ }
}
return error;
@@ -282,11 +313,12 @@ static int reload_for_cpu(int cpu)
enum ucode_state ustate;
ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
- if (ustate == UCODE_OK)
- apply_microcode_on_target(cpu);
- else
+ if (ustate == UCODE_OK) {
+ err = apply_microcode_on_target(cpu);
+ } else {
if (ustate == UCODE_ERROR)
err = -EINVAL;
+ }
}
mutex_unlock(µcode_mutex);
@@ -361,14 +393,15 @@ static void microcode_fini_cpu(int cpu)
static enum ucode_state microcode_resume_cpu(int cpu)
{
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+ int err;
if (!uci->mc)
return UCODE_NFOUND;
pr_debug("CPU%d updated upon resume\n", cpu);
- apply_microcode_on_target(cpu);
+ err = apply_microcode_on_target(cpu);
- return UCODE_OK;
+ return !err ? UCODE_OK : UCODE_ERROR;
}
static enum ucode_state microcode_init_cpu(int cpu)
@@ -385,8 +418,12 @@ static enum ucode_state microcode_init_cpu(int cpu)
ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
if (ustate == UCODE_OK) {
+ int err;
+
pr_debug("CPU%d updated upon init\n", cpu);
- apply_microcode_on_target(cpu);
+ err = apply_microcode_on_target(cpu);
+ if (err)
+ ustate = UCODE_ERROR;
}
return ustate;
diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 0327e2b..6b87bd5 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -391,6 +391,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
}
vfree(uci->mc);
+ uci->new_rev = new_rev;
uci->mc = (struct microcode_intel *)new_mc;
pr_debug("CPU%d found a matching microcode update with version 0x%x (current=0x%x)\n",
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..98b5b5c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -68,6 +68,7 @@
#include <linux/percpu.h>
#include <linux/crash_dump.h>
#include <linux/tboot.h>
+#include <linux/notifier.h>
#include <video/edid.h>
@@ -205,6 +206,10 @@ struct cpuinfo_x86 boot_cpu_data __read_mostly = {
EXPORT_SYMBOL(boot_cpu_data);
#endif
+/*
+ * Lives here because the microcode stuff is modular.
+ */
+struct atomic_notifier_head microcode_notifier;
#if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
unsigned long mmu_cr4_features;
On Fri, Jun 08, 2012 at 03:26:12PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-06-08 at 12:00 +0200, Peter Zijlstra wrote:
> > > > Or we could put a hook in the ucode loader.
> > >
> > > I'd really suggest the latter - I doubt this will be our only
> > > ucode dependent quirk, going forward ...
> >
> > Yeah, am in the middle of writing that..
>
> OK so the micro-code stuff is a complete trainwreck.
>
> The very worst is that it does per-cpu micro-code updates, not machine
> wide. This results in it being able to have different revisions on
> different cpus. This in turn makes the below O(n^2) :/
Reportedly, there are some obscure systems which need different
microcode versions per CPU:
http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/01010.html
> The biggest problem is finding when the minimum revision changes, at
> best this is a n log n sorting problem due to the per-cpu setup, but I
> couldn't be arsed to implement a tree or anything fancy since it all
> stinks anyway.
I know. Can't you just iterate over all CPUs and collect the lowest
ucode version? Provided, of course, newer microcode versions means a
higher version number.
> Why do we have per-cpu firmware anyway?
See above.
[ … ]
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 5ec146c..bde86cf 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -15,6 +15,7 @@
>
> #include <asm/hardirq.h>
> #include <asm/apic.h>
> +#include <asm/microcode.h>
>
> #include "perf_event.h"
>
> @@ -1714,11 +1715,67 @@ static __init void intel_clovertown_quirk(void)
> x86_pmu.pebs_constraints = NULL;
> }
>
> +static const u32 snb_ucode_rev = 0x28;
> +
> +static void intel_snb_verify_ucode(void)
> +{
> + u32 rev = UINT_MAX;
> + int pebs_broken = 0;
> + int cpu;
> +
> + get_online_cpus();
> + /*
> + * Because the microcode loader is bloody stupid and allows different
> + * revisions per cpu and does strictly per-cpu loading, we now have to
> + * check all cpus to determine the minimally installed revision.
> + *
> + * This makes updating the microcode O(n^2) in the number of CPUs :/
> + */
> + for_each_online_cpu(cpu)
> + rev = min(cpu_data(cpu).microcode, rev);
> + put_online_cpus();
> +
> + pebs_broken = (rev < snb_ucode_rev);
> +
> + if (pebs_broken == x86_pmu.pebs_broken)
> + return;
> +
> + /*
> + * Serialized by the microcode lock..
> + */
> + if (x86_pmu.pebs_broken) {
> + pr_info("PEBS enabled due to micro-code update\n");
> + x86_pmu.pebs_broken = 0;
> + } else {
> + pr_info("PEBS disabled due to CPU errata, "
> + "please upgrade micro-code to at least %x (current: %x)\n",
> + snb_ucode_rev, rev);
> + x86_pmu.pebs_broken = 1;
> + }
> +}
> +
> +static int intel_snb_ucode_notifier(struct notifier_block *self,
> + unsigned long action, void *_uci)
> +{
> + /*
> + * Since ucode cannot be down-graded, and no future ucode revision
> + * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
> + */
> +
> + if (action == MICROCODE_UPDATED)
> + intel_snb_verify_ucode();
Actually, the deal here is that you could have received microcode
already from BIOS and you won't have to necessarily load ucode from
userspace with the Linux ucode loader.
Which means that on boxes which already come with PEBS-good microcode
version from the BIOS, you don't need the ucode notifier because you
most certainly are not loading newer ucode version from userspace - the
newest version is in the BIOS already.
In these cases, you already have PEBS fixed but since you're not loading
any ucode, you won't run intel_snb_verify_ucode().
So, you want to check for PEBS twice (and for all other features fixed
by ucode and tested for earlier than the ucode loader, for that matter):
* once when perf inits
* twice, a bit later when the ucode loader has loaded something from
userspace and the notifier corrects it.
Btw, this is why we wanted to load ucode as early as possible but
there's no progress on the whole thing right now...
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
> Reportedly, there are some obscure systems which need different
> microcode versions per CPU:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/01010.html
*groan*,.. ok.
> > The biggest problem is finding when the minimum revision changes, at
> > best this is a n log n sorting problem due to the per-cpu setup, but I
> > couldn't be arsed to implement a tree or anything fancy since it all
> > stinks anyway.
>
> I know. Can't you just iterate over all CPUs and collect the lowest
> ucode version? Provided, of course, newer microcode versions means a
> higher version number.
That's what I do.. but that's O(n) per cpu, so if you update all cpus,
that's O(n^2).
You don't know when userspace is done updating the 'last' cpu, so you
have to scan all cpus for every cpu.
I'm sure the SGI people with their silly CPU counts aren't too thrilled
to have this.
On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
>
> In these cases, you already have PEBS fixed but since you're not loading
> any ucode, you won't run intel_snb_verify_ucode().
>
> So, you want to check for PEBS twice (and for all other features fixed
> by ucode and tested for earlier than the ucode loader, for that matter):
>
> * once when perf inits
>
> * twice, a bit later when the ucode loader has loaded something from
> userspace and the notifier corrects it.
> static __init void intel_sandybridge_quirk(void)
> {
> + intel_snb_verify_ucode();
> + /*
> + * we're still single threaded, so while there's a hole here,
> + * you can't trigger it.
> + */
> + microcode_notifier(intel_snb_ucode_notifier);
> }
Like that you mean?
On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
> Provided, of course, newer microcode versions means a
> higher version number.
Fair enough, I assumed this.. is this true?
On Fri, Jun 08, 2012 at 03:56:11PM +0200, Peter Zijlstra wrote:
> > static __init void intel_sandybridge_quirk(void)
> > {
> > + intel_snb_verify_ucode();
> > + /*
> > + * we're still single threaded, so while there's a hole here,
> > + * you can't trigger it.
> > + */
> > + microcode_notifier(intel_snb_ucode_notifier);
> > }
>
> Like that you mean?
Oh ok, this is regged through x86_add_quirk() in the pmu init code. Yep,
this is what I mean.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, Jun 8, 2012 at 3:59 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
>> Provided, of course, newer microcode versions means a
>> higher version number.
>
> Fair enough, I assumed this.. is this true?
>
I would think so, otherwise it gets even messier!
On Fri, Jun 8, 2012 at 3:26 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 12:00 +0200, Peter Zijlstra wrote:
>> > > Or we could put a hook in the ucode loader.
>> >
>> > I'd really suggest the latter - I doubt this will be our only
>> > ucode dependent quirk, going forward ...
>>
>> Yeah, am in the middle of writing that..
>
> OK so the micro-code stuff is a complete trainwreck.
>
> The very worst is that it does per-cpu micro-code updates, not machine
> wide. This results in it being able to have different revisions on
> different cpus. This in turn makes the below O(n^2) :/
>
But it's not like this is a frequent operation either...
> The biggest problem is finding when the minimum revision changes, at
> best this is a n log n sorting problem due to the per-cpu setup, but I
> couldn't be arsed to implement a tree or anything fancy since it all
> stinks anyway.
>
> Why do we have per-cpu firmware anyway?
>
> Anyway, I have the below in two patches, but I thought to first post the
> entire mess since if anybody has a better idea I'm all ears.
>
> It does work though..
>
> ---
> arch/x86/include/asm/microcode.h | 12 ++++++
> arch/x86/kernel/cpu/perf_event.c | 14 +++----
> arch/x86/kernel/cpu/perf_event.h | 2 +-
> arch/x86/kernel/cpu/perf_event_intel.c | 63 ++++++++++++++++++++++++++++++--
> arch/x86/kernel/microcode_amd.c | 1 +
> arch/x86/kernel/microcode_core.c | 53 +++++++++++++++++++++++----
> arch/x86/kernel/microcode_intel.c | 1 +
> arch/x86/kernel/setup.c | 5 +++
> 8 files changed, 132 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
> index 4ebe157..1cd2dc5 100644
> --- a/arch/x86/include/asm/microcode.h
> +++ b/arch/x86/include/asm/microcode.h
> @@ -32,6 +32,7 @@ struct microcode_ops {
>
> struct ucode_cpu_info {
> struct cpu_signature cpu_sig;
> + int new_rev;
> int valid;
> void *mc;
> };
> @@ -63,4 +64,15 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
> static inline void __exit exit_amd_microcode(void) {}
> #endif
>
> +extern struct blocking_notifier_head microcode_notifier;
> +
That is a problem because microcode can be compiled as a module.
When I tried compiling your patch I got undefined for this notifier because
I have microcode update as a module...
> +#define MICROCODE_CAN_UPDATE 0x01
> +#define MICROCODE_UPDATED 0x02
> +
> +#define microcode_notifier(fn) \
> +do { \
> + static struct notifier_block fn##_nb ={ .notifier_call = fn }; \
> + blocking_notifier_chain_register(µcode_notifier, &fn##_nb);\
> +} while (0)
> +
> #endif /* _ASM_X86_MICROCODE_H */
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 000a474..d3ef86c1 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -377,7 +377,7 @@ int x86_pmu_hw_config(struct perf_event *event)
> int precise = 0;
>
> /* Support for constant skid */
> - if (x86_pmu.pebs_active) {
> + if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
> precise++;
>
> /* Support for IP fixup */
> @@ -1677,9 +1677,9 @@ static struct pmu pmu = {
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
>
> - .attr_groups = x86_pmu_attr_groups,
> + .attr_groups = x86_pmu_attr_groups,
>
> - .event_init = x86_pmu_event_init,
> + .event_init = x86_pmu_event_init,
>
> .add = x86_pmu_add,
> .del = x86_pmu_del,
> @@ -1687,11 +1687,11 @@ static struct pmu pmu = {
> .stop = x86_pmu_stop,
> .read = x86_pmu_read,
>
> - .start_txn = x86_pmu_start_txn,
> - .cancel_txn = x86_pmu_cancel_txn,
> - .commit_txn = x86_pmu_commit_txn,
> + .start_txn = x86_pmu_start_txn,
> + .cancel_txn = x86_pmu_cancel_txn,
> + .commit_txn = x86_pmu_commit_txn,
>
> - .event_idx = x86_pmu_event_idx,
> + .event_idx = x86_pmu_event_idx,
> .flush_branch_stack = x86_pmu_flush_branch_stack,
> };
>
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 3df3de9..a5ecfe8 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -373,7 +373,7 @@ struct x86_pmu {
> * Intel DebugStore bits
> */
> int bts, pebs;
> - int bts_active, pebs_active;
> + int bts_active, pebs_active, pebs_broken;
> int pebs_record_size;
> void (*drain_pebs)(struct pt_regs *regs);
> struct event_constraint *pebs_constraints;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 5ec146c..bde86cf 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -15,6 +15,7 @@
>
> #include <asm/hardirq.h>
> #include <asm/apic.h>
> +#include <asm/microcode.h>
>
> #include "perf_event.h"
>
> @@ -1714,11 +1715,67 @@ static __init void intel_clovertown_quirk(void)
> x86_pmu.pebs_constraints = NULL;
> }
>
> +static const u32 snb_ucode_rev = 0x28;
> +
That needs to be a per CPU model value. It
is not the same for SNB vs. SNB-EP. On EP
it may even depends on stepping.
> +static void intel_snb_verify_ucode(void)
> +{
> + u32 rev = UINT_MAX;
> + int pebs_broken = 0;
> + int cpu;
> +
> + get_online_cpus();
> + /*
> + * Because the microcode loader is bloody stupid and allows different
> + * revisions per cpu and does strictly per-cpu loading, we now have to
> + * check all cpus to determine the minimally installed revision.
> + *
> + * This makes updating the microcode O(n^2) in the number of CPUs :/
> + */
> + for_each_online_cpu(cpu)
> + rev = min(cpu_data(cpu).microcode, rev);
> + put_online_cpus();
> +
> + pebs_broken = (rev < snb_ucode_rev);
> +
> + if (pebs_broken == x86_pmu.pebs_broken)
> + return;
> +
> + /*
> + * Serialized by the microcode lock..
> + */
> + if (x86_pmu.pebs_broken) {
> + pr_info("PEBS enabled due to micro-code update\n");
> + x86_pmu.pebs_broken = 0;
> + } else {
> + pr_info("PEBS disabled due to CPU errata, "
> + "please upgrade micro-code to at least %x (current: %x)\n",
> + snb_ucode_rev, rev);
> + x86_pmu.pebs_broken = 1;
> + }
> +}
> +
> +static int intel_snb_ucode_notifier(struct notifier_block *self,
> + unsigned long action, void *_uci)
> +{
> + /*
> + * Since ucode cannot be down-graded, and no future ucode revision
> + * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
> + */
> +
> + if (action == MICROCODE_UPDATED)
> + intel_snb_verify_ucode();
> +
> + return NOTIFY_DONE;
> +}
> +
> static __init void intel_sandybridge_quirk(void)
> {
> - pr_warn("PEBS disabled due to CPU errata\n");
> - x86_pmu.pebs = 0;
> - x86_pmu.pebs_constraints = NULL;
> + intel_snb_verify_ucode();
> + /*
> + * we're still single threaded, so while there's a hole here,
> + * you can't trigger it.
> + */
> + microcode_notifier(intel_snb_ucode_notifier);
> }
>
> static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
> diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c
> index 8a2ce8f..265831e 100644
> --- a/arch/x86/kernel/microcode_amd.c
> +++ b/arch/x86/kernel/microcode_amd.c
> @@ -294,6 +294,7 @@ generic_load_microcode(int cpu, const u8 *data, size_t size)
> }
>
> out_ok:
> + uci->new_rev = new_rev;
> uci->mc = new_mc;
> state = UCODE_OK;
> pr_debug("CPU%d update ucode (0x%08x -> 0x%08x)\n",
> diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
> index fbdfc69..a6cad81 100644
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -167,15 +167,43 @@ static void apply_microcode_local(void *arg)
> ctx->err = microcode_ops->apply_microcode(smp_processor_id());
> }
>
> +/*
> + * Call the notifier chain before we update to verify we can indeed
> + * update to the desired revision.
> + */
> +
> +static int microcode_notifier_check(struct ucode_cpu_info *uci)
> +{
> + int ret = blocking_notifier_call_chain(µcode_notifier,
> + MICROCODE_CAN_UPDATE, uci);
> + return notifier_to_errno(ret);
> +}
> +
> +/*
> + * Call the notifier chain after we've updated to
> + */
> +
> +static void microcode_notifier_done(void)
> +{
> + blocking_notifier_call_chain(µcode_notifier, MICROCODE_UPDATED, NULL);
> +}
> +
> static int apply_microcode_on_target(int cpu)
> {
> struct apply_microcode_ctx ctx = { .err = 0 };
> int ret;
>
> + ret = microcode_notifier_check(ucode_cpu_info + cpu);
> + if (ret)
> + return ret;
> +
> ret = smp_call_function_single(cpu, apply_microcode_local, &ctx, 1);
> if (!ret)
> ret = ctx.err;
>
> + if (!ret)
> + microcode_notifier_done();
> +
> return ret;
> }
>
> @@ -196,8 +224,11 @@ static int do_microcode_update(const void __user *buf, size_t size)
> if (ustate == UCODE_ERROR) {
> error = -1;
> break;
> - } else if (ustate == UCODE_OK)
> - apply_microcode_on_target(cpu);
> + } else if (ustate == UCODE_OK) {
> + error = apply_microcode_on_target(cpu);
> + if (error)
> + break;
> + }
> }
>
> return error;
> @@ -282,11 +313,12 @@ static int reload_for_cpu(int cpu)
> enum ucode_state ustate;
>
> ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
> - if (ustate == UCODE_OK)
> - apply_microcode_on_target(cpu);
> - else
> + if (ustate == UCODE_OK) {
> + err = apply_microcode_on_target(cpu);
> + } else {
> if (ustate == UCODE_ERROR)
> err = -EINVAL;
> + }
> }
> mutex_unlock(µcode_mutex);
>
> @@ -361,14 +393,15 @@ static void microcode_fini_cpu(int cpu)
> static enum ucode_state microcode_resume_cpu(int cpu)
> {
> struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
> + int err;
>
> if (!uci->mc)
> return UCODE_NFOUND;
>
> pr_debug("CPU%d updated upon resume\n", cpu);
> - apply_microcode_on_target(cpu);
> + err = apply_microcode_on_target(cpu);
>
> - return UCODE_OK;
> + return !err ? UCODE_OK : UCODE_ERROR;
> }
>
> static enum ucode_state microcode_init_cpu(int cpu)
> @@ -385,8 +418,12 @@ static enum ucode_state microcode_init_cpu(int cpu)
> ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
>
> if (ustate == UCODE_OK) {
> + int err;
> +
> pr_debug("CPU%d updated upon init\n", cpu);
> - apply_microcode_on_target(cpu);
> + err = apply_microcode_on_target(cpu);
> + if (err)
> + ustate = UCODE_ERROR;
> }
>
> return ustate;
> diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
> index 0327e2b..6b87bd5 100644
> --- a/arch/x86/kernel/microcode_intel.c
> +++ b/arch/x86/kernel/microcode_intel.c
> @@ -391,6 +391,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
> }
>
> vfree(uci->mc);
> + uci->new_rev = new_rev;
> uci->mc = (struct microcode_intel *)new_mc;
>
> pr_debug("CPU%d found a matching microcode update with version 0x%x (current=0x%x)\n",
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f4b9b80..98b5b5c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -68,6 +68,7 @@
> #include <linux/percpu.h>
> #include <linux/crash_dump.h>
> #include <linux/tboot.h>
> +#include <linux/notifier.h>
>
> #include <video/edid.h>
>
> @@ -205,6 +206,10 @@ struct cpuinfo_x86 boot_cpu_data __read_mostly = {
> EXPORT_SYMBOL(boot_cpu_data);
> #endif
>
> +/*
> + * Lives here because the microcode stuff is modular.
> + */
> +struct atomic_notifier_head microcode_notifier;
>
> #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
> unsigned long mmu_cr4_features;
>
>
On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
> > The very worst is that it does per-cpu micro-code updates, not machine
> > wide. This results in it being able to have different revisions on
> > different cpus. This in turn makes the below O(n^2) :/
> >
> But it's not like this is a frequent operation either...
Still Dimitri did a patch 938179b4f8cf8 ("x86: Improve Intel microcode
loader performance") to improve the performance of it, so apparently
they do care some.
> > ---
> > diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
> > index 4ebe157..1cd2dc5 100644
> > --- a/arch/x86/include/asm/microcode.h
> > +++ b/arch/x86/include/asm/microcode.h
> > @@ -63,4 +64,15 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
> > static inline void __exit exit_amd_microcode(void) {}
> > #endif
> >
> > +extern struct blocking_notifier_head microcode_notifier;
> > +
> That is a problem because microcode can be compiled as a module.
> When I tried compiling your patch I got undefined for this notifier because
> I have microcode update as a module...
>
> > +#define MICROCODE_CAN_UPDATE 0x01
> > +#define MICROCODE_UPDATED 0x02
> > +
> > +#define microcode_notifier(fn) \
> > +do { \
> > + static struct notifier_block fn##_nb ={ .notifier_call = fn }; \
> > + blocking_notifier_chain_register(µcode_notifier, &fn##_nb);\
> > +} while (0)
> > +
> > #endif /* _ASM_X86_MICROCODE_H */
But that's ourside of CONFIG_MICROCODE and...
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index f4b9b80..98b5b5c 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -68,6 +68,7 @@
> > #include <linux/percpu.h>
> > #include <linux/crash_dump.h>
> > #include <linux/tboot.h>
> > +#include <linux/notifier.h>
> >
> > #include <video/edid.h>
> >
> > @@ -205,6 +206,10 @@ struct cpuinfo_x86 boot_cpu_data __read_mostly = {
> > EXPORT_SYMBOL(boot_cpu_data);
> > #endif
> >
> > +/*
> > + * Lives here because the microcode stuff is modular.
> > + */
> > +struct atomic_notifier_head microcode_notifier;
s/atomic_/blocking_/ -- forgot a refresh there..
> > #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
> > unsigned long mmu_cr4_features;
So that all should build.. weird.. I'll give it a go.
On Fri, Jun 08, 2012 at 03:54:29PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
> > Reportedly, there are some obscure systems which need different
> > microcode versions per CPU:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/01010.html
>
> *groan*,.. ok.
Tell me about it - if we didn't have to support this, we could've killed
a lot of code from the thing.
> > > The biggest problem is finding when the minimum revision changes, at
> > > best this is a n log n sorting problem due to the per-cpu setup, but I
> > > couldn't be arsed to implement a tree or anything fancy since it all
> > > stinks anyway.
> >
> > I know. Can't you just iterate over all CPUs and collect the lowest
> > ucode version? Provided, of course, newer microcode versions means a
> > higher version number.
>
> That's what I do.. but that's O(n) per cpu, so if you update all cpus,
> that's O(n^2).
>
> You don't know when userspace is done updating the 'last' cpu, so you
> have to scan all cpus for every cpu.
>
> I'm sure the SGI people with their silly CPU counts aren't too thrilled
> to have this.
Ok, hmm, I don't have a bright idea right now except maybe a stupid one:
have a variable which gets initialized to the number of all CPUs and
each time ->apply_microcode() finishes by returning 0, we decrement it
once.
"Finishes by returning 0" means ucode successfully updated.
When it has reached 0, microcode_notifier_done() says
MICROCODE_SUCCESSFULLY_UPDATED_ON_ALL_CPUS and only then you run
intel_snb_verify_ucode().
However(!),
* what do you do when you boot the box with less CPUs than present? All
the offline CPUs cannot get to userspace to get the chance to update
ucode. In that case, counter stays !0 and bo PEBS, tough luck.
* hotplug: we don't care about it because once we ONLINE a CPU again, it
automatically gets the ucode applied.
Hmm, I'm probably missing some obscure case.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, Jun 08, 2012 at 04:03:38PM +0200, Stephane Eranian wrote:
> On Fri, Jun 8, 2012 at 3:59 PM, Peter Zijlstra <[email protected]> wrote:
> > On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
> >> Provided, of course, newer microcode versions means a
> >> higher version number.
> >
> > Fair enough, I assumed this.. is this true?
> >
> I would think so, otherwise it gets even messier!
Need to get Intel folk to say something here, hpa?
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, Jun 08, 2012 at 04:07:43PM +0200, Stephane Eranian wrote:
> > The very worst is that it does per-cpu micro-code updates, not machine
> > wide. This results in it being able to have different revisions on
> > different cpus. This in turn makes the below O(n^2) :/
> >
> But it's not like this is a frequent operation either...
Yep, you're only doing it when there's new version from the hw vendor or
when rebooting...
[ … ]
> > +extern struct blocking_notifier_head microcode_notifier;
> > +
> That is a problem because microcode can be compiled as a module.
> When I tried compiling your patch I got undefined for this notifier because
> I have microcode update as a module...
Yes, because it uses request_firmware() and it needs userspace.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 2012-06-08 at 16:15 +0200, Borislav Petkov wrote:
> have a variable which gets initialized to the number of all CPUs and
> each time ->apply_microcode() finishes by returning 0, we decrement it
> once.
>
> Hmm, I'm probably missing some obscure case.
Since its all per-cpu sysfs muck, userspace could update a random
subsets of cpus.. leaving us hanging.
The 'bestestet' idea I came up with is doing the verify thing I have
from a delayed work -- say 1 second into the future. That way, when
there's lots of cpus they all try and enqueue the one work, which at the
end executes only once, provided the entire update scan took less than
the second.
But yuck..
On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
> > +extern struct blocking_notifier_head microcode_notifier;
> > +
> That is a problem because microcode can be compiled as a module.
> When I tried compiling your patch I got undefined for this notifier because
> I have microcode update as a module...
Ah, no, setup.c needs an EXPORT_SYMBOL_GPL(microcode_notifier);. Then it
all builds again.
On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
> > +static const u32 snb_ucode_rev = 0x28;
> > +
>
> That needs to be a per CPU model value. It
> is not the same for SNB vs. SNB-EP. On EP
> it may even depends on stepping.
Yeah, you said, easiest is removing the const and putting some
assignments to the thing somewhere in our model switch.
Do you know what values for what chip we should use?
On Fri, Jun 8, 2012 at 4:25 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
>> > +static const u32 snb_ucode_rev = 0x28;
>> > +
>>
>> That needs to be a per CPU model value. It
>> is not the same for SNB vs. SNB-EP. On EP
>> it may even depends on stepping.
>
> Yeah, you said, easiest is removing the const and putting some
> assignments to the thing somewhere in our model switch.
>
> Do you know what values for what chip we should use?
>
Checking on that.
On Fri, Jun 8, 2012 at 4:23 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
>> > +extern struct blocking_notifier_head microcode_notifier;
>> > +
>> That is a problem because microcode can be compiled as a module.
>> When I tried compiling your patch I got undefined for this notifier because
>> I have microcode update as a module...
>
> Ah, no, setup.c needs an EXPORT_SYMBOL_GPL(microcode_notifier);. Then it
> all builds again.
>
Yes, it works with this now!
Thanks.
On Fri, Jun 08, 2012 at 04:20:50PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-06-08 at 16:15 +0200, Borislav Petkov wrote:
> > have a variable which gets initialized to the number of all CPUs and
> > each time ->apply_microcode() finishes by returning 0, we decrement it
> > once.
>
> >
> > Hmm, I'm probably missing some obscure case.
>
> Since its all per-cpu sysfs muck, userspace could update a random
> subsets of cpus.. leaving us hanging.
I'm afraid I don't understand - when you modprobe microcode.ko,
it goes and loads /lib/firmware/amd-ucode/microcode_amd.bin (in
the AMD case) on each CPU when the driver gets regged through
subsys_interface_register().
It calls ->add_dev() on each CPU - this should be guaranteed because it
uses the cpu_subsys from drivers/base/cpu.c which onlines all CPUs, I'd
assume.
So, I'd say that once subsys_interface_register() returns, we have
updated ucode on all CPUs, if successful...
We probably could run the notifier at that moment, before we do
put_online_cpus().
> The 'bestestet' idea I came up with is doing the verify thing I have
> from a delayed work -- say 1 second into the future. That way, when
> there's lots of cpus they all try and enqueue the one work, which at
> the end executes only once, provided the entire update scan took less
> than the second.
You're saying, you want the last CPU that gets to update its microcode
gets to also run the delayed work...? Probably, I'd assume ucode update
on a single CPU takes less than a second IIUC.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 2012-06-08 at 16:36 +0200, Borislav Petkov wrote:
> On Fri, Jun 08, 2012 at 04:20:50PM +0200, Peter Zijlstra wrote:
> > On Fri, 2012-06-08 at 16:15 +0200, Borislav Petkov wrote:
> > > have a variable which gets initialized to the number of all CPUs and
> > > each time ->apply_microcode() finishes by returning 0, we decrement it
> > > once.
> >
> > >
> > > Hmm, I'm probably missing some obscure case.
> >
> > Since its all per-cpu sysfs muck, userspace could update a random
> > subsets of cpus.. leaving us hanging.
>
> I'm afraid I don't understand - when you modprobe microcode.ko,
> it goes and loads /lib/firmware/amd-ucode/microcode_amd.bin (in
> the AMD case) on each CPU when the driver gets regged through
> subsys_interface_register().
>
> It calls ->add_dev() on each CPU - this should be guaranteed because it
> uses the cpu_subsys from drivers/base/cpu.c which onlines all CPUs, I'd
> assume.
>
> So, I'd say that once subsys_interface_register() returns, we have
> updated ucode on all CPUs, if successful...
>
> We probably could run the notifier at that moment, before we do
> put_online_cpus().
I was thinking about reload_store(), that seems to only reload ucode for
a single cpu.
> > The 'bestestet' idea I came up with is doing the verify thing I have
> > from a delayed work -- say 1 second into the future. That way, when
> > there's lots of cpus they all try and enqueue the one work, which at
> > the end executes only once, provided the entire update scan took less
> > than the second.
>
> You're saying, you want the last CPU that gets to update its microcode
> gets to also run the delayed work...? Probably, I'd assume ucode update
> on a single CPU takes less than a second IIUC.
Nah.. it'll probably be the first. But it doesn't matter which cpu does
it. So the idea was:
static void intel_snb_verify_work(struct work_struct *work)
{
/* do the verify thing.. */
}
static DECLARE_DELAYED_WORK(intel_snb_delayed_work, intel_snb_verify_ucode);
static int intel_snb_ucode_notifier(struct notifier_block *self,
unsigned long action, void *_uci)
{
/*
* Since ucode cannot be down-graded, and no future ucode revision
* is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
*/
if (action == MICROCODE_UPDATED)
schedule_delayed_work(&intel_snb_delayed_work, HZ);
return NOTIFY_DONE;
}
Thus it will queue the delayed work when the work isn't already queued
for execution. Resulting in the work only happening once a second (at
most).
On Fri, Jun 08, 2012 at 04:45:18PM +0200, Peter Zijlstra wrote:
> I was thinking about reload_store(), that seems to only reload ucode for
> a single cpu.
Ok.
> > > The 'bestestet' idea I came up with is doing the verify thing I have
> > > from a delayed work -- say 1 second into the future. That way, when
> > > there's lots of cpus they all try and enqueue the one work, which at
> > > the end executes only once, provided the entire update scan took less
> > > than the second.
> >
> > You're saying, you want the last CPU that gets to update its microcode
> > gets to also run the delayed work...? Probably, I'd assume ucode update
> > on a single CPU takes less than a second IIUC.
>
> Nah.. it'll probably be the first. But it doesn't matter which cpu does
> it. So the idea was:
>
> static void intel_snb_verify_work(struct work_struct *work)
> {
> /* do the verify thing.. */
> }
>
> static DECLARE_DELAYED_WORK(intel_snb_delayed_work, intel_snb_verify_ucode);
>
> static int intel_snb_ucode_notifier(struct notifier_block *self,
> unsigned long action, void *_uci)
> {
> /*
> * Since ucode cannot be down-graded, and no future ucode revision
> * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
> */
>
> if (action == MICROCODE_UPDATED)
> schedule_delayed_work(&intel_snb_delayed_work, HZ);
>
> return NOTIFY_DONE;
> }
>
> Thus it will queue the delayed work when the work isn't already queued
> for execution. Resulting in the work only happening once a second (at
> most).
Ok, this would probably work - the last cpu that schedules the work should
definitely see ucode version updated on all cpus.
Or, instead of doing the verify thing on each cpu, you could track which
is the last cpu to run the delayed work and do the verify thing only on
it (the other works simply remove a bit from a bitmask or whatever...).
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 06/08/2012 07:17 AM, Borislav Petkov wrote:
> On Fri, Jun 08, 2012 at 04:03:38PM +0200, Stephane Eranian wrote:
>> On Fri, Jun 8, 2012 at 3:59 PM, Peter Zijlstra <[email protected]> wrote:
>>> On Fri, 2012-06-08 at 15:51 +0200, Borislav Petkov wrote:
>>>> Provided, of course, newer microcode versions means a
>>>> higher version number.
>>>
>>> Fair enough, I assumed this.. is this true?
>>>
>> I would think so, otherwise it gets even messier!
>
> Need to get Intel folk to say something here, hpa?
>
There are some exceptions; but it's good enough for this purpose.
The main exception is that not every revision may be installable on
every CPU; so you might have 0x20, 0x24, 0x28 on one CPU and 0x22, 0x26,
0x2a on another; this only requires caution when you set your bracket
values.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Fri, Jun 8, 2012 at 3:26 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-06-08 at 12:00 +0200, Peter Zijlstra wrote:
>> > > Or we could put a hook in the ucode loader.
>> >
>> > I'd really suggest the latter - I doubt this will be our only
>> > ucode dependent quirk, going forward ...
>>
>> Yeah, am in the middle of writing that..
>
> OK so the micro-code stuff is a complete trainwreck.
>
> The very worst is that it does per-cpu micro-code updates, not machine
> wide. This results in it being able to have different revisions on
> different cpus. This in turn makes the below O(n^2) :/
>
> The biggest problem is finding when the minimum revision changes, at
> best this is a n log n sorting problem due to the per-cpu setup, but I
> couldn't be arsed to implement a tree or anything fancy since it all
> stinks anyway.
>
> Why do we have per-cpu firmware anyway?
>
> Anyway, I have the below in two patches, but I thought to first post the
> entire mess since if anybody has a better idea I'm all ears.
>
> It does work though..
>
> ---
> arch/x86/include/asm/microcode.h | 12 ++++++
> arch/x86/kernel/cpu/perf_event.c | 14 +++----
> arch/x86/kernel/cpu/perf_event.h | 2 +-
> arch/x86/kernel/cpu/perf_event_intel.c | 63 ++++++++++++++++++++++++++++++--
> arch/x86/kernel/microcode_amd.c | 1 +
> arch/x86/kernel/microcode_core.c | 53 +++++++++++++++++++++++----
> arch/x86/kernel/microcode_intel.c | 1 +
> arch/x86/kernel/setup.c | 5 +++
> 8 files changed, 132 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
> index 4ebe157..1cd2dc5 100644
> --- a/arch/x86/include/asm/microcode.h
> +++ b/arch/x86/include/asm/microcode.h
> @@ -32,6 +32,7 @@ struct microcode_ops {
>
> struct ucode_cpu_info {
> struct cpu_signature cpu_sig;
> + int new_rev;
> int valid;
> void *mc;
> };
> @@ -63,4 +64,15 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
> static inline void __exit exit_amd_microcode(void) {}
> #endif
>
> +extern struct blocking_notifier_head microcode_notifier;
> +
> +#define MICROCODE_CAN_UPDATE 0x01
> +#define MICROCODE_UPDATED 0x02
> +
> +#define microcode_notifier(fn) \
> +do { \
> + static struct notifier_block fn##_nb ={ .notifier_call = fn }; \
> + blocking_notifier_chain_register(µcode_notifier, &fn##_nb);\
> +} while (0)
> +
> #endif /* _ASM_X86_MICROCODE_H */
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 000a474..d3ef86c1 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -377,7 +377,7 @@ int x86_pmu_hw_config(struct perf_event *event)
> int precise = 0;
>
> /* Support for constant skid */
> - if (x86_pmu.pebs_active) {
> + if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
> precise++;
>
> /* Support for IP fixup */
> @@ -1677,9 +1677,9 @@ static struct pmu pmu = {
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
>
> - .attr_groups = x86_pmu_attr_groups,
> + .attr_groups = x86_pmu_attr_groups,
>
> - .event_init = x86_pmu_event_init,
> + .event_init = x86_pmu_event_init,
>
> .add = x86_pmu_add,
> .del = x86_pmu_del,
> @@ -1687,11 +1687,11 @@ static struct pmu pmu = {
> .stop = x86_pmu_stop,
> .read = x86_pmu_read,
>
> - .start_txn = x86_pmu_start_txn,
> - .cancel_txn = x86_pmu_cancel_txn,
> - .commit_txn = x86_pmu_commit_txn,
> + .start_txn = x86_pmu_start_txn,
> + .cancel_txn = x86_pmu_cancel_txn,
> + .commit_txn = x86_pmu_commit_txn,
>
> - .event_idx = x86_pmu_event_idx,
> + .event_idx = x86_pmu_event_idx,
> .flush_branch_stack = x86_pmu_flush_branch_stack,
> };
>
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 3df3de9..a5ecfe8 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -373,7 +373,7 @@ struct x86_pmu {
> * Intel DebugStore bits
> */
> int bts, pebs;
> - int bts_active, pebs_active;
> + int bts_active, pebs_active, pebs_broken;
> int pebs_record_size;
> void (*drain_pebs)(struct pt_regs *regs);
> struct event_constraint *pebs_constraints;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 5ec146c..bde86cf 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -15,6 +15,7 @@
>
> #include <asm/hardirq.h>
> #include <asm/apic.h>
> +#include <asm/microcode.h>
>
> #include "perf_event.h"
>
> @@ -1714,11 +1715,67 @@ static __init void intel_clovertown_quirk(void)
> x86_pmu.pebs_constraints = NULL;
> }
>
> +static const u32 snb_ucode_rev = 0x28;
> +
> +static void intel_snb_verify_ucode(void)
> +{
> + u32 rev = UINT_MAX;
> + int pebs_broken = 0;
> + int cpu;
> +
> + get_online_cpus();
> + /*
> + * Because the microcode loader is bloody stupid and allows different
> + * revisions per cpu and does strictly per-cpu loading, we now have to
> + * check all cpus to determine the minimally installed revision.
> + *
> + * This makes updating the microcode O(n^2) in the number of CPUs :/
> + */
> + for_each_online_cpu(cpu)
> + rev = min(cpu_data(cpu).microcode, rev);
> + put_online_cpus();
> +
> + pebs_broken = (rev < snb_ucode_rev);
> +
> + if (pebs_broken == x86_pmu.pebs_broken)
> + return;
> +
> + /*
> + * Serialized by the microcode lock..
> + */
> + if (x86_pmu.pebs_broken) {
> + pr_info("PEBS enabled due to micro-code update\n");
> + x86_pmu.pebs_broken = 0;
> + } else {
> + pr_info("PEBS disabled due to CPU errata, "
> + "please upgrade micro-code to at least %x (current: %x)\n",
> + snb_ucode_rev, rev);
> + x86_pmu.pebs_broken = 1;
> + }
> +}
> +
> +static int intel_snb_ucode_notifier(struct notifier_block *self,
> + unsigned long action, void *_uci)
> +{
> + /*
> + * Since ucode cannot be down-graded, and no future ucode revision
> + * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
> + */
> +
> + if (action == MICROCODE_UPDATED)
> + intel_snb_verify_ucode();
> +
> + return NOTIFY_DONE;
> +}
> +
> static __init void intel_sandybridge_quirk(void)
> {
> - pr_warn("PEBS disabled due to CPU errata\n");
> - x86_pmu.pebs = 0;
> - x86_pmu.pebs_constraints = NULL;
> + intel_snb_verify_ucode();
> + /*
> + * we're still single threaded, so while there's a hole here,
> + * you can't trigger it.
> + */
> + microcode_notifier(intel_snb_ucode_notifier);
> }
>
> static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
> diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c
> index 8a2ce8f..265831e 100644
> --- a/arch/x86/kernel/microcode_amd.c
> +++ b/arch/x86/kernel/microcode_amd.c
> @@ -294,6 +294,7 @@ generic_load_microcode(int cpu, const u8 *data, size_t size)
> }
>
> out_ok:
> + uci->new_rev = new_rev;
> uci->mc = new_mc;
> state = UCODE_OK;
> pr_debug("CPU%d update ucode (0x%08x -> 0x%08x)\n",
> diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
> index fbdfc69..a6cad81 100644
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -167,15 +167,43 @@ static void apply_microcode_local(void *arg)
> ctx->err = microcode_ops->apply_microcode(smp_processor_id());
> }
>
> +/*
> + * Call the notifier chain before we update to verify we can indeed
> + * update to the desired revision.
> + */
> +
> +static int microcode_notifier_check(struct ucode_cpu_info *uci)
> +{
> + int ret = blocking_notifier_call_chain(µcode_notifier,
> + MICROCODE_CAN_UPDATE, uci);
> + return notifier_to_errno(ret);
> +}
> +
> +/*
> + * Call the notifier chain after we've updated to
> + */
> +
> +static void microcode_notifier_done(void)
> +{
> + blocking_notifier_call_chain(µcode_notifier, MICROCODE_UPDATED, NULL);
> +}
> +
> static int apply_microcode_on_target(int cpu)
> {
> struct apply_microcode_ctx ctx = { .err = 0 };
> int ret;
>
> + ret = microcode_notifier_check(ucode_cpu_info + cpu);
> + if (ret)
> + return ret;
> +
> ret = smp_call_function_single(cpu, apply_microcode_local, &ctx, 1);
> if (!ret)
> ret = ctx.err;
>
> + if (!ret)
> + microcode_notifier_done();
> +
I suspect you want to do this here and not after the update() has completed over
all CPU (for_each_online_cpu()), because you want to prevent a race condition
with perf_event users trying PEBS at the same time. If not, then why not move
the callback after all the smp_call() are done.
> return ret;
> }
>
> @@ -196,8 +224,11 @@ static int do_microcode_update(const void __user *buf, size_t size)
> if (ustate == UCODE_ERROR) {
> error = -1;
> break;
> - } else if (ustate == UCODE_OK)
> - apply_microcode_on_target(cpu);
> + } else if (ustate == UCODE_OK) {
> + error = apply_microcode_on_target(cpu);
> + if (error)
> + break;
> + }
> }
>
> return error;
> @@ -282,11 +313,12 @@ static int reload_for_cpu(int cpu)
> enum ucode_state ustate;
>
> ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
> - if (ustate == UCODE_OK)
> - apply_microcode_on_target(cpu);
> - else
> + if (ustate == UCODE_OK) {
> + err = apply_microcode_on_target(cpu);
> + } else {
> if (ustate == UCODE_ERROR)
> err = -EINVAL;
> + }
> }
> mutex_unlock(µcode_mutex);
>
> @@ -361,14 +393,15 @@ static void microcode_fini_cpu(int cpu)
> static enum ucode_state microcode_resume_cpu(int cpu)
> {
> struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
> + int err;
>
> if (!uci->mc)
> return UCODE_NFOUND;
>
> pr_debug("CPU%d updated upon resume\n", cpu);
> - apply_microcode_on_target(cpu);
> + err = apply_microcode_on_target(cpu);
>
> - return UCODE_OK;
> + return !err ? UCODE_OK : UCODE_ERROR;
> }
>
> static enum ucode_state microcode_init_cpu(int cpu)
> @@ -385,8 +418,12 @@ static enum ucode_state microcode_init_cpu(int cpu)
> ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev);
>
> if (ustate == UCODE_OK) {
> + int err;
> +
> pr_debug("CPU%d updated upon init\n", cpu);
> - apply_microcode_on_target(cpu);
> + err = apply_microcode_on_target(cpu);
> + if (err)
> + ustate = UCODE_ERROR;
> }
>
> return ustate;
> diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
> index 0327e2b..6b87bd5 100644
> --- a/arch/x86/kernel/microcode_intel.c
> +++ b/arch/x86/kernel/microcode_intel.c
> @@ -391,6 +391,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
> }
>
> vfree(uci->mc);
> + uci->new_rev = new_rev;
> uci->mc = (struct microcode_intel *)new_mc;
>
> pr_debug("CPU%d found a matching microcode update with version 0x%x (current=0x%x)\n",
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f4b9b80..98b5b5c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -68,6 +68,7 @@
> #include <linux/percpu.h>
> #include <linux/crash_dump.h>
> #include <linux/tboot.h>
> +#include <linux/notifier.h>
>
> #include <video/edid.h>
>
> @@ -205,6 +206,10 @@ struct cpuinfo_x86 boot_cpu_data __read_mostly = {
> EXPORT_SYMBOL(boot_cpu_data);
> #endif
>
> +/*
> + * Lives here because the microcode stuff is modular.
> + */
> +struct atomic_notifier_head microcode_notifier;
>
> #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
> unsigned long mmu_cr4_features;
>
>
> Why do we have per-cpu firmware anyway?
Consider CPU hotplug. A newly plugged CPU is not updated yet, but will
be after hotplug.
However i suppose it would be possible to do the update early in the
boot up sequence before others can use it. That would avoid this problem.
-Andi
On Fri, Jun 08, 2012 at 03:26:12PM +0200, Peter Zijlstra wrote:
> The very worst is that it does per-cpu micro-code updates, not machine
> wide. This results in it being able to have different revisions on
> different cpus. This in turn makes the below O(n^2) :/
How about this: since the ucode cannot be downgraded and since higher
ucode versions are supposed to fix current and older problems (otherwise
ucoders will get an earlfull) you shouldn't be needing to verify the
ucode version on all CPUs per-CPU, i.e. the O(n^2) overhead.
Rather, simply track which CPUs _haven't_ been updated yet, and once
this is the empty set, run the verify thing to check ucode version on
all CPUs.
And this should happen only when we update ucode from version A to
version B, where B > A.
And unless I'm missing something, this should be O(n) and ucode update
should happen very seldomly anyway.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 2012-06-08 at 18:28 +0200, Stephane Eranian wrote:
> > static int apply_microcode_on_target(int cpu)
> > {
> > struct apply_microcode_ctx ctx = { .err = 0 };
> > int ret;
> >
> > + ret = microcode_notifier_check(ucode_cpu_info + cpu);
> > + if (ret)
> > + return ret;
> > +
> > ret = smp_call_function_single(cpu, apply_microcode_local, &ctx, 1);
> > if (!ret)
> > ret = ctx.err;
> >
> > + if (!ret)
> > + microcode_notifier_done();
> > +
> I suspect you want to do this here and not after the update() has completed over
> all CPU (for_each_online_cpu()), because you want to prevent a race condition
> with perf_event users trying PEBS at the same time. If not, then why not move
> the callback after all the smp_call() are done.
Because not all callers are in a for_each_cpu loop. For instance, see
the reload_store() thing.
Also, since the ucode is per-cpu and not machine wide, a per-cpu
callback makes more sense.
On Fri, 2012-06-08 at 20:05 +0200, Borislav Petkov wrote:
> How about this: since the ucode cannot be downgraded and since higher
> ucode versions are supposed to fix current and older problems (otherwise
> ucoders will get an earlfull) you shouldn't be needing to verify the
> ucode version on all CPUs per-CPU, i.e. the O(n^2) overhead.
>
> Rather, simply track which CPUs _haven't_ been updated yet, and once
> this is the empty set, run the verify thing to check ucode version on
> all CPUs.
>
> And this should happen only when we update ucode from version A to
> version B, where B > A.
>
> And unless I'm missing something, this should be O(n) and ucode update
> should happen very seldomly anyway.
Checking a bitmap of n bits for being all zero is O(n), so the total is
still O(n^2). Still, probably faster than the for_each_online_cpu() scan
I do now.
On Fri, Jun 08, 2012 at 08:52:17PM +0200, Peter Zijlstra wrote:
> Checking a bitmap of n bits for being all zero is O(n), so the total is
> still O(n^2). Still, probably faster than the for_each_online_cpu() scan
> I do now.
Ok, here's what I mean:
On init you do:
cpumask_copy(ucode_mask, cpu_online_mask);
In the notifier:
cpumask_clear_cpu(this_cpu, ucode_mask);
if (cpumask_empty(ucode_mask))
verify_pebs();
So, on each cpu you do cpumask_clear_cpu() which should be O(k) for some
constant k.
And ok, cpumask_empty() aka __bitmap_empty() iterates over
nbits/BITS_PER_LONG, so here's the O(n), I see what you mean.
Well, you could probably replace cpumask_empty() with cpumask_weight()
which should use POPCNT in newer hardware and should be almost for free.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 08 Jun 2012, Borislav Petkov wrote:
> On Fri, Jun 08, 2012 at 03:26:12PM +0200, Peter Zijlstra wrote:
> > On Fri, 2012-06-08 at 12:00 +0200, Peter Zijlstra wrote:
> > > > > Or we could put a hook in the ucode loader.
> > > >
> > > > I'd really suggest the latter - I doubt this will be our only
> > > > ucode dependent quirk, going forward ...
> > >
> > > Yeah, am in the middle of writing that..
> >
> > OK so the micro-code stuff is a complete trainwreck.
> >
> > The very worst is that it does per-cpu micro-code updates, not machine
> > wide. This results in it being able to have different revisions on
> > different cpus. This in turn makes the below O(n^2) :/
>
> Reportedly, there are some obscure systems which need different
> microcode versions per CPU:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/01010.html
I wrote that email you linked above. All it means is that you can have
more than one microcode *type* per system, because you can have more
than one processor type per system.
The lack of per-system microcode handling is, as far as I am concerned,
a rather annoying misfeature, plain and simple. A per-core interface
that lets you have mismatched microcode across cores of the same type
(i.e. that take the same microcode) is, IMO, a bottomless pit of pain
and suffering which should diediediediedie as soon as possible.
> > Why do we have per-cpu firmware anyway?
>
> See above.
It just means the firmware core has to do a per-cpu job, not that we
should have mismatched microcodes across the same type of cpus, or even
that we should have the required per-cpu handling of firmware exposed to
the rest of the kernel and the worst of it all, exposed to userspace.
> Actually, the deal here is that you could have received microcode
> already from BIOS and you won't have to necessarily load ucode from
> userspace with the Linux ucode loader.
Or from an enhanced bootloader. Yes.
> Btw, this is why we wanted to load ucode as early as possible but
> there's no progress on the whole thing right now...
Which is a pity.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Fri, Jun 08, 2012 at 06:02:39PM -0300, Henrique de Moraes Holschuh wrote:
> > Reportedly, there are some obscure systems which need different
> > microcode versions per CPU:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/01010.html
>
> I wrote that email you linked above. All it means is that you can have
> more than one microcode *type* per system, because you can have more
> than one processor type per system.
I know ;).
Btw, with microcode type you mean two or more different versions of
microcode which the respective CPUs need, right?
> The lack of per-system microcode handling is, as far as I am concerned,
> a rather annoying misfeature, plain and simple. A per-core interface
> that lets you have mismatched microcode across cores of the same type
> (i.e. that take the same microcode) is, IMO, a bottomless pit of pain
> and suffering which should diediediediedie as soon as possible.
Yeah, but we need to support your use case where you have mixed
processor revisions and you want them to get the microcode suitable for
each one of them.
Otherwise, I'm all for making it per-system too. Or are you saying you
want to have one per-system interface and not per-CPU like what we have
now?
/sys/devices/system/cpu/cpu0/microcode
/sys/devices/system/cpu/cpu1/microcode
...
> It just means the firmware core has to do a per-cpu job, not that we
> should have mismatched microcodes across the same type of cpus, or
> even that we should have the required per-cpu handling of firmware
> exposed to the rest of the kernel and the worst of it all, exposed to
> userspace.
Yeah.
> > Actually, the deal here is that you could have received microcode
> > already from BIOS and you won't have to necessarily load ucode from
> > userspace with the Linux ucode loader.
>
> Or from an enhanced bootloader. Yes.
Yeah, I'd like to see that too.
> > Btw, this is why we wanted to load ucode as early as possible but
> > there's no progress on the whole thing right now...
>
> Which is a pity.
Well, there was this other effort with ACPI tables passed on as blobs
through the initrd and it was an interesting one because we don't have
to change the bootloaders but it kinda died out too.
I know how this is going to happen: some hw vendor will really need to
load ucode very early and will have to code the thing up :-).
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, Jun 8, 2012 at 4:25 PM, Peter Zijlstra <[email protected]> wrote:
>
> On Fri, 2012-06-08 at 16:07 +0200, Stephane Eranian wrote:
> > > +static const u32 snb_ucode_rev = 0x28;
> > > +
> >
> > That needs to be a per CPU model value. It
> > is not the same for SNB vs. SNB-EP. On EP
> > it may even depends on stepping.
>
> Yeah, you said, easiest is removing the const and putting some
> assignments to the thing somewhere in our model switch.
>
> Do you know what values for what chip we should use?
>
Ok, so to close on this, I tried the 6/6/2012 ucode update on a few
SNB-EP systems.
I got two answers depending on the stepping:
C1 (stepping 6) -> 0x618
C2 (stepping 7) -> 0x70c
So we need to check x86_mask for stepping and adjust the value of
snb_ucode_rev accordingly for model 45.
On 08.06.12 15:26:12, Peter Zijlstra wrote:
> +static const u32 snb_ucode_rev = 0x28;
> +
> +static void intel_snb_verify_ucode(void)
> +{
> + u32 rev = UINT_MAX;
> + int pebs_broken = 0;
> + int cpu;
> +
> + get_online_cpus();
> + /*
> + * Because the microcode loader is bloody stupid and allows different
> + * revisions per cpu and does strictly per-cpu loading, we now have to
> + * check all cpus to determine the minimally installed revision.
> + *
> + * This makes updating the microcode O(n^2) in the number of CPUs :/
> + */
> + for_each_online_cpu(cpu)
> + rev = min(cpu_data(cpu).microcode, rev);
> + put_online_cpus();
> +
> + pebs_broken = (rev < snb_ucode_rev);
> +
> + if (pebs_broken == x86_pmu.pebs_broken)
> + return;
> +
> + /*
> + * Serialized by the microcode lock..
> + */
> + if (x86_pmu.pebs_broken) {
> + pr_info("PEBS enabled due to micro-code update\n");
> + x86_pmu.pebs_broken = 0;
> + } else {
> + pr_info("PEBS disabled due to CPU errata, "
> + "please upgrade micro-code to at least %x (current: %x)\n",
> + snb_ucode_rev, rev);
> + x86_pmu.pebs_broken = 1;
> + }
> +}
> +
> +static int intel_snb_ucode_notifier(struct notifier_block *self,
> + unsigned long action, void *_uci)
> +{
> + /*
> + * Since ucode cannot be down-graded, and no future ucode revision
> + * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
> + */
> +
> + if (action == MICROCODE_UPDATED)
> + intel_snb_verify_ucode();
> +
> + return NOTIFY_DONE;
> +}
> +
> static __init void intel_sandybridge_quirk(void)
> {
> - pr_warn("PEBS disabled due to CPU errata\n");
> - x86_pmu.pebs = 0;
> - x86_pmu.pebs_constraints = NULL;
> + intel_snb_verify_ucode();
> + /*
> + * we're still single threaded, so while there's a hole here,
> + * you can't trigger it.
> + */
> + microcode_notifier(intel_snb_ucode_notifier);
> }
Instead of registering a microcode notifier, why not checking the
availability of pebs dynamically with each syscall in
intel_pmu_hw_config()? It looks like intel_snb_verify_ucode() is not
that much expensive. We can perform the check only if the event could
be for pebs and if pebs is broken. The check could be repeated when
setting up a new event after ucode could potentially has been updated
(e.g. after bringing a cpu online or so).
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On Tue, Jun 12, 2012 at 7:07 PM, Robert Richter <[email protected]> wrote:
> On 08.06.12 15:26:12, Peter Zijlstra wrote:
>> +static const u32 snb_ucode_rev = 0x28;
>> +
>> +static void intel_snb_verify_ucode(void)
>> +{
>> + u32 rev = UINT_MAX;
>> + int pebs_broken = 0;
>> + int cpu;
>> +
>> + get_online_cpus();
>> + /*
>> + * Because the microcode loader is bloody stupid and allows different
>> + * revisions per cpu and does strictly per-cpu loading, we now have to
>> + * check all cpus to determine the minimally installed revision.
>> + *
>> + * This makes updating the microcode O(n^2) in the number of CPUs :/
>> + */
>> + for_each_online_cpu(cpu)
>> + rev = min(cpu_data(cpu).microcode, rev);
>> + put_online_cpus();
>> +
>> + pebs_broken = (rev < snb_ucode_rev);
>> +
>> + if (pebs_broken == x86_pmu.pebs_broken)
>> + return;
>> +
>> + /*
>> + * Serialized by the microcode lock..
>> + */
>> + if (x86_pmu.pebs_broken) {
>> + pr_info("PEBS enabled due to micro-code update\n");
>> + x86_pmu.pebs_broken = 0;
>> + } else {
>> + pr_info("PEBS disabled due to CPU errata, "
>> + "please upgrade micro-code to at least %x (current: %x)\n",
>> + snb_ucode_rev, rev);
>> + x86_pmu.pebs_broken = 1;
>> + }
>> +}
>> +
>> +static int intel_snb_ucode_notifier(struct notifier_block *self,
>> + unsigned long action, void *_uci)
>> +{
>> + /*
>> + * Since ucode cannot be down-graded, and no future ucode revision
>> + * is known to break PEBS again, we're ok with MICROCODE_CAN_UPDATE.
>> + */
>> +
>> + if (action == MICROCODE_UPDATED)
>> + intel_snb_verify_ucode();
>> +
>> + return NOTIFY_DONE;
>> +}
>> +
>> static __init void intel_sandybridge_quirk(void)
>> {
>> - pr_warn("PEBS disabled due to CPU errata\n");
>> - x86_pmu.pebs = 0;
>> - x86_pmu.pebs_constraints = NULL;
>> + intel_snb_verify_ucode();
>> + /*
>> + * we're still single threaded, so while there's a hole here,
>> + * you can't trigger it.
>> + */
>> + microcode_notifier(intel_snb_ucode_notifier);
>> }
>
> Instead of registering a microcode notifier, why not checking the
> availability of pebs dynamically with each syscall in
> intel_pmu_hw_config()? It looks like intel_snb_verify_ucode() is not
> that much expensive. We can perform the check only if the event could
> be for pebs and if pebs is broken. The check could be repeated when
> setting up a new event after ucode could potentially has been updated
> (e.g. after bringing a cpu online or so).
>
That's what I had in my original version.
> -Robert
>
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
On Tue, 2012-06-12 at 19:09 +0200, Stephane Eranian wrote:
> > Instead of registering a microcode notifier, why not checking the
> > availability of pebs dynamically with each syscall in
> > intel_pmu_hw_config()? It looks like intel_snb_verify_ucode() is not
> > that much expensive. We can perform the check only if the event
> could
> > be for pebs and if pebs is broken. The check could be repeated when
> > setting up a new event after ucode could potentially has been
> updated
> > (e.g. after bringing a cpu online or so).
Because you then end up with a for_each_online_cpu() loop in there,
that's not pretty and quite horrible on large systems when you need to
create nr_cpus events.
Furthermore, ucode update is the rare thing, creating events happens
much more frequently.
> That's what I had in my original version.
Right, but you really need to check all cpus, not just the one you
happen to run on or the boot cpu.
On Tue, Jun 12, 2012 at 07:09:19PM +0200, Stephane Eranian wrote:
> > Instead of registering a microcode notifier, why not checking the
> > availability of pebs dynamically with each syscall in
> > intel_pmu_hw_config()? It looks like intel_snb_verify_ucode() is not
> > that much expensive. We can perform the check only if the event could
> > be for pebs and if pebs is broken. The check could be repeated when
> > setting up a new event after ucode could potentially has been updated
> > (e.g. after bringing a cpu online or so).
Yes, this will obviate the need for touching the microcode driver at all.
> That's what I had in my original version.
... and you decided not to do it like that because... ?
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, Jun 12, 2012 at 07:13:23PM +0200, Peter Zijlstra wrote:
> > That's what I had in my original version.
>
> Right, but you really need to check all cpus, not just the one you
> happen to run on or the boot cpu.
Ok, the answer to that question is whether sandybridge can support mixed
ucode revisions or not. If not, you only need to check the current cpu
you're on.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-12 at 19:17 +0200, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 07:13:23PM +0200, Peter Zijlstra wrote:
> > > That's what I had in my original version.
> >
> > Right, but you really need to check all cpus, not just the one you
> > happen to run on or the boot cpu.
>
> Ok, the answer to that question is whether sandybridge can support mixed
> ucode revisions or not. If not, you only need to check the current cpu
> you're on.
But the ucode loader can be (ab)used to only update one of the many
CPUs.
On Tue, Jun 12, 2012 at 07:18:13PM +0200, Peter Zijlstra wrote:
> But the ucode loader can be (ab)used to only update one of the many
> CPUs.
Not if you don't support mixed ucode revisions. In that case, you want
the same ucode patch applied to _all_ cores on the system. Otherwise,
random freezes, explosions and other cataclysmic events can happen.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 12.06.12 19:13:23, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 19:09 +0200, Stephane Eranian wrote:
> > > Instead of registering a microcode notifier, why not checking the
> > > availability of pebs dynamically with each syscall in
> > > intel_pmu_hw_config()? It looks like intel_snb_verify_ucode() is not
> > > that much expensive. We can perform the check only if the event
> > could
> > > be for pebs and if pebs is broken. The check could be repeated when
> > > setting up a new event after ucode could potentially has been
> > updated
> > > (e.g. after bringing a cpu online or so).
>
> Because you then end up with a for_each_online_cpu() loop in there,
> that's not pretty and quite horrible on large systems when you need to
> create nr_cpus events.
But usually the check fails on the current cpu, no need to touch other
cpus in this case. for_each_online_cpu() would run only for the case
that there just was a ucode update and pebs is going to be enabled.
And for this rare case we could use locking.
> Furthermore, ucode update is the rare thing, creating events happens
> much more frequently.
>
> > That's what I had in my original version.
>
> Right, but you really need to check all cpus, not just the one you
> happen to run on or the boot cpu.
Once pebs is enabled no further checks are needed anymore, I think.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On Tue, 2012-06-12 at 19:23 +0200, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 07:18:13PM +0200, Peter Zijlstra wrote:
> > But the ucode loader can be (ab)used to only update one of the many
> > CPUs.
>
> Not if you don't support mixed ucode revisions. In that case, you want
> the same ucode patch applied to _all_ cores on the system. Otherwise,
> random freezes, explosions and other cataclysmic events can happen.
You want, yes, but afaict the stuff in
arch/x86/kernel/microcode_core.c:reload_store means you can force a
single cpu to update while leaving the others alone.
Afaict this is not a sane thing to do, but quite possible.
On Tue, Jun 12, 2012 at 07:26:36PM +0200, Peter Zijlstra wrote:
> You want, yes, but afaict the stuff in
> arch/x86/kernel/microcode_core.c:reload_store means you can force a
> single cpu to update while leaving the others alone.
Btw, this doesn't work on AMD... for a reason.
> Afaict this is not a sane thing to do, but quite possible.
I'd guess this is still there to support mixed ucode revisions for some
oldish platforms. But if SB doesn't support mixed ucode revisions,
you don't want to use the reload_store interface anyway for reasons
described above. Maybe this interface should be behind a family, model
check or so, so that users don't shoot themselves in the foot but it is
root-only anyway.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-12 at 19:35 +0200, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 07:26:36PM +0200, Peter Zijlstra wrote:
> > You want, yes, but afaict the stuff in
> > arch/x86/kernel/microcode_core.c:reload_store means you can force a
> > single cpu to update while leaving the others alone.
>
> Btw, this doesn't work on AMD... for a reason.
How so? afaict there's nothing stopping it from working.
> > Afaict this is not a sane thing to do, but quite possible.
>
> I'd guess this is still there to support mixed ucode revisions for some
> oldish platforms. But if SB doesn't support mixed ucode revisions,
> you don't want to use the reload_store interface anyway for reasons
> described above. Maybe this interface should be behind a family, model
> check or so, so that users don't shoot themselves in the foot but it is
> root-only anyway.
Ideally this interface should be removed, but yeah. As long as its there
you have to check all CPUs, because officially supported or not simply
doesn't matter, the user can do it.
Also, you can create a pebs event while updating micro-code. There's a
race window there if you don't check all cpus.
> Also, you can create a pebs event while updating micro-code. There's a
> race window there if you don't check all cpus.
The best proposal I've seen so far was to track the minimum ucode
revision and check that on every syscall.
This can be a single global, so it should not be very exepsnive.
I can look at implementing that if that solution is agreeable with
everyone.
-Andi
--
[email protected] -- Speaking for myself only.
Andi just alerted me to this thread.
I, Thomas and Borislav discussed this at LCE last year, and it really
comes up that we have a *large* number of reasons to load microcode
early. We then need to keep the microcode around so that it can be
loaded on APs, hotplug CPUs, and after S3 resume.
The protocol discussion happened in this thread:
http://marc.info/[email protected]
Fenghua Yu at Intel is currently working on the implementation.
-hpa
On Tue, Jun 12, 2012 at 11:35:18AM -0700, H. Peter Anvin wrote:
> Andi just alerted me to this thread.
>
> I, Thomas and Borislav discussed this at LCE last year, and it really
> comes up that we have a *large* number of reasons to load microcode
> early. We then need to keep the microcode around so that it can be
> loaded on APs, hotplug CPUs, and after S3 resume.
>
> The protocol discussion happened in this thread:
>
> http://marc.info/[email protected]
>
> Fenghua Yu at Intel is currently working on the implementation.
Nice, this is the initrd cpio thing, cool, no changes needed for the
bootloader. Ping me when you have something ready so that I can start
testing it here too.
Thanks guys.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-12 at 21:07 +0200, Borislav Petkov wrote:
> Nice, this is the initrd cpio thing, cool, no changes needed for the
> bootloader. Ping me when you have something ready so that I can start
> testing it here too.
What do people like me do that don't use initrd like nonsense?
On Tue, Jun 12, 2012 at 09:33:34PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 21:07 +0200, Borislav Petkov wrote:
> > Nice, this is the initrd cpio thing, cool, no changes needed for the
> > bootloader. Ping me when you have something ready so that I can start
> > testing it here too.
>
> What do people like me do that don't use initrd like nonsense?
Use the patch kit I'm posting shortly...
-Andi
--
[email protected] -- Speaking for myself only.
On Tue, Jun 12, 2012 at 09:33:34PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 21:07 +0200, Borislav Petkov wrote:
> > Nice, this is the initrd cpio thing, cool, no changes needed for the
> > bootloader. Ping me when you have something ready so that I can start
> > testing it here too.
>
> What do people like me do that don't use initrd like nonsense?
I haven't seen the code yet but I'd assume you could add only the ucode
as an initrd image to the bootloader using "initrd... ". Then you'd only
need to be able to load initrds.
Also, I'd assume we need some of the current functionality to be able to
load ucode without rebooting the box.
But why, do you see a better option for how to tell the kernel to load a
binary blob early at boot?
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, Jun 12, 2012 at 07:40:43PM +0200, Peter Zijlstra wrote:
> How so? afaict there's nothing stopping it from working.
I spoke too soon. It works but only if you have a
newer-than-whats-on-the-system microcode image file. IOW, you need to
move the microcode image away so that the ucode driver doesn't find it,
boot, then move it back to /lib/firmware and do the reload thing through
sysfs.
Last time I tried this, the microcode got already updated during boot
and so the loading path didn't find a newer image, leading me to think
the interface didn't work and I didn't dig deeper as to why it didn't.
Which means that I definitely need to disable this on AMD. I'll try to
cook up something...
> Ideally this interface should be removed, but yeah. As long as its there
> you have to check all CPUs, because officially supported or not simply
> doesn't matter, the user can do it.
>
> Also, you can create a pebs event while updating micro-code. There's a
> race window there if you don't check all cpus.
Oh man, this is just nasty.
Ok, we could probably make the notifier functionality dependent on
whether the ucode driver for a vendor supports the reload_store
interface and thus single cpu ucode updates. If it doesn't, no need to
run the notifier per cpu but then the story with the race window is
still unsolved.
I need to sleep on it.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 06/12/2012 12:33 PM, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 21:07 +0200, Borislav Petkov wrote:
>> Nice, this is the initrd cpio thing, cool, no changes needed for the
>> bootloader. Ping me when you have something ready so that I can start
>> testing it here too.
>
> What do people like me do that don't use initrd like nonsense?
>
There is no requirement that you use an initramfs userspace. This is
just data loaded via the initramfs protocol rather than inventing a new one.
-hpa
On 06/12/2012 12:49 PM, Borislav Petkov wrote:
>
> Also, I'd assume we need some of the current functionality to be able to
> load ucode without rebooting the box.
>
Yes, the basic idea is that we'll keep the ucode around anyway so we can
feed it to hotplug CPUs or S3 resume. If the ucode is updated from
userspace we'd replace the in-memory copy and load it into the CPUs.
-hpa
On Tue, Jun 12, 2012 at 01:28:55PM -0700, H. Peter Anvin wrote:
> On 06/12/2012 12:49 PM, Borislav Petkov wrote:
> >
> > Also, I'd assume we need some of the current functionality to be able to
> > load ucode without rebooting the box.
> >
>
> Yes, the basic idea is that we'll keep the ucode around anyway so we can
> feed it to hotplug CPUs or S3 resume. If the ucode is updated from
> userspace we'd replace the in-memory copy and load it into the CPUs.
You say CPUs, i.e. plural. What is your take on loading ucode on a
single core, i.e. the /sys/devices/system/cpu/cpuX/microcode/reload
interface? Do you guys need to load ucode only system-wide or can you
stomach per-core loads?
I want to disable it (or even remove it, modulo the whole userspace
compat blah blah) on AMD.
Also, if we do system-wide, can we have a single sysfs node where we
give the new ucode from userspace instead of per-cpu files?
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 06/12/2012 01:37 PM, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 01:28:55PM -0700, H. Peter Anvin wrote:
>> On 06/12/2012 12:49 PM, Borislav Petkov wrote:
>>>
>>> Also, I'd assume we need some of the current functionality to be able to
>>> load ucode without rebooting the box.
>>>
>>
>> Yes, the basic idea is that we'll keep the ucode around anyway so we can
>> feed it to hotplug CPUs or S3 resume. If the ucode is updated from
>> userspace we'd replace the in-memory copy and load it into the CPUs.
>
> You say CPUs, i.e. plural. What is your take on loading ucode on a
> single core, i.e. the /sys/devices/system/cpu/cpuX/microcode/reload
> interface? Do you guys need to load ucode only system-wide or can you
> stomach per-core loads?
>
> I want to disable it (or even remove it, modulo the whole userspace
> compat blah blah) on AMD.
>
> Also, if we do system-wide, can we have a single sysfs node where we
> give the new ucode from userspace instead of per-cpu files?
>
I personally don't know of any valid use case for per-core loads, and it
sounds like a horrid idea. And yes, I would prefer a single sysfs file,
or better yet a plain old device node.
-hpa
On Tue, Jun 12, 2012 at 01:42:07PM -0700, H. Peter Anvin wrote:
> I personally don't know of any valid use case for per-core loads, and
> it sounds like a horrid idea.
Vehemently agreed.
However, last time we talked about Ingo mentioned some possibility
of having different microcode versions on different CPUs:
http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/00749.html
Hmm...
> And yes, I would prefer a single sysfs file, or better yet a plain old
> device node.
Also agreed.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 06/12/2012 01:56 PM, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 01:42:07PM -0700, H. Peter Anvin wrote:
>> I personally don't know of any valid use case for per-core loads, and
>> it sounds like a horrid idea.
>
> Vehemently agreed.
>
> However, last time we talked about Ingo mentioned some possibility
> of having different microcode versions on different CPUs:
> http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/00749.html
>
> Hmm...
>
>> And yes, I would prefer a single sysfs file, or better yet a plain old
>> device node.
>
> Also agreed.
>
There is a difference here. Mixed steppings or even mixed CPUs are,
indeed, supported (even things like Nehalem + Westmere in the same
system). However, the notion is that we should upload the *entire set*
of microcodes to the kernel, and let it flash the appropriate one into
each CPU.
This may result in different version number on different CPUs.
-hpa
On Tue, Jun 12, 2012 at 01:58:23PM -0700, H. Peter Anvin wrote:
> There is a difference here. Mixed steppings or even mixed CPUs are,
> indeed, supported (even things like Nehalem + Westmere in the same
> system). However, the notion is that we should upload the *entire set*
> of microcodes to the kernel, and let it flash the appropriate one into
> each CPU.
Sure, the ucode driver both on Intel and AMD does that already by
looking at each patch in the blob. I'd expect no change to that
functionality.
> This may result in different version number on different CPUs.
Ok, understood.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 12 Jun 2012, Borislav Petkov wrote:
> I'd guess this is still there to support mixed ucode revisions for some
It is still there because of the stable ABI rules. As far as I can tell,
it was a crap interface when it got in, and it still is a crap interface
now because there simply isn't any sane usercase that requires it, it is
dangerous as implemented right now (at least in the Intel case), and even
if we fixed the kernel to do the right thing, userspace would not be able
to know that and would still need to request 1 microcode refresh per core.
As far as I know, we always want to refresh the microcode on every core,
use the firmware interface to pick up a copy of the newest version of the
microcode matching the signature of each core, and leave no core behind
without an update.
And preferably, we want to request_firmware() only once per microcode,
which is rather easy to do: cache every microcode that will be needed,
check cache first before doing request_firmware() in the per-core worker
threads, and invalidate the cache when the user requests
"refresh_all_microcode". So, the cache speeds up multi-core updates, and
is also usable when restoring the system from suspend/hibernation, but
doesn't get in the way of userspace trying to apply a microcode update.
This would make my pathetic system do one request_firmware instead of
eight. And even the old junk with mixed-stepping SMP at work would only
require two, instead of four (or eight? I don't recall how many cores per
die it has) request_firmware calls.
> described above. Maybe this interface should be behind a family, model
> check or so, so that users don't shoot themselves in the foot but it is
> root-only anyway.
This interface should _DIE_.
Perhaps we could make it work only for CPU 0 and return EBUSY or whatever
for all others (or just don't publish the sysfs attribute for the others),
and change the CPU0 refresh firmware request into a refresh all cores call.
We could then add a new sysfs attribute to cleanly do the update-all-cores
request. That should take care of old userspace, and let the distros
deploy faster userspace without any fuss...
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Tue, Jun 12, 2012 at 10:04:13PM -0300, Henrique de Moraes Holschuh wrote:
> On Tue, 12 Jun 2012, Borislav Petkov wrote:
> > I'd guess this is still there to support mixed ucode revisions for some
>
> It is still there because of the stable ABI rules. As far as I can tell,
> it was a crap interface when it got in, and it still is a crap interface
> now because there simply isn't any sane usercase that requires it, it is
> dangerous as implemented right now (at least in the Intel case), and even
> if we fixed the kernel to do the right thing, userspace would not be able
> to know that and would still need to request 1 microcode refresh per core.
Well, I'm disabling it on AMD.
We could make it iterate over _all_ cores and do the update on each one
of them even though the user does a sysfs write only for a single core..
> As far as I know, we always want to refresh the microcode on every core,
> use the firmware interface to pick up a copy of the newest version of the
> microcode matching the signature of each core, and leave no core behind
> without an update.
Yep.
> And preferably, we want to request_firmware() only once per microcode,
> which is rather easy to do: cache every microcode that will be needed,
> check cache first before doing request_firmware() in the per-core
> worker threads, and invalidate the cache when the user requests
> "refresh_all_microcode". So, the cache speeds up multi-core updates,
> and is also usable when restoring the system from suspend/hibernation,
> but doesn't get in the way of userspace trying to apply a microcode
> update.
Yes, agreed too.
> This would make my pathetic system do one request_firmware instead of
> eight. And even the old junk with mixed-stepping SMP at work would
> only require two, instead of four (or eight? I don't recall how many
> cores per die it has) request_firmware calls.
Ok.
> > described above. Maybe this interface should be behind a family, model
> > check or so, so that users don't shoot themselves in the foot but it is
> > root-only anyway.
>
> This interface should _DIE_.
Yeppers! :-)
> Perhaps we could make it work only for CPU 0 and return EBUSY or whatever
> for all others (or just don't publish the sysfs attribute for the others),
> and change the CPU0 refresh firmware request into a refresh all cores call.
That could also work, it sounds like a sane thing to do having in mind
the current sysfs layout. I will cook up something soonish.
> We could then add a new sysfs attribute to cleanly do the
> update-all-cores request.
That's what Fenghua is doing.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-12 at 21:49 +0200, Borislav Petkov wrote:
>
> But why, do you see a better option for how to tell the kernel to load a
> binary blob early at boot?
built it in? There's this feature where you can included firmware blobs
in the kernel image. It has all the benefits of initrd, since the only
time I ever would update initrd if used it, would be when I install a
new kernel, which I just built anyway.
On Tue, 2012-06-12 at 13:58 -0700, H. Peter Anvin wrote:
> even things like Nehalem + Westmere in the same system
Uhm,.. how well do you expect Linux to run on a franken-system like
that? I'm pretty sure perf will malfunction in such a setup.
On Wed, 13 Jun 2012, Borislav Petkov wrote:
> On Tue, Jun 12, 2012 at 10:04:13PM -0300, Henrique de Moraes Holschuh wrote:
> > On Tue, 12 Jun 2012, Borislav Petkov wrote:
> > > I'd guess this is still there to support mixed ucode revisions for some
> >
> > It is still there because of the stable ABI rules. As far as I can tell,
> > it was a crap interface when it got in, and it still is a crap interface
> > now because there simply isn't any sane usercase that requires it, it is
> > dangerous as implemented right now (at least in the Intel case), and even
> > if we fixed the kernel to do the right thing, userspace would not be able
> > to know that and would still need to request 1 microcode refresh per core.
>
> Well, I'm disabling it on AMD.
>
> We could make it iterate over _all_ cores and do the update on each one
> of them even though the user does a sysfs write only for a single core..
Yes, please. I suggest we use core 0 for that. Using my Debian
maintainer hat, I'd rather you got rid of the sysfs entries for every
other core while at it, as it will make our life a lot simpler,
distro-side.
This is still not the proper fix, which would be to add a new sysfs node
to access the proper update-every-core functionality, but it is a damn
good start in the right direction and required to make it safe without
ripping out the old ABI entirely without a deprecation period.
Since Intel processors don't want the per-core behaviour either, you
could fix it in the microcode core itself...
> > Perhaps we could make it work only for CPU 0 and return EBUSY or whatever
> > for all others (or just don't publish the sysfs attribute for the others),
> > and change the CPU0 refresh firmware request into a refresh all cores call.
>
> That could also work, it sounds like a sane thing to do having in mind
> the current sysfs layout. I will cook up something soonish.
>
> > We could then add a new sysfs attribute to cleanly do the
> > update-all-cores request.
>
> That's what Fenghua is doing.
Ok. Please CC me in the patches, if the new ABI arrives in time, I'll
even be able to get it supported on the next Debian stable (and push to
get this stuff backported to the kernel 3.2 which we will ship, I
consider this an important bug-fix to a pontentially very serious issue.
We have *zero* chance of finding out what's wrong if an users' system
start getting subtly crazy because it is running with skewed microcode
among cores.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Tue, 2012-06-12 at 13:42 -0700, H. Peter Anvin wrote:
> And yes, I would prefer a single sysfs file,
> or better yet a plain old device node.
So how about we rip out all this sysfs crap and go back to
MICROCODE_OLD_INTERFACE ? Afaict that's relatively sane.
I don't remember the exact details of the old interface but yes... seems like every "improvement" has just made it a worse mess.
Peter Zijlstra <[email protected]> wrote:
>On Tue, 2012-06-12 at 13:42 -0700, H. Peter Anvin wrote:
>> And yes, I would prefer a single sysfs file,
>> or better yet a plain old device node.
>
>So how about we rip out all this sysfs crap and go back to
>MICROCODE_OLD_INTERFACE ? Afaict that's relatively sane.
--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
On 06/13/2012 01:32 AM, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 21:49 +0200, Borislav Petkov wrote:
>>
>> But why, do you see a better option for how to tell the kernel to load a
>> binary blob early at boot?
>
> built it in? There's this feature where you can included firmware blobs
> in the kernel image. It has all the benefits of initrd, since the only
> time I ever would update initrd if used it, would be when I install a
> new kernel, which I just built anyway.
>
You know you can build in an initramfs blob in the kernel, right? In
fact, there always is one there even if you don't create it explicitly.
Problem solved, methinks.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 06/13/2012 01:38 AM, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 13:58 -0700, H. Peter Anvin wrote:
>> even things like Nehalem + Westmere in the same system
>
> Uhm,.. how well do you expect Linux to run on a franken-system like
> that? I'm pretty sure perf will malfunction in such a setup.
>
perf might, but I have seen such frankensystems in the field and they
work (although I haven't seen that particular combo, I have to admit.)
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, 13 Jun 2012, Peter Zijlstra wrote:
> On Tue, 2012-06-12 at 13:42 -0700, H. Peter Anvin wrote:
> > And yes, I would prefer a single sysfs file,
> > or better yet a plain old device node.
>
> So how about we rip out all this sysfs crap and go back to
> MICROCODE_OLD_INTERFACE ? Afaict that's relatively sane.
Please teach it to handle partial writes kernel side, so that we can just
use "cat" in userspace instead of special tools (or nasty hacks using 'dd'),
then. And it would be a bit annoying for userspace to figure out whether it
wants Intel or AMD or <whatever vendor> microcode...
IMHO, if we fix the firmware interface to the microcode core, it can do a
lot better than MICROCODE_OLD_INTERFACE, and it can integrate very well with
userspace.
"Fixing" the microcode firmware interface (i.e. making it optimal from
userspace's PoV) is not a complex endeavour:
1. Add the per-system sysfs node, and drop the per-core nodes for
firmware updates. I understand this _will_ be done anyway because the
current status-quo is dangerous.
2. Add a cache layer, so that microcode files are fetched only once per
file per update cycle, no matter how many cores [optional, but quite
welcome as it speeds up the boot]
3. Change the Intel firmware naming convention to a constant name,
instead of the variable naming scheme currently used. This can be done
in a backwards compatible way, by attempting to load the variable named
firmware file if the first attempt fails.
4. Declare proper MODULE_FIRMWARE() and auto-load information.
With all that, all we'd have to do in the distros would be to ship binary
processor microcode the very same way we already deal with firmware for
everything else.
BTW, I've uploaded to Debian a tool that can actually manipulate the Intel
microcode container well enough to help Debian do some release
management[1], and helps the user have tailored, per-system reduced
microcode files. The code is probably crap as far as LKML standards for C
code go, but if there's general interest, I can make the upstream git tree
public in gitorious or something. Full source currently available in the
".orig.tar.bz2" file of the Debian source package:
http://packages.qa.debian.org/i/iucode-tool.html
http://cdn.debian.net/debian/pool/contrib/i/iucode-tool/
http://cdn.debian.net/debian/pool/contrib/i/iucode-tool/iucode-tool_0.8.orig.tar.bz2
[1] Maybe Intel could actually consider making it easier to the general
Linux distro and Linux admin/power user, and provide a proper chagelog along
with its microcode dumps? AMD does it, and it *really* helps... A timeline
of the microcodes shipped by Intel paints an EXTREMELY confusing picture.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Wed, 2012-06-13 at 06:59 -0700, H. Peter Anvin wrote:
> On 06/13/2012 01:32 AM, Peter Zijlstra wrote:
> > On Tue, 2012-06-12 at 21:49 +0200, Borislav Petkov wrote:
> >>
> >> But why, do you see a better option for how to tell the kernel to load a
> >> binary blob early at boot?
> >
> > built it in? There's this feature where you can included firmware blobs
> > in the kernel image. It has all the benefits of initrd, since the only
> > time I ever would update initrd if used it, would be when I install a
> > new kernel, which I just built anyway.
> >
>
> You know you can build in an initramfs blob in the kernel, right? In
> fact, there always is one there even if you don't create it explicitly.
Didn't know that..
> Problem solved, methinks.
Yep.
On Wed, 2012-06-13 at 07:00 -0700, H. Peter Anvin wrote:
> On 06/13/2012 01:38 AM, Peter Zijlstra wrote:
> > On Tue, 2012-06-12 at 13:58 -0700, H. Peter Anvin wrote:
> >> even things like Nehalem + Westmere in the same system
> >
> > Uhm,.. how well do you expect Linux to run on a franken-system like
> > that? I'm pretty sure perf will malfunction in such a setup.
> >
>
> perf might, but I have seen such frankensystems in the field and they
> work (although I haven't seen that particular combo, I have to admit.)
Thing is, if we boot on the wsm, perf could fault the system with
nonexistent MSR accesses, if we boot on the nhm worst is wrong numbers I
think.
I'm not really inclined to 'fix' this.. the best I'm willing to accept
is a patch that would detect this situation and simply disable all
hardware perf support for the platform.
On 06/13/2012 08:32 AM, Peter Zijlstra wrote:
> On Wed, 2012-06-13 at 06:59 -0700, H. Peter Anvin wrote:
>> On 06/13/2012 01:32 AM, Peter Zijlstra wrote:
>>> On Tue, 2012-06-12 at 21:49 +0200, Borislav Petkov wrote:
>>>>
>>>> But why, do you see a better option for how to tell the kernel to load a
>>>> binary blob early at boot?
>>>
>>> built it in? There's this feature where you can included firmware blobs
>>> in the kernel image. It has all the benefits of initrd, since the only
>>> time I ever would update initrd if used it, would be when I install a
>>> new kernel, which I just built anyway.
>>>
>>
>> You know you can build in an initramfs blob in the kernel, right? In
>> fact, there always is one there even if you don't create it explicitly.
>
> Didn't know that..
>
>> Problem solved, methinks.
>
> Yep.
>
We have to be a bit careful about it, though; the early initramfs
content can't be compressed. It might end up making more sense to make
the builtin version of that a separate blob from the main builtin blob.
-hpa
On 06/13/2012 08:37 AM, Peter Zijlstra wrote:
>>
>> perf might, but I have seen such frankensystems in the field and they
>> work (although I haven't seen that particular combo, I have to admit.)
>
> Thing is, if we boot on the wsm, perf could fault the system with
> nonexistent MSR accesses, if we boot on the nhm worst is wrong numbers I
> think.
>
> I'm not really inclined to 'fix' this.. the best I'm willing to accept
> is a patch that would detect this situation and simply disable all
> hardware perf support for the platform.
>
BIOS is *supposed* to pick the least capable CPU as the BSP. BIOS
being, well, BIOS is known to not always do that, but on the other hand
it is apparently common enough that at least *some* BIOSes are known to
pay attention and do so.
Anyway, I'm not suggesting we should go out of our way to make perf work
on these systems. Most likely, as you point out, it is either "just
going to work" or totally fail.
What I don't want to see is creating a new facility and encouraging
people to use it, when it is known to be in error. I'm trying to find
out at the moment just where the practical cutoff is.
-hpa
On Wed, Jun 13, 2012 at 09:36:49AM -0300, Henrique de Moraes Holschuh wrote:
> Yes, please. I suggest we use core 0 for that. Using my Debian
> maintainer hat, I'd rather you got rid of the sysfs entries for every
> other core while at it, as it will make our life a lot simpler,
> distro-side.
Wouldn't we have some sort of ABI breakage if I remove the sysfs files?
Instead, I was thinking of having the rest of the files not on the BSP
return -EINVAL and only the BSP reload ucode on the whole system.
> This is still not the proper fix, which would be to add a new sysfs node
> to access the proper update-every-core functionality, but it is a damn
> good start in the right direction and required to make it safe without
> ripping out the old ABI entirely without a deprecation period.
Yes, I'd like to have the system-wide sysfs node somewhere under
/sys/devices/system/cpu/microcode/ but we'll see how that goes.
> Since Intel processors don't want the per-core behaviour either, you
> could fix it in the microcode core itself...
Yes.
> Ok. Please CC me in the patches, if the new ABI arrives in time, I'll
> even be able to get it supported on the next Debian stable (and push
> to get this stuff backported to the kernel 3.2 which we will ship,
> I consider this an important bug-fix to a pontentially very serious
> issue. We have *zero* chance of finding out what's wrong if an users'
> system start getting subtly crazy because it is running with skewed
> microcode among cores.
Ok, will do.
Btw, I have to think about whether we really want to backport this to
stable since it is not a regression fix but functionality change which
kinda fixes a some sort of bug. Hmm, the stable rules are kinda blurry
here. I could write a minimal fix with stable in mind though... we'll
see.
hpa, what is our take here, should we backport a minimal change
disabling reloading of ucode per-cpu for stable? It is the wrong thing
to do anyway.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Wed, 13 Jun 2012, Borislav Petkov wrote:
> On Wed, Jun 13, 2012 at 09:36:49AM -0300, Henrique de Moraes Holschuh wrote:
> > Yes, please. I suggest we use core 0 for that. Using my Debian
> > maintainer hat, I'd rather you got rid of the sysfs entries for every
> > other core while at it, as it will make our life a lot simpler,
> > distro-side.
>
> Wouldn't we have some sort of ABI breakage if I remove the sysfs files?
It wouldn't cause any problems in Debian, because currently our crap only
uses the deprecated /dev interface for Intel, and doesn't support AMD at
all. This is something I am already addressing.
We would have to check the other distros, so maybe your -EINVAL idea is
better.
> Instead, I was thinking of having the rest of the files not on the BSP
> return -EINVAL and only the BSP reload ucode on the whole system.
That will work as well, although it will cause userspace to waste time
looping all CPUs for no good reason. But userspace can be smart enough to
skip this on boot in the most common cases (modular microcode, with the
microcode already available at /lib/firmware at the time the module loads),
so I guess that doesn't matter much.
And obviously as soon as the system-wide microcode update sysfs node is
available, we can just test for the existence of that node, and ignore the
broken per-node interface when the system-wide node exists.
> > This is still not the proper fix, which would be to add a new sysfs node
> > to access the proper update-every-core functionality, but it is a damn
> > good start in the right direction and required to make it safe without
> > ripping out the old ABI entirely without a deprecation period.
>
> Yes, I'd like to have the system-wide sysfs node somewhere under
> /sys/devices/system/cpu/microcode/ but we'll see how that goes.
Please set the sysfs node name ASAP, so that I can make sure the next Debian
stable ships knowing how to handle it :-)
> > Ok. Please CC me in the patches, if the new ABI arrives in time, I'll
> > even be able to get it supported on the next Debian stable (and push
> > to get this stuff backported to the kernel 3.2 which we will ship,
> > I consider this an important bug-fix to a pontentially very serious
> > issue. We have *zero* chance of finding out what's wrong if an users'
> > system start getting subtly crazy because it is running with skewed
> > microcode among cores.
>
> Ok, will do.
Thank you!
> Btw, I have to think about whether we really want to backport this to
> stable since it is not a regression fix but functionality change which
> kinda fixes a some sort of bug. Hmm, the stable rules are kinda blurry
> here. I could write a minimal fix with stable in mind though... we'll
> see.
It is not a regression fix, but it is a major bugfix to the microcode
core...
> hpa, what is our take here, should we backport a minimal change
> disabling reloading of ucode per-cpu for stable? It is the wrong thing
> to do anyway.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Ok,
so at least Henrique and I feel that this per-core reloading interface
is crap and it should go. Below is a first attempt to fix that. It is
supposed to be a minimal fix so that it can be backported to stable.
I know, I know, it is not a fully stable-rules compatible patch but
having in mind the stupidity of the interface and how dangerous it can
be for users who want to shoot themselves in the foot with microcode, we
better backport it everywhere we can.
So guys, please have a look and let me know of any objections.
It is lightly tested here and it seems to work.
Thanks.
From: Borislav Petkov <[email protected]>
Date: Fri, 15 Jun 2012 14:19:55 +0200
Subject: [PATCH] x86, microcode: Sanitize per-cpu microcode reloading
interface
Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such interface with which
we can end up with different microcode versions applied on different
cores of an otherwise homogeneous wrt (family,model,stepping) system.
So turn off the possibility of doing that per core and allow it only
system-wide.
This is a minimal fix which we'd like to see in stable too thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:
$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...
and disable the interface on the other cores:
$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument
A more generic fix will follow.
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: [email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/microcode_core.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index fbdfc6917180..24b852b61be3 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -298,19 +298,31 @@ static ssize_t reload_store(struct device *dev,
const char *buf, size_t size)
{
unsigned long val;
- int cpu = dev->id;
- ssize_t ret = 0;
+ int cpu;
+ ssize_t ret = 0, tmp_ret;
+
+ /* allow reload only from the BSP */
+ if (boot_cpu_data.cpu_index != dev->id)
+ return -EINVAL;
ret = kstrtoul(buf, 0, &val);
if (ret)
return ret;
- if (val == 1) {
- get_online_cpus();
- if (cpu_online(cpu))
- ret = reload_for_cpu(cpu);
- put_online_cpus();
+ if (val != 1)
+ return size;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ tmp_ret = reload_for_cpu(cpu);
+ if (tmp_ret != 0)
+ pr_warn("Error reloading microcode on CPU %d\n", cpu);
+
+ /* save retval of the first encountered reload error */
+ if (!ret)
+ ret = tmp_ret;
}
+ put_online_cpus();
if (!ret)
ret = size;
--
1.7.11.rc1
On Fri, 2012-06-15 at 14:37 +0200, Borislav Petkov wrote:
> This is a minimal fix which we'd like to see in stable too thus the
> more-or-less arbitrary decision to allow system-wide reloading only on
> the BSP:
>
> $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
> ...
>
> and disable the interface on the other cores:
>
> $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
> -bash: echo: write error: Invalid argument
One could allow the reload from all cpus and simply do all cpus, but I
guess the rationale for not doing that and restricting it to cpu0 is to
avoid the O(n^2) thing in case userspace issues a reload on all cpus?
If so it would be good to mention in the Changelog so people see its not
a 'random' choice.
On Fri, Jun 15, 2012 at 02:42:41PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-06-15 at 14:37 +0200, Borislav Petkov wrote:
> > This is a minimal fix which we'd like to see in stable too thus the
> > more-or-less arbitrary decision to allow system-wide reloading only on
> > the BSP:
> >
> > $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
> > ...
> >
> > and disable the interface on the other cores:
> >
> > $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
> > -bash: echo: write error: Invalid argument
>
> One could allow the reload from all cpus and simply do all cpus, but I
> guess the rationale for not doing that and restricting it to cpu0 is to
> avoid the O(n^2) thing in case userspace issues a reload on all cpus?
That's a good point.
Actually, the BSP thing was more-or-less an arbitrary deal. What we want as an
endresult is to do:
$ echo 1 > /sys/devices/system/cpu/microcode/reload
(notice this is in the toplevel "cpu" directory in sysfs)
and do the system-wide reload there.
Obviously, this needs a couple of more patches and we wanted to have a
simplest stable fix first.
> If so it would be good to mention in the Changelog so people see its not
> a 'random' choice.
Yes, will do, thanks!
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, Jun 15, 2012 at 02:52:40PM +0200, Borislav Petkov wrote:
> Actually, the BSP thing was more-or-less an arbitrary deal. What we want as an
> endresult is to do:
>
> $ echo 1 > /sys/devices/system/cpu/microcode/reload
>
> (notice this is in the toplevel "cpu" directory in sysfs)
>
> and do the system-wide reload there.
Actually, that was easier than I thought, see below. And it seems to
workie too.
--
From: Borislav Petkov <[email protected]>
Date: Fri, 15 Jun 2012 18:46:05 +0200
Subject: [PATCH] x86, microcode: Make reload interface per system
The reload interface should be per-system so that a full system ucode
reload happens (on each core) when doing
echo 1 > /sys/devices/system/cpu/microcode/reload
Move it to the cpu subsys directory instead of it being per-cpu.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/microcode_core.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 24b852b61be3..4c6f3b37ed3c 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -351,7 +351,6 @@ static DEVICE_ATTR(version, 0400, version_show, NULL);
static DEVICE_ATTR(processor_flags, 0400, pf_show, NULL);
static struct attribute *mc_default_attrs[] = {
- &dev_attr_reload.attr,
&dev_attr_version.attr,
&dev_attr_processor_flags.attr,
NULL
@@ -528,6 +527,16 @@ static const struct x86_cpu_id microcode_id[] = {
MODULE_DEVICE_TABLE(x86cpu, microcode_id);
#endif
+static struct attribute *cpu_root_microcode_attrs[] = {
+ &dev_attr_reload.attr,
+ NULL
+};
+
+static struct attribute_group cpu_root_microcode_group = {
+ .name = "microcode",
+ .attrs = cpu_root_microcode_attrs,
+};
+
static int __init microcode_init(void)
{
struct cpuinfo_x86 *c = &cpu_data(0);
@@ -559,9 +568,17 @@ static int __init microcode_init(void)
if (error)
goto out_pdev;
+ error = sysfs_create_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
+ if (error) {
+ pr_err("Error creating microcode group!\n");
+ goto out_driver;
+ }
+
error = microcode_dev_init();
if (error)
- goto out_driver;
+ goto out_ucode_group;
register_syscore_ops(&mc_syscore_ops);
register_hotcpu_notifier(&mc_cpu_notifier);
@@ -571,7 +588,11 @@ static int __init microcode_init(void)
return 0;
-out_driver:
+ out_ucode_group:
+ sysfs_remove_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
+ out_driver:
get_online_cpus();
mutex_lock(µcode_mutex);
@@ -580,7 +601,7 @@ out_driver:
mutex_unlock(µcode_mutex);
put_online_cpus();
-out_pdev:
+ out_pdev:
platform_device_unregister(microcode_pdev);
return error;
@@ -596,6 +617,9 @@ static void __exit microcode_exit(void)
unregister_hotcpu_notifier(&mc_cpu_notifier);
unregister_syscore_ops(&mc_syscore_ops);
+ sysfs_remove_group(&cpu_subsys.dev_root->kobj,
+ &cpu_root_microcode_group);
+
get_online_cpus();
mutex_lock(µcode_mutex);
--
1.7.11.rc1
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 2012-06-15 at 18:52 +0200, Borislav Petkov wrote:
> Actually, that was easier than I thought, see below. And it seems to
> workie too.
This is on top of that other patch that makes reload_store() do the
whole system, right?
On Fri, Jun 15, 2012 at 07:16:26PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-06-15 at 18:52 +0200, Borislav Petkov wrote:
> > Actually, that was easier than I thought, see below. And it seems to
> > workie too.
>
> This is on top of that other patch that makes reload_store() do the
> whole system, right?
Yes sir. This one simply moves the sysfs 'reload' node to
/sys/devices/system/cpu/microcode/reload.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, 15 Jun 2012, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
> Date: Fri, 15 Jun 2012 14:19:55 +0200
> Subject: [PATCH] x86, microcode: Sanitize per-cpu microcode reloading
> interface
>
> Microcode reloading in a per-core manner is a very bad idea for both
> major x86 vendors. And the thing is, we have such interface with which
> we can end up with different microcode versions applied on different
> cores of an otherwise homogeneous wrt (family,model,stepping) system.
>
> So turn off the possibility of doing that per core and allow it only
> system-wide.
>
> This is a minimal fix which we'd like to see in stable too thus the
> more-or-less arbitrary decision to allow system-wide reloading only on
> the BSP:
>
> $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
> ...
>
> and disable the interface on the other cores:
>
> $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
> -bash: echo: write error: Invalid argument
>
> A more generic fix will follow.
>
> Cc: Henrique de Moraes Holschuh <[email protected]>
> Cc: [email protected]
> Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Henrique de Moraes Holschuh <[email protected]>
I've actually backported this to 3.0 (trivial) and tested it there.
> ---
> arch/x86/kernel/microcode_core.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
> index fbdfc6917180..24b852b61be3 100644
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -298,19 +298,31 @@ static ssize_t reload_store(struct device *dev,
> const char *buf, size_t size)
> {
> unsigned long val;
> - int cpu = dev->id;
> - ssize_t ret = 0;
> + int cpu;
> + ssize_t ret = 0, tmp_ret;
> +
> + /* allow reload only from the BSP */
> + if (boot_cpu_data.cpu_index != dev->id)
> + return -EINVAL;
>
> ret = kstrtoul(buf, 0, &val);
> if (ret)
> return ret;
>
> - if (val == 1) {
> - get_online_cpus();
> - if (cpu_online(cpu))
> - ret = reload_for_cpu(cpu);
> - put_online_cpus();
> + if (val != 1)
> + return size;
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu) {
> + tmp_ret = reload_for_cpu(cpu);
> + if (tmp_ret != 0)
> + pr_warn("Error reloading microcode on CPU %d\n", cpu);
> +
> + /* save retval of the first encountered reload error */
> + if (!ret)
> + ret = tmp_ret;
> }
> + put_online_cpus();
>
> if (!ret)
> ret = size;
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Fri, 15 Jun 2012, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
> Date: Fri, 15 Jun 2012 18:46:05 +0200
> Subject: [PATCH] x86, microcode: Make reload interface per system
>
> The reload interface should be per-system so that a full system ucode
> reload happens (on each core) when doing
>
> echo 1 > /sys/devices/system/cpu/microcode/reload
>
> Move it to the cpu subsys directory instead of it being per-cpu.
>
> Signed-off-by: Borislav Petkov <[email protected]>
It is a pity this one is much harder to backport to 3.2 and 3.0. I'd
really like to have the new interface there. But it looks good, and we
will support the new /sys/devices/system/cpu/microcode/reload sysfs node
in Debian for the benefit of anyone using a newer kernel than the
distro's (which will be based on 3.2).
So, fwiw, you have my:
Acked-by-unimportant-person: Henrique de Moraes Holschuh <[email protected]>
> ---
> arch/x86/kernel/microcode_core.c | 32 ++++++++++++++++++++++++++++----
> 1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
> index 24b852b61be3..4c6f3b37ed3c 100644
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -351,7 +351,6 @@ static DEVICE_ATTR(version, 0400, version_show, NULL);
> static DEVICE_ATTR(processor_flags, 0400, pf_show, NULL);
>
> static struct attribute *mc_default_attrs[] = {
> - &dev_attr_reload.attr,
> &dev_attr_version.attr,
> &dev_attr_processor_flags.attr,
> NULL
> @@ -528,6 +527,16 @@ static const struct x86_cpu_id microcode_id[] = {
> MODULE_DEVICE_TABLE(x86cpu, microcode_id);
> #endif
>
> +static struct attribute *cpu_root_microcode_attrs[] = {
> + &dev_attr_reload.attr,
> + NULL
> +};
> +
> +static struct attribute_group cpu_root_microcode_group = {
> + .name = "microcode",
> + .attrs = cpu_root_microcode_attrs,
> +};
> +
> static int __init microcode_init(void)
> {
> struct cpuinfo_x86 *c = &cpu_data(0);
> @@ -559,9 +568,17 @@ static int __init microcode_init(void)
> if (error)
> goto out_pdev;
>
> + error = sysfs_create_group(&cpu_subsys.dev_root->kobj,
> + &cpu_root_microcode_group);
> +
> + if (error) {
> + pr_err("Error creating microcode group!\n");
> + goto out_driver;
> + }
> +
> error = microcode_dev_init();
> if (error)
> - goto out_driver;
> + goto out_ucode_group;
>
> register_syscore_ops(&mc_syscore_ops);
> register_hotcpu_notifier(&mc_cpu_notifier);
> @@ -571,7 +588,11 @@ static int __init microcode_init(void)
>
> return 0;
>
> -out_driver:
> + out_ucode_group:
> + sysfs_remove_group(&cpu_subsys.dev_root->kobj,
> + &cpu_root_microcode_group);
> +
> + out_driver:
> get_online_cpus();
> mutex_lock(µcode_mutex);
>
> @@ -580,7 +601,7 @@ out_driver:
> mutex_unlock(µcode_mutex);
> put_online_cpus();
>
> -out_pdev:
> + out_pdev:
> platform_device_unregister(microcode_pdev);
> return error;
>
> @@ -596,6 +617,9 @@ static void __exit microcode_exit(void)
> unregister_hotcpu_notifier(&mc_cpu_notifier);
> unregister_syscore_ops(&mc_syscore_ops);
>
> + sysfs_remove_group(&cpu_subsys.dev_root->kobj,
> + &cpu_root_microcode_group);
> +
> get_online_cpus();
> mutex_lock(µcode_mutex);
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On 06/18/2012 07:46 PM, Henrique de Moraes Holschuh wrote:
>
> It is a pity this one is much harder to backport to 3.2 and 3.0. I'd
> really like to have the new interface there. But it looks good, and we
> will support the new /sys/devices/system/cpu/microcode/reload sysfs node
> in Debian for the benefit of anyone using a newer kernel than the
> distro's (which will be based on 3.2).
>
> So, fwiw, you have my:
> Acked-by-unimportant-person: Henrique de Moraes Holschuh <[email protected]>
>
I have to admit to being slightly questioning to the whole "sysfs
trigger, request_firmware" interface... it seems royally backwards to me
(it makes sense for the initial firmware load, I guess, but that is
better done early.)
Either way I don't have a hugely strong preference.
-hpa
On Mon, Jun 18, 2012 at 11:46:39PM -0300, Henrique de Moraes Holschuh wrote:
> On Fri, 15 Jun 2012, Borislav Petkov wrote:
> > From: Borislav Petkov <[email protected]>
> > Date: Fri, 15 Jun 2012 18:46:05 +0200
> > Subject: [PATCH] x86, microcode: Make reload interface per system
> >
> > The reload interface should be per-system so that a full system ucode
> > reload happens (on each core) when doing
> >
> > echo 1 > /sys/devices/system/cpu/microcode/reload
> >
> > Move it to the cpu subsys directory instead of it being per-cpu.
> >
> > Signed-off-by: Borislav Petkov <[email protected]>
>
> It is a pity this one is much harder to backport to 3.2 and 3.0.
Yeah, not harder. It changes the sysfs interface and we can't do that
for already released kernels.
But I think you meant that.
> I'd really like to have the new interface there. But it looks good,
> and we will support the new /sys/devices/system/cpu/microcode/reload
> sysfs node in Debian for the benefit of anyone using a newer kernel
> than the distro's (which will be based on 3.2).
Right, hopefully the next Debian release will have that nice shiny new
interface :-)
> So, fwiw, you have my:
> Acked-by-unimportant-person: Henrique de Moraes Holschuh <[email protected]>
No, don't underestimate yourself. I think it helps a lot to have someone
look at the patch and even better: give comments whether what we're
doing is sane for userspace. So, I appreciate your comments a lot, keep
'em coming :-)
Thanks for review and testing, I'll send out the patches soon.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Mon, Jun 18, 2012 at 08:31:08PM -0700, H. Peter Anvin wrote:
> On 06/18/2012 07:46 PM, Henrique de Moraes Holschuh wrote:
> >
> > It is a pity this one is much harder to backport to 3.2 and 3.0. I'd
> > really like to have the new interface there. But it looks good, and we
> > will support the new /sys/devices/system/cpu/microcode/reload sysfs node
> > in Debian for the benefit of anyone using a newer kernel than the
> > distro's (which will be based on 3.2).
> >
> > So, fwiw, you have my:
> > Acked-by-unimportant-person: Henrique de Moraes Holschuh <[email protected]>
> >
>
> I have to admit to being slightly questioning to the whole "sysfs
> trigger, request_firmware" interface... it seems royally backwards to me
> (it makes sense for the initial firmware load, I guess, but that is
> better done early.)
Two reasons:
1. There are those guys who like to run their systems for years without
upgrading the kernel (you know who you are :)) and in such cases, you
want to be able to upgrade the microcode you loaded early with a newer
version which the hw vendor released in the meantime.
[ Which makes the whole ucode-load-early deal not the solution to all
problems since we need to be able to load ucode without rebooting too,
if possible. ]
2. Right, we can do that by
$ rmmod microcode; modprobe microcode
but then the same guys come :-) and say they don't want to reload
modules for security reasons.
Thus the sysfs interface and sysfs is the proper thing we have for such
things so ...
... and I'm always open for better ideas, btw.
> Either way I don't have a hugely strong preference.
I'll run the patches here and send them out maybe later today if all
looks good.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-19 at 07:11 +0200, Borislav Petkov wrote:
> $ rmmod microcode; modprobe microcode
>
> but then the same guys come :-) and say they don't want to reload
> modules for security reasons.
Nah, they built their kernel without modules, much more secure!
On Tue, Jun 19, 2012 at 10:17:44AM +0200, Peter Zijlstra wrote:
> On Tue, 2012-06-19 at 07:11 +0200, Borislav Petkov wrote:
> > $ rmmod microcode; modprobe microcode
> >
> > but then the same guys come :-) and say they don't want to reload
> > modules for security reasons.
>
> Nah, they built their kernel without modules, much more secure!
Not secure enough - they don't get microcode then (ucode loader uses
request_firmware()) and has to be module for the initial load, at least.
But I haven't tested what happens when it is built in and I use the
reload interface, hehe something to play with today.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 2012-06-19 at 12:22 +0200, Borislav Petkov wrote:
>
> But I haven't tested what happens when it is built in and I use the
> reload interface, hehe something to play with today.
I couldn't easily find where to place the damn ucode image anyway, so I
built in the old-style /dev interface and used the microcode.ctl package
to load it.
Fwiw, that works just fine when its built-in.
On Tue, Jun 19, 2012 at 12:26:08PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-06-19 at 12:22 +0200, Borislav Petkov wrote:
> >
> > But I haven't tested what happens when it is built in and I use the
> > reload interface, hehe something to play with today.
>
> I couldn't easily find where to place the damn ucode image anyway, so I
> built in the old-style /dev interface and used the microcode.ctl package
> to load it.
>
> Fwiw, that works just fine when its built-in.
Ok, so something changed in the request_firmware thing in the meantime.
Last time I checked we waited for 60 seconds to get the ucode blob from
userspace but that wasn't happening that early in the boot process so we
continued without newer ucode patch.
Now we load the ucode driver very early if it is built-in (look at
printk timestamps):
[ 3.651275] microcode: CPU0: patch_level=0x010000c4
[ 3.664943] microcode: CPU1: patch_level=0x010000c4
[ 3.670177] microcode: CPU2: patch_level=0x010000c4
[ 3.675407] microcode: CPU3: patch_level=0x010000c4
...
and then later, when we have userspace, we can use the reload thing to
re-request microcode. Which is also fine, I guess.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On 06/19/2012 08:06 AM, Borislav Petkov wrote:
>
> Now we load the ucode driver very early if it is built-in (look at
> printk timestamps):
>
> [ 3.651275] microcode: CPU0: patch_level=0x010000c4
> [ 3.664943] microcode: CPU1: patch_level=0x010000c4
> [ 3.670177] microcode: CPU2: patch_level=0x010000c4
> [ 3.675407] microcode: CPU3: patch_level=0x010000c4
> ...
>
> and then later, when we have userspace, we can use the reload thing to
> re-request microcode. Which is also fine, I guess.
>
Right, although we can push it much, much earlier than that.
-hpa
On Tue, 19 Jun 2012, Peter Zijlstra wrote:
> On Tue, 2012-06-19 at 12:22 +0200, Borislav Petkov wrote:
> > But I haven't tested what happens when it is built in and I use the
> > reload interface, hehe something to play with today.
>
> I couldn't easily find where to place the damn ucode image anyway, so I
> built in the old-style /dev interface and used the microcode.ctl package
> to load it.
The firmware interface is fine *after the system boot*, we just needed a
better way to trigger it through sysfs. These patches address that.
The sysfs interface is useful to immediately apply a microcode update
when a new one is made available, without the need for a reboot... most
microcode updates are not of the sort the kernel is testing for their
presence at boot. The sysfs interface is also useful when the required
microcode is not available at the time the microcode driver first
requests it.
We still need a proper way to load microcode very very early (which
requires that the microcode be available to the kernel and THAT
has nothing to do with sysfs :-)
Even if we have the very early microcode facility, we *still* want the
runtime one based on sysfs+request_firmware in order to update microcode
without the need for a reboot.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Tue, 19 Jun 2012, Borislav Petkov wrote:
> On Mon, Jun 18, 2012 at 11:46:39PM -0300, Henrique de Moraes Holschuh wrote:
> > On Fri, 15 Jun 2012, Borislav Petkov wrote:
> > > From: Borislav Petkov <[email protected]>
> > > Date: Fri, 15 Jun 2012 18:46:05 +0200
> > > Subject: [PATCH] x86, microcode: Make reload interface per system
> > >
> > > The reload interface should be per-system so that a full system ucode
> > > reload happens (on each core) when doing
> > >
> > > echo 1 > /sys/devices/system/cpu/microcode/reload
> > >
> > > Move it to the cpu subsys directory instead of it being per-cpu.
> > >
> > > Signed-off-by: Borislav Petkov <[email protected]>
> >
> > It is a pity this one is much harder to backport to 3.2 and 3.0.
>
> Yeah, not harder. It changes the sysfs interface and we can't do that
> for already released kernels.
>
> But I think you meant that.
Yeah.
I was wondering whether it would be worthwhile to backport the new ABI for
the Debian 3.2 kernel or not. Probably not.
Backporting to 3.2 and 3.0 is straigthforward, however it will look nasty as
one has to "inappropriately touch" the cpu sysdev class to get the attribute
group directly connected to /sys/devices/system/cpu.
I did notice there were no stable backports of the error unwind during
module init, but that one is a very rarely used codepath. Maybe worth to
backport a fix to stable, though.
> > I'd really like to have the new interface there. But it looks good,
> > and we will support the new /sys/devices/system/cpu/microcode/reload
> > sysfs node in Debian for the benefit of anyone using a newer kernel
> > than the distro's (which will be based on 3.2).
>
> Right, hopefully the next Debian release will have that nice shiny new
> interface :-)
I sure hope so :) Actually, I hope Debian will release a
"stable-and-a-half" that will already have it.
> > So, fwiw, you have my:
> > Acked-by-unimportant-person: Henrique de Moraes Holschuh <[email protected]>
>
> No, don't underestimate yourself. I think it helps a lot to have someone
> look at the patch and even better: give comments whether what we're
> doing is sane for userspace. So, I appreciate your comments a lot, keep
> 'em coming :-)
Well, my ack is unimportant in the "this is not an area of the kernel I have
any authority to ack things" sense. But we don't have a "I-wanna-that-by:"
or even a "Thumbs-up-by:"...
> Thanks for review and testing, I'll send out the patches soon.
Thank you for addressing these issues and writing the patches!
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On 06/19/2012 11:22 AM, Henrique de Moraes Holschuh wrote:
>
> Even if we have the very early microcode facility, we *still* want the
> runtime one based on sysfs+request_firmware in order to update microcode
> without the need for a reboot.
>
I am slightly questioning the clause "based on sysfs+request_firmware"
in the above sentence, but again, not a huge deal either way.
-hpa
On Tue, Jun 19, 2012 at 03:57:36PM -0300, Henrique de Moraes Holschuh wrote:
> I was wondering whether it would be worthwhile to backport the new ABI
> for the Debian 3.2 kernel or not. Probably not.
Nope, changing released kernels' ABI is a no-no.
> Backporting to 3.2 and 3.0 is straigthforward, however it will look
> nasty as one has to "inappropriately touch" the cpu sysdev class to
> get the attribute group directly connected to /sys/devices/system/cpu.
>
> I did notice there were no stable backports of the error unwind during
> module init, but that one is a very rarely used codepath. Maybe worth
> to backport a fix to stable, though.
Yeah, stable rules say we only backport regression fixes and although
missing error unwind is a small regression, I've never heard of it
causing trouble.
> Well, my ack is unimportant in the "this is not an area of the kernel
> I have any authority to ack things" sense. But we don't have a
> "I-wanna-that-by:" or even a "Thumbs-up-by:"...
Which reminds me, I forgot to add your tags to the patches, sorry.
@hpa: Would you please add Henrique's {Tested,Acked}-by tags to the
patches? Thanks.
> > Thanks for review and testing, I'll send out the patches soon.
>
> Thank you for addressing these issues and writing the patches!
Sure, absolutely! :-)
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Tue, 19 Jun 2012, H. Peter Anvin wrote:
> On 06/19/2012 11:22 AM, Henrique de Moraes Holschuh wrote:
> > Even if we have the very early microcode facility, we *still* want the
> > runtime one based on sysfs+request_firmware in order to update microcode
> > without the need for a reboot.
>
> I am slightly questioning the clause "based on sysfs+request_firmware"
> in the above sentence, but again, not a huge deal either way.
Well, it makes the kernel tell us what (amd/intel/whatever) firmware it
wants, and requires (after patching) a simple echo 1 >foo to activate. It
is also harmless if activated without updated firmware in place.
/dev/cpu/microcode requires special software (or a dd with bs>=file size),
since it borks if userspace breaks a microcode into two write() syscalls,
and requires detection of the cpu vendor. Very annoying.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On 06/20/2012 04:46 PM, Henrique de Moraes Holschuh wrote:
>
> Well, it makes the kernel tell us what (amd/intel/whatever) firmware it
> wants, and requires (after patching) a simple echo 1 >foo to activate. It
> is also harmless if activated without updated firmware in place.
>
> /dev/cpu/microcode requires special software (or a dd with bs>=file size),
> since it borks if userspace breaks a microcode into two write() syscalls,
> and requires detection of the cpu vendor. Very annoying.
>
Yeah, because dd is such an exotic piece of software...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, 20 Jun 2012, H. Peter Anvin wrote:
> On 06/20/2012 04:46 PM, Henrique de Moraes Holschuh wrote:
> > Well, it makes the kernel tell us what (amd/intel/whatever) firmware it
> > wants, and requires (after patching) a simple echo 1 >foo to activate. It
> > is also harmless if activated without updated firmware in place.
> >
> > /dev/cpu/microcode requires special software (or a dd with bs>=file size),
> > since it borks if userspace breaks a microcode into two write() syscalls,
> > and requires detection of the cpu vendor. Very annoying.
> >
>
> Yeah, because dd is such an exotic piece of software...
No, because dd if=<whatever> of=/dev/cpu/microcode bs=1M means I have to
add dd to the initramfs or to busybox, AND it will break the day the
microcode data file gets bigger than 1M. And it will be at best very
annoying to have to special case each vendor to locate the correct
microcode, etc.
I'll take the firmware interface we already have to support to get
anything done with a lot of other drivers anyway, thank you very much.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On 06/20/2012 05:06 PM, Henrique de Moraes Holschuh wrote:
>
> No, because dd if=<whatever> of=/dev/cpu/microcode bs=1M means I have to
> add dd to the initramfs or to busybox, AND it will break the day the
> microcode data file gets bigger than 1M. And it will be at best very
> annoying to have to special case each vendor to locate the correct
> microcode, etc.
>
No.
The whole point is you won't have to put ANYTHING in your initramfs,
period. The early microcode blob will be all you need there.
So the only issue left is what to do when you want to update the
microcode in an already running system.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, 20 Jun 2012, H. Peter Anvin wrote:
> On 06/20/2012 05:06 PM, Henrique de Moraes Holschuh wrote:
> > No, because dd if=<whatever> of=/dev/cpu/microcode bs=1M means I have to
> > add dd to the initramfs or to busybox, AND it will break the day the
> > microcode data file gets bigger than 1M. And it will be at best very
> > annoying to have to special case each vendor to locate the correct
> > microcode, etc.
>
> No.
>
> The whole point is you won't have to put ANYTHING in your initramfs,
> period. The early microcode blob will be all you need there.
Yes. And the sooner it gets done that way, the better. But until then, IMO
the firmware_request + sysfs trigger interface is easier and friendler to
use than /dev/cpu/microcode.
> So the only issue left is what to do when you want to update the
> microcode in an already running system.
Well, IMO the firmware interface makes that somewhat easier on the userland
side, as it is more userfriendly from a generic support for processor
microcode updates point-of-view. And the sysfs trigger isn't that horrible,
it is certainly easier to script than an ioctl or syscall.
I guess it's better if we just agree to disagree on this...
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh