2012-06-06 21:53:25

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

Sorry for the resend, I guess I need more sleep or vacation or both :(

The following series fixes a few interesting bugs (found by review in
context of the CMCI poll effort) and a cleanup to the timer/hotplug
code followed by a consolidated version of the CMCI poll
implementation. This series is based on

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

which contains the bugfix for the dropped timer interval init.

Thanks,

tglx




2012-06-07 10:08:22

by Chen, Gong

[permalink] [raw]
Subject: Re: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

On 2012/6/7 5:53, Thomas Gleixner wrote:
> Sorry for the resend, I guess I need more sleep or vacation or both :(
>
> The following series fixes a few interesting bugs (found by review in
> context of the CMCI poll effort) and a cleanup to the timer/hotplug
> code followed by a consolidated version of the CMCI poll
> implementation. This series is based on
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>
> which contains the bugfix for the dropped timer interval init.
>
> Thanks,
>
> tglx
>
>
>
>

I tested the latest patch series based on your tip tree. The basic logic
is correct as we expected :-).

But during the CPU online/offline test I found an issue: after *STORM*
mode is entered, the machine can't come back from *STORM* mode to
normal interrupt mode. At least the following issue exists: if a CPU
goes offline while *STORM* mode is active, *cmci_storm_on_cpus* can
never decrease to 0 because one count is stuck on the offlined CPU. So
we should detect this situation and decrement *cmci_storm_on_cpus* at
the proper time.
BTW, even if I bring the *CPU* back online in the above situation, the
normal CMCI still doesn't come back, which is strange.

I still have another question. When we handle the following case:
mce_cpu_callback(struct notifier_block *
mce_device_remove(cpu);
break;
case CPU_DOWN_PREPARE:
- del_timer_sync(t);
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
+ del_timer_sync(t);
break;

Where we add this timer back? I can't find it in "case CPU_ONLINE".

Subject: Re: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

On Thu, Jun 07, 2012 at 06:08:15PM +0800, Chen Gong wrote:
> But during the CPU online/offline test I found an issue. After *STORM*
> mode is entered, it can't come back from *STORM* mode to normal
> interrupt mode. At least there exists such an issue: when *STORM* is
> entered, in the meanwhile, one CPU is offline during this period,
> which means *cmci_storm_on_cpus* can't decrease to 0 because there is
> one bit stuck on this offlined CPU.

Stupid question: do you even want to allow offlining of CPUs while the
storm is raging? IOW, wouldn't it be better to disallow hotplug when the
storm happens so that all online CPUs, which is all CPUs on the system,
can work out the storm conditions faster so that the system gets back to
normal faster...

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-06-07 16:22:57

by Tony Luck

[permalink] [raw]
Subject: RE: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

> Stupid question: do you even want to allow offlining of CPUs while the
> storm is raging? IOW, wouldn't it be better to disallow hotplug when the
> storm happens so that all online CPUs, which is all CPUs on the system,
> can work out the storm conditions faster so that the system gets back to
> normal faster...

What if a processor is the source of the storm (we can get CMCI from
corrected errors in the cache)? Then our action to abate the storm would
be to take the cpu offline.

-Tony

2012-06-08 07:49:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

On Thu, 7 Jun 2012, Chen Gong wrote:
>
> But during the CPU online/offline test I found an issue. After *STORM*
> mode is entered, it can't come back from *STORM* mode to normal
> interrupt mode. At least there exists such an issue: when *STORM* is
> entered, in the meanwhile, one CPU is offline during this period,
> which means *cmci_storm_on_cpus* can't decrease to 0 because there
> is one bit stuck on this offlined CPU. So we should detect such
> situation and decrease on *cmci_storm_on_cpus* at proper time.

Yes, we need to reset the storm state as well I think.

> BTW, even I online the *CPU* in above situation, the normal CMCI
> still doesn't come back, strange.

That's weird.

> I still have another question: When we handle following case:
> mce_cpu_callback(struct notifier_block *
> mce_device_remove(cpu);
> break;
> case CPU_DOWN_PREPARE:
> - del_timer_sync(t);
> smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> + del_timer_sync(t);
> break;
>
> Where we add this timer back? I can't find it in "case CPU_ONLINE".

The timer gets added back via mcheck_cpu_init(), which is called on
the newly onlined cpu from smp_callin().

Thanks,

tglx

2012-06-11 05:46:31

by Chen, Gong

[permalink] [raw]
Subject: Re: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

On 2012/6/8 15:49, Thomas Gleixner wrote:
> On Thu, 7 Jun 2012, Chen Gong wrote:
>>
>> But during the CPU online/offline test I found an issue. After *STORM*
>> mode is entered, it can't come back from *STORM* mode to normal
>> interrupt mode. At least there exists such an issue: when *STORM* is
>> entered, in the meanwhile, one CPU is offline during this period,
>> which means *cmci_storm_on_cpus* can't decrease to 0 because there
>> is one bit stuck on this offlined CPU. So we should detect such
>> situation and decrease on *cmci_storm_on_cpus* at proper time.
>
> Yes, we need to reset the storm state as well I think.
>
>> BTW, even I online the *CPU* in above situation, the normal CMCI
>> still doesn't come back, strange.
>
> That's weird.

Here is the reason. When a CPU goes offline and then comes back online,
CMCI is re-enabled, which opens a backdoor for *CMCI storm detect* to
trigger again. The result is that the counter cmci_storm_cnt is
incremented again before it has decreased to zero. In the end this
counter never gets back to zero, whether or not the CPU comes online
again. Obviously, we must prevent CMCI from being re-enabled before the
CMCI storm ends, even across CPU online/offline. Besides this, we still
need to drop the count when a CPU is offlined. See the temporary patch
I wrote below (no guarantee it is OK :-))

diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h
b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 2cd73ce..6a05c1d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -31,9 +31,11 @@ extern struct mce_bank *mce_banks;
#ifdef CONFIG_X86_MCE_INTEL
unsigned long mce_intel_adjust_timer(unsigned long interval);
void mce_intel_cmci_poll(void);
+void mce_intel_hcpu_update(unsigned long cpu);
#else
# define mce_intel_adjust_timer mce_adjust_timer_default
static inline void mce_intel_cmci_poll(void) { }
+static inline void mce_intel_hcpu_update(unsigned long cpu) { }
#endif

void mce_timer_kick(unsigned long interval);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
b/arch/x86/kernel/cpu/mcheck/mce.c
index e3f8b94..5e22d99 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2306,6 +2306,7 @@ mce_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
+ mce_intel_hcpu_update(cpu);
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c
b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 92d8b5c..0051b2d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
+static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);

enum {
CMCI_STORM_NONE,
@@ -47,6 +48,11 @@ enum {
CMCI_STORM_SUBSIDED,
};

+enum {
+ CMCI_STORM_HCPU_NONE,
+ CMCI_STORM_HCPU_ACTIVE,
+};
+
static atomic_t cmci_storm_on_cpus;

static int cmci_supported(int *banks)
@@ -77,6 +83,17 @@ void mce_intel_cmci_poll(void)
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
}

+void mce_intel_hcpu_update(unsigned long cpu)
+{
+ unsigned long *status = &per_cpu(cmci_storm_hcpu_status, cpu);
+
+ if (*status == CMCI_STORM_HCPU_ACTIVE) {
+ *status = CMCI_STORM_HCPU_NONE;
+ atomic_dec(&cmci_storm_on_cpus);
+ per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+ }
+}
+
unsigned long mce_intel_adjust_timer(unsigned long interval)
{
if (interval < CMCI_POLL_INTERVAL)
@@ -132,6 +149,7 @@ static bool cmci_storm_detect(void)

cmci_clear();
__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_ACTIVE);
atomic_inc(&cmci_storm_on_cpus);
mce_timer_kick(CMCI_POLL_INTERVAL);
return true;


>
>> I still have another question: When we handle following case:
>> mce_cpu_callback(struct notifier_block *
>> mce_device_remove(cpu);
>> break;
>> case CPU_DOWN_PREPARE:
>> - del_timer_sync(t);
>> smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
>> + del_timer_sync(t);
>> break;
>>
>> Where we add this timer back? I can't find it in "case CPU_ONLINE".
>
> The timer gets added back via mcheck_cpu_init(), which is called on
> the newly onlined cpu from smp_callin().

Stupid me! I must have been too busy that day and lost my head. :-(
Thanks, Thomas, for your explanation!

>
> Thanks,
>
> tglx
>
>

2012-06-14 13:48:56

by Chen, Gong

[permalink] [raw]
Subject: [PATCH] tmp patch to fix hotplug issue in CMCI storm

This patch is based on the tip tree and the previous 5 patches.

Signed-off-by: Chen Gong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
arch/x86/kernel/cpu/mcheck/mce_intel.c | 49 ++++++++++++++++++++++++++++-
3 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 2cd73ce..6a05c1d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -31,9 +31,11 @@ extern struct mce_bank *mce_banks;
#ifdef CONFIG_X86_MCE_INTEL
unsigned long mce_intel_adjust_timer(unsigned long interval);
void mce_intel_cmci_poll(void);
+void mce_intel_hcpu_update(unsigned long cpu);
#else
# define mce_intel_adjust_timer mce_adjust_timer_default
static inline void mce_intel_cmci_poll(void) { }
+static inline void mce_intel_hcpu_update(unsigned long cpu) { }
#endif

void mce_timer_kick(unsigned long interval);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e3f8b94..5e22d99 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2306,6 +2306,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
+ mce_intel_hcpu_update(cpu);
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 92d8b5c..ef687df 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
+static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);

enum {
CMCI_STORM_NONE,
@@ -47,6 +48,12 @@ enum {
CMCI_STORM_SUBSIDED,
};

+enum {
+ CMCI_STORM_HCPU_NONE,
+ CMCI_STORM_HCPU_ACTIVE,
+ CMCI_STORM_HCPU_SUBSIDED,
+};
+
static atomic_t cmci_storm_on_cpus;

static int cmci_supported(int *banks)
@@ -77,6 +84,17 @@ void mce_intel_cmci_poll(void)
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
}

+void mce_intel_hcpu_update(unsigned long cpu)
+{
+ unsigned long *status = &per_cpu(cmci_storm_hcpu_status, cpu);
+
+ if (*status == CMCI_STORM_HCPU_ACTIVE) {
+ per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+ *status = CMCI_STORM_HCPU_SUBSIDED;
+ atomic_dec(&cmci_storm_on_cpus);
+ }
+}
+
unsigned long mce_intel_adjust_timer(unsigned long interval)
{
if (interval < CMCI_POLL_INTERVAL)
@@ -90,6 +108,7 @@ unsigned long mce_intel_adjust_timer(unsigned long interval)
* timer interval is back to our poll interval.
*/
__this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_NONE);
atomic_dec(&cmci_storm_on_cpus);

case CMCI_STORM_SUBSIDED:
@@ -109,6 +128,21 @@ unsigned long mce_intel_adjust_timer(unsigned long interval)
* We have shiny wheather, let the poll do whatever it
* thinks.
*/
+
+ /*
+ * If one CPU is offlined and onlined during CMCI storm, it
+ * has no chance to enable CMCI again. Here is the portal.
+ */
+ if ((!atomic_read(&cmci_storm_on_cpus)) &&
+ (CMCI_STORM_HCPU_SUBSIDED ==
+ __this_cpu_read(cmci_storm_hcpu_status))) {
+ __this_cpu_write(cmci_storm_hcpu_status,
+ CMCI_STORM_HCPU_NONE);
+ cmci_reenable();
+ apic_write(APIC_LVTCMCI,
+ THRESHOLD_APIC_VECTOR|APIC_DM_FIXED);
+ cmci_recheck();
+ }
return interval;
}
}
@@ -132,6 +166,7 @@ static bool cmci_storm_detect(void)

cmci_clear();
__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_ACTIVE);
atomic_inc(&cmci_storm_on_cpus);
mce_timer_kick(CMCI_POLL_INTERVAL);
return true;
@@ -259,7 +294,9 @@ void cmci_rediscover(int dying)
int cpu;
cpumask_var_t old;

- if (!cmci_supported(&banks))
+ if (!cmci_supported(&banks) ||
+ /* if still in CMCI storm, don't enable it */
+ (0 != atomic_read(&cmci_storm_on_cpus)))
return;
if (!alloc_cpumask_var(&old, GFP_KERNEL))
return;
@@ -297,6 +334,16 @@ static void intel_init_cmci(void)
return;

mce_threshold_vector = intel_threshold_interrupt;
+ /* if still in CMCI storm, don't enable it */
+ if (0 != atomic_read(&cmci_storm_on_cpus))
+ return;
+ /*
+ * If one CPU is offlined during CMCI storm and onlined after
+ * CMCI storm, this *hCPU* status must be updated to avoid
+ * to reenable CMCI twice.
+ */
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_NONE);
+
cmci_discover(banks, 1);
/*
* For CPU #0 this runs with still disabled APIC, but that's
--
1.7.1

2012-06-14 14:07:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm

On Thu, 14 Jun 2012, Chen Gong wrote:
> this patch is based on tip tree and previous 5 patches.

You really don't need all this complexity to handle that. The main
thing is that you clear the storm state and adjust the storm counter
when the cpu goes offline (in case the state is ACTIVE).

When it comes online again then you can simply let it restart cmci. If
the storm on this cpu (or node) still exists then it will notice and
everything falls in place.

Keep it simple, really!

Thanks,

tglx

2012-06-15 06:51:36

by Chen, Gong

[permalink] [raw]
Subject: Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm

On 2012/6/14 22:07, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Chen Gong wrote:
>> this patch is based on tip tree and previous 5 patches.
>
> You really don't need all this complexity to handle that. The main
> thing is that you clear the storm state and adjust the storm counter
> when the cpu goes offline (in case the state is ACTIVE).
>
> When it comes online again then you can simply let it restart cmci. If
> the storm on this cpu (or node) still exists then it will notice and
> everything falls in place.

I have tested several different scenarios. If the storm on this cpu
still exists, it triggers the CMCI and broadcasts it to the sibling
CPUs, which means the counter *cmci_storm_on_cpus* will increase beyond
its upper limit. E.g. on a 2-socket SandyBridge-EP system (one socket
has 8 cores and 16 threads), inject one error on one socket and you can
watch *cmci_storm_on_cpus* = 16 because of the CMCI broadcast. During
this time, offline and then online one CPU on this socket: first
*cmci_storm_on_cpus* = 15 because of the offline and the ACTIVE status,
and then *cmci_storm_on_cpus* = 31 because CMCI is activated again by
the online. That's why I have to disable CMCI during the whole
online/offline sequence until the CMCI storm has subsided. Frankly, the
logic is a little bit complex, so I wrote many comments to avoid
forgetting it after some time :-)

2012-06-15 09:56:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm

On Fri, 15 Jun 2012, Chen Gong wrote:
> On 2012/6/14 22:07, Thomas Gleixner wrote:
> > On Thu, 14 Jun 2012, Chen Gong wrote:
> > > this patch is based on tip tree and previous 5 patches.
> >
> > You really don't need all this complexity to handle that. The main
> > thing is that you clear the storm state and adjust the storm counter
> > when the cpu goes offline (in case the state is ACTIVE).
> >
> > When it comes online again then you can simply let it restart cmci. If
> > the storm on this cpu (or node) still exists then it will notice and
> > everything falls in place.
>
> I ever tested some different scenarios, if storm on this cpu still
> exists, it triggers the CMCI and broadcast it on the sibling CPU,
> which means the counter *cmci_storm_on_cpus* will increase beyond
> the upper limit. E.g. on a 2 sockets SandyBridge-EP system (one socket
> has 8 cores and 16 threads), inject one error on one socket, you can
> watch *cmci_storm_on_cpus* = 16 becuase of CMCI broadcast, during
> this time, offline and online one CPU on this socket, firstly
> *cmci_storm_on_cpus* = 15 because of offline and ACTIVE status, and then
> *cmci_storm_on_cpus* = 31 in that CMCI is actived because of
> online.That's why I have to disable CMCI during whole online/offline
> until CMCI storm is subsided. Frankly, the logic is a little bit
> complex so that I write many comments to avoid I forget it after some
> time :-)

This does not make any sense at all.

What you are saying is that even if CPU0 run cmci_clear() the CMCI
raised on CPU1 will cause the CMCI vector to be triggered on CPU0.

So how does the whole storm machinery work in the following case:

CPU0                    CPU1
cmci incoming           cmci incoming
storm detected          no storm detected yet
cmci_clear()
switch to poll

                        cmci raised

So according to your explanation that would cause the cmci vector to
be broadcasted to CPU0 as well. Now that would cause the counter to
get a bogus increment, right ?

So instead of hacking insane crap into the code, we have simply to do
the obvious Right Thing:

Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -119,6 +119,9 @@ static bool cmci_storm_detect(void)
unsigned long ts = __this_cpu_read(cmci_time_stamp);
unsigned long now = jiffies;

+ if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
+ return true;
+
if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
cnt++;
} else {

That will prevent damage under all circumstances, cpu hotplug
included.

But that's too simple and comprehensible I fear.

Thanks,

tglx

2012-06-18 06:42:40

by Chen, Gong

[permalink] [raw]
Subject: Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm

[...]
>
> So according to your explanation that would cause the cmci vector to
> be broadcasted to CPU0 as well. Now that would cause the counter to
> get a bogus increment, right ?
>
> So instead of hacking insane crap into the code, we have simply to do
> the obvious Right Thing:
>
> Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce_intel.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -119,6 +119,9 @@ static bool cmci_storm_detect(void)
> unsigned long ts = __this_cpu_read(cmci_time_stamp);
> unsigned long now = jiffies;
>
> + if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
> + return true;
> +
> if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
> cnt++;
> } else {
>
> That will prevent damage under all circumstances, cpu hotplug
> included.
>
> But that's too simple and comprehensible I fear.
>

Oh yes, your suggestion can simplify my logic, but not all of it.
Obviously, when hotplug happens it can be considered as quitting the
CMCI storm by another path, so the corresponding counter and status
should be decreased on that path as well. In my original patch I
defined three statuses:

enum {
CMCI_STORM_HCPU_NONE,
CMCI_STORM_HCPU_ACTIVE,
CMCI_STORM_HCPU_SUBSIDED,
};

I use CMCI_STORM_HCPU_SUBSIDED to describe a stalled CPU in hotplug
progress, to stop CMCI from being enabled during the whole hotplug
phase. But according to your suggestion, CMCI_STORM_HCPU_SUBSIDED can
now be removed (FIXME: because CMCI can be re-enabled, if a CPU goes
offline and then online again during a CMCI storm it will enter CMCI
storm status right away. This simplifies the logic, at the cost of a
little performance degradation).

I will send the 2nd patch based on previous 5 patches in a separate
mail.

2012-06-18 06:44:35

by Chen, Gong

[permalink] [raw]
Subject: [PATCH V2] tmp patch to fix hotplug issue in CMCI storm

This patch is based on the previous 5 patches and tuned according to
Thomas' suggestion.

Signed-off-by: Chen Gong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 ++
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
arch/x86/kernel/cpu/mcheck/mce_intel.c | 22 ++++++++++++++++++++++
3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 2cd73ce..6a05c1d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -31,9 +31,11 @@ extern struct mce_bank *mce_banks;
#ifdef CONFIG_X86_MCE_INTEL
unsigned long mce_intel_adjust_timer(unsigned long interval);
void mce_intel_cmci_poll(void);
+void mce_intel_hcpu_update(unsigned long cpu);
#else
# define mce_intel_adjust_timer mce_adjust_timer_default
static inline void mce_intel_cmci_poll(void) { }
+static inline void mce_intel_hcpu_update(unsigned long cpu) { }
#endif

void mce_timer_kick(unsigned long interval);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e3f8b94..5e22d99 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2306,6 +2306,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
+ mce_intel_hcpu_update(cpu);
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 92d8b5c..0493525 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
+static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);

enum {
CMCI_STORM_NONE,
@@ -47,6 +48,11 @@ enum {
CMCI_STORM_SUBSIDED,
};

+enum {
+ CMCI_STORM_HCPU_NONE,
+ CMCI_STORM_HCPU_ACTIVE,
+};
+
static atomic_t cmci_storm_on_cpus;

static int cmci_supported(int *banks)
@@ -77,6 +83,17 @@ void mce_intel_cmci_poll(void)
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
}

+void mce_intel_hcpu_update(unsigned long cpu)
+{
+ unsigned long *status = &per_cpu(cmci_storm_hcpu_status, cpu);
+
+ if (*status == CMCI_STORM_HCPU_ACTIVE) {
+ per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+ *status = CMCI_STORM_HCPU_NONE;
+ atomic_dec(&cmci_storm_on_cpus);
+ }
+}
+
unsigned long mce_intel_adjust_timer(unsigned long interval)
{
if (interval < CMCI_POLL_INTERVAL)
@@ -90,6 +107,7 @@ unsigned long mce_intel_adjust_timer(unsigned long interval)
* timer interval is back to our poll interval.
*/
__this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_NONE);
atomic_dec(&cmci_storm_on_cpus);

case CMCI_STORM_SUBSIDED:
@@ -119,6 +137,9 @@ static bool cmci_storm_detect(void)
unsigned long ts = __this_cpu_read(cmci_time_stamp);
unsigned long now = jiffies;

+ if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
+ return true;
+
if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
cnt++;
} else {
@@ -132,6 +153,7 @@ static bool cmci_storm_detect(void)

cmci_clear();
__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+ __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_ACTIVE);
atomic_inc(&cmci_storm_on_cpus);
mce_timer_kick(CMCI_POLL_INTERVAL);
return true;
--
1.7.1

2012-06-18 08:01:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH V2] tmp patch to fix hotplug issue in CMCI storm

On Mon, 18 Jun 2012, Chen Gong wrote:
> index 92d8b5c..0493525 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
> static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
> static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
> static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
> +static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);

Why do you insist on having another status variable, which does
actually nothing than obfuscate the code?

Look at the usage sites:

> __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
> + __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_NONE);

> __this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
> + __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_ACTIVE);

So it's a shadow variable of cmci_storm_state for no value.

And all you do with it is:

> +void mce_intel_hcpu_update(unsigned long cpu)
> +{
> + unsigned long *status = &per_cpu(cmci_storm_hcpu_status, cpu);
> +
> + if (*status == CMCI_STORM_HCPU_ACTIVE) {

This can be checked with the existing variable as well. And your check
leaves CMCI_STORM_SUBSIDED as a stale value around.

This simply wants to check

if (per_cpu(cmci_storm_state, cpu) == CMCI_STORM_ACTIVE)
atomic_dec(&cmci_storm_on_cpus);

and unconditionally clear the state

per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;

Right?

2012-06-18 10:13:45

by Chen, Gong

[permalink] [raw]
Subject: Re: [PATCH V2] tmp patch to fix hotplug issue in CMCI storm

On 2012/6/18 16:00, Thomas Gleixner wrote:
> On Mon, 18 Jun 2012, Chen Gong wrote:
>> index 92d8b5c..0493525 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
>> @@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
>> static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
>> static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
>> static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
>> +static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);
>
> Why do you insist on having another status variable, which does
> actually nothing than obfuscate the code?
>
> Look at the usage sites:
>
>> __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
>> + __this_cpu_write(cmci_storm_hcpu_status, CMCI_STORM_HCPU_NONE);

Because here I can't substitute CMCI_STORM_HCPU_NONE with
CMCI_STORM_SUBSIDED. If I do, the state tracking becomes chaotic.

2012-06-18 12:23:45

by Thomas Gleixner

Subject: Re: [PATCH V2] tmp patch to fix hotplug issue in CMCI storm

On Mon, 18 Jun 2012, Chen Gong wrote:
> On 2012/6/18 16:00, Thomas Gleixner wrote:
> > On Mon, 18 Jun 2012, Chen Gong wrote:
> > > index 92d8b5c..0493525 100644
> > > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > > @@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
> > > static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
> > > static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
> > > static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
> > > +static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);
> >
> > Why do you insist on having another status variable, which does
> > actually nothing than obfuscate the code?
> >
> > Look at the usage sites:
> >
> > > __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
> > > + __this_cpu_write(cmci_storm_hcpu_status,
> > > CMCI_STORM_HCPU_NONE);
>
> Because here I can't substitute CMCI_STORM_HCPU_NONE with
> CMCI_STORM_SUBSIDED. If so, the status is chaos.

Have you actually read what I wrote later? You do not need that extra
state, period.

2012-06-19 06:05:13

by Chen, Gong

Subject: Re: [PATCH V2] tmp patch to fix hotplug issue in CMCI storm

On 2012/6/18 20:23, Thomas Gleixner wrote:
> On Mon, 18 Jun 2012, Chen Gong wrote:
>> On 2012/6/18 16:00, Thomas Gleixner wrote:
>>> On Mon, 18 Jun 2012, Chen Gong wrote:
>>>> index 92d8b5c..0493525 100644
>>>> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
>>>> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
>>>> @@ -40,6 +40,7 @@ static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
>>>> static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
>>>> static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
>>>> static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
>>>> +static DEFINE_PER_CPU(unsigned long, cmci_storm_hcpu_status);
>>>
>>> Why do you insist on having another status variable, which does
>>> actually nothing than obfuscate the code?
>>>
>>> Look at the usage sites:
>>>
>>>> __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
>>>> + __this_cpu_write(cmci_storm_hcpu_status,
>>>> CMCI_STORM_HCPU_NONE);
>>
>> Because here I can't substitute CMCI_STORM_HCPU_NONE with
>> CMCI_STORM_SUBSIDED. If so, the status is chaos.
>
> Have you actually read what I wrote later? You do not need that extra
> state, period.
>

Oh, yes, you are right. I really didn't take enough time reading your
reply; I apologize for that. Unconditionally clearing the state avoids a
stale CMCI_STORM_SUBSIDED status. Your logic is fine and simplifies my
previous, unnecessarily complicated logic. Thanks a lot.

I will send the V3 tmp patch based on your previous 5 patches right now.
If it looks OK, you can merge it into your 5th patch, and I can write the
changelog for it if you like.

2012-06-19 06:08:08

by Chen, Gong

Subject: [PATCH V3] tmp patch to fix hotplug issue in CMCI storm

v3->v1
Thanks very much to Thomas for the suggestion to simplify the whole logic.

Signed-off-by: Chen Gong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 ++
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
arch/x86/kernel/cpu/mcheck/mce_intel.c | 11 +++++++++++
3 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 2cd73ce..6a05c1d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -31,9 +31,11 @@ extern struct mce_bank *mce_banks;
#ifdef CONFIG_X86_MCE_INTEL
unsigned long mce_intel_adjust_timer(unsigned long interval);
void mce_intel_cmci_poll(void);
+void mce_intel_hcpu_update(unsigned long cpu);
#else
# define mce_intel_adjust_timer mce_adjust_timer_default
static inline void mce_intel_cmci_poll(void) { }
+static inline void mce_intel_hcpu_update(unsigned long cpu) { }
#endif

void mce_timer_kick(unsigned long interval);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e3f8b94..5e22d99 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2306,6 +2306,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
+ mce_intel_hcpu_update(cpu);
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 92d8b5c..693bc7d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -77,6 +77,14 @@ void mce_intel_cmci_poll(void)
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
}

+void mce_intel_hcpu_update(unsigned long cpu)
+{
+ if (per_cpu(cmci_storm_state, cpu) == CMCI_STORM_ACTIVE)
+ atomic_dec(&cmci_storm_on_cpus);
+
+ per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+}
+
unsigned long mce_intel_adjust_timer(unsigned long interval)
{
if (interval < CMCI_POLL_INTERVAL)
@@ -119,6 +127,9 @@ static bool cmci_storm_detect(void)
unsigned long ts = __this_cpu_read(cmci_time_stamp);
unsigned long now = jiffies;

+ if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
+ return true;
+
if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
cnt++;
} else {
--
1.7.1
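
The offline-during-storm scenario from the earlier bug report can be replayed against this patch's logic in a small user-space simulation. Plain ints stand in for the kernel's per-cpu variables and atomic counter, and storm_enter()/storm_subside() are hypothetical stand-ins for the side effects of cmci_storm_detect() and the storm-subsided path.

```c
#include <assert.h>
#include <stdbool.h>

enum { CMCI_STORM_NONE, CMCI_STORM_ACTIVE, CMCI_STORM_SUBSIDED };

#define NR_CPUS 2
static int cmci_storm_state[NR_CPUS];
static int cmci_storm_on_cpus;

/* cmci_storm_detect() side effect: this CPU joins the storm. */
static void storm_enter(int cpu)
{
	cmci_storm_state[cpu] = CMCI_STORM_ACTIVE;
	cmci_storm_on_cpus++;
}

/* A still-online CPU leaves the storm when its rate drops. */
static void storm_subside(int cpu)
{
	cmci_storm_state[cpu] = CMCI_STORM_SUBSIDED;
	cmci_storm_on_cpus--;
}

/* The patch: called from mce_cpu_callback() when a CPU goes dead.
 * Without this, an ACTIVE CPU taken offline would keep the counter
 * above zero forever, which is exactly the stuck-in-storm-mode bug. */
static void mce_intel_hcpu_update(int cpu)
{
	if (cmci_storm_state[cpu] == CMCI_STORM_ACTIVE)
		cmci_storm_on_cpus--;
	cmci_storm_state[cpu] = CMCI_STORM_NONE;
}

/* Polling falls back to normal CMCI only once no CPU is stormy. */
static bool can_reenable_cmci(void)
{
	return cmci_storm_on_cpus == 0;
}
```

With the hotplug hook in place, offlining a stormy CPU drops its reference, so once the remaining CPUs subside the counter reaches zero and interrupt mode can come back.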