2011-06-17 08:38:03

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 0/8] x86, mce: misc fix/cleanups, cont

This patch set contains small patches that grew out of the
recent discussion about error recovery etc.
Nothing drastic, just small improvements.

(This set is based on today's tip/ras/core branch.)

Thanks,
H.Seto

arch/x86/include/asm/mce.h | 2 -
arch/x86/kernel/cpu/mcheck/mce.c | 340 ++++++++++++++++----------------
arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
3 files changed, 173 insertions(+), 171 deletions(-)

Hidetoshi Seto (8):
x86, mce: stop calling del_timer_sync() from interrupt
x86, mce: remove redundant mce_available() checks
x86, mce: introduce mce_timer_add()
x86, mce: rename bootparam parser
x86, mce: introduce mce_sysdev_init()
x86, mce: introduce mce_memory_failure_process()
x86, mce: rework use of TIF_MCE_NOTIFY
x86, mce, edac: call edac_mce_parse() once per a record


2011-06-17 08:41:15

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt

Function del_timer_sync() has WARN_ON(in_irq()) in it because
calling it from interrupt context can cause deadlock if it
interrupts the target timer running.

In MCE code, del_timer_sync() is used with on_each_cpu() in
some parts for sysfs files:
bank*, check_interval, cmci_disabled and ignore_ce.

However use of on_each_cpu() results in calling the function
passed as the argument in the interrupt context. It means you
can see a flood of warnings from del_timer_sync() by a simple
file access, for example:

echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Fortunately these MCE specific files are rare-used and AFAIK
only few MCE geeks experience this warning on write.

To remove the warning (for my happy hacking), move timer deletion
outside of the interrupt context ;-)

v2: update patch description

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 22 ++++++++++++++++------
1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..42fc8d2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1140,6 +1140,17 @@ static void mce_start_timer(unsigned long data)
add_timer_on(t, smp_processor_id());
}

+/* Must not be called from interrupt where del_timer_sync() can deadlock */
+static void mce_timer_delete_all(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ if (mce_available(&per_cpu(cpu_info, cpu)))
+ del_timer_sync(&per_cpu(mce_timer, cpu));
+ }
+}
+
static void mce_do_trigger(struct work_struct *work)
{
call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1750,7 +1761,6 @@ static struct syscore_ops mce_syscore_ops = {

static void mce_cpu_restart(void *data)
{
- del_timer_sync(&__get_cpu_var(mce_timer));
if (!mce_available(__this_cpu_ptr(&cpu_info)))
return;
__mcheck_cpu_init_generic();
@@ -1760,16 +1770,15 @@ static void mce_cpu_restart(void *data)
/* Reinit MCEs after user configuration changes */
static void mce_restart(void)
{
+ mce_timer_delete_all();
on_each_cpu(mce_cpu_restart, NULL, 1);
}

/* Toggle features for corrected errors */
-static void mce_disable_ce(void *all)
+static void mce_disable_cmci(void *data)
{
if (!mce_available(__this_cpu_ptr(&cpu_info)))
return;
- if (all)
- del_timer_sync(&__get_cpu_var(mce_timer));
cmci_clear();
}

@@ -1852,7 +1861,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
if (mce_ignore_ce ^ !!new) {
if (new) {
/* disable ce features */
- on_each_cpu(mce_disable_ce, (void *)1, 1);
+ mce_timer_delete_all();
+ on_each_cpu(mce_disable_cmci, NULL, 1);
mce_ignore_ce = 1;
} else {
/* enable ce features */
@@ -1875,7 +1885,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
if (mce_cmci_disabled ^ !!new) {
if (new) {
/* disable cmci */
- on_each_cpu(mce_disable_ce, NULL, 1);
+ on_each_cpu(mce_disable_cmci, NULL, 1);
mce_cmci_disabled = 1;
} else {
/* enable cmci */
--
1.7.1

2011-06-17 08:42:38

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 2/8] x86, mce: remove redundant mce_available() checks

I noticed that there is no need to check mce_available() on runtime.
In fact we don't support asymmetric configuration such as mixture of
varied processors which some are MCE capable while others are not.

Finally only 2 checks in init code path are enough for now.

When !mce_available():
- in mcheck_cpu_init():
- mce timer is not initialized
- cmci is not enabled
- and later in mce_init_device():
- sysfs files are not created
- hotplug notifier is not registered
So we don't need checks at all in these functions which is never
called when !mce_available().

For safety I also add a warning which is given when bad unsupported
asymmetric configuration is detected.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/include/asm/mce.h | 2 -
arch/x86/kernel/cpu/mcheck/mce.c | 34 ++++++++++---------------------
arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
3 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 716b48a..77825dd 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -180,8 +180,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
#endif

-int mce_available(struct cpuinfo_x86 *c);
-
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42fc8d2..205b334 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -449,7 +449,7 @@ static int mce_ring_add(unsigned long pfn)
return 0;
}

-int mce_available(struct cpuinfo_x86 *c)
+static int mce_available(struct cpuinfo_x86 *c)
{
if (mce_disabled)
return 0;
@@ -1121,10 +1121,7 @@ static void mce_start_timer(unsigned long data)

WARN_ON(smp_processor_id() != data);

- if (mce_available(__this_cpu_ptr(&cpu_info))) {
- machine_check_poll(MCP_TIMESTAMP,
- &__get_cpu_var(mce_poll_banks));
- }
+ machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_poll_banks));

/*
* Alert userspace if needed. If we logged an MCE, reduce the
@@ -1146,8 +1143,7 @@ static void mce_timer_delete_all(void)
int cpu;

for_each_online_cpu(cpu) {
- if (mce_available(&per_cpu(cpu_info, cpu)))
- del_timer_sync(&per_cpu(mce_timer, cpu));
+ del_timer_sync(&per_cpu(mce_timer, cpu));
}
}

@@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
if (__mcheck_cpu_ancient_init(c))
return;

- if (!mce_available(c))
+ if (!mce_available(c)) {
+ /*
+ * Asymmetric configurations are not supported today.
+ * If mce_banks is allocated there must be a cpu passed here.
+ */
+ WARN_ON(!mce_disabled && mce_banks);
+ mce_disabled = 1;
return;
+ }

if (__mcheck_cpu_cap_init() < 0 || __mcheck_cpu_apply_quirks(c) < 0) {
mce_disabled = 1;
@@ -1761,8 +1764,6 @@ static struct syscore_ops mce_syscore_ops = {

static void mce_cpu_restart(void *data)
{
- if (!mce_available(__this_cpu_ptr(&cpu_info)))
- return;
__mcheck_cpu_init_generic();
__mcheck_cpu_init_timer();
}
@@ -1777,15 +1778,11 @@ static void mce_restart(void)
/* Toggle features for corrected errors */
static void mce_disable_cmci(void *data)
{
- if (!mce_available(__this_cpu_ptr(&cpu_info)))
- return;
cmci_clear();
}

static void mce_enable_ce(void *all)
{
- if (!mce_available(__this_cpu_ptr(&cpu_info)))
- return;
cmci_reenable();
cmci_recheck();
if (all)
@@ -1946,9 +1943,6 @@ static __cpuinit int mce_sysdev_create(unsigned int cpu)
int err;
int i, j;

- if (!mce_available(&boot_cpu_data))
- return -EIO;
-
memset(&sysdev->kobj, 0, sizeof(struct kobject));
sysdev->id = cpu;
sysdev->cls = &mce_sysdev_class;
@@ -2006,9 +2000,6 @@ static void __cpuinit mce_disable_cpu(void *h)
unsigned long action = *(unsigned long *)h;
int i;

- if (!mce_available(__this_cpu_ptr(&cpu_info)))
- return;
-
if (!(action & CPU_TASKS_FROZEN))
cmci_clear();
for (i = 0; i < banks; i++) {
@@ -2024,9 +2015,6 @@ static void __cpuinit mce_reenable_cpu(void *h)
unsigned long action = *(unsigned long *)h;
int i;

- if (!mce_available(__this_cpu_ptr(&cpu_info)))
- return;
-
if (!(action & CPU_TASKS_FROZEN))
cmci_reenable();
for (i = 0; i < banks; i++) {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 8694ef5..393516c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -130,7 +130,7 @@ void cmci_recheck(void)
unsigned long flags;
int banks;

- if (!mce_available(__this_cpu_ptr(&cpu_info)) || !cmci_supported(&banks))
+ if (!cmci_supported(&banks))
return;
local_irq_save(flags);
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
--
1.7.1

2011-06-17 08:43:46

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 3/8] x86, mce: introduce mce_timer_add()

It is redundant to call setup_timer() every time the timer is going
to be added.

This patch breaks __mcheck_cpu_init_timer() down, moves the setup part
to the init code path, and constructs mce_timer_add() from the rest.
Since there is no strong reason to keep the interval only when coming
back from a hotplug event, this patch also kills duplicated code in the
hotplug notifier.

As a sideline, this patch renames mce_start_timer() to
mce_timer_run(), to group related functions under the mce_timer_ prefix.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 54 ++++++++++++++++++-------------------
1 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 205b334..c3dad64 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1114,7 +1114,7 @@ static int check_interval = 5 * 60; /* 5 minutes */
static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
static DEFINE_PER_CPU(struct timer_list, mce_timer);

-static void mce_start_timer(unsigned long data)
+static void mce_timer_run(unsigned long data)
{
struct timer_list *t = &per_cpu(mce_timer, data);
int *n;
@@ -1147,6 +1147,21 @@ static void mce_timer_delete_all(void)
}
}

+static void mce_timer_add(unsigned long cpu)
+{
+ struct timer_list *t = &per_cpu(mce_timer, cpu);
+ int *n = &per_cpu(mce_next_interval, cpu);
+
+ if (mce_ignore_ce || !check_interval)
+ return;
+
+ /* reset next interval */
+ *n = check_interval * HZ;
+
+ t->expires = round_jiffies(jiffies + *n);
+ add_timer_on(t, cpu);
+}
+
static void mce_do_trigger(struct work_struct *work)
{
call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1374,23 +1389,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
}
}

-static void __mcheck_cpu_init_timer(void)
-{
- struct timer_list *t = &__get_cpu_var(mce_timer);
- int *n = &__get_cpu_var(mce_next_interval);
-
- setup_timer(t, mce_start_timer, smp_processor_id());
-
- if (mce_ignore_ce)
- return;
-
- *n = check_interval * HZ;
- if (!*n)
- return;
- t->expires = round_jiffies(jiffies + *n);
- add_timer_on(t, smp_processor_id());
-}
-
/* Handle unconfigured int18 (should never happen) */
static void unexpected_machine_check(struct pt_regs *regs, long error_code)
{
@@ -1408,6 +1406,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
*/
void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
{
+ int cpu = smp_processor_id();
+
if (mce_disabled)
return;

@@ -1433,9 +1433,12 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)

__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
- __mcheck_cpu_init_timer();
+
+ setup_timer(&__get_cpu_var(mce_timer), mce_timer_run, cpu);
INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
+
+ mce_timer_add(cpu);
}

/*
@@ -1765,7 +1768,7 @@ static struct syscore_ops mce_syscore_ops = {
static void mce_cpu_restart(void *data)
{
__mcheck_cpu_init_generic();
- __mcheck_cpu_init_timer();
+ mce_timer_add(smp_processor_id());
}

/* Reinit MCEs after user configuration changes */
@@ -1786,7 +1789,7 @@ static void mce_enable_ce(void *all)
cmci_reenable();
cmci_recheck();
if (all)
- __mcheck_cpu_init_timer();
+ mce_timer_add(smp_processor_id());
}

static struct sysdev_class mce_sysdev_class = {
@@ -2030,7 +2033,6 @@ static int __cpuinit
mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;
- struct timer_list *t = &per_cpu(mce_timer, cpu);

switch (action) {
case CPU_ONLINE:
@@ -2047,16 +2049,12 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
- del_timer_sync(t);
+ del_timer_sync(&per_cpu(mce_timer, cpu));
smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
break;
case CPU_DOWN_FAILED:
case CPU_DOWN_FAILED_FROZEN:
- if (!mce_ignore_ce && check_interval) {
- t->expires = round_jiffies(jiffies +
- __get_cpu_var(mce_next_interval));
- add_timer_on(t, cpu);
- }
+ mce_timer_add(cpu);
smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
break;
case CPU_POST_DEAD:
--
1.7.1

2011-06-17 08:44:52

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 4/8] x86, mce: rename bootparam parser

Rename them with the comprehensible prefix mcheck_setup
(at least it looks better than the current misleading names),
and relocate them to put the setup code together.

Before:            After:
mcheck_enable      mcheck_setup
mcheck_disable     mcheck_setup_old

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 91 +++++++++++++++++++-------------------
1 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c3dad64..ad0e9fb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1662,50 +1662,6 @@ static struct miscdevice mce_chrdev_device = {
&mce_chrdev_ops,
};

-/*
- * mce=off Disables machine check
- * mce=no_cmci Disables CMCI
- * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
- * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
- * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
- * monarchtimeout is how long to wait for other CPUs on machine
- * check, or 0 to not wait
- * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
- * mce=nobootlog Don't log MCEs from before booting.
- */
-static int __init mcheck_enable(char *str)
-{
- if (*str == 0) {
- enable_p5_mce();
- return 1;
- }
- if (*str == '=')
- str++;
- if (!strcmp(str, "off"))
- mce_disabled = 1;
- else if (!strcmp(str, "no_cmci"))
- mce_cmci_disabled = 1;
- else if (!strcmp(str, "dont_log_ce"))
- mce_dont_log_ce = 1;
- else if (!strcmp(str, "ignore_ce"))
- mce_ignore_ce = 1;
- else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
- mce_bootlog = (str[0] == 'b');
- else if (isdigit(str[0])) {
- get_option(&str, &tolerant);
- if (*str == ',') {
- ++str;
- get_option(&str, &monarch_timeout);
- }
- } else {
- printk(KERN_INFO "mce argument %s ignored. Please use /sys\n",
- str);
- return 0;
- }
- return 1;
-}
-__setup("mce", mcheck_enable);
-
int __init mcheck_init(void)
{
mcheck_intel_therm_init();
@@ -2120,14 +2076,57 @@ static __init int mcheck_init_device(void)
device_initcall(mcheck_init_device);

/*
+ * mce=off Disables machine check
+ * mce=no_cmci Disables CMCI
+ * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
+ * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
+ * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
+ * monarchtimeout is how long to wait for other CPUs on machine
+ * check, or 0 to not wait
+ * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
+ * mce=nobootlog Don't log MCEs from before booting.
+ */
+static int __init mcheck_setup(char *str)
+{
+ if (*str == 0) {
+ enable_p5_mce();
+ return 1;
+ }
+ if (*str == '=')
+ str++;
+ if (!strcmp(str, "off"))
+ mce_disabled = 1;
+ else if (!strcmp(str, "no_cmci"))
+ mce_cmci_disabled = 1;
+ else if (!strcmp(str, "dont_log_ce"))
+ mce_dont_log_ce = 1;
+ else if (!strcmp(str, "ignore_ce"))
+ mce_ignore_ce = 1;
+ else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
+ mce_bootlog = (str[0] == 'b');
+ else if (isdigit(str[0])) {
+ get_option(&str, &tolerant);
+ if (*str == ',') {
+ ++str;
+ get_option(&str, &monarch_timeout);
+ }
+ } else {
+ pr_info("mce argument %s ignored. Please use /sys\n", str);
+ return 0;
+ }
+ return 1;
+}
+__setup("mce", mcheck_setup);
+
+/*
* Old style boot options parsing. Only for compatibility.
*/
-static int __init mcheck_disable(char *str)
+static int __init mcheck_setup_old(char *str)
{
mce_disabled = 1;
return 1;
}
-__setup("nomce", mcheck_disable);
+__setup("nomce", mcheck_setup_old);

#ifdef CONFIG_DEBUG_FS
struct dentry *mce_get_debugfs_dir(void)
--
1.7.1

2011-06-17 08:46:20

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 5/8] x86, mce: introduce mce_sysdev_init()

As one of the mce_sysdev_ family, consolidate the code that
initializes the sysfs hierarchy into a single function here.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 65 ++++++++++++++++++++-----------------
1 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ad0e9fb..0424299 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
cpumask_clear_cpu(cpu, mce_sysdev_initialized);
}

+static __init int mce_sysdev_init(void)
+{
+ int err, i;
+
+ zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
+
+ /* init attributes for each mce_banks */
+ for (i = 0; i < banks; i++) {
+ struct mce_bank *b = &mce_banks[i];
+ struct sysdev_attribute *a = &b->attr;
+
+ sysfs_attr_init(&a->attr);
+ a->attr.name = b->attrname;
+ snprintf(b->attrname, ATTR_LEN, "bank%d", i);
+
+ a->attr.mode = 0644;
+ a->show = show_bank;
+ a->store = set_bank;
+ }
+
+ err = sysdev_class_register(&mce_sysdev_class);
+ if (err)
+ return err;
+
+ for_each_online_cpu(i) {
+ err = mce_sysdev_create(i);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
/* Make sure there are no machine checks on offlined CPUs. */
static void __cpuinit mce_disable_cpu(void *h)
{
@@ -2025,46 +2058,18 @@ static struct notifier_block mce_cpu_notifier __cpuinitdata = {
.notifier_call = mce_cpu_callback,
};

-static __init void mce_init_banks(void)
-{
- int i;
-
- for (i = 0; i < banks; i++) {
- struct mce_bank *b = &mce_banks[i];
- struct sysdev_attribute *a = &b->attr;
-
- sysfs_attr_init(&a->attr);
- a->attr.name = b->attrname;
- snprintf(b->attrname, ATTR_LEN, "bank%d", i);
-
- a->attr.mode = 0644;
- a->show = show_bank;
- a->store = set_bank;
- }
-}
-
static __init int mcheck_init_device(void)
{
int err;
- int i = 0;

if (!mce_available(&boot_cpu_data))
return -EIO;

- zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
-
- mce_init_banks();
-
- err = sysdev_class_register(&mce_sysdev_class);
+ /* register sysfs interface */
+ err = mce_sysdev_init();
if (err)
return err;

- for_each_online_cpu(i) {
- err = mce_sysdev_create(i);
- if (err)
- return err;
- }
-
register_syscore_ops(&mce_syscore_ops);
register_hotcpu_notifier(&mce_cpu_notifier);

--
1.7.1

2011-06-17 08:47:25

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 6/8] x86, mce: introduce mce_memory_failure_process()

Also relocate related functions to be near the mce_ring* routines.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 26 +++++++++++++++++---------
1 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0424299..a118496 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -395,6 +395,8 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
}

/*
+ * mce_ring, mce_memory_failure: Support for Memory errors
+ *
* Simple lockless ring to communicate PFNs from the exception handler with the
* process context work function. This is vastly simplified because there's
* only a single reader and a single writer.
@@ -449,6 +451,20 @@ static int mce_ring_add(unsigned long pfn)
return 0;
}

+/* dummy to break dependency. actual code is in mm/memory-failure.c */
+void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
+{
+ pr_err("Action optional memory failure at %lx ignored\n", pfn);
+}
+
+static inline void mce_memory_failure_process(void)
+{
+ unsigned long pfn;
+
+ while (mce_ring_get(&pfn))
+ memory_failure(pfn, MCE_VECTOR);
+}
+
static int mce_available(struct cpuinfo_x86 *c)
{
if (mce_disabled)
@@ -1049,12 +1065,6 @@ out:
}
EXPORT_SYMBOL_GPL(do_machine_check);

-/* dummy to break dependency. actual code is in mm/memory-failure.c */
-void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
-{
- printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
-}
-
/*
* Called after mce notification in process context. This code
* is allowed to sleep. Call the high level VM handler to process
@@ -1068,10 +1078,8 @@ void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
*/
void mce_notify_process(void)
{
- unsigned long pfn;
mce_notify_irq();
- while (mce_ring_get(&pfn))
- memory_failure(pfn, MCE_VECTOR);
+ mce_memory_failure_process();
}

static void mce_process_work(struct work_struct *dummy)
--
1.7.1

2011-06-17 08:49:48

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY

The basic flow of the MCE handler is summarized as follows:
1) from NMI context:
check hardware error registers, determine error severity,
and then either panic or request follow-up in non-NMI context
via irq_work() so the system can continue.
2) from (irq) context:
call non-NMI safe functions,
wake up loggers and schedule work if required
3) from worker thread:
process time-consuming work such as memory poisoning.

The TIF_MCE_NOTIFY flag is relatively legacy and has been used to do the
tasks of 2) and 3) in the context of the thread interrupted by the MCE.
However, the use of irq_work() and a workqueue is now enough for these
tasks, so this patch removes the duplicated tasks from
mce_notify_process().

As a result there is no task left to be done in the interrupted context,
but once SRAR is supported there will be some thread-specific work for
action-required events. So keep the flag (even if it will be removed
soon) for such possible future use, until a better mechanism is
introduced.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 28 ++++++++++++----------------
1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a118496..bc8a02c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1053,8 +1053,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
if (kill_it && tolerant < 3)
force_sig(SIGBUS, current);

- /* notify userspace ASAP */
- set_thread_flag(TIF_MCE_NOTIFY);
+ /* Trap this thread before returning to user, for action required */
+ if (worst == MCE_AR_SEVERITY)
+ set_thread_flag(TIF_MCE_NOTIFY);

if (worst > 0)
mce_report_event(regs);
@@ -1066,25 +1067,22 @@ out:
EXPORT_SYMBOL_GPL(do_machine_check);

/*
- * Called after mce notification in process context. This code
- * is allowed to sleep. Call the high level VM handler to process
- * any corrupted pages.
- * Assume that the work queue code only calls this one at a time
- * per CPU.
- * Note we don't disable preemption, so this code might run on the wrong
- * CPU. In this case the event is picked up by the scheduled work queue.
- * This is merely a fast path to expedite processing in some common
- * cases.
+ * Called in the context of a process that was interrupted by an MCE
+ * and marked with TIF_MCE_NOTIFY, just before returning to erroneous userland.
+ * This code is allowed to sleep.
+ * Attempt possible recovery such as calling the high level VM handler to
+ * process any corrupted pages, and kill/signal current process if required.
*/
void mce_notify_process(void)
{
- mce_notify_irq();
- mce_memory_failure_process();
+ clear_thread_flag(TIF_MCE_NOTIFY);
+
+ /* TBD: do recovery for action required event */
}

static void mce_process_work(struct work_struct *dummy)
{
- mce_notify_process();
+ mce_memory_failure_process();
}

#ifdef CONFIG_X86_MCE_INTEL
@@ -1187,8 +1185,6 @@ int mce_notify_irq(void)
/* Not more than two messages every minute */
static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);

- clear_thread_flag(TIF_MCE_NOTIFY);
-
if (test_and_clear_bit(0, &mce_need_notify)) {
/* wake processes polling /dev/mcelog */
wake_up_interruptible(&mce_chrdev_wait);
--
1.7.1

2011-06-17 08:51:21

by Hidetoshi Seto

[permalink] [raw]
Subject: [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record

There is no reason to keep it inside the cmpxchg loop.
(The loop continues until mcelog acquires a buffer slot to save the
record.)

Also use a do-while for the loop here.

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bc8a02c..f1d8ce1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -148,21 +148,21 @@ void mce_log(struct mce *mce)
/* Emit the trace record: */
trace_mce_record(mce);

+ /*
+ * If edac_mce is enabled, it will check the error type and will
+ * process it, if it is a known error.
+ * Otherwise, the error will be sent through mcelog interface.
+ */
+ if (edac_mce_parse(mce))
+ return;
+
mce->finished = 0;
wmb();
- for (;;) {
+
+ do {
entry = rcu_dereference_check_mce(mcelog.next);
for (;;) {
/*
- * If edac_mce is enabled, it will check the error type
- * and will process it, if it is a known error.
- * Otherwise, the error will be sent through mcelog
- * interface
- */
- if (edac_mce_parse(mce))
- return;
-
- /*
* When the buffer fills up discard new entries.
* Assume that the earlier errors are the more
* interesting ones:
@@ -181,10 +181,10 @@ void mce_log(struct mce *mce)
}
smp_rmb();
next = entry + 1;
- if (cmpxchg(&mcelog.next, entry, next) == entry)
- break;
- }
+ } while (cmpxchg(&mcelog.next, entry, next) != entry);
+
memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+
wmb();
mcelog.entry[entry].finished = 1;
wmb();
--
1.7.1

2011-06-17 13:56:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt

On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
> Function del_timer_sync() has WARN_ON(in_irq()) in it because
> calling it from interrupt context can cause deadlock if it
> interrupts the target timer running.

No need to explain the del_timer_sync() code here - just say that it's
not allowed to call it from an IRQ context.

> In MCE code, del_timer_sync() is used with on_each_cpu() in
> some parts for sysfs files:

... for the following sysfs files:

> bank*, check_interval, cmci_disabled and ignore_ce.
>
> However use of on_each_cpu() results in calling the function
> passed as the argument in the interrupt context. It means you
> can see a flood of warnings from del_timer_sync() by a simple
> file access, for example:
>
> echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Good.

>
> Fortunately these MCE specific files are rare-used and AFAIK

rarely used

> only few MCE geeks experience this warning on write.

MCE geeks ??? I wonder who those are :-)

> To remove the warning (for my happy hacking), move timer deletion
> outside of the interrupt context ;-)
>
> v2: update patch description
>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 22 ++++++++++++++++------
> 1 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 08363b0..42fc8d2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1140,6 +1140,17 @@ static void mce_start_timer(unsigned long data)
> add_timer_on(t, smp_processor_id());
> }
>
> +/* Must not be called from interrupt where del_timer_sync() can deadlock */
> +static void mce_timer_delete_all(void)
> +{
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + if (mce_available(&per_cpu(cpu_info, cpu)))
> + del_timer_sync(&per_cpu(mce_timer, cpu));
> + }
> +}

You're adding the mce_available(..) check just to remove it in the next
patch. Since all those sysfs nodes are behind such a check, there's no
need for it here too.

> static void mce_do_trigger(struct work_struct *work)
> {
> call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
> @@ -1750,7 +1761,6 @@ static struct syscore_ops mce_syscore_ops = {
>
> static void mce_cpu_restart(void *data)
> {
> - del_timer_sync(&__get_cpu_var(mce_timer));
> if (!mce_available(__this_cpu_ptr(&cpu_info)))
> return;
> __mcheck_cpu_init_generic();
> @@ -1760,16 +1770,15 @@ static void mce_cpu_restart(void *data)
> /* Reinit MCEs after user configuration changes */
> static void mce_restart(void)
> {
> + mce_timer_delete_all();
> on_each_cpu(mce_cpu_restart, NULL, 1);
> }
>
> /* Toggle features for corrected errors */
> -static void mce_disable_ce(void *all)
> +static void mce_disable_cmci(void *data)
> {
> if (!mce_available(__this_cpu_ptr(&cpu_info)))
> return;
> - if (all)
> - del_timer_sync(&__get_cpu_var(mce_timer));
> cmci_clear();
> }
>
> @@ -1852,7 +1861,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
> if (mce_ignore_ce ^ !!new) {
> if (new) {
> /* disable ce features */
> - on_each_cpu(mce_disable_ce, (void *)1, 1);
> + mce_timer_delete_all();
> + on_each_cpu(mce_disable_cmci, NULL, 1);
> mce_ignore_ce = 1;
> } else {
> /* enable ce features */
> @@ -1875,7 +1885,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
> if (mce_cmci_disabled ^ !!new) {
> if (new) {
> /* disable cmci */
> - on_each_cpu(mce_disable_ce, NULL, 1);
> + on_each_cpu(mce_disable_cmci, NULL, 1);
> mce_cmci_disabled = 1;
> } else {
> /* enable cmci */
> --
> 1.7.1

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2011-06-17 14:40:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks

On Fri, Jun 17, 2011 at 04:42:01AM -0400, Hidetoshi Seto wrote:
> I noticed that there is no need to check mce_available() on runtime.
> In fact we don't support asymmetric configuration such as mixture of
> varied processors which some are MCE capable while others are not.

simplify the above sentence please.

>
> Finally only 2 checks in init code path are enough for now.
>
> When !mce_available():
> - in mcheck_cpu_init():
> - mce timer is not initialized
... in __mcheck_cpu_init_timer()
> - cmci is not enabled
> - and later in mce_init_device():
> - sysfs files are not created
> - hotplug notifier is not registered
> So we don't need checks at all in these functions which is never

s/which is/because they are/

> called when !mce_available().
>
> For safety I also add a warning which is given when bad unsupported
> asymmetric configuration is detected.
>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/include/asm/mce.h | 2 -
> arch/x86/kernel/cpu/mcheck/mce.c | 34 ++++++++++---------------------
> arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
> 3 files changed, 12 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 716b48a..77825dd 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -180,8 +180,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
> static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
> #endif
>
> -int mce_available(struct cpuinfo_x86 *c);
> -
> DECLARE_PER_CPU(unsigned, mce_exception_count);
> DECLARE_PER_CPU(unsigned, mce_poll_count);
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42fc8d2..205b334 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -449,7 +449,7 @@ static int mce_ring_add(unsigned long pfn)
> return 0;
> }
>
> -int mce_available(struct cpuinfo_x86 *c)
> +static int mce_available(struct cpuinfo_x86 *c)
> {
> if (mce_disabled)
> return 0;
> @@ -1121,10 +1121,7 @@ static void mce_start_timer(unsigned long data)
>
> WARN_ON(smp_processor_id() != data);
>
> - if (mce_available(__this_cpu_ptr(&cpu_info))) {
> - machine_check_poll(MCP_TIMESTAMP,
> - &__get_cpu_var(mce_poll_banks));
> - }
> + machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_poll_banks));
>
> /*
> * Alert userspace if needed. If we logged an MCE, reduce the
> @@ -1146,8 +1143,7 @@ static void mce_timer_delete_all(void)
> int cpu;
>
> for_each_online_cpu(cpu) {
> - if (mce_available(&per_cpu(cpu_info, cpu)))
> - del_timer_sync(&per_cpu(mce_timer, cpu));
> + del_timer_sync(&per_cpu(mce_timer, cpu));
> }

no need for the braces, but this should be part of the first patch
anyway.

> }
>
> @@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
> if (__mcheck_cpu_ancient_init(c))
> return;
>
> - if (!mce_available(c))
> + if (!mce_available(c)) {
> + /*
> + * Asymmetric configurations are not supported today.
> + * If mce_banks is allocated there must be a cpu passed here.
> + */
> + WARN_ON(!mce_disabled && mce_banks);
> + mce_disabled = 1;
> return;
> + }

I don't think this will ever happen so why complicate the code
needlessly? This can only be executed if at least one of the cores on
the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
silly.

Besides, mcheck_init_device() already confirms we don't support
MCE-asymmetric configs:

if (!mce_available(&boot_cpu_data))
return -EIO;

> if (__mcheck_cpu_cap_init() < 0 || __mcheck_cpu_apply_quirks(c) < 0) {
> mce_disabled = 1;
> @@ -1761,8 +1764,6 @@ static struct syscore_ops mce_syscore_ops = {
>
> static void mce_cpu_restart(void *data)
> {
> - if (!mce_available(__this_cpu_ptr(&cpu_info)))
> - return;
> __mcheck_cpu_init_generic();
> __mcheck_cpu_init_timer();
> }
> @@ -1777,15 +1778,11 @@ static void mce_restart(void)
> /* Toggle features for corrected errors */
> static void mce_disable_cmci(void *data)
> {
> - if (!mce_available(__this_cpu_ptr(&cpu_info)))
> - return;
> cmci_clear();
> }
>
> static void mce_enable_ce(void *all)
> {
> - if (!mce_available(__this_cpu_ptr(&cpu_info)))
> - return;
> cmci_reenable();
> cmci_recheck();
> if (all)
> @@ -1946,9 +1943,6 @@ static __cpuinit int mce_sysdev_create(unsigned int cpu)
> int err;
> int i, j;
>
> - if (!mce_available(&boot_cpu_data))
> - return -EIO;
> -
> memset(&sysdev->kobj, 0, sizeof(struct kobject));
> sysdev->id = cpu;
> sysdev->cls = &mce_sysdev_class;
> @@ -2006,9 +2000,6 @@ static void __cpuinit mce_disable_cpu(void *h)
> unsigned long action = *(unsigned long *)h;
> int i;
>
> - if (!mce_available(__this_cpu_ptr(&cpu_info)))
> - return;
> -
> if (!(action & CPU_TASKS_FROZEN))
> cmci_clear();
> for (i = 0; i < banks; i++) {
> @@ -2024,9 +2015,6 @@ static void __cpuinit mce_reenable_cpu(void *h)
> unsigned long action = *(unsigned long *)h;
> int i;
>
> - if (!mce_available(__this_cpu_ptr(&cpu_info)))
> - return;
> -
> if (!(action & CPU_TASKS_FROZEN))
> cmci_reenable();
> for (i = 0; i < banks; i++) {
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index 8694ef5..393516c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -130,7 +130,7 @@ void cmci_recheck(void)
> unsigned long flags;
> int banks;
>
> - if (!mce_available(__this_cpu_ptr(&cpu_info)) || !cmci_supported(&banks))
> + if (!cmci_supported(&banks))
> return;
> local_irq_save(flags);
> machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
> --
> 1.7.1
>
>
>

--
Regards/Gruss,
Boris.


2011-06-17 15:11:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 3/8] x86, mce: introduce mce_timer_add()

On Fri, Jun 17, 2011 at 04:43:09AM -0400, Hidetoshi Seto wrote:
> It is redundant to call setup_timer() every time the timer is going to
> be added.
>
> This patch breaks __mcheck_cpu_init_timer() down, moves the setup part to
> the init code path and builds mce_timer_add() from the rest. Since there is
> no strong reason to keep the interval only when coming back from a hotplug
> event, this patch also kills duplicated code in the hotplug notifier.
>
> As a side change, this patch renames mce_start_timer() to
> mce_timer_run(), to group related functions under the mce_timer_ prefix.
>
> Signed-off-by: Hidetoshi Seto <[email protected]>

Reviewed-by: Borislav Petkov <[email protected]>

> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 54 ++++++++++++++++++-------------------
> 1 files changed, 26 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 205b334..c3dad64 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1114,7 +1114,7 @@ static int check_interval = 5 * 60; /* 5 minutes */
> static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
> static DEFINE_PER_CPU(struct timer_list, mce_timer);
>
> -static void mce_start_timer(unsigned long data)
> +static void mce_timer_run(unsigned long data)
> {
> struct timer_list *t = &per_cpu(mce_timer, data);
> int *n;
> @@ -1147,6 +1147,21 @@ static void mce_timer_delete_all(void)
> }
> }
>
> +static void mce_timer_add(unsigned long cpu)
> +{
> + struct timer_list *t = &per_cpu(mce_timer, cpu);
> + int *n = &per_cpu(mce_next_interval, cpu);
> +
> + if (mce_ignore_ce || !check_interval)
> + return;
> +
> + /* reset next interval */
> + *n = check_interval * HZ;
> +
> + t->expires = round_jiffies(jiffies + *n);
> + add_timer_on(t, cpu);
> +}
> +
> static void mce_do_trigger(struct work_struct *work)
> {
> call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
> @@ -1374,23 +1389,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
> }
> }
>
> -static void __mcheck_cpu_init_timer(void)
> -{
> - struct timer_list *t = &__get_cpu_var(mce_timer);
> - int *n = &__get_cpu_var(mce_next_interval);
> -
> - setup_timer(t, mce_start_timer, smp_processor_id());
> -
> - if (mce_ignore_ce)
> - return;
> -
> - *n = check_interval * HZ;
> - if (!*n)
> - return;
> - t->expires = round_jiffies(jiffies + *n);
> - add_timer_on(t, smp_processor_id());
> -}
> -
> /* Handle unconfigured int18 (should never happen) */
> static void unexpected_machine_check(struct pt_regs *regs, long error_code)
> {
> @@ -1408,6 +1406,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
> */
> void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
> {
> + int cpu = smp_processor_id();
> +
> if (mce_disabled)
> return;
>
> @@ -1433,9 +1433,12 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>
> __mcheck_cpu_init_generic();
> __mcheck_cpu_init_vendor(c);
> - __mcheck_cpu_init_timer();
> +
> + setup_timer(&__get_cpu_var(mce_timer), mce_timer_run, cpu);
> INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
> init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
> +
> + mce_timer_add(cpu);
> }
>
> /*
> @@ -1765,7 +1768,7 @@ static struct syscore_ops mce_syscore_ops = {
> static void mce_cpu_restart(void *data)
> {
> __mcheck_cpu_init_generic();
> - __mcheck_cpu_init_timer();
> + mce_timer_add(smp_processor_id());
> }
>
> /* Reinit MCEs after user configuration changes */
> @@ -1786,7 +1789,7 @@ static void mce_enable_ce(void *all)
> cmci_reenable();
> cmci_recheck();
> if (all)
> - __mcheck_cpu_init_timer();
> + mce_timer_add(smp_processor_id());
> }
>
> static struct sysdev_class mce_sysdev_class = {
> @@ -2030,7 +2033,6 @@ static int __cpuinit
> mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> unsigned int cpu = (unsigned long)hcpu;
> - struct timer_list *t = &per_cpu(mce_timer, cpu);
>
> switch (action) {
> case CPU_ONLINE:
> @@ -2047,16 +2049,12 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> - del_timer_sync(t);
> + del_timer_sync(&per_cpu(mce_timer, cpu));
> smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> break;
> case CPU_DOWN_FAILED:
> case CPU_DOWN_FAILED_FROZEN:
> - if (!mce_ignore_ce && check_interval) {
> - t->expires = round_jiffies(jiffies +
> - __get_cpu_var(mce_next_interval));
> - add_timer_on(t, cpu);
> - }
> + mce_timer_add(cpu);
> smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> break;
> case CPU_POST_DEAD:
> --
> 1.7.1
>
>
>

--
Regards/Gruss,
Boris.


2011-06-17 15:42:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 4/8] x86, mce: rename bootparam parser

On Fri, Jun 17, 2011 at 04:44:18AM -0400, Hidetoshi Seto wrote:
> Rename them with the comprehensible prefix mcheck_setup.

The relocation is causing unneeded churn for no apparent reason.

> (at least it looks better than the current misleading names)
> And relocate them to put the setup code together.
>
> Before: After:
> mcheck_enable mcheck_setup

Nah, let's call it mcheck_parse_boot_param...

> mcheck_disable mcheck_setup_old

and leave this like this. "nomce" is the same as "mce=off" and frankly,
I'd like to remove this redundancy, thus no need to do the code
relocation. In addition, I don't think there are lots of systems running
with "nomce" so I really think we should drop it.

So Ingo, hpa, what is the proper way to remove early setup params? Maybe
through Documentation/feature-removal-schedule.txt?

>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 91 +++++++++++++++++++-------------------
> 1 files changed, 45 insertions(+), 46 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index c3dad64..ad0e9fb 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1662,50 +1662,6 @@ static struct miscdevice mce_chrdev_device = {
> &mce_chrdev_ops,
> };
>
> -/*
> - * mce=off Disables machine check
> - * mce=no_cmci Disables CMCI
> - * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
> - * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
> - * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
> - * monarchtimeout is how long to wait for other CPUs on machine
> - * check, or 0 to not wait
> - * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.

Please remove "Disabled by default on AMD." while you're at it, since
this is not true anymore.

> - * mce=nobootlog Don't log MCEs from before booting.
> - */
> -static int __init mcheck_enable(char *str)
> -{
> - if (*str == 0) {
> - enable_p5_mce();
> - return 1;
> - }
> - if (*str == '=')
> - str++;
> - if (!strcmp(str, "off"))
> - mce_disabled = 1;
> - else if (!strcmp(str, "no_cmci"))
> - mce_cmci_disabled = 1;
> - else if (!strcmp(str, "dont_log_ce"))
> - mce_dont_log_ce = 1;
> - else if (!strcmp(str, "ignore_ce"))
> - mce_ignore_ce = 1;
> - else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
> - mce_bootlog = (str[0] == 'b');
> - else if (isdigit(str[0])) {
> - get_option(&str, &tolerant);
> - if (*str == ',') {
> - ++str;
> - get_option(&str, &monarch_timeout);
> - }
> - } else {
> - printk(KERN_INFO "mce argument %s ignored. Please use /sys\n",
> - str);
> - return 0;
> - }
> - return 1;
> -}
> -__setup("mce", mcheck_enable);
> -
> int __init mcheck_init(void)
> {
> mcheck_intel_therm_init();
> @@ -2120,14 +2076,57 @@ static __init int mcheck_init_device(void)
> device_initcall(mcheck_init_device);
>
> /*
> + * mce=off Disables machine check
> + * mce=no_cmci Disables CMCI
> + * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
> + * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
> + * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
> + * monarchtimeout is how long to wait for other CPUs on machine
> + * check, or 0 to not wait
> + * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
> + * mce=nobootlog Don't log MCEs from before booting.
> + */
> +static int __init mcheck_setup(char *str)
> +{
> + if (*str == 0) {
> + enable_p5_mce();
> + return 1;
> + }
> + if (*str == '=')
> + str++;
> + if (!strcmp(str, "off"))
> + mce_disabled = 1;
> + else if (!strcmp(str, "no_cmci"))
> + mce_cmci_disabled = 1;
> + else if (!strcmp(str, "dont_log_ce"))
> + mce_dont_log_ce = 1;
> + else if (!strcmp(str, "ignore_ce"))
> + mce_ignore_ce = 1;
> + else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
> + mce_bootlog = (str[0] == 'b');
> + else if (isdigit(str[0])) {
> + get_option(&str, &tolerant);
> + if (*str == ',') {
> + ++str;
> + get_option(&str, &monarch_timeout);
> + }
> + } else {
> + pr_info("mce argument %s ignored. Please use /sys\n", str);
> + return 0;
> + }
> + return 1;
> +}
> +__setup("mce", mcheck_setup);
> +
> +/*
> * Old style boot options parsing. Only for compatibility.
> */
> -static int __init mcheck_disable(char *str)
> +static int __init mcheck_setup_old(char *str)
> {
> mce_disabled = 1;
> return 1;
> }
> -__setup("nomce", mcheck_disable);
> +__setup("nomce", mcheck_setup_old);
>
> #ifdef CONFIG_DEBUG_FS
> struct dentry *mce_get_debugfs_dir(void)
> --
> 1.7.1
>
>
>

--
Regards/Gruss,
Boris.


2011-06-17 16:33:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5/8] x86, mce: introduce mce_sysdev_init()

On Fri, Jun 17, 2011 at 04:45:44AM -0400, Hidetoshi Seto wrote:
> As one of the mce_sysdev_ family, consolidate the sysfs hierarchy
> initialization into a single function here.
>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 65 ++++++++++++++++++++-----------------
> 1 files changed, 35 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index ad0e9fb..0424299 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
> cpumask_clear_cpu(cpu, mce_sysdev_initialized);
> }
>
> +static __init int mce_sysdev_init(void)
> +{
> + int err, i;
> +
> + zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
> +
> + /* init attributes for each mce_banks */
> + for (i = 0; i < banks; i++) {
> + struct mce_bank *b = &mce_banks[i];
> + struct sysdev_attribute *a = &b->attr;
> +
> + sysfs_attr_init(&a->attr);
> + a->attr.name = b->attrname;
> + snprintf(b->attrname, ATTR_LEN, "bank%d", i);
> +
> + a->attr.mode = 0644;
> + a->show = show_bank;
> + a->store = set_bank;
> + }
> +
> + err = sysdev_class_register(&mce_sysdev_class);
> + if (err)
> + return err;
> +
> + for_each_online_cpu(i) {
> + err = mce_sysdev_create(i);
> + if (err)
> + return err;
> + }

Yeah, let's handle the error case here and unwind to a sane state:

if (err)
goto sysdev_unwind;
...

sysdev_unwind:
while (--i > 0)
mce_sysdev_remove(i);


> +
> + return err;
> +}
> +
> /* Make sure there are no machine checks on offlined CPUs. */
> static void __cpuinit mce_disable_cpu(void *h)
> {
> @@ -2025,46 +2058,18 @@ static struct notifier_block mce_cpu_notifier __cpuinitdata = {
> .notifier_call = mce_cpu_callback,
> };
>
> -static __init void mce_init_banks(void)
> -{
> - int i;
> -
> - for (i = 0; i < banks; i++) {
> - struct mce_bank *b = &mce_banks[i];
> - struct sysdev_attribute *a = &b->attr;
> -
> - sysfs_attr_init(&a->attr);
> - a->attr.name = b->attrname;
> - snprintf(b->attrname, ATTR_LEN, "bank%d", i);
> -
> - a->attr.mode = 0644;
> - a->show = show_bank;
> - a->store = set_bank;
> - }
> -}
> -
> static __init int mcheck_init_device(void)
> {
> int err;
> - int i = 0;
>
> if (!mce_available(&boot_cpu_data))
> return -EIO;
>
> - zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
> -
> - mce_init_banks();
> -
> - err = sysdev_class_register(&mce_sysdev_class);
> + /* register sysfs interface */
> + err = mce_sysdev_init();
> if (err)
> return err;
>
> - for_each_online_cpu(i) {
> - err = mce_sysdev_create(i);
> - if (err)
> - return err;
> - }
> -
> register_syscore_ops(&mce_syscore_ops);
> register_hotcpu_notifier(&mce_cpu_notifier);
>
> --
> 1.7.1

Thanks.

--
Regards/Gruss,
Boris.


2011-06-17 17:00:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 6/8] x86, mce: introduce mce_memory_failure_process()

On Fri, Jun 17, 2011 at 04:46:46AM -0400, Hidetoshi Seto wrote:
> And relocate related functions to near the mce_ring* routines.
>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 26 +++++++++++++++++---------
> 1 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 0424299..a118496 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -395,6 +395,8 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
> }
>
> /*
> + * mce_ring, mce_memory_failure: Support for Memory errors
> + *
> * Simple lockless ring to communicate PFNs from the exception handler with the
> * process context work function. This is vastly simplified because there's
> * only a single reader and a single writer.
> @@ -449,6 +451,20 @@ static int mce_ring_add(unsigned long pfn)
> return 0;
> }
>
> +/* dummy to break dependency. actual code is in mm/memory-failure.c */
> +void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> +{
> + pr_err("Action optional memory failure at %lx ignored\n", pfn);
> +}
> +
> +static inline void mce_memory_failure_process(void)
> +{
> + unsigned long pfn;
> +
> + while (mce_ring_get(&pfn))
> + memory_failure(pfn, MCE_VECTOR);
> +}
> +

Frankly, I don't see the need to carve out two lines of code into a
separate function just to avoid duplication; it causes needless churn.

Just do the

while (..)
memory_failure(..)

in the next patch and that's it. mce.c is already cluttered with too
much stuff anyway.

> static int mce_available(struct cpuinfo_x86 *c)
> {
> if (mce_disabled)
> @@ -1049,12 +1065,6 @@ out:
> }
> EXPORT_SYMBOL_GPL(do_machine_check);
>
> -/* dummy to break dependency. actual code is in mm/memory-failure.c */
> -void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> -{
> - printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
> -}
> -
> /*
> * Called after mce notification in process context. This code
> * is allowed to sleep. Call the high level VM handler to process
> @@ -1068,10 +1078,8 @@ void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> */
> void mce_notify_process(void)
> {
> - unsigned long pfn;
> mce_notify_irq();
> - while (mce_ring_get(&pfn))
> - memory_failure(pfn, MCE_VECTOR);
> + mce_memory_failure_process();
> }
>
> static void mce_process_work(struct work_struct *dummy)
> --
> 1.7.1
>
>
>

--
Regards/Gruss,
Boris.


2011-06-17 17:10:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record

On Fri, Jun 17, 2011 at 04:50:48AM -0400, Hidetoshi Seto wrote:
> There is no reason to keep it in the cmpxchg loop.
> (The loop continues until a mcelog buffer slot is acquired to save the record.)
>
> And use do-while for the loop here.
>
> Signed-off-by: Hidetoshi Seto <[email protected]>

Good catch.

Reviewed-by: Borislav Petkov <[email protected]>

> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 26 +++++++++++++-------------
> 1 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index bc8a02c..f1d8ce1 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -148,21 +148,21 @@ void mce_log(struct mce *mce)
> /* Emit the trace record: */
> trace_mce_record(mce);
>
> + /*
> + * If edac_mce is enabled, it will check the error type and will
> + * process it, if it is a known error.
> + * Otherwise, the error will be sent through mcelog interface.
> + */
> + if (edac_mce_parse(mce))
> + return;
> +
> mce->finished = 0;
> wmb();
> - for (;;) {
> +
> + do {
> entry = rcu_dereference_check_mce(mcelog.next);
> for (;;) {
> /*
> - * If edac_mce is enabled, it will check the error type
> - * and will process it, if it is a known error.
> - * Otherwise, the error will be sent through mcelog
> - * interface
> - */
> - if (edac_mce_parse(mce))
> - return;
> -
> - /*
> * When the buffer fills up discard new entries.
> * Assume that the earlier errors are the more
> * interesting ones:
> @@ -181,10 +181,10 @@ void mce_log(struct mce *mce)
> }
> smp_rmb();
> next = entry + 1;
> - if (cmpxchg(&mcelog.next, entry, next) == entry)
> - break;
> - }
> + } while (cmpxchg(&mcelog.next, entry, next) != entry);
> +
> memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
> +
> wmb();
> mcelog.entry[entry].finished = 1;
> wmb();
> --
> 1.7.1

Thanks.

--
Regards/Gruss,
Boris.


2011-06-17 22:25:53

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 4/8] x86, mce: rename bootparam parser

> and leave this like this. "nomce" is the same as "mce=off" and frankly,
> I'd like to remove this redundancy, thus no need to do the code
> relocation. In addition, I don't think there are lots of systems running
> with "nomce" so I really think we should drop it.
>
> So Ingo, hpa, what is the proper way to remove early setup params? Maybe
> through Documentation/feature-removal-schedule.txt?

Though it seems odd to me that anyone would want to turn mce off,
the fact that we have two ways to do so would indicate that people
do (or at least did) want to do this.

It seems like we'd need a long "deprecated" period (till all the
major OSVs turn out a new release) to remove this without surprises.
If the "nomce" option just disappears, then people using it will
not realize that their boot time argument was ignored (until they
see a reported error).

-Tony

2011-06-18 08:38:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 4/8] x86, mce: rename bootparam parser

On Fri, Jun 17, 2011 at 03:25:51PM -0700, Luck, Tony wrote:
> > and leave this like this. "nomce" is the same as "mce=off" and frankly,
> > I'd like to remove this redundancy, thus no need to do the code
> > relocation. In addition, I don't think there are lots of systems running
> > with "nomce" so I really think we should drop it.
> >
> > So Ingo, hpa, what is the proper way to remove early setup params? Maybe
> > through Documentation/feature-removal-schedule.txt?
>
> Though it seems odd to me that anyone would want to turn mce off,
> the fact that we have two ways to do so would indicate that people
> do (or at least did) want to do this.

Yeah, the only use case I could think of for this is people doing their
own userspace DRAM ECC evaluation, like Google. But they'd still need to
have an #MC handler... or they could parse syslog for the output from
the default unexpected_machine_check()... hm.

> It seems like we'd need a long "deprecated" period (till all the
> major OSVs turn out a new release) to remove this without surprises.
> If the "nomce" option just disappears, then people using it will
> not realize that their boot time argument was ignored (until they
> see a reported error).

Yeah, we could do a grace period with a warning:

/*
* Old style boot options parsing. Only for compatibility.
*/
static int __init mcheck_disable(char *str)
{
pr_err("\"nomce\" boot param is deprecated. Please use \"mce=off\"\n");

mce_disabled = 1;
return 1;
}
__setup("nomce", mcheck_disable);

and then kill it.

--
Regards/Gruss,
Boris.

2011-06-20 04:48:08

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt

(2011/06/17 22:56), Borislav Petkov wrote:
> On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
>> Function del_timer_sync() has WARN_ON(in_irq()) in it because
>> calling it from interrupt context can cause deadlock if it
>> interrupts the target timer running.
>
> No need to explain the del_timer_sync() code here - just say that it's
> not allowed to call it from an IRQ context.
>

I'd like to briefly add "why it is not allowed" too...
Anyway, I'll update the patch description and shorten it.

>> only few MCE geeks experience this warning on write.
>
> MCE geeks ??? I wonder who those are :-)

I'm just joking ;-)

>> +/* Must not be called from interrupt where del_timer_sync() can deadlock */
>> +static void mce_timer_delete_all(void)
>> +{
>> + int cpu;
>> +
>> + for_each_online_cpu(cpu) {
>> + if (mce_available(&per_cpu(cpu_info, cpu)))
>> + del_timer_sync(&per_cpu(mce_timer, cpu));
>> + }
>> +}
>
> You're adding the mce_available(..) check just to remove it in the next
> patch. Since all those sysfs nodes are behind such a check, there's no
> need for it here too.

Maybe it would be better to reorder the patches so that this fix
comes after the removal of the unnecessary mce_available() checks.


Thanks,
H.Seto

2011-06-20 04:48:14

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks

(2011/06/17 23:39), Borislav Petkov wrote:
>> @@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>> if (__mcheck_cpu_ancient_init(c))
>> return;
>>
>> - if (!mce_available(c))
>> + if (!mce_available(c)) {
>> + /*
>> + * Asymmetric configurations are not supported today.
>> + * If mce_banks is allocated there must be a cpu passed here.
>> + */
>> + WARN_ON(!mce_disabled && mce_banks);
>> + mce_disabled = 1;
>> return;
>> + }
>
> I don't think this will ever happen so why complicate the code
> needlessly? This can only be executed if at least one of the cores on
> the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
> silly.

This is a guard against exactly such a silly situation, isn't it?
I don't think it looks so complicated or needless.

I think this code will never run on single-socket systems as
long as the processor vendors are in their right mind.
But for NUMA systems with strange nodes, or for guests migrated
to/from a misconfigured virtual machine, I can't say that with
certainty.

>
> Besides, mcheck_init_device() already confirms we don't support
> MCE-asymmetric configs:
>
> if (!mce_available(&boot_cpu_data))
> return -EIO;

Without the above check, mcheck_init_device() will not notice the
issue when the boot CPU is MCE-capable but the others are not.

Maybe I should add another patch to deal with silly
configurations, and keep this patch as a simple cleanup.


Thanks,
H.Seto

2011-06-20 04:48:33

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [PATCH 5/8] x86, mce: introduce mce_sysdev_init()

(2011/06/18 1:32), Borislav Petkov wrote:
>> @@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
>> cpumask_clear_cpu(cpu, mce_sysdev_initialized);
>> }
>>
>> +static __init int mce_sysdev_init(void)
>> +{
>> + int err, i;
>> +
>> + zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
>> +
>> + /* init attributes for each mce_banks */
>> + for (i = 0; i < banks; i++) {
>> + struct mce_bank *b = &mce_banks[i];
>> + struct sysdev_attribute *a = &b->attr;
>> +
>> + sysfs_attr_init(&a->attr);
>> + a->attr.name = b->attrname;
>> + snprintf(b->attrname, ATTR_LEN, "bank%d", i);
>> +
>> + a->attr.mode = 0644;
>> + a->show = show_bank;
>> + a->store = set_bank;
>> + }
>> +
>> + err = sysdev_class_register(&mce_sysdev_class);
>> + if (err)
>> + return err;
>> +
>> + for_each_online_cpu(i) {
>> + err = mce_sysdev_create(i);
>> + if (err)
>> + return err;
>> + }
>
> Yeah, let's handle the error case here and unwind to a sane state:
>
> if (err)
> goto sysdev_unwind;
> ...
>
> sysdev_unwind:
> while (--i > 0)
> mce_sysdev_remove(i);
>

Nice!
I'll do it.


Thanks,
H.Seto

2011-06-20 04:48:58

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [PATCH 4/8] x86, mce: rename bootparam parser

(2011/06/18 17:38), Borislav Petkov wrote:
> On Fri, Jun 17, 2011 at 03:25:51PM -0700, Luck, Tony wrote:
>>> and leave this like this. "nomce" is the same as "mce=off" and frankly,
>>> I'd like to remove this redundancy, thus no need to do the code
>>> relocation. In addition, I don't think there are lots of systems running
>>> with "nomce" so I really think we should drop it.
>>>
>>> So Ingo, hpa, what is the proper way to remove early setup params? Maybe
>>> through Documentation/feature-removal-schedule.txt?
>>
>> Though it seems odd to me that anyone would want to turn mce off,
>> the fact that we have two ways to do so would indicate that people
>> do (or at least did) want to do this.
>
> Yeah, the only usecase I could think of for this is people doing their
> own userspace DRAM ECC evaluation like google. But they'd still need to
> have an #MC handler... or they could parse syslog for the output from
> the default unexpected_machine_check().. hm.
>
>> It seems like we'd need a long "deprecated" period (till all the
>> major OSVs turn out a new release) to remove this without surprises.
>> If the "nomce" option just disappears, then people using it will
>> not realize that their boot time argument was ignored (until they
>> see a reported error).
>
> Yeah, we could do a grace period with a warning:
>
> /*
> * Old style boot options parsing. Only for compatibility.
> */
> static int __init mcheck_disable(char *str)
> {
> pr_err("\"nomce\" boot param is deprecated. Please use \"mce=off\"\n");
>
> mce_disabled = 1;
> return 1;
> }
> __setup("nomce", mcheck_disable);
>
> and then kill it.

Thank you for giving a new direction!
I'll write a patch to "deprecate nomce" instead of this rename/cleanup.


Thanks,
H.Seto

2011-06-20 07:37:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt

On Mon, Jun 20, 2011 at 12:46:28AM -0400, Hidetoshi Seto wrote:
> > You're adding the mce_available(..) check just to remove it in the next
> > patch. Since all those sysfs nodes are behind such a check, there's no
> > need for it here too.
>
> Maybe it would be better to reorder the patches so that this fix
> comes after the removal of the unnecessary mce_available() checks.

Yes, and then don't add it to mce_timer_delete_all(), because this
functionality is guarded by mce_available() checks in the sysfs node
init path.

--
Regards/Gruss,
Boris.


2011-06-20 07:45:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks

On Mon, Jun 20, 2011 at 12:47:57AM -0400, Hidetoshi Seto wrote:
> > I don't think this will ever happen so why complicate the code
> > needlessly? This can only be executed if at least one of the cores on
> > the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
> > silly.
>
> This is a guard against exactly such a silly situation, isn't it?
> I don't think it looks so complicated or needless.
>
> I think this code will never run on single-socket systems as
> long as the processor vendors are in their right mind.
> But for NUMA systems with strange nodes, or for guests migrated
> to/from a misconfigured virtual machine, I can't say that with
> certainty.

Let me put it this way: are you talking about a concrete example where
there's a machine with the BSP's MCA features enabled and the rest of
the cores' disabled, or is this some theoretical thing you're conjecturing?

If it is the first I'd really like to know about it, if it's the second,
then my initial point stands - no need to handle purely theoretical
cases.

--
Regards/Gruss,
Boris.
