This patch adds "mce=nopoll" option to disable timer polling
for corrected errors from boot. Unlike "mce=off", it doesn't
prevent handling for uncorrected errors.
It is useful if:
- You don't have any interests in corrected errors. You may
use option mce_threshold=0 to disable cmci too.
- You'd like to care banks only which cmci are supported.
- You have an application such as hardware monitor that
checks error banks, and that can conflict with OS's polling.
- Your system have an intelligent BIOS which can provide
enough health information, so reports from OS is redundant.
Once booted, we can disable polling by setting check_interval
to 0, but there are no mention about the fact.
Some additional comments are help for this.
Signed-off-by: Hidetoshi Seto <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
Documentation/x86/x86_64/boot-options.txt | 2 ++
arch/x86/kernel/cpu/mcheck/mce_64.c | 10 ++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 34c1304..5d55158 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -13,6 +13,8 @@ Machine check
in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
+ mce=nopoll
+ Disable timer polling for corrected errors
mce=tolerancelevel (number)
0: always panic on uncorrected errors, log corrected errors
1: panic or SIGBUS on uncorrected errors, log corrected errors
diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 33d612e..80ec191 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -449,6 +449,8 @@ void mce_log_therm_throt_event(__u64 status)
* Periodic polling timer for "silent" machine check errors. If the
* poller finds an MCE, poll 2x faster. When the poller finds no more
* errors, poll 2x slower (up to check_interval seconds).
+ *
+ * If check_interval is 0, polling is disabled.
*/
static int check_interval = 5 * 60; /* 5 minutes */
@@ -633,11 +635,12 @@ static void mce_init_timer(void)
{
struct timer_list *t = &__get_cpu_var(mce_timer);
+ /* Disable polling if check_interval is 0 */
+ if (!check_interval)
+ return;
/* data race harmless because everyone sets to the same value */
if (!next_interval)
next_interval = check_interval * HZ;
- if (!next_interval)
- return;
setup_timer(t, mcheck_timer, smp_processor_id());
t->expires = round_jiffies(jiffies + next_interval);
add_timer(t);
@@ -845,11 +848,14 @@ __setup("nomce", mcheck_disable);
* mce=TOLERANCELEVEL (number, see above)
* mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
* mce=nobootlog Don't log MCEs from before booting.
+ * mce=nopoll Disable timer polling for corrected errors
*/
static int __init mcheck_enable(char *str)
{
if (!strcmp(str, "off"))
mce_dont_init = 1;
+ else if (!strcmp(str, "nopoll"))
+ check_interval = 0;
else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
mce_bootlog = (str[0] == 'b');
else if (isdigit(str[0]))
--
1.6.2.1
Hidetoshi Seto wrote:
> This patch adds "mce=nopoll" option to disable timer polling
> for corrected errors from boot. Unlike "mce=off", it doesn't
> prevent handling for uncorrected errors.
>
> It is useful if:
> - You don't have any interests in corrected errors. You may
> use option mce_threshold=0 to disable cmci too.
> - You'd like to care banks only which cmci are supported.
These two seem to be conflicting. CMCI errors are corrected errors
too. Why would the user care about the reporting mechanism
and only shut down one or not another?
Also I'm not sure a boot argument is really needed. Isn't it
good enough to do this early at boot through sysfs?
> - You have an application such as hardware monitor that
> checks error banks, and that can conflict with OS's polling.
Well then your patch is not enough because it doesn't shut off
boot time clearing/logging of corrected errors left over from
boot for once. And CMCI.
> - Your system have an intelligent BIOS which can provide
> enough health information, so reports from OS is redundant.
It would seem inconvenient then to require the user to set a special
boot option. I think it would be better if the BIOS set a flag
somewhere that the kernel can check.
> Once booted, we can disable polling by setting check_interval
> to 0, but there are no mention about the fact.
That's true, Documentation/x86/x86_64/machinecheck should be fixed
to say 0 means no polling. I'm not sure new boot option are the
preferred fix for missing documentation though @)
> static int check_interval = 5 * 60; /* 5 minutes */
> @@ -633,11 +635,12 @@ static void mce_init_timer(void)
> {
> struct timer_list *t = &__get_cpu_var(mce_timer);
>
> + /* Disable polling if check_interval is 0 */
> + if (!check_interval)
> + return;
This check shouldn't be needed, the next two checks already do that.
Also there's a conflicting patch pending which moves next_interval
to be per CPU.
-Andi
Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> This patch adds "mce=nopoll" option to disable timer polling
>> for corrected errors from boot. Unlike "mce=off", it doesn't
>> prevent handling for uncorrected errors.
>>
>> It is useful if:
>> - You don't have any interests in corrected errors. You may
>> use option mce_threshold=0 to disable cmci too.
>> - You'd like to care banks only which cmci are supported.
>
> These two seem to be conflicting. CMCI errors are corrected errors
> too. Why would the user care about the reporting mechanism
> and only shut down one or not another?
Separate the two in different case.
The first case is easy to understand.
The second case is ... yes, unusual request.
I think these 2 mechanism, polling and threshold interrupt, can
be used in parallel, because there are errors having various
characteristics.
Assume that:
error A : Easily happen (i.e. once in every minute)
polling : o (with long interval preferred)
interrupt : o (with high threshold)
error B : Burst in short time once happened
polling : o
interrupt : x (low threshold will be storm of interrupt,
high threshold cannot report the error)
error C : Rarely happen (bit serious?)
polling : x (waste of CPU)
interrupt : o (with low threshold)
However still I could not find situation where the second case
is reasonable... Well, it was bad example. Please ignore.
> Also I'm not sure a boot argument is really needed. Isn't it
> good enough to do this early at boot through sysfs?
Maybe it is good for this option, as far as polling never run
so soon. Because sysfs is available after start of polling
timer, boot argument is required just in theory of logics.
>> - You have an application such as hardware monitor that
>> checks error banks, and that can conflict with OS's polling.
>
> Well then your patch is not enough because it doesn't shut off
> boot time clearing/logging of corrected errors left over from
> boot for once. And CMCI.
This case also requires mce_threshold=0 to shut off CMCI like
the first case. I should wrote down it for this third case too.
I don't think it is really required, but if the bootlog, that left
over from previous boot, is required by the application, then the
application should use /dev/mcelog. If all works well, there would
be left-overred CE and/or UE, or recovered UE that happened after
the boot. Isn't it easy?
>> - Your system have an intelligent BIOS which can provide
>> enough health information, so reports from OS is redundant.
>
> It would seem inconvenient then to require the user to set a special
> boot option. I think it would be better if the BIOS set a flag
> somewhere that the kernel can check.
Of course there would be no such inconvenient if BIOS can prohibit
OS capability (by hiding resources etc.), and/or if there are good
interface for such use between OS and BIOS.
But there is not such interface yet. And I think setting option is
reasonably quick and convenient way at this time, rather than
requesting/waiting revised BIOS and/or new specification for the
interface.
>> Once booted, we can disable polling by setting check_interval
>> to 0, but there are no mention about the fact.
>
> That's true, Documentation/x86/x86_64/machinecheck should be fixed
> to say 0 means no polling. I'm not sure new boot option are the
> preferred fix for missing documentation though @)
Indeed.
One another problem is that there are multiple documentations for
machinecheck parameters, but not linked well:
- Documentation/kernel-parameters.txt
("See Documentation/x86/x86_64/boot-options.txt" for "mce=")
- Documentation/x86/x86_64/boot-options.txt
("AMD64 specific boot options" is not true now!)
- Documentation/x86/x86_64/machinecheck
(which I had not noticed the existence at first, oops!)
>> static int check_interval = 5 * 60; /* 5 minutes */
>> @@ -633,11 +635,12 @@ static void mce_init_timer(void)
>> {
>> struct timer_list *t = &__get_cpu_var(mce_timer);
>>
>> + /* Disable polling if check_interval is 0 */
>> + if (!check_interval)
>> + return;
>
> This check shouldn't be needed, the next two checks already do that.
That is for readability improvement.
+ /* Disable polling if check_interval is 0 */
+ if (!check_interval)
+ return;
/* data race harmless because everyone sets to the same value */
if (!next_interval)
next_interval = check_interval * HZ;
- if (!next_interval)
- return;
Are there any case where the HZ becomes 0?
> Also there's a conflicting patch pending which moves next_interval
> to be per CPU.
Even if next_interval becomes per CPU, global check_interval can live
with it. I don't mind doing rebase my patches on top of your pending
patches, so if prepared please send your patches to LKML (CC-ing me)
any time soon.
Thanks,
H.Seto
Hidetoshi Seto wrote:
>> Also I'm not sure a boot argument is really needed. Isn't it
>> good enough to do this early at boot through sysfs?
>
> Maybe it is good for this option, as far as polling never run
> so soon. Because sysfs is available after start of polling
> timer, boot argument is required just in theory of logics.
I think the best way would be to just not run mcelog if you want
the BIOS to log all. The only problem I guess is that users might be confused
by the printk. So perhaps just do a patch to shut down the printk?
>
> One another problem is that there are multiple documentations for
> machinecheck parameters, but not linked well:
>
> - Documentation/kernel-parameters.txt
> ("See Documentation/x86/x86_64/boot-options.txt" for "mce=")
> - Documentation/x86/x86_64/boot-options.txt
> ("AMD64 specific boot options" is not true now!)
> - Documentation/x86/x86_64/machinecheck
> (which I had not noticed the existence at first, oops!)
machinecheck is for the sysfs interface, boot options is for the boot parameters.
I guess a reference could be added to machinecheck to point to boot-options.txt
Undoubtedly the documentation could be further improved too. In general machine
checks are rather tricky and somewhat unobvious though and I expect no matter how good
the documentation ever is it won't be easy to understand.
>>> static int check_interval = 5 * 60; /* 5 minutes */
>>> @@ -633,11 +635,12 @@ static void mce_init_timer(void)
>>> {
>>> struct timer_list *t = &__get_cpu_var(mce_timer);
>>>
>>> + /* Disable polling if check_interval is 0 */
>>> + if (!check_interval)
>>> + return;
>> This check shouldn't be needed, the next two checks already do that.
>
> That is for readability improvement.
it was actually a cheesy way to avoid a race with multiple initializers, but
I fixed this in a better way in a upcoming patch (now the interval is per CPU)
> + /* Disable polling if check_interval is 0 */
> + if (!check_interval)
> + return;
> /* data race harmless because everyone sets to the same value */
> if (!next_interval)
> next_interval = check_interval * HZ;
> - if (!next_interval)
> - return;
>
> Are there any case where the HZ becomes 0?
No.
-Andi
Commit-ID: f5b3ca6efb64aa3ab5fa32f0fc9cc0f6cfedee95
Gitweb: http://git.kernel.org/tip/f5b3ca6efb64aa3ab5fa32f0fc9cc0f6cfedee95
Author: Hidetoshi Seto <[email protected]>
AuthorDate: Thu, 26 Mar 2009 17:39:39 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 28 Mar 2009 13:13:01 +0100
x86, mce: Add mce=nopoll option to disable timer polling
Impact: add new boot option
This patch adds "mce=nopoll" option to disable timer polling
for corrected errors from boot. Unlike "mce=off", it doesn't
prevent handling for uncorrected errors.
It is useful if:
- You don't have any interests in corrected errors. You may
use option mce_threshold=0 to disable cmci too.
- You'd like to care banks only which cmci are supported.
- You have an application such as hardware monitor that
checks error banks, and that can conflict with OS's polling.
- Your system have an intelligent BIOS which can provide
enough health information, so reports from OS is redundant.
Once booted, we can disable polling by setting check_interval
to 0, but there are no mention about the fact.
Some additional comments are help for this.
Signed-off-by: Hidetoshi Seto <[email protected]>
Cc: Andi Kleen <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/x86/x86_64/boot-options.txt | 2 ++
arch/x86/kernel/cpu/mcheck/mce_64.c | 10 ++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 34c1304..5d55158 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -13,6 +13,8 @@ Machine check
in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
+ mce=nopoll
+ Disable timer polling for corrected errors
mce=tolerancelevel (number)
0: always panic on uncorrected errors, log corrected errors
1: panic or SIGBUS on uncorrected errors, log corrected errors
diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 9bf52bd..fe21883 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -449,6 +449,8 @@ void mce_log_therm_throt_event(__u64 status)
* Periodic polling timer for "silent" machine check errors. If the
* poller finds an MCE, poll 2x faster. When the poller finds no more
* errors, poll 2x slower (up to check_interval seconds).
+ *
+ * If check_interval is 0, polling is disabled.
*/
static int check_interval = 5 * 60; /* 5 minutes */
@@ -633,11 +635,12 @@ static void mce_init_timer(void)
{
struct timer_list *t = &__get_cpu_var(mce_timer);
+ /* Disable polling if check_interval is 0 */
+ if (!check_interval)
+ return;
/* data race harmless because everyone sets to the same value */
if (!next_interval)
next_interval = check_interval * HZ;
- if (!next_interval)
- return;
setup_timer(t, mcheck_timer, smp_processor_id());
t->expires = round_jiffies(jiffies + next_interval);
add_timer(t);
@@ -845,11 +848,14 @@ __setup("nomce", mcheck_disable);
* mce=TOLERANCELEVEL (number, see above)
* mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
* mce=nobootlog Don't log MCEs from before booting.
+ * mce=nopoll Disable timer polling for corrected errors
*/
static int __init mcheck_enable(char *str)
{
if (!strcmp(str, "off"))
mce_dont_init = 1;
+ else if (!strcmp(str, "nopoll"))
+ check_interval = 0;
else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
mce_bootlog = (str[0] == 'b');
else if (isdigit(str[0]))
Andi Kleen wrote:
>>> Also I'm not sure a boot argument is really needed. Isn't it
>>> good enough to do this early at boot through sysfs?
>> Maybe it is good for this option, as far as polling never run
>> so soon. Because sysfs is available after start of polling
>> timer, boot argument is required just in theory of logics.
>
> I think the best way would be to just not run mcelog if you want
> the BIOS to log all. The only problem I guess is that users might be confused
> by the printk. So perhaps just do a patch to shut down the printk?
How to prevent banks from clearing?
>> One another problem is that there are multiple documentations for
>> machinecheck parameters, but not linked well:
>>
>> - Documentation/kernel-parameters.txt
>> ("See Documentation/x86/x86_64/boot-options.txt" for "mce=")
>> - Documentation/x86/x86_64/boot-options.txt
>> ("AMD64 specific boot options" is not true now!)
>> - Documentation/x86/x86_64/machinecheck
>> (which I had not noticed the existence at first, oops!)
>
> machinecheck is for the sysfs interface, boot options is for the boot parameters.
>
> I guess a reference could be added to machinecheck to point to boot-options.txt
Vice verse, I suppose.
boot-options.txt just have "Everything else is in sysfs now." without pointing
machinecheck that describe the sysfs well.
Thanks,
H.Seto
Hidetoshi Seto wrote:
> Andi Kleen wrote:
>>>> Also I'm not sure a boot argument is really needed. Isn't it
>>>> good enough to do this early at boot through sysfs?
>>> Maybe it is good for this option, as far as polling never run
>>> so soon. Because sysfs is available after start of polling
>>> timer, boot argument is required just in theory of logics.
>> I think the best way would be to just not run mcelog if you want
>> the BIOS to log all. The only problem I guess is that users might be confused
>> by the printk. So perhaps just do a patch to shut down the printk?
>
> How to prevent banks from clearing?
Firmware first typically means the firmware gets it one way or another
using SMI and that tends to work even when the OS clears because
SMI comes first. So stopping the clearing is probably not needed?
>>> One another problem is that there are multiple documentations for
>>> machinecheck parameters, but not linked well:
>>>
>>> - Documentation/kernel-parameters.txt
>>> ("See Documentation/x86/x86_64/boot-options.txt" for "mce=")
>>> - Documentation/x86/x86_64/boot-options.txt
>>> ("AMD64 specific boot options" is not true now!)
>>> - Documentation/x86/x86_64/machinecheck
>>> (which I had not noticed the existence at first, oops!)
>> machinecheck is for the sysfs interface, boot options is for the boot parameters.
>>
>> I guess a reference could be added to machinecheck to point to boot-options.txt
>
> Vice verse, I suppose.
> boot-options.txt just have "Everything else is in sysfs now." without pointing
> machinecheck that describe the sysfs well.
I added a reference in my pile.
-Andi
Andi Kleen wrote:
>> How to prevent banks from clearing?
>
> Firmware first typically means the firmware gets it one way or another
> using SMI and that tends to work even when the OS clears because
> SMI comes first. So stopping the clearing is probably not needed?
SMI is not only way for firmware.
IIRC, I heard there are some server doing poll from service processor on
BMC (Board Management Controller).
I could be wrong, but always there could be a special case.
Thanks,
H.Seto
Hidetoshi Seto wrote:
> Andi Kleen wrote:
>>> How to prevent banks from clearing?
>> Firmware first typically means the firmware gets it one way or another
>> using SMI and that tends to work even when the OS clears because
>> SMI comes first. So stopping the clearing is probably not needed?
>
> SMI is not only way for firmware.
> IIRC, I heard there are some server doing poll from service processor on
> BMC (Board Management Controller).
> I could be wrong, but always there could be a special case.
I see. In this case you likely want to shut off boot logging too
(mce=nobootlog)
-Andi