2012-06-05 02:34:06

by Chen, Gong

Subject: [PATCH] fix the MCE poll timer logic

In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot
the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0a687fd..a97f3c4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1274,7 +1274,7 @@ static void mce_timer_fn(unsigned long data)
 	 */
 	iv = __this_cpu_read(mce_next_interval);
 	if (mce_notify_irq())
-		iv = max(iv, (unsigned long) HZ/100);
+		iv = max(iv / 2, (unsigned long) HZ/100);
 	else
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
 	__this_cpu_write(mce_next_interval, iv);
--
1.7.10
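
The adaptive polling that this one-line fix restores can be sketched in plain C. Without the "/ 2", the event branch computes max(iv, HZ/100), which leaves iv unchanged, so the interval can grow but never shrink. The helper below is a user-space illustration only: next_poll_interval(), SKETCH_HZ and SKETCH_CHECK_INTERVAL are hypothetical names and assumed values, not kernel symbols, and the rounding done by round_jiffies_relative() is left out.

```c
#include <stdbool.h>

/* Assumed values for illustration; the kernel takes HZ from its
 * configuration and check_interval from a tunable. */
#define SKETCH_HZ             1000UL
#define SKETCH_CHECK_INTERVAL 300UL	/* seconds */

/*
 * Hypothetical helper: return the next poll interval in jiffies.
 * Mirrors the patched branch in mce_timer_fn(), minus the rounding.
 */
static unsigned long next_poll_interval(unsigned long iv, bool event_seen)
{
	unsigned long min_iv = SKETCH_HZ / 100;
	unsigned long max_iv = SKETCH_CHECK_INTERVAL * SKETCH_HZ;

	if (event_seen) {
		/* Events were found: poll twice as often, down to HZ/100. */
		iv /= 2;
		return iv > min_iv ? iv : min_iv;
	}

	/* Nothing found: back off, up to the configured check interval. */
	iv *= 2;
	return iv < max_iv ? iv : max_iv;
}
```

Under these assumed values, a 1000-jiffy interval halves to 500 after an event and doubles to 2000 after a quiet poll, with both directions clamped at the limits.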


2012-06-05 09:39:46

by Thomas Gleixner

Subject: Re: [PATCH] fix the MCE poll timer logic

On Tue, 5 Jun 2012, Chen Gong wrote:

> In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot
> the "/ 2" there while cleaning up.
>
> Signed-off-by: Chen Gong <[email protected]>

Acked-by: Thomas Gleixner <[email protected]>


> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 0a687fd..a97f3c4 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1274,7 +1274,7 @@ static void mce_timer_fn(unsigned long data)
>  	 */
>  	iv = __this_cpu_read(mce_next_interval);
>  	if (mce_notify_irq())
> -		iv = max(iv, (unsigned long) HZ/100);
> +		iv = max(iv / 2, (unsigned long) HZ/100);
>  	else
>  		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
>  	__this_cpu_write(mce_next_interval, iv);
> --
> 1.7.10
>
>

2012-06-06 07:10:42

by Chen, Gong

Subject: [tip:x86/urgent] x86/mce: Fix the MCE poll timer logic

Commit-ID: 958fb3c51295764599d6abce87e1a01ace897a3e
Gitweb: http://git.kernel.org/tip/958fb3c51295764599d6abce87e1a01ace897a3e
Author: Chen Gong <[email protected]>
AuthorDate: Tue, 5 Jun 2012 10:35:02 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 6 Jun 2012 08:28:21 +0200

x86/mce: Fix the MCE poll timer logic

In commit 82f7af09 ("x86/mce: Cleanup timer mess"), Thomas just
forgot the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0a687fd..a97f3c4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1274,7 +1274,7 @@ static void mce_timer_fn(unsigned long data)
 	 */
 	iv = __this_cpu_read(mce_next_interval);
 	if (mce_notify_irq())
-		iv = max(iv, (unsigned long) HZ/100);
+		iv = max(iv / 2, (unsigned long) HZ/100);
 	else
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
 	__this_cpu_write(mce_next_interval, iv);

2012-06-06 07:51:36

by Chen, Gong

Subject: Re: [tip:x86/urgent] x86/mce: Fix the MCE poll timer logic

On 2012/6/6 15:10, tip-bot for Chen Gong wrote:
> Commit-ID: 958fb3c51295764599d6abce87e1a01ace897a3e
> Gitweb: http://git.kernel.org/tip/958fb3c51295764599d6abce87e1a01ace897a3e
> Author: Chen Gong <[email protected]>
> AuthorDate: Tue, 5 Jun 2012 10:35:02 +0800
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 6 Jun 2012 08:28:21 +0200
>
> x86/mce: Fix the MCE poll timer logic
>
> In commit 82f7af09 ("x86/mce: Cleanup timer mess"), Thomas just
> forgot the "/ 2" there while cleaning up.
>
> Signed-off-by: Chen Gong <[email protected]>
> Acked-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 0a687fd..a97f3c4 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1274,7 +1274,7 @@ static void mce_timer_fn(unsigned long data)
>  	 */
>  	iv = __this_cpu_read(mce_next_interval);
>  	if (mce_notify_irq())
> -		iv = max(iv, (unsigned long) HZ/100);
> +		iv = max(iv / 2, (unsigned long) HZ/100);
>  	else
>  		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
>  	__this_cpu_write(mce_next_interval, iv);

In fact, there is still another potential issue:

static void __mcheck_cpu_init_timer(void)
{
	struct timer_list *t = &__get_cpu_var(mce_timer);
	unsigned long iv = __this_cpu_read(mce_next_interval);

	setup_timer(t, mce_timer_fn, smp_processor_id());

	if (mce_ignore_ce)
		return;

	__this_cpu_write(mce_next_interval, iv);
	if (!iv)
		return;
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because the 2nd patch is not merged yet, iv is zero when this
function is called. That means the poll timers are not registered at
the beginning; they only start once some other condition triggers
*add_timer_on*.

	t->expires = round_jiffies(jiffies + iv);
	add_timer_on(t, smp_processor_id());
}

Another potential issue is that smp_processor_id() is called twice in
this function. If the context changes in between, the two calls could
return different CPUs. (I'm not sure whether that can happen; besides
secondary CPU kickoff, CPU online/offline also calls these functions,
even in a virtualization environment, etc.) So I think it would be
better to save the value at the beginning of this function. Does that
make sense?

2012-06-06 09:27:24

by Thomas Gleixner

Subject: Re: [tip:x86/urgent] x86/mce: Fix the MCE poll timer logic

On Wed, 6 Jun 2012, Chen Gong wrote:
> In fact, there still exists another potential issue:
>
> static void __mcheck_cpu_init_timer(void)
> {
> struct timer_list *t = &__get_cpu_var(mce_timer);
> unsigned long iv = __this_cpu_read(mce_next_interval);
>
> setup_timer(t, mce_timer_fn, smp_processor_id());
>
> if (mce_ignore_ce)
> return;
>
> __this_cpu_write(mce_next_interval, iv);
> if (!iv)
> return;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Because the 2nd patch is not merged yet, so here iv is zero when this
> function is called, which means at the beginning, the poll timers are
> not registered until some other conditions trigger *add_timer_on*.

Dammit. I dropped the

	iv = check_interval * HZ;

line before __this_cpu_write() and nobody noticed. :(

> t->expires = round_jiffies(jiffies + iv);
> add_timer_on(t, smp_processor_id());
> }
>
> Another potential issue is in this function two smp_processor_id()
> are called. If context changes during this procedure (I'm not sure
> if it can happen, besides secondary_cpu kickoff, online/offline will

No. This code is always called with preemption disabled.

> call these functions, even in virtualization environment, etc.).

What has virtualization to do with that ?

> So I think it will be better saving the value in the beginning of
> this function. Make sense?

No. Otherwise all the __this_cpu_read/write accesses are bogus as
well.

Thanks,

tglx
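
The effect of the dropped assignment can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: struct poll_state, init_poll_timer(), the restore_assignment flag and SKETCH_HZ are hypothetical names invented for illustration. The "armed" field stands in for add_timer_on() having been called, and next_interval stands in for the per-CPU mce_next_interval.

```c
#include <stdbool.h>

#define SKETCH_HZ 1000UL	/* assumed tick rate, illustration only */

/* Hypothetical stand-in for the per-CPU state used by the init path. */
struct poll_state {
	unsigned long next_interval;	/* models mce_next_interval */
	bool armed;			/* models add_timer_on() being called */
};

/*
 * Models the control flow of __mcheck_cpu_init_timer(). With
 * restore_assignment == false, iv keeps the stored value, which is
 * zero on first use, so the timer is never armed. With
 * restore_assignment == true, the "iv = check_interval * HZ" line
 * mentioned in the reply is put back before the write.
 */
static void init_poll_timer(struct poll_state *s, unsigned long check_interval,
			    bool ignore_ce, bool restore_assignment)
{
	unsigned long iv = s->next_interval;

	s->armed = false;
	if (ignore_ce)
		return;

	if (restore_assignment)
		iv = check_interval * SKETCH_HZ;

	s->next_interval = iv;
	if (!iv)
		return;		/* an interval of zero means no polling */

	s->armed = true;
}
```

Starting from a zeroed state, the model arms the timer only when the assignment is restored and check_interval is non-zero.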

2012-06-06 09:52:16

by Chen, Gong

Subject: Re: [tip:x86/urgent] x86/mce: Fix the MCE poll timer logic

On 2012/6/6 17:27, Thomas Gleixner wrote:
> On Wed, 6 Jun 2012, Chen Gong wrote:
>> In fact, there still exists another potential issue:
>>
>> static void __mcheck_cpu_init_timer(void)
>> {
>> struct timer_list *t = &__get_cpu_var(mce_timer);
>> unsigned long iv = __this_cpu_read(mce_next_interval);
>>
>> setup_timer(t, mce_timer_fn, smp_processor_id());
>>
>> if (mce_ignore_ce)
>> return;
>>
>> __this_cpu_write(mce_next_interval, iv);
>> if (!iv)
>> return;
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> Because the 2nd patch is not merged yet, so here iv is zero when this
>> function is called, which means at the beginning, the poll timers are
>> not registered until some other conditions trigger *add_timer_on*.
> Dammit. I dropped the
>
> iv = check_interval * HZ;
>
> line before __this_cpu_write() and nobody noticed. :(
>
>> t->expires = round_jiffies(jiffies + iv);
>> add_timer_on(t, smp_processor_id());
>> }
>>
>> Another potential issue is in this function two smp_processor_id()
>> are called. If context changes during this procedure (I'm not sure
>> if it can happen, besides secondary_cpu kickoff, online/offline will
> No. This code is always called with preemption disabled.
>
>> call these functions, even in virtualization environment, etc.).
> What has virtualization to do with that ?
>
>> So I think it will be better saving the value in the beginning of
>> this function. Make sense?
> No. Otherwise all the __this_cpu_read/write accesses are bogus as
> well.
>
>
Oh, yes. Since __this_cpu_read/write can be used here, there is no
context issue. Please ignore my over-thinking.