2010-01-07 03:23:23

by Xiaotian Feng

[permalink] [raw]
Subject: [PATCH] clockevent: don't remove broadcast device when cpu is dead

Marc reported BUG during shutdown, after debugging, kernel is trying
to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.

The root cause for this bug is that in clockevents_notify,
"cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
broadcast device. We need to use tick_is_broadcast_device to check
if it is a broadcast device.

Reported-by: Marc Dionne <[email protected]>
Tested-by: Marc Dionne <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: H Hartley Sweeten <[email protected]>
---
kernel/time/clockevents.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..0223d83 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -259,7 +259,7 @@ void clockevents_notify(unsigned long reason, void *arg)
cpu = *((int *)arg);
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
- cpumask_weight(dev->cpumask) == 1) {
+ !tick_is_broadcast_device(dev)) {
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
list_del(&dev->list);
}
--
1.6.5.2


2010-01-12 02:27:04

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

Hi:

Any comments on the patch? Marc confirmed this patch also fixed his
hang at suspend/resume stage. Thanks.

(Cc'ed [email protected])

Regards
Xiaotian

On 01/07/2010 11:22 AM, Xiaotian Feng wrote:
> Marc reported BUG during shutdown, after debugging, kernel is trying
> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>
> The root cause for this bug is that in clockevents_notify,
> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> broadcast device. We need to use tick_is_broadcast_device to check
> if it is a broadcast device.
>
> Reported-by: Marc Dionne<[email protected]>
> Tested-by: Marc Dionne<[email protected]>
> Signed-off-by: Xiaotian Feng<[email protected]>
> Cc: Thomas Gleixner<[email protected]>
> Cc: Magnus Damm<[email protected]>
> Cc: H Hartley Sweeten<[email protected]>
> ---
> kernel/time/clockevents.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 6f740d9..0223d83 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -259,7 +259,7 @@ void clockevents_notify(unsigned long reason, void *arg)
> cpu = *((int *)arg);
> list_for_each_entry_safe(dev, tmp,&clockevent_devices, list) {
> if (cpumask_test_cpu(cpu, dev->cpumask)&&
> - cpumask_weight(dev->cpumask) == 1) {
> + !tick_is_broadcast_device(dev)) {
> BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
> list_del(&dev->list);
> }
>

2010-01-12 13:20:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On Thu, 7 Jan 2010, Xiaotian Feng wrote:

> Marc reported BUG during shutdown, after debugging, kernel is trying
> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>
> The root cause for this bug is that in clockevents_notify,
> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a

Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
a non boot cpu ?

The broadcast device is not a per cpu device and the cpumask should
not only contain the CPU which is shut down !

The patch is papering over the real problem.

Marc, can you please apply the following debug patch and provide the
dmesg outputs from boot and shutdown ?

Thanks,

tglx
---

Index: linux-2.6-tip/kernel/time/clockevents.c
===================================================================
--- linux-2.6-tip.orig/kernel/time/clockevents.c
+++ linux-2.6-tip/kernel/time/clockevents.c
@@ -186,7 +186,7 @@ void clockevents_register_device(struct
BUG_ON(!dev->cpumask);

raw_spin_lock_irqsave(&clockevents_lock, flags);
-
+ printk(KERN_ERR "CE register %p %s\n", dev, dev->name);
list_add(&dev->list, &clockevent_devices);
clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
clockevents_notify_released();
@@ -220,6 +220,7 @@ void clockevents_exchange_device(struct
* released list and do a notify add later.
*/
if (old) {
+ printk(KERN_INFO "CE Release %p %s\n", old, old->name);
clockevents_set_mode(old, CLOCK_EVT_MODE_UNUSED);
list_del(&old->list);
list_add(&old->list, &clockevents_released);
@@ -260,6 +261,13 @@ void clockevents_notify(unsigned long re
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
cpumask_weight(dev->cpumask) == 1) {
+ if (dev->mode != CLOCK_EVT_MODE_UNUSED) {
+ printk(KERN_INFO
+ "CE Remove %p %s bc: %d\n",
+ dev, dev->name,
+ tick_is_broadcast_device(dev));
+ continue;
+ }
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
list_del(&dev->list);
}
Index: linux-2.6-tip/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6-tip.orig/kernel/time/tick-broadcast.c
+++ linux-2.6-tip/kernel/time/tick-broadcast.c
@@ -72,6 +72,8 @@ int tick_check_broadcast_device(struct c
(dev->features & CLOCK_EVT_FEAT_C3STOP))
return 0;

+ printk(KERN_INFO "CE set broadcast %p %s\n", dev, dev->name);
+
clockevents_exchange_device(NULL, dev);
tick_broadcast_device.evtdev = dev;
if (!cpumask_empty(tick_get_broadcast_mask()))

2010-01-13 01:28:32

by Marc Dionne

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On Tue, Jan 12, 2010 at 8:20 AM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>
>> Marc reported BUG during shutdown, after debugging, kernel is trying
>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>
>> The root cause for this bug is that in clockevents_notify,
>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>
> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> a non boot cpu ?
>
> The broadcast device is not a per cpu device and the cpumask should
> not only contain the CPU which is shut down !
>
> The patch is papering over the real problem.
>
> Marc, can you please apply the following debug patch and provide the
> dmesg outputs from boot and shutdown ?
>
> Thanks,
>
> ? ? ? ?tglx

Full dmesg output from boot is attached.

On shutdown I wrote down the message just before the BUG by hand
(after modifying your patch a little to let it BUG):

CE Remove ffffffff81a44100 hpet bc: 1

Marc


Attachments:
boot_ce (43.15 kB)

2010-01-13 01:49:09

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>
>> Marc reported BUG during shutdown, after debugging, kernel is trying
>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>
>> The root cause for this bug is that in clockevents_notify,
>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>
> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> a non boot cpu ?
>
> The broadcast device is not a per cpu device and the cpumask should
> not only contain the CPU which is shut down !

At least for hpet broadcast dev, it's dev->cpumask is only contain the
CPU which it is initialized from.
And for broadcast device, kernel is using tick_broadcast_mask not
dev->cpumask, right?

>
> The patch is papering over the real problem.
>
> Marc, can you please apply the following debug patch and provide the
> dmesg outputs from boot and shutdown ?
>
> Thanks,
>
> tglx
> ---
>
> Index: linux-2.6-tip/kernel/time/clockevents.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/time/clockevents.c
> +++ linux-2.6-tip/kernel/time/clockevents.c
> @@ -186,7 +186,7 @@ void clockevents_register_device(struct
> BUG_ON(!dev->cpumask);
>
> raw_spin_lock_irqsave(&clockevents_lock, flags);
> -
> + printk(KERN_ERR "CE register %p %s\n", dev, dev->name);
> list_add(&dev->list,&clockevent_devices);
> clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
> clockevents_notify_released();
> @@ -220,6 +220,7 @@ void clockevents_exchange_device(struct
> * released list and do a notify add later.
> */
> if (old) {
> + printk(KERN_INFO "CE Release %p %s\n", old, old->name);
> clockevents_set_mode(old, CLOCK_EVT_MODE_UNUSED);
> list_del(&old->list);
> list_add(&old->list,&clockevents_released);
> @@ -260,6 +261,13 @@ void clockevents_notify(unsigned long re
> list_for_each_entry_safe(dev, tmp,&clockevent_devices, list) {
> if (cpumask_test_cpu(cpu, dev->cpumask)&&
> cpumask_weight(dev->cpumask) == 1) {
> + if (dev->mode != CLOCK_EVT_MODE_UNUSED) {
> + printk(KERN_INFO
> + "CE Remove %p %s bc: %d\n",
> + dev, dev->name,
> + tick_is_broadcast_device(dev));
> + continue;
> + }
> BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
> list_del(&dev->list);
> }
> Index: linux-2.6-tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/time/tick-broadcast.c
> +++ linux-2.6-tip/kernel/time/tick-broadcast.c
> @@ -72,6 +72,8 @@ int tick_check_broadcast_device(struct c
> (dev->features& CLOCK_EVT_FEAT_C3STOP))
> return 0;
>
> + printk(KERN_INFO "CE set broadcast %p %s\n", dev, dev->name);
> +
> clockevents_exchange_device(NULL, dev);
> tick_broadcast_device.evtdev = dev;
> if (!cpumask_empty(tick_get_broadcast_mask()))
>

2010-01-13 22:09:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On Wed, 13 Jan 2010, Xiaotian Feng wrote:
> On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> > On Thu, 7 Jan 2010, Xiaotian Feng wrote:
> >
> > > Marc reported BUG during shutdown, after debugging, kernel is trying
> > > to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
> > >
> > > The root cause for this bug is that in clockevents_notify,
> > > "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> >
> > Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> > a non boot cpu ?
> >
> > The broadcast device is not a per cpu device and the cpumask should
> > not only contain the CPU which is shut down !
>
> At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
> which it is initialized from.

Which is fundamentaly wrong and the root cause of the problem. I'll
have a look tomorrow morning when my brain is more awake than now.

> And for broadcast device, kernel is using tick_broadcast_mask not
> dev->cpumask, right?

No, tick_broadcast_mask is the bitmask which tells us which cpus get
the broadcast IPI.

Thanks,

tglx

2010-01-14 11:44:47

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On 01/14/2010 06:08 AM, Thomas Gleixner wrote:
> On Wed, 13 Jan 2010, Xiaotian Feng wrote:
>> On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
>>> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>>>
>>>> Marc reported BUG during shutdown, after debugging, kernel is trying
>>>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>>>
>>>> The root cause for this bug is that in clockevents_notify,
>>>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>>>
>>> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
>>> a non boot cpu ?
>>>
>>> The broadcast device is not a per cpu device and the cpumask should
>>> not only contain the CPU which is shut down !
>>
>> At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
>> which it is initialized from.
>
> Which is fundamentaly wrong and the root cause of the problem. I'll
> have a look tomorrow morning when my brain is more awake than now.

hpet_legacy_clockevent_register is trying to register new CE, but
replace failed,
then in tick_check_new_device -> tick_check_broadcast_device, the legacy
hpet CE
was registered as multicast device, but its dev->cpumask is cpumask of
smp_processor_id().

on my system its dev->cpumask is cpumask of 0, but in Marc's,
dev->cpumask is cpumask of 4.
So when kernel is trying to offline cpu 4, the broadcast hpet is removed.

>
>> And for broadcast device, kernel is using tick_broadcast_mask not
>> dev->cpumask, right?
>
> No, tick_broadcast_mask is the bitmask which tells us which cpus get
> the broadcast IPI.
>
> Thanks,
>
> tglx
>

2010-01-14 11:53:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] clockevent: don't remove broadcast device when cpu is dead

On Thu, 14 Jan 2010, Xiaotian Feng wrote:
> On 01/14/2010 06:08 AM, Thomas Gleixner wrote:
> > On Wed, 13 Jan 2010, Xiaotian Feng wrote:
> > > On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> > > > On Thu, 7 Jan 2010, Xiaotian Feng wrote:
> > > >
> > > > > Marc reported BUG during shutdown, after debugging, kernel is trying
> > > > > to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
> > > > >
> > > > > The root cause for this bug is that in clockevents_notify,
> > > > > "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> > > >
> > > > Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> > > > a non boot cpu ?
> > > >
> > > > The broadcast device is not a per cpu device and the cpumask should
> > > > not only contain the CPU which is shut down !
> > >
> > > At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
> > > which it is initialized from.
> >
> > Which is fundamentaly wrong and the root cause of the problem. I'll
> > have a look tomorrow morning when my brain is more awake than now.
>
> hpet_legacy_clockevent_register is trying to register new CE, but replace
> failed,
> then in tick_check_new_device -> tick_check_broadcast_device, the legacy hpet
> CE
> was registered as multicast device, but its dev->cpumask is cpumask of
> smp_processor_id().
>
> on my system its dev->cpumask is cpumask of 0, but in Marc's, dev->cpumask is
> cpumask of 4.
> So when kernel is trying to offline cpu 4, the broadcast hpet is removed.

I know, but the point is that a broadcast device should not be bound
to the CPU which is registering it. Working on a fix.

Thanks,

tglx

2010-01-18 13:49:10

by Xiaotian Feng

[permalink] [raw]
Subject: [tip:timers/urgent] clockevent: Don't remove broadcast device when cpu is dead

Commit-ID: ea9d8e3f45404d411c00ae67b45cc35c58265bb7
Gitweb: http://git.kernel.org/tip/ea9d8e3f45404d411c00ae67b45cc35c58265bb7
Author: Xiaotian Feng <[email protected]>
AuthorDate: Thu, 7 Jan 2010 11:22:44 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 18 Jan 2010 14:44:50 +0100

clockevent: Don't remove broadcast device when cpu is dead

Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne <[email protected]>
Tested-by: Marc Dionne <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---
kernel/time/clockevents.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..d7395fd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -259,7 +259,8 @@ void clockevents_notify(unsigned long reason, void *arg)
cpu = *((int *)arg);
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
- cpumask_weight(dev->cpumask) == 1) {
+ cpumask_weight(dev->cpumask) == 1 &&
+ !tick_is_broadcast_device(dev)) {
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
list_del(&dev->list);
}