Date: Thu, 10 Dec 2009 15:35:10 +0100 (CET)
From: Thomas Gleixner <tglx@linutronix.de>
To: Xiaotian Feng <dfeng@redhat.com>
cc: damm@igel.co.jp, hsweeten@visionengravers.com, akpm@linux-foundation.org,
       venkatesh.pallipadi@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] clockevents: fix clockevent_devices list corruption
 after cpu hotplug
In-Reply-To: <1260450459-18072-1-git-send-email-dfeng@redhat.com>
Message-ID: <alpine.LFD.2.00.0912101431380.3089@localhost.localdomain>
References: <1260450459-18072-1-git-send-email-dfeng@redhat.com>
User-Agent: Alpine 2.00 (LFD 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3933
Lines: 98

On Thu, 10 Dec 2009, Xiaotian Feng wrote:
> I've met a list_del corruption, which was reported in
> http://lkml.org/lkml/2009/11/27/45. But no response, so I try to debug it
> by myself.
> 
> After I added some printks to show all elements in clockevent_devices, I
> found kernel hangs when I tried to resume from s2ram.
> 
> In clockevents_register_device, clockevents_do_notify ADD is always followed
> by clockevents_notify_released. Although clockevents_do_notify ADD will use
> tick_check_new_device to add new devices and replace old devices to the
> clockevents_released list, clockevents_notify_released add them back to
> clockevent_devices list.
> 
> My system is Quad-Core x86_64, with apic and hpet enables, after boot up,
> the elements in clockevent_devices list is :
> clockevent_device->lapic(3)->hpet5(3)->lapic(2)->hpet4(2)->lapic(1)->hpet3(1)-
>   ->lapic(0)->hpet2(0)->hpet(0)
> * () means cpu id
> 
> But active clock_event_device is hpet2,hpet3,hpet4,hpet5. Then at s2ram stage,
> cpu 1,2,3 is down, then notify CLOCK_EVT_NOTIFY_CPU_DEAD will calls tick_shutdown,
> then hpet2,hpet3,hpet4,hpet5 was deleted from clockevent_device list.
> So after s2ram, elements in clockevent_device list is:
> clockevent_device->lapic(3)->lapic(2)->lapic(1)->lapic(0)->hpet2(0)->hpet(0)
> 
> Then at resume stage, cpu 1,2,3 is up, it will register lapic again, and then
> perform list_add lapic on clockevent_device list, e.g. list_add lapic(1) on
> above list, lapic will move to the clockevent_device->next, but lapic(2)->next
> is still point to lapic(1), the list is circular and corrupted then. 

Great detective work !

> This patchset aims to fixes above behaviour by:
>        - on clockevents_register_device, if notify ADD success, move new devices
>          to the clockevent_devices list, otherwise move to clockevents_released
>          list.
>        - on clockevents_notify_released, same behaviour as above.
>        - on clockevents_notify CPU_DEAD, remove related devices on dead cpu from
>          clockevents_released list.
> 
> It makes sure that only active devices on each cpu is on clockevent_devices list.
> With this patchset, the list_del corruption disappeared, and suspend/resume, cpu
> hotplug works fine on my system. 

I'm not happy about that churn. Why don't we simply scan the
clockevent_devices list for leftovers of the dead CPU ?

Untested patch below solves the same problem.

Thanks,

	tglx
----
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 20a8920..5dd857f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -238,8 +238,9 @@ void clockevents_exchange_device(struct clock_event_device *old,
  */
 void clockevents_notify(unsigned long reason, void *arg)
 {
-	struct list_head *node, *tmp;
+	struct clock_event_device *dev, *tmp;
 	unsigned long flags;
+	int cpu;
 
 	spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
@@ -250,8 +251,19 @@ void clockevents_notify(unsigned long reason, void *arg)
 		 * Unregister the clock event devices which were
 		 * released from the users in the notify chain.
 		 */
-		list_for_each_safe(node, tmp, &clockevents_released)
-			list_del(node);
+		list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+			list_del(&dev->list);
+		/*
+		 * Now check whether the CPU has left unused per cpu devices
+		 */
+		cpu = *((int *)arg);
+		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
+			if (cpumask_test_cpu(cpu, dev->cpumask) &&
+			    cpumask_weight(dev->cpumask) == 1) {
+				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
+				list_del(&dev->list);
+			}
+		}
 		break;
 	default:
 		break;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/