Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754533AbaBFBTu (ORCPT ); Wed, 5 Feb 2014 20:19:50 -0500
Received: from aserp1040.oracle.com ([141.146.126.69]:32109 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752390AbaBFBTt convert rfc822-to-8bit (ORCPT );
	Wed, 5 Feb 2014 20:19:49 -0500
MIME-Version: 1.0
Message-ID: <7c7623ad-516f-4615-8923-c64ea203636c@default>
Date: Wed, 5 Feb 2014 17:18:16 -0800 (PST)
From: Boris Ostrovsky
Subject: Re: [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration
X-Mailer: Zimbra on Oracle Beehive
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Content-Disposition: inline
X-Source-IP: acsinet22.oracle.com [141.146.126.238]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

----- srivatsa.bhat@linux.vnet.ibm.com wrote:

> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	put_online_cpus();
>
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
>
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of callback
> registration:
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	put_online_cpus();
>
> A hotplug operation that occurs between registering the notifier and calling
> get_online_cpus(), won't disrupt anything, because the code takes care to
> perform the memory allocations only once.
>
> So reorganize the balloon code in xen this way to fix the deadlock with
> callback registration.
>
> Cc: Konrad Rzeszutek Wilk
> Cc: Boris Ostrovsky
> Cc: David Vrabel
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Srivatsa S. Bhat
> ---
>
>  drivers/xen/balloon.c |   35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
>  	}
>  }
>
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> +	if (per_cpu(balloon_scratch_page, cpu) != NULL)
> +		return 0;
> +
> +	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> +	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> +		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +
>  static int balloon_cpu_notify(struct notifier_block *self,
>  				    unsigned long action, void *hcpu)
>  {
>  	int cpu = (long)hcpu;
>  	switch (action) {
>  	case CPU_UP_PREPARE:
> -		if (per_cpu(balloon_scratch_page, cpu) != NULL)
> -			break;
> -		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		if (alloc_balloon_scratch_page(cpu))
>  			return NOTIFY_BAD;
> -		}
>  		break;
>  	default:
>  		break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
>  		return -ENODEV;
>
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		for_each_online_cpu(cpu)
> -		{
> -			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		register_cpu_notifier(&balloon_cpu_notifier);
> +
> +		get_online_cpus();
> +		for_each_online_cpu(cpu) {
> +			if (alloc_balloon_scratch_page(cpu)) {
> +				put_online_cpus();
>  				return -ENOMEM;

Not that the original code was doing a particularly thorough job of
cleaning up on allocation failure, but if it couldn't get memory it
would not register the notifier. So perhaps you should unregister it
before returning here.

I am also not sure how we were susceptible to the deadlock here, since
we didn't call get_online_cpus(). (We probably should have, but then
the commit description should say so.)

-boris

>  			}
>  		}
> -		register_cpu_notifier(&balloon_cpu_notifier);
> +		put_online_cpus();
>  	}
>
>  	pr_info("Initialising balloon driver\n");
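
For illustration only, here is a minimal userspace sketch of the
ordering discussed in this thread: register first, then initialize the
CPUs that are already online with an idempotent allocator, and undo the
registration if an allocation fails. It is not kernel code. Every name
in it (SKETCH_NR_CPUS, sketch_scratch_page, sketch_fail_on_cpu,
sketch_notifier_registered, sketch_alloc_scratch_page,
sketch_balloon_init) is a hypothetical stand-in, the boolean flag merely
models register_cpu_notifier()/unregister_cpu_notifier(), and the
get_online_cpus()/put_online_cpus() locking is not modelled at all.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4		/* hypothetical fixed CPU count */

/* Hypothetical stand-ins for the per-CPU pointer and the notifier state. */
void *sketch_scratch_page[SKETCH_NR_CPUS];
bool sketch_notifier_registered;
int sketch_fail_on_cpu = -1;		/* force a failure for this CPU, -1 = never */

/*
 * Idempotent allocation: a second call for the same CPU is a no-op, so it
 * does not matter whether the "notifier" or the init loop gets there first.
 */
int sketch_alloc_scratch_page(int cpu)
{
	if (sketch_scratch_page[cpu] != NULL)
		return 0;

	if (cpu != sketch_fail_on_cpu)
		sketch_scratch_page[cpu] = malloc(64);	/* models alloc_page() */

	return sketch_scratch_page[cpu] ? 0 : -ENOMEM;
}

int sketch_balloon_init(void)
{
	int cpu;

	sketch_notifier_registered = true;	/* models register_cpu_notifier() */

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (sketch_alloc_scratch_page(cpu)) {
			/*
			 * Models the suggested unregister_cpu_notifier() on the
			 * error path. Pages already allocated are left alone,
			 * as in the patch under review.
			 */
			sketch_notifier_registered = false;
			return -ENOMEM;
		}
	}

	return 0;
}
```

With sketch_fail_on_cpu left at -1, sketch_balloon_init() returns 0 and
leaves the flag set. With it set to a valid CPU number, the function
returns -ENOMEM and clears the flag, which is the behaviour the review
comment asks for on the error path.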