Date: Thu, 7 Sep 2017 19:51:07 +0200
From: Peter Zijlstra
To: Prateek Sood
Cc: tj@kernel.org, lizefan@huawei.com, cgroups@vger.kernel.org,
	mingo@kernel.org, longman@redhat.com, boqun.feng@gmail.com,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	sramana@codeaurora.org
Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock
Message-ID: <20170907175107.GG17526@worktop.programming.kicks-ass.net>
References: <1504792583-10424-1-git-send-email-prsood@codeaurora.org>
In-Reply-To: <1504792583-10424-1-git-send-email-prsood@codeaurora.org>

On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
> Remove a circular dependency deadlock in a scenario where a CPU is
> being hotplugged while a cgroup/cpuset update is triggered from
> userspace.

You've forgotten to mention your solution to the deadlock, namely
inverting cpuset_mutex and cpu_hotplug_lock.

> Signed-off-by: Prateek Sood
> ---
>  kernel/cgroup/cpuset.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 2f4039b..60dc0ac 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -816,16 +816,15 @@ static int generate_sched_domains(cpumask_var_t **domains,
>   * 'cpus' is removed, then call this routine to rebuild the
>   * scheduler's dynamic sched domains.
>   *
> - * Call with cpuset_mutex held. Takes get_online_cpus().
>   */
> -static void rebuild_sched_domains_locked(void)
> +static void rebuild_sched_domains_cpuslocked(void)
>  {
>  	struct sched_domain_attr *attr;
>  	cpumask_var_t *doms;
>  	int ndoms;
>
> +	lockdep_assert_cpus_held();
>  	lockdep_assert_held(&cpuset_mutex);
> -	get_online_cpus();
>
>  	/*
>  	 * We have raced with CPU hotplug. Don't do anything to avoid
> @@ -833,27 +832,27 @@ static void rebuild_sched_domains_locked(void)
>  	 * Anyways, hotplug work item will rebuild sched domains.
>  	 */
>  	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> -		goto out;
> +		return;
>
>  	/* Generate domain masks and attrs */
>  	ndoms = generate_sched_domains(&doms, &attr);
>
>  	/* Have scheduler rebuild the domains */
>  	partition_sched_domains(ndoms, doms, attr);
> -out:
> -	put_online_cpus();
>  }
>  #else /* !CONFIG_SMP */
> -static void rebuild_sched_domains_locked(void)
> +static void rebuild_sched_domains_cpuslocked(void)
>  {
>  }
>  #endif /* CONFIG_SMP */
>
>  void rebuild_sched_domains(void)
>  {
> +	get_online_cpus();
>  	mutex_lock(&cpuset_mutex);
> -	rebuild_sched_domains_locked();
> +	rebuild_sched_domains_cpuslocked();
>  	mutex_unlock(&cpuset_mutex);
> +	put_online_cpus();
>  }

But if you invert these locks, the need for cpuset_hotplug_workfn()
goes away, at least for the CPU part, and we can make it synchronous
again. Yay!!

Also, I think new code should use cpus_read_lock() instead of
get_online_cpus().
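
To spell out the ordering the patch establishes: every path that wants
cpuset_mutex must now take the hotplug lock first, so a hotplug writer
can never end up waiting (via the flushed work item) on a task that
holds cpuset_mutex and is itself waiting for the hotplug lock. A
minimal sketch of a caller after this patch; the function name is
hypothetical, the body just mirrors rebuild_sched_domains() above:

	/* Hypothetical caller, for illustration only: the read side of
	 * cpu_hotplug_lock (via get_online_cpus()) is taken before
	 * cpuset_mutex, never the other way around, which is what
	 * breaks the cycle. */
	static void some_cpuset_update(void)
	{
		get_online_cpus();		/* cpu_hotplug_lock, read side */
		mutex_lock(&cpuset_mutex);	/* nests inside the hotplug lock */

		rebuild_sched_domains_cpuslocked();

		mutex_unlock(&cpuset_mutex);
		put_online_cpus();
	}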
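
And for the cpus_read_lock() point, the wrapper would then read
something like the sketch below; cpus_read_lock()/cpus_read_unlock()
are the names the hotplug-lock rework gave to what
get_online_cpus()/put_online_cpus() wrap, so this is a spelling
change, not a behavioural one:

	void rebuild_sched_domains(void)
	{
		cpus_read_lock();		/* same lock, newer spelling */
		mutex_lock(&cpuset_mutex);
		rebuild_sched_domains_cpuslocked();
		mutex_unlock(&cpuset_mutex);
		cpus_read_unlock();
	}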