Date: Thu, 7 Sep 2017 19:45:05 +0200
From: Peter Zijlstra
To: Prateek Sood
Cc: tj@kernel.org, lizefan@huawei.com, cgroups@vger.kernel.org, mingo@kernel.org, longman@redhat.com, boqun.feng@gmail.com, tglx@linutronix.de, linux-kernel@vger.kernel.org, sramana@codeaurora.org
Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock
Message-ID: <20170907174505.GF17526@worktop.programming.kicks-ass.net>
In-Reply-To: <1504792583-10424-1-git-send-email-prsood@codeaurora.org>

On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
> Remove a circular dependency deadlock in a scenario where a CPU hotplug
> operation is in progress while a cgroup/cpuset update triggered from
> userspace is running.
>
> Process A => kthreadd => Process B => Process C => Process A
>
> Process A
> cpu_subsys_offline();
> cpu_down();
> _cpu_down();
> percpu_down_write(&cpu_hotplug_lock); //held
> cpuhp_invoke_callback();
> workqueue_offline_cpu();
> wq_update_unbound_numa();
> kthread_create_on_node();
> wake_up_process(); //wakeup kthreadd

TJ, I'm puzzled. Why would we need to spawn new threads to update NUMA
affinity when taking a CPU out? That doesn't make sense to me; we can
either shrink the affinity of an existing thread, or kill off a thread
entirely if its mask becomes empty. But why spawn a new one?
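The alternative suggested above (shrink the mask, or stop the worker when
nothing is left) can be sketched generically. This is a hypothetical policy
model, not the workqueue code; the function name and mask representation are
made up for illustration:

```python
def update_worker_on_cpu_down(worker_mask, cpu):
    """Hypothetical policy sketch: drop the offlined CPU from a worker's
    affinity mask. An empty result means the worker has nowhere left to
    run and should be stopped, rather than a new thread being spawned."""
    new_mask = worker_mask - {cpu}
    if not new_mask:
        return None       # no CPUs left: kill off the worker
    return new_mask       # otherwise: just shrink its affinity

print(update_worker_on_cpu_down({0, 1}, 1))  # {0}
print(update_worker_on_cpu_down({1}, 1))     # None
```

Either outcome avoids kthread_create_on_node() on the hotplug path entirely,
which is the point of the question above.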
> flush_work();
> wait_for_completion();
>
> kthreadd
> kthreadd();
> kernel_thread();
> do_fork();
> copy_process();
> percpu_down_read(&cgroup_threadgroup_rwsem); //waiting

So this is what will eventually issue the complete() that lets A continue.

> Process B
> kernfs_fop_write();
> cgroup_file_write();
> cgroup_procs_write();
> percpu_down_write(&cgroup_threadgroup_rwsem); //held
> cgroup_attach_task();
> cgroup_migrate();
> cgroup_migrate_execute();
> cpuset_can_attach();
> mutex_lock(&cpuset_mutex); //waiting
>
> Process C
> kernfs_fop_write();
> cgroup_file_write();
> cpuset_write_resmask();
> mutex_lock(&cpuset_mutex); //held
> update_cpumask();
> update_cpumasks_hier();
> rebuild_sched_domains_locked();
> get_online_cpus();
> percpu_down_read(&cpu_hotplug_lock); //waiting

So the whole thing looks like:

	A		B		C		D

	L(hotplug)	L(threadgroup)	L(cpuset)	L(threadgroup)
	WFC(c)		L(cpuset)	L(hotplug)	C(c)

Yes, inverting the cpuset and hotplug lock order would break that chain,
but I'm still wondering why the workqueue needs to spawn threads on CPU
down.
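The four-way dependency above is exactly a cycle in a wait-for graph, the
same structure lockdep reasons about. A minimal sketch (task labels and
edge comments are mine, mapping A=cpu_down, B=cgroup_procs_write,
C=cpuset_write_resmask, D=kthreadd's fork path):

```python
# Wait-for graph for the reported deadlock; an edge waiter -> holder
# means "waiter is blocked until holder makes progress".
edges = {
    "A": ["D"],  # A: flush_work()/wait_for_completion() waits on kthreadd
    "D": ["B"],  # D: percpu_down_read(threadgroup_rwsem) blocked by writer B
    "B": ["C"],  # B: mutex_lock(cpuset_mutex) blocked by holder C
    "C": ["A"],  # C: percpu_down_read(cpu_hotplug_lock) blocked by writer A
}

def find_cycle(graph):
    """DFS cycle detection; returns one cycle as a node list, or None."""
    def dfs(node, stack, seen):
        if node in stack:
            return stack[stack.index(node):] + [node]
        if node in seen:
            return None
        seen.add(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, stack + [node], seen)
            if cycle:
                return cycle
        return None
    for start in graph:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

print(find_cycle(edges))  # ['A', 'D', 'B', 'C', 'A']
```

Removing any one edge breaks the cycle; the patch under discussion removes
the C -> A edge by not taking the hotplug lock under cpuset_mutex, while
the question above is whether the A -> D edge (spawning on CPU down) needs
to exist at all.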