Date: Tue, 2 Jan 2018 08:16:56 -0800
From: Tejun Heo
To: Prateek Sood
Cc: Peter Zijlstra, avagin@gmail.com, mingo@kernel.org,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    sramana@codeaurora.org, "Paul E. McKenney"
Subject: Re: [PATCH] cgroup/cpuset: fix circular locking dependency
Message-ID: <20180102161656.GD3668920@devbig577.frc2.facebook.com>
References: <1511868946-23959-1-git-send-email-prsood@codeaurora.org>
    <623f214b-8b9a-f967-7a3d-ca9c06151267@codeaurora.org>
    <20171204202219.GF2421075@devbig577.frc2.facebook.com>
    <20171204225825.GP2421075@devbig577.frc2.facebook.com>
    <20171204230117.GF20227@worktop.programming.kicks-ass.net>
    <20171211152059.GH2421075@devbig577.frc2.facebook.com>
    <20171213160617.GQ3919388@devbig577.frc2.facebook.com>
    <9843d982-d201-8702-2e4e-0541a4d96b53@codeaurora.org>
In-Reply-To: <9843d982-d201-8702-2e4e-0541a4d96b53@codeaurora.org>

Hello,

On Fri, Dec 29, 2017 at 02:07:16AM +0530, Prateek Sood wrote:
> task T is waiting for cpuset_mutex acquired
> by kworker/2:1
>
> sh ==> cpuhp/2 ==> kworker/2:1 ==> sh
>
> kworker/2:3 ==> kthreadd ==> Task T ==> kworker/2:1
>
> It seems that my earlier patch set should fix this scenario:
> 1) Inverting the locking order of cpuset_mutex and cpu_hotplug_lock.
> 2) Making cpuset hotplug work synchronous.
>
> Could you please share your feedback?

Hmm... this can also be resolved by adding WQ_MEM_RECLAIM to the
synchronize_rcu workqueue, right?  Given the widespread use of
synchronize_rcu() and friends, maybe that's the right solution, or at
least something we also need to do, for this particular deadlock?

Again, I don't have anything against making the domain-rebuilding part
of cpuset operations synchronous, and these tricky deadlock scenarios
do indicate that doing so would probably be beneficial.  That said,
though, these scenarios seem more like manifestations of other problems
exposed through the kthreadd dependency than anything else.

Thanks.

--
tejun
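
For illustration only, here is a minimal sketch of the WQ_MEM_RECLAIM
direction suggested above, assuming a dedicated workqueue for RCU
grace-period work; the workqueue name rcu_gp_wq and the init hook are
assumptions for this sketch, not the actual upstream change:

/*
 * Sketch (not the actual patch): create the workqueue backing
 * synchronize_rcu() and friends with WQ_MEM_RECLAIM so it gets a
 * dedicated rescuer kthread up front.  Queued RCU work can then make
 * forward progress without waiting on kthreadd to spawn a new worker.
 */
#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *rcu_gp_wq;

static int __init rcu_gp_wq_init(void)
{
	/* WQ_MEM_RECLAIM guarantees a rescuer thread exists from the start. */
	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
	if (!rcu_gp_wq)
		return -ENOMEM;
	return 0;
}
early_initcall(rcu_gp_wq_init);

Work items queued on such a workqueue no longer depend on kthreadd to
create a worker, which is the dependency that shows up in the
kworker/2:3 ==> kthreadd chain quoted above.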