Date: Tue, 2 Jan 2018 09:44:08 -0800
From: "Paul E. McKenney"
To: Tejun Heo
Cc: Prateek Sood, Peter Zijlstra, avagin@gmail.com, mingo@kernel.org,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    sramana@codeaurora.org
Subject: Re: [PATCH] cgroup/cpuset: fix circular locking dependency
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20180102174408.GM7829@linux.vnet.ibm.com>
In-Reply-To: <20180102161656.GD3668920@devbig577.frc2.facebook.com>
References: <1511868946-23959-1-git-send-email-prsood@codeaurora.org>
    <623f214b-8b9a-f967-7a3d-ca9c06151267@codeaurora.org>
    <20171204202219.GF2421075@devbig577.frc2.facebook.com>
    <20171204225825.GP2421075@devbig577.frc2.facebook.com>
    <20171204230117.GF20227@worktop.programming.kicks-ass.net>
    <20171211152059.GH2421075@devbig577.frc2.facebook.com>
    <20171213160617.GQ3919388@devbig577.frc2.facebook.com>
    <9843d982-d201-8702-2e4e-0541a4d96b53@codeaurora.org>
    <20180102161656.GD3668920@devbig577.frc2.facebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 02, 2018 at 08:16:56AM -0800, Tejun Heo wrote:
> Hello,
>
> On Fri, Dec 29, 2017 at 02:07:16AM +0530, Prateek Sood wrote:
> > task T is waiting for cpuset_mutex acquired
> > by kworker/2:1
> >
> > sh ==> cpuhp/2 ==> kworker/2:1 ==> sh
> >
> > kworker/2:3 ==> kthreadd ==> Task T ==> kworker/2:1
> >
> > It seems that my earlier patch set should fix this scenario:
> > 1) Inverting locking order of cpuset_mutex and cpu_hotplug_lock.
> > 2) Make cpuset hotplug work synchronous.
> >
> > Could you please share your feedback.
>
> Hmm... this can also be resolved by adding WQ_MEM_RECLAIM to the
> synchronize rcu workqueue, right?  Given the wide-spread usages of
> synchronize_rcu and friends, maybe that's the right solution, or at
> least something we also need to do, for this particular deadlock?

To make WQ_MEM_RECLAIM work, I need to dynamically allocate RCU's
workqueues, correct?  Or is there some way to mark a statically
allocated workqueue as WQ_MEM_RECLAIM after the fact?

I can dynamically allocate them, but I need to carefully investigate
boot-time use.  So if it is possible to be lazy, I do want to take the
easy way out.  ;-)

							Thanx, Paul

> Again, I don't have anything against making the domain rebuilding part
> of cpuset operations synchronous and these tricky deadlock scenarios
> do indicate that doing so would probably be beneficial.  That said,
> tho, these scenarios seem more of manifestations of other problems
> exposed through kthreadd dependency than anything else.
>
> Thanks.
>
> --
> tejun
>
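
The two wait chains quoted from Prateek's report can be checked
mechanically.  What follows is only a minimal sketch, assuming that each
"A ==> B" arrow reads as "A waits on B".  The names WAITS_ON and
find_cycle are hypothetical and exist purely for illustration; this is a
userspace model of the reported chains, not kernel code and not part of
any patch in this thread.

```python
# Wait-for edges transcribed from the two quoted chains.
# Assumption: "A ==> B" means "A is blocked waiting on B".
WAITS_ON = {
    "sh": ["cpuhp/2"],
    "cpuhp/2": ["kworker/2:1"],
    "kworker/2:1": ["sh"],
    "kworker/2:3": ["kthreadd"],
    "kthreadd": ["Task T"],
    "Task T": ["kworker/2:1"],
}


def find_cycle(graph, start):
    """Follow wait-for edges depth-first from start.

    Returns the first cycle found as a list whose first and last
    entries are the same task, or None if no cycle is reachable.
    """
    path = []        # tasks on the current chain, in order
    on_path = set()  # same tasks, for O(1) membership tests
    done = set()     # tasks fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # Chain has come back to a task it already contains.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


if __name__ == "__main__":
    for task in ("sh", "kworker/2:3"):
        print(task, "->", find_cycle(WAITS_ON, task))
```

Under that reading, starting from kworker/2:3 the walk passes through
kthreadd and Task T and then enters the same sh / cpuhp/2 / kworker/2:1
loop, which matches the description of Task T being stuck behind
kworker/2:1.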