Date: Mon, 3 Jun 2019 14:25:02 +0530
From: Gautham R Shenoy
To: "Gautham R. Shenoy"
Cc: Paul Mackerras, Nicholas Piggin, Michael Ellerman,
	"Aneesh Kumar K.V", linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
Reply-To: ego@linux.vnet.ibm.com
References: <1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com>
In-Reply-To: <1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com>
Message-Id: <20190603085502.GA23270@in.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Wed, May 15, 2019 at 01:15:52PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy"
>
> The calls to arch_add_memory()/arch_remove_memory() are always made
> with the read-side cpu_hotplug_lock acquired via
> memory_hotplug_begin().
> On pSeries, arch_add_memory()/arch_remove_memory() eventually call
> resize_hpt() which in turn calls stop_machine() which acquires the
> read-side cpu_hotplug_lock again, thereby resulting in the recursive
> acquisition of this lock.

A clarification regarding why we hadn't observed this problem earlier.

In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of a
writer, which in our case is a CPU-hotplug operation), simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.

However, we can hit this problem in practice if there is a concurrent
CPU-hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second, recursive read to
block until the writer finishes. The writer, in turn, is blocked
because the first read still holds the lock. Thus neither the reader
nor the writer makes any progress, which blocks both the CPU-hotplug
and the memory-hotplug operations.

    Memory-Hotplug                          CPU-Hotplug
    CPU 0                                   CPU 1
    ------                                  ------

1. down_read(cpu_hotplug_lock.rw_sem)
   [mem_hotplug_begin()]
                                        2. down_write(cpu_hotplug_lock.rw_sem)
                                           [cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]

>
> Lockdep complains as follows in these code-paths.
>
> swapper/0/1 is trying to acquire lock:
> (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
>
> but task is already holding lock:
> (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(cpu_hotplug_lock.rw_sem);
>   lock(cpu_hotplug_lock.rw_sem);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by swapper/0/1:
>  #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
>  #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
>  #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
>
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
> Call Trace:
> [c0000000feb03150] [c000000000e32bd4] dump_stack+0xe8/0x164 (unreliable)
> [c0000000feb031a0] [c00000000020d6c0] __lock_acquire+0x1110/0x1c70
> [c0000000feb03320] [c00000000020f080] lock_acquire+0x240/0x290
> [c0000000feb033e0] [c00000000017f554] cpus_read_lock+0x64/0xf0
> [c0000000feb03420] [c00000000029ebac] stop_machine+0x2c/0x60
> [c0000000feb03460] [c0000000000d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0
> [c0000000feb03500] [c0000000000788d0] resize_hpt_for_hotplug+0x70/0xd0
> [c0000000feb03570] [c000000000e5d278] arch_add_memory+0x58/0xfc
> [c0000000feb03610] [c0000000003553a8] devm_memremap_pages+0x5e8/0x8f0
> [c0000000feb036c0] [c0000000009c2394] pmem_attach_disk+0x764/0x830
> [c0000000feb037d0] [c0000000009a7c38] nvdimm_bus_probe+0x118/0x240
> [c0000000feb03860] [c000000000968500] really_probe+0x230/0x4b0
> [c0000000feb038f0] [c000000000968aec] driver_probe_device+0x16c/0x1e0
> [c0000000feb03970] [c000000000968ca8] __driver_attach+0x148/0x1b0
> [c0000000feb039f0] [c0000000009650b0] bus_for_each_dev+0x90/0x130
> [c0000000feb03a50] [c000000000967dd4] driver_attach+0x34/0x50
> [c0000000feb03a70] [c000000000967068] bus_add_driver+0x1a8/0x360
> [c0000000feb03b00] [c00000000096a498] driver_register+0x108/0x170
> [c0000000feb03b70] [c0000000009a7400] __nd_driver_register+0xd0/0xf0
> [c0000000feb03bd0] [c00000000128aa90] nd_pmem_driver_init+0x34/0x48
> [c0000000feb03bf0] [c000000000010a10] do_one_initcall+0x1e0/0x45c
> [c0000000feb03cd0] [c00000000122462c] kernel_init_freeable+0x540/0x64c
> [c0000000feb03db0] [c00000000001110c] kernel_init+0x2c/0x160
> [c0000000feb03e20] [c00000000000bed4] ret_from_kernel_thread+0x5c/0x68
>
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>      with cpu_hotplug_lock held.
>
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>      as a consequence of 1)
>
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>      with cpu_hotplug_lock held.
>
> Reported-by: Aneesh Kumar K.V
> Signed-off-by: Gautham R. Shenoy
> ---
> v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt()
>            Updated the commit-log with the full backtrace.
> v1 -> v2 : Rebased against powerpc/next instead of linux/master
>
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++++++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
>
>  static int hpt_order_set(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!mmu_hash_ops.resize_hpt)
>  		return -ENODEV;
>
> -	return mmu_hash_ops.resize_hpt(val);
> +	cpus_read_lock();
> +	ret = mmu_hash_ops.resize_hpt(val);
> +	cpus_read_unlock();
> +
> +	return ret;
>  }
>
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..557d592 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>  	return 0;
>  }
>
> -/* Must be called in user context */
> +/*
> + * Must be called in process context. The caller must hold the
> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>  	struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>
>  	t1 = ktime_get();
>
> -	rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> +	rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +				     &state, NULL);
>
>  	t2 = ktime_get();
>
> --
> 1.9.4
>
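
For illustration only, the three-step scenario from the clarification and the effect of the patch can be replayed in a small user-space sketch. This is a toy, single-threaded model and not kernel code: every toy_* name is hypothetical, "would block" is represented by a false return value, and the only behaviour assumed is the one described in this mail (a read succeeds on the fast path when no writer is pending, and a pending writer makes later reads wait).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lock state; all toy_* names are hypothetical. */
struct toy_rwsem {
	int  readers;        /* read-side acquisitions currently held */
	bool writer_waiting; /* a writer has announced itself and is waiting */
};

/* Read attempt: the fast path succeeds only when no writer is pending. */
static bool toy_read_trylock(struct toy_rwsem *s)
{
	if (s->writer_waiting)
		return false;	/* slow path: a real reader would block here */
	s->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Write attempt: announce the writer; it succeeds only with no readers. */
static bool toy_write_trylock(struct toy_rwsem *s)
{
	s->writer_waiting = true;
	return s->readers == 0;
}

/*
 * Replay steps 1-3 of the scenario. Returns true if the memory-hotplug
 * path can run to completion, false if it would block.
 *
 * concurrent_writer: a CPU-hotplug writer shows up at step 2.
 * callee_relocks:    the inner call takes the read lock again (the
 *                    behaviour before the patch); false models the
 *                    "caller already holds the lock" convention.
 */
bool toy_memory_hotplug(bool concurrent_writer, bool callee_relocks)
{
	struct toy_rwsem sem = { 0, false };

	if (!toy_read_trylock(&sem))		/* step 1: outer read */
		return false;

	if (concurrent_writer)
		(void)toy_write_trylock(&sem);	/* step 2: writer waits for reader */

	if (callee_relocks) {
		if (!toy_read_trylock(&sem))	/* step 3: stuck behind the writer */
			return false;		/* neither side can make progress */
		toy_read_unlock(&sem);
	}

	toy_read_unlock(&sem);			/* writer could proceed from here */
	return true;
}
```

In this model, toy_memory_hotplug(false, true) completes, which matches why the recursion went unnoticed without a concurrent writer; toy_memory_hotplug(true, true) reports that it would block; and toy_memory_hotplug(true, false), which mirrors the stop_machine_cpuslocked() convention, completes even with a writer waiting.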