Date: Tue, 10 Aug 2021 13:31:23 -0700
From: "Paul E. McKenney"
To: Mike Galbraith
Cc: Vlastimil Babka, Qian Cai, Andrew Morton, Christoph Lameter,
	David Rientjes, Pekka Enberg, Joonsoo Kim, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Sebastian Andrzej Siewior,
	Thomas Gleixner, Mel Gorman, Jesper Dangaard Brouer, Jann Horn
Subject: Re: [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
Message-ID: <20210810203123.GB190765@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20210805152000.12817-1-vbabka@suse.cz>
	<20210805152000.12817-30-vbabka@suse.cz>
	<0b36128c-3e12-77df-85fe-a153a714569b@quicinc.com>
	<2eb3cf340716c40f03a0a342ab40219b3d1de195.camel@gmx.de>
In-Reply-To: <2eb3cf340716c40f03a0a342ab40219b3d1de195.camel@gmx.de>

On Tue, Aug 10, 2021 at 01:47:42PM +0200, Mike Galbraith wrote:
> On Tue, 2021-08-10 at 11:03 +0200, Vlastimil Babka wrote:
> > On 8/9/21 3:41 PM, Qian Cai wrote:
> > > > 
> > > > +static DEFINE_MUTEX(flush_lock);
> > > > +static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
> > > > +
> > > >  static void flush_all(struct kmem_cache *s)
> > > >  {
> > > > -	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
> > > > +	struct slub_flush_work *sfw;
> > > > +	unsigned int cpu;
> > > > +
> > > > +	mutex_lock(&flush_lock);
> > > 
> > > Vlastimil, taking the lock here could trigger a warning during memory
> > > offline/online due to the locking order:
> > > 
> > > slab_mutex -> flush_lock
> > > 
> > > [   91.374541] WARNING: possible circular locking dependency detected
> > > [   91.381411] 5.14.0-rc5-next-20210809+ #84 Not tainted
> > > [   91.387149] ------------------------------------------------------
> > > [   91.394016] lsbug/1523 is trying to acquire lock:
> > > [   91.399406] ffff800018e76530 (flush_lock){+.+.}-{3:3}, at: flush_all+0x50/0x1c8
> > > [   91.407425] 
> > >                but task is already holding lock:
> > > [   91.414638] ffff800018e48468 (slab_mutex){+.+.}-{3:3}, at: slab_memory_callback+0x44/0x280
> > > [   91.423603] 
> > >                which lock already depends on the new lock.
> > > 
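For readers who do not parse lockdep reports often, the pattern being
reported is two code paths acquiring the same pair of locks in opposite
orders. The userspace sketch below is purely illustrative (it is not the
kernel's actual dependency chain, and the lock names are analogies); it
shows why such an inversion is flagged even on runs that happen not to
deadlock:

/*
 * Illustrative only: a minimal userspace analogue of a lock-order
 * inversion.  path_one() takes A then B, path_two() takes B then A.
 * If the two threads interleave badly, each waits forever on the lock
 * the other holds.  Lockdep warns about the inconsistent ordering
 * itself, so in the kernel the report appears even when a given run
 * happens to complete.
 *
 * Build: cc -pthread inversion.c -o inversion
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *path_one(void *unused)
{
	pthread_mutex_lock(&lock_a);	/* A -> B ordering */
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&lock_b);
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *path_two(void *unused)
{
	pthread_mutex_lock(&lock_b);	/* B -> A ordering: inverts path_one() */
	usleep(1000);
	pthread_mutex_lock(&lock_a);
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, path_one, NULL);
	pthread_create(&t2, NULL, path_two, NULL);
	pthread_join(t1, NULL);		/* may hang here if the deadlock hits */
	pthread_join(t2, NULL);
	printf("this run got lucky and completed\n");
	return 0;
}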
> > 
> > OK, managed to reproduce in qemu and this fixes it for me on top of
> > next-20210809. Could you test as well, as your testing might be more
> > comprehensive? I will format it as a fixup for the proper patch in the
> > series then.
> 
> As it appeared it should, moving cpu_hotplug_lock outside slab_mutex in
> kmem_cache_destroy() on top of that silenced the cpu offline gripe.

And this one got rid of the remainder of the deadlock, but gets me the
splat shown at the end of this message.  So some sort of middle ground
may be needed.

(Same reproducer as in my previous reply to Vlastimil.)

							Thanx, Paul

> ---
>  mm/slab_common.c | 2 ++
>  mm/slub.c        | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -502,6 +502,7 @@ void kmem_cache_destroy(struct kmem_cach
>  	if (unlikely(!s))
>  		return;
>  
> +	cpus_read_lock();
>  	mutex_lock(&slab_mutex);
>  
>  	s->refcount--;
> @@ -516,6 +517,7 @@ void kmem_cache_destroy(struct kmem_cach
>  	}
>  out_unlock:
>  	mutex_unlock(&slab_mutex);
> +	cpus_read_unlock();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4234,7 +4234,7 @@ int __kmem_cache_shutdown(struct kmem_ca
>  	int node;
>  	struct kmem_cache_node *n;
>  
> -	flush_all(s);
> +	flush_all_cpus_locked(s);
>  	/* Attempt to free all objects */
>  	for_each_kmem_cache_node(s, node, n) {
>  		free_partial(s, n);
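The hunk above points __kmem_cache_shutdown() at flush_all_cpus_locked(),
which this diff does not show. A rough sketch of what such a split might
look like, going by the flush_all() rewrite quoted at the top of this
message; the helper body here is an approximation, and everything beyond
the names flush_all(), flush_all_cpus_locked(), flush_lock, and slub_flush
is an assumption rather than the series' actual code:

/*
 * Sketch only, not the series' actual mm/slub.c code: flush_all()
 * becomes a thin wrapper that takes the CPU hotplug lock around a
 * variant usable by callers that already hold it.
 */
static DEFINE_MUTEX(flush_lock);
static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);

/* Caller must already hold cpu_hotplug_lock via cpus_read_lock(). */
static void flush_all_cpus_locked(struct kmem_cache *s)
{
	struct slub_flush_work *sfw;
	unsigned int cpu;

	lockdep_assert_cpus_held();	/* the assertion that fires in the splat below */
	mutex_lock(&flush_lock);

	for_each_online_cpu(cpu) {
		sfw = &per_cpu(slub_flush, cpu);
		/* queue per-CPU flush work for caches with a CPU slab, then wait */
	}

	mutex_unlock(&flush_lock);
}

static void flush_all(struct kmem_cache *s)
{
	cpus_read_lock();
	flush_all_cpus_locked(s);
	cpus_read_unlock();
}

Under that assumption, the cpus_read_lock() added to kmem_cache_destroy()
in the hunk above is what is supposed to satisfy the assertion when
__kmem_cache_shutdown() calls the locked variant directly.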
[  602.539109] ------------[ cut here ]------------
[  602.539804] WARNING: CPU: 3 PID: 88 at kernel/cpu.c:335 lockdep_assert_cpus_held+0x29/0x30
[  602.540940] Modules linked in:
[  602.541377] CPU: 3 PID: 88 Comm: torture_shutdow Not tainted 5.14.0-rc5-next-20210809+ #3299
[  602.542536] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
[  602.543786] RIP: 0010:lockdep_assert_cpus_held+0x29/0x30
[  602.544524] Code: 00 83 3d 4d f1 a4 01 01 76 0a 8b 05 4d 23 a5 01 85 c0 75 01 c3 be ff ff ff ff 48 c7 c7 b0 86 66 a3 e8 9b 05 c9 00 85 c0 75 ea <0f> 0b c3 0f 1f 40 00 41 57 41 89 ff 41 56 4d 89 c6 41 55 49 89 cd
[  602.547051] RSP: 0000:ffffb382802efdb8 EFLAGS: 00010246
[  602.547783] RAX: 0000000000000000 RBX: ffffa23301a44000 RCX: 0000000000000001
[  602.548764] RDX: 0000000000000001 RSI: ffffffffa335f5c0 RDI: ffffffffa33adbbf
[  602.549747] RBP: ffffa23301a44000 R08: ffffa23302810000 R09: 974cf0ba5c48ad3c
[  602.550727] R10: ffffb382802efe78 R11: 0000000000000001 R12: ffffa23301a44000
[  602.551709] R13: 00000000000249c0 R14: 00000000ffffffff R15: 0000000fffffffe0
[  602.552694] FS:  0000000000000000(0000) GS:ffffa2331f580000(0000) knlGS:0000000000000000
[  602.553805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  602.554606] CR2: 0000000000000000 CR3: 0000000017222000 CR4: 00000000000006e0
[  602.555601] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  602.556590] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  602.557585] Call Trace:
[  602.557927]  flush_all_cpus_locked+0x29/0x140
[  602.558535]  __kmem_cache_shutdown+0x26/0x200
[  602.559145]  ? lock_is_held_type+0xd6/0x130
[  602.559739]  ? torture_onoff+0x260/0x260
[  602.560284]  kmem_cache_destroy+0x38/0x110
[  602.560859]  rcu_torture_cleanup.cold.36+0x192/0x421
[  602.561539]  ? wait_woken+0x60/0x60
[  602.562035]  ? torture_onoff+0x260/0x260
[  602.562591]  torture_shutdown+0xdd/0x1c0
[  602.563131]  kthread+0x132/0x160
[  602.563592]  ? set_kthread_struct+0x40/0x40
[  602.564172]  ret_from_fork+0x22/0x30
[  602.564696] irq event stamp: 1307
[  602.565161] hardirqs last enabled at (1315): [] __up_console_sem+0x4d/0x50
[  602.566321] hardirqs last disabled at (1324): [] __up_console_sem+0x32/0x50
[  602.567479] softirqs last enabled at (1304): [] __do_softirq+0x311/0x473
[  602.568616] softirqs last disabled at (1299): [] irq_exit_rcu+0xe8/0xf0
[  602.569735] ---[ end trace 26fd643e1df331c9 ]---