From: Mel Gorman
To: Andrew Morton
Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
    Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held
Date: Tue, 13 Jul 2021 14:56:22 +0100
Message-Id: <20210713135625.7615-2-mgorman@techsingularity.net>
In-Reply-To: <20210713135625.7615-1-mgorman@techsingularity.net>
References: <20210713135625.7615-1-mgorman@techsingularity.net>

Syzbot is reporting potential deadlocks due to pagesets.lock when
PAGE_OWNER is enabled.
One example from Desmond Cheong Zhi Xi is as follows

  __alloc_pages_bulk()
    local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
    prep_new_page():
      post_alloc_hook():
        set_page_owner():
          __set_page_owner():
            save_stack():
              stack_depot_save():
                alloc_pages():
                  alloc_page_interleave():
                    __alloc_pages():
                      get_page_from_freelist():
                        rm_queue():
                          rm_queue_pcplist():
                            local_lock_irqsave(&pagesets.lock, flags);
                            *** DEADLOCK ***

Zhang, Qiang also reported

  BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
  .....
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
  ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
  prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
  __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
  alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
  alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
  stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
  save_stack+0x15e/0x1e0 mm/page_owner.c:120
  __set_page_owner+0x50/0x290 mm/page_owner.c:181
  prep_new_page mm/page_alloc.c:2445 [inline]
  __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
  alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
  vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
  __vmalloc_area_node mm/vmalloc.c:2845 [inline]
  __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
  __vmalloc_node mm/vmalloc.c:2996 [inline]
  vzalloc+0x67/0x80 mm/vmalloc.c:3066

There are a number of ways it could be fixed. The page owner code could
be audited to strip GFP flags that allow sleeping but it'll impair the
functionality of PAGE_OWNER if allocations fail. The bulk allocator could
add a special case to release/reacquire the lock for prep_new_page and
lookup PCP after the lock is reacquired at the cost of performance. The
pages requiring prep could be tracked using the least significant bit
and looping through the array although it is more complicated for the
list interface.
The options are relatively complex and the second one still incurs a
performance penalty when PAGE_OWNER is active so this patch takes the
simple approach -- disable bulk allocation if PAGE_OWNER is active. The
caller will be forced to allocate one page at a time incurring a
performance penalty but PAGE_OWNER is already a performance penalty.

Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
Reported-by: Desmond Cheong Zhi Xi
Reported-by: "Zhang, Qiang"
Reported-and-tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
Signed-off-by: Mel Gorman
Acked-by: Rafael Aquini
---
 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b97e17806be..6ef86f338151 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5239,6 +5239,18 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (nr_pages - nr_populated == 1)
 		goto failed;
 
+#ifdef CONFIG_PAGE_OWNER
+	/*
+	 * PAGE_OWNER may recurse into the allocator to allocate space to
+	 * save the stack with pagesets.lock held. Releasing/reacquiring
+	 * removes much of the performance benefit of bulk allocation so
+	 * force the caller to allocate one page at a time as it'll have
+	 * similar performance to added complexity to the bulk allocator.
+	 */
+	if (static_branch_unlikely(&page_owner_inited))
+		goto failed;
+#endif
+
 	/* May set ALLOC_NOFRAGMENT, fragmentation will return 1 page. */
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
-- 
2.26.2
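
For illustration only, the effect of the early bail-out can be modelled in
userspace C. This is a rough sketch, not kernel code: every name in it
(tracking_enabled, pool_lock, track, alloc_one, alloc_bulk, fill_array and
the guard parameter) is hypothetical. A depth counter stands in for the
non-recursive pagesets.lock, and track() stands in for the PAGE_OWNER hook
that re-enters the allocator. It assumes, as the changelog describes, that
a guarded bulk call hands back at most one block so the caller ends up
allocating one at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64

/* Hypothetical stand-ins, none of these names exist in the kernel. */
static bool tracking_enabled;     /* plays the role of page_owner_inited */
static int lock_depth;            /* plays the role of pagesets.lock */
static int nested_lock_attempts;  /* each one would deadlock a real lock */

static void pool_lock(void)
{
	/* Taking the lock while it is already held is the bug being modelled. */
	if (lock_depth++ > 0)
		nested_lock_attempts++;
}

static void pool_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* The tracking hook needs the allocator itself, so it takes the lock. */
static void track(void *block)
{
	(void)block;
	if (!tracking_enabled)
		return;
	pool_lock();
	pool_unlock();
}

/* Single allocation: the lock is dropped before the tracking hook runs. */
static void *alloc_one(void)
{
	void *block;

	pool_lock();
	block = malloc(BLOCK_SIZE);
	pool_unlock();
	if (block)
		track(block);
	return block;
}

/*
 * Bulk allocation: prepares every block with the lock held. With guard set
 * and tracking enabled it bails out early and returns at most one block.
 */
static size_t alloc_bulk(size_t want, void **out, bool guard)
{
	size_t got = 0;

	if (want == 0)
		return 0;
	if (guard && tracking_enabled) {
		out[0] = alloc_one();
		return out[0] ? 1 : 0;
	}

	pool_lock();
	while (got < want) {
		void *block = malloc(BLOCK_SIZE);

		if (!block)
			break;
		track(block);	/* nested lock attempt when tracking is on */
		out[got++] = block;
	}
	pool_unlock();
	return got;
}

/* Caller loop: keep asking until the array is full or nothing comes back. */
static size_t fill_array(size_t want, void **out, bool guard)
{
	size_t filled = 0;

	while (filled < want) {
		size_t got = alloc_bulk(want - filled, out + filled, guard);

		if (got == 0)
			break;
		filled += got;
	}
	return filled;
}
```

With tracking enabled, fill_array() with guard set fills the array through
repeated single allocations and records no nested lock attempt. With guard
clear it records one nested attempt per block, which is the recursion
reported above.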