From: Miaohe Lin
Subject: [PATCH v2] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
Date: Fri, 19 Apr 2024 16:58:19 +0800
Message-ID: <20240419085819.1901645-1-linmiaohe@huawei.com>
X-Mailer: git-send-email 2.33.0
X-Mailing-List: linux-kernel@vger.kernel.org

While running memory failure tests recently, I hit the following warning:

 DEBUG_LOCKS_WARN_ON(1)
 WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
 Modules linked in: mce_inject hwpoison_inject
 CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__lock_acquire+0xccb/0x1ca0
 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
 R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
 FS:  00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
 Call Trace:
  lock_acquire+0xbe/0x2d0
  _raw_spin_lock_irqsave+0x3a/0x60
  hugepage_subpool_put_pages.part.0+0xe/0xc0
  free_huge_folio+0x253/0x3f0
  dissolve_free_huge_page+0x147/0x210
  __page_handle_poison+0x9/0x70
  memory_failure+0x4e6/0x8c0
  hard_offline_page_store+0x55/0xa0
  kernfs_fop_write_iter+0x12c/0x1d0
  vfs_write+0x380/0x540
  ksys_write+0x64/0xe0
  do_syscall_64+0xbc/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7ff9f3114887
 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
 R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
 Kernel panic - not syncing: kernel: panic_on_warn set ...
 CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  panic+0x326/0x350
  check_panic_on_warn+0x4f/0x50
  __warn+0x98/0x190
  report_bug+0x18e/0x1a0
  handle_bug+0x3d/0x70
  exc_invalid_op+0x18/0x70
  asm_exc_invalid_op+0x1a/0x20
 RIP: 0010:__lock_acquire+0xccb/0x1ca0
 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
 R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
  lock_acquire+0xbe/0x2d0
  _raw_spin_lock_irqsave+0x3a/0x60
  hugepage_subpool_put_pages.part.0+0xe/0xc0
  free_huge_folio+0x253/0x3f0
  dissolve_free_huge_page+0x147/0x210
  __page_handle_poison+0x9/0x70
  memory_failure+0x4e6/0x8c0
  hard_offline_page_store+0x55/0xa0
  kernfs_fop_write_iter+0x12c/0x1d0
  vfs_write+0x380/0x540
  ksys_write+0x64/0xe0
  do_syscall_64+0xbc/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7ff9f3114887
 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
 R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00

After bisecting and digging into the code, I believe the root cause is
that the _deferred_list field of the folio is unioned with the
_hugetlb_subpool field. __update_and_free_hugetlb_folio() initializes
folio->_deferred_list, which corrupts folio->_hugetlb_subpool if the
folio is still hugetlb at that point.
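
To illustrate the aliasing described above, here is a minimal userspace
sketch. It is not the kernel's struct folio: struct toy_folio, its field
names, init_list_head() and subpool_clobbered_by_list_init() are all
hypothetical stand-ins, and the real layout has more fields around the
union. It only shows that initializing a list head that shares storage
with a pointer overwrites that pointer.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical, heavily simplified folio. The two members share storage,
 * so only one of them is meaningful at any given time.
 */
struct toy_folio {
	union {
		void *hugetlb_subpool;          /* meaningful while the folio is hugetlb */
		struct list_head deferred_list; /* meaningful for other large folios */
	};
};

/* Both union members start at the same offset. */
_Static_assert(offsetof(struct toy_folio, hugetlb_subpool) ==
	       offsetof(struct toy_folio, deferred_list),
	       "members must overlap for this illustration");

/* Same effect as the kernel's INIT_LIST_HEAD(): point the list at itself. */
static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/*
 * Returns 1 if initializing deferred_list changed the value read back
 * through hugetlb_subpool, 0 otherwise.
 */
int subpool_clobbered_by_list_init(void)
{
	struct toy_folio folio;

	folio.hugetlb_subpool = NULL;         /* a hugetlb folio without a subpool */
	init_list_head(&folio.deferred_list); /* writes over the same bytes */

	/* The subpool "pointer" now holds the list head's self-pointer. */
	return folio.hugetlb_subpool != NULL;
}
```

On common platforms the function returns 1: a subpool pointer that was
NULL now reads back as a non-NULL address that does not refer to any
subpool.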

Later, free_huge_folio() uses the corrupted _hugetlb_subpool and the
warning above triggers. The code assumes that the hugetlb flag has
already been cleared by the time folio_put() is called in
update_and_free_hugetlb_folio(). That assumption is broken by the
following race:

CPU1                                    CPU2
dissolve_free_huge_page                 update_and_free_pages_bulk
 update_and_free_hugetlb_folio           hugetlb_vmemmap_restore_folios
                                          folio_clear_hugetlb_vmemmap_optimized
  clear_flag = folio_test_hugetlb_vmemmap_optimized
  if (clear_flag) <-- False, it's already cleared.
   __folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
  folio_put
   free_huge_folio <-- free_the_page is expected.
                                         list_for_each_entry()
                                          __folio_clear_hugetlb <-- Too late.

Fix this issue by checking whether the folio is hugetlb directly instead
of checking clear_flag, which closes the race window.

Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
CC: stable@vger.kernel.org
Signed-off-by: Miaohe Lin
---
v2:
  The root cause should be the race above, so rework the fix.
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d748664bb2c9..3b7d5ddc32ad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1773,7 +1773,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	 * If vmemmap pages were allocated above, then we need to clear the
 	 * hugetlb flag under the hugetlb lock.
 	 */
-	if (clear_flag) {
+	if (folio_test_hugetlb(folio)) {
 		spin_lock_irq(&hugetlb_lock);
 		__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);
-- 
2.33.0
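
For readers who want to step through the race, here is a single-threaded
sketch that replays the interleaving from the diagram in a fixed order.
It is a toy model, not kernel code: struct toy_folio, the two bool
fields, and every function name below are hypothetical, and
hugetlb_lock is not modeled at all. It only compares the old condition
(a snapshot of the vmemmap-optimized flag) with the new one (testing
the hugetlb flag directly).

```c
#include <stdbool.h>

/* Hypothetical two-flag model of a folio; not the kernel's struct folio. */
struct toy_folio {
	bool hugetlb;           /* stands in for the hugetlb flag */
	bool vmemmap_optimized; /* stands in for the vmemmap-optimized flag */
};

/*
 * CPU2, first step in the diagram: vmemmap is restored and the optimized
 * flag is cleared, but the hugetlb flag has not been cleared yet.
 */
static void cpu2_restore_vmemmap(struct toy_folio *f)
{
	f->vmemmap_optimized = false;
}

/* CPU1 with the old check: decide from the vmemmap-optimized snapshot. */
static void cpu1_free_old(struct toy_folio *f)
{
	bool clear_flag = f->vmemmap_optimized;

	if (clear_flag)
		f->hugetlb = false;
}

/* CPU1 with the new check: test the hugetlb flag itself. */
static void cpu1_free_new(struct toy_folio *f)
{
	if (f->hugetlb)
		f->hugetlb = false;
}

/*
 * Replays "CPU2 restores vmemmap, then CPU1 frees" and reports whether the
 * folio is still marked hugetlb at the point where folio_put() would run.
 */
bool still_hugetlb_at_put(bool use_new_check)
{
	struct toy_folio f = { .hugetlb = true, .vmemmap_optimized = true };

	cpu2_restore_vmemmap(&f);
	if (use_new_check)
		cpu1_free_new(&f);
	else
		cpu1_free_old(&f);

	return f.hugetlb;
}
```

In this model still_hugetlb_at_put(false) returns true, matching the
"Hugetlb is not cleared" step in the diagram, while
still_hugetlb_at_put(true) returns false.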