Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp1132674pxb; Tue, 29 Mar 2022 18:11:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx+451WyGRkw+H7PSxPUF4WClHpCDXSy6HgM64V8A/S5KL0fTBCLe3qbnkGkjY3eC241OUG X-Received: by 2002:a17:902:7ec1:b0:156:17a4:a2f8 with SMTP id p1-20020a1709027ec100b0015617a4a2f8mr10787474plb.155.1648602673853; Tue, 29 Mar 2022 18:11:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1648602673; cv=none; d=google.com; s=arc-20160816; b=HuGEj2HCJuJsqRGKMD60381RH7WVCECG+iu/l3AYvTHiVT64kOBeI9D6pqtcLtdAi5 zb+HXaJFYL6KBAYuaHCKZLKKWpnZ3Rxsrlh/0t3ozBo8qo6m+DePbqx0z+J29BtnzrA2 uKojVo2Bo7FhDCgqOppjA0jVamCAzhrMveK0dHQWniSWukGLGynfjxYnuNJcBK/+Ycra 1Uj9WUa54zz1uZcw9ROah8BfUiK9oiHKbM6SoFwmmJhJWj+jMSRKqTYDddBBKPagrqX9 Nmv9FiOSsemudwNvBIrR+bSfoDMRxYThaIYQnD2XsGtUSElj+f+8ISfMiTByCtILC2m2 gQeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=t7Q+jhI9UZJhIWE2/oqDfYhFA25P+HJW56muMmn3X9U=; b=bbVx1kcOMio9YUQ9Xa2to7uLLT5+ED93xfbPA1phgs7+u0KheUlFWCyeDV2gAsy+rZ 1GKiSZIeziuvCttWbRD41USrwwPhR8KRSo14r+WrT5MbRlFpUR72I/7xu4uiroqZ72oB 2D+9DXXEjj8nkw/r2kny7ugtJOg+RjzXzqXa7Au72rS4EK0qW7oX1rQyb2VXiF+dQ5JQ 11fk5TPJYKA9rsREay+HDBDo9r+JHuuyN923awgOXnSImqsieBQAIRPkMcjhGfw8LuGH FsdC4aWJ92qmOCwA89Isz2q/dZj8QG8qCildOw/5Bk9hF9Pu44I5L8MAUvY9Bsy7nfOK OFoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b="S56/59mj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lk17-20020a17090b33d100b001c98dbffe90si769670pjb.26.2022.03.29.18.10.58; Tue, 29 Mar 2022 18:11:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b="S56/59mj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236845AbiC2Ops (ORCPT + 99 others); Tue, 29 Mar 2022 10:45:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238056AbiC2Opn (ORCPT ); Tue, 29 Mar 2022 10:45:43 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5708F366A7 for ; Tue, 29 Mar 2022 07:43:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=t7Q+jhI9UZJhIWE2/oqDfYhFA25P+HJW56muMmn3X9U=; b=S56/59mjT/XoUOayZae2jx8dZD WdZ98U/Zj60LEJcCEdPp8ehpChTRTa6y+NAMe1wWQCp4DQfmUjDJNwxh4tdTkp5qc5uQfkWom9vam NUasl9Gk/Unj1vdPVYN3yvwgZvFvWCmkS0wWUsTHlMqpMqRiPAbCs9eO8bTOn6xmjVKXs1b9Wsnw5 8eYW4ELbrhnuC4ZF6KHa/IE+IyVvSI2RIVMifsf9lfKSOWblxNDgtNbyX12FEkHSSTzJIt+V8x6UF t3S83extJu7IyBuRtYMWGo62SyhKC7ziFiJSj7CLvwF2lPxGiKYaA1GVvkNvBLAu8nHq/F4+bjq3P zeQjqlJw==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nZD4Y-000Mmz-7c; Tue, 29 Mar 2022 14:43:50 +0000 Date: Tue, 29 Mar 2022 15:43:50 +0100 From: Matthew Wilcox To: Hugh Dickins Cc: Johannes Weiner , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: mm/workingset.c:567 shadow_lru_isolate Message-ID: References: <55db706b-9bce-b820-7d88-6392374cd4fc@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55db706b-9bce-b820-7d88-6392374cd4fc@google.com> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 28, 2022 at 09:24:00PM -0700, Hugh Dickins wrote: > Sunday's linux.git, under tmpfs swapping load (including > ext4 file readonly THPs and shmem THPs), gave me the: > > if (WARN_ON_ONCE(!node->nr_values)) > goto out_invalid; Fun. I'm assuming it's actually the ext4 file ROTHP that causes the issue ... but really, I don't think it matters to fixing the bug. > if (WARN_ON_ONCE(node->count != node->nr_values)) > goto out_invalid; > > from mm/workingset.c - log below. I did see these on mmotm before > the merge window, but only now got around to bisecting them. And > the bisection, confirmed by revert, arrives surprisingly at: > > 820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them") > > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1739,8 +1739,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, > /* Adding to swap updated mapping */ > mapping = page_mapping(page); > } > - } else if (unlikely(PageTransHuge(page))) { > - /* Split file THP */ > + } else if (PageSwapBacked(page) && PageTransHuge(page)) { > + /* Split shmem THP */ > if (split_folio_to_list(folio, page_list)) > goto keep_locked; > } > > I'm not familiar with workingset.c, I'll make no guesses, over to you! OK, so the overall point of this part of workingset.c is that we used to simply remove pages from the page cache when they fell off the end of the LRU. Now we replace them with a shadow entry (obviously shmem is different here which is why you're so unfamiliar with it). The problem with this is that we still tie up radix tree nodes tracking how old the pages would have been. So workingset puts nodes which only have shadow entries on an LRU and will remove old radix tree nodes when there's enough memory pressure. These WARN_ON messages are supposed to be checking the invariant that the nodes we're deleting only contain shadow entries; no real pages. When the first one fires, it says there are _no_ shadow entries in this node, and if we get to the second one, we're seeing a node with N entries, but M of them are shadow entries and M != N. That shouldn't be possible; we should only put nodes on the list when nr_values is positive and equal to count (see workingset_update_node()). OK, so why are we hitting this now? Before, we'd split the THP, creating 8 new radix tree nodes, all of which would contain 64 shadow entries and get put on the LRU. What _should_ be happening after the commit that we bisect to is that the node->count is unaffected, but node->nr_values increases by 8. I'll investigate a little bit, but may ask you to add some debug if I'm having trouble reproducing it. > Hugh > > [ 1470.572186] ------------[ cut here ]------------ > [ 1470.574478] WARNING: CPU: 1 PID: 47 at mm/workingset.c:567 shadow_lru_isolate+0x84/0x128 > [ 1470.575570] CPU: 1 PID: 47 Comm: kswapd0 Not tainted 5.18.0-rc0 #2 > [ 1470.576592] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET54W (1.39 ) 04/16/2019 > [ 1470.577836] RIP: 0010:shadow_lru_isolate+0x84/0x128 > [ 1470.579170] Code: 89 ff 4c 8d 73 e8 e8 16 f4 ff ff 83 ca ff be 09 00 00 00 4c 89 f7 e8 91 3c 05 00 4c 89 ef e8 e9 25 7d 00 8a 43 eb 84 c0 75 04 <0f> 0b eb 2a 3a 43 ea 74 04 0f 0b eb 21 48 c7 c6 6a f7 1f 81 4c 89 > [ 1470.580669] RSP: 0018:ffff888000e0bba0 EFLAGS: 00010046 > [ 1470.582173] RAX: 0000000080000000 RBX: ffff888016c21510 RCX: 8bc9d3bc33ed677c > [ 1470.583755] RDX: 0000000000000003 RSI: ffffffff823f436f RDI: 00000000ffffffff > [ 1470.585303] RBP: ffff888000e0bbc8 R08: 0000000000000078 R09: 00000000000c8386 > [ 1470.586860] R10: 0000000000000001 R11: ffff888000e055b8 R12: ffff8880166c99d0 > [ 1470.588518] R13: ffff8880010dd180 R14: ffff888016c214f8 R15: ffff8880010dd1c0 > [ 1470.590154] FS: 0000000000000000(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000 > [ 1470.591743] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1470.593296] CR2: 0000555555694f50 CR3: 000000000f940005 CR4: 00000000003706e0 > [ 1470.594824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1470.596198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1470.597581] Call Trace: > [ 1470.598907] > [ 1470.600267] __list_lru_walk_one+0x84/0xf2 > [ 1470.601629] ? workingset_update_node+0x84/0x84 > [ 1470.602977] list_lru_walk_one_irq+0x59/0x74 > [ 1470.604324] ? workingset_update_node+0x84/0x84 > [ 1470.605657] scan_shadow_nodes+0x29/0x2b > [ 1470.607073] do_shrink_slab+0x1d0/0x33d > [ 1470.608477] shrink_slab+0x98/0x100 > [ 1470.609880] shrink_node_memcgs+0x136/0x182 > [ 1470.611164] shrink_node+0x220/0x3d9 > [ 1470.612446] balance_pgdat+0x216/0x3cb > [ 1470.613715] ? lock_is_held_type+0xcf/0x10f > [ 1470.615010] kswapd+0x189/0x1ae > [ 1470.616285] ? balance_pgdat+0x3cb/0x3cb > [ 1470.617481] kthread+0xee/0xf6 > [ 1470.618690] ? kthread_exit+0x1f/0x1f > [ 1470.619932] ret_from_fork+0x1f/0x30 > [ 1470.621110] > [ 1470.622237] irq event stamp: 76862920 > [ 1470.623284] hardirqs last enabled at (76862919): [] _raw_spin_unlock_irq+0x23/0x42 > [ 1470.624373] hardirqs last disabled at (76862920): [] _raw_spin_lock_irq+0x17/0x4c > [ 1470.625540] softirqs last enabled at (76861226): [] __do_softirq+0x36f/0x3aa > [ 1470.626752] softirqs last disabled at (76861195): [] __irq_exit_rcu+0x85/0xc1 > [ 1470.627957] ---[ end trace 0000000000000000 ]--- > [ 1470.629407] ------------[ cut here ]------------ > [ 1470.630849] WARNING: CPU: 1 PID: 47 at mm/workingset.c:569 shadow_lru_isolate+0x8d/0x128 > [ 1470.632264] CPU: 1 PID: 47 Comm: kswapd0 Tainted: G W 5.18.0-rc0 #2 > [ 1470.633729] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET54W (1.39 ) 04/16/2019 > [ 1470.635202] RIP: 0010:shadow_lru_isolate+0x8d/0x128 > [ 1470.636601] Code: ff ff 83 ca ff be 09 00 00 00 4c 89 f7 e8 91 3c 05 00 4c 89 ef e8 e9 25 7d 00 8a 43 eb 84 c0 75 04 0f 0b eb 2a 3a 43 ea 74 04 <0f> 0b eb 21 48 c7 c6 6a f7 1f 81 4c 89 f7 e8 71 ca 7a 00 ba 01 00 > [ 1470.638206] RSP: 0018:ffff888000e0bba0 EFLAGS: 00010087 > [ 1470.639811] RAX: 0000000080000020 RBX: ffff888016c4d020 RCX: 8bc9d3bc33ed677c > [ 1470.641470] RDX: 0000000000000003 RSI: ffffffff823f436f RDI: 00000000ffffffff > [ 1470.643072] RBP: ffff888000e0bbc8 R08: 0000000000000078 R09: 00000000000c8386 > [ 1470.644688] R10: 0000000000000001 R11: ffff888000e055b8 R12: ffff888016c54910 > [ 1470.646312] R13: ffff8880010dd180 R14: ffff888016c4d008 R15: ffff8880010dd1c0 > [ 1470.648341] FS: 0000000000000000(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000 > [ 1470.649820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1470.651404] CR2: 0000555555694f50 CR3: 000000000f940005 CR4: 00000000003706e0 > [ 1470.651408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1470.651411] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1470.655817] Call Trace: > [ 1470.655821] > [ 1470.655824] __list_lru_walk_one+0x84/0xf2 > [ 1470.660171] ? workingset_update_node+0x84/0x84 > [ 1470.660178] list_lru_walk_one_irq+0x59/0x74 > [ 1470.662784] ? workingset_update_node+0x84/0x84 > [ 1470.662791] scan_shadow_nodes+0x29/0x2b > [ 1470.662795] do_shrink_slab+0x1d0/0x33d > [ 1470.662800] shrink_slab+0x98/0x100 > [ 1470.662805] shrink_node_memcgs+0x136/0x182 > [ 1470.662810] shrink_node+0x220/0x3d9 > [ 1470.662814] balance_pgdat+0x216/0x3cb > [ 1470.662818] ? lock_is_held_type+0xcf/0x10f > [ 1470.662824] kswapd+0x189/0x1ae > [ 1470.662828] ? balance_pgdat+0x3cb/0x3cb > [ 1470.662833] kthread+0xee/0xf6 > [ 1470.662838] ? kthread_exit+0x1f/0x1f > [ 1470.662843] ret_from_fork+0x1f/0x30 > [ 1470.662849] > [ 1470.662851] irq event stamp: 76862966 > [ 1470.662854] hardirqs last enabled at (76862965): [] _raw_spin_unlock_irq+0x23/0x42 > [ 1470.662860] hardirqs last disabled at (76862966): [] _raw_spin_lock_irq+0x17/0x4c > [ 1470.662866] softirqs last enabled at (76862958): [] __do_softirq+0x36f/0x3aa > [ 1470.662871] softirqs last disabled at (76862925): [] __irq_exit_rcu+0x85/0xc1 > [ 1470.662877] ---[ end trace 0000000000000000 ]--- >