Date: Wed, 31 Mar 2021 03:49:13 +0100
From: Matthew Wilcox
To: Hugh Dickins
Cc: Andrew Morton, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
        linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Subject: Re: BUG_ON(!mapping_empty(&inode->i_data))
Message-ID: <20210331024913.GS351017@casper.infradead.org>

On Tue, Mar 30, 2021 at 06:30:22PM -0700, Hugh Dickins wrote:
> Running my usual tmpfs kernel-build swapping load on Sunday's rc4-mm1
> mmotm (I never got to try rc3-mm1, but presume it behaved the same way),
> I hit clear_inode()'s BUG_ON(!mapping_empty(&inode->i_data)); on two
> machines, within an hour or few, repeatably though not to order.
>
> The stack backtrace has always been clear_inode < ext4_clear_inode <
> ext4_evict_inode < evict < dispose_list < prune_icache_sb <
> super_cache_scan < do_shrink_slab < shrink_slab_memcg < shrink_slab <
> shrink_node_memcgs < shrink_node < balance_pgdat < kswapd.
>
> ext4 is the disk filesystem holding the source I read and build from,
> and also the filesystem I use on a loop device backed by a tmpfs file.
> I have not tried other filesystems, nor checked whether it happens
> always on the loop one or always on the disk one.  I have not seen it
> happen with tmpfs - probably because its inodes cannot be evicted by
> the shrinker anyway; and I have not seen it happen when "rm -rf" evicts
> ext4 or tmpfs inodes (but suspect that may be down to timing, or less
> pressure).  I doubt it's a matter of filesystem: I think it's an
> XArray thing.
>
> Whenever I've looked at the XArray nodes involved, the root node
> (shift 6) contained one or three (adjacent) pointers to empty shift-0
> nodes, each of which had its offset, parent and array correctly set.
> Is there some way in which empty nodes can get left behind, and so
> fail eviction's mapping_empty() check?

There isn't _supposed_ to be.  The XArray is supposed to delete a node
whenever its ->count reaches zero.  It might give me a clue if you
could share a dump of the tree, if you still have that handy.
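For anyone following along: with the series applied, the check that
fires reduces to a NULL test on the tree root, so a single node left
behind, even an empty one, is enough to trip it.  The two helpers
involved are, quoting from memory (check the tree for the exact text):

        /* include/linux/pagemap.h, added by the no-nrexceptionals series */
        static inline bool mapping_empty(struct address_space *mapping)
        {
                return xa_empty(&mapping->i_pages);
        }

        /* include/linux/xarray.h: empty means the root pointer is NULL */
        static inline bool xa_empty(const struct xarray *xa)
        {
                return xa->xa_head == NULL;
        }

An orphaned empty node keeps xa_head non-NULL, so a mapping with no
pages and no shadow entries still looks non-empty; that matches the
root node you describe above.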
> I did wonder whether some might get left behind if xas_alloc() fails
> (though probably the tree here is too shallow to show that).  Printks
> showed that occasionally xas_alloc() did fail while testing (maybe at
> the memcg limit), but there was no correlation with the BUG_ONs.

This is a problem inherited from the radix tree, and I really want to
justify fixing it ... I think I may have enough infrastructure in place
to do it now (as part of the xas_split() commit, we can now allocate
multiple xa_nodes in xas->xa_alloc).  But you're right: if we allocated
all the way down to an order-0 node, then this isn't the bug.

Were you using the ALLOW_ERROR_INJECTION feature on
__add_to_page_cache_locked()?  I haven't looked into how that works,
and maybe it could leave us in an inconsistent state.

> I did wonder whether this is a long-standing issue which your new
> BUG_ON is the first to detect, so I tried 5.12-rc5's clear_inode()
> with a BUG_ON(!xa_empty(&inode->i_data.i_pages)) after its nrpages
> and nrexceptional BUG_ONs.  The result there surprised me: I expected
> it to behave the same way, but it hits that BUG_ON in a minute or so,
> instead of an hour or so.  Was there a fix you made somewhere that
> avoids the BUG_ON(!mapping_empty) most of the time, but still needs
> more work?  I looked around a little, but didn't find any.

I didn't make a fix for this issue; indeed, I haven't observed it
myself.  It seems like cgroups are a good way to induce allocation
failures, so I should play around with that a bit.  The userspace
test-suite has a fairly malicious allocator that fails every allocation
not marked GFP_KERNEL, so it always exercises the fallback path for
GFP_NOWAIT; but there the retry always succeeds eventually.

> I had hoped to work this out myself, and save us both some writing,
> but better to hand it over to you, in the hope that you'll quickly
> guess what's up and then I can try patches.  I do like the
> no-nrexceptionals series, but there's something still to be fixed.

Agreed.  It seems to be unmasking a bug that already existed, so it's
not an argument for dropping the series, but we should fix the bug so
we don't crash people's machines.

Arguably, the condition being checked for is not serious enough for a
BUG_ON.  A WARN_ON, yes, plus a dump of the tree for later perusal;
but it's just a memory leak, and not (I think?) likely to lead to later
memory corruption: the nodes don't contain any pages, so there's
nothing left pointing to the mapping.
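Something like this, perhaps (a sketch only, not a tested patch; note
that xa_dump() lives in lib/xarray.c and is currently compiled only
under XA_DEBUG, so a real version would need a tree dumper that is
always built in):

        if (WARN_ON(!mapping_empty(&inode->i_data)))
                xa_dump(&inode->i_data.i_pages);        /* XA_DEBUG only */

That would leak the stray nodes, but leave the machine running and
give us a dump to debug from.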