Date: Wed, 25 Nov 2020 10:20:07 +0100
From: Jan Kara
To: Matthew Wilcox
Cc: Hugh Dickins, Linus Torvalds, Jan Kara, syzbot, Andreas Dilger,
	Ext4 Developers List, Linux Kernel Mailing List, syzkaller-bugs,
	Theodore Ts'o, Linux-MM, Oleg Nesterov, Andrew Morton,
	"Kirill A. Shutemov", Nicholas Piggin, Alex Shi, Qian Cai,
	Christoph Hellwig, "Darrick J. Wong", William Kucharski, Jens Axboe,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: kernel BUG at fs/ext4/inode.c:LINE!
Message-ID: <20201125092007.GA16944@quack2.suse.cz>
In-Reply-To: <20201124121912.GZ4327@casper.infradead.org>
References: <000000000000d3a33205add2f7b2@google.com>
	<20200828100755.GG7072@quack2.suse.cz>
	<20200831100340.GA26519@quack2.suse.cz>
	<20201124121912.GZ4327@casper.infradead.org>
X-Mailing-List: linux-ext4@vger.kernel.org

On Tue 24-11-20 12:19:12, Matthew Wilcox wrote:
> On Mon, Nov 23, 2020 at 08:07:24PM -0800, Hugh Dickins wrote:
> > Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
> > on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
> > end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
> > no longer an ext4 page at all.
> >
> > The problem is that PageWriteback is not accompanied by a page reference
> > (as the NOTE at the end of test_clear_page_writeback() acknowledges): as
> > soon as TestClearPageWriteback has been done, that page could be removed
> > from page cache, freed, and reused for something else by the time that
> > wake_up_page() is reached.
> >
> > https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
> > Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
> > check; but I'm paranoid about even looking at an unreferenced struct page,
> > lest its memory might itself have already been reused or hotremoved (and
> > wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
> >
> > Then on crashing a second time, realized there's a stronger reason against
> > that approach. If my testing just occasionally crashes on that check,
> > when the page is reused for part of a compound page, wouldn't it be much
> > more common for the page to get reused as an order-0 page before reaching
> > wake_up_page()?
> > And on rare occasions, might that reused page already be
> > marked PageWriteback by its new user, and already be waited upon? What
> > would that look like?
> >
> > It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
> > in write_cache_pages() (though I have never seen that crash myself).
>
> I don't think this is it. write_cache_pages() holds a reference to the
> page -- indeed, it holds the page lock! So this particular race cannot
> cause the page to get recycled. I still have no good ideas what this
> is :-(

But does it really matter what write_cache_pages() does? I mean, we start
page writeback, and struct bio holds no reference to the page it writes.
The only thing that prevents the page from being freed under the bio's
hands is the PageWriteback bit. So when the bio completes (e.g. in
ext4_end_bio()), we usually walk all pages in the bio with
bio_for_each_segment_all() and call end_page_writeback() for each page.
Once end_page_writeback() calls test_clear_page_writeback(), which clears
PageWriteback, the page can get freed. And that can happen before the
wake_up_page() call in end_page_writeback(). So a race will be like:

CPU1					CPU2
ext4_end_bio()
  ...
  end_page_writeback(page)
    test_clear_page_writeback(page)
					free page
					reallocate page for something else
					we can even dirty & start to
					  writeback 'page'
    wake_up_page(page)

and we have a "spurious" wake up on 'page'.

								Honza
-- 
Jan Kara
SUSE Labs, CR
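
The interleaving in the CPU1/CPU2 diagram can be replayed as a small single-threaded toy model. This is only a sketch under stated assumptions: `Page`, `try_reclaim()` and the `pin` flag are hypothetical names made up for illustration, not kernel code, and the "hold a reference across the wake-up" variant is just one conceivable way to close the window, not something this thread has settled on.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for struct page (hypothetical, not the kernel layout)."""
    owner: str
    writeback: bool = False
    refcount: int = 1                      # the page cache's own reference
    wakeups: list = field(default_factory=list)


def try_reclaim(page, new_owner):
    """CPU2: free and reuse the page if nothing protects it."""
    if page.writeback or page.refcount > 1:
        return False                       # still under writeback or pinned
    page.owner = new_owner                 # freed and reallocated
    page.writeback = True                  # new user starts its own writeback
    return True


def end_page_writeback(page, pin, interleave):
    """CPU1: complete writeback; `interleave` runs inside the race window."""
    expected_owner = page.owner
    if pin:
        page.refcount += 1                 # assumption: pin across the wake-up
    page.writeback = False                 # models test_clear_page_writeback()
    interleave(page)                       # CPU2 gets to run here
    page.wakeups.append(page.owner)        # models wake_up_page()
    if pin:
        page.refcount -= 1
    return page.owner == expected_owner    # False means a spurious wake-up


def run(pin):
    page = Page(owner="ext4", writeback=True)
    ok = end_page_writeback(page, pin, lambda p: try_reclaim(p, "shmem"))
    return ok, page.owner


unpinned = run(pin=False)
pinned = run(pin=True)
print("unpinned:", unpinned)
print("pinned:  ", pinned)
```

In the unpinned run the wake-up lands on a page that already belongs to its new user; with the extra reference held, reclaim is refused and the wake-up still targets the original page.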