Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp1417066pxu; Mon, 23 Nov 2020 22:37:38 -0800 (PST) X-Google-Smtp-Source: ABdhPJzgaCqxxiS1RDo47K8EyMDtwJOJhldv4u0zMosUhTZ57yHixGxmaiGBxj427k1cQ1E07udw X-Received: by 2002:a50:cd0a:: with SMTP id z10mr2453380edi.223.1606199857862; Mon, 23 Nov 2020 22:37:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1606199857; cv=none; d=google.com; s=arc-20160816; b=joMYu8QpryS2Fmq0C5NWB1CUmRNSTxLHsL5kERgyEz9uSPCc/SK7XFggyT80812xd+ XaTgbwOHGAru8VYWiCIZ5EYvPIamayF5Bsk7B44n+jC12qbDfNNZ3Te+OMWvbZ661dcM liz/VxCY1+/hPTGXfmD3aeCPdgv1tKrQG0bbdT2MZdEOhYFhnOGrJ3Qwqj9nQuNs3pvX kqBFZ6Mvin4svMppSThve5ilJiXNfLNdKv2wbFXJuiFy6zRP8Yt+tPoYm50WPhg9pdsG rzjsewDMnBsUCO8ksDAUil9Dy1swIfPdFMKXf4Y/0+0ANqT51sqf/NzPTnKBenkwWBhS KC8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:message-id :in-reply-to:subject:cc:to:from:date:dkim-signature; bh=NRH6VONBYNIxV3zO4nNRr9V11VXjtGQdzP+hrNGAm3A=; b=JZLit3F/q32X1ZsyzU7k2sqAo88WFGBa6AUKPHwznJsFIi+s01kbEmwuEO0E+jRHTb LlyMz7M7pMwxrQX4BNA5+8A8UvHxgiz/tEjcLE41/u6DMeovrD4glWKfmnXBH9U5dw6R V282Mu2xz+MXvRZpCE5o8WO3pyENSYkHL7PYCP+HFVSK+9KtcUKl3SfYGijxOqp+riSx iMBe48WZP1UawXGE7cIrOV1QOFOfnsA1xMQudcRCgCoYT6jvGGsgs2F5wjRGexk4FGL6 2sE4+H05h8GTx9vLXWink8JOCrMC2Y9Uw4tJ2eYC7NCMm9AoM5PgnH374UfUKU1i0QIr WWbQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=qhe1mkmW; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id be20si8239850edb.423.2020.11.23.22.37.04; Mon, 23 Nov 2020 22:37:37 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=qhe1mkmW; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729795AbgKXGe2 (ORCPT + 99 others); Tue, 24 Nov 2020 01:34:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729791AbgKXGe2 (ORCPT ); Tue, 24 Nov 2020 01:34:28 -0500 Received: from mail-oi1-x22a.google.com (mail-oi1-x22a.google.com [IPv6:2607:f8b0:4864:20::22a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 319E0C061A4D for ; Mon, 23 Nov 2020 22:34:28 -0800 (PST) Received: by mail-oi1-x22a.google.com with SMTP id l206so22575654oif.12 for ; Mon, 23 Nov 2020 22:34:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=NRH6VONBYNIxV3zO4nNRr9V11VXjtGQdzP+hrNGAm3A=; b=qhe1mkmWMwhGuf8fqGpCpj+e8tRFPMtqWbixqm50z4cHkRXVBUsZ7CU1/j/3vZBSEx i02OxFKXk2LaAEF2fZz5tPHf9/q0ss9DbDxNVwhC7LKVfn+oQXptj0CFmw2v7eEfmSFJ M86CkeLLA/t8Wn0Nn9rrOWw8p4ZUls/7JJw8vEpy+N6au2YzdMd/T4PwcEHuveafA3sq zxAujieBdrx0MmJ2WiNkfYcYGM5h/HaN0/81OBBEE+PaVIft5zJvVBg0mxu7bj5mV/zF 0CCxa96dPF8hFwabQ+wMVg+QK5D/GRETSm2SsNTVmgVJ6yDhF5p3QGC7wnIvZr2Oad3A aq6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=NRH6VONBYNIxV3zO4nNRr9V11VXjtGQdzP+hrNGAm3A=; b=EDZyEiWUDuaHpih1VlZYKUtkgwRM2coyyphWy0bDfb7Sr8c+uqfNYftPJwSU5x+Sk5 qw9kixUSjQ6eAO/KYesoM0BpIARC2px3fslSklihpJ2lAz3gBXg80APvw9EMrMlIn9Vu OhA/9WJj7Y+Axg6/5+VzIDpPQj9GiDD1sAKVmu1pj90S73lf4a8FaKwDO4pWPic+/7OT hvDS27g50v2kdEb7+BYt36k63Jz5D8jJYl1ooVXDfpERYb9yyN/g8N+d620mBfpbcGsF E0zK7cK0p3ZEZOSV3/akZ3U1wEB1YLE7aanxJK7nVrCqJFaOPh3ZVfSe9Jx+0CztUvFU W5/A== X-Gm-Message-State: AOAM533XyP/X51J0KxubJZI6IKq751w6+i7N+ihp2fDLCZ6qlZV5cRrL bVRBJuWm3GkWVO42PITgDf61Cg== X-Received: by 2002:aca:3087:: with SMTP id w129mr1737657oiw.78.1606199667073; Mon, 23 Nov 2020 22:34:27 -0800 (PST) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id m65sm8116677otm.40.2020.11.23.22.34.24 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Mon, 23 Nov 2020 22:34:25 -0800 (PST) Date: Mon, 23 Nov 2020 22:34:12 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Hugh Dickins , Jan Kara , syzbot , Andreas Dilger , Ext4 Developers List , Linux Kernel Mailing List , syzkaller-bugs , Theodore Ts'o , Linux-MM , Oleg Nesterov , Andrew Morton , "Kirill A. Shutemov" , Nicholas Piggin , Alex Shi , Qian Cai , Christoph Hellwig , "Darrick J. Wong" , Matthew Wilcox , William Kucharski , Jens Axboe , linux-fsdevel , linux-xfs Subject: Re: kernel BUG at fs/ext4/inode.c:LINE! In-Reply-To: Message-ID: References: <000000000000d3a33205add2f7b2@google.com> <20200828100755.GG7072@quack2.suse.cz> <20200831100340.GA26519@quack2.suse.cz> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Mon, 23 Nov 2020, Linus Torvalds wrote: > On Mon, Nov 23, 2020 at 8:07 PM Hugh Dickins wrote: > > > > The problem is that PageWriteback is not accompanied by a page reference > > (as the NOTE at the end of test_clear_page_writeback() acknowledges): as > > soon as TestClearPageWriteback has been done, that page could be removed > > from page cache, freed, and reused for something else by the time that > > wake_up_page() is reached. > > Ugh. > > Would it be possible to instead just make PageWriteback take the ref? > > I don't hate your patch per se, but looking at that long explanation, > and looking at the gyrations end_page_writeback() does, I go "why > don't we do that?" > > IOW, why couldn't we just make the __test_set_page_writeback() > increment the page count if the writeback flag wasn't already set, and > then make the end_page_writeback() do a put_page() after it all? Right, that should be a lot simpler, and will not require any of the cleanup (much as I liked that). If you're reasonably confident that adding the extra get_page+put_page to every writeback (instead of just to the waited case, which I presume significantly less common) will get lost in the noise - I was not confident of that, nor confident of devising realistic tests to decide it. What I did look into before sending, was whether in the filesystems there was a pattern of doing a put_page() after *set_page_writeback(), when it would just be a matter of deleting that put_page() and doing it instead at the end of end_page_writeback(). But no: there were a few cases like that, but in general no such pattern. Though, what I think I'll try is not quite what you suggest there, but instead do both get_page() and put_page() in end_page_writeback(). The reason being, there are a number of places (in mm at least) where we judge what to do by the expected refcount: places that know to add 1 on when PagePrivate is set (for buffers), but do not expect to add 1 on when PageWriteback is set. Now, all of those places probably have to have their own wait_on_page_writeback() too, but I'd rather narrow the window when the refcount is raised, than work through what if any change would be needed in those places. > > > > Then on crashing a second time, realized there's a stronger reason against > > that approach. If my testing just occasionally crashes on that check, > > when the page is reused for part of a compound page, wouldn't it be much > > more common for the page to get reused as an order-0 page before reaching > > wake_up_page()? And on rare occasions, might that reused page already be > > marked PageWriteback by its new user, and already be waited upon? What > > would that look like? > > > > It would look like BUG_ON(PageWriteback) after wait_on_page_writeback() > > in write_cache_pages() (though I have never seen that crash myself). > > So looking more at the patch, I started looking at this part: > > > + writeback = TestClearPageWriteback(page); > > + /* No need for smp_mb__after_atomic() after TestClear */ > > + waiters = PageWaiters(page); > > + if (waiters) { > > + /* > > + * Writeback doesn't hold a page reference on its own, relying > > + * on truncation to wait for the clearing of PG_writeback. > > + * We could safely wake_up_page_bit(page, PG_writeback) here, > > + * while holding i_pages lock: but that would be a poor choice > > + * if the page is on a long hash chain; so instead choose to > > + * get_page+put_page - though atomics will add some overhead. > > + */ > > + get_page(page); > > + } > > and thinking more about this, my first reaction was "but that has the > same race, just a smaller window". > > And then reading the comment more, I realize you relied on the i_pages > lock, and that this odd ordering was to avoid the possible latency. Yes. I decided to send the get_page+put_page variant, rather than the wake_up_page_bit while holding i_pages variant (also tested), in part because it's easier to edit the get_page+put_page one to the other. > > But what about the non-mapping case? I'm not sure how that happens, > but this does seem very fragile. I don't see how the non-mapping case would ever occur: I think it probably comes from a general pattern of caution about NULL mapping when akpm (I think) originally wrote these functions. > > I'm wondering why you didn't want to just do the get_page() > unconditionally and early. Is avoiding the refcount really such a big > optimization? I don't know: I trust your judgement more than mine. Hugh