From: Linus Torvalds
Date: Mon, 23 Nov 2020 20:53:20 -0800
Subject: Re: kernel BUG at fs/ext4/inode.c:LINE!
To: Hugh Dickins
Cc: Jan Kara, syzbot, Andreas Dilger, Ext4 Developers List, Linux Kernel Mailing List, syzkaller-bugs, "Theodore Ts'o", Linux-MM, Oleg Nesterov, Andrew Morton, "Kirill A. Shutemov", Nicholas Piggin, Alex Shi, Qian Cai, Christoph Hellwig, "Darrick J. Wong", Matthew Wilcox, William Kucharski, Jens Axboe, linux-fsdevel, linux-xfs
References: <000000000000d3a33205add2f7b2@google.com> <20200828100755.GG7072@quack2.suse.cz> <20200831100340.GA26519@quack2.suse.cz>
X-Mailing-List: linux-ext4@vger.kernel.org

On Mon, Nov 23, 2020 at 8:07 PM Hugh Dickins wrote:
>
> Then on crashing a second time, realized there's a stronger reason against
> that approach. If my testing just occasionally crashes on that check,
> when the page is reused for part of a compound page, wouldn't it be much
> more common for the page to get reused as an order-0 page before reaching
> wake_up_page()? And on rare occasions, might that reused page already be
> marked PageWriteback by its new user, and already be waited upon? What
> would that look like?
>
> It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
> in write_cache_pages() (though I have never seen that crash myself).

So looking more at the patch, I started looking at this part:

> +	writeback = TestClearPageWriteback(page);
> +	/* No need for smp_mb__after_atomic() after TestClear */
> +	waiters = PageWaiters(page);
> +	if (waiters) {
> +		/*
> +		 * Writeback doesn't hold a page reference on its own, relying
> +		 * on truncation to wait for the clearing of PG_writeback.
> +		 * We could safely wake_up_page_bit(page, PG_writeback) here,
> +		 * while holding i_pages lock: but that would be a poor choice
> +		 * if the page is on a long hash chain; so instead choose to
> +		 * get_page+put_page - though atomics will add some overhead.
> +		 */
> +		get_page(page);
> +	}

and thinking more about this, my first reaction was "but that has the
same race, just a smaller window".

And then reading the comment more, I realize you relied on the i_pages
lock, and that this odd ordering was to avoid the possible latency.

But what about the non-mapping case? I'm not sure how that happens,
but this does seem very fragile.

I'm wondering why you didn't want to just do the get_page()
unconditionally and early.

Is avoiding the refcount really such a big optimization?

            Linus
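
The "get_page() unconditionally and early" ordering asked about above can be sketched in user space roughly as follows. This is only an illustration, not kernel code and not the patch under discussion: struct fake_page and every fake_* helper are hypothetical stand-ins, and the race with truncation is simulated sequentially inside one function rather than with real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page; not the kernel layout. */
struct fake_page {
	atomic_int  refcount;   /* models the page reference count */
	atomic_bool writeback;  /* models PG_writeback */
	atomic_bool waiters;    /* models PG_waiters */
	int         wakeups;    /* wake-ups delivered, counted for the demo */
};

/* One reference (the page cache's), under writeback, with a waiter queued. */
static void fake_page_init(struct fake_page *p)
{
	atomic_init(&p->refcount, 1);
	atomic_init(&p->writeback, true);
	atomic_init(&p->waiters, true);
	p->wakeups = 0;
}

static void fake_get_page(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/* Drops one reference; returns true if it was the last one. */
static bool fake_put_page(struct fake_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
	return old == 1;
}

/* Models truncation: writeback is already clear, so it drops the cache ref. */
static bool fake_truncate(struct fake_page *p)
{
	assert(!atomic_load(&p->writeback));
	return fake_put_page(p);
}

/*
 * Pin first, clear the bit, wake waiters, unpin.
 * Returns true if the final put here released the last reference.
 */
static bool fake_end_writeback(struct fake_page *p, bool truncate_races)
{
	bool freed_early = false;

	fake_get_page(p);                       /* unconditional, early pin */
	atomic_exchange(&p->writeback, false);  /* stand-in for TestClearPageWriteback */

	if (truncate_races)
		freed_early = fake_truncate(p); /* other side drops its reference */
	assert(!freed_early);                   /* the pin keeps the page alive */

	if (atomic_load(&p->waiters))
		p->wakeups++;                   /* stand-in for wake_up_page() */

	return fake_put_page(p);                /* may now be the last reference */
}
```

In this model the page cannot reach a zero refcount between clearing the writeback bit and the wake-up, whether or not a waiter is present, at the cost of two extra atomic operations on every call. That cost is the overhead the quoted comment is trying to avoid by pinning only when PageWaiters is set.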