Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp672815pxk; Wed, 16 Sep 2020 14:03:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy+9DEVHcvUSAcun7WGEwXl9QfjXHb8ZbjZiZIp6/RZW3g63/e3qntWNzMwH0KZF3nUnzIc X-Received: by 2002:a17:906:f150:: with SMTP id gw16mr26049058ejb.528.1600290198806; Wed, 16 Sep 2020 14:03:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600290198; cv=none; d=google.com; s=arc-20160816; b=o+xPXu8H8Z1dTp+S576Ah/yGKXR2w1T2Y62lkMRY01Ymnngs9KTCFqiBp/qKMdDwu+ A4OWpjnoB2MGEzMwMYfs9jfFdHjCdWWBuSuHsGVy1mroWgBbhgeYdBEIh1N8AO9F66Ib t68rIqB3STdmoE5pwvgysH+rrrMcdQDf9i4yqvWfgUTpXMz6zZI6FScEEm0ug5oGSbSw dSvtQ1f6ocS/b7efy2yZZTKKQBoTAxbWWTH/HOE8s8BtJUstEkNYzApq7milzE10fwel RWIjlQwO/q3bv/pdH5A0m7jWMps7+Ser9wYKcygn03aO0X/GE+j/Pc2JlJWnCaMLg07K eYgw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=R9f7dWqz3WI2JNORg4gHIfCWJtoqakfsLAqL6tDZRf8=; b=aHO7mjxI5HTv/aTt2/JHnyjhgkAB8qOC2S5+hVmG7b19P6fskadIjAlYFb3Iq4RSYF goAOaYC2qUxVBfeSchoE+vy6Y2jO10E6E0FAfopCfDRgDJ002pXn4MhbKzmM1dpR1+xt QDFvjJPs6yUT9rFUyxqrmf3Icf50ANvm0MRCjH2xyYkmkHgTNu5QA7IfoLQjfGEy5GvO iXe6lJ997CJg3w+MBKzK74fmaxM0y5Vmnlx/o7zuNjyBJFcHOQCgjxT25PXc1OGPsKRi aC5SHTYcx/B02sNd2kMS4LOuqWDAD5B1qsbY1Vt6Y5m0YLcGIudAYC8pbid4THIpHM/M 5HTQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p16si12503676ejw.31.2020.09.16.14.02.54; Wed, 16 Sep 2020 14:03:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726129AbgIPQS7 (ORCPT + 99 others); Wed, 16 Sep 2020 12:18:59 -0400 Received: from mx2.suse.de ([195.135.220.15]:34194 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726501AbgIPQQZ (ORCPT ); Wed, 16 Sep 2020 12:16:25 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 604CCB2A1; Wed, 16 Sep 2020 16:15:54 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id A4E951E12E1; Wed, 16 Sep 2020 18:15:38 +0200 (CEST) Date: Wed, 16 Sep 2020 18:15:38 +0200 From: Jan Kara To: Mauricio Faria de Oliveira Cc: Jan Kara , linux-ext4@vger.kernel.org, dann frazier , Mauricio Faria de Oliveira Subject: Re: [RFC PATCH v3 0/3] ext4/jbd2: data=journal: write-protect pages on transaction commit Message-ID: <20200916161538.GK3607@quack2.suse.cz> References: <20200910193127.276214-1-mfo@canonical.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200910193127.276214-1-mfo@canonical.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Hi Mauricio! On Thu 10-09-20 16:31:24, Mauricio Faria de Oliveira wrote: > This series implements your suggestions for the RFC PATCH v2 set, > which we mostly talked about in cover letter [1] and PATCH 3 [2]. > (I added Suggested-by: tags, by the way, for due credit.) > > It looks almost good on fstests: zero regressions on data=ordered, > and only one regression in data=journal (generic/347); I'll check. > (That's with ext4; I'll check ocfs2, but it's only a minor change.) Glad to hear that! > Testing with 'stress-ng --mmap --mmap-file' runs well for days, > but it occasionally hits: > > JBD2: Spotted dirty metadata buffer (dev = vdc, blocknr = 74775). > There's a risk of filesystem corruption in case of system crash. > > I added dump_stack() in warn_dirty_buffer(), and it usually comes > from jbd2_journal_file_buffer(, BJ_Forget) in the commit function. > When filing from BJ_Shadow to BJ_Forget.. so it possibly happened > while BH_Shadow was still set! > > So I instrumented [test_]set_buffer_dirty() macros to dump_stack() > if BH_Shadow is set (i.e. buffer being set dirty during write-out.) > > This showed that the occasional BH_Dirty setter with BH_Shadow set > is block_page_mkwrite() in ext4_page_mkwrite(). And it seems right, > because it's called before do_journal_get_write_access() (where we > check for/wait on BH_Shadow.) > > ext4_page_mkwrite(): > > err = block_page_mkwrite(vma, vmf, get_block); > if (!err && ext4_should_journal_data(inode)) { > if (ext4_walk_page_buffers(handle, page_buffers(page), 0, > PAGE_SIZE, NULL, do_journal_get_write_access)) { > > The patches didn't directly break this, they only allow this code > to run more often as we disabled an early-return optimization for > the case 'all buffers mapped' in data=journal (question 2 in [1]): > > ext4_page_mkwrite(): > > * Return if we have all the buffers mapped. > ... > - if (page_has_buffers(page)) { > + if (page_has_buffers(page) && !ext4_should_journal_data(inode)) { > > > In order to confirm it, I built the unpatched v5.9-rc4 kernel, with > just the change to disable that optimization in data=journal -- and > it hit that occasionally too ("JBD2: Spotted dirty metadata buffer".) > > I was naive enough to mindlessly try to swap the order of those two > statements, in hopes that do_journal_get_write_access() should wait > for BH_Shadow to clear, and then we just call block_page_mkwrite(). > BUT it trips into the BUG() check in page_buffers() in the former. Yeah, you're right that code is wrong for data=journal case. We cannot really use block_page_mkwrite() for it - we rather need to use there something like: __block_write_begin(page, pos, len, ext4_get_block); ext4_walk_page_buffers(handle, page_buffers(page), from, to, NULL, do_journal_get_write_access); ext4_walk_page_buffers(handle, page_buffers(page), from, to, NULL, write_end_fn); or something like that, I guess you get the idea... Actually, I think data=journal mode should get it's own .page_mkwrite handler because the sharing between data=journal handling and the other cases is pretty minimal. Honza -- Jan Kara SUSE Labs, CR