Subject: Re: [RFC PATCH 1/4] ext4: check and update i_disksize properly
To: Jan Kara
References: <20210706024210.746788-1-yi.zhang@huawei.com> <20210706024210.746788-2-yi.zhang@huawei.com> <20210706121123.GB7922@quack2.suse.cz>
From: Zhang Yi
Message-ID: <32946f62-631e-d752-9fcf-e89b568e2e7f@huawei.com>
Date: Tue, 6 Jul 2021 22:40:46 +0800
In-Reply-To: <20210706121123.GB7922@quack2.suse.cz>
X-Mailing-List: linux-ext4@vger.kernel.org

On 2021/7/6 20:11, Jan Kara wrote:
> On Tue 06-07-21 10:42:07, Zhang Yi wrote:
>> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
>> isize"), i_disksize could always be updated to i_size in ext4_setattr(),
>> and it seems that there is no other way that could appear
>> i_disksize < i_size besides the delalloc write. In the case of delay
>
> Well, there are also direct IO writes which have temporarily i_disksize <
> i_size but when you hold i_rwsem, you're right that delalloc is the only
> reason why you can see i_disksize < i_size AFAIK.
>
>> alloc write, ext4_writepages() could update i_disksize for the new delay
>> allocated blocks properly. So we could switch to check i_size instead
>> of i_disksize in ext4_da_write_end() when write to the end of the file.
>
> I agree that since ext4_da_should_update_i_disksize() needs to return true
> for us to touch i_disksize, writeback has to have already allocated block
> underlying the end of write (new_i_size position) and thus we are
> guaranteed that writeback will also soon update i_disksize after the
> new_i_size position. So I agree that your switch to testing i_size instead
> of i_disksize should not have any bad effect... Thinking about this some
> more why do we need i_disksize update in ext4_da_write_end() at all? The
> page will be dirtied and when writeback will happen we will update
> i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user
> will see zeros instead of valid data if we crash before the writeback
> happened. Am I missing something guys?
>

Hi, Jan.

Do you remember the patch and question I asked 2 years ago[1][2]? The case
of new_i_size > i_size && ext4_da_should_update_i_disksize() here means a
partial block append write, and ext4_writepages() does not update
i_disksize for this case now.
And the journal data=ordered mode also cannot guarantee that data is written
before metadata. So we cannot guarantee that we will not see zeros where
data was written after a crash.

Thanks,
Yi.

[1]https://lore.kernel.org/linux-ext4/20190404101823.GA22313@quack2.suse.cz/
[2]https://lore.kernel.org/linux-ext4/20190405091258.GA1600@quack2.suse.cz/

>
>> we also could remove ext4_mark_inode_dirty() together because
>> generic_write_end() will dirty the inode.
>>
>> Signed-off-by: Zhang Yi
>> ---
>>  fs/ext4/inode.c | 21 ++++++++-------------
>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index d8de607849df..6f6a61f3ae5f 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file,
>>          * generic_write_end() will run mark_inode_dirty() if i_size
>>          * changes. So let's piggyback the i_disksize mark_inode_dirty
>>          * into that.
>> +        *
>> +        * Check i_size not i_disksize here because ext4_writepages() could
>> +        * update i_disksize from i_size for delay allocated blocks properly.
>>          */
>>         new_i_size = pos + copied;
>> -       if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
>> +       if (copied && new_i_size > inode->i_size) {
>>                 if (ext4_has_inline_data(inode) ||
>> -                   ext4_da_should_update_i_disksize(page, end)) {
>> +                   ext4_da_should_update_i_disksize(page, end))
>>                         ext4_update_i_disksize(inode, new_i_size);
>> -                       /* We need to mark inode dirty even if
>> -                        * new_i_size is less that inode->i_size
>> -                        * bu greater than i_disksize.(hint delalloc)
>> -                        */
>> -                       ret = ext4_mark_inode_dirty(handle, inode);
>> -               }
>>         }
>>
>>         if (write_mode != CONVERT_INLINE_DATA &&
>>             ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
>>             ext4_has_inline_data(inode))
>> -               ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
>> +               ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
>>                                                      page);
>>         else
>> -               ret2 = generic_write_end(file, mapping, pos, len, copied,
>> +               ret = generic_write_end(file, mapping, pos, len, copied,
>>                                                         page, fsdata);
>>
>> -       copied = ret2;
>> -       if (ret2 < 0)
>> -               ret = ret2;
>> +       copied = ret;
>>         ret2 = ext4_journal_stop(handle);
>>         if (unlikely(ret2 && !ret))
>>                 ret = ret2;
>> --
>> 2.31.1
>>
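
For illustration only, the difference between the old and the new size
check in the hunk above can be sketched in user-space C. This is a minimal
model under stated assumptions, not kernel code: struct model_inode,
model_update_i_disksize() and model_da_write_end() are hypothetical names,
should_update stands in for
ext4_has_inline_data() || ext4_da_should_update_i_disksize(), and locking,
the journal handle and the later i_size update in generic_write_end() are
all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the two sizes kept per inode. */
struct model_inode {
	long long i_size;     /* in-memory file size */
	long long i_disksize; /* size recorded in the on-disk inode */
};

/* Like ext4_update_i_disksize(), this only ever grows i_disksize. */
static void model_update_i_disksize(struct model_inode *inode,
				    long long newsize)
{
	if (newsize > inode->i_disksize)
		inode->i_disksize = newsize;
}

/*
 * Models only the size check of ext4_da_write_end().
 * check_i_size == false: old check, compare against i_disksize.
 * check_i_size == true:  new check from the patch, compare against i_size.
 * should_update is an assumption standing in for
 * ext4_has_inline_data() || ext4_da_should_update_i_disksize().
 * Returns true if the write path itself updated i_disksize.
 */
static bool model_da_write_end(struct model_inode *inode, long long pos,
			       unsigned int copied, bool should_update,
			       bool check_i_size)
{
	long long new_i_size = pos + copied;
	long long limit = check_i_size ? inode->i_size : inode->i_disksize;

	assert(pos >= 0);
	if (copied && new_i_size > limit && should_update) {
		model_update_i_disksize(inode, new_i_size);
		return true;
	}
	return false;
}
```

With i_size = 8192 and i_disksize = 4096, a 100-byte write at offset 5000
updates i_disksize under the old check but not under the new one, which
leaves that update to writeback. A write that extends the file past i_size
updates i_disksize under both checks.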