Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp3378955pxb; Tue, 20 Apr 2021 07:07:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxW7uQZyZr/zfC4Fz3GG7ZJM10SUXvG+dv4rwjbSRQGq6boojAgZyzwjmgmHZTvp2pv2Otu X-Received: by 2002:a50:eb0d:: with SMTP id y13mr4691258edp.326.1618927627316; Tue, 20 Apr 2021 07:07:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618927627; cv=none; d=google.com; s=arc-20160816; b=jBCGtfOJ6eCuztfp8/AJkM+Dan1aG2YwgbISmIiFDE9/v37okj6cYuO2AK85hicKkN LlLL0pv3zguTyDROL45IGGOP3og91Qbqx9pVqQc8qmF8fFYR2uAsSMgmde0Aj3dsSC1+ DJiWhr1o4O0XroJIn/FMJmhL0yc0LID+QVjXBnpfoi4Vcf9GwPyT5B1/fZcxejKWtFjl zYSVrXYRO/i71KiYOQJz2e63r/zgPMq7PMq/LSpiUFFUGiylRtIaNyQjpMrJvPKAfLU1 HWGmp90iBDjmDdPRrP7pllPf8P13xxenkE8EJNjMNn0GZkfnFgRWa6X7wAncx3YVsL88 8OVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=vu8k/w4TcAm8n+daUbX9CxVWpJKW9RR6zqg5ePx2rNs=; b=IX3do26amfDhfhjnMCVFFQViabe8h+n9fQOps16zWkCuoR2Oq6ADWrmgxwliOXmumX qzh/b4jYAjW5vS7T8c0NTYOj6KkJgn+CaOmF1nLsFShl7PivZl4l6ulhf96WXIdv278T VWkQxpj9th1b2rDAj/lW1C3/GndCDJ5BAXLFhcu4iTfCD83xvLF2BnlehhWKkZVOHaFR o5cfk/yoUqVU3k9QvGo1bSQsnuUhaA0ObOHNKE11CRdfQTrHC35jrtDCxlwEaL7q7Yrs xYJdezuj13GrdmkTUITmAI3vKsKnaW3IVTQIZTs6AJTcxafb0UsA/OM4sNGg6Cd+/vcc cVPQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id go7si15472116ejc.278.2021.04.20.07.06.43; Tue, 20 Apr 2021 07:07:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232540AbhDTOGN (ORCPT + 99 others); Tue, 20 Apr 2021 10:06:13 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:17382 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232359AbhDTOGK (ORCPT ); Tue, 20 Apr 2021 10:06:10 -0400 Received: from DGGEMS405-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FPllc5McCzlYL1; Tue, 20 Apr 2021 22:03:40 +0800 (CST) Received: from [10.174.176.255] (10.174.176.255) by DGGEMS405-HUB.china.huawei.com (10.3.19.205) with Microsoft SMTP Server id 14.3.498.0; Tue, 20 Apr 2021 22:05:35 +0800 Subject: Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed To: Theodore Ts'o References: <20210325022925.1769056-1-yebin10@huawei.com> <606D0DE5.8070002@huawei.com> CC: , , From: yebin Message-ID: <607EDFAF.1050307@huawei.com> Date: Tue, 20 Apr 2021 22:05:35 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.176.255] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/4/9 9:47, Theodore Ts'o wrote: > On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote: >>>> If call ext4_ext_insert_extent failed but new extent already inserted, we just >>>> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then >>>> cause bug on when cache extent. >>> How did this happen in the first place? It sounds like if the extent >>> was already inserted, that would be casue there was an on-disk file >>> system corruption, no? >>> >>> In that case, shouldn't we call ext4_error() to declare the file >>> system has an inconsistency, so it can be fixed by fsck? >> We inject IO fault when runing fsstress, JBD detect IO error then trigger >> JBD abort. At the same time, >> if ext4_ext_insert_extent already insert new exntent then call >> ext4_ext_dirty to dirty metadata , but >> JBD already aborted , ext4_ext_dirty will return error. >> In ext4_ext_dirty function call ext4_ext_check_inode check extent if ok, if >> not, trigger BUG_ON and >> also print extent detail information. > In this particular case, skipping the "ex->ee_len = orig_ex.ee_len" > may avoid the BUG_ON. But it's not clear that this is always the > right thing to do. The fundamental question is what should we do we > run into an error while we are in the middle of making changes to > on-disk and in-memory data structures? > > In the ideal world, we should undo the changes that we were in the > middle of making before we return an error. That way, the semantics > are very clear; on success, the function has made the requested change > to the file system. If the function returns an error, then no changes > should be made. > > That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len > in the fix_extent_len inside ext4_split_extent_at(). Unofrtunately, > ext4_ext_insert_extent() does *not* always follow this convention, and > that's because it would be extremely difficult for it to do so --- the > mutations that it makes can be quite complex, including potentially > increasing the height of the extent tree. > > However, I don't think your fix is by any means the ideal one, because > the much more common way that ext4_ext_insert_extent() is when it > needs to insert a new leaf node, or need to increase the height of the > extent tree --- and in it returns an ENOSPC failure. In that case, it > won't have made any changes changes in the extent tree, and so having > ext4_split_extent_at() undo the change to ex->ee_len is the right > thing to do. > > Having blocks get leaked when there is an ENOSPC failure, requiring > fsck to be run --- and without giving the user any warning that this > has happened is *not* a good way to fail. So I don't think the > proposed patch is the right way to go. > > A better way to go would be to teach ext4_ext_insert_extent() so if > there is a late failure, that it unwinds the leaf node back to its > original state (at least from a semantic value). Since the extent > leaf node could have been split, and/or adjacent extent entries may > have been merged, what it would need to do is to remember the starting > block number and length, and make whatever changes are necessaries to > the extent entries in that leaf node corresponding to that starting > block number and length. > > If you don't want to do that, then a "do no harm" fix would be > something like this: > > ... > } else if (err == -EROFS) { > return err; > } else if (err) > goto fix_extent_len; Thanks for your advice. I will send v2 patch. > > So in the journal abort case, when err is set to EROFS, we don't try > to reset the length, since in theory the file system is read-only > already anyway. However, in the ENOSPC case, we won't end up silently > leaking blocks that will be lost until the user somehow decides to run > fsck. > > There are still times when this doesn't get things completely right > (e.g., what if we get a late ENOMEM error versus an early ENOMEM > failure), where the only real fix is to make ext4_ext_insert_extent() > obey the convention that if it returns an error, it must not result in > any user-visible state change. > > Cheers, > > - Ted > .