Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp4130186rwl; Tue, 28 Mar 2023 03:06:14 -0700 (PDT) X-Google-Smtp-Source: AKy350aQHDKSfR8rzgx1hCTGqTWw9ofmslE1SJKoImz47ZQONisRBmvGYQ5RkG3KN7m22DX8ME2M X-Received: by 2002:a17:906:5785:b0:93d:1c2b:bd23 with SMTP id k5-20020a170906578500b0093d1c2bbd23mr17700104ejq.39.1679997974398; Tue, 28 Mar 2023 03:06:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679997974; cv=none; d=google.com; s=arc-20160816; b=c9jI5car36NIcRYyV1AXkMv+eBeVahWio8IjvbdyJnLHHGE0aHyvm+/3faNuBjqyEX AgHamV4YElPWPuCcNIQt9/mGCIdrptWAQM4TsmR8FYTs2SSAICpPouml+zSVmgDCHTzx vM2HWOiQftaVVT5N++iXPs4VnLZY62/4bEJtapl7xhALgPXzLaxAfBxElFXr2lSndo/z +g1XB8skLFdmj/ym9ZeR33OOxa5cZqcDItmMvzBdb1Bt2HhI2Z6E/H2GeEB4nssi9B0h 2p7Pae8IPVHyuaeFsnKKD75MvuaLfrGfTL96FsMWSMxubKT89K8nZeh+/TXSNMzKgoov 9HHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:dkim-signature; bh=L3swI2nzmHgC/eyRcwS8QwJ3kMOr2pact6o7Wl+DUFU=; b=Co9gmcwlEp6S3+Qd3ltRLPngZSmPdijn9clAWssGCebFL/SBFPoQvGFZr72ZBodHlL 0K6ZkA/lHtlg5G0Hu7oG7Gzdjy9hnaCnG6FcduakK3+6JChdpbGoqHRp34VCGFBfvxDP on1dhh23ZnPqgyjNOmXTmAa5IlWfhg8FSEWf6LH7lfer7Qk7MpoiWLkjgD7JIDlKQ6F2 SMSgrG8mjq25aS9+3Bp5knmXOx22ebVkjDlAD2bv8MERdg1dBiFlhIkNSmLSId9qGoi7 +D8Xm6qdNR718uEBtK430mbTZvEzdTVyT7fbosTsiLQGgPHQKZDqTPQoM6N6OnFsTnE4 KGTA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=pAmMUoqF; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b=BU5nnRu8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n15-20020a170906118f00b0092fa562c84fsi32541081eja.33.2023.03.28.03.05.49; Tue, 28 Mar 2023 03:06:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=pAmMUoqF; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b=BU5nnRu8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232518AbjC1KA5 (ORCPT + 99 others); Tue, 28 Mar 2023 06:00:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232832AbjC1KAv (ORCPT ); Tue, 28 Mar 2023 06:00:51 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 926136180; Tue, 28 Mar 2023 03:00:39 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4211F1F8BA; Tue, 28 Mar 2023 10:00:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1679997638; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L3swI2nzmHgC/eyRcwS8QwJ3kMOr2pact6o7Wl+DUFU=; b=pAmMUoqFK6W/dtk0yrj0704FOtnNRbilsg+V34xos4fSRAF923B/9lqaXvVa6vi70PYU1/ 7N6BFRYpKkNet/2krMA4c9G3nL3zDGNdyz3G9WQrD9MDRuSwb/owlcsgJrpusDcfJT/Hfg kiG8v9x4tIHblCcOVpQ1+JyIvY3aqRs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1679997638; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L3swI2nzmHgC/eyRcwS8QwJ3kMOr2pact6o7Wl+DUFU=; b=BU5nnRu84251gMYEIyM8zbXt/bYEcwx0MtQfieoUTy2uLITmlTbWYbfCgGJuImIjPwbHRC A569vx2qGJyR+xAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 24D421390B; Tue, 28 Mar 2023 10:00:38 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id kIz1CMa6ImRnWwAAMHmgww (envelope-from ); Tue, 28 Mar 2023 10:00:38 +0000 Received: by quack3.suse.cz (Postfix, from userid 1000) id 69618A071C; Tue, 28 Mar 2023 12:00:37 +0200 (CEST) Date: Tue, 28 Mar 2023 12:00:37 +0200 From: Jan Kara To: Baokun Li Cc: Jan Kara , linux-ext4@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, ritesh.list@gmail.com, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, yukuai3@huawei.com Subject: Re: [PATCH] ext4: only update i_reserved_data_blocks on successful block allocation Message-ID: <20230328100037.vy23wsnl437ujdoh@quack3> References: <20230325063443.1839558-1-libaokun1@huawei.com> <20230327124700.mnldh4sosp3ptbls@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.5 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 27-03-23 21:09:42, Baokun Li wrote: > On 2023/3/27 20:47, Jan Kara wrote: > > On Sat 25-03-23 14:34:43, Baokun Li wrote: > > > In our fault injection test, we create an ext4 file, migrate it to > > > non-extent based file, then punch a hole and finally trigger a WARN_ON > > > in the ext4_da_update_reserve_space(): > > > > > > EXT4-fs warning (device sda): ext4_da_update_reserve_space:369: > > > ino 14, used 11 with only 10 reserved data blocks > > > > > > When writing back a non-extent based file, if we enable delalloc, the > > > number of reserved blocks will be subtracted from the number of blocks > > > mapped by ext4_ind_map_blocks(), and the extent status tree will be > > > updated. We update the extent status tree by first removing the old > > > extent_status and then inserting the new extent_status. If the block range > > > we remove happens to be in an extent, then we need to allocate another > > > extent_status with ext4_es_alloc_extent(). > > > > > > use old to remove to add new > > > |----------|------------|------------| > > > old extent_status > > > > > > The problem is that the allocation of a new extent_status failed due to a > > > fault injection, and __es_shrink() did not get free memory, resulting in > > > a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM, > > > we map to the same extent again, and the number of reserved blocks is again > > > subtracted from the number of blocks in that extent. Since the blocks in > > > the same extent are subtracted twice, we end up triggering WARN_ON at > > > ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks. > > Hum, but this second call to ext4_map_blocks() should find already allocated > > blocks in the indirect block and thus should not be subtracting > > ei->i_reserved_data_blocks for the second time. What am I missing? > > > > Honza > > > ext4_map_blocks > ? 1. Lookup extent status tree firstly > ?????? goto found; > ? 2. get the block without requesting a new file system block. > found: > ? 3. ceate and map the block > > When we call ext4_map_blocks() for the second time, we directly find the > corresponding blocks in the extent status tree, and then go directly to step > 3, > because our flag is brand new and therefore does not contain EXT4_MAP_MAPPED > but contains EXT4_GET_BLOCKS_CREATE, thus subtracting > ei->i_reserved_data_blocks > for the second time. Ah, I see. Thanks for explanation. But then the problem is deeper than just a mismatch in number of reserved delalloc block. The problem really is that if extent status tree update fails, we have inconsistency between what is stored in the extent status tree and what is stored on disk. And that can cause even data corruption issues in some cases. So I think we rather need to work on handling of errors in extent status tree operations. In the extent status tree, we have extents which we can just drop without issues and extents we must not drop - this depends on the extent's status - currently ext4_es_is_delayed() extents must stay, others may be dropped but I'd wrap the decision in a helper function. I'm currently inclined towards the following: 1) Removal must never fail. If we need to split extent, we use GFP_NOFAIL if we cannot just drop the second part of the split extent in case of allocation failure. 2) Similarly if inserting extent that cannot be dropped, we use GFP_NOFAIL. 3) We do not try to "undo" failed operations like we currently do - with the above rules we never loose information that cannot be restored. And this should also fix the problem you've hit because in case of allocation failure we may just end up with removed extent from the extent status tree and thus we refetch info from the disk and find out blocks are already allocated. Honza -- Jan Kara SUSE Labs, CR