Received: by 2002:a05:6a10:7420:0:0:0:0 with SMTP id hk32csp4186553pxb; Mon, 21 Feb 2022 14:17:37 -0800 (PST) X-Google-Smtp-Source: ABdhPJzCMfMKtqnfNKsEguuWWwDfQko1qsLxdc77ZlEg9ZpWtdi6i5NfUuPaK7phOZv8l5LXKc46 X-Received: by 2002:a17:903:18a:b0:14f:bdc4:713 with SMTP id z10-20020a170903018a00b0014fbdc40713mr6114821plg.109.1645481856996; Mon, 21 Feb 2022 14:17:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645481856; cv=none; d=google.com; s=arc-20160816; b=FP0NuOJHAA+U7X27lSOys3WVCtUc+5DE/meCXCRBjBlAfcWTVhKE+bZMaLAFzJgTGO rn1fh6IJ+RlB8ltqtu1ktee4Bed72bzN/D/8Sow7TwFriOvrPfVXgU6q6+zLJEmvSZvO hdMpW0hR7j67R1SdJjPy5UmhTAu9aw2isxsoFztwo5/P6tpRbE32SijW2PVV44cWb4cO bqbNX6IWFNGYf9JKVt8cSbr4HBSOhcns1YFaWzu5CMA5VacsbUNtrBWjvh7G0YbhCtd1 LwuRJWS6cILeukrExznlpNKy0y0ciHtTNpxDZQRiCT9ht7AHWyiYd+4j7NoIwODhHoiJ pGiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=2zHdnlxp+6QZ5JDAEiaLocZ8jy1h1CjM+knGftC+SwA=; b=bIEE/YEP73/A6UlsHC84aLKpbd+G7uwNMDoyb0gapYruFNDlp/EiU0aGKNYSyCLhxF F36cuI3iolYWql96uvUFaMVKGOvG29dklCP2t4iEfgb9RECuHXqxwrAxToh+KPSEI81A Fk8HrlcuGbe9uim9YEX1cNbH7Cil+0cJYAeUITNklA6inTggnCc7DapEpTeriaqsXds5 JzY6G3rb1bpJl+v4UO6CLi7Zb9N0jJ5EfTwhit7zgQj1Xv1ElLObCK0O3Fooa9VfBaSl BmI0QH6agAwRy/kVx6ka2pR2flAad6wsNQRg8Wolt8GN0k2LxPWE1hl4gPInihlE2u9Y 987g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ZF8YuwM3; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s10si32993450plg.590.2022.02.21.14.17.19; Mon, 21 Feb 2022 14:17:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ZF8YuwM3; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230347AbiBUSy5 (ORCPT + 99 others); Mon, 21 Feb 2022 13:54:57 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:37056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232541AbiBUSyz (ORCPT ); Mon, 21 Feb 2022 13:54:55 -0500 Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61B24C7F for ; Mon, 21 Feb 2022 10:54:29 -0800 (PST) Received: by mail-ed1-x536.google.com with SMTP id cm8so21677987edb.3 for ; Mon, 21 Feb 2022 10:54:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2zHdnlxp+6QZ5JDAEiaLocZ8jy1h1CjM+knGftC+SwA=; b=ZF8YuwM3tdQCkn7CfH8d1zAZaUpslyzBc6CmziaKGbKVeNhaYwKyCEOpqIwRrY3jP5 P18kv2I1Iv+yBlVkwYMVWJe0ERG21REoyKay7UeZy+IPesb7fDl6O9L0pCi211oKbE/n kqzsaYCgVIOqkLQUYdCrb99Mx4KQ4Kz64GXW7NP7lYKWdrOkLk0WdqEo5O+YC24vyK1N sgIVL/qKCV9VGI1tTjSkjd84Jge6oaNMFwzmSAACt1URDHRue61evqjBeln+nnmU70V2 LhGUEIXQrMSmGXL1VClwtBpHRa0esh+UelBJzTU8D1sQ/fhMCXo4jZqJq7WCiEwCcz5O 6wlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2zHdnlxp+6QZ5JDAEiaLocZ8jy1h1CjM+knGftC+SwA=; b=aZ5UA0lB5wSxVXT4No1RFjESahKgsM+hDoHNbx/q6ju3pY6fwMzctSG8yhnt2DL4hc E4CnoREaFnfV8z2aEtlnDHwLSPs9BWGpAZEMH8HF/HKlfPSsaPr2x2GEEcUJjeoV/y92 GJ2eiyavIuvSjtcHTEFL3WGldgD65Yb6OVFi0uL6lVlk+lzJsMkzJXolyeMVuuO24n50 ySpDsowJoOi0tnwMDWiPaO6CiRQ6fE5UAtLlL6n3o3U2eyYjQSSAYq53OzReDgsqqCGA UzEANRDuZwCAZYLwKxp6LVXJZEepwzJrdTBdOqJSy0SJFdJV3CaUWqdcPdjtczIoY+ZZ YudQ== X-Gm-Message-State: AOAM531YPaW0hrHD0Gil/Y83pGydAhDhDIZHvaWPY9NX3V84LfNypqrr zMFA9fn+BsToKOggLIqRTmn3UxuHNsB2ru2oBFc= X-Received: by 2002:aa7:df1a:0:b0:409:5174:68a9 with SMTP id c26-20020aa7df1a000000b00409517468a9mr23336435edy.145.1645469667726; Mon, 21 Feb 2022 10:54:27 -0800 (PST) MIME-Version: 1.0 References: <20220221075938.g2lncbi7sxnnbrhr@riteshh-domain> In-Reply-To: <20220221075938.g2lncbi7sxnnbrhr@riteshh-domain> From: harshad shirwadkar Date: Mon, 21 Feb 2022 10:54:16 -0800 Message-ID: Subject: Re: Query regarding fast_commit replay of inode To: Ritesh Harjani Cc: "Theodore Ts'o" , linux-ext4 , Jan Kara Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Hi Ritesh, Thanks for bringing this up again. I think we should really document this in the code. Let me first try to explain what we are doing and then why we are doing it this way. During fast commit replay (even for "add range" / "delete range" tags), we always clear the bitmaps of whatever blocks we touch. For example, even if new blocks are added to an inode, during the replay phase of add range tag, we will still clear the bitmap to mark these new blocks as free and at the end of the replay (i.e. after all the tags have been replayed) we will traverse the "modified inodes list" and mark all the blocks in these inodes as allocated. We extend this logic to the replay of the "inode" tag as well. So here's the high level flow of replay path wrt block bitmap handling: - For every "add_range" / "del_range" tags in the FC log area: - Clear block bitmaps for all modified blocks in this tag. - For every "inode" tag in the FC log area: - Clear all the blocks in that inode - Record that the inode is modified - At the end of the replay phase - For each inode in the modified inode list: - Mark all the blocks in this inode as used Let me try to describe the reason why we take this approach with an example. Let's say there was an inode A of size 100 with following extents: Logical - Physical [0-49] - unmapped [50-99] - [1000 - 1049] Let's say this inode changed to following mappings: Logical - Physical [0-49] - [1000 - 1049] [50-99] - [1050 - 1099] Let's say a fast commit was performed at this point. So, this would translate to following logs in the fast commit area: [ADD_RANGE, A, 0-49, 1000-1049] [ADD_RANGE, A, 50-99, 1050-1099] After replaying the first tag, the inode's intermediate state would be as follows: Logical - Physical [0-49] - [1000-1049] [50-99] - [1000-1049] During replay of the second tag, the logical to physical mapping of 50:99 has changed. So, we need to decide what to do with the old blocks. If we mark the old physical blocks corresponding to 50-99 range as free on-disk, then that would result in block bitmap inconsistency, since these blocks are actually being used somewhere else. We can't simply leave these blocks as allocated either since these blocks may actually not be used at all. Things get even more complicated if these blocks are moved to a different file. So to protect against such cases, what we simply do is that we defer setting up the block bitmap correctly until the very end and just mark all the blocks that we encounter during replays of different tags as free. This was done mainly to protect against such cases in add range / delete range operations, but I extended it to replay inode as well just to be on the safer side. generic/311 provides a lot of good test cases for such manipulations. We can see if removing clear_bb from replay inode handling introduces any regressions. Does this make sense? - Harshad On Sun, 20 Feb 2022 at 23:59, Ritesh Harjani wrote: > > Hello Ted/Harshad, > > I think we did discuss this once before, but I am unable to recollect the exact > reasoning for this. So question is - why do we have to call ext4_ext_clear_bb() > from ext4_fc_replay_inode()? > > I was just thinking if this is suboptimal and if it can be optimized. But before > working on that problem, I wanted to again understand the right reasoning behind > choosing this approach in the first place. > > Could you please help with this again? > > ext4_fc_replay_inode() > <..> > inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); > if (!IS_ERR(inode)) { > ext4_ext_clear_bb(inode); > iput(inode); > } > <..> > > I will document it this time, so that I don't have to keep coming to this > everytime I look into fc replay code. > > Thanks again for your help!! > -ritesh