Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp1543265pxm; Thu, 24 Feb 2022 05:17:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJy4yWp9CizqPPXXvB8feaLJcevd/Sxdih6SDU+VHyYRcZt4u9gc1mz6i0sWzR10FODw8YqO X-Received: by 2002:a65:5548:0:b0:375:9f78:62ae with SMTP id t8-20020a655548000000b003759f7862aemr660213pgr.390.1645708635262; Thu, 24 Feb 2022 05:17:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645708635; cv=none; d=google.com; s=arc-20160816; b=kgfSMA9KobYqclt4K0o00alTK5ypuHiX8YPoD5CRd/oM28LAvwcQJKxL1aEvE0p5pt 6dZxTNlq/2wziYRd99M9vviM7sG9TlDyxQgVdumssMltaqCW0lER2bunWIQ66MxqUPWJ 0eDtulOe9ewEpgYo0ubPqURkkxvjgSzO3uu3V9bWHmqdDB16ZO78m+bjx/b2OYTapMmR sWkgfOlXZLsSB2YwuQ5fZ5GLeAIi1oNZ5YmW8nmSg+MEbjqWUzPEGl1AzHzglZwO20dr vNUdGY3SqcXQAViu4+RmItRBvwkauV/j7YMRh728/LXlc3Rb8Nk0hWvf7tpTiXp6TbtK +6+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=0aj1sHJcqKfZb0FtBlcvqwZ+alQwzGOksJeJvrB2n3c=; b=ZDa9tKxi0w2RqYQY0HnnxG3qNeUIl0wjRfthCIwUeSlPZdaC4S3noW80J5szj3tGV7 A4E+h2NIDhdd2m5U4A/NhHHZONSfNLC8zLZrjakL75QT8ski6cs7YqFPB/DrcHQCzctb 2SSHRPVO3MLaOYAb0Cgnxi7i1gbadi6RJ3bbU3NUBLRhV5VU+hdQzr362z9wk8rIoROx 0KR/3NlnEoGWJ8Pgp20aqifvRRK4F9ns5NwsU8excQEtM7i1Ay+HHshXhbBMlmrW7zY2 IWV1vIk8UXFxmmRvjKaV/NGCRcx5+5Oy1xoNDFwNpjrQ4nU76LYAwlpvOliRNETqKKDZ 0tUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=lKuEAZny; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m191si2198680pga.361.2022.02.24.05.16.49; Thu, 24 Feb 2022 05:17:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=lKuEAZny; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232942AbiBXLoa (ORCPT + 99 others); Thu, 24 Feb 2022 06:44:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232438AbiBXLo3 (ORCPT ); Thu, 24 Feb 2022 06:44:29 -0500 Received: from mail-lf1-x12c.google.com (mail-lf1-x12c.google.com [IPv6:2a00:1450:4864:20::12c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BB8C28DDFA for ; Thu, 24 Feb 2022 03:43:59 -0800 (PST) Received: by mail-lf1-x12c.google.com with SMTP id e5so3321732lfr.9 for ; Thu, 24 Feb 2022 03:43:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=0aj1sHJcqKfZb0FtBlcvqwZ+alQwzGOksJeJvrB2n3c=; b=lKuEAZnyhHTHhRa8sI+KaAy+L/24jIHphLByYoXbT7Laiv3Pp2hGZx5OkqoRCCwQiB G5snGJHKil1n7xQZDv4CkZYg5Oexa2ZeRnkMZGZ9K3qBO72NusQBRomYTmqJx26nK2bl k1uSUGzItmiRTTBKKl9kR2We0meBcXcQOJid+4UuFcbyBJ6ZnMFnk+3VVdO40M8QaLXn 215J84p6I9iqmAxns1anQP0EGVwzk+gvFKfqAt15OfNbp6i5onWDrRyNazdXg3ct2bC9 2GESHzW5dJ+rx0bqHIv4hXUxUzqVnQxr9B59xRg2RVCmLF+JL9/DauUh+hTdUK82Kyfo PgVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0aj1sHJcqKfZb0FtBlcvqwZ+alQwzGOksJeJvrB2n3c=; b=IRORanJBURpJCdFNHS1nT/YVwwu9hrDxvn6s6sMTu1tTUCCS/NIL0AtE4B8uJKwKWj DrQzk24fPOauyJxRQI6vM+3TmKy5tGbOVHtnjH3F/v2wKp9SLqm1y+y2elWhoB325c0o KBhkaq665tmNJX1fqK/PtRV/NI+gQ5IamInLH4zl97WEhNrKOIOvKqbIElOFyoUm1diC XKAw6WrEU4lZGkd6/Tb8ETHXJ19CdH7+hcDL2fMpA3GB2tBmA+stZfNT/IXE6qBqKFdF /QIvilGysmqpWaz1fluZY5lquSJjUGm/tmncc3nPUnxTrNA7kgv0Q4KLWQHVrYfAjTe/ SWvg== X-Gm-Message-State: AOAM530/q/uuIL69m5odrxnrKo3XmT6h02+snjKHCnaPw1veGtx2WoeE H1DPveoRcTrcAuo7U+FSii3UJnPAvW1Z82L58/dqpw== X-Received: by 2002:a19:e049:0:b0:42f:b0e2:10c4 with SMTP id g9-20020a19e049000000b0042fb0e210c4mr1553336lfj.170.1645703037284; Thu, 24 Feb 2022 03:43:57 -0800 (PST) MIME-Version: 1.0 References: <20220223135755.plbr2fvt66k3xyn5@riteshh-domain> In-Reply-To: <20220223135755.plbr2fvt66k3xyn5@riteshh-domain> From: Xin Yin Date: Thu, 24 Feb 2022 19:43:46 +0800 Message-ID: Subject: Re: [External] [RFC 9/9] ext4: fast_commit missing tracking updates to a file To: Ritesh Harjani Cc: linux-ext4@vger.kernel.org, Harshad Shirwadkar , "Theodore Ts'o" , Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed, Feb 23, 2022 at 9:59 PM Ritesh Harjani wrote: > > On 22/02/23 11:50AM, Xin Yin wrote: > > On Wed, Feb 23, 2022 at 4:36 AM Ritesh Harjani wrote: > > > > > > > > > > > > Testcase > > > ========== > > > 1. i=0; while [ $i -lt 1000 ]; do xfs_io -f -c "pwrite -S 0xaa -b 32k 0 32k" -c "fsync" /mnt/$i; i=$(($i+1)); done && sudo ./src/godown -v /mnt && sudo umount /mnt && sudo mount /dev/loop2 /mnt' > > > 2. ls -alih /mnt/ -> In this you will observe one such file with 0 bytes (which ideally should not happen) > > > > > > ^^^ say if you don't see the issue because your underlying storage > > > device is very fast, then maybe try with commit=1 mount option. > > > > > > Analysis > > > ========== > > > It seems a file's updates can be a part of two transaction tid. > > > Below are the sequence of events which could cause this issue. > > > > > > jbd2_handle_start -> (t_tid = 38) > > > __ext4_new_inode > > > ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid = 38) > > > > > > jbd2_start_commit -> (t_tid = 38) > > > > > > jbd2_handle_start (tid = 39) > > > ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid 39) > > > -> ext4_fc_reset_inode & ei->i_sync_tid = t_tid > > > > > > ext4_fc_commit_start -> (will wait since jbd2 full commit is in progress) > > > jbd2_end_commit (t_tid = 38) > > > -> jbd2_fc_cleanup() -> this will cleanup entries in sbi->s_fc_q[FC_Q_MAIN] > > > -> And the above could result inode size as 0 as after effect. > > > ext4_fc_commit_stop > > > > > > You could find the logs for the above behavior for inode 979 at [1]. > > > > > > -> So what is happening here is since the ei->i_fc_list is not empty > > > (because it is already part of sb's MAIN queue), we don't add this inode > > > again into neither sb's MAIN or STAGING queue. > > > And after jbd2_fc_cleanup() is called from jbd2 full commit, we > > > just remove this inode from the main queue. > > > > > > So as a simple fix, what I did below was to check if it is a jbd2 full commit > > > in ext4_fc_cleanup(), and if the ei->i_sync_tid > tid, that means we > > > need not remove that from MAIN queue. This is since neither jbd2 nor FC > > > has committed updates of those inodes for this new txn tid yet. > > > > > > But below are some quick queries on this > > > ========================================= > > > > > > 1. why do we call ext4_fc_reset_inode() when inode tid and > > > running txn tid does not match? > > This is part of a change in commit:bdc8a53a6f2f, it fixes the issue > > for fc tracking logic while jbd2 commit is ongoing. > > Thanks Xin for pointing the other issue too. > But I think what I was mostly referring to was - calling ext4_fc_reset_inode() > in ext4_fc_track_template(). > Understood, I missed something here, then maybe Harshad can give some directions for this part. > <..> > 391 tid = handle->h_transaction->t_tid; > 392 mutex_lock(&ei->i_fc_lock); > 393 if (tid == ei->i_sync_tid) { > 394 update = true; > 395 } else { > 396 ext4_fc_reset_inode(inode); > 397 ei->i_sync_tid = tid; > 398 } > 399 ret = __fc_track_fn(inode, args, update); > 400 mutex_unlock(&ei->i_fc_lock); > <..> > > So, yes these are few corner cases which I want to take a deeper look at. > I vaugely understand that this reset inode is done since we anyway might have > done the full commit for previous tid, so we can reset the inode track range. > > So, yes, we should carefully review this as well that if jbd2 commit happens for > an inode which is still part of MAIN_Q, then does it make sense to still > call ext4_fc_reset_inode() for that inode in ext4_fc_track_template()? > > > If the inode tid is bigger than txn tid, that means this inode may be > > in the STAGING queue, if we reset it then it will lose the tack range. > > I think it's a similar issue, the difference is this inode is already > > Do you have a test case which was failing for your issue? > I would like to test that one too. This issue can be triggered by generic/455 , but this is one failed case for it. I also do not have a reproducer for this. > > > > in the MAIN queue before the jbd2 commit starts. > > And yes , I think in this case we can not remove it from the MAIN > > Yes. I too have a similar thought. But I still wanted to get few queries sorted > (like point 1 & 2). > > > queue, but still need to clear EXT4_STATE_FC_COMMITTING right? it may > > block some task still waiting for it. > > Sorry I didn't get you here. So I think we will end up in such situation > (where ext4_fc_cleanup() is getting called for an inode with i_sync_tid > tid) > only from full commit path right ? > And that won't set EXT4_FC_COMMITTING for this inode right anyways no? I am not sure if there are any other cases, But for now we also clear EXT4_STATE_FC_COMMITTING in the full commit path right? So maybe some further tests are needed. Thanks, Xin Yin > > Do you mean anything else, or am I missing something here? > > -ritesh > > > > > > Thanks, > > Xin Yin > > > > > > 2. Also is this an expected behavior from the design perspective of > > > fast_commit. i.e. > > > a. the inode can be part of two tids? > > > b. And that while a full commit is in progress, the inode can still > > > receive updates but using a new transaction tid. > > > > > > Frankly speaking, since I was also working on other things, so I haven't > > > yet got the chance to completely analyze the situation yet. > > > Once I have those things sorted, I will spend more time on this, to > > > understand it more. Meanwhile if you already have some answers to above > > > queries/observations, please do share those here. > > > > > > Links > > > ========= > > > [1] https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/fast_commit/fc_inode_missing_updates_ino_979.txt > > > > > > Signed-off-by: Ritesh Harjani > > > --- > > > fs/ext4/fast_commit.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c > > > index 8803ba087b07..769b584c2552 100644 > > > --- a/fs/ext4/fast_commit.c > > > +++ b/fs/ext4/fast_commit.c > > > @@ -1252,6 +1252,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid) > > > spin_lock(&sbi->s_fc_lock); > > > list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN], > > > i_fc_list) { > > > + if (full && iter->i_sync_tid > tid) > > > + continue; > > > list_del_init(&iter->i_fc_list); > > > ext4_clear_inode_state(&iter->vfs_inode, > > > EXT4_STATE_FC_COMMITTING); > > > -- > > > 2.31.1 > > >