Subject: Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
To: Jaegeuk Kim, Sahitya Tummala
From: Chao Yu
Date: Mon, 7 Jan 2019 10:25:11 +0800
Message-ID: <0197c64e-6419-db96-d8a9-d17f446aa9c1@huawei.com>
In-Reply-To: <20190104203329.GA57873@jaegeuk-macbookpro.roam.corp.google.com>
References: <1543207640-31033-1-git-send-email-stummala@codeaurora.org> <20190104080535.GB8475@codeaurora.org> <20190104203329.GA57873@jaegeuk-macbookpro.roam.corp.google.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/1/5 4:33, Jaegeuk Kim wrote:
> On 01/04, Sahitya Tummala wrote:
>> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
>>> When there is a failure in f2fs_fill_super() after/during
>>> the recovery of fsync'd nodes, it frees the current sbi and
>>> retries again. This time the mount is successful, but the files
>>> that got recovered before retry, still holds the extent tree,
>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>> is freed up. The list_del corruption issue is observed when the
>>> file system is getting unmounted and when those recovered files extent
>>> node is being freed up in the below context.
>>>
>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>> <...>
>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>> lr : __list_del_entry_valid+0x94/0xb4
>>> pc : __list_del_entry_valid+0x94/0xb4
>>> <...>
>>> Call trace:
>>> __list_del_entry_valid+0x94/0xb4
>>> __release_extent_node+0xb0/0x114
>>> __free_extent_tree+0x58/0x7c
>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>> f2fs_leave_shrinker+0x28/0x7c
>>> f2fs_put_super+0xfc/0x1e0
>>> generic_shutdown_super+0x70/0xf4
>>> kill_block_super+0x2c/0x5c
>>> kill_f2fs_super+0x44/0x50
>>> deactivate_locked_super+0x60/0x8c
>>> deactivate_super+0x68/0x74
>>> cleanup_mnt+0x40/0x78
>>> __cleanup_mnt+0x1c/0x28
>>> task_work_run+0x48/0xd0
>>> do_notify_resume+0x678/0xe98
>>> work_pending+0x8/0x14
>>>
>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>> recovered files before freeing up sbi and before next retry.
>>>
>> Hi Jaegeuk, Chao,
>>
>> I have observed another scenario where the similar list corruption issue
>> can happen with sbi->inode_list as well. If recover_fsync_data()
>> fails at some point in write_checkpoint() due to some error and if
>> those recovered inodes are still dirty, then after the mount is
>> successful, this issue is observed when that dirty inode is under
>> writeback.
>
> recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
> its dirty list and call iput(), when there is an error. So, after then, there'd
> be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
> in dirty inodes as well. Can we check where this dirty inode came from?

I guess it comes from:

f2fs_recover_fsync_data()
	/* Needed for iput() to work correctly and not trash data */
	sbi->sb->s_flags |= SB_ACTIVE;

iput_final()
	if (!drop && (sb->s_flags & SB_ACTIVE)) {
		inode_add_lru(inode);
		spin_unlock(&inode->i_lock);
		return;
	}

So dirty data in those inodes can remain after iput(), and meta/node can then
be persisted during the next checkpoint. If that checkpoint fails due to an
error, the dirty inodes remain in the system, IIUC.

>
> Oh, one scenario can be an error by f2fs_disable_checkpoint() which will do GC.
>
>>
>> [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
>> [ 90.675349] Call trace:
>> [ 90.677869] __list_del_entry_valid+0x94/0xb4
>> [ 90.682351] remove_dirty_inode+0xac/0x114
>> [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8
>> [ 90.691302] f2fs_write_data_pages+0x40/0x4c
>> [ 90.695695] do_writepages+0x80/0xf0
>> [ 90.699372] __writeback_single_inode+0xdc/0x4ac
>> [ 90.704113] writeback_sb_inodes+0x280/0x440
>> [ 90.708501] wb_writeback+0x1b8/0x3d0
>> [ 90.712267] wb_workfn+0x1a8/0x4d4
>> [ 90.715765] process_one_work+0x1c0/0x3d4
>> [ 90.719883] worker_thread+0x224/0x344
>> [ 90.723739] kthread+0x120/0x130
>> [ 90.727055] ret_from_fork+0x10/0x18
>>
>> I think it is better to cleanup those inodes completely before freeing sbi
>> and before next retry as done in this patch. Would you like to re-consider
>> this patch for this new issue?
>
> The patch was merged in mainline already.
> Could you take a look at this patch?
>
>>From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim
> Date: Fri, 4 Jan 2019 12:29:00 -0800
> Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery

You mean android kernel mainline?

Thanks,

>
> Some works after roll-forward recovery can get an error which will release
> all the data structures. Let's flush them in order to make it clean.
>
> One possible corruption came from:
>
> [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> [ 90.675349] Call trace:
> [ 90.677869] __list_del_entry_valid+0x94/0xb4
> [ 90.682351] remove_dirty_inode+0xac/0x114
> [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8
> [ 90.691302] f2fs_write_data_pages+0x40/0x4c
> [ 90.695695] do_writepages+0x80/0xf0
> [ 90.699372] __writeback_single_inode+0xdc/0x4ac
> [ 90.704113] writeback_sb_inodes+0x280/0x440
> [ 90.708501] wb_writeback+0x1b8/0x3d0
> [ 90.712267] wb_workfn+0x1a8/0x4d4
> [ 90.715765] process_one_work+0x1c0/0x3d4
> [ 90.719883] worker_thread+0x224/0x344
> [ 90.723739] kthread+0x120/0x130
> [ 90.727055] ret_from_fork+0x10/0x18
>
> Reported-by: Sahitya Tummala
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/super.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 547cb7459be7..bb02186293a3 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
>  		err = f2fs_disable_checkpoint(sbi);
>  		if (err)
> -			goto free_meta;
> +			goto sync_free_meta;
>  	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
>  		f2fs_enable_checkpoint(sbi);
>  	}
> @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  		/* After POR, we can run background GC thread.*/
>  		err = f2fs_start_gc_thread(sbi);
>  		if (err)
> -			goto free_meta;
> +			goto sync_free_meta;
>  	}
>  	kvfree(options);
>
> @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	f2fs_update_time(sbi, REQ_TIME);
>  	return 0;
>
> +sync_free_meta:
> +	/* safe to flush all the data */
> +	sync_filesystem(sbi->sb);
> +
>  free_meta:
>  	/* flush dirty orphan inode objects */
>  	f2fs_sync_inode_meta(sbi);
>
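
For readers following the quoted patch, the label ordering it relies on can be
sketched in user space roughly as below. This is only an illustration under
stated assumptions, not the f2fs code: struct demo_sb, demo_sync(),
demo_free_meta() and demo_fill_super() are hypothetical stand-ins, and the
"dirty inode" counter merely models inodes left dirty by roll-forward recovery.
The point it shows is that an error raised after recovery jumps to a label that
flushes first and then falls through into the common teardown, while an error
raised before recovery goes straight to the teardown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-mount state (not the real f2fs_sb_info). */
struct demo_sb {
	int dirty_inodes;	/* inodes still carrying dirty data */
	bool meta_freed;	/* set once the metadata has been torn down */
};

/* Stand-in for sync_filesystem(): write back everything that is dirty. */
static void demo_sync(struct demo_sb *sb)
{
	sb->dirty_inodes = 0;
}

/* Stand-in for the free_meta teardown; nothing dirty may outlive it. */
static void demo_free_meta(struct demo_sb *sb)
{
	assert(sb->dirty_inodes == 0);
	sb->meta_freed = true;
}

/*
 * Returns 0 on success, -1 on failure.
 * fail_early: an error occurs before recovery has run.
 * fail_late:  an error occurs after recovery has dirtied some inodes.
 */
int demo_fill_super(struct demo_sb *sb, bool fail_early, bool fail_late)
{
	sb->dirty_inodes = 0;
	sb->meta_freed = false;

	if (fail_early)
		goto free_meta;

	/* Assumption: recovery leaves a few recovered inodes dirty. */
	sb->dirty_inodes = 3;

	if (fail_late)
		goto sync_free_meta;

	return 0;

sync_free_meta:
	/* Flush before teardown so no dirty inode refers to freed state. */
	demo_sync(sb);
	/* fall through */
free_meta:
	demo_free_meta(sb);
	return -1;
}
```

Calling demo_fill_super(&sb, false, true) takes the sync_free_meta path and
leaves sb.dirty_inodes at 0 before the teardown runs. If the late error jumped
to free_meta directly, the assert in demo_free_meta() would fire, which is the
user-space analogue of a dirty inode surviving the freed sbi.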