Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1590575imu; Wed, 9 Jan 2019 22:44:01 -0800 (PST) X-Google-Smtp-Source: ALg8bN7NAkmNOFkcLsjg5iHs45IaUTsGxyTnt6WuGGNK9wR9N/M2Xqb0K6qptkfUoqAsYRNIC/ru X-Received: by 2002:a63:9e19:: with SMTP id s25mr7301652pgd.203.1547102641197; Wed, 09 Jan 2019 22:44:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547102641; cv=none; d=google.com; s=arc-20160816; b=GzXtivvwsMdR8z0vbMJzixLqraebu4HIEAmHjwwjg/YYb2hFImJPGfsyAO7cGkMEv8 U8sHNodb1ejPYLDzpu4RAyVrodD0sD21N8skoS+vrl1hN98Z6OtTHRsDTrEA3DVmAdPR TX1KfUbZVIO7zyaLYvn2LcQ5cy6L5Fo8uyfss2ZQrwwDJNkX1VEWbp7Ue6YRTETxcBgX +TSnBT+NDeDVFsHIz8dushLy68JduvNW117f+e9FCtac294WYt6OWH86AxJEs6/mQd/t Hm5ReMkp9hTkr6P3cjHvER5TkNnBJjFa5FuLymmM8zGl2OvQ8+wJAz2wUAeoBDLBDL/T mJ7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:dkim-signature:dkim-signature; bh=xX955zuDSDqZbYtNp0l3+zOgAlM2PqrWKfkzT9zm9PY=; b=SrQDq32+IaeqAfCfMf39/d+TM5TcUmn3jkI/WHpkwpXnOTrqNjslZteb2+zDIge1sC d3Tsv2GuXtxmgtxwtiQ2KFXnFOHrAHFe5gHAUrPMuE+Ake7kTY5BB2Zk8iG55qCppA/E MFvHPpVsoyOIcEfQ+UKB+bR/us7BvJNywiD23zXa+qldHyyI6NeuI3sKDPvG/viCWihj /nxHXlQtp9kCyuMmQYEHW4PVR+LQVhTD7bo2+dY7kt2BjGv55ivTYz3opEVpUWE5Utkf ktaXCt5pBxqZ1CU7cjt35MukOiCUVP+os6VM+iZyl/JwMtwPQe8Nzn1GBkJrgJh5NWT/ WnVw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=l8mkesGM; dkim=pass header.i=@codeaurora.org header.s=default header.b=N+9YCZKL; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 91si7937037ply.222.2019.01.09.22.43.45; Wed, 09 Jan 2019 22:44:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=l8mkesGM; dkim=pass header.i=@codeaurora.org header.s=default header.b=N+9YCZKL; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727334AbfAJGja (ORCPT + 99 others); Thu, 10 Jan 2019 01:39:30 -0500 Received: from smtp.codeaurora.org ([198.145.29.96]:53846 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726968AbfAJGja (ORCPT ); Thu, 10 Jan 2019 01:39:30 -0500 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 94F68608C6; Thu, 10 Jan 2019 06:39:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1547102368; bh=mLW4QGS0llUZX+CodpmB9q8YfBLGqEEkX5bom2/mnpk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=l8mkesGMrsEI/sbjBhSuFljTg57NcezEPIUn9cuAKxwZ5ckyMicsXv/JplzBMpcqz mehYJB/Jpte6pEuNGddiN5D56pZGJ2P8A82TeYiN9LdSRS+owbnmH3jbtUyzb8o4Sq R0kEdBRUny/95po/FX5DGzxEI0lvFHuyJBk3YO1Y= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_INVALID,DKIM_SIGNED autolearn=no autolearn_force=no version=3.4.0 Received: from codeaurora.org (blr-c-bdr-fw-01_globalnat_allzones-outside.qualcomm.com [103.229.19.19]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: stummala@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 672D06079B; Thu, 10 Jan 2019 06:39:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1547102367; bh=mLW4QGS0llUZX+CodpmB9q8YfBLGqEEkX5bom2/mnpk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=N+9YCZKLIsfWrftLd5cSnxWeZoLZKN7/4yyfSnJ0UeC60z5maq1azU6pcDAz9B637 gSf6KFTIGdAykpeEY07QWT1MauOFBfsvsqMW6R+fGy1wEkzPcBi+Vlfq8RSqZBcsSY 5kNBirRx0vVPz1xNLHD+krtCM5G6o8aK23k7zBhY= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 672D06079B Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=stummala@codeaurora.org Date: Thu, 10 Jan 2019 12:09:21 +0530 From: Sahitya Tummala To: Chao Yu Cc: Jaegeuk Kim , linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue Message-ID: <20190110063921.GA7308@codeaurora.org> References: <1543207640-31033-1-git-send-email-stummala@codeaurora.org> <20190104080535.GB8475@codeaurora.org> <20190104203329.GA57873@jaegeuk-macbookpro.roam.corp.google.com> <0197c64e-6419-db96-d8a9-d17f446aa9c1@huawei.com> <20190109043801.GB55897@jaegeuk-macbookpro.roam.corp.google.com> <9e8fb00b-75e7-1f69-0255-26de800a6688@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9e8fb00b-75e7-1f69-0255-26de800a6688@huawei.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 09, 2019 at 02:13:02PM +0800, Chao Yu wrote: > On 2019/1/9 12:38, Jaegeuk Kim wrote: > > On 01/07, Chao Yu wrote: > >> On 2019/1/5 4:33, Jaegeuk Kim wrote: > >>> On 01/04, Sahitya Tummala wrote: > >>>> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote: > >>>>> When there is a failure in f2fs_fill_super() after/during > >>>>> the recovery of fsync'd nodes, it frees the current sbi and > >>>>> retries again. This time the mount is successful, but the files > >>>>> that got recovered before retry, still holds the extent tree, > >>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list > >>>>> is freed up. The list_del corruption issue is observed when the > >>>>> file system is getting unmounted and when those recoverd files extent > >>>>> node is being freed up in the below context. > >>>>> > >>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null) > >>>>> <...> > >>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53! > >>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000 > >>>>> lr : __list_del_entry_valid+0x94/0xb4 > >>>>> pc : __list_del_entry_valid+0x94/0xb4 > >>>>> <...> > >>>>> Call trace: > >>>>> __list_del_entry_valid+0x94/0xb4 > >>>>> __release_extent_node+0xb0/0x114 > >>>>> __free_extent_tree+0x58/0x7c > >>>>> f2fs_shrink_extent_tree+0xdc/0x3b0 > >>>>> f2fs_leave_shrinker+0x28/0x7c > >>>>> f2fs_put_super+0xfc/0x1e0 > >>>>> generic_shutdown_super+0x70/0xf4 > >>>>> kill_block_super+0x2c/0x5c > >>>>> kill_f2fs_super+0x44/0x50 > >>>>> deactivate_locked_super+0x60/0x8c > >>>>> deactivate_super+0x68/0x74 > >>>>> cleanup_mnt+0x40/0x78 > >>>>> __cleanup_mnt+0x1c/0x28 > >>>>> task_work_run+0x48/0xd0 > >>>>> do_notify_resume+0x678/0xe98 > >>>>> work_pending+0x8/0x14 > >>>>> > >>>>> Fix this by cleaning up inodes, extent tree and nodes of those > >>>>> recovered files before freeing up sbi and before next retry. > >>>>> > >>>> Hi Jaegeuk, Chao, > >>>> > >>>> I have observed another scenario where the similar list corruption issue > >>>> can happen with sbi->inode_list as well. If recover_fsync_data() > >>>> fails at some point in write_checkpoint() due to some error and if > >>>> those recovered inodes are still dirty, then after the mount is > >>>> successful, this issue is observed when that dirty inode is under > >>>> writeback. > >>> > >>> recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops > >>> its dirty list and call iput(), when there is an error. So, after then, there'd > >>> be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages > >>> in dirty inodes as well. Can we check where this dirty inode came from? > >> > >> I guess it comes from: > >> > >> f2fs_recover_fsync_data() > >> > >> /* Needed for iput() to work correctly and not trash data */ > >> sbi->sb->s_flags |= SB_ACTIVE; > >> > >> iput_final() > >> > >> if (!drop && (sb->s_flags & SB_ACTIVE)) { > >> inode_add_lru(inode); > >> spin_unlock(&inode->i_lock); > >> return; > >> } > >> > >> So dirty data in those inode can be remained after iput(), then meta/node > >> can be persisted during next checkpoint, if checkpoint failed due to error, > >> dirty inode remain in system. IIUC. > > > > > > 749 err = recover_data(sbi, &inode_list, &tmp_inode_list, &dir_list); > > 750 if (!err) > > 751 f2fs_bug_on(sbi, !list_empty(&inode_list)); > > 752 else { > > 753 /* restore s_flags to let iput() trash data */ > > 754 sbi->sb->s_flags = s_flags; > > 755 } > > > > We deactivate sb before iput? > > We only restore s_flags in error path of recover_data? I remember Sahitya > said his case is encountering error in later checkpoint()? > Thanks Chao and Jaegeuk for taking a look at this. I might be wrong earlier on the point of error. The case that Jaegeuk has addressed below looks good for now. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=cd2dcebde8b8088b8ad77184ee3a1e81fd3da6ba > Thanks, > > > > >> > >>> > >>> Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC. > >>> > >>>> > >>>> [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null) > >>>> [ 90.675349] Call trace: > >>>> [ 90.677869] __list_del_entry_valid+0x94/0xb4 > >>>> [ 90.682351] remove_dirty_inode+0xac/0x114 > >>>> [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8 > >>>> [ 90.691302] f2fs_write_data_pages+0x40/0x4c > >>>> [ 90.695695] do_writepages+0x80/0xf0 > >>>> [ 90.699372] __writeback_single_inode+0xdc/0x4ac > >>>> [ 90.704113] writeback_sb_inodes+0x280/0x440 > >>>> [ 90.708501] wb_writeback+0x1b8/0x3d0 > >>>> [ 90.712267] wb_workfn+0x1a8/0x4d4 > >>>> [ 90.715765] process_one_work+0x1c0/0x3d4 > >>>> [ 90.719883] worker_thread+0x224/0x344 > >>>> [ 90.723739] kthread+0x120/0x130 > >>>> [ 90.727055] ret_from_fork+0x10/0x18 > >>>> > >>>> I think it is better to cleanup those inodes completely before freeing sbi > >>>> and before next retry as done in this patch. Would you like to re-consider > >>>> this patch for this new issue? > >>> > >>> The patch was merged in mainline already. > >>> Could you take a look at this patch? > >>> > >>> >From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001 > >>> From: Jaegeuk Kim > >>> Date: Fri, 4 Jan 2019 12:29:00 -0800 > >>> Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery > >> > >> You mean android kernel mainline? > > > > I meant the previous patch was upstreamed. We need another patch to address this > > second issue. > > > > Thanks, > > > >> > >> Thanks, > >> > >>> > >>> Some works after roll-forward recovery can get an error which will release > >>> all the data structures. Let's flush them in order to make it clean. > >>> > >>> One possible corruption came from: > >>> > >>> [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null) > >>> [ 90.675349] Call trace: > >>> [ 90.677869] __list_del_entry_valid+0x94/0xb4 > >>> [ 90.682351] remove_dirty_inode+0xac/0x114 > >>> [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8 > >>> [ 90.691302] f2fs_write_data_pages+0x40/0x4c > >>> [ 90.695695] do_writepages+0x80/0xf0 > >>> [ 90.699372] __writeback_single_inode+0xdc/0x4ac > >>> [ 90.704113] writeback_sb_inodes+0x280/0x440 > >>> [ 90.708501] wb_writeback+0x1b8/0x3d0 > >>> [ 90.712267] wb_workfn+0x1a8/0x4d4 > >>> [ 90.715765] process_one_work+0x1c0/0x3d4 > >>> [ 90.719883] worker_thread+0x224/0x344 > >>> [ 90.723739] kthread+0x120/0x130 > >>> [ 90.727055] ret_from_fork+0x10/0x18 > >>> > >>> Reported-by: Sahitya Tummala > >>> Signed-off-by: Jaegeuk Kim > >>> --- > >>> fs/f2fs/super.c | 8 ++++++-- > >>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > >>> index 547cb7459be7..bb02186293a3 100644 > >>> --- a/fs/f2fs/super.c > >>> +++ b/fs/f2fs/super.c > >>> @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > >>> if (test_opt(sbi, DISABLE_CHECKPOINT)) { > >>> err = f2fs_disable_checkpoint(sbi); > >>> if (err) > >>> - goto free_meta; > >>> + goto sync_free_meta; > >>> } else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) { > >>> f2fs_enable_checkpoint(sbi); > >>> } > >>> @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > >>> /* After POR, we can run background GC thread.*/ > >>> err = f2fs_start_gc_thread(sbi); > >>> if (err) > >>> - goto free_meta; > >>> + goto sync_free_meta; > >>> } > >>> kvfree(options); > >>> > >>> @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > >>> f2fs_update_time(sbi, REQ_TIME); > >>> return 0; > >>> > >>> +sync_free_meta: > >>> + /* safe to flush all the data */ > >>> + sync_filesystem(sbi->sb); > >>> + > >>> free_meta: > >>> /* flush dirty orphan inode objects */ > >>> f2fs_sync_inode_meta(sbi); > >>> > > > > . > > > -- -- Sent by a consultant of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.