Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp4310190ybv; Sun, 16 Feb 2020 19:10:27 -0800 (PST) X-Google-Smtp-Source: APXvYqy7M/E1UltgulQq/eV9HtGhFNPTrfgYTDnrP/zb7E14yuxrlya2MyWZicZ+gZYU2dhknpqx X-Received: by 2002:a05:6830:11c8:: with SMTP id v8mr9854709otq.330.1581909026843; Sun, 16 Feb 2020 19:10:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581909026; cv=none; d=google.com; s=arc-20160816; b=0b4B+MFo3RRsmrLsRCh7kmBnbtagSvscbO4F6siOZoTQf5hbLul8zBve4dSt+t8Y8l 14dp/Hx72V3kBbqALCoyhNQqHLZx6mGFqmfUnUhEKlCOAP2mF17pcvH3RQ0esqA3zMJI aKcEKtksH644XveQe+bnHNiQ56Zm32CmN/D+ikWpCn5qqZRYzabydot9GmwU4ct8kjZh A4qq5+QFaJVARlnYJviCk1TBfqrrunsB2T5ub2+6GYTtMWqVKrmTst3myUqkoUjkgdTM gs5OQ8DOVOH8+zIsNciLjk+HmMpvfc3SGvxgISt2Wk1ooVL/8x7msHBgauV92PYA3Evo WtUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:dkim-signature; bh=2XhGGEJvO3Ntt34gfK50uTmC8sY/lgsz5WYS4dJwkJo=; b=e90LhipTlYvq4Z6lj2EZXPbGCSHMSO+G5URHykxTAI3QQbe1pL1h/KdrC9cmuzVYuc uJOJDysWyX/fPpQbs/P8lF7nlEmqp/wRT35ubzfOgr1vzlUo3lGB584DMmGuIwA9YCS9 +d8VCsjnVoGorQ6hB+/PsNTKQ8VQqRQkYn6H1yvlJh0qxibueCf4X0T+tmhjlMllrIqf OQ8EjKU/zn3AcMeH19yoGsDA2YzJ6gHcsTVX9siICOj/dgYZSMIv0qzJmUUBeDZfpBk8 hYTnsVgaGCF2Dtn63pj6Kgm85HfWks9RGF/yBh243FuREtNpgrspc9iH5zHwwG05zy0D RiaA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=wIUJin4I; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e15si6900292otq.237.2020.02.16.19.10.14; Sun, 16 Feb 2020 19:10:26 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=wIUJin4I; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727078AbgBQDJr (ORCPT + 99 others); Sun, 16 Feb 2020 22:09:47 -0500 Received: from mail27.static.mailgun.info ([104.130.122.27]:51426 "EHLO mail27.static.mailgun.info" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726485AbgBQDJq (ORCPT ); Sun, 16 Feb 2020 22:09:46 -0500 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1581908986; h=In-Reply-To: Content-Type: MIME-Version: References: Message-ID: Subject: Cc: To: From: Date: Sender; bh=2XhGGEJvO3Ntt34gfK50uTmC8sY/lgsz5WYS4dJwkJo=; b=wIUJin4IQqgDoEBpsPKzpMs1AdnTuMBs++fbBREdgLmZyHwHmnEW/G1McKZe9y96usp10IcW 3nXrje5MslUDnWZZdaLkSgzN2vQxcd/A8d2aDBAORRtvMkd3CnduXKagkNNg1pLv1ZHNFtYm Q7jNZNoQmIlUPUqLN03zP1FYEpw= X-Mailgun-Sending-Ip: 104.130.122.27 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by mxa.mailgun.org with ESMTP id 5e4a03f8.7eff6104f928-smtp-out-n02; Mon, 17 Feb 2020 03:09:44 -0000 (UTC) Received: by smtp.codeaurora.org (Postfix, from userid 1001) id ADD1CC4479F; Mon, 17 Feb 2020 03:09:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=2.0 tests=ALL_TRUSTED,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.0 Received: from codeaurora.org (blr-c-bdr-fw-01_GlobalNAT_AllZones-Outside.qualcomm.com [103.229.19.19]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: stummala) by smtp.codeaurora.org (Postfix) with ESMTPSA id CDD01C43383; Mon, 17 Feb 2020 03:09:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org CDD01C43383 Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; spf=none smtp.mailfrom=stummala@codeaurora.org Date: Mon, 17 Feb 2020 08:39:37 +0530 From: Sahitya Tummala To: Chao Yu Cc: Jaegeuk Kim , linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: Re: [PATCH] f2fs: fix the panic in do_checkpoint() Message-ID: <20200217030937.GC20234@codeaurora.org> References: <1581503665-19914-1-git-send-email-stummala@codeaurora.org> <5d5903e6-f089-7ecf-f1ff-ad341c4cef56@huawei.com> <20200214035425.GA20234@codeaurora.org> <7fb14366-0ae3-e2d9-49ff-b7257054002c@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7fb14366-0ae3-e2d9-49ff-b7257054002c@huawei.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 14, 2020 at 03:16:59PM +0800, Chao Yu wrote: > Hi Sahitya, > > On 2020/2/14 11:54, Sahitya Tummala wrote: > > Hi Chao, > > > > On Fri, Feb 14, 2020 at 10:26:18AM +0800, Chao Yu wrote: > >> Hi Sahitya, > >> > >> On 2020/2/12 18:34, Sahitya Tummala wrote: > >>> There could be a scenario where f2fs_sync_meta_pages() will not > >>> ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus, > >>> resulting in the below panic in do_checkpoint() - > >>> > >>> f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) && > >>> !f2fs_cp_error(sbi)); > >>> > >>> This can happen in a low-memory condition, where shrinker could > >>> also be doing the writepage operation (stack shown below) > >>> at the same time when checkpoint is running on another core. > >>> > >>> schedule > >>> down_write > >>> f2fs_submit_page_write -> by this time, this page in page cache is tagged > >>> as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY > >>> is cleared, due to which f2fs_sync_meta_pages() > >>> cannot sync this page in do_checkpoint() path. > >>> f2fs_do_write_meta_page > >>> __f2fs_write_meta_page > >>> f2fs_write_meta_page > >>> shrink_page_list > >>> shrink_inactive_list > >>> shrink_node_memcg > >>> shrink_node > >>> kswapd > >> > >> IMO, there may be one more simple fix here: > >> > >> - f2fs_do_write_meta_page(sbi, page, io_type); > >> dec_page_count(sbi, F2FS_DIRTY_META); > >> > >> + f2fs_do_write_meta_page(sbi, page, io_type); > >> > >> If we can remove F2FS_DIRTY_META reference count before we clear > >> PAGECACHE_TAG_DIRTY, we can avoid this race condition. > >> > >> - dec_page_count(sbi, F2FS_DIRTY_META); > >> - f2fs_do_write_meta_page > >> - set_page_writeback > >> - __test_set_page_writeback > >> - xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY); > >> > >> Thoughts? > > > > I believe it will be a problem because let's say the shrinker path is > > still in the below stack - > > > > f2fs_submit_page_write(); > > ->down_write(&io->io_rwsem); > > > > Then, the current f2fs_wait_on_all_pages_writeback() will not wait for > > that page as it is not yet tagged as F2FS_WB_CP_DATA. Hence, the checkpoint > > may proceed without waiting for this meta page to be written to disk. > > Oh, you're right. > > It looks the race can happen for data/node pages? like in fsync() procedure. > Yes, it can happen. I see that you have already put a patch to fix that :). > > > >> > >>> > >>> Signed-off-by: Sahitya Tummala > >>> --- > >>> fs/f2fs/checkpoint.c | 16 ++++++++-------- > >>> fs/f2fs/f2fs.h | 2 +- > >>> fs/f2fs/super.c | 2 +- > >>> 3 files changed, 10 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > >>> index ffdaba0..2b651a3 100644 > >>> --- a/fs/f2fs/checkpoint.c > >>> +++ b/fs/f2fs/checkpoint.c > >>> @@ -1250,14 +1250,14 @@ static void unblock_operations(struct f2fs_sb_info *sbi) > >>> f2fs_unlock_all(sbi); > >>> } > >>> > >>> -void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi) > >>> +void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type) > >>> { > >>> DEFINE_WAIT(wait); > >>> > >>> for (;;) { > >>> prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE); > >>> > >>> - if (!get_pages(sbi, F2FS_WB_CP_DATA)) > >>> + if (!get_pages(sbi, type)) > >>> break; > >>> > >>> if (unlikely(f2fs_cp_error(sbi))) > >>> @@ -1384,8 +1384,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) > >>> > >>> /* Flush all the NAT/SIT pages */ > >>> f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); > >>> - f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) && > >>> - !f2fs_cp_error(sbi)); > >>> + /* Wait for all dirty meta pages to be submitted for IO */ > >>> + f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META); > >> > >> I'm afraid calling f2fs_wait_on_all_pages() after we call > >> f2fs_sync_meta_pages() is low efficient, as we only want to write out > >> dirty meta pages instead of wait for writebacking them to device cache. > >> > > > > I have modified the existing function f2fs_wait_on_all_pages_writeback() to > > a generic one f2fs_wait_on_all_pages(), where it will wait according to the > > requested type. In this case, it will only wait for dirty F2FS_DIRTY_META pages > > but not for the writeback to device cache. > > Oh, I see, as last dirty reference count decreaser won't wake up waiter, we need > to wait for timeout, right? Can we decrease timeout count, maybe HZ/50 as we did > for congestion. Yes, I agree that it is better to decrease the timeout to HZ/50. Thanks, > > io_schedule_timeout(5*HZ); > > Thanks, > > > > > Thanks, > > > >> Thanks, > >> > >>> > >>> /* > >>> * modify checkpoint > >>> @@ -1493,11 +1493,11 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) > >>> > >>> /* Here, we have one bio having CP pack except cp pack 2 page */ > >>> f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); > >>> - f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) && > >>> - !f2fs_cp_error(sbi)); > >>> + /* Wait for all dirty meta pages to be submitted for IO */ > >>> + f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META); > >>> > >>> /* wait for previous submitted meta pages writeback */ > >>> - f2fs_wait_on_all_pages_writeback(sbi); > >>> + f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA); > >>> > >>> /* flush all device cache */ > >>> err = f2fs_flush_device_cache(sbi); > >>> @@ -1506,7 +1506,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) > >>> > >>> /* barrier and flush checkpoint cp pack 2 page if it can */ > >>> commit_checkpoint(sbi, ckpt, start_blk); > >>> - f2fs_wait_on_all_pages_writeback(sbi); > >>> + f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA); > >>> > >>> /* > >>> * invalidate intermediate page cache borrowed from meta inode > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > >>> index 5a888a0..b0e0535 100644 > >>> --- a/fs/f2fs/f2fs.h > >>> +++ b/fs/f2fs/f2fs.h > >>> @@ -3196,7 +3196,7 @@ bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, > >>> void f2fs_update_dirty_page(struct inode *inode, struct page *page); > >>> void f2fs_remove_dirty_inode(struct inode *inode); > >>> int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type); > >>> -void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi); > >>> +void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type); > >>> int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc); > >>> void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi); > >>> int __init f2fs_create_checkpoint_caches(void); > >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > >>> index 5111e1f..084633b 100644 > >>> --- a/fs/f2fs/super.c > >>> +++ b/fs/f2fs/super.c > >>> @@ -1105,7 +1105,7 @@ static void f2fs_put_super(struct super_block *sb) > >>> /* our cp_error case, we can wait for any writeback page */ > >>> f2fs_flush_merged_writes(sbi); > >>> > >>> - f2fs_wait_on_all_pages_writeback(sbi); > >>> + f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA); > >>> > >>> f2fs_bug_on(sbi, sbi->fsync_node_num); > >>> > >>> > > -- -- Sent by a consultant of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.