Date: Wed, 13 Apr 2022 19:42:47 -0700
From: Jaegeuk Kim
To: Wu Yan
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org, tang.ding@tcl.com
Subject: Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory

On 04/14, Wu Yan wrote:
> On 4/14/22 10:18, Jaegeuk Kim wrote:
> > On 04/14, Wu Yan wrote:
> > > On 4/14/22 01:00, Jaegeuk Kim wrote:
> > > > On 04/13, Rokudo Yan wrote:
> > > > > There is a potential deadlock in the gc thread that may happen
> > > > > under low memory, as below:
> > > > >
> > > > > gc_thread_func
> > > > >  -f2fs_gc
> > > > >   -do_garbage_collect
> > > > >    -gc_data_segment
> > > > >     -move_data_block
> > > > >      -set_page_writeback(fio.encrypted_page);
> > > > >      -f2fs_submit_page_write
> > > > >
> > > > > As f2fs_submit_page_write tries to do io merge when possible, the
> > > > > encrypted_page is marked PG_writeback but may not be submitted to
> > > > > the block layer immediately. If the system enters low memory while
> > > > > the gc thread tries to move the next data block, it may do direct
> > > > > reclaim and enter the fs layer as below:
> > > > >
> > > > > -move_data_block
> > > > >  -f2fs_grab_cache_page(index=?, for_write=false)
> > > > >   -grab_cache_page
> > > > >    -find_or_create_page
> > > > >     -pagecache_get_page
> > > > >      -__page_cache_alloc -- __GFP_FS is set
> > > > >       -alloc_pages_node
> > > > >        -__alloc_pages
> > > > >         -__alloc_pages_slowpath
> > > > >          -__alloc_pages_direct_reclaim
> > > > >           -__perform_reclaim
> > > > >            -try_to_free_pages
> > > > >             -do_try_to_free_pages
> > > > >              -shrink_zones
> > > > >               -mem_cgroup_soft_limit_reclaim
> > > > >                -mem_cgroup_soft_reclaim
> > > > >                 -mem_cgroup_shrink_node
> > > > >                  -shrink_node_memcg
> > > > >                   -shrink_list
> > > > >                    -shrink_inactive_list
> > > > >                     -shrink_page_list
> > > > >                      -wait_on_page_writeback -- the page was marked
> > > > >                       writeback during the previous move_data_block call
> > > > >
> > > > > The gc thread waits for the encrypted_page writeback to complete,
> > > > > but as the gc thread holds sbi->gc_lock, the writeback & sync
> > > > > threads may be blocked waiting for sbi->gc_lock, so the bio
> > > > > containing the encrypted_page may never be submitted to the block
> > > > > layer to complete the writeback, which causes a deadlock. To avoid
> > > > > this deadlock condition, we mark the gc thread with the
> > > > > PF_MEMALLOC_NOFS flag, so it will never enter the fs layer when
> > > > > trying to allocate a cache page during move_data_block.
> > > > >
> > > > > Signed-off-by: Rokudo Yan
> > > > > ---
> > > > >  fs/f2fs/gc.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > > > index e020804f7b07..cc71f77b98c8 100644
> > > > > --- a/fs/f2fs/gc.c
> > > > > +++ b/fs/f2fs/gc.c
> > > > > @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
> > > > >  	wait_ms = gc_th->min_sleep_time;
> > > > >
> > > > > +	/*
> > > > > +	 * Make sure that no allocations from the gc thread will ever
> > > > > +	 * recurse to the fs layer, to avoid deadlock, as it holds
> > > > > +	 * sbi->gc_lock during garbage collection.
> > > > > +	 */
> > > > > +	memalloc_nofs_save();
> > > >
> > > > I think this cannot cover all the f2fs_gc() call cases.
> > > > Can we just avoid it by:
> > > >
> > > > --- a/fs/f2fs/gc.c
> > > > +++ b/fs/f2fs/gc.c
> > > > @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> > > >  			CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> > > >
> > > >  	/* do not read out */
> > > > -	page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> > > > +	page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> > > >  	if (!page)
> > > >  		return -ENOMEM;
> > > >
> > > > Thanks,
> > > >
> > > > >  	set_freezable();
> > > > >  	do {
> > > > >  		bool sync_mode, foreground = false;
> > > > > --
> > > > > 2.25.1
> > >
> > > Hi, Jaegeuk
> > >
> > > I'm not sure if any other case may trigger the issue, but the stack
> > > traces I have caught so far are all the same as below:
> > >
> > > f2fs_gc-253:12 D 226966.808196 572 302561 150976 0x1200840 0x0 572 237207473347056
> > > __switch_to+0x134/0x150
> > > __schedule+0xd5c/0x1100
> > > io_schedule+0x90/0xc0
> > > wait_on_page_bit+0x194/0x208
> > > shrink_page_list+0x62c/0xe74
> > > shrink_inactive_list+0x2c0/0x698
> > > shrink_node_memcg+0x3dc/0x97c
> > > mem_cgroup_shrink_node+0x144/0x218
> > > mem_cgroup_soft_limit_reclaim+0x188/0x47c
> > > do_try_to_free_pages+0x204/0x3a0
> > > try_to_free_pages+0x35c/0x4d0
> > > __alloc_pages_nodemask+0x7a4/0x10d0
> > > pagecache_get_page+0x184/0x2ec
> >
> > Is this deadlock trying to grab a lock, instead of waiting for writeback?
> > Could you share all the backtraces of the tasks?
> >
> > For writeback above, looking at the code, f2fs_gc uses three mappings,
> > meta, node, and data, and meta/node inodes are masking GFP_NOFS in
> > f2fs_iget(), while the data inode does not. So, the above
> > f2fs_grab_cache_page() in move_data_block() is actually called w/o NOFS.
> > > do_garbage_collect+0xfe0/0x2828
> > > f2fs_gc+0x4a0/0x8ec
> > > gc_thread_func+0x240/0x4d4
> > > kthread+0x17c/0x18c
> > > ret_from_fork+0x10/0x18
> > >
> > > Thanks
> > > yanwu
>
> Hi, Jaegeuk
>
> The gc thread is blocked on wait_on_page_writeback() (on the encrypted
> page submitted earlier) when it tries to grab the data inode page; the
> parsed stack trace is as below:
>
> ppid=572 pid=572 D cpu=1 prio=120 wait=378s f2fs_gc-253:12
> Native callstack:
> vmlinux wait_on_page_bit_common(page=0xFFFFFFBF7D2CD700, state=2, lock=false) + 304
> vmlinux wait_on_page_bit(page=0xFFFFFFBF7D2CD700, bit_nr=15) + 400
> vmlinux wait_on_page_writeback(page=0xFFFFFFBF7D2CD700) + 36
> vmlinux shrink_page_list(page_list=0xFFFFFF8011E83418, pgdat=contig_page_data, sc=0xFFFFFF8011E835B8, ttu_flags=0, stat=0xFFFFFF8011E833F0, force_reclaim=false) + 1576
> vmlinux shrink_inactive_list(lruvec=0xFFFFFFE003C304C0, sc=0xFFFFFF8011E835B8, lru=LRU_INACTIVE_FILE) + 700
> vmlinux shrink_list(lru=LRU_INACTIVE_FILE, lruvec=0xFFFFFF8011E834B8, sc=0xFFFFFF8011E835B8) + 128
> vmlinux shrink_node_memcg(pgdat=contig_page_data, memcg=0xFFFFFFE003C1A300, sc=0xFFFFFF8011E835B8, lru_pages=0xFFFFFF8011E835B0) + 984
> vmlinux mem_cgroup_shrink_node(memcg=0xFFFFFFE003C1A300, gfp_mask=21102794, noswap=false, pgdat=contig_page_data, nr_scanned=0xFFFFFF8011E836A0) + 320
> vmlinux mem_cgroup_soft_reclaim(root_memcg=0xFFFFFFE003C1A300, pgdat=contig_page_data) + 164
> vmlinux mem_cgroup_soft_limit_reclaim(pgdat=contig_page_data, order=0, gfp_mask=21102794, total_scanned=0xFFFFFF8011E83720) + 388
> vmlinux shrink_zones(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 352
> vmlinux do_try_to_free_pages(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 512
> vmlinux try_to_free_pages(zonelist=contig_page_data + 14784, order=0, gfp_mask=21102794, nodemask=0) + 856
> vmlinux __perform_reclaim(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 60
> vmlinux __alloc_pages_direct_reclaim(gfp_mask=300431548, order=0, alloc_flags=300431604, ac=0xFFFFFF8011E83900) + 60
> vmlinux __alloc_pages_slowpath(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 1244
> vmlinux __alloc_pages_nodemask() + 1952
> vmlinux __alloc_pages(gfp_mask=21102794, order=0, preferred_nid=0) + 16
> vmlinux __alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16
> vmlinux alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16
> vmlinux __page_cache_alloc(gfp=21102794) + 16
> vmlinux pagecache_get_page() + 384
> vmlinux find_or_create_page(offset=209) + 112
> vmlinux grab_cache_page(index=209) + 112
> vmlinux f2fs_grab_cache_page(index=209, for_write=false) + 112

Ok, I think this should be enough.

--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
 			CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;

 	/* do not read out */
-	page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+	page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
 	if (!page)
 		return -ENOMEM;

> vmlinux move_data_block(inode=0xFFFFFFDFD578EEA0, gc_type=300432152, segno=21904, off=145) + 3584
> vmlinux gc_data_segment(sbi=0xFFFFFFE007C03000, sum=0xFFFFFF8011E83B10, gc_list=0xFFFFFF8011E83AB8, segno=21904, gc_type=300432152) + 3644
> vmlinux do_garbage_collect(sbi=0xFFFFFFE007C03000, start_segno=21904, gc_list=0xFFFFFF8011E83CF0, gc_type=0) + 4060
> vmlinux f2fs_gc(sbi=0xFFFFFFE007C03000, background=true, segno=4294967295) + 1180
> vmlinux gc_thread_func(data=0xFFFFFFE007C03000) + 572
> vmlinux kthread() + 376
> vmlinux ret_from_fork() +
>
> Thanks
> yanwu