Date: Thu, 14 Apr 2022 10:27:44 +0800
Subject: Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
To: Jaegeuk Kim
References:
 <660530eb62e71fb6520d3596704162e5@sslemail.net>
 <39c4ded0-09c0-3e38-85cb-5535099b177d@tcl.com>
From: Wu Yan
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/14/22 10:18, Jaegeuk Kim wrote:
> On 04/14, Wu Yan wrote:
>> On 4/14/22 01:00, Jaegeuk Kim wrote:
>>> On 04/13, Rokudo Yan wrote:
>>>> There is a potential deadlock in the gc thread that may happen
>>>> under low memory, as below:
>>>>
>>>> gc_thread_func
>>>> -f2fs_gc
>>>> -do_garbage_collect
>>>> -gc_data_segment
>>>> -move_data_block
>>>> -set_page_writeback(fio.encrypted_page);
>>>> -f2fs_submit_page_write
>>>> As f2fs_submit_page_write tries to merge I/O when possible, the
>>>> encrypted_page is marked PG_writeback but may not be submitted to the
>>>> block layer immediately. If the system runs low on memory when the gc
>>>> thread tries to move the next data block, it may do direct reclaim and
>>>> enter the fs layer as below:
>>>> -move_data_block
>>>> -f2fs_grab_cache_page(index=?, for_write=false)
>>>> -grab_cache_page
>>>> -find_or_create_page
>>>> -pagecache_get_page
>>>> -__page_cache_alloc -- __GFP_FS is set
>>>> -alloc_pages_node
>>>> -__alloc_pages
>>>> -__alloc_pages_slowpath
>>>> -__alloc_pages_direct_reclaim
>>>> -__perform_reclaim
>>>> -try_to_free_pages
>>>> -do_try_to_free_pages
>>>> -shrink_zones
>>>> -mem_cgroup_soft_limit_reclaim
>>>> -mem_cgroup_soft_reclaim
>>>> -mem_cgroup_shrink_node
>>>> -shrink_node_memcg
>>>> -shrink_list
>>>> -shrink_inactive_list
>>>> -shrink_page_list
>>>> -wait_on_page_writeback -- the page is marked
>>>> writeback during previous move_data_block call
>>>>
>>>> The gc thread waits for the encrypted_page writeback to complete,
>>>> but as the gc thread holds sbi->gc_lock, the writeback & sync threads
>>>> may be blocked waiting for sbi->gc_lock, so the bio containing the
>>>> encrypted_page may never be submitted to the block layer to complete
>>>> the writeback, which causes a deadlock. To avoid this deadlock
>>>> condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
>>>> will never enter the fs layer when trying to allocate a cache page
>>>> during move_data_block.
>>>>
>>>> Signed-off-by: Rokudo Yan
>>>> ---
>>>> fs/f2fs/gc.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>> index e020804f7b07..cc71f77b98c8 100644
>>>> --- a/fs/f2fs/gc.c
>>>> +++ b/fs/f2fs/gc.c
>>>> @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
>>>>  	wait_ms = gc_th->min_sleep_time;
>>>> +	/*
>>>> +	 * Make sure that no allocations from gc thread will ever
>>>> +	 * recurse to the fs layer to avoid deadlock as it will
>>>> +	 * hold sbi->gc_lock during garbage collection
>>>> +	 */
>>>> +	memalloc_nofs_save();
>>>
>>> I think this cannot cover all the f2fs_gc() call cases.
>>> Can we just avoid it by:
>>>
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
>>>  				CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
>>>
>>>  	/* do not read out */
>>> -	page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
>>> +	page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
>>>  	if (!page)
>>>  		return -ENOMEM;
>>>
>>> Thanks,
>>>
>>>> set_freezable();
>>>> do {
>>>> bool sync_mode, foreground = false;
>>>> --
>>>> 2.25.1
>>
>> Hi, Jaegeuk
>>
>> I'm not sure if any other case may trigger the issue, but the stack traces I
>> have caught so far are all the same as below:
>>
>> f2fs_gc-253:12 D 226966.808196 572 302561 150976 0x1200840 0x0 572
>> 237207473347056
>> __switch_to+0x134/0x150
>> __schedule+0xd5c/0x1100
>> io_schedule+0x90/0xc0
>> wait_on_page_bit+0x194/0x208
>> shrink_page_list+0x62c/0xe74
>> shrink_inactive_list+0x2c0/0x698
>> shrink_node_memcg+0x3dc/0x97c
>> mem_cgroup_shrink_node+0x144/0x218
>> mem_cgroup_soft_limit_reclaim+0x188/0x47c
>> do_try_to_free_pages+0x204/0x3a0
>> try_to_free_pages+0x35c/0x4d0
>> __alloc_pages_nodemask+0x7a4/0x10d0
>> pagecache_get_page+0x184/0x2ec
>
> Is this deadlock trying to grab a lock, instead of waiting for writeback?
> Could you share all the backtraces of the tasks?
>
> For writeback above, looking at the code, f2fs_gc uses three mappings, meta,
> node, and data, and meta/node inodes are masking GFP_NOFS in f2fs_iget(),
> while data inode does not. So, the above f2fs_grab_cache_page() in
> move_data_block() is actually calling w/o NOFS.
>
>> do_garbage_collect+0xfe0/0x2828
>> f2fs_gc+0x4a0/0x8ec
>> gc_thread_func+0x240/0x4d4
>> kthread+0x17c/0x18c
>> ret_from_fork+0x10/0x18
>>
>> Thanks
>> yanwu

Hi, Jaegeuk

The gc thread is blocked in wait_on_page_writeback() on the encrypted page
that was submitted earlier, while it tries to grab the data inode page. The
parsed stack trace is as below:

ppid=572 pid=572 D cpu=1 prio=120 wait=378s f2fs_gc-253:12
Native callstack:
vmlinux wait_on_page_bit_common(page=0xFFFFFFBF7D2CD700, state=2, lock=false) + 304
vmlinux wait_on_page_bit(page=0xFFFFFFBF7D2CD700, bit_nr=15) + 400
vmlinux wait_on_page_writeback(page=0xFFFFFFBF7D2CD700) + 36
vmlinux shrink_page_list(page_list=0xFFFFFF8011E83418, pgdat=contig_page_data, sc=0xFFFFFF8011E835B8, ttu_flags=0, stat=0xFFFFFF8011E833F0, force_reclaim=false) + 1576
vmlinux shrink_inactive_list(lruvec=0xFFFFFFE003C304C0, sc=0xFFFFFF8011E835B8, lru=LRU_INACTIVE_FILE) + 700
vmlinux shrink_list(lru=LRU_INACTIVE_FILE, lruvec=0xFFFFFF8011E834B8, sc=0xFFFFFF8011E835B8) + 128
vmlinux shrink_node_memcg(pgdat=contig_page_data, memcg=0xFFFFFFE003C1A300, sc=0xFFFFFF8011E835B8, lru_pages=0xFFFFFF8011E835B0) + 984
vmlinux mem_cgroup_shrink_node(memcg=0xFFFFFFE003C1A300, gfp_mask=21102794, noswap=false, pgdat=contig_page_data, nr_scanned=0xFFFFFF8011E836A0) + 320
vmlinux mem_cgroup_soft_reclaim(root_memcg=0xFFFFFFE003C1A300, pgdat=contig_page_data) + 164
vmlinux mem_cgroup_soft_limit_reclaim(pgdat=contig_page_data, order=0, gfp_mask=21102794, total_scanned=0xFFFFFF8011E83720) + 388
vmlinux shrink_zones(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 352
vmlinux do_try_to_free_pages(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 512
vmlinux try_to_free_pages(zonelist=contig_page_data + 14784, order=0, gfp_mask=21102794, nodemask=0) + 856
vmlinux __perform_reclaim(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 60
vmlinux __alloc_pages_direct_reclaim(gfp_mask=300431548, order=0, alloc_flags=300431604, ac=0xFFFFFF8011E83900) + 60
vmlinux __alloc_pages_slowpath(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 1244
vmlinux __alloc_pages_nodemask() + 1952
vmlinux __alloc_pages(gfp_mask=21102794, order=0, preferred_nid=0) + 16
vmlinux __alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16
vmlinux alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16
vmlinux __page_cache_alloc(gfp=21102794) + 16
vmlinux pagecache_get_page() + 384
vmlinux find_or_create_page(offset=209) + 112
vmlinux grab_cache_page(index=209) + 112
vmlinux f2fs_grab_cache_page(index=209, for_write=false) + 112
vmlinux move_data_block(inode=0xFFFFFFDFD578EEA0, gc_type=300432152, segno=21904, off=145) + 3584
vmlinux gc_data_segment(sbi=0xFFFFFFE007C03000, sum=0xFFFFFF8011E83B10, gc_list=0xFFFFFF8011E83AB8, segno=21904, gc_type=300432152) + 3644
vmlinux do_garbage_collect(sbi=0xFFFFFFE007C03000, start_segno=21904, gc_list=0xFFFFFF8011E83CF0, gc_type=0) + 4060
vmlinux f2fs_gc(sbi=0xFFFFFFE007C03000, background=true, segno=4294967295) + 1180
vmlinux gc_thread_func(data=0xFFFFFFE007C03000) + 572
vmlinux kthread() + 376
vmlinux ret_from_fork() +

Thanks
yanwu
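
Illustration (not part of the patch): the scoped NOFS behaviour discussed in this thread can be modelled in user space. This is only a sketch under stated assumptions. The model_* names, the struct and the bit values are made up for illustration and are not kernel definitions. Only the masking rule is meant to mirror what memalloc_nofs_save() does for a task: while the flag is set, the FS bit is dropped from every allocation mask, so reclaim triggered by that allocation should not wait on filesystem writeback.

```c
/* User-space model only; NOT kernel code. All names and values are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_GFP_IO            0x1u  /* stands in for __GFP_IO */
#define MODEL_GFP_FS            0x2u  /* stands in for __GFP_FS */
#define MODEL_PF_MEMALLOC_NOFS  0x1u  /* stands in for PF_MEMALLOC_NOFS */

struct model_task {
	unsigned int flags;  /* per-task flags, like current->flags */
};

/* Enter a NOFS scope; return the previous state of the flag. */
static unsigned int model_nofs_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC_NOFS;

	t->flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope by putting the flag back to the saved state. */
static void model_nofs_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Effective mask for an allocation: inside a NOFS scope the FS bit is cleared. */
static unsigned int model_gfp_context(const struct model_task *t, unsigned int gfp)
{
	if (t->flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* In this model, reclaim may wait on fs writeback only if the FS bit survives. */
static bool model_reclaim_may_enter_fs(const struct model_task *t, unsigned int gfp)
{
	return (model_gfp_context(t, gfp) & MODEL_GFP_FS) != 0;
}

/* Usage: the same page-cache allocation with and without the scope. */
static void model_gc_thread_demo(void)
{
	struct model_task gc = { .flags = 0 };
	unsigned int mask = MODEL_GFP_IO | MODEL_GFP_FS;
	unsigned int old;

	/* Without the scope, reclaim could recurse into the fs layer. */
	assert(model_reclaim_may_enter_fs(&gc, mask));

	old = model_nofs_save(&gc);
	/* Inside the scope the FS bit is masked off. */
	assert(!model_reclaim_may_enter_fs(&gc, mask));
	model_nofs_restore(&gc, old);

	/* After restore the task behaves as before. */
	assert(model_reclaim_may_enter_fs(&gc, mask));
}
```

In the quoted patch the call is made once at the start of gc_thread_func and the saved value is not restored, so the whole thread stays inside the scope. The restore helper is included here only to show how nesting is expected to behave.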