From: Zhihao Cheng
Subject: [PATCH] fs-writeback: Flush plug before next iteration in wb_writeback()
Date: Fri, 15 Apr 2022 09:37:35 +0800
Message-ID: <20220415013735.1610091-1-chengzhihao1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during wb_writeback(), which
may cause an ABBA deadlock:

    wb_writeback                  fat_file_fsync
blk_start_plug(&plug)
for (;;) {
  iter i-1: some reqs have been added into plug->mq_list  // LOCK A
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes  // fat's bdev
    .   __writeback_single_inode
    .   . generic_writepages
    .   .   __block_write_full_page
    .   .   . .                   __generic_file_fsync
    .   .   . .                     sync_inode_metadata
    .   .   . .                       writeback_single_inode
    .   .   . .                         __writeback_single_inode
    .   .   . .                           fat_write_inode
    .   .   . .                             __fat_write_inode
    .   .   . .                               sync_dirty_buffer  // fat's bdev
    .   .   . .                                 lock_buffer(bh)  // LOCK B
    .   .   . .                                   submit_bh
    .   .   . .                                     blk_mq_get_tag  // LOCK A
    .   .   . trylock_buffer(bh)  // LOCK B
    .   .   .   redirty_page_for_writepage
    .   .   .     wbc->pages_skipped++
    .   .   --wbc->nr_to_write
    .   wrote += write_chunk - wbc.nr_to_write  // wrote > 0
    .   requeue_inode
    .     redirty_tail_locked
    if (progress)  // progress > 0
      continue;
  iter i+1:
    queue_io
    // similar process with iter i, infinite for-loop!
}
blk_finish_plug(&plug)  // flush plug won't be called

The above process triggers a hung task like:

[ 399.044861] INFO: task bb:2607 blocked for more than 30 seconds.
[ 399.046824] Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty
[ 399.051539] task:bb state:D stack: 0 pid: 2607 ppid: 2426 flags:0x00004000
[ 399.051556] Call Trace:
[ 399.051570] __schedule+0x480/0x1050
[ 399.051592] schedule+0x92/0x1a0
[ 399.051602] io_schedule+0x22/0x50
[ 399.051613] blk_mq_get_tag+0x1d3/0x3c0
[ 399.051640] __blk_mq_alloc_requests+0x21d/0x3f0
[ 399.051657] blk_mq_submit_bio+0x68d/0xca0
[ 399.051674] __submit_bio+0x1b5/0x2d0
[ 399.051708] submit_bio_noacct+0x34e/0x720
[ 399.051718] submit_bio+0x3b/0x150
[ 399.051725] submit_bh_wbc+0x161/0x230
[ 399.051734] __sync_dirty_buffer+0xd1/0x420
[ 399.051744] sync_dirty_buffer+0x17/0x20
[ 399.051750] __fat_write_inode+0x289/0x310
[ 399.051766] fat_write_inode+0x2a/0xa0
[ 399.051783] __writeback_single_inode+0x53c/0x6f0
[ 399.051795] writeback_single_inode+0x145/0x200
[ 399.051803] sync_inode_metadata+0x45/0x70
[ 399.051856] __generic_file_fsync+0xa3/0x150
[ 399.051880] fat_file_fsync+0x1d/0x80
[ 399.051895] vfs_fsync_range+0x40/0xb0
[ 399.051929] __x64_sys_fsync+0x18/0x30

In my test, the 'need_resched()' check in 'writeback_sb_inodes()' (which
was introduced by commit 590dca3a71 "fs-writeback: unplug before
cond_resched in writeback_sb_inodes") is seldom true unless cond_resched()
is deleted from write_cache_pages().

Fix it by flushing the plug before the next iteration in wb_writeback().

See the Link tag for a reproducer.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837
Cc: stable@vger.kernel.org # v4.3
Reported-by: Zhihao Cheng
---
 fs/fs-writeback.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 591fe9cf1659..e524c0a1749c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2036,8 +2036,21 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 * mean the overall work is done. So we keep looping as long
 		 * as made some progress on cleaning pages or inodes.
 		 */
-		if (progress)
+		if (progress) {
+			/*
+			 * The progress may be a false positive in the page
+			 * redirty case (caused by failing to get the buffer
+			 * head lock), which requeues dirty inodes and starts
+			 * the next writeback iteration while other tasks may
+			 * be stuck waiting for tags for new requests. So,
+			 * flush the plug to issue the requests holding tags.
+			 *
+			 * This code can be removed once buffer heads
+			 * disappear from Linux.
+			 */
+			blk_flush_plug(current->plug, false);
 			continue;
+		}
 		/*
 		 * No more inodes for IO, bail
 		 */
-- 
2.31.1