From: John Keeping
To: linux-kernel@vger.kernel.org
Cc: linux-rt-users@vger.kernel.org, John Keeping,
    Sebastian Andrzej Siewior, Peter Zijlstra, Thomas Gleixner,
    Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider
Subject: [PATCH] sched/core: Always flush pending blk_plug
Date: Thu, 7 Jul 2022 15:39:02 +0100
Message-Id: <20220707143902.529938-1-john@metanate.com>

With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):

    INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
          Not tainted 5.15.49-rt46 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000
    Workqueue: writeback wb_workfn (flush-7:0)
    [] (__schedule) from [] (schedule+0xdc/0x134)
    [] (schedule) from [] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
    [] (rt_mutex_slowlock_block.constprop.0) from [] +(rt_mutex_slowlock.constprop.0+0xac/0x174)
    [] (rt_mutex_slowlock.constprop.0) from [] (fat_write_inode+0x34/0x54)
    [] (fat_write_inode) from [] (__writeback_single_inode+0x354/0x3ec)
    [] (__writeback_single_inode) from [] (writeback_sb_inodes+0x250/0x45c)
    [] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x7c/0xb8)
    [] (__writeback_inodes_wb) from [] (wb_writeback+0x2c8/0x2e4)
    [] (wb_writeback) from [] (wb_workfn+0x1a4/0x3e4)
    [] (wb_workfn) from [] (process_one_work+0x1fc/0x32c)
    [] (process_one_work) from [] (worker_thread+0x22c/0x2d8)
    [] (worker_thread) from [] (kthread+0x16c/0x178)
    [] (kthread) from [] (ret_from_fork+0x14/0x38)
    Exception stack(0xc10e3fb0 to 0xc10e3ff8)
    3fa0: 00000000 00000000 00000000 00000000
    3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

    INFO: task tar:2083 blocked for more than 491 seconds.
          Not tainted 5.15.49-rt46 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:tar state:D stack: 0 pid: 2083 ppid: 2082 flags:0x00000000
    [] (__schedule) from [] (schedule+0xdc/0x134)
    [] (schedule) from [] (io_schedule+0x14/0x24)
    [] (io_schedule) from [] (bit_wait_io+0xc/0x30)
    [] (bit_wait_io) from [] (__wait_on_bit_lock+0x54/0xa8)
    [] (__wait_on_bit_lock) from [] (out_of_line_wait_on_bit_lock+0x84/0xb0)
    [] (out_of_line_wait_on_bit_lock) from [] (fat_mirror_bhs+0xa0/0x144)
    [] (fat_mirror_bhs) from [] (fat_alloc_clusters+0x138/0x2a4)
    [] (fat_alloc_clusters) from [] (fat_alloc_new_dir+0x34/0x250)
    [] (fat_alloc_new_dir) from [] (vfat_mkdir+0x58/0x148)
    [] (vfat_mkdir) from [] (vfs_mkdir+0x68/0x98)
    [] (vfs_mkdir) from [] (do_mkdirat+0xb0/0xec)
    [] (do_mkdirat) from [] (ret_fast_syscall+0x0/0x1c)
    Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
    bfa0: 01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
    bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
    bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc

Here the kworker is waiting on msdos_sb_info::s_lock which is held by
tar which is in turn waiting for a buffer which is locked waiting to be
flushed, but this operation is plugged in the kworker.

The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT and thus the behaviour changes for RT.

It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag. But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.

Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b
("sched/core: Rework the __schedule() preempt argument").
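
The effect of the early return described above can be illustrated with a
small userspace model. This is only a sketch and not kernel code: struct
model_task, model_is_pi_blocked() and model_submit_work() are hypothetical
names invented for the illustration, the plug is reduced to a request
counter, and the skip_when_pi_blocked flag is an assumption used to
switch between the behaviour before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only what the model needs. */
struct model_task {
	const void *pi_blocked_on;	/* non-NULL while blocked on an rtmutex-based lock */
	int plugged_requests;		/* I/O queued in the task's plug, not yet submitted */
};

/* Mirrors the RT variant of the helper removed by the patch. */
bool model_is_pi_blocked(const struct model_task *t)
{
	return t->pi_blocked_on != NULL;
}

/*
 * Model of the plug-flush decision made before a task sleeps.
 * skip_when_pi_blocked = true  models the code before the patch,
 * skip_when_pi_blocked = false models the code after it.
 * Returns the number of requests submitted.
 */
int model_submit_work(struct model_task *t, bool skip_when_pi_blocked)
{
	int submitted;

	assert(t != NULL);

	if (skip_when_pi_blocked && model_is_pi_blocked(t))
		return 0;	/* plugged I/O stays queued while the task sleeps */

	submitted = t->plugged_requests;
	t->plugged_requests = 0;
	return submitted;
}
```

For a task with pi_blocked_on set and three plugged requests,
model_submit_work(&t, true) submits nothing and leaves the requests
queued, which corresponds to the kworker in the trace going to sleep on
s_lock with its I/O still plugged. With false, all three requests are
submitted before the task sleeps, so a waiter such as tar could make
progress.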

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches

Cc: Sebastian Andrzej Siewior
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: John Keeping
---
 include/linux/sched/rt.h | 8 --------
 kernel/sched/core.c      | 3 ---
 2 files changed, 11 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index e5af028c08b4..994c25640e15 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -39,20 +39,12 @@ static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *p)
 }
 extern void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return tsk->pi_blocked_on != NULL;
-}
 #else
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
 }
 # define rt_mutex_adjust_pi(p)		do { } while (0)
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return false;
-}
 #endif
 
 extern void normalize_rt_tasks(void);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1d4660a1915b..e4974fe003b5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6578,9 +6578,6 @@ static inline void sched_submit_work(struct task_struct *tsk)
 			io_wq_worker_sleeping(tsk);
 	}
 
-	if (tsk_is_pi_blocked(tsk))
-		return;
-
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
-- 
2.37.0