From:   Petr Mladek <pmladek@suse.com>
To:     Tejun Heo <tj@kernel.org>
Cc:     Lai Jiangshan <jiangshanlai@gmail.com>,
        Michal Koutny <mkoutny@suse.com>, linux-kernel@vger.kernel.org,
        Petr Mladek <pmladek@suse.com>
Subject: [RFC 5/5] workqueue: Print backtraces from CPUs with hung CPU bound workqueues
Date:   Wed,  1 Feb 2023 14:45:43 +0100
Message-Id: <20230201134543.13687-6-pmladek@suse.com>
In-Reply-To: <20230201134543.13687-1-pmladek@suse.com>
References: <20230201134543.13687-1-pmladek@suse.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. The progress means that a pending
work item starts being proceed.

Worker pools for unbound workqueues always wake up an idle worker and
try to process the work immediately. The last idle worker has to create
new worker first. The stall might happen only when a new worker could
not be created in which case some warnings should get printed.
Another problem might be too high load. In this case, workers are
victims of a global system problem.

Worker pools for CPU bound workqueues are designed for lightweight
work items that do not need much CPU time. They are ideally proceed
on a single worker. New worker is used only when a work is sleeping.
It brings one additional problem. The stall might happen when
the CPU-bound workqueue is used for CPU-intensive work.

The problem happens when the CPU-intensive work is not sleeping/waiting
for a long time. Such a worker would be in TASK_RUNNING state.

In this case, it might be useful to see backtrace from the problematic
worker.

wq_watchdog_timer_fn() starts with one line report for each stalled worker
pool. All the extra information are printed later.

The most precise solution would be to make a note in struct worker_pool
when the first check found a stall in a CPU-bound workqueue.

A good enough heuristic should be printing backtraces of workers in
TASK_RUNNING state on CPUs where the stall was detected. After all,
there are just two CPU-bound workers pools for normal and high priority
workers.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/workqueue.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 58b835420435..c6017d178b91 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
+#include <linux/sched/debug.h>
 #include <linux/nmi.h>
 #include <linux/kvm_para.h>
 
@@ -5911,6 +5912,68 @@ static struct timer_list wq_watchdog_timer;
 
 static unsigned long wq_watchdog_touched = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, wq_watchdog_touched_cpu) = INITIAL_JIFFIES;
+static DEFINE_PER_CPU(bool, wq_watchdog_hung_cpu);
+
+/*
+ * Show workers that might prevent processing pending work items.
+ * The only candidates are CPU-bound workers in a running state.
+ * Pending work items should be handled by another idle worker in all
+ * other situations.
+ */
+static bool show_pool_suspicious_workers(struct worker_pool *pool, bool shown_title)
+{
+	bool shown_any = false;
+	struct worker *worker;
+	unsigned long flags;
+	int bkt;
+
+	raw_spin_lock_irqsave(&pool->lock, flags);
+
+	if (pool->cpu < 0)
+		goto out;
+
+	if (!per_cpu(wq_watchdog_hung_cpu, pool->cpu))
+		goto out;
+
+	if (list_empty(&pool->worklist))
+		goto out;
+
+	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+		if (!task_is_running(worker->task))
+			continue;
+
+		if (!shown_title) {
+			pr_info("Showing backtraces of running workers in stuck CPU-bound worker pools:\n");
+			shown_title = true;
+		}
+
+		shown_any = true;
+		pr_info("pool %d:\n", pool->id);
+		sched_show_task(worker->task);
+	}
+out:
+	raw_spin_unlock_irqrestore(&pool->lock, flags);
+	return shown_any;
+}
+
+static void show_suspicious_workers(void)
+{
+	struct worker_pool *pool;
+	bool shown_title = false;
+	int pi;
+
+	rcu_read_lock();
+
+	for_each_pool(pool, pi) {
+		bool shown;
+
+		shown = show_pool_suspicious_workers(pool, shown_title);
+		if (shown)
+			shown_title = shown;
+	}
+
+	rcu_read_unlock();
+}
 
 static void wq_watchdog_reset_touched(void)
 {
@@ -5927,11 +5990,15 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 	bool lockup_detected = false;
 	unsigned long now = jiffies;
 	struct worker_pool *pool;
+	int cpu;
 	int pi;
 
 	if (!thresh)
 		return;
 
+	for_each_online_cpu(cpu)
+		per_cpu(wq_watchdog_hung_cpu, cpu) = false;
+
 	rcu_read_lock();
 
 	for_each_pool(pool, pi) {
@@ -5961,11 +6028,15 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		/* did we stall? */
 		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
+			if (pool->cpu >= 0)
+				per_cpu(wq_watchdog_hung_cpu, pool->cpu) = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
 				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
+
+
 	}
 
 	rcu_read_unlock();
@@ -5973,6 +6044,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 	if (lockup_detected) {
 		print_create_worker_failed_num();
 		show_all_workqueues();
+		show_suspicious_workers();
 	}
 
 	wq_watchdog_reset_touched();
-- 
2.35.3