Received: by 2002:a05:6520:3645:b029:c0:f950:43e0 with SMTP id l5csp171848lki; Tue, 16 Mar 2021 22:04:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy1jiD1nc9pWJyFwaadBoL7FXHZ2OWkMyZr7tvm/qaszjw4NE8Yh1dAjIZlWZ8AzYaQHdCN X-Received: by 2002:aa7:d0cb:: with SMTP id u11mr39837760edo.163.1615957466394; Tue, 16 Mar 2021 22:04:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1615957466; cv=none; d=google.com; s=arc-20160816; b=GNwk4XlZS9fXa0nrChcGpyTFcX1UbXLNABI8WdF9Krg0yTFqbDePMfp6P46hW3wArE wf7Xvi+vQgrl+H1WsqgkC8nT+DsC+hryA7CZymPZ3evNPBgoPHtAeGdTFY74Ttwthr7m GX8vD02UPtPzgZyAs2Zw+fLLyIT7kIyswPhZtycP3x/PYiPokCR86rSsOnfzjFvbtCoa 4IfXHgIN/c/9Y8oeD7yHd3XZs++KeI+l5niBiiLtD5xf1bDyn3TAZ2iqzDiGGvGOc9Ra D8xWM9rzlBXKZ0I43Fp9s6HzV3bay7qZmBaRXGYRZ5ZXW9FZYgIKiENe/YDonUGtIujs n25g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:mime-version:message-id:date :dkim-signature; bh=uR46A4nqOYwwJGkb7CjOd8YvMztGe4D67fEr9Lx0aK4=; b=jO1Zf/JUTTsSe1CvMuNbLFhBvY1od6bs3f0Guqo0APiG3n95uYZmaD7d/QHNDYSlAb n8VrWT8GQxyRIdRmMSlkL5Poa0HwDtCpehTiR190fY1LOwKKPsRwzOuQejSCfS/0XLjA CPcpBODXW20iv1nTStnQL7gLq8SpnnKUYrrKxdyb6N20F1Xj1St0+uf1eIRGABz12Ehs I8UoQ0XHR5J+Be8pye3LsopIZa9JB5hFOGXsZzm6T+3o1NvdpHMYCtitjLcW6vHJ9vBM oV22Zornb4mT8MazV44dEBrwKtGVFC4qNoLoFwrRQEZ0eyV0Bm18Ftpb3Z+0wAHEDybD gq0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=m15dh5m3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s19si15587542ejb.327.2021.03.16.22.04.03; Tue, 16 Mar 2021 22:04:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=m15dh5m3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229469AbhCQFA0 (ORCPT + 99 others); Wed, 17 Mar 2021 01:00:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229487AbhCQFAY (ORCPT ); Wed, 17 Mar 2021 01:00:24 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2EC4CC06175F for ; Tue, 16 Mar 2021 22:00:24 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id o9so43567378yba.18 for ; Tue, 16 Mar 2021 22:00:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:message-id:mime-version:subject:from:to:cc; bh=uR46A4nqOYwwJGkb7CjOd8YvMztGe4D67fEr9Lx0aK4=; b=m15dh5m3XXgVhf5iWQarg86MSr3VNumGRieYIz4PfcfM8sNthpKwxA5HcN0Art4i5v /bHjfTAZ/ReKbPrZl1wGPgFDj18CL3ET2UtjPuA+eqHEl3uMIJV448Nv6p2y+mxv5Ft1 d4Uxgm4KyT9ZWu89QAO69ESjmDMGlwNrgbhV0rwL+C6TClF9a8BRC5jWeCIpbPtLl64u jJvaJGaXy9/+68C8LLAmwq+P4q3OLbHgIKf8uyJAExOatoEP399TehNCpr3hBgP/X8v8 N2+eYsMdUMeE9fOGiJKANZxdtWdLk4Y65UvhrQDqvPsmpoDoVGObMKxbszmNfNdRhp28 yguQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc; bh=uR46A4nqOYwwJGkb7CjOd8YvMztGe4D67fEr9Lx0aK4=; b=aQCi3RGUgsm8eKDxYvHsnTDI4Gzxqu0gq/ODHSGdgXxgo8K7OPoWVeALTfuBcwhFcR BZ923wCo2j6R7scKwtHAPyTx0BGjYQsFT5qUns5lWctN10o+wNLROFURTxw9eudhgolG fzY9EdaZNdqfxnDo1HW63N5DhqIzMnwnNGnQ7daiOjFnmyzIX2BhUmDJrcf12TF6G+Db WwlVWrOidysJ2JMoaNoqUu5f1GVkfIA74XLJLaJRQf3BADiyBCkXwElbjmbg0gU1nyj0 +rQwNgdWR+v8wFBXaCnrJdtabuJTer8DeGXsHt1tjB/gRIGDXk0pH3JnK6wKnbIQFyAW DXLw== X-Gm-Message-State: AOAM531YQIobt3KXsZnTbtHYFjfs5jewTcvGywQMjW33KPd1TDgKQEoH V4fD5WqtGpUU1IT9N7/RjauMiLXoyNTG X-Received: from joshdon.svl.corp.google.com ([2620:15c:2cd:202:d554:135a:678a:ce01]) (user=joshdon job=sendgmr) by 2002:a25:dad4:: with SMTP id n203mr2720954ybf.233.1615957223214; Tue, 16 Mar 2021 22:00:23 -0700 (PDT) Date: Tue, 16 Mar 2021 21:59:49 -0700 Message-Id: <20210317045949.1584952-1-joshdon@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.31.0.rc2.261.g7f71774620-goog Subject: [PATCH] sched: Warn on long periods of pending need_resched From: Josh Don To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Cc: Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Luis Chamberlain , Kees Cook , Iurii Zaikin , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Rientjes , Oleg Rombakh , Paul Turner , Josh Don Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Paul Turner CPU scheduler marks need_resched flag to signal a schedule() on a particular CPU. But, schedule() may not happen immediately in cases where the current task is executing in the kernel mode (no preemption state) for extended periods of time. This patch adds a warn_on if need_resched is pending for more than the time specified in sysctl resched_latency_warn_ms. Monitoring is done via the tick and the accuracy is hence limited to jiffy scale. This also means that we won't trigger the warning if the tick is disabled. If resched_latency_warn_ms is set to the default value, only one warning will be produced per boot. This warning only exists under CONFIG_SCHED_DEBUG. If it goes off, it is likely that there is a missing cond_resched() somewhere. Signed-off-by: Paul Turner Signed-off-by: Josh Don --- We've caught various bugs using this patch. In fact, a followup patch to this one will be a patch introducing a missing cond_resched(). However, this may be too noisy to have as default enabled (especially given that it requires some tuning); I'm open to having this default disabled instead. kernel/sched/core.c | 91 ++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/sched.h | 6 +++ kernel/sysctl.c | 8 ++++ 3 files changed, 105 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 98191218d891..ac037fc87d5e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -58,7 +58,25 @@ const_debug unsigned int sysctl_sched_features = #include "features.h" 0; #undef SCHED_FEAT + +/* + * Print a warning if need_resched is set for at least this long. At the + * default value, only a single warning will be printed per boot. + * + * Values less than 2 disable the feature. + * + * A kernel compiled with CONFIG_KASAN tends to run more slowly on average. + * Increase the need_resched timeout to reduce false alarms. + */ +#ifdef CONFIG_KASAN +#define RESCHED_DEFAULT_WARN_LATENCY_MS 101 +#define RESCHED_BOOT_QUIET_SEC 600 +#else +#define RESCHED_DEFAULT_WARN_LATENCY_MS 51 +#define RESCHED_BOOT_QUIET_SEC 300 #endif +int sysctl_resched_latency_warn_ms = RESCHED_DEFAULT_WARN_LATENCY_MS; +#endif /* CONFIG_SCHED_DEBUG */ /* * Number of tasks to iterate in a single balance run. @@ -4520,6 +4538,71 @@ unsigned long long task_sched_runtime(struct task_struct *p) return ns; } +#ifdef CONFIG_SCHED_DEBUG + +static inline u64 resched_latency_check(struct rq *rq) +{ + int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms); + bool warn_only_once = (latency_warn_ms == RESCHED_DEFAULT_WARN_LATENCY_MS); + u64 need_resched_latency, now = rq_clock(rq); + static bool warned_once; + + if (warn_only_once && warned_once) + return 0; + + if (!need_resched() || latency_warn_ms < 2) + return 0; + + /* Disable this warning for the first few mins after boot */ + if (now < RESCHED_BOOT_QUIET_SEC * NSEC_PER_SEC) + return 0; + + if (!rq->last_seen_need_resched_ns) { + rq->last_seen_need_resched_ns = now; + rq->ticks_without_resched = 0; + return 0; + } + + rq->ticks_without_resched++; + need_resched_latency = now - rq->last_seen_need_resched_ns; + if (need_resched_latency <= latency_warn_ms * NSEC_PER_MSEC) + return 0; + + warned_once = true; + + return need_resched_latency; +} + +static inline void resched_latency_warn(int cpu, u64 latency) +{ + static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1); + + WARN(__ratelimit(&latency_check_ratelimit), + "CPU %d: need_resched set for > %llu ns (%d ticks) " + "without schedule\n", + cpu, latency, cpu_rq(cpu)->ticks_without_resched); +} + + +static int __init setup_resched_latency_warn_ms(char *str) +{ + long val; + + if ((kstrtol(str, 0, &val))) { + pr_warn("Unable to set resched_latency_warn_ms\n"); + return 1; + } + + sysctl_resched_latency_warn_ms = val; + return 1; +} +__setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms); +#else +static inline u64 resched_latency_check(struct rq *rq) { return 0; } +static inline void resched_latency_warn(int cpu, u64 latency) {} +#endif /* CONFIG_SCHED_DEBUG */ + + /* * This function gets called by the timer code, with HZ frequency. * We call it with interrupts disabled. @@ -4531,6 +4614,7 @@ void scheduler_tick(void) struct task_struct *curr = rq->curr; struct rq_flags rf; unsigned long thermal_pressure; + u64 resched_latency; arch_scale_freq_tick(); sched_clock_tick(); @@ -4541,11 +4625,15 @@ void scheduler_tick(void) thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq)); update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure); curr->sched_class->task_tick(rq, curr, 0); + resched_latency = resched_latency_check(rq); calc_global_load_tick(rq); psi_task_tick(rq); rq_unlock(rq, &rf); + if (resched_latency) + resched_latency_warn(cpu, resched_latency); + perf_event_task_tick(); #ifdef CONFIG_SMP @@ -5040,6 +5128,9 @@ static void __sched notrace __schedule(bool preempt) next = pick_next_task(rq, prev, &rf); clear_tsk_need_resched(prev); clear_preempt_need_resched(); +#ifdef CONFIG_SCHED_DEBUG + rq->last_seen_need_resched_ns = 0; +#endif if (likely(prev != next)) { rq->nr_switches++; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 10a1522b1e30..912a8886bc7f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -57,6 +57,7 @@ #include #include #include +#include #include #include #include @@ -963,6 +964,11 @@ struct rq { atomic_t nr_iowait; +#ifdef CONFIG_SCHED_DEBUG + u64 last_seen_need_resched_ns; + int ticks_without_resched; +#endif + #ifdef CONFIG_MEMBARRIER int membarrier_state; #endif diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 62fbd09b5dc1..526094fc364e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -193,6 +193,7 @@ static int max_wakeup_granularity_ns = NSEC_PER_SEC; /* 1 second */ static int min_sched_tunable_scaling = SCHED_TUNABLESCALING_NONE; static int max_sched_tunable_scaling = SCHED_TUNABLESCALING_END-1; #endif /* CONFIG_SMP */ +extern int sysctl_resched_latency_warn_ms; #endif /* CONFIG_SCHED_DEBUG */ #ifdef CONFIG_COMPACTION @@ -1763,6 +1764,13 @@ static struct ctl_table kern_table[] = { .extra2 = SYSCTL_ONE, }, #endif /* CONFIG_NUMA_BALANCING */ + { + .procname = "resched_latency_warn_ms", + .data = &sysctl_resched_latency_warn_ms, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #endif /* CONFIG_SCHED_DEBUG */ { .procname = "sched_rt_period_us", -- 2.31.0.rc2.261.g7f71774620-goog