From: Frederic Weisbecker <frederic@kernel.org>
To: Ingo Molnar
Cc: LKML <linux-kernel@vger.kernel.org>, Frederic Weisbecker, Peter Zijlstra, Chris Metcalf, Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Paul E. McKenney, Wanpeng Li, Mike Galbraith, Rik van Riel
Subject: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload
Date: Fri, 19 Jan 2018 01:02:18 +0100
Message-Id: <1516320140-13189-5-git-send-email-frederic@kernel.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1516320140-13189-1-git-send-email-frederic@kernel.org>
References: <1516320140-13189-1-git-send-email-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
keep the scheduler stats alive. However, this residual tick is a burden
for bare metal tasks that can't stand any interruption at all, or want
to minimize them.

The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
outsource these scheduler ticks to the global workqueue so that a
housekeeping CPU handles them remotely.

Note that in the case of isolcpus, it is still up to the user to affine
the global workqueues to the housekeeping CPUs, either through
/sys/devices/virtual/workqueue/cpumask or through domain isolation
("isolcpus=nohz,domain").

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Wanpeng Li
Cc: Ingo Molnar
---
 kernel/sched/core.c      | 79 +++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/isolation.c |  4 +++
 kernel/sched/sched.h     |  2 ++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d72d0e9..c79500c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3062,7 +3062,82 @@ u64 scheduler_tick_max_deferment(void)
 
 	return jiffies_to_nsecs(next - now);
 }
-#endif
+
+struct tick_work {
+	int			cpu;
+	struct delayed_work	work;
+};
+
+static struct tick_work __percpu *tick_work_cpu;
+
+static void sched_tick_remote(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct tick_work *twork = container_of(dwork, struct tick_work, work);
+	int cpu = twork->cpu;
+	struct rq *rq = cpu_rq(cpu);
+	struct rq_flags rf;
+
+	/*
+	 * Handle the tick only if it appears the remote CPU is running
+	 * in full dynticks mode. The check is racy by nature, but
+	 * missing a tick or having one too much is no big deal.
+	 */
+	if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
+		rq_lock_irq(rq, &rf);
+		update_rq_clock(rq);
+		rq->curr->sched_class->task_tick(rq, rq->curr, 0);
+		rq_unlock_irq(rq, &rf);
+	}
+
+	queue_delayed_work(system_unbound_wq, dwork, HZ);
+}
+
+static void sched_tick_start(int cpu)
+{
+	struct tick_work *twork;
+
+	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
+		return;
+
+	WARN_ON_ONCE(!tick_work_cpu);
+
+	twork = per_cpu_ptr(tick_work_cpu, cpu);
+	twork->cpu = cpu;
+	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
+	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void sched_tick_stop(int cpu)
+{
+	struct tick_work *twork;
+
+	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
+		return;
+
+	WARN_ON_ONCE(!tick_work_cpu);
+
+	twork = per_cpu_ptr(tick_work_cpu, cpu);
+	cancel_delayed_work_sync(&twork->work);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+int __init sched_tick_offload_init(void)
+{
+	tick_work_cpu = alloc_percpu(struct tick_work);
+	if (!tick_work_cpu) {
+		pr_err("Can't allocate remote tick struct\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+#else
+static void sched_tick_start(int cpu) { }
+static void sched_tick_stop(int cpu) { }
+#endif /* CONFIG_NO_HZ_FULL */
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
 				defined(CONFIG_PREEMPT_TRACER))
@@ -5713,6 +5788,7 @@ int sched_cpu_starting(unsigned int cpu)
 {
 	set_cpu_rq_start_time(cpu);
 	sched_rq_cpu_starting(cpu);
+	sched_tick_start(cpu);
 	return 0;
 }
 
@@ -5724,6 +5800,7 @@ int sched_cpu_dying(unsigned int cpu)
 
 	/* Handle pending wakeups and then migrate everything off */
 	sched_ttwu_pending();
+	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
 	if (rq->rd) {
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 8f1c1de..d782302 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include "sched.h"
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);
 EXPORT_SYMBOL_GPL(housekeeping_overriden);
@@ -61,6 +62,9 @@ void __init housekeeping_init(void)
 
 	static_branch_enable(&housekeeping_overriden);
 
+	if (housekeeping_flags & HK_FLAG_TICK)
+		sched_tick_offload_init();
+
 	/* We need at least one CPU to handle housekeeping work */
 	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a2..5a3b82c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1587,6 +1587,7 @@ extern void post_init_entity_util_avg(struct sched_entity *se);
 
 #ifdef CONFIG_NO_HZ_FULL
 extern bool sched_can_stop_tick(struct rq *rq);
+extern int __init sched_tick_offload_init(void);
 
 /*
  * Tick may be needed by tasks in the runqueue depending on their policy and
@@ -1611,6 +1612,7 @@ static inline void sched_update_tick_dependency(struct rq *rq)
 		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
 }
 #else
+static inline int sched_tick_offload_init(void) { return 0; }
 static inline void sched_update_tick_dependency(struct rq *rq) { }
 #endif
-- 
2.7.4
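
For context, a minimal sketch of how an administrator might exercise this
offload from userspace. The CPU numbers and mask value below are purely
illustrative, not taken from the patch; only the boot parameters and the
sysfs path come from the changelog above.

```sh
# Hypothetical layout: CPUs 1-3 run a bare metal task, CPU 0 does
# housekeeping. With this boot parameter the residual 1Hz ticks of
# CPUs 1-3 are run remotely from the global (unbound) workqueue:
#
#     nohz_full=1-3
#
# When using "isolcpus=nohz" instead (without the "domain" flag), the
# unbound workqueues must be affined to the housekeeping CPUs by hand;
# here 1 is the hex mask selecting CPU 0:
echo 1 > /sys/devices/virtual/workqueue/cpumask
```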