Date: Mon, 29 Jan 2018 16:38:39 +0100
From: Peter Zijlstra
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Chris Metcalf, Thomas Gleixner, Luiz Capitulino,
	Christoph Lameter, "Paul E. McKenney", Wanpeng Li, Mike Galbraith,
	Rik van Riel
Subject: Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload
Message-ID: <20180129153839.GT2269@hirez.programming.kicks-ass.net>
References: <1516320140-13189-1-git-send-email-frederic@kernel.org>
	<1516320140-13189-5-git-send-email-frederic@kernel.org>
In-Reply-To: <1516320140-13189-5-git-send-email-frederic@kernel.org>

On Fri, Jan 19, 2018 at 01:02:18AM +0100, Frederic Weisbecker wrote:
> When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
> keep the scheduler stats alive. However this residual tick is a burden
> for bare metal tasks that can't stand any interruption at all, or want
> to minimize them.
>
> The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
> outsource these scheduler ticks to the global workqueue so that a
> housekeeping CPU handles those remotely.
>
> Note that in the case of using isolcpus, it's still up to the user to
> affine the global workqueues to the housekeeping CPUs through
> /sys/devices/virtual/workqueue/cpumask or domains isolation
> "isolcpus=nohz,domain".

I would very much like a few words on why sched_class::task_tick() is
safe to call remote -- from a quick look I think it actually is, but it
would be good to have some words here.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d72d0e9..c79500c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3062,7 +3062,82 @@ u64 scheduler_tick_max_deferment(void)
>
>  	return jiffies_to_nsecs(next - now);
>  }
> -#endif
> +
> +struct tick_work {
> +	int			cpu;
> +	struct delayed_work	work;
> +};
> +
> +static struct tick_work __percpu *tick_work_cpu;
> +
> +static void sched_tick_remote(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct tick_work *twork = container_of(dwork, struct tick_work, work);
> +	int cpu = twork->cpu;
> +	struct rq *rq = cpu_rq(cpu);
> +	struct rq_flags rf;
> +
> +	/*
> +	 * Handle the tick only if it appears the remote CPU is running
> +	 * in full dynticks mode. The check is racy by nature, but
> +	 * missing a tick or having one too much is no big deal.
> +	 */
> +	if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
> +		rq_lock_irq(rq, &rf);
> +		update_rq_clock(rq);
> +		rq->curr->sched_class->task_tick(rq, rq->curr, 0);
> +		rq_unlock_irq(rq, &rf);
> +	}
> +
> +	queue_delayed_work(system_unbound_wq, dwork, HZ);

Do we want something that tracks the actual inter-arrival time of this
work, such that we can detect and warn if the book-keeping thing is
failing to keep up?

> +}
> +
> +static void sched_tick_start(int cpu)
> +{
> +	struct tick_work *twork;
> +
> +	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> +		return;

This all looks very static :-(, you can't reconfigure this nohz_full
crud after boot?

> +	WARN_ON_ONCE(!tick_work_cpu);
> +
> +	twork = per_cpu_ptr(tick_work_cpu, cpu);
> +	twork->cpu = cpu;
> +	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
> +	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
> +}

Similarly, I think we want a few words about how unbound workqueues are
expected to behave vs NUMA.
AFAICT unbound workqueues by default prefer to run on a cpu in the same node, but if no cpu is available, it doesn't go looking for the nearest node that does have a cpu, it just punts to whatever random cpu.