From: Frederic Weisbecker <frederic@kernel.org>
Date: Tue, 19 Dec 2017 17:26:32 +0100
Subject: Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload
To: Peter Zijlstra
Cc: LKML, Chris Metcalf, Thomas Gleixner, Luiz Capitulino, Christoph Lameter, "Paul E. McKenney", Ingo Molnar, Wanpeng Li, Mike Galbraith, Rik van Riel
In-Reply-To: <20171219091911.tg2k4w7mgv2bcmeb@hirez.programming.kicks-ass.net>
References: <1513653838-31314-1-git-send-email-frederic@kernel.org> <1513653838-31314-5-git-send-email-frederic@kernel.org> <20171219091911.tg2k4w7mgv2bcmeb@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

2017-12-19 10:19 UTC+01:00, Peter Zijlstra:
> On Tue, Dec 19, 2017 at 04:23:57AM +0100, Frederic Weisbecker wrote:
>> When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
>> keep the scheduler stats alive. However this residual tick is a burden
>> for Real-Time tasks that can't stand any interruption at all.
>
> I'm not sure that is accurate. RT doesn't necessarily have anything much
> to do with this.
> The tick is by definition very deterministic and thus
> should not be a problem.

I see, the term Real-Time can indeed be misleading here. I'd rather use
"bare metal", as per Christoph's suggestion.

>
>> Adding the boot parameter "isolcpus=nohz_offload" will now outsource
>> these scheduler ticks to the global workqueue so that a housekeeping CPU
>> handles that tick remotely.
>
> The global workqueue sounds horrific; surely you want at least one such
> housekeeping CPU per node or something ?

I guess it depends on how many CPUs we can afford to sacrifice to
housekeeping. Surely the more CPUs we isolate, the more CPUs we want to
do housekeeping, preferably per node.

IIRC, system_unbound_wq queues a work item to a thread running on the
enqueuer's node when possible, but I need to check that. If that is the
case, then it's up to the user to leave one CPU out of isolcpus on each
node, and the works should get queued and requeued to those per-node
housekeepers automatically.

>
>> +static void sched_tick_remote(struct work_struct *work)
>> +{
>> +	struct delayed_work *dwork = to_delayed_work(work);
>> +	struct tick_work *twork = container_of(dwork, struct tick_work, work);
>> +	struct rq *rq = cpu_rq(twork->cpu);
>> +	struct rq_flags rf;
>> +
>> +	rq_lock_irq(rq, &rf);
>> +	update_rq_clock(rq);
>> +	rq->curr->sched_class->task_tick(rq, rq->curr, 0);
>> +	rq_unlock_irq(rq, &rf);
>> +
>> +	queue_delayed_work(system_unbound_wq, dwork, HZ);
>> +}
>> +
>> +void sched_tick_start(int cpu)
>> +{
>> +	struct tick_work *twork;
>> +
>> +	if (housekeeping_cpu(cpu, HK_FLAG_TICK_SCHED))
>> +		return;
>> +
>> +	WARN_ON_ONCE(!tick_work_cpu);
>> +
>> +	twork = per_cpu_ptr(tick_work_cpu, cpu);
>> +	twork->cpu = cpu;
>> +	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
>> +	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
>> +
>> +	return;
>> +}
>> +
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +void sched_tick_stop(int cpu)
>> +{
>> +	struct tick_work *twork;
>> +
>> +	if (housekeeping_cpu(cpu, HK_FLAG_TICK_SCHED))
>> +		return;
>> +
>> +	WARN_ON_ONCE(!tick_work_cpu);
>> +
>> +	twork = per_cpu_ptr(tick_work_cpu, cpu);
>> +	cancel_delayed_work_sync(&twork->work);
>> +
>> +	return;
>> +}
>> +#endif /* CONFIG_HOTPLUG_CPU */
>
> This seems daft in that you _always_ run this remote tick, even when the
> CPU in question is not in nohz (full) mode.

Yeah, that's very basic. I think I should add a check to verify that the
CPU has effectively stopped its tick and is not in idle mode. This will
be racy but it shouldn't matter much.

Thanks.