Message-ID: <20200921163845.644634229@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 21 Sep 2020 18:36:02 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, bigeasy@linutronix.de, qais.yousef@arm.com,
 swood@redhat.com, peterz@infradead.org, valentin.schneider@arm.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
 vincent.donnefort@arm.com
Subject: [PATCH 5/9] sched/hotplug: Consolidate task migration on CPU unplug
References: <20200921163557.234036895@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Mailing-List: linux-kernel@vger.kernel.org

From: Thomas Gleixner <tglx@linutronix.de>

With the new mechanism which kicks tasks off the outgoing CPU at the end
of schedule(), the situation on an outgoing CPU right before the stopper
thread brings it down completely is:

 - All user tasks and all unbound kernel threads have either been
   migrated away or are not running, and the next wakeup will move them
   to an online CPU.

 - All per-CPU kernel threads, except the CPU hotplug thread and the
   stopper thread, have either been unbound or parked by the responsible
   CPU hotplug callback.

That means that at the last step before the stopper thread is invoked,
the CPU hotplug thread is the last legitimate running task on the
outgoing CPU.

Add a final wait step right before the stopper thread is kicked, which
ensures that any still running tasks on the way to park or on the way to
kick themselves off the CPU are either sleeping or gone.

This allows removing the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task
besides the stopper thread, it will explode with the appropriate
fireworks.
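The ordering described above (transient tasks drain off the CPU, the
hotplug thread waits until it is the last one standing, only then does
the stopper run) can be sketched with a small self-contained userspace
analogue. This is purely illustrative: plain pthreads, and every name in
it (participants, leave_cpu(), wait_empty()) is made up for the example;
none of it is kernel code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  empty = PTHREAD_COND_INITIALIZER;
static int participants;		/* tasks still "on the CPU" */

/* A departing task announces it has left; analogous to the
 * end-of-schedule() kick in the changelog above. */
static void leave_cpu(void)
{
	pthread_mutex_lock(&lock);
	if (--participants == 1)	/* only the teardown thread remains */
		pthread_cond_signal(&empty);
	pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
	usleep(10000 * (long)arg);	/* simulate work on the way out */
	leave_cpu();
	return NULL;
}

/* Analogous to the final wait step: block until we are the last task.
 * The predicate is re-checked under the lock, so a missed signal is
 * harmless. */
static void wait_empty(void)
{
	pthread_mutex_lock(&lock);
	while (participants > 1)
		pthread_cond_wait(&empty, &lock);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t tid[4];
	int i;

	participants = 5;		/* 4 workers + this (teardown) thread */
	for (i = 0; i < 4; i++)
		pthread_create(&tid[i], NULL, worker, (void *)(long)(i + 1));

	wait_empty();			/* safe to "kick the stopper" now */
	printf("CPU empty, invoking stopper\n");

	for (i = 0; i < 4; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Builds with gcc -pthread. The point it demonstrates is the same one the
patch relies on: the teardown side never proceeds while any still
running task has yet to take itself off the CPU.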
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpuhotplug.h    |    1 
 include/linux/sched/hotplug.h |    2 
 kernel/cpu.c                  |    9 ++
 kernel/sched/core.c           |  150 +++++++++---------------------------------
 4 files changed, 46 insertions(+), 116 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -152,6 +152,7 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
+	CPUHP_AP_SCHED_WAIT_EMPTY,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
 	CPUHP_AP_IRQ_AFFINITY_ONLINE,
--- a/include/linux/sched/hotplug.h
+++ b/include/linux/sched/hotplug.h
@@ -11,8 +11,10 @@ extern int sched_cpu_activate(unsigned i
 extern int sched_cpu_deactivate(unsigned int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
+extern int sched_cpu_wait_empty(unsigned int cpu);
 extern int sched_cpu_dying(unsigned int cpu);
 #else
+# define sched_cpu_wait_empty	NULL
 # define sched_cpu_dying	NULL
 #endif
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1602,7 +1602,7 @@ static struct cpuhp_step cpuhp_hp_states
 		.name			= "ap:online",
 	},
 	/*
-	 * Handled on controll processor until the plugged processor manages
+	 * Handled on control processor until the plugged processor manages
 	 * this itself.
 	 */
 	[CPUHP_TEARDOWN_CPU] = {
@@ -1611,6 +1611,13 @@ static struct cpuhp_step cpuhp_hp_states
 		.teardown.single	= takedown_cpu,
 		.cant_stop		= true,
 	},
+
+	[CPUHP_AP_SCHED_WAIT_EMPTY] = {
+		.name			= "sched:waitempty",
+		.startup.single		= NULL,
+		.teardown.single	= sched_cpu_wait_empty,
+	},
+
 	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot/threads:online",
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6759,120 +6759,6 @@ void idle_task_exit(void)
 	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
 }
 
-/*
- * Since this CPU is going 'away' for a while, fold any nr_active delta
- * we might have. Assumes we're called after migrate_tasks() so that the
- * nr_active count is stable. We need to take the teardown thread which
- * is calling this into account, so we hand in adjust = 1 to the load
- * calculation.
- *
- * Also see the comment "Global load-average calculations".
- */
-static void calc_load_migrate(struct rq *rq)
-{
-	long delta = calc_load_fold_active(rq, 1);
-	if (delta)
-		atomic_long_add(delta, &calc_load_tasks);
-}
-
-static struct task_struct *__pick_migrate_task(struct rq *rq)
-{
-	const struct sched_class *class;
-	struct task_struct *next;
-
-	for_each_class(class) {
-		next = class->pick_next_task(rq);
-		if (next) {
-			next->sched_class->put_prev_task(rq, next);
-			return next;
-		}
-	}
-
-	/* The idle class should always have a runnable task */
-	BUG();
-}
-
-/*
- * Migrate all tasks from the rq, sleeping tasks will be migrated by
- * try_to_wake_up()->select_task_rq().
- *
- * Called with rq->lock held even though we'er in stop_machine() and
- * there's no concurrency possible, we hold the required locks anyway
- * because of lock validation efforts.
- */
-static void migrate_tasks(struct rq *dead_rq, struct rq_flags *rf)
-{
-	struct rq *rq = dead_rq;
-	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags orf = *rf;
-	int dest_cpu;
-
-	/*
-	 * Fudge the rq selection such that the below task selection loop
-	 * doesn't get stuck on the currently eligible stop task.
-	 *
-	 * We're currently inside stop_machine() and the rq is either stuck
-	 * in the stop_machine_cpu_stop() loop, or we're executing this code,
-	 * either way we should never end up calling schedule() until we're
-	 * done here.
-	 */
-	rq->stop = NULL;
-
-	/*
-	 * put_prev_task() and pick_next_task() sched
-	 * class method both need to have an up-to-date
-	 * value of rq->clock[_task]
-	 */
-	update_rq_clock(rq);
-
-	for (;;) {
-		/*
-		 * There's this thread running, bail when that's the only
-		 * remaining thread:
-		 */
-		if (rq->nr_running == 1)
-			break;
-
-		next = __pick_migrate_task(rq);
-
-		/*
-		 * Rules for changing task_struct::cpus_mask are holding
-		 * both pi_lock and rq->lock, such that holding either
-		 * stabilizes the mask.
-		 *
-		 * Drop rq->lock is not quite as disastrous as it usually is
-		 * because !cpu_active at this point, which means load-balance
-		 * will not interfere. Also, stop-machine.
-		 */
-		rq_unlock(rq, rf);
-		raw_spin_lock(&next->pi_lock);
-		rq_relock(rq, rf);
-
-		/*
-		 * Since we're inside stop-machine, _nothing_ should have
-		 * changed the task, WARN if weird stuff happened, because in
-		 * that case the above rq->lock drop is a fail too.
-		 */
-		if (WARN_ON(task_rq(next) != rq || !task_on_rq_queued(next))) {
-			raw_spin_unlock(&next->pi_lock);
-			continue;
-		}
-
-		/* Find suitable destination for @next, with force if needed. */
-		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
-		rq = __migrate_task(rq, rf, next, dest_cpu);
-		if (rq != dead_rq) {
-			rq_unlock(rq, rf);
-			rq = dead_rq;
-			*rf = orf;
-			rq_relock(rq, rf);
-		}
-		raw_spin_unlock(&next->pi_lock);
-	}
-
-	rq->stop = stop;
-}
-
 static int __balance_push_cpu_stop(void *arg)
 {
 	struct task_struct *p = arg;
@@ -7145,6 +7031,41 @@ int sched_cpu_starting(unsigned int cpu)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+
+/*
+ * Invoked immediately before the stopper thread is invoked to bring the
+ * CPU down completely. At this point all per CPU kthreads except the
+ * hotplug thread (current) and the stopper thread (inactive) have been
+ * either parked or have been unbound from the outgoing CPU. Ensure that
+ * any of those which might be on the way out are gone.
+ *
+ * If after this point a bound task is being woken on this CPU then the
+ * responsible hotplug callback has failed to do its job.
+ * sched_cpu_dying() will catch it with the appropriate fireworks.
+ */
+int sched_cpu_wait_empty(unsigned int cpu)
+{
+	balance_hotplug_wait();
+	return 0;
+}
+
+/*
+ * Since this CPU is going 'away' for a while, fold any nr_active delta we
+ * might have. Called from the CPU stopper task after ensuring that the
+ * stopper is the last running task on the CPU, so nr_active count is
+ * stable. We need to take the teardown thread which is calling this into
+ * account, so we hand in adjust = 1 to the load calculation.
+ *
+ * Also see the comment "Global load-average calculations".
+ */
+static void calc_load_migrate(struct rq *rq)
+{
+	long delta = calc_load_fold_active(rq, 1);
+
+	if (delta)
+		atomic_long_add(delta, &calc_load_tasks);
+}
+
 int sched_cpu_dying(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7158,7 +7079,6 @@ int sched_cpu_dying(unsigned int cpu)
 		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
 		set_rq_offline(rq);
 	}
-	migrate_tasks(rq, &rf);
 	BUG_ON(rq->nr_running != 1);
 	rq_unlock_irqrestore(rq, &rf);
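
For completeness: balance_hotplug_wait(), which sched_cpu_wait_empty()
calls above, is not defined in this patch; it is introduced by the
balance_push side of this series. As a rough sketch of its shape,
assuming struct rq gains an rcuwait member named hotplug_wait elsewhere
in the series (which is how the merged code does it), it amounts to:

/*
 * Sketch only -- assumes an rcuwait 'hotplug_wait' member added to
 * struct rq elsewhere in this series; not part of the patch above.
 */
static void balance_hotplug_wait(void)
{
	struct rq *rq = this_rq();

	/* Sleep until this (the hotplug thread) is the last runnable task. */
	rcuwait_wait_event(&rq->hotplug_wait, rq->nr_running == 1,
			   TASK_UNINTERRUPTIBLE);
}

The wake-up then comes from the end-of-schedule() push path: once the
last transient task has left and rq->nr_running drops, it calls
rcuwait_wake_up(&rq->hotplug_wait), which is what lets the final wait
step above terminate and the stopper thread take over.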