From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti, Dietmar Eggemann
Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
Date: Fri, 14 Feb 2020 16:39:47 +0000
Message-Id: <20200214163949.27850-2-qais.yousef@arm.com>
In-Reply-To: <20200214163949.27850-1-qais.yousef@arm.com>
References: <20200214163949.27850-1-qais.yousef@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When searching for the best lowest_mask with a fitness_fn passed, make sure
we record the lowest_level that
returns a valid lowest_mask so that we can use that as a fallback in case we
fail to find a fitting CPU at all levels.

The intention in the original patch was not to allow a down migration to
an unfitting CPU. But this missed the case where we are already running on
an unfitting one.

With this change, RT tasks can still move between unfitting CPUs when
they're already running on such a CPU.

And as Steve suggested, to adhere to the strict priority rules of RT, if
a task is already running on a fitting CPU but due to priority it can't
run on it, allow it to downmigrate to an unfitting CPU so it can run.

Reported-by: Pavan Kondeti
Signed-off-by: Qais Yousef
---
 kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
 1 file changed, 101 insertions(+), 56 deletions(-)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index 1a2719e1350a..1bcfa1995550 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -41,6 +41,59 @@ static int convert_prio(int prio)
 	return cpupri;
 }
 
+static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
+				struct cpumask *lowest_mask, int idx)
+{
+	struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
+	int skip = 0;
+
+	if (!atomic_read(&(vec)->count))
+		skip = 1;
+	/*
+	 * When looking at the vector, we need to read the counter,
+	 * do a memory barrier, then read the mask.
+	 *
+	 * Note: This is still all racey, but we can deal with it.
+	 *  Ideally, we only want to look at masks that are set.
+	 *
+	 *  If a mask is not set, then the only thing wrong is that we
+	 *  did a little more work than necessary.
+	 *
+	 *  If we read a zero count but the mask is set, because of the
+	 *  memory barriers, that can only happen when the highest prio
+	 *  task for a run queue has left the run queue, in which case,
+	 *  it will be followed by a pull. If the task we are processing
+	 *  fails to find a proper place to go, that pull request will
+	 *  pull this task if the run queue is running at a lower
+	 *  priority.
+	 */
+	smp_rmb();
+
+	/* Need to do the rmb for every iteration */
+	if (skip)
+		return 0;
+
+	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
+		return 0;
+
+	if (lowest_mask) {
+		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
+
+		/*
+		 * We have to ensure that we have at least one bit
+		 * still set in the array, since the map could have
+		 * been concurrently emptied between the first and
+		 * second reads of vec->mask.  If we hit this
+		 * condition, simply act as though we never hit this
+		 * priority level and continue on.
+		 */
+		if (cpumask_empty(lowest_mask))
+			return 0;
+	}
+
+	return 1;
+}
+
 /**
  * cpupri_find - find the best (lowest-pri) CPU in the system
  * @cp: The cpupri context
@@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		struct cpumask *lowest_mask,
 		bool (*fitness_fn)(struct task_struct *p, int cpu))
 {
-	int idx = 0;
 	int task_pri = convert_prio(p->prio);
+	int best_unfit_idx = -1;
+	int idx = 0, cpu;
 
 	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
 
 	for (idx = 0; idx < task_pri; idx++) {
-		struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
-		int skip = 0;
 
-		if (!atomic_read(&(vec)->count))
-			skip = 1;
-		/*
-		 * When looking at the vector, we need to read the counter,
-		 * do a memory barrier, then read the mask.
-		 *
-		 * Note: This is still all racey, but we can deal with it.
-		 *  Ideally, we only want to look at masks that are set.
-		 *
-		 *  If a mask is not set, then the only thing wrong is that we
-		 *  did a little more work than necessary.
-		 *
-		 *  If we read a zero count but the mask is set, because of the
-		 *  memory barriers, that can only happen when the highest prio
-		 *  task for a run queue has left the run queue, in which case,
-		 *  it will be followed by a pull. If the task we are processing
-		 *  fails to find a proper place to go, that pull request will
-		 *  pull this task if the run queue is running at a lower
-		 *  priority.
-		 */
-		smp_rmb();
-
-		/* Need to do the rmb for every iteration */
-		if (skip)
-			continue;
-
-		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
+		if (!__cpupri_find(cp, p, lowest_mask, idx))
 			continue;
 
-		if (lowest_mask) {
-			int cpu;
+		if (!lowest_mask || !fitness_fn)
+			return 1;
 
-			cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
+		/* Ensure the capacity of the CPUs fit the task */
+		for_each_cpu(cpu, lowest_mask) {
+			if (!fitness_fn(p, cpu))
+				cpumask_clear_cpu(cpu, lowest_mask);
+		}
 
+		/*
+		 * If no CPU at the current priority can fit the task
+		 * continue looking
+		 */
+		if (cpumask_empty(lowest_mask)) {
 			/*
-			 * We have to ensure that we have at least one bit
-			 * still set in the array, since the map could have
-			 * been concurrently emptied between the first and
-			 * second reads of vec->mask.  If we hit this
-			 * condition, simply act as though we never hit this
-			 * priority level and continue on.
+			 * Store our fallback priority in case we
+			 * didn't find a fitting CPU
 			 */
-			if (cpumask_empty(lowest_mask))
-				continue;
+			if (best_unfit_idx == -1)
+				best_unfit_idx = idx;
 
-			if (!fitness_fn)
-				return 1;
-
-			/* Ensure the capacity of the CPUs fit the task */
-			for_each_cpu(cpu, lowest_mask) {
-				if (!fitness_fn(p, cpu))
-					cpumask_clear_cpu(cpu, lowest_mask);
-			}
-
-			/*
-			 * If no CPU at the current priority can fit the task
-			 * continue looking
-			 */
-			if (cpumask_empty(lowest_mask))
-				continue;
+			continue;
 		}
 
 		return 1;
 	}
 
+	/*
+	 * If we failed to find a fitting lowest_mask, make sure we fall back
+	 * to the last known unfitting lowest_mask.
+	 *
+	 * Note that the map of the recorded idx might have changed since then,
+	 * so we must ensure to do the full dance to make sure that level still
+	 * holds a valid lowest_mask.
+	 *
+	 * As per above, the map could have been concurrently emptied while we
+	 * were busy searching for a fitting lowest_mask at the other priority
+	 * levels.
+	 *
+	 * This rule favours honouring priority over fitting the task in the
+	 * correct CPU (Capacity Awareness being the only user now).
+	 * The idea is that if a higher priority task can run, then it should
+	 * run even if this ends up being on unfitting CPU.
+	 *
+	 * The cost of this trade-off is not entirely clear and will probably
+	 * be good for some workloads and bad for others.
+	 *
+	 * The main idea here is that if some CPUs were overcommitted, we try
+	 * to spread which is what the scheduler traditionally did. Sys admins
+	 * must do proper RT planning to avoid overloading the system if they
+	 * really care.
+	 */
+	if (best_unfit_idx != -1)
+		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);
+
 	return 0;
 }
 
-- 
2.17.1
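
For readers who want to try the fallback rule outside the kernel, here is a
minimal user-space sketch of the search order the patch describes. It is not
the kernel implementation: cpumask_t, NR_LEVELS, level_find(), find_lowest()
and fits_big() are hypothetical stand-ins, CPU masks are plain 32-bit words,
and all locking, atomics and memory barriers are left out. It only models
"remember the first level that had candidate CPUs but no fitting one, and
re-check that level if nothing fits".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_LEVELS 4              /* hypothetical; stands in for CPUPRI_NR_PRIORITIES */

typedef uint32_t cpumask_t;      /* toy mask: one bit per CPU, not struct cpumask */
typedef bool (*fitness_fn_t)(int cpu);

/*
 * Toy counterpart of __cpupri_find(): report whether priority level idx
 * has any CPU the task is allowed on, and store those CPUs in *lowest_mask.
 */
static bool level_find(const cpumask_t *levels, cpumask_t allowed,
                       int idx, cpumask_t *lowest_mask)
{
    cpumask_t m = levels[idx] & allowed;

    if (!m)
        return false;
    if (lowest_mask)
        *lowest_mask = m;
    return true;
}

/* Toy counterpart of cpupri_find() with the fallback for the !fit case. */
static bool find_lowest(const cpumask_t *levels, cpumask_t allowed,
                        int task_pri, fitness_fn_t fits,
                        cpumask_t *lowest_mask)
{
    int best_unfit_idx = -1;

    assert(task_pri < NR_LEVELS);

    for (int idx = 0; idx < task_pri; idx++) {
        if (!level_find(levels, allowed, idx, lowest_mask))
            continue;

        if (!lowest_mask || !fits)
            return true;

        /* Drop CPUs that do not fit the task. */
        for (int cpu = 0; cpu < 32; cpu++) {
            if ((*lowest_mask & (1u << cpu)) && !fits(cpu))
                *lowest_mask &= ~(1u << cpu);
        }

        if (!*lowest_mask) {
            /* Remember the first level that had only unfitting CPUs. */
            if (best_unfit_idx == -1)
                best_unfit_idx = idx;
            continue;
        }

        return true;
    }

    /* Nothing fits: redo the lookup at the recorded level, if any. */
    if (best_unfit_idx != -1)
        return level_find(levels, allowed, best_unfit_idx, lowest_mask);

    return false;
}

/* Hypothetical fitness rule: only CPUs 2 and 3 are big enough. */
static bool fits_big(int cpu)
{
    return cpu >= 2;
}
```

With levels = {0x1, 0x2, 0xC, 0x0} and all four CPUs allowed, a task_pri of 3
ends up with the fitting mask 0xC from level 2, while a task_pri of 2 never
reaches level 2 and falls back to the unfitting mask 0x1 from level 0.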