Date: Wed, 4 Mar 2020 11:27:39 -0500
From: Steven Rostedt
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, Pavan Kondeti,
 Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/6] sched/rt: cpupri_find: Implement fallback mechanism for !fit case
Message-ID: <20200304112739.7b99677e@gandalf.local.home>
In-Reply-To: <20200302132721.8353-2-qais.yousef@arm.com>
References: <20200302132721.8353-1-qais.yousef@arm.com>
 <20200302132721.8353-2-qais.yousef@arm.com>
On Mon, 2 Mar 2020 13:27:16 +0000
Qais Yousef wrote:

>  /**
>   * cpupri_find - find the best (lowest-pri) CPU in the system
>   * @cp: The cpupri context
> @@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
>  		struct cpumask *lowest_mask,
>  		bool (*fitness_fn)(struct task_struct *p, int cpu))
>  {
> -	int idx = 0;
>  	int task_pri = convert_prio(p->prio);
> +	int best_unfit_idx = -1;
> +	int idx = 0, cpu;

Nit, but if you moved idx, might as well remove the unnecessary
initialization of it as well ;-)

> 
>  	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
> 
>  	for (idx = 0; idx < task_pri; idx++) {

It's initialized here.

> -		struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
> -		int skip = 0;
> 
> -		if (!atomic_read(&(vec)->count))
> -			skip = 1;
> -		/*
> -		 * When looking at the vector, we need to read the counter,
> -		 * do a memory barrier, then read the mask.
> -		 *
> -		 * Note: This is still all racey, but we can deal with it.
> -		 * Ideally, we only want to look at masks that are set.
> -		 *
> -		 * If a mask is not set, then the only thing wrong is that we
> -		 * did a little more work than necessary.
> -		 *
> -		 * If we read a zero count but the mask is set, because of the
> -		 * memory barriers, that can only happen when the highest prio
> -		 * task for a run queue has left the run queue, in which case,
> -		 * it will be followed by a pull. If the task we are processing
> -		 * fails to find a proper place to go, that pull request will
> -		 * pull this task if the run queue is running at a lower
> -		 * priority.
> -		 */
> -		smp_rmb();
> -
> -		/* Need to do the rmb for every iteration */
> -		if (skip)
> -			continue;
> -
> -		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> +		if (!__cpupri_find(cp, p, lowest_mask, idx))
>  			continue;
> 
> -		if (lowest_mask) {
> -			int cpu;
> +		if (!lowest_mask || !fitness_fn)
> +			return 1;
> 
> -			cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +		/* Ensure the capacity of the CPUs fit the task */
> +		for_each_cpu(cpu, lowest_mask) {
> +			if (!fitness_fn(p, cpu))
> +				cpumask_clear_cpu(cpu, lowest_mask);
> +		}
> 
> +		/*
> +		 * If no CPU at the current priority can fit the task
> +		 * continue looking
> +		 */
> +		if (cpumask_empty(lowest_mask)) {
>  			/*
> -			 * We have to ensure that we have at least one bit
> -			 * still set in the array, since the map could have
> -			 * been concurrently emptied between the first and
> -			 * second reads of vec->mask. If we hit this
> -			 * condition, simply act as though we never hit this
> -			 * priority level and continue on.
> +			 * Store our fallback priority in case we
> +			 * didn't find a fitting CPU
>  			 */
> -			if (cpumask_empty(lowest_mask))
> -				continue;
> +			if (best_unfit_idx == -1)
> +				best_unfit_idx = idx;
> 
> -			if (!fitness_fn)
> -				return 1;
> -
> -			/* Ensure the capacity of the CPUs fit the task */
> -			for_each_cpu(cpu, lowest_mask) {
> -				if (!fitness_fn(p, cpu))
> -					cpumask_clear_cpu(cpu, lowest_mask);
> -			}
> -
> -			/*
> -			 * If no CPU at the current priority can fit the task
> -			 * continue looking
> -			 */
> -			if (cpumask_empty(lowest_mask))
> -				continue;
> +			continue;
>  		}
> 
>  		return 1;
>  	}
> 
> +	/*
> +	 * If we failed to find a fitting lowest_mask, make sure we fall back
> +	 * to the last known unfitting lowest_mask.
> +	 *
> +	 * Note that the map of the recorded idx might have changed since then,
> +	 * so we must ensure to do the full dance to make sure that level still
> +	 * holds a valid lowest_mask.
> +	 *
> +	 * As per above, the map could have been concurrently emptied while we
> +	 * were busy searching for a fitting lowest_mask at the other priority
> +	 * levels.
> +	 *
> +	 * This rule favours honouring priority over fitting the task in the
> +	 * correct CPU (Capacity Awareness being the only user now).
> +	 * The idea is that if a higher priority task can run, then it should
> +	 * run even if this ends up being on unfitting CPU.
> +	 *
> +	 * The cost of this trade-off is not entirely clear and will probably
> +	 * be good for some workloads and bad for others.
> +	 *
> +	 * The main idea here is that if some CPUs were overcommitted, we try
> +	 * to spread which is what the scheduler traditionally did. Sys admins
> +	 * must do proper RT planning to avoid overloading the system if they
> +	 * really care.
> +	 */
> +	if (best_unfit_idx != -1)
> +		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);

Hmm, this only checks the one index, which can change and then we miss
everything. I think we can do better. What about this:

	for (idx = 0; idx < task_pri; idx++) {
		int found = -1;

		if (!__cpupri_find(cp, p, lowest_mask, idx))
			continue;

		if (!lowest_mask || !fitness_fn)
			return 1;

		/* Make sure we have one fit CPU before clearing */
		for_each_cpu(cpu, lowest_mask) {
			if (fitness_fn(p, cpu)) {
				found = cpu;
				break;
			}
		}

		if (found == -1)
			continue;

		/* Ensure the capacity of the CPUs fit the task */
		for_each_cpu(cpu, lowest_mask) {
			if (cpu < found || !fitness_fn(p, cpu))
				cpumask_clear_cpu(cpu, lowest_mask);
		}

		return 1;
	}

This way, if nothing fits we return the untouched lowest_mask, and only
clear the lowest_mask bits if we found a fitting CPU.

-- Steve
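
To make the effect of the two-pass ordering in Steve's snippet concrete, here
is a minimal, self-contained userspace sketch. It is not kernel code: the
uint64_t bitmask stands in for a cpumask, and the fits() and clip_to_fit()
names are invented for illustration. It demonstrates the "probe for one
fitting CPU before clearing any bits" pattern, under the assumption that the
caller wants a non-empty, priority-correct mask back even when nothing fits.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Hypothetical fitness test: pretend only CPUs 2 and 3 have
	 * enough capacity for the task. */
	static bool fits(int cpu)
	{
		return cpu == 2 || cpu == 3;
	}

	/*
	 * Mirror of the two loops above: first probe for one fitting
	 * CPU, and only if one exists clear the unfitting bits. If
	 * nothing fits, the mask is left untouched.
	 */
	static void clip_to_fit(uint64_t *mask, int ncpus)
	{
		int cpu, found = -1;

		/* Pass 1: make sure one CPU fits before clearing. */
		for (cpu = 0; cpu < ncpus; cpu++) {
			if ((*mask & (1ULL << cpu)) && fits(cpu)) {
				found = cpu;
				break;
			}
		}
		if (found == -1)
			return;

		/*
		 * Pass 2: clear the unfitting CPUs. Everything below
		 * 'found' already failed the test in pass 1, so it can
		 * be cleared without re-running the predicate, exactly
		 * as in the snippet above.
		 */
		for (cpu = 0; cpu < ncpus; cpu++) {
			if ((*mask & (1ULL << cpu)) &&
			    (cpu < found || !fits(cpu)))
				*mask &= ~(1ULL << cpu);
		}
	}

	int main(void)
	{
		uint64_t mask = 0x0f; /* CPUs 0-3 eligible at this level */

		clip_to_fit(&mask, 4);
		printf("some CPU fits: %#llx\n", (unsigned long long)mask);

		mask = 0x03; /* only CPUs 0-1 eligible: nothing fits */
		clip_to_fit(&mask, 4);
		printf("no CPU fits:   %#llx\n", (unsigned long long)mask);
		return 0;
	}

Compiled and run, this prints 0xc for the first case (only fitting CPUs 2-3
survive) and 0x3 for the second (nothing fits, so the mask comes back
untouched), which is exactly the behaviour the last paragraph of the mail
describes: fitness filtering is layered on top of priority selection, and
when it cannot be honoured the caller still receives a valid mask rather
than an empty one.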