Subject: Re: [PATCH] sched: rt: Make RT capacity aware
From: Dietmar Eggemann
To: Qais Yousef, Steven Rostedt
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Alessio Balsini, linux-kernel@vger.kernel.org
Date: Fri, 13 Sep 2019 14:30:56 +0100
In-Reply-To: <20190904154052.ygbhtduzkfj3xs5d@e107158-lin.cambridge.arm.com>
References: <20190903103329.24961-1-qais.yousef@arm.com>
 <20190904072524.09de28aa@oasis.local.home>
 <20190904154052.ygbhtduzkfj3xs5d@e107158-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On 9/4/19 4:40 PM, Qais Yousef wrote:
> On 09/04/19 07:25, Steven Rostedt wrote:
>> On Tue, 3 Sep 2019 11:33:29 +0100
>> Qais Yousef wrote:

[...]

>>> @@ -1614,7 +1660,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>>>  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
>>>  {
>>>  	if (!task_running(rq, p) &&
>>> -	    cpumask_test_cpu(cpu, p->cpus_ptr))
>>> +	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
>>> +	    rt_task_fits_capacity(p, cpu))
>>
>> Hmm, so if a CPU goes idle, and looks for CPUs with more than one RT
>> task queued (overloaded), it will skip pulling RT tasks if they are
>> below capacity. Is that the desired effect? I think we could end up
>> with small idle CPUs with RT tasks waiting to run.
>
> The intention was not to pull this task that doesn't fit, but not to abort the
> whole pull operation. pick_highest_pushable_task() should still iterate through
> the remaining tasks, or did I miss something?

On a big.LITTLE system (6 CPUs with a [446 1024 1024 446 466 466] CPU
capacity vector) I tried to trace the handling of the 3rd big task
(big2-2, util_min: 800, util_max: 1024) of an rt-app workload running 3
of them.
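For reference, the fitness check under discussion can be modelled in plain
user-space C. Everything here is a simplified stand-in for the kernel code:
the `struct task`, the capacity table, and the clamp handling approximate
`task_struct`, `capacity_orig_of()` and the uclamp values, so treat it as a
sketch of the idea, not the patch itself:

```c
#include <assert.h>
#include <stdbool.h>

/* CPU capacity vector from the big.LITTLE example above (a hypothetical
 * stand-in for the kernel's capacity_orig_of()). */
static const unsigned int capacity_orig[6] = { 446, 1024, 1024, 446, 466, 466 };

/* Minimal stand-in for task_struct: only the uclamp values matter here. */
struct task {
	unsigned int uclamp_min;	/* util_min, e.g. 800 for big2-2 */
	unsigned int uclamp_max;	/* util_max, e.g. 1024 for big2-2 */
};

/*
 * Model of rt_task_fits_capacity(): an RT task fits a CPU if the CPU's
 * original capacity covers the task's clamped minimum utilisation
 * (the smaller of the two clamps, so uclamp_max can lower the request).
 */
static bool rt_task_fits_capacity(const struct task *p, int cpu)
{
	unsigned int clamp = p->uclamp_min < p->uclamp_max ?
			     p->uclamp_min : p->uclamp_max;

	return capacity_orig[cpu] >= clamp;
}
```

With the big2-2 clamps (util_min 800), only the two 1024-capacity CPUs pass
the check, which is consistent with the ret=0 lines for CPUs 0 and 3-5 in the
trace below.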
rt_task_fits_capacity() call sites: tag=1: select_task_rq_rt(),
tags 3-7: 1st till 5th call in find_lowest_rq():

[ 37.505325] rt_task_fits_capacity: CPU3 tag=1 [big2-2 285] ret=0
[ 37.505882] find_lowest_rq: CPU3 [big2-2 285] tag=1 find_lowest_rq
[ 37.506509] CPU3 [big2-2 285] lowest_mask=0,3-5
[ 37.507971] rt_task_fits_capacity: CPU3 tag=3 [big2-2 285] ret=0
[ 37.508200] rt_task_fits_capacity: CPU3 tag=4 [big2-2 285] ret=0
[ 37.509486] rt_task_fits_capacity: CPU0 tag=5 [big2-2 285] ret=0
[ 37.510191] rt_task_fits_capacity: CPU3 tag=5 [big2-2 285] ret=0
[ 37.511334] rt_task_fits_capacity: CPU4 tag=5 [big2-2 285] ret=0
[ 37.512194] rt_task_fits_capacity: CPU5 tag=5 [big2-2 285] ret=0
[ 37.513210] rt_task_fits_capacity: CPU0 tag=6 [big2-2 285] ret=0
[ 37.514085] rt_task_fits_capacity: CPU3 tag=7 [big2-2 285] ret=0
[ 37.514732] --> select_task_rq_rt: CPU3 [big2-2 285] cpu=0

Since CPUs 0 and 3-5 can't run big2-2, the search goes all the way down to
the final test that the fitness hasn't changed. If a capacity-aware
cpupri_find() (with a fallback CPU) had returned a suitable lowest_mask,
less work would have been needed.

This snippet repeats for the whole run of the workload, since all the
rt-tasks run for the same time and I'm only faking big.LITTLE on qemu.

[...]

>>>  	rcu_read_lock();
>>>  	for_each_domain(cpu, sd) {
>>> @@ -1692,11 +1747,15 @@ static int find_lowest_rq(struct task_struct *task)
>>>  			return this_cpu;
>>>  		}
>>>
>>> -		best_cpu = cpumask_first_and(lowest_mask,
>>> -					     sched_domain_span(sd));
>>> -		if (best_cpu < nr_cpu_ids) {
>>> -			rcu_read_unlock();
>>> -			return best_cpu;
>>> +		for_each_cpu_and(best_cpu, lowest_mask,
>>> +				 sched_domain_span(sd)) {
>>> +			if (best_cpu >= nr_cpu_ids)
>>
>> Can that happen in this loop?
>
> I kept the condition that was originally here but inverted the logic so we
> don't mindlessly iterate through the rest of the CPUs. IOW, I tried to retain
> the original behaviour of the `if (best_cpu < nr_cpu_ids)` logic.
>
> Whether we can remove this check I don't know, to be honest. A similar check
> exists below and I did wonder under what conditions this could happen, but
> didn't try to follow the thread.
>
> The only case I can think of is if we set nr_cpu_ids through the command line
> to a lower value. Then, if cpu_possible_mask and family aren't updated
> accordingly, this check will protect against scheduling on CPUs the users
> said they don't want to use? Just guessing.

Why don't you build the capacity awareness into cpupri_find(..., lowest_mask)?

You would have to add a fallback strategy in case p doesn't fit on any of the
CPUs cpupri_find() returns today as lowest_mask (e.g. return the CPU with the
max capacity).

Luca proposed something like this for SCHED_DEADLINE in "[RFC PATCH 0/6]
Capacity awareness for SCHED_DEADLINE" (patches 2/6 and 5/6):

https://lkml.kernel.org/r/20190506044836.2914-1-luca.abeni@santannapisa.it

In this case you could get rid of all the newly introduced
rt_task_fits_capacity() logic in find_lowest_rq().

---

I can't see the point of the

    for_each_domain(cpu, sd) {
        for_each_cpu_and(best_cpu, lowest_mask,
                         sched_domain_span(sd)) {
            ...
        }
    }

nesting, other than that we want to pick a CPU from the lowest_mask which is
closer to task_cpu(task) first. Qemu doesn't give me clusters:

# cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
MC

[...]
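The cpupri_find() suggestion above could be sketched roughly as below. This
is a hypothetical user-space model, not the kernel API: the bitmask stands
in for the real cpumask, the capacity table for capacity_orig_of(), and the
function name cpupri_pick() is made up. It only illustrates the "fitting CPU
first, max-capacity CPU as fallback" strategy:

```c
#include <assert.h>

#define NR_CPUS 6

/* Same hypothetical capacity vector as in the big.LITTLE example. */
static const unsigned int capacity_orig[NR_CPUS] = { 446, 1024, 1024, 446, 466, 466 };

/*
 * Sketch of a capacity-aware lowest_mask scan: return the first CPU in
 * the mask the task fits on; if none fits, fall back to the CPU with the
 * maximum capacity so the search never comes back empty-handed.
 */
static int cpupri_pick(unsigned long lowest_mask, unsigned int min_clamp)
{
	int fallback = -1;
	unsigned int fallback_cap = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(lowest_mask & (1UL << cpu)))
			continue;
		if (capacity_orig[cpu] >= min_clamp)
			return cpu;		/* task fits here */
		if (capacity_orig[cpu] > fallback_cap) {
			fallback_cap = capacity_orig[cpu];
			fallback = cpu;		/* best non-fitting CPU so far */
		}
	}
	return fallback;	/* -1 only if the mask was empty */
}
```

With the lowest_mask=0,3-5 from the trace and big2-2's min clamp of 800, no
CPU fits and the fallback picks the largest little CPU; with a big CPU in the
mask, that CPU is returned directly, which would have saved the repeated
rt_task_fits_capacity() calls in find_lowest_rq().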