From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Peng Liu,
    "Peter Zijlstra (Intel)", Daniel Bristot de Oliveira, Juri Lelli,
    Sasha Levin
Subject: [PATCH 4.14 073/242] sched/deadline: Fix sched_dl_global_validate()
Date: Mon, 28 Dec 2020 13:47:58 +0100
Message-Id: <20201228124908.279962300@linuxfoundation.org>
In-Reply-To: <20201228124904.654293249@linuxfoundation.org>
References: <20201228124904.654293249@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peng Liu

[ Upstream commit a57415f5d1e43c3a5c5d412cd85e2792d7ed9b11 ]

When changing sched_rt_{runtime,period}_us, we validate that the new
settings at least accommodate the currently allocated -dl bandwidth:

  sched_rt_handler()
    --> sched_dl_bandwidth_validate()
	{
		new_bw = global_rt_runtime()/global_rt_period();

		for_each_possible_cpu(cpu) {
			dl_b = dl_bw_of(cpu);
			if (new_bw < dl_b->total_bw)	<-------
				ret = -EBUSY;
		}
	}

But under CONFIG_SMP, dl_bw is per root domain, not per CPU, so
dl_b->total_bw is the allocated bandwidth of the whole root domain.
Instead, we should compare dl_b->total_bw against "cpus * new_bw",
where 'cpus' is the number of CPUs in the root domain.

Also, the annotation below (in kernel/sched/sched.h) describes an
implementation that only existed in SCHED_DEADLINE v2 [1]; the deadline
scheduler kept evolving until it was merged (v9), but the annotation was
never updated, so it is now meaningless and misleading. Update it.

 * With respect to SMP, the bandwidth is given on a per-CPU basis,
 * meaning that:
 *  - dl_bw (< 100%) is the bandwidth of the system (group) on each CPU;
 *  - dl_total_bw array contains, in the i-eth element, the currently
 *    allocated bandwidth on the i-eth CPU.
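To make the fixed check concrete, below is a minimal user-space sketch
(illustrative only, not kernel code: the root-domain size and the
reserved bandwidths are made-up numbers, and to_ratio() is a simplified
fixed-point stand-in for the kernel helper of the same name):

	#include <stdio.h>
	#include <stdint.h>

	#define BW_SHIFT 20	/* same fixed-point scale idea as the kernel */

	/* fraction of one CPU that runtime/period reserves, in 1/2^20 units */
	static uint64_t to_ratio(uint64_t period, uint64_t runtime)
	{
		return (runtime << BW_SHIFT) / period;
	}

	int main(void)
	{
		int cpus = 4;	/* CPUs in the (hypothetical) root domain */
		/* six already-admitted 1ms/10ms tasks: 0.6 of one CPU in total */
		uint64_t total_bw = 6 * to_ratio(10, 1);
		/* proposed global limit: runtime/period = 200000/1000000 = 20% */
		uint64_t new_bw = to_ratio(1000000, 200000);

		/* old check: per-CPU limit compared against per-root-domain sum */
		if (new_bw < total_bw)
			printf("old check: -EBUSY (spurious)\n");

		/* fixed check: scale the limit by the CPUs in the domain */
		if (new_bw * cpus < total_bw)
			printf("new check: -EBUSY\n");
		else
			printf("new check: OK (0.6 CPU used <= 0.8 CPU allowed)\n");

		return 0;
	}

With a 4-CPU root domain, a 20% global limit allows 0.8 CPUs worth of
-dl bandwidth across the domain, so the 0.6 CPUs already reserved should
be accepted; the old comparison (0.2 < 0.6) rejected it with -EBUSY.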
[1]: https://lore.kernel.org/lkml/1267385230.13676.101.camel@Palantir/

Fixes: 332ac17ef5bf ("sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks")
Signed-off-by: Peng Liu
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Daniel Bristot de Oliveira
Acked-by: Juri Lelli
Link: https://lkml.kernel.org/r/db6bbda316048cda7a1bbc9571defde193a8d67e.1602171061.git.iwtbavbm@gmail.com
Signed-off-by: Sasha Levin
---
 kernel/sched/deadline.c |  5 +++--
 kernel/sched/sched.h    | 42 ++++++++++++++++++------------------------
 2 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 22770168bff84..06a6bcd6cfa66 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2345,7 +2345,7 @@ int sched_dl_global_validate(void)
 	u64 period = global_rt_period();
 	u64 new_bw = to_ratio(period, runtime);
 	struct dl_bw *dl_b;
-	int cpu, ret = 0;
+	int cpu, cpus, ret = 0;
 	unsigned long flags;
 
 	/*
@@ -2360,9 +2360,10 @@ int sched_dl_global_validate(void)
 	for_each_possible_cpu(cpu) {
 		rcu_read_lock_sched();
 		dl_b = dl_bw_of(cpu);
+		cpus = dl_bw_cpus(cpu);
 
 		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		if (new_bw < dl_b->total_bw)
+		if (new_bw * cpus < dl_b->total_bw)
 			ret = -EBUSY;
 
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 391d73a12ad72..e5cfec6bc8913 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -191,30 +191,6 @@ struct rt_bandwidth {
 
 void __dl_clear_params(struct task_struct *p);
 
-/*
- * To keep the bandwidth of -deadline tasks and groups under control
- * we need some place where:
- *  - store the maximum -deadline bandwidth of the system (the group);
- *  - cache the fraction of that bandwidth that is currently allocated.
- *
- * This is all done in the data structure below. It is similar to the
- * one used for RT-throttling (rt_bandwidth), with the main difference
- * that, since here we are only interested in admission control, we
- * do not decrease any runtime while the group "executes", neither we
- * need a timer to replenish it.
- *
- * With respect to SMP, the bandwidth is given on a per-CPU basis,
- * meaning that:
- *  - dl_bw (< 100%) is the bandwidth of the system (group) on each CPU;
- *  - dl_total_bw array contains, in the i-eth element, the currently
- *    allocated bandwidth on the i-eth CPU.
- *    Moreover, groups consume bandwidth on each CPU, while tasks only
- *    consume bandwidth on the CPU they're running on.
- *  Finally, dl_total_bw_cpu is used to cache the index of dl_total_bw
- *  that will be shown the next time the proc or cgroup controls will
- *  be red. It on its turn can be changed by writing on its own
- *  control.
- */
 struct dl_bandwidth {
 	raw_spinlock_t dl_runtime_lock;
 	u64 dl_runtime;
@@ -226,6 +202,24 @@ static inline int dl_bandwidth_enabled(void)
 	return sysctl_sched_rt_runtime >= 0;
 }
 
+/*
+ * To keep the bandwidth of -deadline tasks under control
+ * we need some place where:
+ *  - store the maximum -deadline bandwidth of each cpu;
+ *  - cache the fraction of bandwidth that is currently allocated in
+ *    each root domain;
+ *
+ * This is all done in the data structure below. It is similar to the
+ * one used for RT-throttling (rt_bandwidth), with the main difference
+ * that, since here we are only interested in admission control, we
+ * do not decrease any runtime while the group "executes", neither we
+ * need a timer to replenish it.
+ *
+ * With respect to SMP, bandwidth is given on a per root domain basis,
+ * meaning that:
+ *  - bw (< 100%) is the deadline bandwidth of each CPU;
+ *  - total_bw is the currently allocated bandwidth in each root domain;
+ */
 struct dl_bw {
 	raw_spinlock_t lock;
 	u64 bw, total_bw;
-- 
2.27.0