From: Phil Auld
To: linux-kernel@vger.kernel.org
Cc: Juri Lelli, Ingo Molnar, Daniel Bristot de Oliveira, Peter Zijlstra,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Ben Segall,
    Steven Rostedt, Mel Gorman, Frederic Weisbecker, Tejun Heo, Phil Auld
Subject: [PATCH v2 1/2] sched, cgroup: Restore meaning to hierarchical_quota
Date: Wed, 12 Jul 2023 09:33:56 -0400
Message-Id: <20230712133357.381137-2-pauld@redhat.com>
In-Reply-To: <20230712133357.381137-1-pauld@redhat.com>
References: <20230712133357.381137-1-pauld@redhat.com>
MIME-Version: 1.0
Content-type: text/plain
Content-Transfer-Encoding: 8bit

In cgroupv2, cfs_b->hierarchical_quota is set to -1 for all task groups
because the previous fix simply took the min. It should instead reflect
a limit imposed at that level or by an ancestor. Even though cgroupv2
does not require a child's quota to be less than or equal to that of
its ancestors, the task group will still be constrained by such a
quota, so this should be shown here. Cgroupv1 continues to set this
correctly.

In both cases, add initialization when a new task group is created,
based on the current parent's value (or RUNTIME_INF in the case of
root_task_group). Otherwise, the field is wrong until a quota is
changed after creation and __cfs_schedulable() is called.

Fixes: c53593e5cb69 ("sched, cgroup: Don't reject lower cpu.max on ancestors")
Signed-off-by: Phil Auld
Reviewed-by: Ben Segall
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Vincent Guittot
Cc: Juri Lelli
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Ben Segall
Cc: Frederic Weisbecker
Cc: Tejun Heo
---
v2: Improve the comment about how setting hierarchical_quota correctly
    helps the scheduler. Remove extra parens.

 kernel/sched/core.c  | 13 +++++++++----
 kernel/sched/fair.c  |  7 ++++---
 kernel/sched/sched.h |  2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)
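
[Editor's illustration, not part of the patch; placed here in the notes
area, which git-am ignores. The sketch below is a stand-alone
user-space rendering of the quota-combining logic changed in the
tg_cfs_schedulable_down() hunk below. It assumes only that RUNTIME_INF
compares as -1 once held in the s64 quota variables that function uses,
which is why a bare min() lets an unlimited ancestor (-1) clobber a
real child quota. combine_old()/combine_new() are invented names.]

#include <stdio.h>

#define RUNTIME_INF (-1LL)  /* kernel: ((u64)~0ULL), i.e. -1 as an s64 */
#define min(a, b) ((a) < (b) ? (a) : (b))

/* Old cgroup2 behavior: plain min(); an unlimited ancestor (-1)
 * always wins, so every group ends up reporting RUNTIME_INF. */
static long long combine_old(long long quota, long long parent_quota)
{
	return min(quota, parent_quota);
}

/* New behavior: take min() only between two real limits. */
static long long combine_new(long long quota, long long parent_quota)
{
	if (quota == RUNTIME_INF)
		return parent_quota;
	if (parent_quota != RUNTIME_INF)
		return min(quota, parent_quota);
	return quota;
}

int main(void)
{
	long long child = 100000;	/* child has a real cpu.max quota */

	printf("old: %lld\n", combine_old(child, RUNTIME_INF));	/* -1 */
	printf("new: %lld\n", combine_new(child, RUNTIME_INF));	/* 100000 */
	return 0;
}

[With the fix, hierarchical_quota reports the nearest enforced quota,
and RUNTIME_INF only when the entire chain is unlimited.]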
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a68d1276bab0..f80697a79baf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9904,7 +9904,7 @@ void __init sched_init(void)
 		ptr += nr_cpu_ids * sizeof(void **);
 
 		root_task_group.shares = ROOT_TASK_GROUP_LOAD;
-		init_cfs_bandwidth(&root_task_group.cfs_bandwidth);
+		init_cfs_bandwidth(&root_task_group.cfs_bandwidth, NULL);
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 #ifdef CONFIG_RT_GROUP_SCHED
 		root_task_group.rt_se = (struct sched_rt_entity **)ptr;
@@ -11038,11 +11038,16 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 
 		/*
 		 * Ensure max(child_quota) <= parent_quota.  On cgroup2,
-		 * always take the min.  On cgroup1, only inherit when no
-		 * limit is set:
+		 * always take the non-RUNTIME_INF min.  On cgroup1, only
+		 * inherit when no limit is set. In cgroup2 this is used
+		 * by the scheduler to determine if a given CFS task has a
+		 * bandwidth constraint at some higher level.
 		 */
 		if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) {
-			quota = min(quota, parent_quota);
+			if (quota == RUNTIME_INF)
+				quota = parent_quota;
+			else if (parent_quota != RUNTIME_INF)
+				quota = min(quota, parent_quota);
 		} else {
 			if (quota == RUNTIME_INF)
 				quota = parent_quota;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..d9b3d4617e16 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6005,13 +6005,14 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
 }
 
-void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
+void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent)
 {
 	raw_spin_lock_init(&cfs_b->lock);
 	cfs_b->runtime = 0;
 	cfs_b->quota = RUNTIME_INF;
 	cfs_b->period = ns_to_ktime(default_cfs_period());
 	cfs_b->burst = 0;
+	cfs_b->hierarchical_quota = parent ? parent->hierarchical_quota : RUNTIME_INF;
 
 	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
 	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
@@ -6168,7 +6169,7 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	return 0;
 }
 
-void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
+void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent) {}
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
@@ -12373,7 +12374,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 
 	tg->shares = NICE_0_LOAD;
 
-	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
+	init_cfs_bandwidth(tg_cfs_bandwidth(tg), tg_cfs_bandwidth(parent));
 
 	for_each_possible_cpu(i) {
 		cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..63822c9238cc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -460,7 +460,7 @@ extern void unregister_fair_sched_group(struct task_group *tg);
 extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 			struct sched_entity *se, int cpu,
 			struct sched_entity *parent);
-extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
+extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent);
 extern void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b);
 
 extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
-- 
2.31.1
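
[Editor's illustration, not part of the patch: a stand-alone sketch of
the creation-time seeding added in init_cfs_bandwidth() above. The
names cfs_bandwidth_sketch and init_cfs_bandwidth_sketch are invented
for this example; the real code operates on struct cfs_bandwidth in
kernel/sched/fair.c.]

#include <stdio.h>

#define RUNTIME_INF (-1LL)	/* stands in for the kernel's ((u64)~0ULL) */

struct cfs_bandwidth_sketch {
	long long hierarchical_quota;
};

/* Mirrors the patch: a new group inherits its parent's value, and the
 * root group (no parent) starts at RUNTIME_INF, so the field is correct
 * before any quota write triggers __cfs_schedulable(). */
static void init_cfs_bandwidth_sketch(struct cfs_bandwidth_sketch *cfs_b,
				      struct cfs_bandwidth_sketch *parent)
{
	cfs_b->hierarchical_quota = parent ? parent->hierarchical_quota
					   : RUNTIME_INF;
}

int main(void)
{
	struct cfs_bandwidth_sketch root, limited, child;

	init_cfs_bandwidth_sketch(&root, NULL);       /* RUNTIME_INF */
	init_cfs_bandwidth_sketch(&limited, &root);
	limited.hierarchical_quota = 100000;          /* as if cpu.max were set */
	init_cfs_bandwidth_sketch(&child, &limited);  /* inherits 100000 */

	printf("child: %lld\n", child.hierarchical_quota);
	return 0;
}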