From: Jan H. Schönherr
To: Ingo Molnar, Peter Zijlstra
Cc: Jan H. Schönherr, linux-kernel@vger.kernel.org
Subject: [RFC 50/60] cosched: Propagate load changes across hierarchy levels
Date: Fri, 7 Sep 2018 23:40:37 +0200
Message-Id: <20180907214047.26914-51-jschoenh@amazon.de>
In-Reply-To: <20180907214047.26914-1-jschoenh@amazon.de>
References: <20180907214047.26914-1-jschoenh@amazon.de>

The weight of an SD-SE is defined as the average weight of all runqueues represented by that SD-SE. Hence, its weight should change whenever one of the child runqueues changes its weight. However, as these are two different hierarchy levels, they are protected by different locks. To reduce lock contention, we want to avoid holding higher-level locks for prolonged periods where possible. Therefore, we update an aggregated weight -- sdrq->sdse_load -- in a lock-free manner during enqueue and dequeue at the lower level, and once we actually take the higher-level lock, we perform the actual SD-SE weight adjustment via update_sdse_load().
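The split described above -- lock-free accumulation at the lower level, deferred weight adjustment once the higher-level lock is taken -- can be sketched in plain C11. This is a simplified userspace model, not the kernel code; the struct and function names (`sd_parent_sketch`, `child_enqueue`, `sdse_weight`) are illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Simplified model of sdrq->sdse_load: child runqueues add/subtract
 * their weight lock-free; the parent later folds the aggregate into
 * the SD-SE weight while holding its own lock.
 */
struct sd_parent_sketch {
	_Atomic int64_t sdse_load;   /* aggregated child weight */
	unsigned long span_weight;   /* #CPUs covered by this level */
};

/* Called on enqueue in a child runqueue, without the parent lock. */
static void child_enqueue(struct sd_parent_sketch *p, int64_t weight)
{
	atomic_fetch_add_explicit(&p->sdse_load, weight,
				  memory_order_relaxed);
}

/* Called on dequeue in a child runqueue, without the parent lock. */
static void child_dequeue(struct sd_parent_sketch *p, int64_t weight)
{
	atomic_fetch_sub_explicit(&p->sdse_load, weight,
				  memory_order_relaxed);
}

/*
 * Called with the parent lock held: average the aggregate over the
 * runqueues represented by the SD-SE, assuming a homogeneous
 * topology (as the patch's own FIXME notes).
 */
static int64_t sdse_weight(struct sd_parent_sketch *p,
			   unsigned long child_span_weight)
{
	int64_t load = atomic_load_explicit(&p->sdse_load,
					    memory_order_relaxed);

	return load * child_span_weight / p->span_weight;
}
```

With two children of span 2 under a parent of span 4, enqueueing weights 1024 and 2048 yields an SD-SE weight of (1024 + 2048) * 2 / 4 = 1536, i.e. the average child runqueue weight.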
At some point in the future (the code isn't there yet), this will allow software combining, where not all CPUs have to walk up the full hierarchy on enqueue/dequeue.

Signed-off-by: Jan H. Schönherr
---
 kernel/sched/fair.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0dc4d289497c..1eee262ecf88 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2740,6 +2740,10 @@ static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p)
 static void
 account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+#ifdef CONFIG_COSCHEDULING
+	if (!cfs_rq->sdrq.is_root && !cfs_rq->throttled)
+		atomic64_add(se->load.weight, &cfs_rq->sdrq.sd_parent->sdse_load);
+#endif
 	update_load_add(&cfs_rq->load, se->load.weight);
 	if (!parent_entity(se) || is_sd_se(parent_entity(se)))
 		update_load_add(&hrq_of(cfs_rq)->load, se->load.weight);
@@ -2757,6 +2761,10 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static void
 account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+#ifdef CONFIG_COSCHEDULING
+	if (!cfs_rq->sdrq.is_root && !cfs_rq->throttled)
+		atomic64_sub(se->load.weight, &cfs_rq->sdrq.sd_parent->sdse_load);
+#endif
 	update_load_sub(&cfs_rq->load, se->load.weight);
 	if (!parent_entity(se) || is_sd_se(parent_entity(se)))
 		update_load_sub(&hrq_of(cfs_rq)->load, se->load.weight);
@@ -3083,6 +3091,35 @@ static inline void update_cfs_group(struct sched_entity *se)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
+#ifdef CONFIG_COSCHEDULING
+static void update_sdse_load(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	struct sdrq *sdrq = &cfs_rq->sdrq;
+	unsigned long load;
+
+	if (!is_sd_se(se))
+		return;
+
+	/* FIXME: the load calculation assumes a homogeneous topology */
+	load = atomic64_read(&sdrq->sdse_load);
+
+	if (!list_empty(&sdrq->children)) {
+		struct sdrq *entry;
+
+		entry = list_first_entry(&sdrq->children, struct sdrq,
+					 siblings);
+		load *= entry->data->span_weight;
+	}
+
+	load /= sdrq->data->span_weight;
+
+	/* FIXME: Use a proper runnable */
+	reweight_entity(cfs_rq, se, load, load);
+}
+#else /* !CONFIG_COSCHEDULING */
+static void update_sdse_load(struct sched_entity *se) { }
+#endif /* !CONFIG_COSCHEDULING */
+
 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
 {
 	struct rq *rq = hrq_of(cfs_rq);
@@ -4527,6 +4564,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 
 	se = cfs_rq->my_se;
 
+#ifdef CONFIG_COSCHEDULING
+	if (!cfs_rq->sdrq.is_root && !cfs_rq->throttled)
+		atomic64_sub(cfs_rq->load.weight,
+			     &cfs_rq->sdrq.sd_parent->sdse_load);
+#endif
 	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
@@ -4538,6 +4580,8 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
+
 		/* throttled entity or throttle-on-deactivate */
 		if (!se->on_rq)
 			break;
@@ -4590,6 +4634,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	se = cfs_rq->my_se;
 
 	cfs_rq->throttled = 0;
+#ifdef CONFIG_COSCHEDULING
+	if (!cfs_rq->sdrq.is_root && !cfs_rq->throttled)
+		atomic64_add(cfs_rq->load.weight,
+			     &cfs_rq->sdrq.sd_parent->sdse_load);
+#endif
 
 	update_rq_clock(rq);
@@ -4608,6 +4657,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	rq_chain_init(&rc, rq);
 	for_each_sched_entity(se) {
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 
 		if (se->on_rq)
 			enqueue = 0;
@@ -5152,6 +5202,7 @@ bool enqueue_entity_fair(struct rq *rq, struct sched_entity *se, int flags,
 	rq_chain_init(&rc, rq);
 	for_each_sched_entity(se) {
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 		if (se->on_rq)
 			break;
 		cfs_rq = cfs_rq_of(se);
@@ -5173,6 +5224,7 @@ bool enqueue_entity_fair(struct rq *rq, struct sched_entity *se, int flags,
 	for_each_sched_entity(se) {
 		/* FIXME: taking locks up to the top is bad */
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running += task_delta;
@@ -5235,6 +5287,7 @@ bool dequeue_entity_fair(struct rq *rq, struct sched_entity *se, int flags,
 	rq_chain_init(&rc, rq);
 	for_each_sched_entity(se) {
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 		cfs_rq = cfs_rq_of(se);
 		dequeue_entity(cfs_rq, se, flags);
@@ -5269,6 +5322,7 @@ bool dequeue_entity_fair(struct rq *rq, struct sched_entity *se, int flags,
 	for_each_sched_entity(se) {
 		/* FIXME: taking locks up to the top is bad */
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running -= task_delta;
@@ -9897,6 +9951,7 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
 
 	for_each_sched_entity(se) {
 		rq_chain_lock(&rc, se);
+		update_sdse_load(se);
 		cfs_rq = cfs_rq_of(se);
 		if (cfs_rq_throttled(cfs_rq))
-- 
2.9.3.1.gcba166c.dirty
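As a footnote on the arithmetic in update_sdse_load(): when the SD-SE's span is evenly divided among homogeneous children (parent span = n_children * child span), multiplying the aggregate by one child's span weight and dividing by the parent's span weight is exactly the aggregate divided by the number of children, i.e. the average child runqueue weight. A minimal sketch of that calculation (function name illustrative, not from the patch):

```c
#include <assert.h>

/*
 * Model of the load calculation in update_sdse_load():
 *	load = sdse_load * child_span_weight / parent_span_weight
 * On a homogeneous topology with parent_span = n_children * child_span,
 * this is the average child runqueue weight.
 */
static unsigned long avg_sdse_weight(unsigned long sdse_load,
				     unsigned long child_span_weight,
				     unsigned long parent_span_weight)
{
	return sdse_load * child_span_weight / parent_span_weight;
}
```

For example, an aggregate of 3072 over two span-2 children under a span-4 parent gives 3072 * 2 / 4 = 1536, matching 3072 / 2 children. This equivalence breaks on heterogeneous topologies, which is what the patch's FIXME flags.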