Subject: Re: [PATCH] Revert "sched/fair: Fix O(nr_cgroups) in the load balancing path"
From: Srinivas Pandruvada
To: Doug Smythies, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    Doug Smythies, Peter Zijlstra, Ingo Molnar, Linus Torvalds,
    Thomas Gleixner, sargun@sargun.me, tj@kernel.org,
    xiexiuqi@huawei.com, xiezhipeng1@huawei.com
Date: Tue, 29 Oct 2019 12:59:49 -0700
In-Reply-To: <10eef14e434375ef4bb7cf23ecb987b3591064a6.camel@linux.intel.com>
References: <1572018904-5234-1-git-send-email-dsmythies@telus.net>
    <10eef14e434375ef4bb7cf23ecb987b3591064a6.camel@linux.intel.com>

On Tue, 2019-10-29 at 12:34 -0700, Srinivas Pandruvada wrote:
> On Fri, 2019-10-25 at 08:55 -0700, Doug Smythies wrote:
>
> > [...]
> >
> > Experiment method:
> >
> > enable only idle state 1
> > do until stopped:
> >     apply a 100% load (all CPUs)
> >     after a while (about 50 seconds) remove the load
> >     allow a short transient delay (1 second)
> >     measure the processor package joules used over the next 149
> >     seconds
> >
> > Kernel 5.4-rc2 + reversion (this method):
> > Average processor package power: 9.148 watts (128 samples, > 7 hours)
> > Minimum: 9.02 watts
> > Maximum: 9.29 watts
> > Note: an outlier data point group was removed, as it was assumed the
> > computer had something to do and wasn't actually "idle".
> >
> > Kernel 5.4-rc2:
> > Average processor package power: 9.969 watts (150 samples, > 8 hours),
> > i.e. 9% more energy for the idle phases of the workload.
> > Minimum: 9.15 watts
> > Maximum: 13.79 watts (51% more power)
>
> Hi Doug,
>
> Do you have intel_pstate_tracer output? I suspect that, without the
> revert, the package was still at a higher P-state when the joules
> measurement started.
> Another way to check is to pin the max and min scaling frequencies
> to the same value; then we shouldn't see a power difference.

I mean, we shouldn't see a significant power difference. Also, to get
real numbers, we need a power meter measuring CPU power. If I can get
your script, I may be able to measure that.

Thanks,
Srinivas

>
> Thanks,
> Srinivas
>
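For context, a rough sketch of the scaling-frequency pinning suggested
above, written against the standard cpufreq sysfs interface. The paths,
the kHz unit of argv[1], and the write ordering are assumptions about
the test machine, not part of the patch under discussion; it needs
root, and the kernel may clamp or reject values outside the policy
limits.

/*
 * Rough sketch: pin every CPU's cpufreq policy to a single frequency
 * through sysfs so P-state selection cannot differ between the two
 * kernels under test.  Assumes /sys/devices/system/cpu/cpuN/cpufreq/
 * exists and argv[1] is a frequency in kHz the driver accepts.
 * Writes max before min, which works when lowering the limits;
 * raising them above the current max may need the reverse order.
 */
#include <glob.h>
#include <stdio.h>

static void write_khz(const char *path, const char *khz)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return;
        }
        fprintf(f, "%s\n", khz);
        fclose(f);
}

int main(int argc, char **argv)
{
        char path[256];
        glob_t g;
        size_t i;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <freq_khz>\n", argv[0]);
                return 1;
        }
        if (glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq", 0, NULL, &g))
                return 1;
        for (i = 0; i < g.gl_pathc; i++) {
                snprintf(path, sizeof(path), "%s/scaling_max_freq", g.gl_pathv[i]);
                write_khz(path, argv[1]);
                snprintf(path, sizeof(path), "%s/scaling_min_freq", g.gl_pathv[i]);
                write_khz(path, argv[1]);
        }
        globfree(&g);
        return 0;
}

With both limits equal, any remaining power delta between the two
kernels should come from idle-state residency rather than P-state
selection, which is the point of the check.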
> > ---
> >  kernel/sched/fair.c | 43 +++++++++----------------------------------
> >  1 file changed, 9 insertions(+), 34 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 83ab35e..51625b8 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -381,10 +381,9 @@ static inline void assert_list_leaf_cfs_rq(struct rq *rq)
> >          SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
> >  }
> >
> > -/* Iterate thr' all leaf cfs_rq's on a runqueue */
> > -#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)                     \
> > -        list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,   \
> > -                                 leaf_cfs_rq_list)
> > +/* Iterate through all cfs_rq's on a runqueue in bottom-up order */
> > +#define for_each_leaf_cfs_rq(rq, cfs_rq) \
> > +        list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
> >
> >  /* Do the two (enqueued) entities belong to the same group ? */
> >  static inline struct cfs_rq *
> > @@ -481,8 +480,8 @@ static inline void assert_list_leaf_cfs_rq(struct rq *rq)
> >  {
> >  }
> >
> > -#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)      \
> > -        for (cfs_rq = &rq->cfs, pos = NULL; cfs_rq; cfs_rq = pos)
> > +#define for_each_leaf_cfs_rq(rq, cfs_rq) \
> > +        for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
> >
> >  static inline struct sched_entity *parent_entity(struct sched_entity *se)
> >  {
> > @@ -7502,27 +7501,10 @@ static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {
> >
> >  #ifdef CONFIG_FAIR_GROUP_SCHED
> >
> > -static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
> > -{
> > -        if (cfs_rq->load.weight)
> > -                return false;
> > -
> > -        if (cfs_rq->avg.load_sum)
> > -                return false;
> > -
> > -        if (cfs_rq->avg.util_sum)
> > -                return false;
> > -
> > -        if (cfs_rq->avg.runnable_load_sum)
> > -                return false;
> > -
> > -        return true;
> > -}
> > -
> >  static void update_blocked_averages(int cpu)
> >  {
> >          struct rq *rq = cpu_rq(cpu);
> > -        struct cfs_rq *cfs_rq, *pos;
> > +        struct cfs_rq *cfs_rq;
> >          const struct sched_class *curr_class;
> >          struct rq_flags rf;
> >          bool done = true;
> > @@ -7534,7 +7516,7 @@ static void update_blocked_averages(int cpu)
> >           * Iterates the task_group tree in a bottom up fashion, see
> >           * list_add_leaf_cfs_rq() for details.
> >           */
> > -        for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) {
> > +        for_each_leaf_cfs_rq(rq, cfs_rq) {
> >                  struct sched_entity *se;
> >
> >                  if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq))
> > @@ -7545,13 +7527,6 @@ static void update_blocked_averages(int cpu)
> >                  if (se && !skip_blocked_update(se))
> >                          update_load_avg(cfs_rq_of(se), se, 0);
> >
> > -                /*
> > -                 * There can be a lot of idle CPU cgroups. Don't let fully
> > -                 * decayed cfs_rqs linger on the list.
> > -                 */
> > -                if (cfs_rq_is_decayed(cfs_rq))
> > -                        list_del_leaf_cfs_rq(cfs_rq);
> > -
> >                  /* Don't need periodic decay once load/util_avg are null */
> >                  if (cfs_rq_has_blocked(cfs_rq))
> >                          done = false;
> > @@ -10444,10 +10419,10 @@ const struct sched_class fair_sched_class = {
> >  #ifdef CONFIG_SCHED_DEBUG
> >  void print_cfs_stats(struct seq_file *m, int cpu)
> >  {
> > -        struct cfs_rq *cfs_rq, *pos;
> > +        struct cfs_rq *cfs_rq;
> >
> >          rcu_read_lock();
> > -        for_each_leaf_cfs_rq_safe(cpu_rq(cpu), cfs_rq, pos)
> > +        for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
> >                  print_cfs_rq(m, cpu, cfs_rq);
> >          rcu_read_unlock();
> >  }
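Separately, Doug's joules measurement above can be approximated without
an external power meter by sampling the RAPL package-energy counter
through powercap sysfs. A rough sketch, assuming the package domain is
exposed at /sys/class/powercap/intel-rapl:0/energy_uj (domain naming
varies by system) and ignoring counter wraparound; the 149-second
window mirrors the method described above.

/*
 * Rough sketch: estimate average package power over a fixed window by
 * sampling the RAPL energy counter (microjoules) before and after a
 * sleep.  Counter wraparound at max_energy_range_uj is not handled.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long long read_energy_uj(void)
{
        FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
        long long uj;

        if (!f || fscanf(f, "%lld", &uj) != 1) {
                perror("energy_uj");
                exit(1);
        }
        fclose(f);
        return uj;
}

int main(void)
{
        const int window = 149;  /* seconds, as in the experiment above */
        long long start = read_energy_uj();

        sleep(window);
        printf("average package power: %.3f watts\n",
               (read_energy_uj() - start) / 1e6 / window);
        return 0;
}

Note that RAPL energy is itself a model-based estimate, so, as
Srinivas says, a physical power meter remains the reference for
absolute numbers.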