Date: Thu, 28 Sep 2017 14:37:58 +0200
From: Peter Zijlstra
To: Rik van Riel
Cc: Eric Farman, ?????????, LKML, Ingo Molnar, Christian Borntraeger,
	"KVM-ML (kvm@vger.kernel.org)", vcaputo@pengaru.com, Matthew Rosato
Subject: Re: sysbench throughput degradation in 4.13+
Message-ID: <20170928123758.robe5ggsjf4voj7h@hirez.programming.kicks-ass.net>
References: <95edafb1-5e9d-8461-db73-bcb002b7ebef@linux.vnet.ibm.com>
	<50a279d3-84eb-3403-f2f0-854934778037@linux.vnet.ibm.com>
	<20170922155348.zujigkn3o5eylctn@hirez.programming.kicks-ass.net>
	<754f5a9f-5332-148d-2631-918fc7a7cfe9@linux.vnet.ibm.com>
	<20170927093530.s3sgdz2vamc5ka4w@hirez.programming.kicks-ass.net>
	<20170927135820.61cd077f@cuia.usersys.redhat.com>
In-Reply-To: <20170927135820.61cd077f@cuia.usersys.redhat.com>

On Wed, Sep 27, 2017 at 01:58:20PM -0400, Rik van Riel wrote:
> @@ -5359,10 +5378,14 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  		unsigned long current_load = task_h_load(current);
>  
>  		/* in this case load hits 0 and this LLC is considered 'idle' */
> -		if (current_load > this_stats.load)
> +		if (current_load > this_stats.max_load)
> +			return true;
> +
> +		/* allow if the CPU would go idle, regardless of LLC load */
> +		if (current_load >= target_load(this_cpu, sd->wake_idx))
>  			return true;
>  
> -		this_stats.load -= current_load;
> +		this_stats.max_load -= current_load;
>  	}
>  
>  	/*
> @@ -5375,10 +5398,6 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  	if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
>  		return false;
>  
> -	/* if this cache has capacity, come here */
> -	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> -		return true;
> -
>  	/*
>  	 * Check to see if we can move the load without causing too much
>  	 * imbalance.
> @@ -5391,8 +5410,8 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
>  	prev_eff_load *= this_stats.capacity;
>  
> -	this_eff_load *= this_stats.load + task_load;
> -	prev_eff_load *= prev_stats.load - task_load;
> +	this_eff_load *= this_stats.max_load + task_load;
> +	prev_eff_load *= prev_stats.min_load - task_load;
>  
>  	return this_eff_load <= prev_eff_load;
>  }

So I would really like a workload that needs this LLC/NUMA stuff.
Because I much prefer the simpler: 'on which of these two CPUs can I run
soonest' approach.
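
For anyone following along, the imbalance check in the last hunk of the
quoted patch can be modelled in user space roughly as below. This is only
a sketch, not the kernel code: struct llc_stats and move_is_balanced() are
hypothetical names, signed 64-bit values are used so the subtraction cannot
wrap, and the starting value of this_eff_load (100, scaled by the previous
LLC's capacity) is an assumption, since the quoted hunk does not show it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-LLC statistics used in the patch. */
struct llc_stats {
	long long min_load;	/* lowest CPU load seen in this LLC */
	long long max_load;	/* highest CPU load seen in this LLC */
	long long capacity;	/* compute capacity of this LLC */
};

/*
 * Return true if moving a task of weight task_load from the previous LLC
 * to the waking LLC keeps the two within the allowed imbalance.
 */
bool move_is_balanced(const struct llc_stats *this_stats,
		      const struct llc_stats *prev_stats,
		      long long task_load, int imbalance_pct)
{
	/* Assumption: the waking side starts at 100 (not shown in the hunk). */
	long long this_eff_load = 100;
	/* The previous side gets half of the imbalance percentage as slack. */
	long long prev_eff_load = 100 + (imbalance_pct - 100) / 2;

	/* Cross-multiply by the other side's capacity to normalise. */
	this_eff_load *= prev_stats->capacity;
	prev_eff_load *= this_stats->capacity;

	/* Pessimistic view: busiest CPU here, least busy CPU there. */
	this_eff_load *= this_stats->max_load + task_load;
	prev_eff_load *= prev_stats->min_load - task_load;

	return this_eff_load <= prev_eff_load;
}
```

With equal capacities of 1024, imbalance_pct of 117 and a task_load of 512,
the move is allowed when this LLC has max_load 1024 and the previous one has
min_load 2048, and refused when those two loads are swapped.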