Date: Mon, 31 Jul 2017 16:42:28 +0000
From: Josef Bacik
To: Joel Fernandes
Cc: Josef Bacik, Mike Galbraith, Peter Zijlstra, LKML, Juri Lelli,
    Dietmar Eggemann, Patrick Bellasi, Brendan Jackman, Chris Redpath,
    Michael Wang, Matt Fleming
Subject: Re: wake_wide mechanism clarification
Message-ID: <20170731164227.GA7922@li70-116.members.linode.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 31, 2017 at 09:21:46AM -0700, Joel Fernandes wrote:
> Hi Josef,
>
> On Mon, Jul 31, 2017 at 5:21 AM, Josef Bacik wrote:
> > On Sat, Jul 29, 2017 at 03:41:56PM -0700, Joel Fernandes wrote:
> >> On Sat, Jul 29, 2017 at 3:28 PM, Joel Fernandes wrote:
> >>
> >> >>>> Again I didn't follow why the second condition couldn't just be:
> >> >>>> waker->nr_wakee_switch > factor, or, (waker->nr_wakee_switch +
> >> >>>> wakee->nr_wakee_switch) > factor, based on the above explanation from
> >> >>>> Micheal Wang that I quoted.
> >> >>>> and why he's instead doing the whole multiplication thing there that I
> >> >>>> was talking about earlier: "factor * wakee->nr_wakee_switch".
> >> >>>>
> >> >>>> Rephrasing my question in another way, why are we talking the ratio of
> >> >>>> master/slave instead of the sum when comparing if its > factor? I am
> >> >>>> surely missing something here.
> >> >>>
> >> >>> Because the heuristic tries to not demolish 1:1 buddies. Big partner
> >> >>> flip delta means the pair are unlikely to be a communicating pair,
> >> >>> perhaps at high frequency where misses hurt like hell.
> >> >>
> >> >> But it does seem to me to demolish the N:N communicating pairs from a
> >> >> latency/load balancing standpoint. For he case of N readers and N
> >> >> writers, the ratio (master/slave) comes down to 1:1 and we wake
> >> >> affine. Hopefully I didn't miss something too obvious about that.
> >> >
> >> > I think wake_affine() should correctly handle the case (of
> >> > overloading) I bring up here where wake_wide() is too conservative and
> >> > does affine a lot, (I don't have any data for this though, this just
> >> > from code reading), so I take this comment back for this reason.
> >>
> >> aargh, nope :( it still runs select_idle_sibling although on the
> >> previous CPU even if want_affine is 0 (and doesn't do the wider
> >> wakeup..), so the comment still applies.. its easy to get lost into
> >> the code with so many if statements :-\ sorry about the noise :)
> >>
> >
> > I've been working in this area recently because of a cpu imbalance problem.
> > Wake_wide() definitely makes it so we're waking affine way too often, but I
> > think messing with wake_waide to solve that problem is the wrong solution. This
> > is just a heuristic to see if we should wake affine, the simpler the better. I
> > solved the problem of waking affine too often like this
> >
> > https://marc.info/?l=linux-kernel&m=150003849602535&w=2
>
> Thanks! Cool!
>
> >
> > So why do you care about wake_wide() anyway? Are you observing some problem
> > that you suspect is affected by the affine wakeup stuff?
> > Or are you just trying
>
> I am dealing with an affine wake up issue, yes.
>
> > to understand what is going on for fun? Cause if you are just doing this for
> > fun you are a very strange person, thanks,
>
> Its not just for fun :) Let me give you some background about me, I
> work in the Android team and one of the things I want to do is to take
> an out of tree patch that's been carried for some time and post a more
> upstreamable solution - this is related to wake ups from the binder
> driver which does sync wake ups (WF_SYNC). I can't find the exact out
> of tree patch publicly since it wasn't posted to a list, but the code
> is here [1]. What's worse is I have recently found really bad issues
> with this patch itself where runnable times are increased. I should
> have provided this background earlier (sorry that I didn't, my plan
> was to trigger a separate discussion about the binder sync wake up
> thing as a part of a patch/proposal I want to post - which I plan to
> do so). Anyway, as a part of this effort, I want to understand
> wake_wide() better and "respect" it since it sits in the wake up path
> and I wanted to my proposal to work well with it, especially since I
> want to solve this problem in an upstream-friendly way.
>
> The other reason to trigger the discussion, is, I have seen
> wake_wide() enough number of times and asked enough number of folks
> how it works that it seems sensible to ask about it here (I was also
> suggested to ask about wake_wide on LKML because since few people
> seemingly understand how it works) and hopefully now its a bit better
> understood.
>
> I agree with you that instead of spending insane amounts of time on
> wake_wide itself, its better to directly work on a problem and collect
> some data - which is also what I'm doing, but I still thought its
> worth doing some digging into wake_wide() during some free time I had,
> thanks.
>

Ok so your usecase is to _always_ wake affine if we're doing a sync wakeup.
I _think_ for your case it's best to let wake_affine() make this decision, and
you don't want wake_wide() to filter out your wakeup as not-affine?  So perhaps
just throw a test in there to not wake wide if WF_SYNC is set.  This makes
logical sense to me, as synchronous wakeups are probably going to want to be
affine wakeups, and then we can rely on wake_affine() to do the load checks to
make sure it really makes sense.  How does that sound?  Thanks,

Josef
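
For illustration, the suggestion above could look roughly like the userspace
sketch below.  It is not the kernel code and not a tested patch: struct
task_model, wake_wide_model() and the llc_size parameter are hypothetical
stand-ins for task_struct, wake_wide() and the LLC-size factor, and the
wakee_flips field name and the WF_SYNC value of 0x01 are assumed to mirror
mainline.  The flip-count test is the master/slave ratio check discussed
earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's synchronous-wakeup flag bit. */
#define WF_SYNC 0x01

/* Hypothetical stand-in for the single task_struct field the heuristic reads. */
struct task_model {
	unsigned int wakee_flips;
};

/*
 * Userspace model of the wake_wide() decision with the proposed WF_SYNC test.
 * Returns true when the wakeup should be treated as "wide" (not affine).
 * llc_size models the "factor" from the thread and must be non-zero.
 */
static bool wake_wide_model(const struct task_model *waker,
			    const struct task_model *wakee,
			    unsigned int llc_size, int wake_flags)
{
	unsigned int master = waker->wakee_flips;
	unsigned int slave = wakee->wakee_flips;

	assert(llc_size > 0);

	/* Proposed test: never filter out a sync wakeup as not-affine here. */
	if (wake_flags & WF_SYNC)
		return false;

	/* master is the larger flip count, slave the smaller. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/*
	 * Ratio heuristic: stay affine unless both tasks flip partners often
	 * and one of them flips at least llc_size times more than the other.
	 */
	if (slave < llc_size || master < slave * llc_size)
		return false;

	return true;
}
```

With llc_size = 4, a waker at 100 flips and a wakee at 10 flips is treated as
wide without WF_SYNC and as affine with it; the final call on whether the
affine wakeup makes sense would still be left to wake_affine().  Passing
wake_flags into the helper is only one option; checking the flag at the call
site before consulting wake_wide() would have the same effect.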