From: Joel Fernandes
Date: Thu, 29 Jun 2017 20:04:59 -0700
Subject: Re: wake_wide mechanism clarification
To: Josef Bacik
Cc: Mike Galbraith, Peter Zijlstra, LKML, Juri Lelli, Dietmar Eggemann,
    Patrick Bellasi, Brendan Jackman, Chris Redpath

Hi Josef,

Thanks a lot for your reply :-)

On Thu, Jun 29, 2017 at 5:49 PM, Josef Bacik wrote:
> Because we are trying to detect the case that the master is waking many
> different processes, and the 'slave' processes are only waking up the
> master/some other specific processes to determine if we don't care about
> cache locality.
>
>> The code here is written as:
>>
>>     if (slave < factor || master < slave * factor)
>>         return 0;
>>
>> However I think we should just do (with my current and probably wrong
>> understanding):
>>
>>     if (slave < factor || master < factor)
>>         return 0;
>>
> Actually I think both are wrong, but I need Mike to weigh in.  In my
> example above we'd return 0, because the 'producer' will definitely have
> a wakee_flip of ridiculous values, but the 'consumer' could essentially
> have a wakee_flip of 1, just the master to tell it that it's done.  I
> _suppose_ in practice you have a lock or something so the wakee_flip
> isn't going to be strictly 1, but some significantly lower value than
> master.  I'm skeptical of the slave < factor test, I think it's too high
> of a bar in the case where cache locality doesn't really matter, but the
> master < slave * factor makes sense, as slave is going to be orders of
> magnitude lower than master.

It makes sense that we multiply the slave's flips by a factor because
they are low, but I still didn't get why the factor is chosen to be
llc_size rather than something else for the multiplication with slave
(slave * factor). I know the slave's flips are probably low and need to
be scaled up, but why multiply them by llc_size rather than some other
number? In other words, why are we tying the size of a NUMA node or a
cluster to the number of wake-ups that the slave made? Is this because
of an assumption that a master is expected to evenly distribute work
amongst the CPUs within its node, or something like that?

Moreover, this case only covers slave wakeups that are far lower than
the master's. What about the case where the slave's wakeups are greater
than the factor but approximately the same as the master's? Then it
sounds like (master < slave * factor) can be true, in which case
wake_wide() will return 0. That seems like a bad thing to do, I think -
pulling a busy slave onto a busy master.
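To make the case I'm worried about concrete, here is a quick user-space
sketch of the check, modeled on how I read wake_wide() in
kernel/sched/fair.c. The helper name, the factor value (8, standing in
for sd_llc_size) and the flip counts are all made up, purely for
illustration:

    #include <stdio.h>

    /* Sketch of the wake_wide() flip-count test, user-space only. */
    static int wake_wide_sketch(unsigned int master, unsigned int slave,
                                unsigned int factor)
    {
            if (master < slave) {
                    unsigned int tmp = master;
                    master = slave;
                    slave = tmp;
            }
            if (slave < factor || master < slave * factor)
                    return 0;
            return 1;
    }

    int main(void)
    {
            unsigned int factor = 8;  /* hypothetical LLC spanning 8 CPUs */

            /* producer/consumer: the consumer only ever flips back to the master */
            printf("master=1000 slave=2  -> wake_wide=%d\n",
                   wake_wide_sketch(1000, 2, factor));

            /* the case I'm worried about: both busy, flip counts comparable */
            printf("master=100  slave=90 -> wake_wide=%d\n",
                   wake_wide_sketch(100, 90, factor));

            /* master fans out to many wakees, slave flips modestly */
            printf("master=1000 slave=50 -> wake_wide=%d\n",
                   wake_wide_sketch(1000, 50, factor));
            return 0;
    }

The middle case is the one that bothers me: both flip counts are
comparable and well above the factor, yet the test still returns 0 and
we end up doing an affine wakeup.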
>> Basically, I didn't follow why we multiply the slave's flips with
>> llc_size. That makes it sound like the master has to have way more
>> flips than the slave to return 0 from wake_wide. Could you maybe give
>> an example to clarify? Thanks a lot for your help,
>
> It may be worth to try with schedbench and trace it to see how this
> turns out in practice, as that's the workload that generated all this
> discussion before.  I imagine generally speaking this works out
> properly.

I see. I will try to find some time to play with this tool. Thanks for
the pointer.

> The small regression I reported before was at low RPS, so we wouldn't
> be waking up as many tasks as often, so we would be returning 0 from
> wake_wide() and we'd get screwed. This is where I think possibly
> dropping the slave < factor part of the test would address that, but
> I'd have to trace it to say for sure.

Ah, I see your problem. I guess it's a conflicting case/requirement?
Somehow, to handle your case, it has to be embedded into this condition
that cache locality doesn't matter, since that seems to be the
assumption behind slave < factor, as you pointed out.

Thanks,

Joel
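P.S. Just to spell out the variant you're suggesting (keeping only the
master < slave * factor half of the test), here is the same kind of
sketch. The numbers are made up and only meant to mimic your low-RPS
case, where the slave flips rarely:

    #include <stdio.h>

    /* Same sketch, but with the "slave < factor" half of the test dropped. */
    static int wake_wide_no_slave_check(unsigned int master, unsigned int slave,
                                        unsigned int factor)
    {
            if (master < slave) {
                    unsigned int tmp = master;
                    master = slave;
                    slave = tmp;
            }
            return (master < slave * factor) ? 0 : 1;
    }

    int main(void)
    {
            unsigned int factor = 8;  /* hypothetical LLC size */

            /* made-up low-RPS numbers: the slave flips only a handful of times */
            printf("master=200 slave=5 -> wake_wide=%d\n",
                   wake_wide_no_slave_check(200, 5, factor));
            return 0;
    }

With slave < factor dropped, this example goes wide, whereas the current
code would return 0 (because slave < factor) and wake affine.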