Subject: Re: [PATCH RFC] select_idle_sibling experiments
From: Mike Galbraith
To: Chris Mason, Peter Zijlstra, Ingo Molnar, Matt Fleming, linux-kernel@vger.kernel.org
Date: Wed, 06 Apr 2016 09:27:24 +0200
In-Reply-To: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>

On Tue, 2016-04-05 at 14:08 -0400, Chris Mason wrote:

> Now, on to the patch.  I pushed some code around and narrowed the
> problem down to select_idle_sibling().  We have cores going into and
> out of idle fast enough that even this cut our latencies in half:

Are you using NO_HZ?  If so, you may want to try the attached.

>  static int select_idle_sibling(struct task_struct *p, int target)
>  				goto next;
> 
>  			for_each_cpu(i, sched_group_cpus(sg)) {
> -				if (i == target || !idle_cpu(i))
> +				if (!idle_cpu(i))
>  					goto next;
>  			}
> 
> IOW, by the time we get down to for_each_cpu(), the idle_cpu() check
> done at the top of the function is no longer valid.

Ok, that's only an optimization, could go if it's causing trouble.

> I tried a few variations on select_idle_sibling() that preserved the
> underlying goal of returning idle cores before idle SMT threads.  They
> were all horrible in different ways, and none of them were fast.
> The patch below just makes select_idle_sibling pick the first idle
> thread it can find.  When I ran it through production workloads here,
> it was faster than the patch we've been carrying around for the last
> few years.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56b7d4b..c41baa6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4974,7 +4974,6 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  static int select_idle_sibling(struct task_struct *p, int target)
>  {
>  	struct sched_domain *sd;
> -	struct sched_group *sg;
>  	int i = task_cpu(p);
> 
>  	if (idle_cpu(target))
> @@ -4990,24 +4989,14 @@ static int select_idle_sibling(struct task_struct *p, int target)
>  	 * Otherwise, iterate the domains and find an elegible idle cpu.
>  	 */
>  	sd = rcu_dereference(per_cpu(sd_llc, target));
> -	for_each_lower_domain(sd) {
> -		sg = sd->groups;
> -		do {
> -			if (!cpumask_intersects(sched_group_cpus(sg),
> -						tsk_cpus_allowed(p)))
> -				goto next;
> -
> -			for_each_cpu(i, sched_group_cpus(sg)) {
> -				if (i == target || !idle_cpu(i))
> -					goto next;
> -			}
> +	if (!sd)
> +		goto done;
> 
> -			target = cpumask_first_and(sched_group_cpus(sg),
> -					tsk_cpus_allowed(p));
> +	for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
> +		if (cpu_active(i) && idle_cpu(i)) {
> +			target = i;
>  			goto done;
> -next:
> -			sg = sg->next;
> -		} while (sg != sd->groups);
> +		}
>  	}
>  done:
>  	return target;

Ew.  That may improve your latency when everything is load, but worst
case, that package walk will hurt like hell on CPUs with an insane
number of threads.  That full search also turns the evil face of
two-faced little select_idle_sibling() into its only face, the one that
bounces tasks about much more than they appreciate.
Looking for an idle core first delivers the most throughput boost, and
only looking at target's threads if you don't find one keeps the bounce
and traverse pain down to a dull roar, while at least trying to get that
latency win.  To me, your patch looks like it trades harm to many for
good for a few.  A behavior switch would be better.  It can't get any
dumber, but trying to make it smarter makes it too damn fat.  As it
sits, it's aiming in the general direction of the bullseye.. and
occasionally hits the wall.

	-Mike

[attachment "sched-throttle-nohz.patch", decoded from base64:]

sched: ratelimit nohz

Entering nohz code on every micro-idle is too expensive to bear.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h    |    5 +++++
 kernel/sched/core.c      |    8 ++++++++
 kernel/time/tick-sched.c |    2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2286,6 +2286,11 @@ static inline int set_cpus_allowed_ptr(s
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_enter_idle(void);
 void calc_load_exit_idle(void);
+#ifdef CONFIG_SMP
+extern int sched_needs_cpu(int cpu);
+#else
+static inline int sched_needs_cpu(int cpu) { return 0; }
+#endif
 #else
 static inline void calc_load_enter_idle(void) { }
 static inline void calc_load_exit_idle(void) { }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -577,6 +577,14 @@ static inline bool got_nohz_idle_kick(vo
 	return false;
 }
 
+int sched_needs_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		return 0;
+
+	return  cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
+}
+
 #else /* CONFIG_NO_HZ_COMMON */
 
 static inline bool got_nohz_idle_kick(void)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -676,7 +676,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
 
-	if (rcu_needs_cpu(basemono, &next_rcu) ||
+	if (sched_needs_cpu(cpu) || rcu_needs_cpu(basemono, &next_rcu) ||
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {