Message-ID: <1461481517.3835.125.camel@gmail.com>
Subject: Re: [RFC] The Linux Scheduler: a Decade of Wasted Cores Report
From: Mike Galbraith
To: Brendan Gregg, Jeff Merkey
Cc: LKML
Date: Sun, 24 Apr 2016 09:05:17 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 2016-04-23 at 18:38 -0700, Brendan Gregg wrote:

> The bugs they found seem real, and their analysis is great (although
> using visualizations to find and fix scheduler bugs isn't new), and it
> would be good to see these fixed. However, it would also be useful to
> double check how widespread these issues really are. I suspect many on
> this list can test these patches in different environments.

Part of it sounded to me very much like they're meeting and "fixing" SMP
group fairness. Take the worst case, a threads=cores group of
synchronized threads passing checkpoints in lockstep, competing with a
group of one hog: synchronized threads that have a core to themselves
must wait (busy wait, as they mentioned, or sleep) for the straggler
thread, whose fair share is a small fraction of a core (1/65 for a 64
core box), to catch up before the group as a unit can proceed. Without
SMP fairness, intersecting groups compete as equals at any given
intersection (assuming shares have not been twiddled), thus a fully
synchronized load can utilize up to 50% of a box [1], whereas with SMP
fairness, the worst case load slams head on into a one core wall.

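A back-of-envelope sketch of the arithmetic above, not taken from the original mail. It assumes both groups carry equal, untwiddled shares, that SMP fairness spreads a group's weight evenly across its runnable threads, and that the lockstep group advances only as fast as its straggler. The names `GROUP_WEIGHT`, `straggler_share` and `usable_cores` are made up for illustration.

```python
from fractions import Fraction

# Illustrative group weight. Any value works as long as both groups
# have the same one, since equal weights cancel out in the ratio.
GROUP_WEIGHT = 1024


def straggler_share(cores: int, smp_fair: bool) -> Fraction:
    """Fraction of the contested core that the synchronized thread gets."""
    hog_weight = Fraction(GROUP_WEIGHT)  # the one-hog group sits on one core
    if smp_fair:
        # Assumption: the group's weight is split evenly over its
        # `cores` runnable threads, one thread per core.
        thread_weight = Fraction(GROUP_WEIGHT, cores)
    else:
        # Without SMP fairness the two compete as equals on that core.
        thread_weight = Fraction(GROUP_WEIGHT)
    return thread_weight / (thread_weight + hog_weight)


def usable_cores(cores: int, smp_fair: bool) -> Fraction:
    """Cores' worth of progress when every thread waits for the straggler."""
    return cores * straggler_share(cores, smp_fair)


if __name__ == "__main__":
    for cores in (8, 64, 256):
        fair = usable_cores(cores, smp_fair=True)
        unfair = usable_cores(cores, smp_fair=False)
        print(f"{cores:4d} cores: SMP fair {float(fair):.3f}, "
              f"not SMP fair {float(unfair):.1f}")
```

For a 64 core box this model gives the straggler 1/65 of its core with SMP fairness, so the whole group makes 64/65 of one core's worth of progress, versus 32 cores (50% of the box) when the intersecting groups compete as equals.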
Pondering the progress dependency thingy a bit, it seems some degree of
that is likely, thus it logically follows that SMP fairness is likely to
find some non-zero delta to multiply by box size. This came up fairly
recently, with a university math department admin grumbling that cranky
professors were beating him bloody. Testing, I couldn't confirm exactly
what he was grumbling about (couldn't figure out exactly what that was,
actually), but thinking about it combined with what I was seeing made
me, too, want to "fix" it by smacking it squarely between the eyes with
my BFH. Turned out that it had grown a wart though, and isn't nearly as
bad in the real world (defined as measuring random generic stuff on my
little box ;) as idle pondering and measurement of slightly dinged up
code had indicated. Like everything else, it cuts both ways.

	-Mike

1. IOW do NOT run a highly specialized load in a generic environment; it
is guaranteed to either suck rock or suck gigantic frick'n boulders.