Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754112AbcDYJe2 (ORCPT ); Mon, 25 Apr 2016 05:34:28 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:47899 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754057AbcDYJe1 (ORCPT );
        Mon, 25 Apr 2016 05:34:27 -0400
Date: Mon, 25 Apr 2016 11:34:24 +0200
From: Peter Zijlstra
To: Brendan Gregg
Cc: Jeff Merkey, LKML, Mike Galbraith, Ingo Molnar
Subject: Re: [RFC] The Linux Scheduler: a Decade of Wasted Cores Report
Message-ID: <20160425093424.GE12845@twins.programming.kicks-ass.net>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2408
Lines: 57

On Sat, Apr 23, 2016 at 06:38:25PM -0700, Brendan Gregg wrote:
> On Sat, Apr 23, 2016 at 11:20 AM, Jeff Merkey wrote:
> >
> > Interesting read.
> >
> > http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf
> >
> > "... The Linux kernel scheduler has deficiencies that prevent a
> > multicore system from making proper use of all cores for heavily
> > multithreaded loads, according to a lecture and paper delivered
> > earlier this month at the EuroSys '16 conference in London, ..."
> >
> > Any plans to incorporate these fixes?

No; their patches are completely butchering things. Also, I don't think
I agree with some of their analysis.

Sadly the paper doesn't provide enough detail to fully reproduce things.
Nor have I had time to really look into things yet. I was only made
aware of this paper last week -- it was so good of these here folks to
contact me... oh wait.

> While this paper analyzes and proposes fixes for four bugs, it has
> been getting a lot of attention for broader claims about Linux being
> fundamentally broken:
>
> "As a central part of resource management, the OS thread scheduler
> must maintain the following, simple, invariant: make sure that ready
> threads are scheduled on available cores.

This is actually debatable. This is a global problem, therefore it is
expensive. It can take more work to find a runnable task than we would
have been idle for in the first place.

> As simple as it may seem, we
> found that this invariant is often broken in Linux. Cores may stay
> idle for seconds while ready threads are waiting in runqueues."

Right, obviously seconds is undesirable.

> Then states that the problems in the Linux scheduler that they found
> cause degradations of "13-24% for typical Linux workloads".
>
> Their proof of concept patches are online[1]. I tested them and saw 0%
> improvements on the systems I tested, for some simple workloads[2]. I
> tested 1 and 2 node NUMA, as that is typical for my employer (Netflix,
> and our tens of thousands of Linux instances in the AWS/EC2 cloud),
> even though I wasn't expecting any difference on 1 node. I've used
> synthetic workloads so far.

So their setup uses a bigger (not fully connected) NUMA topology, and
I'm not entirely sure how much of their problems are due to that, but at
least one of them is. Such boxes are fairly rare.

In any case, I'll get to it at some point...
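
To make the cost argument above concrete (finding a runnable task can
take more work than the idle time it would save), here is a minimal
sketch of the trade-off. It is not kernel code: the function name, the
linear-scan cost model, and the nanosecond figures are all illustrative
assumptions, not measurements.

```python
def worth_searching(expected_idle_ns: int, n_cpus: int, per_cpu_scan_ns: int) -> bool:
    """Decide whether an idle CPU should look for work on other CPUs.

    Hypothetical model: inspecting one remote runqueue costs
    per_cpu_scan_ns, and a global search inspects all n_cpus of them.
    The search only pays off if it costs less than the time the CPU
    would otherwise have spent idle.
    """
    search_cost_ns = n_cpus * per_cpu_scan_ns
    return search_cost_ns < expected_idle_ns


# Assumed figures: 64 CPUs, 200 ns to inspect each remote runqueue,
# so a full search costs 12,800 ns.
short_idle = worth_searching(expected_idle_ns=5_000, n_cpus=64, per_cpu_scan_ns=200)
long_idle = worth_searching(expected_idle_ns=1_000_000_000, n_cpus=64, per_cpu_scan_ns=200)
print(short_idle, long_idle)
```

Under these assumed numbers the search is not worth it for a 5 us idle
period but clearly is for a one-second one, which matches both halves of
the reply: strict work conservation is debatable, yet seconds of idle
time is undesirable.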