Subject: Re: regression introduced by - timers: fix itimer/many thread hang
From: Peter Zijlstra
To: Petr Tesarik
Cc: Frank Mayhar, Christoph Lameter, Doug Chapman, mingo@elte.hu,
    roland@redhat.com, adobriyan@gmail.com, akpm@linux-foundation.org,
    linux-kernel
Date: Sun, 23 Nov 2008 15:24:56 +0100
Message-Id: <1227450296.7685.20759.camel@twins>
In-Reply-To: <200811211942.43848.ptesarik@suse.cz>
References: <1224694989.8431.23.camel@oberon>
    <1226015568.2186.20.camel@bobble.smo.corp.google.com>
    <1226053744.7803.5851.camel@twins>
    <200811211942.43848.ptesarik@suse.cz>

On Fri, 2008-11-21 at 19:42 +0100, Petr Tesarik wrote:
> > > In any event, while this particular implementation may not be optimal,
> > > at least it's _right_. Whatever happened to "make it right, then make
> > > it fast?"
> >
> > Well, I'm not thinking you did it right ;-)
> >
> > While I agree that the linear loop is sub-optimal, but it only really
> > becomes a problem when you have hundreds or thousands of threads in your
> > application, which I'll argue to be insane anyway.
>
> This is just not true. I've seen a very real example of a lockup with a very
> sane number of threads (one per CPU), but on a very large machine (1024 CPUs
> IIRC). The application set per-process CPU profiling with an interval of 1
> tick, which translates to 1024 timers firing off with each tick...
>
> Well, yes, that was broken, too, but that's the way one quite popular FORTRAN
> compiler works...

I'm not sure what side you're arguing...

The current (per-cpu) code is utterly broken on large machines too. I've
asked SGI to run some tests on real NUMA machines (something multi-brick
Altix), and even moderately small machines with 256 CPUs in them grind
to a halt (or make progress at a snail's pace) when the itimer stuff is
enabled.

Furthermore, I really dislike the per-process-per-cpu memory cost; it
bloats applications and makes the new per-cpu alloc work rather more
difficult than it already is.

I basically think the whole process-wide itimer stuff is broken by
design. There is no way to make it work on reasonably large machines;
the whole problem space just doesn't scale. You simply cannot maintain a
global count without bouncing cachelines like mad, so you might as well
accept it and do the process-wide counter and bounce only a single line,
instead of bouncing a line per CPU.

Furthermore, I stand by my claim that anything that runs more than a
handful of threads per physical core is utterly braindead and deserves
all the pain it can get. (Yes, I'm a firm believer in state machines and
don't think just throwing threads at a problem is a sane solution.)
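
For readers following the thread, the trade-off argued above (one
process-wide counter versus one counter per CPU) can be sketched roughly
in userspace C11. This is an illustration only, not the kernel code
under discussion: the names shared_cputime, percpu_cputime, NR_SLOTS and
CACHELINE are all hypothetical, and the 64-byte line size is an
assumption. The point it tries to show is that a process-wide limit has
to be checked against the total, so the per-CPU variant ends up reading
one cache line per CPU on every check, and pays one line per CPU in
memory for every process.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the kernel. */
#define NR_SLOTS  4   /* stand-in for the number of CPUs */
#define CACHELINE 64  /* assumed cache line size in bytes */

/* Variant A: one process-wide counter; all writers share one line. */
struct shared_cputime {
    _Atomic uint64_t ticks;
};

/* Variant B: one slot per CPU, each aligned to its own cache line. */
struct percpu_slot {
    _Alignas(CACHELINE) _Atomic uint64_t ticks;
};

struct percpu_cputime {
    struct percpu_slot slot[NR_SLOTS];
};

_Static_assert(sizeof(struct percpu_slot) == CACHELINE,
               "each per-CPU slot should occupy exactly one cache line");

/* Variant A: account time by bumping the single shared counter. */
static inline void shared_account(struct shared_cputime *c, uint64_t delta)
{
    atomic_fetch_add_explicit(&c->ticks, delta, memory_order_relaxed);
}

/* Variant A: reading the total touches exactly one cache line. */
static inline uint64_t shared_read(struct shared_cputime *c)
{
    return atomic_load_explicit(&c->ticks, memory_order_relaxed);
}

/* Variant B: account time in the caller's own slot only. */
static inline void percpu_account(struct percpu_cputime *c, unsigned cpu,
                                  uint64_t delta)
{
    assert(cpu < NR_SLOTS);
    atomic_fetch_add_explicit(&c->slot[cpu].ticks, delta,
                              memory_order_relaxed);
}

/* Variant B: reading the total walks NR_SLOTS lines, one per CPU. */
static inline uint64_t percpu_read(struct percpu_cputime *c)
{
    uint64_t sum = 0;

    for (unsigned cpu = 0; cpu < NR_SLOTS; cpu++)
        sum += atomic_load_explicit(&c->slot[cpu].ticks,
                                    memory_order_relaxed);
    return sum;
}
```

In this sketch the per-CPU write is cheap, but if every CPU calls
percpu_read() on every tick to test an expiry limit, each of N CPUs
pulls in N lines that the others keep dirtying. The shared variant
bounces one line instead, which is the alternative the message argues
for.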