Date: Thu, 1 Feb 2007 12:07:42 -0800 (PST)
From: Linus Torvalds
To: Ingo Molnar
cc: Zach Brown, linux-kernel@vger.kernel.org, linux-aio@kvack.org,
    Suparna Bhattacharya, Benjamin LaHaise
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
In-Reply-To: <20070201083611.GC18233@elte.hu>
References: <20070201083611.GC18233@elte.hu>

On Thu, 1 Feb 2007, Ingo Molnar wrote:
>
> there's almost no scheduling cost from being able to arbitrarily
> schedule a kernel thread - but there are /huge/ benefits in it.

That's a singularly *stupid* argument.

Of course scheduling is fast. That's the whole *point* of fibrils. They
still schedule. Nobody claimed anything else.

Bringing up RT kernels and scheduling latency is idiotic. It's like
saying "we should do this because the sky is blue". Sure, that's true,
but what the *hell* does Rayleigh scattering have to do with anything?

The cost has _never_ been scheduling. That was never the point. Why do
you even bring it up? Only to make an argument that makes no sense?

The cost of AIO is

 - maintenance. It's a separate code-path, and it's one that simply
   doesn't fit into anything else AT ALL. It works (mostly) for simple
   things, ie reads and writes, but even there, it's really adding a
   lot of crud that we could do without.

 - setup and teardown costs: both in CPU and in memory.

These are the big costs. It's especially true since a lot of AIO
actually ends up cached. The user program just wants the data - 99% of
the time it's likely to be there, and the whole point of AIO is to get
at it cheaply, but not block if it's not there.

So your scheduling arguments are inane. They totally miss the point.
They have nothing to do with *anything*.

Ingo: everybody *agrees* that scheduling is cheap. Scheduling isn't the
issue. Scheduling isn't even needed in the perfect path where the AIO
didn't need to do any real IO (and that _is_ the path we actually would
like to optimize most).

So instead of talking about totally irrelevant things, please keep your
eyes on the ball.

So I claim that the ball is here:

 - cached data (and that is *especially* true of some of the more
   interesting things we can do with a more generic AIO thing: path
   lookup, inode filling (stat/fstat) etc usually have hit-rates in the
   99% range, but missing even just 1% of the time can be deadly, if
   the miss costs you a hundred msec of not doing anything else!)

   Do the math. A "stat()" system call generally takes on the order of
   a couple of microseconds. But if it misses even just 1% of the time
   (and takes 100 msec when it does that, because there is other IO
   also competing for the disk arm), ON AVERAGE it takes 1ms.

   So what you should aim for is improving that number. The cached case
   should hopefully still be in the microseconds, and the uncached case
   should be nonblocking for the caller.

 - setup/teardown costs. Both memory and CPU. This is where the current
   threads simply don't work. The setup cost of doing a clone/exit is
   actually much higher than the cost of doing the whole operation,
   most of the time. Remember: caches still work.

 - maintenance. Clearly AIO will always have some special code, but if
   we can move the special code *away* from filesystems and networking
   and all the thousands of device drivers, and into core kernel code,
   we've done something good.
   And if we can extend it from just pure read/write into just about
   *anything*, then people will be happy.

So stop blathering about scheduling costs, RT kernels and interrupts.
Interrupts generally happen a few thousand times a second. This is
something you want to do a *million* times a second, without any IO
happening at all except for when it has to.

		Linus
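
For anyone who wants to redo the "do the math" step from the stat()
example, the average is just a weighted sum of the cached and uncached
latencies. What follows is only a rough sketch: the function name is
made up for illustration, and "a couple of microseconds" is assumed to
mean 2 us. The 1% miss rate and the 100 msec miss cost are the numbers
from the mail.

```python
def expected_latency_us(hit_us, miss_us, miss_rate):
    """Average latency as a weighted sum of the hit and miss cases."""
    return (1.0 - miss_rate) * hit_us + miss_rate * miss_us


cached_us = 2.0        # assumption: "a couple of microseconds" taken as 2 us
miss_us = 100_000.0    # 100 msec, expressed in microseconds
miss_rate = 0.01       # the call misses the cache 1% of the time

avg_us = expected_latency_us(cached_us, miss_us, miss_rate)

# Fraction of the average that is contributed by the rare misses.
miss_share = (miss_rate * miss_us) / avg_us

print(f"average latency: {avg_us / 1000.0:.2f} ms")
print(f"share of the average due to misses: {miss_share:.1%}")
```

Under these assumptions the average comes out at about 1 ms, and nearly
all of it is contributed by the 1% of calls that miss. That is why the
uncached case has to stop blocking the caller while the cached case
stays in the microseconds.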