Date: Mon, 26 Feb 2007 20:54:16 +0100
From: Ingo Molnar
To: Linus Torvalds
Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel@vger.kernel.org,
    Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
    Zach Brown, "David S. Miller", Suparna Bhattacharya,
    Davide Libenzi, Jens Axboe, Thomas Gleixner
Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Message-ID: <20070226195416.GA11188@elte.hu>
References: <20070221211355.GA7302@elte.hu> <20070221233111.GB5895@elte.hu>
    <45DCD9E5.2010106@redhat.com> <20070222074044.GA4158@elte.hu>
    <20070222113148.GA3781@2ka.mipt.ru> <20070226172812.GC22454@2ka.mipt.ru>

* Linus Torvalds wrote:

> > Reading from the disk is _exactly_ the same - the same waiting for
> > buffer_heads/pages, and (since it is bigger) it can be easily
> > transferred to event driven model.
> > Ugh, wait, it not only _can_ be
> > transferred, it is already done in kevent AIO, and it shows faster
> > speeds (though I only tested sending them over the net).
>
> It would be absolutely horrible to program for. Try anything more
> complex than read/write (which is the simplest case, but even that is
> nasty).

note that even for something as 'simple and straightforward' as TCP
sockets, the 25-50 lines of evserver code i worked on today had 3
separate bugs and is known to be fundamentally incorrect, and one of
the bugs (the lost event problem) showed up as a subtle epoll
performance problem that took me more than an hour to track down.

And that matches my Tux experience as well: event based models are
horribly hard to debug BECAUSE there is /no procedural state associated
with requests/. Hence there is no real /proof of progress/. Not much to
use for debugging - except huge logs of execution, which, if one is
unlucky (which i often was with Tux), would just make the problem go
away.

Furthermore, with a 'retry' model, who guarantees that the retry won't
be an infinite retry where none of the retries ever progresses the
state of the system enough to return the data we are interested in?
The moment we have to /retry/, depending on the 'depth' of how deep the
retry kicked in, we've got to reach that 'depth' of code again and
execute it.

plus, 'there is not much state' is not even completely true to begin
with, even in the most simple, TCP socket case! There /is/ quite a bit
of state constructed on the kernel stack: user parameters have been
evaluated/converted, the socket has been looked up, its state has been
validated, etc. With a 'retry' model - but even with a pure 'event
queueing' model - we redo all those things /both/ at request submission
and at event generation time, again and again - while with a
synchronous syscall you do it just once and upon event completion a
piece of that data is already on the kernel stack.
I'd much rather spend time and effort on simplifying the scheduler and
reducing the cache footprint of the kernel thread context switch path,
etc., to make it more useful even in the more extreme, highly parallel
'100% context-switching' case, because i have first-hand experience of
how fragile and inflexible event based servers are. I do think that
event interfaces for raw, external physical events make sense in some
circumstances, but for any more complex 'derived' event type it's less
and less clear whether we want a direct interface to it. For something
like the VFS it's outright horrible to even think about.

	Ingo