Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030521AbXBOSrA (ORCPT );
	Thu, 15 Feb 2007 13:47:00 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1030522AbXBOSrA (ORCPT );
	Thu, 15 Feb 2007 13:47:00 -0500
Received: from outpost.ds9a.nl ([213.244.168.210]:49775 "EHLO outpost.ds9a.nl"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1030521AbXBOSq7 (ORCPT );
	Thu, 15 Feb 2007 13:46:59 -0500
Date: Thu, 15 Feb 2007 19:46:56 +0100
From: bert hubert
To: Linus Torvalds
Cc: Evgeniy Polyakov, Ingo Molnar, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, "David S. Miller", Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner
Subject: Re: [patch 05/11] syslets: core code
Message-ID: <20070215184656.GA12897@outpost.ds9a.nl>
Mail-Followup-To: bert hubert, Linus Torvalds, Evgeniy Polyakov, Ingo Molnar,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown, "David S. Miller",
	Benjamin LaHaise, Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner
References: <20070213142035.GF638@elte.hu> <20070215133550.GA29274@2ka.mipt.ru>
	<20070215163704.GA32609@2ka.mipt.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.9i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2208
Lines: 56

On Thu, Feb 15, 2007 at 09:42:32AM -0800, Linus Torvalds wrote:
> We know one interface: the current aio_read() one. Nobody really _likes_
[...]
> Others? We don't know yet.
> And exposing complex interfaces that may not be
> the right ones is much *worse* than exposing simple interfaces (that
> _also_ may not be the right ones, of course - but simple and

From humble userland, here are two things I'd hope to be able to do, although
I admit my needs are rather specialist.

1) batch, and wait for, with proper error reporting:
	socket();
	[ setsockopt(); ]
	bind();
	connect();
	gettimeofday(); // doesn't *always* happen
	send();
	recv();
	gettimeofday(); // doesn't *always* happen

I go through this sequence for each outgoing powerdns UDP query because I
need a new random source port for each query, and I connect because I care
about errors. Linux does not give me random source ports for UDP sockets.

When async, I can probably just drop the setsockopt (for nonblocking). I
already batch the gettimeofday to 'once per epoll return', but quite often
this is once per packet.

2) On the client facing side (port 53), I'd very much hope for a way to do
   'recvv' on datagram sockets, so I can retrieve a whole bunch of UDP
   datagrams with only one kernel transition.

This would mean that I batch up either 10 calls to recv(), or one 'atom' of
10 recv's.

Both 1 and 2 are currently limiting factors when I enter the 100kqps domain
of name serving. This doesn't mean the rest of my code is as tight as it
could be, but I spend a significant portion of time in the kernel even at
moderate (10kqps effective) loads, even though I already use epoll. A busy
PowerDNS recursor typically spends 25% to 50% of its time on 'sy' load.

This might be due to my use of get/set/swap/makecontext though.

	Bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl         Open and Closed source services
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
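
For reference, the per-query sequence from item 1 could look roughly like the
sketch below when written out as ordinary blocking calls. This is not
PowerDNS code: open_query_socket() and exchange() are hypothetical names, the
port range and retry count are arbitrary assumptions, and rand() only stands
in for whatever random source a real resolver would use. The optional
setsockopt() and gettimeofday() steps are left out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: socket() + bind() to a random source port +
 * connect() to dst. Returns the descriptor, or -1 on failure. */
int open_query_socket(const struct sockaddr_in *dst)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Assumption: 16 attempts is enough to find a free port. */
    for (int attempt = 0; attempt < 16; attempt++) {
        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        /* rand() is a placeholder, not a secure random source. */
        local.sin_port = htons((uint16_t)(1024 + rand() % (65536 - 1024)));

        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) != 0)
            continue; /* port probably in use, pick another one */

        /* connect() on UDP sends nothing; it fixes the peer so that
         * later errors are reported on this socket. */
        if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) == 0)
            return fd;
        break;
    }
    close(fd);
    return -1;
}

/* Hypothetical helper: send() one query and recv() one answer.
 * Returns the answer length, or -1 with errno set. */
ssize_t exchange(int fd, const void *query, size_t qlen,
                 void *answer, size_t alen)
{
    if (send(fd, query, qlen, 0) < 0)
        return -1;
    return recv(fd, answer, alen, 0); /* blocks until a datagram arrives */
}
```

Counting the calls, each query costs socket(), bind(), connect(), send(),
recv() and an eventual close(), so at least six kernel transitions before
any setsockopt() or gettimeofday(), which is the overhead that batching
would be meant to cut.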