Date: Fri, 16 Aug 2002 04:32:10 +0200
From: Andrea Arcangeli
To: Benjamin LaHaise
Cc: Linus Torvalds, Alan Cox, Chris Friesen, Pavel Machek,
    linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: aio-core why not using SuS? [Re: [rfc] aio-core for 2.5.29 (Re: async-io API registration for 2.5.29)]
Message-ID: <20020816023210.GK14394@dualathlon.random>
In-Reply-To: <20020815220054.J29874@redhat.com>

On Thu, Aug 15, 2002 at 10:00:54PM -0400, Benjamin LaHaise wrote:
> On Fri, Aug 16, 2002 at 03:57:17AM +0200, Andrea Arcangeli wrote:
> > you're saying you prefer glibc to wrap the aio_read/write/fsync and to
> > redirect them all to lio_listio after converting the iocb from user API to
> > kernel API, right?
> > still I don't see why we should have a different iocb.
> > I would understand if you said we should simply overwrite aio_lio_opcode
> > inside aio_read(3) in glibc and pass it over to the kernel with a
> > single syscall, if it's low cost to just set the lio_opcode, but having
> > different data structures still doesn't sound like the best option. I mean,
> > it would be nicer if things were more consistent.
>
> The iocb is as minimally different from the posix aio api as possible. The
> main reason for the difference is that struct sigevent is unreasonably huge.
> A lightweight posix aio implementation on top of the kernel API shares the
> fields between the kernel iocb and the posix aiocb.

	/* extra parameters */
	__u64	aio_reserved2;	/* TODO: use this for a (struct sigevent *) */
	__u64	aio_reserved3;

so you want the conversion to only store the pointer (if any) to the
sigevent in the iocb, rather than the whole sigevent, right? This is an
argument that makes technical sense and that I can happily buy as a reason
for having a different iocb. However, your argument also depends on whether
I/O completion notification via signal is the common case or not. I guess
in theory it should be the common case for software designed for best
performance.

> > I don't see how the flushing flood is related to this, this is a normal
> > syscall, any issue that applies to these aio_read/write/fsync should
> > apply to all other syscalls too. Also the 4G starvation will be more
> > likely fixed by x86-64 or in software by using a softpagesize larger
> > than 4k so that the mem_map array doesn't load all the zone_normal.
>
> A 4G/4G split flushes the TLB on every syscall.

sure, that's why it's so slow. This applies to
reads/writes/exceptions/interrupts and everything else kernel side.
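To make the sigevent point above concrete, here is a rough sketch of what such a glibc-side conversion might look like. It is only an illustration under stated assumptions: `struct sketch_iocb` and `sketch_fill_iocb()` are hypothetical names, and apart from aio_lio_opcode, aio_reserved2 and aio_reserved3 the field layout is invented for the example and is not the actual 2.5.29 kernel iocb. The idea shown is just that the wrapper sets the opcode and stores a pointer to the sigevent (if any) instead of copying the whole structure.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified kernel-style control block. Only the names
 * aio_lio_opcode, aio_reserved2 and aio_reserved3 come from the thread;
 * everything else is an assumption made for this sketch. */
struct sketch_iocb {
	uint64_t aio_data;        /* assumption: cookie handed back on completion */
	uint16_t aio_lio_opcode;  /* LIO_READ, LIO_WRITE, ... */
	int16_t  aio_reqprio;
	uint32_t aio_fildes;
	uint64_t aio_buf;
	uint64_t aio_nbytes;
	int64_t  aio_offset;
	/* extra parameters */
	uint64_t aio_reserved2;   /* here: a (struct sigevent *), or 0 if none */
	uint64_t aio_reserved3;
};

/* Fill a kernel-style iocb from a POSIX aiocb. The shared fields are
 * copied one by one; the sigevent is passed by pointer only, and only
 * when the caller asked for a notification at all. */
void sketch_fill_iocb(struct sketch_iocb *k, struct aiocb *u, int opcode)
{
	assert(k != NULL && u != NULL);

	memset(k, 0, sizeof(*k));
	k->aio_data       = (uint64_t)(uintptr_t)u;
	k->aio_lio_opcode = (uint16_t)opcode;
	k->aio_reqprio    = (int16_t)u->aio_reqprio;
	k->aio_fildes     = (uint32_t)u->aio_fildes;
	k->aio_buf        = (uint64_t)(uintptr_t)u->aio_buf;
	k->aio_nbytes     = (uint64_t)u->aio_nbytes;
	k->aio_offset     = (int64_t)u->aio_offset;

	if (u->aio_sigevent.sigev_notify != SIGEV_NONE)
		k->aio_reserved2 = (uint64_t)(uintptr_t)&u->aio_sigevent;
}
```

In this sketch an aio_read(3) wrapper would call sketch_fill_iocb(&k, cb, LIO_READ) and hand the result to the kernel in one syscall; requests with SIGEV_NONE never touch the sigevent beyond reading sigev_notify.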
Andrea