Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750707AbWINOnx (ORCPT ); Thu, 14 Sep 2006 10:43:53 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750752AbWINOnx (ORCPT ); Thu, 14 Sep 2006 10:43:53 -0400 Received: from ecfrec.frec.bull.fr ([129.183.4.8]:48526 "EHLO ecfrec.frec.bull.fr") by vger.kernel.org with ESMTP id S1750721AbWINOnv convert rfc822-to-8bit (ORCPT ); Thu, 14 Sep 2006 10:43:51 -0400 Subject: Re: [PATCH AIO 2/2] Add listio support From: =?ISO-8859-1?Q?S=E9bastien_Dugu=E9?= To: Zach Brown Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org, drepper@redhat.com, suparna@in.ibm.com, pbadari@us.ibm.com, hch@infradead.org, johnpol@2ka.mipt.ru, davem@davemloft.net, Jean Pierre Dion In-Reply-To: <450753C1.2080308@oracle.com> References: <20060911114152.4f67e59f@frecb000686> <450753C1.2080308@oracle.com> Date: Thu, 14 Sep 2006 16:43:42 +0200 Message-Id: <1158245022.3890.68.camel@frecb000686> Mime-Version: 1.0 X-Mailer: Evolution 2.4.2.1 X-MIMETrack: Itemize by SMTP Server on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 14/09/2006 16:49:27, Serialize by Router on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 14/09/2006 16:49:30, Serialize complete at 14/09/2006 16:49:30 Content-Transfer-Encoding: 8BIT Content-Type: text/plain; charset=ISO-8859-15 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5864 Lines: 172 On Tue, 2006-09-12 at 17:41 -0700, Zach Brown wrote: > > As io_submit already accepts an array of iocbs, as part of listio submission, > > the user process prepends to a list of requests an empty special aiocb with > > an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields. > > Hmm, overloading an iocb as the _GROUP indication doesn't make much > sense to me, other than being an expedient way to shoe-horn the > semantics into the existing API. That we have to worry about multiple > _GROUP iocbs in the array (ignoring them currently) seems like > complexity that we shouldn't even have to consider. > > Am I just missing the discussion that lead us to avoid a syscall? It > seems like we're getting a pretty funky API in return. The original intent was to use the io_submit() interface which already accepts a list of iocbs without having to resort on yet another syscall. But I agree that it's kludgy and a new syscall might end up being much cleaner. Discussion on this issue would be much welcomed. > > I guess I could also see an iocb command whose buf/bytes pointed to an > array of iocb pointers that it should issue and wait for completion on. > You could call it, uh, IOCB_CMD_IO_SUBMIT :). I'm only sort of kidding > here :). With a sys_io_wait_for_one_iocb() that might not be entirely > ridiculous. Then you could get an io_getevents() completion of a group > instead of only being able to get a signal or waiting on sync > completion. It makes it less of a special case. > > In any case, this current approach doesn't seem complete. It doesn't > deal with the case where canceled iocbs don't come through > aio_complete() and so don't drop their lio_users count. Yep, will have to think a bit more about this. > > Also, it seems that the lio_submit() interface allows NULL iocb pointers > by skipping them. sys_io_submit() looks like it'd fault on them. It > sounds unpleasant to have an app allow nulls in the array and have the > library copy and compress the members, or something. Maybe we could fix > up that mismatch in an API that explicitly took the sigevent struct? Am > I making up this problem? Effectively, right now the NULL pointers have to be handled by the library and the new syscall approach would allow to solve this in an elegant way. > > more inline.. > > > +static void lio_wait(struct kioctx *ctx, struct lio_event *lio) > > +{ > > + struct task_struct *tsk = current; > > + DECLARE_WAITQUEUE(wait, tsk); > > + > > + add_wait_queue(&ctx->wait, &wait); > > + set_task_state(tsk, TASK_UNINTERRUPTIBLE); > > + > > + while ( atomic_read(&lio->lio_users) ) { > > + schedule(); > > + set_task_state(tsk, TASK_UNINTERRUPTIBLE); > > + } > > + __set_task_state(tsk, TASK_RUNNING); > > + remove_wait_queue(&ctx->wait, &wait); > > wait_event(ctx->wait, atomic_read(&lio->lio_users)); Nice shortcut, thanks. > > > +static inline void lio_check(struct lio_event *lio) > > +{ > > + int ret; > > + > > + ret = atomic_dec_and_test(&(lio->lio_users)); > > + > > + if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) { > > + /* last one -> notify process */ > > + aio_send_signal(&lio->lio_notify); > > + kfree(lio); > > Since io_submit() doesn't hold a ref on lio_users this can race during > submission to send the signal and free the lio before all the operations > have been submitted. Right. > > > + if (*lio) { > > + printk (KERN_DEBUG "lio_create: already have an lio\n"); > > We should get rid of the printk so that the user doesn't have a trivial > way to flood the kernel message log. OK. > > > + *lio = kmalloc(sizeof(*lio), GFP_KERNEL); > > + > > + if (!*lio) > > + return -EAGAIN; > > + > > + memset (*lio, 0, sizeof(struct lio_event)); > > kzalloc(), and I imagine it would be a lot cleaner to return the lio or > ERR_PTR() instead of passing in the pointer to the struct and filling it in. Makes sense. > > And is it legal to memset(&atomic_t, 0, )? I would have expected a > atomic_set(&lio->lio_users, 0). Maybe I'm stuck in the bad old days of > 24 bit atomic_t and embedded locks :). Well, it's just an allocation so lio->lio_users is not even ready to be used. So I don't think it makes any difference whether this field is cleared during the memset (or kzalloc) or using an explicit atomic_set(). Or are there any architectures still out there that stuff state bits in an atomic_t? > > > + if (lio && ((tmp.aio_lio_opcode == IOCB_CMD_PREAD) || > > + (tmp.aio_lio_opcode == IOCB_CMD_PWRITE))) { > > It doesn't make sense to me to restrict which ops are allowed to be in a > list or not. Why not PREADV? An iocb is an iocb, I think. Right, I wrongly focused on the POSIX aspects, but we should be more versatile here. > > > + IOCB_CMD_GROUP = 7, > > 7 is already taken in -mm by IOCB_CMD_PREADV . Argh, forgot to check -mm. Sorry. > > > +struct lio_event { > > + atomic_t lio_users; > > + int lio_wait; > > + struct aio_notify lio_notify; > > I'm pretty sure you don't need lio_wait here, given that it's only > tested in the submission path. > Yes, we can do without. Thanks, S?bastien. -- ----------------------------------------------------- S?bastien Dugu? BULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:sebastien.dugue@bull.net Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol ----------------------------------------------------- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/