2005-09-20 10:21:30

by Sébastien Dugué

Subject: Re: [AIO] aio-2.6.13-rc6-B1

On Wed, 2005-08-17 at 14:44 -0400, Benjamin LaHaise wrote:
> The bugfix followup to the last aio rollup is now available at:
>
> http://www.kvack.org/~bcrl/patches/aio-2.6.13-rc6-B1-all.diff
>
> with the split up in:
>
> http://www.kvack.org/~bcrl/patches/aio-2.6.13-rc6-B1/
>
> This fixes the bugs noticed in the -B0 variant. Major changes in this
> patchset are:
>
> - added aio semaphore ops
> - aio thread based fallbacks
> - vectored aio file_operations
> - aio sendmsg/recvmsg via thread fallbacks
> - retry based aio pipe operations
>
> Comments?
>
> -ben
>

Hi Ben,

What's the point of calling wake_up_locked(&sem->wait) in
aio_down_wait? We're already in a wakeup path and end up
calling __wake_up_common recursively.

I think it may be one of the causes of my kernel hanging at the
very beginning.

When I remove this call things go further, but at some point a
semaphore wait queue gets trashed and __wake_up_common tries to
call an invalid callback function.

Any input appreciated.

Sébastien.

--
------------------------------------------------------

Sébastien Dugué BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix

------------------------------------------------------


2005-09-20 19:19:31

by Benjamin LaHaise

Subject: Re: [AIO] aio-2.6.13-rc6-B1

On Tue, Sep 20, 2005 at 12:23:10PM +0200, Sébastien Dugué wrote:
> What's the point of calling wake_up_locked(&sem->wait) in
> aio_down_wait? We're already in a wakeup path and end up
> calling __wake_up_common recursively.

That's necessary to kick the next semaphore op in the list. The
list_del_init() right above that makes sure that we don't recurse
and run the routine again.
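
As a rough user-space illustration of the ordering described above (this
is not the code from the patchset), the sketch below models a semaphore
wait queue whose callback removes its own entry before waking the next
waiter. Every name in it (model_sem, model_waiter, model_wake_up,
model_down_wait, model_run) is hypothetical, and model_wake_up only calls
the first queued entry, which is a simplification of what
__wake_up_common does.

```c
#include <stddef.h>

/* Minimal circular list, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink the entry and make it point at itself, like list_del_init(). */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

struct model_sem;

struct model_waiter {
	struct list_head entry;	/* first member, so a list_head * casts back */
	void (*func)(struct model_sem *sem, struct model_waiter *w);
	int granted;		/* how many times this waiter got the semaphore */
};

struct model_sem {
	int count;		/* free slots */
	struct list_head wait;	/* queued waiters */
};

/* Simplified wake-up: run the callback of the first queued waiter only. */
static void model_wake_up(struct model_sem *sem)
{
	if (!list_is_empty(&sem->wait)) {
		struct model_waiter *w = (struct model_waiter *)sem->wait.next;
		w->func(sem, w);
	}
}

/* Hypothetical stand-in for the wait-queue callback discussed above. */
static void model_down_wait(struct model_sem *sem, struct model_waiter *w)
{
	if (sem->count <= 0)
		return;			/* still contended: stay queued */
	sem->count--;
	w->granted++;
	list_del_init(&w->entry);	/* leave the queue first ... */
	model_wake_up(sem);		/* ... then kick the next waiter */
}

/* Queue three waiters, wake once, report grants and waiters still queued. */
int model_run(int count, int granted[3])
{
	struct model_sem sem;
	struct model_waiter w[3];
	int i, left = 0;

	sem.count = count;
	init_list_head(&sem.wait);
	for (i = 0; i < 3; i++) {
		w[i].func = model_down_wait;
		w[i].granted = 0;
		list_add_tail(&w[i].entry, &sem.wait);
	}
	model_wake_up(&sem);
	for (i = 0; i < 3; i++) {
		granted[i] = w[i].granted;
		if (!list_is_empty(&w[i].entry))
			left++;		/* entry is still linked into the queue */
	}
	return left;
}
```

With a count of 2 and three queued waiters, model_run() grants the first
two waiters exactly once each and leaves the third queued: the nested
wake-up never re-enters a waiter that has already removed itself.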

> I think it may be one of the causes of my kernel hanging at the
> very beginning.
>
> When I remove this call things go further, but at some point a
> semaphore wait queue gets trashed and __wake_up_common tries to
> call an invalid callback function.

This patch from Zach might make a difference. Let me know if it changes
the symptoms at all. Sorry if it doesn't apply cleanly, as it is against
a base kernel. Basically, we could sleep while holding ctx_lock, which
does Bad Things(tm) on SMP systems.

-ben


Index: 2.6.13-git12-lock-kiocb/fs/aio.c
===================================================================
--- 2.6.13-git12-lock-kiocb.orig/fs/aio.c
+++ 2.6.13-git12-lock-kiocb/fs/aio.c
@@ -398,7 +398,7 @@ static struct kiocb fastcall *__aio_get_
 	if (unlikely(!req))
 		return NULL;

-	req->ki_flags = 1 << KIF_LOCKED;
+	req->ki_flags = 0;
 	req->ki_users = 2;
 	req->ki_key = 0;
 	req->ki_ctx = ctx;
@@ -717,6 +717,8 @@ static ssize_t aio_run_iocb(struct kiocb
 	iocb->ki_run_list.next = iocb->ki_run_list.prev = NULL;
 	spin_unlock_irq(&ctx->ctx_lock);

+	lock_kiocb(iocb);
+
 	/* Quit retrying if the i/o has been cancelled */
 	if (kiocbIsCancelled(iocb)) {
 		ret = -EINTR;
@@ -781,6 +783,7 @@ out:
 			aio_queue_work(ctx);
 		}
 	}
+	unlock_kiocb(iocb);
 	return ret;
 }

@@ -805,9 +808,7 @@ static int __aio_run_iocbs(struct kioctx
 		 * Hold an extra reference while retrying i/o.
 		 */
 		iocb->ki_users++; /* grab extra reference */
-		lock_kiocb(iocb);
 		aio_run_iocb(iocb);
-		unlock_kiocb(iocb);
 		if (__aio_put_req(ctx, iocb)) /* drop extra ref */
 			put_ioctx(ctx);
 	}
@@ -1549,7 +1550,6 @@ int fastcall io_submit_one(struct kioctx

 	spin_lock_irq(&ctx->ctx_lock);
 	aio_run_iocb(req);
-	unlock_kiocb(req);
 	if (!list_empty(&ctx->run_list)) {
 		/* drain the run list */
 		while (__aio_run_iocbs(ctx))
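
The ordering change in the patch above can be pictured with a small
user-space model. This is only a sketch under stated assumptions: it
assumes lock_kiocb() may sleep (as the description of the patch
suggests), it uses plain flags instead of real locks, and every name in
it (model_ctx, model_spin_lock, model_lock_kiocb, model_old_order,
model_new_order) is hypothetical. It counts how often the may-sleep lock
is taken while the spinlock standing in for ctx_lock is held.

```c
#include <stdbool.h>

struct model_ctx {
	bool spin_held;		/* stands in for ctx->ctx_lock */
	bool kiocb_locked;	/* stands in for the per-kiocb lock */
	int sleep_in_atomic;	/* may-sleep lock taken under the spinlock */
};

static void model_spin_lock(struct model_ctx *c)   { c->spin_held = true; }
static void model_spin_unlock(struct model_ctx *c) { c->spin_held = false; }

/* Taking this lock is assumed to be allowed to sleep. */
static void model_lock_kiocb(struct model_ctx *c)
{
	if (c->spin_held)
		c->sleep_in_atomic++;	/* record the ordering violation */
	c->kiocb_locked = true;
}

/* Releasing never sleeps, so it is fine under the spinlock. */
static void model_unlock_kiocb(struct model_ctx *c) { c->kiocb_locked = false; }

/* Ordering before the patch: kiocb locked by the caller, under ctx_lock. */
int model_old_order(void)
{
	struct model_ctx c = { false, false, 0 };

	model_spin_lock(&c);
	model_lock_kiocb(&c);		/* may sleep with the spinlock held */
	model_spin_unlock(&c);		/* dropped around the retry */
	/* ... retry the i/o here ... */
	model_spin_lock(&c);
	model_unlock_kiocb(&c);
	model_spin_unlock(&c);
	return c.sleep_in_atomic;
}

/* Ordering after the patch: kiocb locked only once ctx_lock is dropped. */
int model_new_order(void)
{
	struct model_ctx c = { false, false, 0 };

	model_spin_lock(&c);
	model_spin_unlock(&c);		/* drop the spinlock first */
	model_lock_kiocb(&c);		/* sleeping is harmless here */
	/* ... retry the i/o here ... */
	model_spin_lock(&c);
	model_unlock_kiocb(&c);
	model_spin_unlock(&c);
	return c.sleep_in_atomic;
}
```

In this model model_old_order() reports one violation and
model_new_order() reports none, which is the property the patch is
after.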

2005-09-21 11:31:16

by Sébastien Dugué

Subject: Re: [AIO] aio-2.6.13-rc6-B1

On Tue, 2005-09-20 at 15:13 -0400, Benjamin LaHaise wrote:
> On Tue, Sep 20, 2005 at 12:23:10PM +0200, Sébastien Dugué wrote:
> > What's the point of calling wake_up_locked(&sem->wait) in
> > aio_down_wait? We're already in a wakeup path and end up
> > calling __wake_up_common recursively.
>
> That's necessary to kick the next semaphore op in the list. The
> list_del_init() right above that makes sure that we don't recurse
> and run the routine again.

OK, understood.

>
> > I think it may be one of the causes of my kernel hanging at the
> > very beginning.
> >
> > When I remove this call things go further, but at some point a
> > semaphore wait queue gets trashed and __wake_up_common tries to
> > call an invalid callback function.
>
> This patch from Zach might make a difference. Let me know if it changes
> the symptoms at all. Sorry if it doesn't apply cleanly, as it is against
> a base kernel. Basically, we could sleep while holding ctx_lock, which
> does Bad Things(tm) on SMP systems.
>

Well, it does not change a thing (I was not expecting it to). I think
the problem lies in the async semaphore (aio_down/aio_up) mechanism
rather than in the fs aio code.

What leads me to this is that the crash occurs only when there is
contention on an inode semaphore, which seems to happen frequently
with pipes and not so frequently with regular file IO.

And as the pipes are the only users (aside from xxx_aio_writev) of
this mechanism, it shows right after the kernel is booted. Without the
90_pipe_aio.diff patch, the kernel boots normally but Oopses during
shutdown, again involving a semaphore operation.

Any ideas?

Sébastien.

--
------------------------------------------------------

Sébastien Dugué BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix

------------------------------------------------------