2015-06-09 01:15:02

by John Stultz

[permalink] [raw]
Subject: Locking issues w/ functionfs gadget and aio?

After setting up functionfs for adb w/ 4.1-rc7, I noticed some flakey behavior.
I enabled some lock debugging and got the following:

[ 91.648093] read strings
[ 91.650264] g_ffs gadget: g_ffs ready
[ 91.652551] ci_hdrc ci_hdrc.0: CI_HDRC_CONTROLLER_RESET_EVENT received
[ 96.068693] BUG: spinlock lockup suspected on CPU#0, adbd/2791
[ 96.068751] lock: 0xe7764880, .magic: e7764880, .owner: <none>/-1,
.owner_cpu: -407539900
[ 96.073448] CPU: 0 PID: 2791 Comm: adbd Not tainted
4.1.0-rc1-00032-g359b12f #147
[ 96.081688] Hardware name: Qualcomm (Flattened Device Tree)
[ 96.089266] [<c0216ac8>] (unwind_backtrace) from [<c02136a8>]
(show_stack+0x10/0x14)
[ 96.094635] [<c02136a8>] (show_stack) from [<c075d9fc>]
(dump_stack+0x70/0xbc)
[ 96.102627] [<c075d9fc>] (dump_stack) from [<c026ef90>]
(do_raw_spin_lock+0x114/0x1a0)
[ 96.109661] [<c026ef90>] (do_raw_spin_lock) from [<c0764cb8>]
(_raw_spin_lock_irqsave+0x50/0x5c)
[ 96.117560] [<c0764cb8>] (_raw_spin_lock_irqsave) from [<c037c1a0>]
(kiocb_set_cancel_fn+0x1c/0x60)
[ 96.126519] [<c037c1a0>] (kiocb_set_cancel_fn) from [<c05ae568>]
(ffs_epfile_read_iter+0x8c/0x140)
[ 96.135289] [<c05ae568>] (ffs_epfile_read_iter) from [<c0332018>]
(__vfs_read+0xb0/0xd4)
[ 96.144290] [<c0332018>] (__vfs_read) from [<c0332ef8>] (vfs_read+0x7c/0x100)
[ 96.152535] [<c0332ef8>] (vfs_read) from [<c0332fbc>] (SyS_read+0x40/0x8c)
[ 96.159571] [<c0332fbc>] (SyS_read) from [<c020ff20>]
(ret_fast_syscall+0x0/0x4c)
[ 117.678633] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 117.683069] 0: (1 GPs behind) idle=805/140000000000000/0
softirq=7187/7189 fqs=2601
[ 117.683316] (detected by 3, t=2603 jiffies, g=3028, c=3027, q=474)
[ 117.692345] Task dump for CPU 0:
[ 117.697202] adbd R running 0 2791 1 0x00000002
[ 117.704296] [<c075f234>] (__schedule) from [<ffffffff>] (0xffffffff)


It seems we're stuck on the kioctx.ctx_lock, and reviewing that lock
usage I don't see any problems in fs/aio.c right off.

So I'm guessing the f_fs.c code is somehow not initializing the lock
structure, or maybe calling kiocb_set_cancel_fn() from the wrong
context?

Anyway, I was curious if anyone else has seen similar issues or had
suggestions for further debugging. I've seen this issue as well with
4.1-rc1.

thanks
-john


2015-06-13 00:51:16

by John Stultz

[permalink] [raw]
Subject: Re: Locking issues w/ functionfs gadget and aio?

On Mon, Jun 8, 2015 at 6:14 PM, John Stultz <[email protected]> wrote:
> After setting up functionfs for adb w/ 4.1-rc7, I noticed some flakey behavior.
> I enabled some lock debugging and got the following:
>
> [ 91.648093] read strings
> [ 91.650264] g_ffs gadget: g_ffs ready
> [ 91.652551] ci_hdrc ci_hdrc.0: CI_HDRC_CONTROLLER_RESET_EVENT received
> [ 96.068693] BUG: spinlock lockup suspected on CPU#0, adbd/2791
> [ 96.068751] lock: 0xe7764880, .magic: e7764880, .owner: <none>/-1,
> .owner_cpu: -407539900
> [ 96.073448] CPU: 0 PID: 2791 Comm: adbd Not tainted
> 4.1.0-rc1-00032-g359b12f #147
> [ 96.081688] Hardware name: Qualcomm (Flattened Device Tree)
> [ 96.089266] [<c0216ac8>] (unwind_backtrace) from [<c02136a8>]
> (show_stack+0x10/0x14)
> [ 96.094635] [<c02136a8>] (show_stack) from [<c075d9fc>]
> (dump_stack+0x70/0xbc)
> [ 96.102627] [<c075d9fc>] (dump_stack) from [<c026ef90>]
> (do_raw_spin_lock+0x114/0x1a0)
> [ 96.109661] [<c026ef90>] (do_raw_spin_lock) from [<c0764cb8>]
> (_raw_spin_lock_irqsave+0x50/0x5c)
> [ 96.117560] [<c0764cb8>] (_raw_spin_lock_irqsave) from [<c037c1a0>]
> (kiocb_set_cancel_fn+0x1c/0x60)
> [ 96.126519] [<c037c1a0>] (kiocb_set_cancel_fn) from [<c05ae568>]
> (ffs_epfile_read_iter+0x8c/0x140)
> [ 96.135289] [<c05ae568>] (ffs_epfile_read_iter) from [<c0332018>]
> (__vfs_read+0xb0/0xd4)
> [ 96.144290] [<c0332018>] (__vfs_read) from [<c0332ef8>] (vfs_read+0x7c/0x100)
> [ 96.152535] [<c0332ef8>] (vfs_read) from [<c0332fbc>] (SyS_read+0x40/0x8c)
> [ 96.159571] [<c0332fbc>] (SyS_read) from [<c020ff20>]
> (ret_fast_syscall+0x0/0x4c)
> [ 117.678633] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 117.683069] 0: (1 GPs behind) idle=805/140000000000000/0
> softirq=7187/7189 fqs=2601
> [ 117.683316] (detected by 3, t=2603 jiffies, g=3028, c=3027, q=474)
> [ 117.692345] Task dump for CPU 0:
> [ 117.697202] adbd R running 0 2791 1 0x00000002
> [ 117.704296] [<c075f234>] (__schedule) from [<ffffffff>] (0xffffffff)
>
>
> It seems we're stuck on the kioctx.ctx_lock, and reviewing that lock
> usage I don't see any problems in fs/aio.c right off.
>
> So I'm guessing the f_fs.c code is somehow not initializing the lock
> structure, or maybe calling kiocb_set_cancel_fn() from the wrong
> context?

So looking further at this, it seems that the __vfs_read() calls
new_sync_read(), which allocates a struct kiocb kiocb on the stack and
passes it to the ffs_epfile_read_iter() funciton. That then calls
kiocb_set_cancel_fn() passing a pointer to that kiocb. However,
kiocb_set_cancel_fn assumes the kiocb is a sub-element of a struct
aio_kiocb, and it tries to grab the kioctx from that parent structure.
However it seems there is no aio_kiocb structure here, so the
spin_lock_irqsave hangs trying to lock random data on the stack.

I'm not super sure what the right fix is, but if do something like the
following (sorry, whitespace corrupted via copy/paste), I don't seem
to run into the problem.

diff --git a/drivers/usb/gadget/function/f_fs.c
b/drivers/usb/gadget/function/f_fs.c
index 3507f88..e322818 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -924,7 +924,8 @@ static ssize_t ffs_epfile_write_iter(struct kiocb
*kiocb, struct iov_iter *from)

kiocb->private = p;

- kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
+ if (p->aio)
+ kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);

res = ffs_epfile_io(kiocb->ki_filp, p);
if (res == -EIOCBQUEUED)
@@ -968,7 +969,8 @@ static ssize_t ffs_epfile_read_iter(struct kiocb
*kiocb, struct iov_iter *to)

kiocb->private = p;

- kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
+ if(p->aio)
+ kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);

res = ffs_epfile_io(kiocb->ki_filp, p);
if (res == -EIOCBQUEUED)


Is there a better solution here? I'm not sure I see if the
is_sync_kiocb(kiocb) check would ever be false from here, so I'm not
sure if all the p->aio checking is really needed or not.

This issue seems to have been introduced with 70e60d917e91fff
(gadget/function/f_fs.c: switch to ->{read,write}_iter())

thanks
-john

2015-06-21 05:24:48

by Al Viro

[permalink] [raw]
Subject: Re: Locking issues w/ functionfs gadget and aio?

On Fri, Jun 12, 2015 at 05:51:12PM -0700, John Stultz wrote:

> I'm not super sure what the right fix is, but if do something like the
> following (sorry, whitespace corrupted via copy/paste), I don't seem
> to run into the problem.

Looks sane. Which tree would you prefer it to go through, vfs or usb?
BTW, in either case you'd need Signed-off-by: on that patch...

> diff --git a/drivers/usb/gadget/function/f_fs.c
> b/drivers/usb/gadget/function/f_fs.c
> index 3507f88..e322818 100644
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -924,7 +924,8 @@ static ssize_t ffs_epfile_write_iter(struct kiocb
> *kiocb, struct iov_iter *from)
>
> kiocb->private = p;
>
> - kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
> + if (p->aio)
> + kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
>
> res = ffs_epfile_io(kiocb->ki_filp, p);
> if (res == -EIOCBQUEUED)
> @@ -968,7 +969,8 @@ static ssize_t ffs_epfile_read_iter(struct kiocb
> *kiocb, struct iov_iter *to)
>
> kiocb->private = p;
>
> - kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
> + if(p->aio)
> + kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
>
> res = ffs_epfile_io(kiocb->ki_filp, p);
> if (res == -EIOCBQUEUED)
>
>
> Is there a better solution here? I'm not sure I see if the
> is_sync_kiocb(kiocb) check would ever be false from here, so I'm not
> sure if all the p->aio checking is really needed or not.
>
> This issue seems to have been introduced with 70e60d917e91fff
> (gadget/function/f_fs.c: switch to ->{read,write}_iter())
>
> thanks
> -john
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-06-22 17:20:59

by John Stultz

[permalink] [raw]
Subject: Re: Locking issues w/ functionfs gadget and aio?

On Sat, Jun 20, 2015 at 10:24 PM, Al Viro <[email protected]> wrote:
> On Fri, Jun 12, 2015 at 05:51:12PM -0700, John Stultz wrote:
>
>> I'm not super sure what the right fix is, but if do something like the
>> following (sorry, whitespace corrupted via copy/paste), I don't seem
>> to run into the problem.
>
> Looks sane. Which tree would you prefer it to go through, vfs or usb?
> BTW, in either case you'd need Signed-off-by: on that patch...

Heh. I assumed my hack would be the wrong thing, but if there's no
better suggestion, I'm happy to submit it.

I have no strong preference of which tree it goes through, but I'd
like for Andrzej to weigh in to make sure he agrees.

I'll write up a proper changelog and resubmit here shortly.

thanks for the review!
-john

2015-07-01 10:00:20

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: Locking issues w/ functionfs gadget and aio?

W dniu 22.06.2015 o 19:20, John Stultz pisze:
> On Sat, Jun 20, 2015 at 10:24 PM, Al Viro <[email protected]> wrote:
>> On Fri, Jun 12, 2015 at 05:51:12PM -0700, John Stultz wrote:
>>
>>> I'm not super sure what the right fix is, but if do something like the
>>> following (sorry, whitespace corrupted via copy/paste), I don't seem
>>> to run into the problem.
>>
>> Looks sane. Which tree would you prefer it to go through, vfs or usb?
>> BTW, in either case you'd need Signed-off-by: on that patch...
>
> Heh. I assumed my hack would be the wrong thing, but if there's no
> better suggestion, I'm happy to submit it.
>
> I have no strong preference of which tree it goes through, but I'd
> like for Andrzej to weigh in to make sure he agrees.
>

I have no preference of which tree it goes through either.

As far as the changes themselves are concerned, they have
been reviewed by Robert, whose desk is next to mine and
I trust his opinion.

Thanks,

AP