2019-05-08 02:20:02

by John Stultz

[permalink] [raw]
Subject: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
buffers"), I've been seeing trouble with adb transfers in Android on
HiKey960, HiKey and now Dragonboard 845c.

Sometimes things crash, but often the transfers just stop w/o any
obvious error messages.

Initially I thought it was an issue with the HiKey960 dwc3 usb patches
being upstreamed, and was using the following hack workaround:
https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388

Then dwc2 added sg support, and I ended up having to revert it to get
by on HiKey:
https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848

(See thread here: https://lkml.org/lkml/2019/3/8/765)

And now I've reproduced the same issue (with the same dwc3 workaround)
on the already upstream code for Dragonboard 845c.

Fei Yang has also reached out and mentioned he was seeing similar
problems with the f_fs sg support.

Andrzej: Do you have any ideas or suggestions on this? I'm happy to
test or run any debug patches, if it would help narrow the issue down.

If not, should we consider reverting the f_fs sg support?

thanks
-john


2019-05-08 07:02:52

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi John,

+= Marek

W dniu 08.05.2019 o 04:18, John Stultz pisze:
> Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> buffers"), I've been seeing trouble with adb transfers in Android on
> HiKey960, HiKey and now Dragonboard 845c.
>
> Sometimes things crash, but often the transfers just stop w/o any
> obvious error messages.
>
> Initially I thought it was an issue with the HiKey960 dwc3 usb patches
> being upstreamed, and was using the following hack workaround:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388
>
> Then dwc2 added sg support, and I ended up having to revert it to get
> by on HiKey:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848
>
> (See thread here: https://lkml.org/lkml/2019/3/8/765)
>
> And now I've reproduced the same issue (with the same dwc3 workaround)
> on the already upstream code for Dragonboard 845c.
>
> Fei Yang has also reached out and mentioned he was seeing similar
> problems with the f_fs sg support.
>
> Andrzej: Do you have any ideas or suggestions on this? I'm happy to
> test or run any debug patches, if it would help narrow the issue down.
>

There is a patch:

https://www.spinics.net/lists/linux-usb/msg178536.html
https://www.spinics.net/lists/linux-usb/msg179653.html

which fixes a subtle bug, but hasn't been applied yet.

Can you please try it?

> If not, should we consider reverting the f_fs sg support?
>

Andrzej

2019-05-08 14:41:51

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi John,

W dniu 08.05.2019 o 04:18, John Stultz pisze:
> Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> buffers"), I've been seeing trouble with adb transfers in Android on
> HiKey960, HiKey and now Dragonboard 845c.
>
> Sometimes things crash, but often the transfers just stop w/o any
> obvious error messages.
>

<snip>

>
> Andrzej: Do you have any ideas or suggestions on this? I'm happy to
> test or run any debug patches, if it would help narrow the issue down.
>

Can you please try the below patch?

One more thing to consider is "functionfs read size 512 > requested size 24,
splitting request into multiple reads." in your original report, but let's
try this first:

From f2b8f27cfa42cafe1f56d8abbe2c76fa0072e368 Mon Sep 17 00:00:00 2001
From: Andrzej Pietrasiewicz <[email protected]>
Date: Wed, 8 May 2019 13:52:40 +0200
Subject: [PATCH] usb: gadget: Zero ffs_io_data

In some cases the "Allocate & copy" block in ffs_epfile_io() is not
executed. Consequently, in such a case ffs_alloc_buffer() is never called
and struct ffs_io_data is not initialized properly. This in turn leads to
problems when ffs_free_buffer() is called at the end of ffs_epfile_io().

This patch uses kzalloc() instead of kmalloc() in the aio case and memset()
in non-aio case to properly initialize struct ffs_io_data.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
---
drivers/usb/gadget/function/f_fs.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 47be961f1bf3..41d57ae8bc15 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1182,11 +1182,12 @@ static ssize_t ffs_epfile_write_iter(struct kiocb
*kiocb, struct iov_iter *from)
ENTER();

if (!is_sync_kiocb(kiocb)) {
- p = kmalloc(sizeof(io_data), GFP_KERNEL);
+ p = kzalloc(sizeof(io_data), GFP_KERNEL);
if (unlikely(!p))
return -ENOMEM;
p->aio = true;
} else {
+ memset(p, 0, sizeof(*p));
p->aio = false;
}

@@ -1218,11 +1219,12 @@ static ssize_t ffs_epfile_read_iter(struct kiocb *kiocb,
struct iov_iter *to)
ENTER();

if (!is_sync_kiocb(kiocb)) {
- p = kmalloc(sizeof(io_data), GFP_KERNEL);
+ p = kzalloc(sizeof(io_data), GFP_KERNEL);
if (unlikely(!p))
return -ENOMEM;
p->aio = true;
} else {
+ memset(p, 0, sizeof(*p));
p->aio = false;
}

--
2.17.1

2019-05-08 21:45:21

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Wed, May 8, 2019 at 12:01 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
> W dniu 08.05.2019 o 04:18, John Stultz pisze:
> > Andrzej: Do you have any ideas or suggestions on this? I'm happy to
> > test or run any debug patches, if it would help narrow the issue down.
> >
>
> There is a patch:
>
> https://www.spinics.net/lists/linux-usb/msg178536.html
> https://www.spinics.net/lists/linux-usb/msg179653.html
>
> which fixes a subtle bug, but hasn't been applied yet.
>
> Can you please try it?

I'll give those a whirl, but those both look to be dwc2 specific. I'm
also seeing trouble w/ dwc3 based devices.

thanks
-john

2019-05-08 21:49:12

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Wed, May 8, 2019 at 5:44 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
> W dniu 08.05.2019 o 04:18, John Stultz pisze:
> > Andrzej: Do you have any ideas or suggestions on this? I'm happy to
> > test or run any debug patches, if it would help narrow the issue down.
> >
>
> Can you please try the below patch?
>
> One more thing to consider is "functionfs read size 512 > requested size 24,
> splitting request into multiple reads." in your original report, but let's
> try this first:
>
> From f2b8f27cfa42cafe1f56d8abbe2c76fa0072e368 Mon Sep 17 00:00:00 2001
> From: Andrzej Pietrasiewicz <[email protected]>
> Date: Wed, 8 May 2019 13:52:40 +0200
> Subject: [PATCH] usb: gadget: Zero ffs_io_data
>
> In some cases the "Allocate & copy" block in ffs_epfile_io() is not
> executed. Consequently, in such a case ffs_alloc_buffer() is never called
> and struct ffs_io_data is not initialized properly. This in turn leads to
> problems when ffs_free_buffer() is called at the end of ffs_epfile_io().
>
> This patch uses kzalloc() instead of kmalloc() in the aio case and memset()
> in non-aio case to properly initialize struct ffs_io_data.
>
> Signed-off-by: Andrzej Pietrasiewicz <[email protected]>

Hey Andrzej,
Thanks so much for sending this patch out! I tried it, but on both
HiKey960 and Dragonboard 845c (both dwc3 hardware) I'm still seeing
the same problem with this change.

On db845c "adb logcat" will usually hang mid-way and just sit there.
Further adb shell connections don't connect until I unplug and replug
the usb cable.

On HiKey960 logcat seems to work, but doing something like adb install
<big application> will stall and never complete.

Again, in both cases I'm not getting much in the way of error
messages, so there's not much clue as to whats going wrong.

Let me know if you have any further ideas to try.

thanks
-john

2019-05-09 03:27:23

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Wed, May 8, 2019 at 12:01 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
>
> Hi John,
>
> += Marek
>
> W dniu 08.05.2019 o 04:18, John Stultz pisze:
> > Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> > buffers"), I've been seeing trouble with adb transfers in Android on
> > HiKey960, HiKey and now Dragonboard 845c.
> >
> > Sometimes things crash, but often the transfers just stop w/o any
> > obvious error messages.
> >
> > Initially I thought it was an issue with the HiKey960 dwc3 usb patches
> > being upstreamed, and was using the following hack workaround:
> > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388
> >
> > Then dwc2 added sg support, and I ended up having to revert it to get
> > by on HiKey:
> > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848
> >
> > (See thread here: https://lkml.org/lkml/2019/3/8/765)
> >
> > And now I've reproduced the same issue (with the same dwc3 workaround)
> > on the already upstream code for Dragonboard 845c.
> >
> > Fei Yang has also reached out and mentioned he was seeing similar
> > problems with the f_fs sg support.
> >
> > Andrzej: Do you have any ideas or suggestions on this? I'm happy to
> > test or run any debug patches, if it would help narrow the issue down.
> >
>
> There is a patch:
>
> https://www.spinics.net/lists/linux-usb/msg178536.html
> https://www.spinics.net/lists/linux-usb/msg179653.html
>
> which fixes a subtle bug, but hasn't been applied yet.
>

So, the "fix zlp handling" doesn't seem to changes things on hikey w/
the dwc2 hardware.

I still see the following crash right away:
13.571611] functionfs read size 512 > requested size 24, splitting
request into multiple reads.
[ 13.571773] ------------[ cut here ]------------
[ 13.585205] kernel BUG at mm/slub.c:3944!
[ 13.589215] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 13.594698] Modules linked in:
[ 13.597754] Process adbd (pid: 408, stack limit = 0x000000001b71cb6b)
[ 13.604197] CPU: 0 PID: 408 Comm: adbd Not tainted 5.1.0-06623-g27dc6c9 #168
[ 13.611243] Hardware name: HiKey Development Board (DT)
[ 13.616465] pstate: 40400005 (nZcv daif +PAN -UAO)
[ 13.621266] pc : kfree+0x210/0x258
[ 13.624672] lr : ffs_epfile_io.isra.12+0xf8/0x6b8
[ 13.629371] sp : ffffff8011b63b50
[ 13.632682] x29: ffffff8011b63b50 x28: ffffffc06b918e00
[ 13.637993] x27: ffffffc0703af138 x26: 00000000000001e8
[ 13.643303] x25: ffffff8011b63c98 x24: ffffffc074c3e800
[ 13.648613] x23: ffffffc074affa00 x22: ffffff80114b9000
[ 13.653923] x21: ffffff80108b19b0 x20: ffffff8011c2d000
[ 13.659233] x19: ffffffbf00470b40 x18: 0000000000000000
[ 13.664542] x17: 0000000000000000 x16: ffffffc06b918e00
[ 13.669851] x15: 0000000000000000 x14: 0000000000000000
[ 13.675160] x13: 0000000000000000 x12: 0000000000000000
[ 13.680469] x11: 0000000000000000 x10: 0000000000000000
[ 13.685779] x9 : 0000000000000000 x8 : 00000073d8dd6150
[ 13.691088] x7 : 00000073d8dd6598 x6 : 0000007280805113
[ 13.696398] x5 : 0000007280805113 x4 : 0000000000000000
[ 13.701707] x3 : ffffffc0703af100 x2 : 0000000000000000
[ 13.707020] x1 : ffffffbf00470b48 x0 : ffffffbf00470b48
[ 13.712334] Call trace:
[ 13.714784] kfree+0x210/0x258
[ 13.717837] ffs_epfile_io.isra.12+0xf8/0x6b8
[ 13.722191] ffs_epfile_read_iter+0xb4/0x188
[ 13.726459] new_sync_read+0xf4/0x190
[ 13.730118] __vfs_read+0x2c/0x40
[ 13.733430] vfs_read+0x8c/0x148
[ 13.736654] ksys_read+0x64/0xf0
[ 13.739878] __arm64_sys_read+0x14/0x20
[ 13.743715] el0_svc_common.constprop.0+0xa8/0x100
[ 13.748504] el0_svc_handler+0x28/0x78
[ 13.752250] el0_svc+0x8/0xc
[ 13.755132] Code: f9400260 378000a0 f9400660 37000060 (d4210000)
[ 13.761225] ---[ end trace 0220b13deaa73ab7 ]---
[ 13.783381] Kernel panic - not syncing: Fatal exception
[ 13.788616] SMP: stopping secondary CPUs
[ 13.792814] Kernel Offset: disabled
[ 13.796301] CPU features: 0x002,24002004
[ 13.800219] Memory Limit: none
[ 13.820708] Rebooting in 5 seconds..

thanks
-john

2019-05-09 14:06:52

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi John,
W dniu 08.05.2019 o 04:18, John Stultz pisze:
> Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> buffers"), I've been seeing trouble with adb transfers in Android on
> HiKey960, HiKey and now Dragonboard 845c.
>
> Sometimes things crash, but often the transfers just stop w/o any
> obvious error messages.
>
> Initially I thought it was an issue with the HiKey960 dwc3 usb patches
> being upstreamed, and was using the following hack workaround:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388
>
> Then dwc2 added sg support, and I ended up having to revert it to get
> by on HiKey:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848
>
> (See thread here: https://lkml.org/lkml/2019/3/8/765)

So the thread says there are problems at boot, but here you mention about
adb transfers, which must obviously be happening after the board has booted.
Do you experience problems at boot or not?

If a crash happens, what is in the log?


Andrzej

2019-05-09 18:27:51

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Thu, May 9, 2019 at 7:02 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
>
> Hi John,
> W dniu 08.05.2019 o 04:18, John Stultz pisze:
> > Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> > buffers"), I've been seeing trouble with adb transfers in Android on
> > HiKey960, HiKey and now Dragonboard 845c.
> >
> > Sometimes things crash, but often the transfers just stop w/o any
> > obvious error messages.
> >
> > Initially I thought it was an issue with the HiKey960 dwc3 usb patches
> > being upstreamed, and was using the following hack workaround:
> > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388
> >
> > Then dwc2 added sg support, and I ended up having to revert it to get
> > by on HiKey:
> > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848
> >
> > (See thread here: https://lkml.org/lkml/2019/3/8/765)
>
> So the thread says there are problems at boot, but here you mention about
> adb transfers, which must obviously be happening after the board has booted.
> Do you experience problems at boot or not?
>
> If a crash happens, what is in the log?

So, yes. Sorry, I am maybe muddling two issues (though they both seem
to be tied to f_fs sg). On dwc2, with the current code, we often (but
not always) crash as soon as adb starts up in the boot process. Thus
I'm running with a revert of "usb: dwc2: gadget: Add scatter-gather
mode" to get by.

As for example crashes, there is a crash in the thread linked above
(https://lkml.org/lkml/2019/3/8/765) and also the one I sent yesterday
when testing with your zlp patch. Let me know if you're looking for
something more specific.

One thing I didn't do, but I should is run w/ the zlp + your
memset/kzalloc patch. See if that helps get dwc2 further along at
least. I'll test that shortly here and get back to you.

thanks
-john

2019-05-09 21:25:09

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Thu, May 9, 2019 at 11:25 AM John Stultz <[email protected]> wrote:
>
> On Thu, May 9, 2019 at 7:02 AM Andrzej Pietrasiewicz
> <[email protected]> wrote:
> >
> > Hi John,
> > W dniu 08.05.2019 o 04:18, John Stultz pisze:
> > > Since commit 772a7a724f69 ("usb: gadget: f_fs: Allow scatter-gather
> > > buffers"), I've been seeing trouble with adb transfers in Android on
> > > HiKey960, HiKey and now Dragonboard 845c.
> > >
> > > Sometimes things crash, but often the transfers just stop w/o any
> > > obvious error messages.
> > >
> > > Initially I thought it was an issue with the HiKey960 dwc3 usb patches
> > > being upstreamed, and was using the following hack workaround:
> > > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-5.1&id=dcdadaaec9db7a7b78ea9b838dd1453359a2f388
> > >
> > > Then dwc2 added sg support, and I ended up having to revert it to get
> > > by on HiKey:
> > > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-5.1&id=6e91b4c7bd1e94bdd835263403c53e85a677b848
> > >
> > > (See thread here: https://lkml.org/lkml/2019/3/8/765)
> >
> > So the thread says there are problems at boot, but here you mention about
> > adb transfers, which must obviously be happening after the board has booted.
> > Do you experience problems at boot or not?
> >
> > If a crash happens, what is in the log?
>
> So, yes. Sorry, I am maybe muddling two issues (though they both seem
> to be tied to f_fs sg). On dwc2, with the current code, we often (but
> not always) crash as soon as adb starts up in the boot process. Thus
> I'm running with a revert of "usb: dwc2: gadget: Add scatter-gather
> mode" to get by.
>
> As for example crashes, there is a crash in the thread linked above
> (https://lkml.org/lkml/2019/3/8/765) and also the one I sent yesterday
> when testing with your zlp patch. Let me know if you're looking for
> something more specific.
>
> One thing I didn't do, but I should is run w/ the zlp + your
> memset/kzalloc patch. See if that helps get dwc2 further along at
> least. I'll test that shortly here and get back to you.

Ok. Apologies for earlier confusion.

So the kzalloc/memset fix you sent for f_fs.c does seem to avoid the
crash on bootup I was seeing w/ HiKey/dwc2 (previously I had only
tested it on HiKey960/dwc3).

However with that patch, I still see tranfer problems with adb, unless
I comment out setting sg_supported in dwc2/gadget.c (in the same
fashion I have to with HiKey960/dwc3).

The dwc2 zlp patch doesn't seem to affect things much either way in my
testing. But maybe I'm just not tripping on that issue yet.

So yes, the kzalloc/memset patch is a clear improvement, as it avoids
the bootup crash on dwc2, and seems like it should go in.

However, there is still the outstanding issue w/ functionfs sg
support stalling on larger transfers.

thanks
-john

2019-05-13 14:23:31

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi John,

W dniu 09.05.2019 o 23:23, John Stultz pisze:
> On Thu, May 9, 2019 at 11:25 AM John Stultz <[email protected]> wrote:
>>
>> On Thu, May 9, 2019 at 7:02 AM Andrzej Pietrasiewicz
>> <[email protected]> wrote:
>>>

<snip>

>
> Ok. Apologies for earlier confusion.
>
> So the kzalloc/memset fix you sent for f_fs.c does seem to avoid the
> crash on bootup I was seeing w/ HiKey/dwc2 (previously I had only
> tested it on HiKey960/dwc3).
>
> However with that patch, I still see tranfer problems with adb, unless
> I comment out setting sg_supported in dwc2/gadget.c (in the same
> fashion I have to with HiKey960/dwc3).
>
> The dwc2 zlp patch doesn't seem to affect things much either way in my
> testing. But maybe I'm just not tripping on that issue yet.
>
> So yes, the kzalloc/memset patch is a clear improvement, as it avoids
> the bootup crash on dwc2, and seems like it should go in.
>
> However, there is still the outstanding issue w/ functionfs sg
> support stalling on larger transfers.

Do you get "functionfs read size 512 > requested size 24, splitting
request into multiple reads" message when problems happen?

Is there anything in the kernel log?

I'm unable to reproduce your problems. I thought I was able, but
it was another problem, which is fixed with:

5acb4b970184d189d901192d075997c933b82260
dwc2: gadget: Fix completed transfer size calculation in DDMA

(or you can simply take upstream drivers/usb/dwc2).

Do your problems happen on dwc2 or dwc3?

Is there a way to try your adb without building and running the
whole Android?

Andrzej

2019-05-13 19:48:08

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Mon, May 13, 2019 at 7:08 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
> W dniu 09.05.2019 o 23:23, John Stultz pisze:
> > So yes, the kzalloc/memset patch is a clear improvement, as it avoids
> > the bootup crash on dwc2, and seems like it should go in.
> >
> > However, there is still the outstanding issue w/ functionfs sg
> > support stalling on larger transfers.
>
> Do you get "functionfs read size 512 > requested size 24, splitting
> request into multiple reads" message when problems happen?

Unfortunately no.

> Is there anything in the kernel log?

Nope. Just the transfers stall/hang and further attempts at adb
connections fail until the USB cable is unplugged and replugged.

> I'm unable to reproduce your problems. I thought I was able, but
> it was another problem, which is fixed with:
>
> 5acb4b970184d189d901192d075997c933b82260
> dwc2: gadget: Fix completed transfer size calculation in DDMA
>
> (or you can simply take upstream drivers/usb/dwc2).

Yea, I'm able to test against mainline. I can give this a whirl, but
since it affects multiple drivers, I suspect the trouble is in the
f_fs code, not just dwc2.

> Do your problems happen on dwc2 or dwc3?

The transfer hangs happen on *both* dwc2 and dwc3. And on both we can
use a similar workaround of disabling sg_supported to get by.

https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey-mainline-WIP&id=21dfaac615f1f287377897804cbfe69450d489e3
https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/hikey960-mainline-WIP&id=5b70ec4ae1c06ae700fcca7141130c71e205fa1c


> Is there a way to try your adb without building and running the
> whole Android?

Maybe? I have heard of folks doing it in the past, though I don't
really know a quick path to get there.

Is there anything else I can try for you?

thanks
-john

2019-05-14 10:18:08

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi,

W dniu 13.05.2019 o 20:09, John Stultz pisze:
> On Mon, May 13, 2019 at 7:08 AM Andrzej Pietrasiewicz
> <[email protected]> wrote:

<snip>

>>
>> Do you get "functionfs read size 512 > requested size 24, splitting
>> request into multiple reads" message when problems happen?
>
> Unfortunately no.

Actually that's a fortunate outcome :)

>
>> Is there anything in the kernel log?
>
> Nope. Just the transfers stall/hang and further attempts at adb
> connections fail until the USB cable is unplugged and replugged.
>

<snip>

>
>> Is there a way to try your adb without building and running the
>> whole Android?
>
> Maybe? I have heard of folks doing it in the past, though I don't
> really know a quick path to get there.
>
> Is there anything else I can try for you?

Have you tried compiling FunctionFS with debugging enabled?
You do so bu uncommenting:

/* #define DEBUG */
/* #define VERBOSE_DEBUG */

at the beginning of drivers/usb/gadget/function/f_fs.c

Is there anything suspicious in the kernel log when you run it then?

Remote debugging through this mailing list will incur enormous
round trip time ;) The most valuable help would be helping in
reproducing the problem you encounter.

One question that comes to my mind is this: Does the USB transmission
stall (e.g. endpoint stall) or not? In other words, is adb connection
broken because USB stops transmitting anything, or because the
data is transmitted but its integrity is broken during transmission
and that causes adb/adbd confusion which results in stopping their
operation? Does anything keep happening on FunctionFS when adb
connection is broken?

Andrzej

2019-05-20 13:41:34

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi John,

<snip>

>> Is there anything else I can try for you?
>
> Have you tried compiling FunctionFS with debugging enabled?
> You do so bu uncommenting:
>
> /* #define DEBUG */
> /* #define VERBOSE_DEBUG */
>
> at the beginning of drivers/usb/gadget/function/f_fs.c
>
> Is there anything suspicious in the kernel log when you run it then?
>


<snip>

>
> One question that comes to my mind is this: Does the USB transmission
> stall (e.g. endpoint stall) or not? In other words, is adb connection
> broken because USB stops transmitting anything, or because the
> data is transmitted but its integrity is broken during transmission
> and that causes adb/adbd confusion which results in stopping their
> operation? Does anything keep happening on FunctionFS when adb
> connection is broken?

Any discoveries about the problem?

Andrzej

2019-05-20 18:19:28

by Yang, Fei

[permalink] [raw]
Subject: RE: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

>> One question that comes to my mind is this: Does the USB transmission
>> stall (e.g. endpoint stall) or not? In other words, is adb connection
>> broken because USB stops transmitting anything, or because the data is
>> transmitted but its integrity is broken during transmission and that
>> causes adb/adbd confusion which results in stopping their operation?
>> Does anything keep happening on FunctionFS when adb connection is
>> broken?
>
>Any discoveries about the problem?

In my debugging, I'm seeing a lot of requests queued up through ffs_epfile_io (returning -EIOCBQUEUED), but
only a few of them came back through ffs_epfile_async_io_complete -> ffs_user_copy_worker.
I don’t think there is a USB transmission stall though, because if I manually disable io_data->use_sg, everything
goes back to normal. So it looks more likely to be a buffer handling problem in the DWC3 driver.

-Fei

>
>Andrzej

2019-05-20 18:36:04

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Mon, May 20, 2019 at 9:23 AM Yang, Fei <[email protected]> wrote:
>
> >> One question that comes to my mind is this: Does the USB transmission
> >> stall (e.g. endpoint stall) or not? In other words, is adb connection
> >> broken because USB stops transmitting anything, or because the data is
> >> transmitted but its integrity is broken during transmission and that
> >> causes adb/adbd confusion which results in stopping their operation?
> >> Does anything keep happening on FunctionFS when adb connection is
> >> broken?
> >
> >Any discoveries about the problem?
>
> In my debugging, I'm seeing a lot of requests queued up through ffs_epfile_io (returning -EIOCBQUEUED), but
> only a few of them came back through ffs_epfile_async_io_complete -> ffs_user_copy_worker.
> I don’t think there is a USB transmission stall though, because if I manually disable io_data->use_sg, everything
> goes back to normal. So it looks more likely to be a buffer handling problem in the DWC3 driver.

Yea, I also did reconfirm that reverting 772a7a724f6, or setting
gadget->sg_supported to false makes the isssue go away.

And after spending a bunch of time trying to trace through the code
last week, in particular the sg_supported checks, but I'm not seeing
anything that is standing out with the f_fs logic.

I'd start to agree it might be a buffer handling problem in dwc3, but
it feels odd that I'm also seeing this w/ dwc2 hardware as well. Maybe
the same bug was copied into both drivers?

I'll try to dig a little on that theory today.

thanks
-john

2019-05-20 21:53:29

by Yang, Fei

[permalink] [raw]
Subject: RE: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

>>>> One question that comes to my mind is this: Does the USB
>>>> transmission stall (e.g. endpoint stall) or not? In other words, is
>>>> adb connection broken because USB stops transmitting anything, or
>>>> because the data is transmitted but its integrity is broken during
>>>> transmission and that causes adb/adbd confusion which results in stopping their operation?
>>>> Does anything keep happening on FunctionFS when adb connection is
>>>> broken?
>>>
>>>Any discoveries about the problem?
>>
>> In my debugging, I'm seeing a lot of requests queued up through
>> ffs_epfile_io (returning -EIOCBQUEUED), but only a few of them came back through ffs_epfile_async_io_complete -> ffs_user_copy_worker.
>> I don’t think there is a USB transmission stall though, because if I
>> manually disable io_data->use_sg, everything goes back to normal. So it looks more likely to be a buffer handling problem in the DWC3 driver.
>
> Yea, I also did reconfirm that reverting 772a7a724f6, or setting
> gadget->sg_supported to false makes the isssue go away.
>
> And after spending a bunch of time trying to trace through the code last week, in particular the sg_supported checks, but I'm not seeing anything that is standing out with the f_fs logic.
>
> I'd start to agree it might be a buffer handling problem in dwc3, but it feels odd that I'm also seeing this w/ dwc2 hardware as well. Maybe the same bug was copied into both drivers?
>
> I'll try to dig a little on that theory today.

One of the problems appears to be that req->num_mapped_sgs was left uninitialized. I made the following change and got a lot more requests completed.
However this change is not sufficient to solve the adb issue, the usb requests would eventually get stuck without getting a matching ffs_epfile_async_io_complete.

@@ -1067,6 +1067,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
req->buf = NULL;
req->sg = io_data->sgt.sgl;
req->num_sgs = io_data->sgt.nents;
+ req->num_mapped_sgs = req->num_sgs;
} else {
req->buf = data;
}
@@ -1110,6 +1111,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
req->buf = NULL;
req->sg = io_data->sgt.sgl;
req->num_sgs = io_data->sgt.nents;
+ req->num_mapped_sgs = req->num_sgs;
} else {
req->buf = data;
}

2019-05-21 10:06:33

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

Hi,

W dniu 20.05.2019 o 23:52, Yang, Fei pisze:
>>>>> One question that comes to my mind is this: Does the USB

<snip>

>
> One of the problems appears to be that req->num_mapped_sgs was left uninitialized. I made the following change and got a lot more requests completed.
> However this change is not sufficient to solve the adb issue, the usb requests would eventually get stuck without getting a matching ffs_epfile_async_io_complete.
>
> @@ -1067,6 +1067,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
> req->buf = NULL;
> req->sg = io_data->sgt.sgl;
> req->num_sgs = io_data->sgt.nents;
> + req->num_mapped_sgs = req->num_sgs;
> } else {
> req->buf = data;
> }
> @@ -1110,6 +1111,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
> req->buf = NULL;
> req->sg = io_data->sgt.sgl;
> req->num_sgs = io_data->sgt.nents;
> + req->num_mapped_sgs = req->num_sgs;
> } else {
> req->buf = data;
> }
>

Isn't num_mapped_sgs meant to be set by drivers/usb/gadget/udc/core.c?

And the comment in include/linux/usb/gadget.h says "internal".

One thing that becomes evident now is that adb uses async io.
It seems that interaction of async io and s-g mode should be further
investigated.

Andrzej

2019-05-21 15:35:56

by John Stultz

[permalink] [raw]
Subject: Re: [REGRESSION] usb: gadget: f_fs: Allow scatter-gather buffers

On Tue, May 21, 2019 at 3:04 AM Andrzej Pietrasiewicz
<[email protected]> wrote:
>
> One thing that becomes evident now is that adb uses async io.
> It seems that interaction of async io and s-g mode should be further
> investigated.

Although on our devices, we have async io disabled w/ adb, as there is
an existing/separate issue w/ the dwc3 driver in 4.19 and older
kernels. So it seems its not strictly connected to async io.

thanks
-john