2023-10-06 14:13:02

by Hardik Gajjar

[permalink] [raw]
Subject: [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

From: gah2hi <[email protected]>

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: gah2hi <[email protected]>
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)

DBG(dev, "%s\n", __func__);

- netif_stop_queue(dev->net);
+ netif_device_detach(dev->net);
netif_carrier_off(dev->net);

/* disable endpoints, forcing (synchronous) completion
--
2.17.1


2023-10-06 14:21:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Fri, Oct 06, 2023 at 04:12:31PM +0200, Hardik Gajjar wrote:
> From: gah2hi <[email protected]>
>
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
>
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.
>
> Signed-off-by: gah2hi <[email protected]>
> ---
> drivers/usb/gadget/function/u_ether.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
> index 4bb0553da658..9d1c40c152d8 100644
> --- a/drivers/usb/gadget/function/u_ether.c
> +++ b/drivers/usb/gadget/function/u_ether.c
> @@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
>
> DBG(dev, "%s\n", __func__);
>
> - netif_stop_queue(dev->net);
> + netif_device_detach(dev->net);
> netif_carrier_off(dev->net);
>
> /* disable endpoints, forcing (synchronous) completion
> --
> 2.17.1
>
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- It looks like you did not use your "real" name for the patch on either
the Signed-off-by: line, or the From: line (both of which have to
match). Please read the kernel file,
Documentation/process/submitting-patches.rst for how to do this
correctly.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

2023-10-06 14:52:49

by Hardik Gajjar

[permalink] [raw]
Subject: [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

From: gah2hi <[email protected]>

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: gah2hi <[email protected]>
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)

DBG(dev, "%s\n", __func__);

- netif_stop_queue(dev->net);
+ netif_device_detach(dev->net);
netif_carrier_off(dev->net);

/* disable endpoints, forcing (synchronous) completion
--
2.17.1

2023-10-06 14:53:54

by Hardik Gajjar

[permalink] [raw]
Subject: [PATCH v2] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Changes since version 1:
- Correct Singed-off user name and e-mail

Signed-off-by: Hardik Gajjar <[email protected]>
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..b0daee35b996 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -635,7 +635,7 @@ static int eth_stop(struct net_device *net)
unsigned long flags;

VDBG(dev, "%s\n", __func__);
- netif_stop_queue(net);
+ netif_device_detach(dev->net);

DBG(dev, "stop stats: rx/tx %ld/%ld, errs %ld/%ld\n",
dev->net->stats.rx_packets, dev->net->stats.tx_packets,
--
2.17.1

2023-10-06 14:59:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Fri, Oct 06, 2023 at 04:53:32PM +0200, Hardik Gajjar wrote:
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
>
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.
>
> Changes since version 1:
> - Correct Singed-off user name and e-mail

Nit, this goes below the --- line :(

2023-10-06 15:38:38

by Hardik Gajjar

[permalink] [raw]
Subject: [PATCH v3] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: Hardik Gajjar <[email protected]>
---
Changes since version 1:
- Correct Singed-off user name and e-mail

Changes since version 2:
- Move change history below signed-off-by
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..b0daee35b996 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -635,7 +635,7 @@ static int eth_stop(struct net_device *net)
unsigned long flags;

VDBG(dev, "%s\n", __func__);
- netif_stop_queue(net);
+ netif_device_detach(dev->net);

DBG(dev, "stop stats: rx/tx %ld/%ld, errs %ld/%ld\n",
dev->net->stats.rx_packets, dev->net->stats.tx_packets,
--
2.17.1

2023-10-06 15:57:09

by Hardik Gajjar

[permalink] [raw]
Subject: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: Hardik Gajjar <[email protected]>
---
Changes since version 1:
- Correct Singed-off user name and e-mail

Changes since version 2:
- Move change history below signed-off-by

Changes since version 3:
- move netif_device_detach from eth_stop to gether_disconnect.
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)

DBG(dev, "%s\n", __func__);

- netif_stop_queue(dev->net);
+ netif_device_detach(dev->net);
netif_carrier_off(dev->net);

/* disable endpoints, forcing (synchronous) completion
--
2.17.1

2024-01-14 17:06:43

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

+Cc: Ferry.

On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
>
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.

This change effectively broke my USB ether setup.

git bisect start
# status: waiting for both good and bad commits
# good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
# status: waiting for bad commit, 1 good commit known
# bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
# bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
# good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
# good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
# bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
# bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
# good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
git bisect good 97475763484245916735a1aa9a3310a01d46b008
# good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
# first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Note, revert indeed helps. Should I send a revert?

I use configfs to setup USB EEM function and it worked till this commit.
If needed, I can share my scripts, but I believe it's not needed as here
we see a clear regression.

--
With Best Regards,
Andy Shevchenko



2024-01-15 13:27:51

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> +Cc: Ferry.
>
> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > This patch replaces the usage of netif_stop_queue with netif_device_detach
> > in the u_ether driver. The netif_device_detach function not only stops all
> > tx queues by calling netif_tx_stop_all_queues but also marks the device as
> > removed by clearing the __LINK_STATE_PRESENT bit.
> >
> > This change helps notify user space about the disconnection of the device
> > more effectively, compared to netif_stop_queue, which only stops a single
> > transmit queue.
>
> This change effectively broke my USB ether setup.
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> # status: waiting for bad commit, 1 good commit known
> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
> git bisect good 97475763484245916735a1aa9a3310a01d46b008
> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>
> Note, revert indeed helps. Should I send a revert?
>
> I use configfs to setup USB EEM function and it worked till this commit.
> If needed, I can share my scripts, but I believe it's not needed as here
> we see a clear regression.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link.

Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions?
Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue.

https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de

Thanks,
Hardik

2024-01-15 20:11:16

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>> +Cc: Ferry.
>>
>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>> This patch replaces the usage of netif_stop_queue with netif_device_detach
>>> in the u_ether driver. The netif_device_detach function not only stops all
>>> tx queues by calling netif_tx_stop_all_queues but also marks the device as
>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>
>>> This change helps notify user space about the disconnection of the device
>>> more effectively, compared to netif_stop_queue, which only stops a single
>>> transmit queue.
>>
>> This change effectively broke my USB ether setup.
>>
>> git bisect start
>> # status: waiting for both good and bad commits
>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>> # status: waiting for bad commit, 1 good commit known
>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>>
>> Note, revert indeed helps. Should I send a revert?
>>
>> I use configfs to setup USB EEM function and it worked till this commit.
>> If needed, I can share my scripts, but I believe it's not needed as here
>> we see a clear regression.
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
>>
>>
>
> Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link.
>
> Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions?

In my case on mrfld (Intel Edison Arduino) using configfs with this
patch no config from host through dhcp is received. Manual setting
correct ipv4 addr / mask / gw still no connection.

> Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue.
>
> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de

This patch works for me with v6.7.0.

> Thanks,
> Hardik


2024-04-03 21:03:38

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 15-01-2024 om 21:10 schreef Ferry Toth:
> Hi,
>
> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>> +Cc: Ferry.
>>>
>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>> This patch replaces the usage of netif_stop_queue with
>>>> netif_device_detach
>>>> in the u_ether driver. The netif_device_detach function not only
>>>> stops all
>>>> tx queues by calling netif_tx_stop_all_queues but also marks the
>>>> device as
>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>
>>>> This change helps notify user space about the disconnection of the
>>>> device
>>>> more effectively, compared to netif_stop_queue, which only stops a
>>>> single
>>>> transmit queue.
>>>
>>> This change effectively broke my USB ether setup.
>>>
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
>>> 'tty-6.7-rc1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
>>> 'usb-6.7-rc1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
>>> wrong data added to platform device
>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk:
>>> add a bandwidth budget table
>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
>>> gadget: uvc: cleanup request when not in correct state"
>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
>>> tps6598x: Refactor tps6598x port registration
>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix
>>> stub_dev hub disconnect
>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix
>>> a type promotion bug
>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb:
>>> gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>>>
>>> Note, revert indeed helps. Should I send a revert?
>>>
>>> I use configfs to setup USB EEM function and it worked till this
>>> commit.
>>> If needed, I can share my scripts, but I believe it's not needed as
>>> here
>>> we see a clear regression.
>>>
>>> --
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
>>
>> Without this patch, there may be a potential crash in a race
>> condition, as __LINK_STATE_PRESENT is monitored at many places in the
>> Network stack to determine the status of the link.
>>
>> Could you please provide details on how this patch affects your
>> functionality? Are you experiencing connection problems or data
>> transfer interruptions?
>
> In my case on mrfld (Intel Edison Arduino) using configfs with this
> patch no config from host through dhcp is received. Manual setting
> correct ipv4 addr / mask / gw still no connection.
>
>> Instead of reverting this patch, consider trying the upcoming patch
>> (soon to be available in the mainline) to see if it resolves your issue.
>>
>> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de
>>
>
> This patch works for me with v6.7.0.

I need to revisit this. The patch in this topic landed in v6.7.0-rc1
(f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and
other platforms as well.

The mentioned fix "usb: gadget: u_ether: Re-attach netif device to
mirror detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does
fix: I am able to make a USB EEM function again.

However, now a hidden issue appears. With mrfld there is an external
switch to easily switch between host and device mode.

What is not fixed:

- when in device mode and unplugging/plugging the cable when using
`ifconfig usb0` the line "usb0:
flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to "usb0:
flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route table
is updated and the dir `/sys/class/net/usb0` exists and in the dir `cat
carrier*` shows the carrier up and down counts. This is the expected
behavior.

- when in device mode and switching to host mode `ifconfig usb0`
continues to show "RUNNING", the route table is not modified and the dir
`/sys/class/net/usb0` no longer exists.

- switching to device mode again, USB EEM works fine, no changes to
RUNNING or the route table happen and the dir `/sys/class/net/usb0`
still is non- existing.

- unplugging/plugging the cable in device mode after this does not
restore the original situation.

This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
patches on top) and bisected back to this patch in v6.70-rc1.

It seems `netif_device_detach` does not completely clean up as expected
and `netif_device_attach` does not completely rebuild.

I am wondering if on other platforms this can be reproduced? If so, inho
it would be best to revert the both patches until the issue is resolved.

Thanks,

Ferry

>> Thanks,
>> Hardik
>

2024-04-05 11:44:47

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> Hi,
>
> Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > Hi,
> >
> > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > +Cc: Ferry.
> > > >
> > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > This patch replaces the usage of netif_stop_queue with
> > > > > netif_device_detach
> > > > > in the u_ether driver. The netif_device_detach function not
> > > > > only stops all
> > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > the device as
> > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > >
> > > > > This change helps notify user space about the disconnection
> > > > > of the device
> > > > > more effectively, compared to netif_stop_queue, which only
> > > > > stops a single
> > > > > transmit queue.
> > > >
> > > > This change effectively broke my USB ether setup.
> > > >
> > > > git bisect start
> > > > # status: waiting for both good and bad commits
> > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > 'tty-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > # status: waiting for bad commit, 1 good commit known
> > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > 'usb-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > wrong data added to platform device
> > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > xhci-mtk: add a bandwidth budget table
> > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > gadget: uvc: cleanup request when not in correct state"
> > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > tps6598x: Refactor tps6598x port registration
> > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > fix stub_dev hub disconnect
> > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > fix a type promotion bug
> > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > netif_device_detach
> > > >
> > > > Note, revert indeed helps. Should I send a revert?
> > > >
> > > > I use configfs to setup USB EEM function and it worked till this
> > > > commit.
> > > > If needed, I can share my scripts, but I believe it's not needed
> > > > as here
> > > > we see a clear regression.
> > > >
> > > > --
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > >
> > > >
> > >
> > > Without this patch, there may be a potential crash in a race
> > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > the Network stack to determine the status of the link.
> > >
> > > Could you please provide details on how this patch affects your
> > > functionality? Are you experiencing connection problems or data
> > > transfer interruptions?
> >
> > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > patch no config from host through dhcp is received. Manual setting
> > correct ipv4 addr / mask / gw still no connection.
> >
> > > Instead of reverting this patch, consider trying the upcoming patch
> > > (soon to be available in the mainline) to see if it resolves your
> > > issue.
> > >
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > >
> >
> > This patch works for me with v6.7.0.
>
> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> platforms as well.
>
> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> able to make a USB EEM function again.
>
> However, now a hidden issue appears. With mrfld there is an external switch
> to easily switch between host and device mode.
>
> What is not fixed:
>
> - when in device mode and unplugging/plugging the cable when using `ifconfig
> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> `cat carrier*` shows the carrier up and down counts. This is the expected
> behavior.
>
> - when in device mode and switching to host mode `ifconfig usb0` continues
> to show "RUNNING", the route table is not modified and the dir
> `/sys/class/net/usb0` no longer exists.
>
> - switching to device mode again, USB EEM works fine, no changes to RUNNING
> or the route table happen and the dir `/sys/class/net/usb0` still is non-
> existing.
>
> - unplugging/plugging the cable in device mode after this does not restore
> the original situation.
>
> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> patches on top) and bisected back to this patch in v6.70-rc1.
>
> It seems `netif_device_detach` does not completely clean up as expected and
> `netif_device_attach` does not completely rebuild.
>
> I am wondering if on other platforms this can be reproduced? If so, inho it
> would be best to revert the both patches until the issue is resolved.
>
> Thanks,
>
> Ferry

I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
This behavior seems not correct.

The gether_cleanup function should remove the interface along with the associated kobject,
and this function should only be called during the unloading of the driver or deleting the gadget.
Could you please confirm whether any of your custom modifications are removing the net interface kobject?

>
> > > Thanks,
> > > Hardik
> >

2024-04-07 20:52:09

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi

Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 15-01-2024 om 21:10 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>>>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>>>> +Cc: Ferry.
>>>>>
>>>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>>>> This patch replaces the usage of netif_stop_queue with
>>>>>> netif_device_detach
>>>>>> in the u_ether driver. The netif_device_detach function not
>>>>>> only stops all
>>>>>> tx queues by calling netif_tx_stop_all_queues but also marks
>>>>>> the device as
>>>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>>>
>>>>>> This change helps notify user space about the disconnection
>>>>>> of the device
>>>>>> more effectively, compared to netif_stop_queue, which only
>>>>>> stops a single
>>>>>> transmit queue.
>>>>> This change effectively broke my USB ether setup.
>>>>>
>>>>> git bisect start
>>>>> # status: waiting for both good and bad commits
>>>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
>>>>> 'tty-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>>>> # status: waiting for bad commit, 1 good commit known
>>>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
>>>>> 'usb-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
>>>>> wrong data added to platform device
>>>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
>>>>> xhci-mtk: add a bandwidth budget table
>>>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
>>>>> gadget: uvc: cleanup request when not in correct state"
>>>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
>>>>> tps6598x: Refactor tps6598x port registration
>>>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
>>>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
>>>>> fix stub_dev hub disconnect
>>>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
>>>>> fix a type promotion bug
>>>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
>>>>> usb: gadget: u_ether: Replace netif_stop_queue with
>>>>> netif_device_detach
>>>>>
>>>>> Note, revert indeed helps. Should I send a revert?
>>>>>
>>>>> I use configfs to setup USB EEM function and it worked till this
>>>>> commit.
>>>>> If needed, I can share my scripts, but I believe it's not needed
>>>>> as here
>>>>> we see a clear regression.
>>>>>
>>>>> --
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>> Without this patch, there may be a potential crash in a race
>>>> condition, as __LINK_STATE_PRESENT is monitored at many places in
>>>> the Network stack to determine the status of the link.
>>>>
>>>> Could you please provide details on how this patch affects your
>>>> functionality? Are you experiencing connection problems or data
>>>> transfer interruptions?
>>> In my case on mrfld (Intel Edison Arduino) using configfs with this
>>> patch no config from host through dhcp is received. Manual setting
>>> correct ipv4 addr / mask / gw still no connection.
>>>
>>>> Instead of reverting this patch, consider trying the upcoming patch
>>>> (soon to be available in the mainline) to see if it resolves your
>>>> issue.
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
>>>>
>>> This patch works for me with v6.7.0.
>> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
>> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
>> platforms as well.
>>
>> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
>> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
>> able to make a USB EEM function again.
>>
>> However, now a hidden issue appears. With mrfld there is an external switch
>> to easily switch between host and device mode.
>>
>> What is not fixed:
>>
>> - when in device mode and unplugging/plugging the cable when using `ifconfig
>> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
>> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
>> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
>> `cat carrier*` shows the carrier up and down counts. This is the expected
>> behavior.
>>
>> - when in device mode and switching to host mode `ifconfig usb0` continues
>> to show "RUNNING", the route table is not modified and the dir
>> `/sys/class/net/usb0` no longer exists.
>>
>> - switching to device mode again, USB EEM works fine, no changes to RUNNING
>> or the route table happen and the dir `/sys/class/net/usb0` still is non-
>> existing.
>>
>> - unplugging/plugging the cable in device mode after this does not restore
>> the original situation.
>>
>> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
>> patches on top) and bisected back to this patch in v6.70-rc1.
>>
>> It seems `netif_device_detach` does not completely clean up as expected and
>> `netif_device_attach` does not completely rebuild.
>>
>> I am wondering if on other platforms this can be reproduced? If so, inho it
>> would be best to revert the both patches until the issue is resolved.
>>
>> Thanks,
>>
>> Ferry
> I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> This behavior seems not correct.

Exactly. And this didn't happen before the 2 patches.

To be precise: /sys/class/net/usb0 is not removed and it is a link, the
link target
/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
longer exists

> The gether_cleanup function should remove the interface along with the associated kobject,
> and this function should only be called during the unloading of the driver or deleting the gadget.
> Could you please confirm whether any of your custom modifications are removing the net interface kobject?

As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches,
nothing related to gadgets or net.

For reference, patches are here:
https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files

0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch

0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch

044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch

Seems (just guessing) gether_cleanup is being called due to the switch
to host mode, but cleanup only partly succeeds. Now I'm finding I can
make the net interface disappear with `ip link set dev usb0 down` and
the broken link goes away by destroying the gadget in configfs.

Is that intentional? Do I need to tear down the gadgets in reverse order
as on create when switching to host mode? That would be new.

What happens when you trigger a switch to host mode without actually
unplugging your cable?

>>>> Thanks,
>>>> Hardik

2024-04-10 17:38:29

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> > > Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > > > This patch replaces the usage of netif_stop_queue with
> > > > > > > netif_device_detach
> > > > > > > in the u_ether driver. The netif_device_detach function not
> > > > > > > only stops all
> > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > > > the device as
> > > > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > > >
> > > > > > > This change helps notify user space about the disconnection
> > > > > > > of the device
> > > > > > > more effectively, compared to netif_stop_queue, which only
> > > > > > > stops a single
> > > > > > > transmit queue.
> > > > > > This change effectively broke my USB ether setup.
> > > > > >
> > > > > > git bisect start
> > > > > > # status: waiting for both good and bad commits
> > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > > > 'tty-6.7-rc1' of
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > > > # status: waiting for bad commit, 1 good commit known
> > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > > > 'usb-6.7-rc1' of
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > > > wrong data added to platform device
> > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > > > xhci-mtk: add a bandwidth budget table
> > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > > > gadget: uvc: cleanup request when not in correct state"
> > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > > > tps6598x: Refactor tps6598x port registration
> > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > > > fix stub_dev hub disconnect
> > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > > > fix a type promotion bug
> > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > > > netif_device_detach
> > > > > >
> > > > > > Note, revert indeed helps. Should I send a revert?
> > > > > >
> > > > > > I use configfs to setup USB EEM function and it worked till this
> > > > > > commit.
> > > > > > If needed, I can share my scripts, but I believe it's not needed
> > > > > > as here
> > > > > > we see a clear regression.
> > > > > >
> > > > > Without this patch, there may be a potential crash in a race
> > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > > > the Network stack to determine the status of the link.
> > > > >
> > > > > Could you please provide details on how this patch affects your
> > > > > functionality? Are you experiencing connection problems or data
> > > > > transfer interruptions?
> > > > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > > > patch no config from host through dhcp is received. Manual setting
> > > > correct ipv4 addr / mask / gw still no connection.
> > > >
> > > > > Instead of reverting this patch, consider trying the upcoming patch
> > > > > (soon to be available in the mainline) to see if it resolves your
> > > > > issue.
> > > > >
> > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > > >
> > > > This patch works for me with v6.7.0.
> > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> > > platforms as well.
> > >
> > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> > > able to make a USB EEM function again.
> > >
> > > However, now a hidden issue appears. With mrfld there is an external switch
> > > to easily switch between host and device mode.
> > >
> > > What is not fixed:
> > >
> > > - when in device mode and unplugging/plugging the cable when using `ifconfig
> > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> > > `cat carrier*` shows the carrier up and down counts. This is the expected
> > > behavior.
> > >
> > > - when in device mode and switching to host mode `ifconfig usb0` continues
> > > to show "RUNNING", the route table is not modified and the dir
> > > `/sys/class/net/usb0` no longer exists.
> > >
> > > - switching to device mode again, USB EEM works fine, no changes to RUNNING
> > > or the route table happen and the dir `/sys/class/net/usb0` still is non-
> > > existing.
> > >
> > > - unplugging/plugging the cable in device mode after this does not restore
> > > the original situation.
> > >
> > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> > > patches on top) and bisected back to this patch in v6.70-rc1.
> > >
> > > It seems `netif_device_detach` does not completely clean up as expected and
> > > `netif_device_attach` does not completely rebuild.
> > >
> > > I am wondering if on other platforms this can be reproduced? If so, inho it
> > > would be best to revert the both patches until the issue is resolved.
> > >
> > > Thanks,
> > >
> > > Ferry
> > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> > This behavior seems not correct.
>
> Exactly. And this didn't happen before the 2 patches.
>
> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> longer exists
>
> > The gether_cleanup function should remove the interface along with the associated kobject,
> > and this function should only be called during the unloading of the driver or deleting the gadget.
> > Could you please confirm whether any of your custom modifications are removing the net interface kobject?
>
> As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches,
> nothing related to gadgets or net.
>
> For reference, patches are here: https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files
>
> 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch
>
> 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch
>
> 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch
>
> Seems (just guessing) gether_cleanup is being called due to the switch to
> host mode, but cleanup only partly succeeds. Now I'm finding I can make the
> net interface disappear with `ip link set dev usb0 down` and the broken link
> goes away by destroying the gadget in configfs.
>
> Is that intentional? Do I need to tear down the gadgets in reverse order as
> on create when switching to host mode? That would be new.
>
> What happens when you trigger a switch to host mode without actually
> unplugging your cable?

Since there is no response, should we actually prepare revert next week?

--
With Best Regards,
Andy Shevchenko



2024-04-11 14:55:28

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> > > > Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > > > > This patch replaces the usage of netif_stop_queue with
> > > > > > > > netif_device_detach
> > > > > > > > in the u_ether driver. The netif_device_detach function not
> > > > > > > > only stops all
> > > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > > > > the device as
> > > > > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > > > >
> > > > > > > > This change helps notify user space about the disconnection
> > > > > > > > of the device
> > > > > > > > more effectively, compared to netif_stop_queue, which only
> > > > > > > > stops a single
> > > > > > > > transmit queue.
> > > > > > > This change effectively broke my USB ether setup.
> > > > > > >
> > > > > > > git bisect start
> > > > > > > # status: waiting for both good and bad commits
> > > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > > > > 'tty-6.7-rc1' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > > > > # status: waiting for bad commit, 1 good commit known
> > > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > > > > 'usb-6.7-rc1' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > > > > wrong data added to platform device
> > > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > > > > xhci-mtk: add a bandwidth budget table
> > > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > > > > gadget: uvc: cleanup request when not in correct state"
> > > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > > > > tps6598x: Refactor tps6598x port registration
> > > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > > > > fix stub_dev hub disconnect
> > > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > > > > fix a type promotion bug
> > > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > > > > netif_device_detach
> > > > > > >
> > > > > > > Note, revert indeed helps. Should I send a revert?
> > > > > > >
> > > > > > > I use configfs to setup USB EEM function and it worked till this
> > > > > > > commit.
> > > > > > > If needed, I can share my scripts, but I believe it's not needed
> > > > > > > as here
> > > > > > > we see a clear regression.
> > > > > > >
> > > > > > Without this patch, there may be a potential crash in a race
> > > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > > > > the Network stack to determine the status of the link.
> > > > > >
> > > > > > Could you please provide details on how this patch affects your
> > > > > > functionality? Are you experiencing connection problems or data
> > > > > > transfer interruptions?
> > > > > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > > > > patch no config from host through dhcp is received. Manual setting
> > > > > correct ipv4 addr / mask / gw still no connection.
> > > > >
> > > > > > Instead of reverting this patch, consider trying the upcoming patch
> > > > > > (soon to be available in the mainline) to see if it resolves your
> > > > > > issue.
> > > > > >
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > > > >
> > > > > This patch works for me with v6.7.0.
> > > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> > > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> > > > platforms as well.
> > > >
> > > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> > > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> > > > able to make a USB EEM function again.
> > > >
> > > > However, now a hidden issue appears. With mrfld there is an external switch
> > > > to easily switch between host and device mode.
> > > >
> > > > What is not fixed:
> > > >
> > > > - when in device mode and unplugging/plugging the cable when using `ifconfig
> > > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> > > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> > > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> > > > `cat carrier*` shows the carrier up and down counts. This is the expected
> > > > behavior.
> > > >
> > > > - when in device mode and switching to host mode `ifconfig usb0` continues
> > > > to show "RUNNING", the route table is not modified and the dir
> > > > `/sys/class/net/usb0` no longer exists.
> > > >
> > > > - switching to device mode again, USB EEM works fine, no changes to RUNNING
> > > > or the route table happen and the dir `/sys/class/net/usb0` still is non-
> > > > existing.
> > > >
> > > > - unplugging/plugging the cable in device mode after this does not restore
> > > > the original situation.
> > > >
> > > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> > > > patches on top) and bisected back to this patch in v6.70-rc1.
> > > >
> > > > It seems `netif_device_detach` does not completely clean up as expected and
> > > > `netif_device_attach` does not completely rebuild.
> > > >
> > > > I am wondering if on other platforms this can be reproduced? If so, inho it
> > > > would be best to revert the both patches until the issue is resolved.
> > > >
> > > > Thanks,
> > > >
> > > > Ferry
> > > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> > > This behavior seems not correct.
> >
> > Exactly. And this didn't happen before the 2 patches.
> >
> > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > longer exists

So, it means that the /sys/class/net/usb0 is present, but the symlink is broken. In that case, the dwc3 driver should recreate the device, and the symlink should become active again
I have the dwc3 IP base usb controller, Let me check with this patch and share result here.
May be we need some fix in dwc3

> >
> > > The gether_cleanup function should remove the interface along with the associated kobject,
> > > and this function should only be called during the unloading of the driver or deleting the gadget.
> > > Could you please confirm whether any of your custom modifications are removing the net interface kobject?
> >
> > As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches,
> > nothing related to gadgets or net.
> >
> > For reference, patches are here: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_htot_meta-2Dintel-2Dedison_tree_nanbield_meta-2Dintel-2Dedison-2Dbsp_recipes-2Dkernel_linux_files&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=b4sDyiL5E5Uvzote95pWzkQE_qFYYrvfF766keBAL41H9ZrXG_fF_I7mAnRnI32b&s=1t88BV-wyxvDFZcsmuO0wtanAnpRPP9TSw3ysu6ZUgs&e=
> >
> > 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch
> >
> > 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch
> >
> > 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch
> >
> > Seems (just guessing) gether_cleanup is being called due to the switch to
> > host mode, but cleanup only partly succeeds. Now I'm finding I can make the
> > net interface disappear with `ip link set dev usb0 down` and the broken link
> > goes away by destroying the gadget in configfs.
> >
> > Is that intentional? Do I need to tear down the gadgets in reverse order as
> > on create when switching to host mode? That would be new.
> >
> > What happens when you trigger a switch to host mode without actually
> > unplugging your cable?
>
> Since there is no response, should we actually prepare revert next week?
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


2024-04-11 17:57:11

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

..

> > > Exactly. And this didn't happen before the 2 patches.
> > >
> > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > longer exists
>
> So, it means that the /sys/class/net/usb0 is present, but the symlink is
> broken. In that case, the dwc3 driver should recreate the device, and the
> symlink should become active again
> I have the dwc3 IP base usb controller, Let me check with this patch and
> share result here. May be we need some fix in dwc3

It's quite possible, please test on your side.
We are happy to test any fixes if you come up with.

--
With Best Regards,
Andy Shevchenko



2024-04-11 20:52:51

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi

Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> ...
>
>>>> Exactly. And this didn't happen before the 2 patches.
>>>>
>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>> longer exists
>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>> broken. In that case, the dwc3 driver should recreate the device, and the
>> symlink should become active again

Yes, on first enabling gadget (when device mode is activated):

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  net  power  sound  subsystem  suspended  uevent

Then switching to host mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
ls: cannot access
'/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such
file or directory

Then back to device mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  power  sound  subsystem  suspended  uevent

net is missing. But, network functions:

root@yuna:~# ping 10.42.0.1
PING 10.42.0.1 (10.42.0.1): 56 data bytes

Mass storage device is created and removed each time as expected.

>> I have the dwc3 IP base usb controller, Let me check with this patch and
>> share result here. May be we need some fix in dwc3
Would have been nice if someone could test on other controller as well.
But another instance of dwc3 is also very welcome.
> It's quite possible, please test on your side.
> We are happy to test any fixes if you come up with.
>

2024-04-16 13:48:50

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

..

> > > > > Exactly. And this didn't happen before the 2 patches.
> > > > >
> > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > > > longer exists
> > > So, it means that the /sys/class/net/usb0 is present, but the symlink is
> > > broken. In that case, the dwc3 driver should recreate the device, and the
> > > symlink should become active again
>
> Yes, on first enabling gadget (when device mode is activated):
>
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> driver? net? power? sound? subsystem? suspended? uevent
>
> Then switching to host mode:
>
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> ls: cannot access
> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
> or directory
>
> Then back to device mode:
>
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> driver? power? sound? subsystem? suspended? uevent
>
> net is missing. But, network functions:
>
> root@yuna:~# ping 10.42.0.1
> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>
> Mass storage device is created and removed each time as expected.

So, what's the conclusion? Shall we move towards revert of those two changes?

> > > I have the dwc3 IP base usb controller, Let me check with this patch and
> > > share result here. May be we need some fix in dwc3
> Would have been nice if someone could test on other controller as well. But
> another instance of dwc3 is also very welcome.
> > It's quite possible, please test on your side.
> > We are happy to test any fixes if you come up with.

--
With Best Regards,
Andy Shevchenko



2024-04-17 15:14:45

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>
> ...
>
> > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > >
> > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > > > > longer exists
> > > > So, it means that the /sys/class/net/usb0 is present, but the symlink is
> > > > broken. In that case, the dwc3 driver should recreate the device, and the
> > > > symlink should become active again
> >
> > Yes, on first enabling gadget (when device mode is activated):
> >
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  net  power  sound  subsystem  suspended  uevent
> >
> > Then switching to host mode:
> >
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > ls: cannot access
> > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
> > or directory
> >
> > Then back to device mode:
> >
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  power  sound  subsystem  suspended  uevent
> >
> > net is missing. But, network functions:
> >
> > root@yuna:~# ping 10.42.0.1
> > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> >
> > Mass storage device is created and removed each time as expected.
>
> So, what's the conclusion? Shall we move towards revert of those two changes?


As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
the issue.

I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)

Following are the logs:
//Host to device

console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0

//device to host
console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0
s a800000.dwc3/gadget/net/usb0 <
addr_assign_type duplex phys_port_name
addr_len flags phys_switch_id
address gro_flush_timeout power
broadcast ifalias proto_down
carrier ifindex queues
carrier_changes iflink speed
carrier_down_count link_mode statistics
carrier_up_count mtu subsystem
dev_id name_assign_type tx_queue_len
dev_port netdev_group type
device operstate uevent
dormant phys_port_id waiting_for_supplier
console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 TX bytes:0

console:/sys/bus/platform/devices/a800000.ssusb #

I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available.
Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

I am curious why the driver was developed without adhering to the kernel's gadget architecture.

>
> > > > I have the dwc3 IP base usb controller, Let me check with this patch and
> > > > share result here. May be we need some fix in dwc3
> > Would have been nice if someone could test on other controller as well. But
> > another instance of dwc3 is also very welcome.
> > > It's quite possible, please test on your side.
> > > We are happy to test any fixes if you come up with.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2024-04-25 21:28:29

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>
>> ...
>>
>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>
>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>> longer exists
>>>>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>>>>> broken. In that case, the dwc3 driver should recreate the device, and the
>>>>> symlink should become active again
>>>
>>> Yes, on first enabling gadget (when device mode is activated):
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  net  power  sound  subsystem  suspended  uevent
>>>
>>> Then switching to host mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> ls: cannot access
>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
>>> or directory
>>>
>>> Then back to device mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  power  sound  subsystem  suspended  uevent
>>>
>>> net is missing. But, network functions:
>>>
>>> root@yuna:~# ping 10.42.0.1
>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>
>>> Mass storage device is created and removed each time as expected.
>>
>> So, what's the conclusion? Shall we move towards revert of those two changes?
>
>
> As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
> the issue.
>
> I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)
>
> Following are the logs:
> //Host to device
>
> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0
>
> //device to host
> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0

That is weird. When I switch to host mode (using the physical switch),
the whole gadget directory is removed (now testing 6.9.0-rc5)

Switching back to device mode, that gadget directory is recreated. And
gadget/sound as well, but not gadget/net.

> s a800000.dwc3/gadget/net/usb0 <
> addr_assign_type duplex phys_port_name
> addr_len flags phys_switch_id
> address gro_flush_timeout power
> broadcast ifalias proto_down
> carrier ifindex queues
> carrier_changes iflink speed
> carrier_down_count link_mode statistics
> carrier_up_count mtu subsystem
> dev_id name_assign_type tx_queue_len
> dev_port netdev_group type
> device operstate uevent
> dormant phys_port_id waiting_for_supplier
> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
> BROADCAST MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 TX bytes:0
>
> console:/sys/bus/platform/devices/a800000.ssusb #
>
> I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available.

There's more to it. I also mentioned that switching the role or
unplugging the cable leaves the usb0 connection.

I have while in host mode:
root@yuna:~# ifconfig -a usb0
usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255
inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link>


You don't see that because you didn't create a connection at all.

> Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

Switching from device -> host -> device:

root@yuna:~# trace-cmd record -p function_graph -l *gether_*
plugin 'function_graph'
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x1c8000
188 bytes in size (4096 uncompressed)
CPU1 data recorded at offset=0x1c9000
0 bytes in size (0 uncompressed)
root@yuna:~# trace-cmd report
cpus=2
irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: #
2079.480 us | gether_disconnect();
irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: +
11.640 us | gether_disconnect();
irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: !
116.520 us | gether_connect();
irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: #
2057.260 us | gether_disconnect();
irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: !
109.000 us | gether_connect();


> According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
> I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

Yes, I'll try to trace where that happens.

Nevertheless, the disappearance of the net/usb0 directory seems
harmless? But the usb: net device remaining after disconnect or role
switch is not good, as the route remains.

May be they are 2 separate problems. Could you try to reproduce what
happens if you make eem connection and then unplug?

> I am curious why the driver was developed without adhering to the kernel's gadget architecture.
>
>>
>>>>> I have the dwc3 IP base usb controller, Let me check with this patch and
>>>>> share result here. May be we need some fix in dwc3
>>> Would have been nice if someone could test on other controller as well. But
>>> another instance of dwc3 is also very welcome.
>>>> It's quite possible, please test on your side.
>>>> We are happy to test any fixes if you come up with.
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
>>
>>


2024-04-28 21:07:59

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 25-04-2024 om 23:27 schreef Ferry Toth:
> Hi,
>
> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>
>>> ...
>>>
>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>
>>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a
>>>>>>>> link, the link
>>>>>>>> target
>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>>> longer exists
>>>>>> So, it means that the /sys/class/net/usb0 is present, but the
>>>>>> symlink is
>>>>>> broken. In that case, the dwc3 driver should recreate the device,
>>>>>> and the
>>>>>> symlink should become active again
>>>>
>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>
>>>> root@yuna:~# ls
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>
>>>> Then switching to host mode:
>>>>
>>>> root@yuna:~# ls
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> ls: cannot access
>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No
>>>> such file
>>>> or directory
>>>>
>>>> Then back to device mode:
>>>>
>>>> root@yuna:~# ls
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  power  sound  subsystem  suspended  uevent
>>>>
>>>> net is missing. But, network functions:
>>>>
>>>> root@yuna:~# ping 10.42.0.1
>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>
>>>> Mass storage device is created and removed each time as expected.
>>>
>>> So, what's the conclusion? Shall we move towards revert of those two
>>> changes?
>>
>>
>> As promised, I have the tested the this patch with the dwc3 gadget. I
>> could not reproduce
>> the issue.
>>
>> I can see the usb0 exist all the time and accessible regardless of the
>> role switching of the USB mode (peripheral <-> host)
>>
>> Following are the logs:
>> //Host to device
>>
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >
>> mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>> a800000.dwc3/gadget/net/
>> usb0
>>
>> //device to host
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>> a800000.dwc3/gadget/net/
>> usb0
>
> That is weird. When I switch to host mode (using the physical switch),
> the whole gadget directory is removed (now testing 6.9.0-rc5)
>
> Switching back to device mode, that gadget directory is recreated. And
> gadget/sound as well, but not gadget/net.
>
>> s
>> a800000.dwc3/gadget/net/usb0                                                <
>> addr_assign_type    duplex             phys_port_name
>> addr_len            flags              phys_switch_id
>> address             gro_flush_timeout  power
>> broadcast           ifalias            proto_down
>> carrier             ifindex            queues
>> carrier_changes     iflink             speed
>> carrier_down_count  link_mode          statistics
>> carrier_up_count    mtu                subsystem
>> dev_id              name_assign_type   tx_queue_len
>> dev_port            netdev_group       type
>> device              operstate          uevent
>> dormant             phys_port_id       waiting_for_supplier
>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>            BROADCAST MULTICAST  MTU:1500  Metric:1
>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:1000
>>            RX bytes:0 TX bytes:0
>>
>> console:/sys/bus/platform/devices/a800000.ssusb #
>>
>> I strongly advise against reverting the patch solely based on the
>> observed issue of removing the /sys/class/net/usb0 directory while the
>> usb0 interface remains available.
>
> There's more to it. I also mentioned that switching the role or
> unplugging the cable leaves the usb0 connection.
>
> I have while in host mode:
> root@yuna:~# ifconfig -a usb0
> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>         inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>         inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
>
>
> You don't see that because you didn't create a connection at all.
>
>> Instead, I recommend enabling FTRACE to trace the functions involved
>> and identify which faulty call is responsible for removing usb0.
>
> Switching from device -> host -> device:
>
> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>   plugin 'function_graph'
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x1c8000
>     188 bytes in size (4096 uncompressed)
> CPU1 data recorded at offset=0x1c9000
>     0 bytes in size (0 uncompressed)
> root@yuna:~# trace-cmd report
> cpus=2
>      irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> 2079.480 us |  gether_disconnect();
>      irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> 11.640 us  |  gether_disconnect();
>      irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> 116.520 us |  gether_connect();
>      irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> 2057.260 us |  gether_disconnect();
>      irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> 109.000 us |  gether_connect();

I tried to get a more useful trace:
root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo function > current_tracer
root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
-> switch to host mode then back to device
root@yuna:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 53/53 #P:2
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
<-__composite_disconnect
irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
<-reset_config
irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
<-reset_config
kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
<-purge_configs_funcs

-> to device mode

kworker/1:3-443 [001] ...1. 148.630773: eem_bind
<-usb_add_function
irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
<-composite_setup
irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
<-eem_set_alt
irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
<-eem_set_alt
irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
<-composite_setup
irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
<-eem_set_alt
irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
<-eem_set_alt
irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete
..

I don't know where to look exactly. Any hints?

>
>> According to current kernel architecture of u_ether driver, only
>> gether_cleanup should remove the usb0 interface along with its kobject
>> and sysfs interface.
>> I suggest sharing the analysis here to understand why this practice is
>> not followed in your use case or driver ?
>
> Yes, I'll try to trace where that happens.
>
> Nevertheless, the disappearance of the net/usb0 directory seems
> harmless? But the usb: net device remaining after disconnect or role
> switch is not good, as the route remains.
>
> May be they are 2 separate problems. Could you try to reproduce what
> happens if you make eem connection and then unplug?
>
>> I am curious why the driver was developed without adhering to the
>> kernel's gadget architecture.

I don't know what you mean here. Which driver do you mean?

>>>
>>>>>> I have the dwc3 IP base usb controller, Let me check with this
>>>>>> patch and
>>>>>> share result here.  May be we need some fix in dwc3
>>>> Would have been nice if someone could test on other controller as
>>>> well. But
>>>> another instance of dwc3 is also very welcome.
>>>>> It's quite possible, please test on your side.
>>>>> We are happy to test any fixes if you come up with.
>>>
>>> --
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
>

2024-04-30 15:34:38

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> Hi,
>
> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > Hi,
> >
> > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > >
> > > > ...
> > > >
> > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > >
> > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > removed and it is a link, the link
> > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > no
> > > > > > > > > longer exists
> > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > present, but the symlink is
> > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > recreate the device, and the
> > > > > > > symlink should become active again
> > > > >
> > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > >
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > >
> > > > > Then switching to host mode:
> > > > >
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > ls: cannot access
> > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > No such file
> > > > > or directory
> > > > >
> > > > > Then back to device mode:
> > > > >
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  power  sound  subsystem  suspended  uevent
> > > > >
> > > > > net is missing. But, network functions:
> > > > >
> > > > > root@yuna:~# ping 10.42.0.1
> > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > >
> > > > > Mass storage device is created and removed each time as expected.
> > > >
> > > > So, what's the conclusion? Shall we move towards revert of those
> > > > two changes?
> > >
> > >
> > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > I could not reproduce
> > > the issue.
> > >
> > > I can see the usb0 exist all the time and accessible regardless of
> > > the role switching of the USB mode (peripheral <-> host)
> > >
> > > Following are the logs:
> > > //Host to device
> > >
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> > >
> > > //device to host
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> >
> > That is weird. When I switch to host mode (using the physical switch),
> > the whole gadget directory is removed (now testing 6.9.0-rc5)
> >
> > Switching back to device mode, that gadget directory is recreated. And
> > gadget/sound as well, but not gadget/net.
> >
> > > s a800000.dwc3/gadget/net/usb0                                               
> > > <
> > > addr_assign_type    duplex             phys_port_name
> > > addr_len            flags              phys_switch_id
> > > address             gro_flush_timeout  power
> > > broadcast           ifalias            proto_down
> > > carrier             ifindex            queues
> > > carrier_changes     iflink             speed
> > > carrier_down_count  link_mode          statistics
> > > carrier_up_count    mtu                subsystem
> > > dev_id              name_assign_type   tx_queue_len
> > > dev_port            netdev_group       type
> > > device              operstate          uevent
> > > dormant             phys_port_id       waiting_for_supplier
> > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > >            BROADCAST MULTICAST  MTU:1500  Metric:1
> > >            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > >            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > >            collisions:0 txqueuelen:1000
> > >            RX bytes:0 TX bytes:0
> > >
> > > console:/sys/bus/platform/devices/a800000.ssusb #
> > >
> > > I strongly advise against reverting the patch solely based on the
> > > observed issue of removing the /sys/class/net/usb0 directory while
> > > the usb0 interface remains available.
> >
> > There's more to it. I also mentioned that switching the role or
> > unplugging the cable leaves the usb0 connection.
> >
> > I have while in host mode:
> > root@yuna:~# ifconfig -a usb0
> > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> >         inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
> >         inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
> >
> >
> > You don't see that because you didn't create a connection at all.
> >
> > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > and identify which faulty call is responsible for removing usb0.
> >
> > Switching from device -> host -> device:
> >
> > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >   plugin 'function_graph'
> > Hit Ctrl^C to stop recording
> > ^CCPU0 data recorded at offset=0x1c8000
> >     188 bytes in size (4096 uncompressed)
> > CPU1 data recorded at offset=0x1c9000
> >     0 bytes in size (0 uncompressed)
> > root@yuna:~# trace-cmd report
> > cpus=2
> >      irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > 2079.480 us |  gether_disconnect();
> >      irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > 11.640 us  |  gether_disconnect();
> >      irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > 116.520 us |  gether_connect();
> >      irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > 2057.260 us |  gether_disconnect();
> >      irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > 109.000 us |  gether_connect();
>
> I tried to get a more useful trace:
> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> -> switch to host mode then back to device
> root@yuna:/sys/kernel/tracing# cat trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 53/53 #P:2
> #
> # _-----=> irqs-off/BH-disabled
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / _-=> migrate-disable
> # |||| / delay
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
> <-__composite_disconnect
> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
> <-reset_config
> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
> <-reset_config
> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
> <-purge_configs_funcs
>
> -> to device mode
>
> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
> <-usb_add_function
> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
> <-composite_setup
> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
> <-eem_set_alt
> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
> <-eem_set_alt
> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
> <-composite_setup
> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
> <-eem_set_alt
> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
> <-eem_set_alt
> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete
> ...
>
> I don't know where to look exactly. Any hints?

do you see anything related to gether_cleanup() after eem_unbind() ?
If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Hardik


>
> >
> > > According to current kernel architecture of u_ether driver, only
> > > gether_cleanup should remove the usb0 interface along with its
> > > kobject and sysfs interface.
> > > I suggest sharing the analysis here to understand why this practice
> > > is not followed in your use case or driver ?
> >
> > Yes, I'll try to trace where that happens.
> >
> > Nevertheless, the disappearance of the net/usb0 directory seems
> > harmless? But the usb: net device remaining after disconnect or role
> > switch is not good, as the route remains.
> >
> > May be they are 2 separate problems. Could you try to reproduce what
> > happens if you make eem connection and then unplug?
> >
> > > I am curious why the driver was developed without adhering to the
> > > kernel's gadget architecture.
>
> I don't know what you mean here. Which driver do you mean?
>
> > > >
> > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > with this patch and
> > > > > > > share result here.  May be we need some fix in dwc3
> > > > > Would have been nice if someone could test on other
> > > > > controller as well. But
> > > > > another instance of dwc3 is also very welcome.
> > > > > > It's quite possible, please test on your side.
> > > > > > We are happy to test any fixes if you come up with.
> > > >
> > > > --
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > >
> > > >
> >

2024-04-30 19:42:34

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>
>>>>> ...
>>>>>
>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>
>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>> no
>>>>>>>>>> longer exists
>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>> present, but the symlink is
>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>> recreate the device, and the
>>>>>>>> symlink should become active again
>>>>>>
>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> Then switching to host mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> ls: cannot access
>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>> No such file
>>>>>> or directory
>>>>>>
>>>>>> Then back to device mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> net is missing. But, network functions:
>>>>>>
>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>
>>>>>> Mass storage device is created and removed each time as expected.
>>>>>
>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>> two changes?
>>>>
>>>>
>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>> I could not reproduce
>>>> the issue.
>>>>
>>>> I can see the usb0 exist all the time and accessible regardless of
>>>> the role switching of the USB mode (peripheral <-> host)
>>>>
>>>> Following are the logs:
>>>> //Host to device
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>> mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>>
>>>> //device to host
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>
>>> That is weird. When I switch to host mode (using the physical switch),
>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>
>>> Switching back to device mode, that gadget directory is recreated. And
>>> gadget/sound as well, but not gadget/net.
>>>
>>>> s a800000.dwc3/gadget/net/usb0
>>>> <
>>>> addr_assign_type    duplex             phys_port_name
>>>> addr_len            flags              phys_switch_id
>>>> address             gro_flush_timeout  power
>>>> broadcast           ifalias            proto_down
>>>> carrier             ifindex            queues
>>>> carrier_changes     iflink             speed
>>>> carrier_down_count  link_mode          statistics
>>>> carrier_up_count    mtu                subsystem
>>>> dev_id              name_assign_type   tx_queue_len
>>>> dev_port            netdev_group       type
>>>> device              operstate          uevent
>>>> dormant             phys_port_id       waiting_for_supplier
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>            BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>            collisions:0 txqueuelen:1000
>>>>            RX bytes:0 TX bytes:0
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>
>>>> I strongly advise against reverting the patch solely based on the
>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>> the usb0 interface remains available.
>>>
>>> There's more to it. I also mentioned that switching the role or
>>> unplugging the cable leaves the usb0 connection.
>>>
>>> I have while in host mode:
>>> root@yuna:~# ifconfig -a usb0
>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>         inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>>>         inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
>>>
>>>
>>> You don't see that because you didn't create a connection at all.
>>>
>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>> and identify which faulty call is responsible for removing usb0.
>>>
>>> Switching from device -> host -> device:
>>>
>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>   plugin 'function_graph'
>>> Hit Ctrl^C to stop recording
>>> ^CCPU0 data recorded at offset=0x1c8000
>>>     188 bytes in size (4096 uncompressed)
>>> CPU1 data recorded at offset=0x1c9000
>>>     0 bytes in size (0 uncompressed)
>>> root@yuna:~# trace-cmd report
>>> cpus=2
>>>      irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>> 2079.480 us |  gether_disconnect();
>>>      irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>> 11.640 us  |  gether_disconnect();
>>>      irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>> 116.520 us |  gether_connect();
>>>      irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>> 2057.260 us |  gether_disconnect();
>>>      irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>> 109.000 us |  gether_connect();
>>
>> I tried to get a more useful trace:
>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>> -> switch to host mode then back to device
>> root@yuna:/sys/kernel/tracing# cat trace
>> # tracer: function
>> #
>> # entries-in-buffer/entries-written: 53/53 #P:2
>> #
>> # _-----=> irqs-off/BH-disabled
>> # / _----=> need-resched
>> # | / _---=> hardirq/softirq
>> # || / _--=> preempt-depth
>> # ||| / _-=> migrate-disable
>> # |||| / delay
>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
>> # | | | ||||| | |
>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
>> <-__composite_disconnect
>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
>> <-reset_config
>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
>> <-reset_config
>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
>> <-purge_configs_funcs
>>
>> -> to device mode
>>
>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
>> <-usb_add_function
>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
>> <-composite_setup
>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
>> <-eem_set_alt
>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
>> <-eem_set_alt
>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
>> <-composite_setup
>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
>> <-eem_set_alt
>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
>> <-eem_set_alt
>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete
>> ...
>>
>> I don't know where to look exactly. Any hints?
>
> do you see anything related to gether_cleanup() after eem_unbind() ?

Nope. It's a pitty that the trace formatting got messed up above. But as
you can see I traced gether_* and eem_*. After eem_unbind no traced
function is called, until I flip the switch to device mode.
The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
and eem_wrap <-eth_start_xmit lines.

> If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Yes, that's a vast amount of functions to trace. And I don't see how
that would be related to the patch we're discussing here. I was hoping
for a little more targeted hint.

You may recall the whole issue did not occur before this patch got applied.

> Hardik
>
>
>>
>>>
>>>> According to current kernel architecture of u_ether driver, only
>>>> gether_cleanup should remove the usb0 interface along with its
>>>> kobject and sysfs interface.
>>>> I suggest sharing the analysis here to understand why this practice
>>>> is not followed in your use case or driver ?
>>>
>>> Yes, I'll try to trace where that happens.
>>>
>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>> harmless? But the usb: net device remaining after disconnect or role
>>> switch is not good, as the route remains.
>>>
>>> May be they are 2 separate problems. Could you try to reproduce what
>>> happens if you make eem connection and then unplug?
>>>
>>>> I am curious why the driver was developed without adhering to the
>>>> kernel's gadget architecture.
>>
>> I don't know what you mean here. Which driver do you mean?
>>
>>>>>
>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>> with this patch and
>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>> Would have been nice if someone could test on other
>>>>>> controller as well. But
>>>>>> another instance of dwc3 is also very welcome.
>>>>>>> It's quite possible, please test on your side.
>>>>>>> We are happy to test any fixes if you come up with.
>>>>>
>>>>> --
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>


2024-04-30 21:12:37

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 30-04-2024 om 21:40 schreef Ferry Toth:
> Hi,
>
> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>
>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>> target
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>> no
>>>>>>>>>>> longer exists
>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>> present, but the symlink is
>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>> recreate the device, and the
>>>>>>>>> symlink should become active again
>>>>>>>
>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> Then switching to host mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> ls: cannot access
>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>> No such file
>>>>>>> or directory
>>>>>>>
>>>>>>> Then back to device mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> net is missing. But, network functions:
>>>>>>>
>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>
>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>
>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>> two changes?
>>>>>
>>>>>
>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>> I could not reproduce
>>>>> the issue.
>>>>>
>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>
>>>>> Following are the logs:
>>>>> //Host to device
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>> mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>>
>>>>> //device to host
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>
>>>> That is weird. When I switch to host mode (using the physical switch),
>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>
>>>> Switching back to device mode, that gadget directory is recreated. And
>>>> gadget/sound as well, but not gadget/net.
>>>>
>>>>> s a800000.dwc3/gadget/net/usb0
>>>>> <
>>>>> addr_assign_type    duplex             phys_port_name
>>>>> addr_len            flags              phys_switch_id
>>>>> address             gro_flush_timeout  power
>>>>> broadcast           ifalias            proto_down
>>>>> carrier             ifindex            queues
>>>>> carrier_changes     iflink             speed
>>>>> carrier_down_count  link_mode          statistics
>>>>> carrier_up_count    mtu                subsystem
>>>>> dev_id              name_assign_type   tx_queue_len
>>>>> dev_port            netdev_group       type
>>>>> device              operstate          uevent
>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>             collisions:0 txqueuelen:1000
>>>>>             RX bytes:0 TX bytes:0
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>
>>>>> I strongly advise against reverting the patch solely based on the
>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>> the usb0 interface remains available.
>>>>
>>>> There's more to it. I also mentioned that switching the role or
>>>> unplugging the cable leaves the usb0 connection.
>>>>
>>>> I have while in host mode:
>>>> root@yuna:~# ifconfig -a usb0
>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>> 10.42.0.255
>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid
>>>> 0x20<link>
>>>>
>>>>
>>>> You don't see that because you didn't create a connection at all.
>>>>
>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>> and identify which faulty call is responsible for removing usb0.
>>>>
>>>> Switching from device -> host -> device:
>>>>
>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>     plugin 'function_graph'
>>>> Hit Ctrl^C to stop recording
>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>       188 bytes in size (4096 uncompressed)
>>>> CPU1 data recorded at offset=0x1c9000
>>>>       0 bytes in size (0 uncompressed)
>>>> root@yuna:~# trace-cmd report
>>>> cpus=2
>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>> 2079.480 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>> 11.640 us  |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>> 116.520 us |  gether_connect();
>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>> 2057.260 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>> 109.000 us |  gether_connect();
>>>
>>> I tried to get a more useful trace:
>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>> -> switch to host mode then back to device
>>> root@yuna:/sys/kernel/tracing# cat trace
>>> # tracer: function
>>> #
>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>> #
>>> #                                _-----=> irqs-off/BH-disabled
>>> #                               / _----=> need-resched
>>> #                              | / _---=> hardirq/softirq
>>> #                              || / _--=> preempt-depth
>>> #                              ||| / _-=> migrate-disable
>>> #                              |||| /     delay
>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>> #              | |         |   |||||     |         |
>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>> <-__composite_disconnect
>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>> <-reset_config
>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>> <-reset_config
>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>> <-purge_configs_funcs
>>>
>>> -> to device mode
>>>
>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>> <-usb_add_function
>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>> <-rx_complete
>>> ...
>>>
>>> I don't know where to look exactly. Any hints?
>>
>> do you see anything related to gether_cleanup() after eem_unbind() ?
>
> Nope. It's a pitty that the trace formatting got messed up above. But as
> you can see I traced gether_* and eem_*. After eem_unbind no traced
> function is called, until I flip the switch to device mode.
> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> and eem_wrap <-eth_start_xmit lines.
>
>> If not then, you may try to enable tracing of TCP/IP stack and network
>> side to check who deleting the sysfs entry
>
> Yes, that's a vast amount of functions to trace. And I don't see how
> that would be related to the patch we're discussing here. I was hoping
> for a little more targeted hint.

Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
'usb_fun*', 'reset_config' and 'device_remove_file' leads me to:

TIMESTAMP FUNCTION
| |
49.952477: eem_wrap <-eth_start_xmit
55.072455: eem_wrap <-eth_start_xmit
55.072621: eem_unwrap <-rx_complete
59.011540: configfs_composite_reset <-usb_gadget_udc_reset
59.011545: composite_reset <-configfs_composite_reset
59.011548: reset_config <-__composite_disconnect
59.013565: eem_disable <-reset_config
59.013567: gether_disconnect <-reset_config
59.049560: device_remove_file <-device_remove
59.051185: configfs_composite_disconnect <-usb_gadget_disco
59.051189: composite_disconnect <-configfs_composite_discon
59.051195: configfs_composite_unbind <-gadget_unbind_driver
59.052519: eem_unbind <-purge_configs_funcs
59.052529: composite_dev_cleanup <-configfs_composite_unbin
59.052537: device_remove_file <-composite_dev_cleanup

device_remove_file gets called twice, once by device_remove after
gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
(removing the gadget)

> You may recall the whole issue did not occur before this patch got applied.
>
>> Hardik
>>
>>
>>>
>>>>
>>>>> According to current kernel architecture of u_ether driver, only
>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>> kobject and sysfs interface.
>>>>> I suggest sharing the analysis here to understand why this practice
>>>>> is not followed in your use case or driver ?
>>>>
>>>> Yes, I'll try to trace where that happens.
>>>>
>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>> harmless? But the usb: net device remaining after disconnect or role
>>>> switch is not good, as the route remains.
>>>>
>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>> happens if you make eem connection and then unplug?
>>>>
>>>>> I am curious why the driver was developed without adhering to the
>>>>> kernel's gadget architecture.
>>>
>>> I don't know what you mean here. Which driver do you mean?
>>>
>>>>>>
>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>> with this patch and
>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>> Would have been nice if someone could test on other
>>>>>>> controller as well. But
>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>> It's quite possible, please test on your side.
>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>
>>>>>> --
>>>>>> With Best Regards,
>>>>>> Andy Shevchenko
>>>>>>
>>>>>>
>>>>
>

2024-05-02 15:30:04

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> Hi,
>
> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > Hi,
> >
> > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > Hi,
> > > >
> > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > Hi,
> > > > >
> > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > >
> > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > no
> > > > > > > > > > > > longer exists
> > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > present, but the symlink is
> > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > recreate the device, and the
> > > > > > > > > > symlink should become active again
> > > > > > > >
> > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > >
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > >
> > > > > > > > Then switching to host mode:
> > > > > > > >
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > ls: cannot access
> > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > No such file
> > > > > > > > or directory
> > > > > > > >
> > > > > > > > Then back to device mode:
> > > > > > > >
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > >
> > > > > > > > net is missing. But, network functions:
> > > > > > > >
> > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > >
> > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > >
> > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > two changes?
> > > > > >
> > > > > >
> > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > I could not reproduce
> > > > > > the issue.
> > > > > >
> > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > >
> > > > > > Following are the logs:
> > > > > > //Host to device
> > > > > >
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > > >
> > > > > > //device to host
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > >
> > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > >
> > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > gadget/sound as well, but not gadget/net.
> > > > >
> > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > <
> > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > addr_len            flags              phys_switch_id
> > > > > > address             gro_flush_timeout  power
> > > > > > broadcast           ifalias            proto_down
> > > > > > carrier             ifindex            queues
> > > > > > carrier_changes     iflink             speed
> > > > > > carrier_down_count  link_mode          statistics
> > > > > > carrier_up_count    mtu                subsystem
> > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > dev_port            netdev_group       type
> > > > > > device              operstate          uevent
> > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > >             collisions:0 txqueuelen:1000
> > > > > >             RX bytes:0 TX bytes:0
> > > > > >
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > >
> > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > the usb0 interface remains available.
> > > > >
> > > > > There's more to it. I also mentioned that switching the role or
> > > > > unplugging the cable leaves the usb0 connection.
> > > > >
> > > > > I have while in host mode:
> > > > > root@yuna:~# ifconfig -a usb0
> > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > 10.42.0.255
> > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > scopeid 0x20<link>
> > > > >
> > > > >
> > > > > You don't see that because you didn't create a connection at all.
> > > > >
> > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > and identify which faulty call is responsible for removing usb0.
> > > > >
> > > > > Switching from device -> host -> device:
> > > > >
> > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > >     plugin 'function_graph'
> > > > > Hit Ctrl^C to stop recording
> > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > >       188 bytes in size (4096 uncompressed)
> > > > > CPU1 data recorded at offset=0x1c9000
> > > > >       0 bytes in size (0 uncompressed)
> > > > > root@yuna:~# trace-cmd report
> > > > > cpus=2
> > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > 2079.480 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > 11.640 us  |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > 116.520 us |  gether_connect();
> > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > 2057.260 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > 109.000 us |  gether_connect();
> > > >
> > > > I tried to get a more useful trace:
> > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > -> switch to host mode then back to device
> > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > # tracer: function
> > > > #
> > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > #
> > > > #                                _-----=> irqs-off/BH-disabled
> > > > #                               / _----=> need-resched
> > > > #                              | / _---=> hardirq/softirq
> > > > #                              || / _--=> preempt-depth
> > > > #                              ||| / _-=> migrate-disable
> > > > #                              |||| /     delay
> > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > #              | |         |   |||||     |         |
> > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > <-__composite_disconnect
> > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > <-reset_config
> > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > <-reset_config
> > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > <-purge_configs_funcs
> > > >
> > > > -> to device mode
> > > >
> > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > <-usb_add_function
> > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > <-rx_complete
> > > > ...
> > > >
> > > > I don't know where to look exactly. Any hints?
> > >
> > > do you see anything related to gether_cleanup() after eem_unbind() ?
> >
> > Nope. It's a pitty that the trace formatting got messed up above. But as
> > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > function is called, until I flip the switch to device mode.
> > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > and eem_wrap <-eth_start_xmit lines.
> >
> > > If not then, you may try to enable tracing of TCP/IP stack and
> > > network side to check who deleting the sysfs entry
> >
> > Yes, that's a vast amount of functions to trace. And I don't see how
> > that would be related to the patch we're discussing here. I was hoping
> > for a little more targeted hint.
>
> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> 'reset_config' and 'device_remove_file' leads me to:
>
> TIMESTAMP FUNCTION
> | |
> 49.952477: eem_wrap <-eth_start_xmit
> 55.072455: eem_wrap <-eth_start_xmit
> 55.072621: eem_unwrap <-rx_complete
> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> 59.011545: composite_reset <-configfs_composite_reset
> 59.011548: reset_config <-__composite_disconnect
> 59.013565: eem_disable <-reset_config
> 59.013567: gether_disconnect <-reset_config
> 59.049560: device_remove_file <-device_remove
> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> 59.051189: composite_disconnect <-configfs_composite_discon
> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> 59.052519: eem_unbind <-purge_configs_funcs
> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> 59.052537: device_remove_file <-composite_dev_cleanup
>
> device_remove_file gets called twice, once by device_remove after
> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> (removing the gadget)

I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget.
Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget

To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget?
I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Additionally, please check if you have any customized DWC patches that may be causing this problem.

>
> > You may recall the whole issue did not occur before this patch got applied.
> >
> > > Hardik
> > >
> > >
> > > >
> > > > >
> > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > kobject and sysfs interface.
> > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > is not followed in your use case or driver ?
> > > > >
> > > > > Yes, I'll try to trace where that happens.
> > > > >
> > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > switch is not good, as the route remains.
> > > > >
> > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > happens if you make eem connection and then unplug?
> > > > >
> > > > > > I am curious why the driver was developed without adhering to the
> > > > > > kernel's gadget architecture.
> > > >
> > > > I don't know what you mean here. Which driver do you mean?
> > > >
> > > > > > >
> > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > with this patch and
> > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > Would have been nice if someone could test on other
> > > > > > > > controller as well. But
> > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > >
> > > > > > > --
> > > > > > > With Best Regards,
> > > > > > > Andy Shevchenko
> > > > > > >
> > > > > > >
> > > > >
> >

2024-05-02 15:32:16

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

..

> > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > no
> > > > > > > > > > > > > longer exists
> > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > symlink should become active again
> > > > > > > > >
> > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > >
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > driver? net? power? sound? subsystem? suspended? uevent
> > > > > > > > >
> > > > > > > > > Then switching to host mode:
> > > > > > > > >
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > ls: cannot access
> > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > No such file
> > > > > > > > > or directory
> > > > > > > > >
> > > > > > > > > Then back to device mode:
> > > > > > > > >
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > driver? power? sound? subsystem? suspended? uevent
> > > > > > > > >
> > > > > > > > > net is missing. But, network functions:
> > > > > > > > >
> > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > >
> > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > >
> > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > two changes?
> > > > > > >
> > > > > > >
> > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > I could not reproduce
> > > > > > > the issue.
> > > > > > >
> > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > >
> > > > > > > Following are the logs:
> > > > > > > //Host to device
> > > > > > >
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > mode
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > a800000.dwc3/gadget/net/
> > > > > > > usb0
> > > > > > >
> > > > > > > //device to host
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > a800000.dwc3/gadget/net/
> > > > > > > usb0
> > > > > >
> > > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > >
> > > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > > gadget/sound as well, but not gadget/net.
> > > > > >
> > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > <
> > > > > > > addr_assign_type??? duplex???????????? phys_port_name
> > > > > > > addr_len??????????? flags????????????? phys_switch_id
> > > > > > > address???????????? gro_flush_timeout? power
> > > > > > > broadcast?????????? ifalias??????????? proto_down
> > > > > > > carrier???????????? ifindex??????????? queues
> > > > > > > carrier_changes???? iflink???????????? speed
> > > > > > > carrier_down_count? link_mode????????? statistics
> > > > > > > carrier_up_count??? mtu??????????????? subsystem
> > > > > > > dev_id????????????? name_assign_type?? tx_queue_len
> > > > > > > dev_port??????????? netdev_group?????? type
> > > > > > > device????????????? operstate????????? uevent
> > > > > > > dormant???????????? phys_port_id?????? waiting_for_supplier
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > usb0????? Link encap:Ethernet? HWaddr 3a:8b:63:97:1a:9a
> > > > > > > ??????????? BROADCAST MULTICAST? MTU:1500? Metric:1
> > > > > > > ??????????? RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > ??????????? TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > ??????????? collisions:0 txqueuelen:1000
> > > > > > > ??????????? RX bytes:0 TX bytes:0
> > > > > > >
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > >
> > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > the usb0 interface remains available.
> > > > > >
> > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > unplugging the cable leaves the usb0 connection.
> > > > > >
> > > > > > I have while in host mode:
> > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>? mtu 1500
> > > > > > ? ??????? inet 10.42.0.221? netmask 255.255.255.0? broadcast
> > > > > > 10.42.0.255
> > > > > > ? ??????? inet6 fe80::a8bb:ccff:fedd:eef1? prefixlen 64?
> > > > > > scopeid 0x20<link>
> > > > > >
> > > > > >
> > > > > > You don't see that because you didn't create a connection at all.
> > > > > >
> > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > >
> > > > > > Switching from device -> host -> device:
> > > > > >
> > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > ? ? plugin 'function_graph'
> > > > > > Hit Ctrl^C to stop recording
> > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > ? ??? 188 bytes in size (4096 uncompressed)
> > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > ? ??? 0 bytes in size (0 uncompressed)
> > > > > > root@yuna:~# trace-cmd report
> > > > > > cpus=2
> > > > > > ? ???? irq/68-dwc3-725?? [000]?? 514.575337: funcgraph_entry:????? #
> > > > > > 2079.480 us |? gether_disconnect();
> > > > > > ? ???? irq/68-dwc3-946?? [000]?? 524.263731: funcgraph_entry:????? +
> > > > > > 11.640 us? |? gether_disconnect();
> > > > > > ? ???? irq/68-dwc3-946?? [000]?? 524.263743: funcgraph_entry:????? !
> > > > > > 116.520 us |? gether_connect();
> > > > > > ? ???? irq/68-dwc3-946?? [000]?? 524.268029: funcgraph_entry:????? #
> > > > > > 2057.260 us |? gether_disconnect();
> > > > > > ? ???? irq/68-dwc3-946?? [000]?? 524.270089: funcgraph_entry:????? !
> > > > > > 109.000 us |? gether_connect();
> > > > >
> > > > > I tried to get a more useful trace:
> > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > > -> switch to host mode then back to device
> > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > # tracer: function
> > > > > #
> > > > > # entries-in-buffer/entries-written: 53/53?? #P:2
> > > > > #
> > > > > #??????????????????????????????? _-----=> irqs-off/BH-disabled
> > > > > #?????????????????????????????? / _----=> need-resched
> > > > > #????????????????????????????? | / _---=> hardirq/softirq
> > > > > #????????????????????????????? || / _--=> preempt-depth
> > > > > #????????????????????????????? ||| / _-=> migrate-disable
> > > > > #????????????????????????????? |||| /???? delay
> > > > > #?????????? TASK-PID???? CPU#? |||||? TIMESTAMP? FUNCTION
> > > > > #????????????? | |???????? |?? |||||???? |???????? |
> > > > > ????? irq/68-dwc3-523???? [000] D..3.?? 133.990254: reset_config
> > > > > <-__composite_disconnect
> > > > > ????? irq/68-dwc3-523???? [000] D..3.?? 133.992274: eem_disable
> > > > > <-reset_config
> > > > > ????? irq/68-dwc3-523???? [000] D..3.?? 133.992276: gether_disconnect
> > > > > <-reset_config
> > > > > ????? kworker/1:3-443???? [001] ...1.?? 134.022453: eem_unbind
> > > > > <-purge_configs_funcs
> > > > >
> > > > > -> to device mode
> > > > >
> > > > > ????? kworker/1:3-443???? [001] ...1.?? 148.630773: eem_bind
> > > > > <-usb_add_function
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.155209: eem_set_alt
> > > > > <-composite_setup
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.155215: gether_disconnect
> > > > > <-eem_set_alt
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.155220: gether_connect
> > > > > <-eem_set_alt
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.157287: eem_set_alt
> > > > > <-composite_setup
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.157292: gether_disconnect
> > > > > <-eem_set_alt
> > > > > ????? irq/68-dwc3-734???? [000] D..3.?? 149.159338: gether_connect
> > > > > <-eem_set_alt
> > > > > ????? irq/68-dwc3-734???? [000] D..2.?? 149.239625: eem_unwrap
> > > > > <-rx_complete
> > > > > ...
> > > > >
> > > > > I don't know where to look exactly. Any hints?
> > > >
> > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > >
> > > Nope. It's a pitty that the trace formatting got messed up above. But as
> > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > function is called, until I flip the switch to device mode.
> > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > > and eem_wrap <-eth_start_xmit lines.
> > >
> > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > network side to check who deleting the sysfs entry
> > >
> > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > that would be related to the patch we're discussing here. I was hoping
> > > for a little more targeted hint.
> >
> > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> > 'reset_config' and 'device_remove_file' leads me to:
> >
> > TIMESTAMP FUNCTION
> > | |
> > 49.952477: eem_wrap <-eth_start_xmit
> > 55.072455: eem_wrap <-eth_start_xmit
> > 55.072621: eem_unwrap <-rx_complete
> > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > 59.011545: composite_reset <-configfs_composite_reset
> > 59.011548: reset_config <-__composite_disconnect
> > 59.013565: eem_disable <-reset_config
> > 59.013567: gether_disconnect <-reset_config
> > 59.049560: device_remove_file <-device_remove
> > 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > 59.051189: composite_disconnect <-configfs_composite_discon
> > 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > 59.052519: eem_unbind <-purge_configs_funcs
> > 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > 59.052537: device_remove_file <-composite_dev_cleanup
> >
> > device_remove_file gets called twice, once by device_remove after
> > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > (removing the gadget)
>
> I believe that the device_remove_file function is only removing
> suspend-specific attributes, not the complete gadget. Typically, when you
> perform the role switch, the Gadget start/stop function in your UDC driver is
> called. These functions should not delete the gadget
>
> To investigate further, could you please enable the DWC3 functions in ftrace
> and check who is removing the gadget? I can also enable this on my system
> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> and may not be able to provide immediate support.

Since we are almost at -rc7, I propose to revert and try again next cycle.

> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>
> > > You may recall the whole issue did not occur before this patch got applied.
> > >
> > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > kobject and sysfs interface.
> > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > is not followed in your use case or driver ?
> > > > > >
> > > > > > Yes, I'll try to trace where that happens.
> > > > > >
> > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > switch is not good, as the route remains.
> > > > > >
> > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > happens if you make eem connection and then unplug?
> > > > > >
> > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > kernel's gadget architecture.
> > > > >
> > > > > I don't know what you mean here. Which driver do you mean?
> > > > >
> > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > with this patch and
> > > > > > > > > > > share result here.? May be we need some fix in dwc3
> > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > controller as well. But
> > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > We are happy to test any fixes if you come up with.

--
With Best Regards,
Andy Shevchenko



2024-05-02 16:18:11

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>
> ...
>
> > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > symlink should become active again
> > > > > > > > > >
> > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > >
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > >
> > > > > > > > > > Then switching to host mode:
> > > > > > > > > >
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > ls: cannot access
> > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > No such file
> > > > > > > > > > or directory
> > > > > > > > > >
> > > > > > > > > > Then back to device mode:
> > > > > > > > > >
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > >
> > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > >
> > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > >
> > > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > >
> > > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > > two changes?
> > > > > > > >
> > > > > > > >
> > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > > I could not reproduce
> > > > > > > > the issue.
> > > > > > > >
> > > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > >
> > > > > > > > Following are the logs:
> > > > > > > > //Host to device
> > > > > > > >
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > > >
> > > > > > > > //device to host
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > >
> > > > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > >
> > > > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > >
> > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > <
> > > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > > addr_len            flags              phys_switch_id
> > > > > > > > address             gro_flush_timeout  power
> > > > > > > > broadcast           ifalias            proto_down
> > > > > > > > carrier             ifindex            queues
> > > > > > > > carrier_changes     iflink             speed
> > > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > > dev_port            netdev_group       type
> > > > > > > > device              operstate          uevent
> > > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > >             collisions:0 txqueuelen:1000
> > > > > > > >             RX bytes:0 TX bytes:0
> > > > > > > >
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > >
> > > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > > the usb0 interface remains available.
> > > > > > >
> > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > >
> > > > > > > I have while in host mode:
> > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > > 10.42.0.255
> > > > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > > > scopeid 0x20<link>
> > > > > > >
> > > > > > >
> > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > >
> > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > >
> > > > > > > Switching from device -> host -> device:
> > > > > > >
> > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > >     plugin 'function_graph'
> > > > > > > Hit Ctrl^C to stop recording
> > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > >       188 bytes in size (4096 uncompressed)
> > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > >       0 bytes in size (0 uncompressed)
> > > > > > > root@yuna:~# trace-cmd report
> > > > > > > cpus=2
> > > > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > > > 2079.480 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > > > 11.640 us  |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > > > 116.520 us |  gether_connect();
> > > > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > > > 2057.260 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > > > 109.000 us |  gether_connect();
> > > > > >
> > > > > > I tried to get a more useful trace:
> > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > > > -> switch to host mode then back to device
> > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > # tracer: function
> > > > > > #
> > > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > > #
> > > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > > #                               / _----=> need-resched
> > > > > > #                              | / _---=> hardirq/softirq
> > > > > > #                              || / _--=> preempt-depth
> > > > > > #                              ||| / _-=> migrate-disable
> > > > > > #                              |||| /     delay
> > > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > > #              | |         |   |||||     |         |
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > > <-__composite_disconnect
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > > <-reset_config
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > > > <-reset_config
> > > > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > > <-purge_configs_funcs
> > > > > >
> > > > > > -> to device mode
> > > > > >
> > > > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > > <-usb_add_function
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > > <-rx_complete
> > > > > > ...
> > > > > >
> > > > > > I don't know where to look exactly. Any hints?
> > > > >
> > > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > >
> > > > Nope. It's a pitty that the trace formatting got messed up above. But as
> > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > function is called, until I flip the switch to device mode.
> > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > > > and eem_wrap <-eth_start_xmit lines.
> > > >
> > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > network side to check who deleting the sysfs entry
> > > >
> > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > that would be related to the patch we're discussing here. I was hoping
> > > > for a little more targeted hint.
> > >
> > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> > > 'reset_config' and 'device_remove_file' leads me to:
> > >
> > > TIMESTAMP FUNCTION
> > > | |
> > > 49.952477: eem_wrap <-eth_start_xmit
> > > 55.072455: eem_wrap <-eth_start_xmit
> > > 55.072621: eem_unwrap <-rx_complete
> > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > > 59.011545: composite_reset <-configfs_composite_reset
> > > 59.011548: reset_config <-__composite_disconnect
> > > 59.013565: eem_disable <-reset_config
> > > 59.013567: gether_disconnect <-reset_config
> > > 59.049560: device_remove_file <-device_remove
> > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > > 59.051189: composite_disconnect <-configfs_composite_discon
> > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > > 59.052519: eem_unbind <-purge_configs_funcs
> > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > > 59.052537: device_remove_file <-composite_dev_cleanup
> > >
> > > device_remove_file gets called twice, once by device_remove after
> > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > > (removing the gadget)
> >
> > I believe that the device_remove_file function is only removing
> > suspend-specific attributes, not the complete gadget. Typically, when you
> > perform the role switch, the Gadget start/stop function in your UDC driver is
> > called. These functions should not delete the gadget
> >
> > To investigate further, could you please enable the DWC3 functions in ftrace
> > and check who is removing the gadget? I can also enable this on my system
> > and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> > and may not be able to provide immediate support.
>
> Since we are almost at -rc7, I propose to revert and try again next cycle.

Why? There is no problem with this patch!

I don't know why you are so excited to revert the patch instead of investigating the original issue.
Even though I have proved that the problem is caused by usb0 being removed just by role-switching the port by the UDC driver, this behavior is incorrect.

There is no behavior in the Linux kernel that keeps the network interface and removes the related sysfs entry. THIS IS WRONG and has been FIXED without reverting any mainline patch.

Please don't encourage reverting any mainstream patches to avoid investigating a (probably) faulty driver. This is not how the community should work.

Also, I'm a bit wondering: the patch has been in the mainline for some time, and we haven't had any issues except this one, which is due to the wrong behavior of the UDC driver.

>
> > Additionally, please check if you have any customized DWC patches that may be causing this problem.
> >
> > > > You may recall the whole issue did not occur before this patch got applied.
> > > >
> > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > kobject and sysfs interface.
> > > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > > is not followed in your use case or driver ?
> > > > > > >
> > > > > > > Yes, I'll try to trace where that happens.
> > > > > > >
> > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > > switch is not good, as the route remains.
> > > > > > >
> > > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > > happens if you make eem connection and then unplug?
> > > > > > >
> > > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > > kernel's gadget architecture.
> > > > > >
> > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > >
> > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > with this patch and
> > > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > controller as well. But
> > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > We are happy to test any fixes if you come up with.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2024-05-02 20:13:59

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>> no
>>>>>>>>>>>>> longer exists
>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>> symlink should become active again
>>>>>>>>>
>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> Then switching to host mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> ls: cannot access
>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>> No such file
>>>>>>>>> or directory
>>>>>>>>>
>>>>>>>>> Then back to device mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>
>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>
>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>> two changes?
>>>>>>>
>>>>>>>
>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>> I could not reproduce
>>>>>>> the issue.
>>>>>>>
>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>
>>>>>>> Following are the logs:
>>>>>>> //Host to device
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>> mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>>
>>>>>>> //device to host
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>
>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>
>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>
>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>> <
>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>> addr_len            flags              phys_switch_id
>>>>>>> address             gro_flush_timeout  power
>>>>>>> broadcast           ifalias            proto_down
>>>>>>> carrier             ifindex            queues
>>>>>>> carrier_changes     iflink             speed
>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>> dev_port            netdev_group       type
>>>>>>> device              operstate          uevent
>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>
>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>> the usb0 interface remains available.
>>>>>>
>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>
>>>>>> I have while in host mode:
>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>> 10.42.0.255
>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>> scopeid 0x20<link>
>>>>>>
>>>>>>
>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>
>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>
>>>>>> Switching from device -> host -> device:
>>>>>>
>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>     plugin 'function_graph'
>>>>>> Hit Ctrl^C to stop recording
>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>       0 bytes in size (0 uncompressed)
>>>>>> root@yuna:~# trace-cmd report
>>>>>> cpus=2
>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>> 116.520 us |  gether_connect();
>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>> 109.000 us |  gether_connect();
>>>>>
>>>>> I tried to get a more useful trace:
>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>> -> switch to host mode then back to device
>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>> # tracer: function
>>>>> #
>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>> #
>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>> #                               / _----=> need-resched
>>>>> #                              | / _---=> hardirq/softirq
>>>>> #                              || / _--=> preempt-depth
>>>>> #                              ||| / _-=> migrate-disable
>>>>> #                              |||| /     delay
>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>> #              | |         |   |||||     |         |
>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>> <-__composite_disconnect
>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>> <-reset_config
>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>> <-reset_config
>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>> <-purge_configs_funcs
>>>>>
>>>>> -> to device mode
>>>>>
>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>> <-usb_add_function
>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>> <-composite_setup
>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>> <-eem_set_alt
>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>> <-eem_set_alt
>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>> <-composite_setup
>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>> <-eem_set_alt
>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>> <-eem_set_alt
>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>> <-rx_complete
>>>>> ...
>>>>>
>>>>> I don't know where to look exactly. Any hints?
>>>>
>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>
>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>> function is called, until I flip the switch to device mode.
>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>> and eem_wrap <-eth_start_xmit lines.
>>>
>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>> network side to check who deleting the sysfs entry
>>>
>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>> that would be related to the patch we're discussing here. I was hoping
>>> for a little more targeted hint.
>>
>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>> 'reset_config' and 'device_remove_file' leads me to:
>>
>> TIMESTAMP FUNCTION
>> | |
>> 49.952477: eem_wrap <-eth_start_xmit
>> 55.072455: eem_wrap <-eth_start_xmit
>> 55.072621: eem_unwrap <-rx_complete
>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>> 59.011545: composite_reset <-configfs_composite_reset
>> 59.011548: reset_config <-__composite_disconnect
>> 59.013565: eem_disable <-reset_config
>> 59.013567: gether_disconnect <-reset_config
>> 59.049560: device_remove_file <-device_remove
>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
>> 59.051189: composite_disconnect <-configfs_composite_discon
>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
>> 59.052519: eem_unbind <-purge_configs_funcs
>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
>> 59.052537: device_remove_file <-composite_dev_cleanup
>>
>> device_remove_file gets called twice, once by device_remove after
>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>> (removing the gadget)
>
> I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget.
> Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget
>
> To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget?
> I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
trace is 600kb. I cut uninteresting stuff before
configfs_composite_reset <-usb_gadget_udc_reset and after
__dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to compare.

> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>
>>
>>> You may recall the whole issue did not occur before this patch got applied.
>>>
>>>> Hardik
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>> kobject and sysfs interface.
>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>> is not followed in your use case or driver ?
>>>>>>
>>>>>> Yes, I'll try to trace where that happens.
>>>>>>
>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>> switch is not good, as the route remains.
>>>>>>
>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>> happens if you make eem connection and then unplug?
>>>>>>
>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>> kernel's gadget architecture.
>>>>>
>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>
>>>>>>>>
>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>> with this patch and
>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>> controller as well. But
>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>
>>>>>>>> --
>>>>>>>> With Best Regards,
>>>>>>>> Andy Shevchenko
>>>>>>>>
>>>>>>>>
>>>>>>
>>>


Attachments:
trace-u_ether.tar.gz (1.45 kB)

2024-05-02 20:32:33

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Oops, sorry, wrong file attached . Now correct one.

Op 02-05-2024 om 22:13 schreef Ferry Toth:
> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>> target
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>
>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> ls: cannot access
>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>> No such file
>>>>>>>>>> or directory
>>>>>>>>>>
>>>>>>>>>> Then back to device mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>
>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>
>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>> two changes?
>>>>>>>>
>>>>>>>>
>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>> I could not reproduce
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>
>>>>>>>> Following are the logs:
>>>>>>>> //Host to device
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>>
>>>>>>>> //device to host
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" >
>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>
>>>>>>> That is weird. When I switch to host mode (using the physical
>>>>>>> switch),
>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>
>>>>>>> Switching back to device mode, that gadget directory is
>>>>>>> recreated. And
>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>
>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>> <
>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>> address             gro_flush_timeout  power
>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>> carrier             ifindex            queues
>>>>>>>> carrier_changes     iflink             speed
>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>> dev_port            netdev_group       type
>>>>>>>> device              operstate          uevent
>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>
>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>> the usb0 interface remains available.
>>>>>>>
>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>
>>>>>>> I have while in host mode:
>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>> 10.42.0.255
>>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>> scopeid 0x20<link>
>>>>>>>
>>>>>>>
>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>
>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions
>>>>>>>> involved
>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>
>>>>>>> Switching from device -> host -> device:
>>>>>>>
>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>      plugin 'function_graph'
>>>>>>> Hit Ctrl^C to stop recording
>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>        0 bytes in size (0 uncompressed)
>>>>>>> root@yuna:~# trace-cmd report
>>>>>>> cpus=2
>>>>>>>         irq/68-dwc3-725   [000]   514.575337:
>>>>>>> funcgraph_entry:      #
>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263731:
>>>>>>> funcgraph_entry:      +
>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263743:
>>>>>>> funcgraph_entry:      !
>>>>>>> 116.520 us |  gether_connect();
>>>>>>>         irq/68-dwc3-946   [000]   524.268029:
>>>>>>> funcgraph_entry:      #
>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.270089:
>>>>>>> funcgraph_entry:      !
>>>>>>> 109.000 us |  gether_connect();
>>>>>>
>>>>>> I tried to get a more useful trace:
>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >>
>>>>>> set_ftrace_filter
>>>>>> -> switch to host mode then back to device
>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>> # tracer: function
>>>>>> #
>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>> #
>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>> #                               / _----=> need-resched
>>>>>> #                              | / _---=> hardirq/softirq
>>>>>> #                              || / _--=> preempt-depth
>>>>>> #                              ||| / _-=> migrate-disable
>>>>>> #                              |||| /     delay
>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>> #              | |         |   |||||     |         |
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>> <-__composite_disconnect
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>> <-reset_config
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>> gether_disconnect
>>>>>> <-reset_config
>>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>> <-purge_configs_funcs
>>>>>>
>>>>>> -> to device mode
>>>>>>
>>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>> <-usb_add_function
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>> <-rx_complete
>>>>>> ...
>>>>>>
>>>>>> I don't know where to look exactly. Any hints?
>>>>>
>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>
>>>> Nope. It's a pitty that the trace formatting got messed up above.
>>>> But as
>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>> function is called, until I flip the switch to device mode.
>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>> <-rx_complete
>>>> and eem_wrap <-eth_start_xmit lines.
>>>>
>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>> network side to check who deleting the sysfs entry
>>>>
>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>> that would be related to the patch we're discussing here. I was hoping
>>>> for a little more targeted hint.
>>>
>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>> 'usb_fun*',
>>> 'reset_config' and 'device_remove_file' leads me to:
>>>
>>> TIMESTAMP  FUNCTION
>>>     |         |
>>>    49.952477: eem_wrap <-eth_start_xmit
>>>    55.072455: eem_wrap <-eth_start_xmit
>>>    55.072621: eem_unwrap <-rx_complete
>>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>    59.011545: composite_reset <-configfs_composite_reset
>>>    59.011548: reset_config <-__composite_disconnect
>>>    59.013565: eem_disable <-reset_config
>>>    59.013567: gether_disconnect <-reset_config
>>>    59.049560: device_remove_file <-device_remove
>>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>    59.051189: composite_disconnect <-configfs_composite_discon
>>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>    59.052519: eem_unbind <-purge_configs_funcs
>>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>    59.052537: device_remove_file <-composite_dev_cleanup
>>>
>>> device_remove_file gets called twice, once by device_remove after
>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>> (removing the gadget)
>>
>> I believe that the device_remove_file function is only removing
>> suspend-specific attributes, not the complete gadget.
>> Typically, when you perform the role switch, the Gadget start/stop
>> function in your UDC driver is called. These functions should not
>> delete the gadget
>>
>> To investigate further, could you please enable the DWC3 functions in
>> ftrace and check who is removing the gadget?
>> I can also enable this on my system and compare the logs with yours,
>> but I will be in PI planning for 1.5 weeks and may not be able to
>> provide immediate support.
>
> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
> trace is 600kb. I cut uninteresting stuff before
> configfs_composite_reset <-usb_gadget_udc_reset and after
> __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to
> compare.
>
>> Additionally, please check if you have any customized DWC patches that
>> may be causing this problem.
>>
>>>
>>>> You may recall the whole issue did not occur before this patch got
>>>> applied.
>>>>
>>>>> Hardik
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>> kobject and sysfs interface.
>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>> is not followed in your use case or driver ?
>>>>>>>
>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>
>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>> switch is not good, as the route remains.
>>>>>>>
>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>
>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>> kernel's gadget architecture.
>>>>>>
>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>
>>>>>>>>>
>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>> with this patch and
>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>> controller as well. But
>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> With Best Regards,
>>>>>>>>> Andy Shevchenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>


Attachments:
trace_u_ether.tar.gz (2.44 kB)

2024-05-03 07:24:38

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Greg, not totally sure, but this might be something where you might need
to make a judgement call. See below:

On 02.05.24 18:16, Hardik Gajjar wrote:
> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>

>> ...
>>
>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>
>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> ls: cannot access
>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>> No such file
>>>>>>>>>>> or directory
>>>>>>>>>>>
>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>
>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>
>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>> two changes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>> I could not reproduce
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>
>>>>>>>>> Following are the logs:
>>>>>>>>> //Host to device
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>> mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> //device to host
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>
>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>
>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>
>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>> <
>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>> carrier             ifindex            queues
>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>> device              operstate          uevent
>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>
>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>> the usb0 interface remains available.
>>>>>>>>
>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>
>>>>>>>> I have while in host mode:
>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>> 10.42.0.255
>>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
>>>>>>>> scopeid 0x20<link>
>>>>>>>>
>>>>>>>>
>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>
>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>
>>>>>>>> Switching from device -> host -> device:
>>>>>>>>
>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>     plugin 'function_graph'
>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>       0 bytes in size (0 uncompressed)
>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>> cpus=2
>>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>
>>>>>>> I tried to get a more useful trace:
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>> -> switch to host mode then back to device
>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>> # tracer: function
>>>>>>> #
>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>> #
>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>> #                               / _----=> need-resched
>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>> #                              || / _--=> preempt-depth
>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>> #                              |||| /     delay
>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>> #              | |         |   |||||     |         |
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>> <-__composite_disconnect
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>> <-reset_config
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>>>> <-reset_config
>>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>> <-purge_configs_funcs
>>>>>>>
>>>>>>> -> to device mode
>>>>>>>
>>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>> <-usb_add_function
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>> <-rx_complete
>>>>>>> ...
>>>>>>>
>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>
>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>
>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>> function is called, until I flip the switch to device mode.
>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>
>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>> network side to check who deleting the sysfs entry
>>>>>
>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>> for a little more targeted hint.
>>>>
>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>
>>>> TIMESTAMP FUNCTION
>>>> | |
>>>> 49.952477: eem_wrap <-eth_start_xmit
>>>> 55.072455: eem_wrap <-eth_start_xmit
>>>> 55.072621: eem_unwrap <-rx_complete
>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>> 59.011545: composite_reset <-configfs_composite_reset
>>>> 59.011548: reset_config <-__composite_disconnect
>>>> 59.013565: eem_disable <-reset_config
>>>> 59.013567: gether_disconnect <-reset_config
>>>> 59.049560: device_remove_file <-device_remove
>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>> 59.051189: composite_disconnect <-configfs_composite_discon
>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>> 59.052519: eem_unbind <-purge_configs_funcs
>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>> 59.052537: device_remove_file <-composite_dev_cleanup
>>>>
>>>> device_remove_file gets called twice, once by device_remove after
>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>> (removing the gadget)
>>>
>>> I believe that the device_remove_file function is only removing
>>> suspend-specific attributes, not the complete gadget. Typically, when you
>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>> called. These functions should not delete the gadget
>>>
>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>> and check who is removing the gadget? I can also enable this on my system
>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>> and may not be able to provide immediate support.
>>
>> Since we are almost at -rc7, I propose to revert and try again next cycle.
>
> Why? There is no problem with this patch!
>
> I don't know why you are so excited to revert the patch instead of
> investigating the original issue. Even though I have proved that the
> problem is caused by usb0 being removed just by role-switching the
> port by the UDC driver, this behavior is incorrect.
>
> There is no behavior in the Linux kernel that keeps the network
> interface and removes the related sysfs entry. THIS IS WRONG and has
> been FIXED without reverting any mainline patch.
>
> Please don't encourage reverting any mainstream patches to avoid
> investigating a (probably) faulty driver. This is not how the
> community should work.

I only skimmed the thread and this is not my area of expertise, so
correct me if I'm wrong anywhere here:

But from what I see I tend to disagree: the patch mentioned in the
subject apparently regressed something. This afaics was reported 16
weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
regressions" rule[2] the change causing this ideally should have been
reverted before 6.8 was release -- it does not matter that it just
exposed an existing problem rooted somewhere else. That revert did not
happen. Then even a few more weeks went by and this is still not fixed.
Then I'd guess reverting this might be the right course of action to
motivate someone to fix the exposed problem. Unless of course we know or
fear that a revert causes regressions for users of 6.8.y -- is that the
case?

[1] https://lore.kernel.org/all/[email protected]/

[2] https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Also, I'm a bit wondering: the patch has been in the mainline for
> some time, and we haven't had any issues except this one, which is
> due to the wrong behavior of the UDC driver.
>>
>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>
>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>
>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>> kobject and sysfs interface.
>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>
>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>
>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>> switch is not good, as the route remains.
>>>>>>>>
>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>
>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>> kernel's gadget architecture.
>>>>>>>
>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>
>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>> controller as well. But
>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
>>
>>

2024-05-03 09:16:05

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Greg, not totally sure, but this might be something where you might need
> to make a judgement call. See below:
>
> On 02.05.24 18:16, Hardik Gajjar wrote:
> > On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> >> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> >>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> >>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> >>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> >>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> >>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> >>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> >>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> >>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> >>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> >>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> >>
>
> >> ...
> >>
> >>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
> >>>>>>>>>>>>>>> removed and it is a link, the link
> >>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> >>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>> longer exists
> >>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
> >>>>>>>>>>>>> present, but the symlink is
> >>>>>>>>>>>>> broken. In that case, the dwc3 driver should
> >>>>>>>>>>>>> recreate the device, and the
> >>>>>>>>>>>>> symlink should become active again
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver net power sound subsystem suspended uevent
> >>>>>>>>>>>
> >>>>>>>>>>> Then switching to host mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> ls: cannot access
> >>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> >>>>>>>>>>> No such file
> >>>>>>>>>>> or directory
> >>>>>>>>>>>
> >>>>>>>>>>> Then back to device mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver power sound subsystem suspended uevent
> >>>>>>>>>>>
> >>>>>>>>>>> net is missing. But, network functions:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ping 10.42.0.1
> >>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
> >>>>>>>>>>>
> >>>>>>>>>>> Mass storage device is created and removed each time as expected.
> >>>>>>>>>>
> >>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
> >>>>>>>>>> two changes?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
> >>>>>>>>> I could not reproduce
> >>>>>>>>> the issue.
> >>>>>>>>>
> >>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
> >>>>>>>>> the role switching of the USB mode (peripheral <-> host)
> >>>>>>>>>
> >>>>>>>>> Following are the logs:
> >>>>>>>>> //Host to device
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> >>>>>>>>>> mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>>
> >>>>>>>>> //device to host
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>
> >>>>>>>> That is weird. When I switch to host mode (using the physical switch),
> >>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
> >>>>>>>>
> >>>>>>>> Switching back to device mode, that gadget directory is recreated. And
> >>>>>>>> gadget/sound as well, but not gadget/net.
> >>>>>>>>
> >>>>>>>>> s a800000.dwc3/gadget/net/usb0
> >>>>>>>>> <
> >>>>>>>>> addr_assign_type duplex phys_port_name
> >>>>>>>>> addr_len flags phys_switch_id
> >>>>>>>>> address gro_flush_timeout power
> >>>>>>>>> broadcast ifalias proto_down
> >>>>>>>>> carrier ifindex queues
> >>>>>>>>> carrier_changes iflink speed
> >>>>>>>>> carrier_down_count link_mode statistics
> >>>>>>>>> carrier_up_count mtu subsystem
> >>>>>>>>> dev_id name_assign_type tx_queue_len
> >>>>>>>>> dev_port netdev_group type
> >>>>>>>>> device operstate uevent
> >>>>>>>>> dormant phys_port_id waiting_for_supplier
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> >>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
> >>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1
> >>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >>>>>>>>> collisions:0 txqueuelen:1000
> >>>>>>>>> RX bytes:0 TX bytes:0
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
> >>>>>>>>>
> >>>>>>>>> I strongly advise against reverting the patch solely based on the
> >>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
> >>>>>>>>> the usb0 interface remains available.
> >>>>>>>>
> >>>>>>>> There's more to it. I also mentioned that switching the role or
> >>>>>>>> unplugging the cable leaves the usb0 connection.
> >>>>>>>>
> >>>>>>>> I have while in host mode:
> >>>>>>>> root@yuna:~# ifconfig -a usb0
> >>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
> >>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast
> >>>>>>>> 10.42.0.255
> >>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64
> >>>>>>>> scopeid 0x20<link>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You don't see that because you didn't create a connection at all.
> >>>>>>>>
> >>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
> >>>>>>>>> and identify which faulty call is responsible for removing usb0.
> >>>>>>>>
> >>>>>>>> Switching from device -> host -> device:
> >>>>>>>>
> >>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >>>>>>>> plugin 'function_graph'
> >>>>>>>> Hit Ctrl^C to stop recording
> >>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
> >>>>>>>> 188 bytes in size (4096 uncompressed)
> >>>>>>>> CPU1 data recorded at offset=0x1c9000
> >>>>>>>> 0 bytes in size (0 uncompressed)
> >>>>>>>> root@yuna:~# trace-cmd report
> >>>>>>>> cpus=2
> >>>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: #
> >>>>>>>> 2079.480 us | gether_disconnect();
> >>>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: +
> >>>>>>>> 11.640 us | gether_disconnect();
> >>>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: !
> >>>>>>>> 116.520 us | gether_connect();
> >>>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: #
> >>>>>>>> 2057.260 us | gether_disconnect();
> >>>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: !
> >>>>>>>> 109.000 us | gether_connect();
> >>>>>>>
> >>>>>>> I tried to get a more useful trace:
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> >>>>>>> -> switch to host mode then back to device
> >>>>>>> root@yuna:/sys/kernel/tracing# cat trace
> >>>>>>> # tracer: function
> >>>>>>> #
> >>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2
> >>>>>>> #
> >>>>>>> # _-----=> irqs-off/BH-disabled
> >>>>>>> # / _----=> need-resched
> >>>>>>> # | / _---=> hardirq/softirq
> >>>>>>> # || / _--=> preempt-depth
> >>>>>>> # ||| / _-=> migrate-disable
> >>>>>>> # |||| / delay
> >>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> >>>>>>> # | | | ||||| | |
> >>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
> >>>>>>> <-__composite_disconnect
> >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
> >>>>>>> <-reset_config
> >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
> >>>>>>> <-reset_config
> >>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
> >>>>>>> <-purge_configs_funcs
> >>>>>>>
> >>>>>>> -> to device mode
> >>>>>>>
> >>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
> >>>>>>> <-usb_add_function
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap
> >>>>>>> <-rx_complete
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I don't know where to look exactly. Any hints?
> >>>>>>
> >>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
> >>>>>
> >>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
> >>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
> >>>>> function is called, until I flip the switch to device mode.
> >>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> >>>>> and eem_wrap <-eth_start_xmit lines.
> >>>>>
> >>>>>> If not then, you may try to enable tracing of TCP/IP stack and
> >>>>>> network side to check who deleting the sysfs entry
> >>>>>
> >>>>> Yes, that's a vast amount of functions to trace. And I don't see how
> >>>>> that would be related to the patch we're discussing here. I was hoping
> >>>>> for a little more targeted hint.
> >>>>
> >>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> >>>> 'reset_config' and 'device_remove_file' leads me to:
> >>>>
> >>>> TIMESTAMP FUNCTION
> >>>> | |
> >>>> 49.952477: eem_wrap <-eth_start_xmit
> >>>> 55.072455: eem_wrap <-eth_start_xmit
> >>>> 55.072621: eem_unwrap <-rx_complete
> >>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> >>>> 59.011545: composite_reset <-configfs_composite_reset
> >>>> 59.011548: reset_config <-__composite_disconnect
> >>>> 59.013565: eem_disable <-reset_config
> >>>> 59.013567: gether_disconnect <-reset_config
> >>>> 59.049560: device_remove_file <-device_remove
> >>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> >>>> 59.051189: composite_disconnect <-configfs_composite_discon
> >>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> >>>> 59.052519: eem_unbind <-purge_configs_funcs
> >>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> >>>> 59.052537: device_remove_file <-composite_dev_cleanup
> >>>>
> >>>> device_remove_file gets called twice, once by device_remove after
> >>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> >>>> (removing the gadget)
> >>>
> >>> I believe that the device_remove_file function is only removing
> >>> suspend-specific attributes, not the complete gadget. Typically, when you
> >>> perform the role switch, the Gadget start/stop function in your UDC driver is
> >>> called. These functions should not delete the gadget
> >>>
> >>> To investigate further, could you please enable the DWC3 functions in ftrace
> >>> and check who is removing the gadget? I can also enable this on my system
> >>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> >>> and may not be able to provide immediate support.
> >>
> >> Since we are almost at -rc7, I propose to revert and try again next cycle.
> >
> > Why? There is no problem with this patch!
> >
> > I don't know why you are so excited to revert the patch instead of
> > investigating the original issue. Even though I have proved that the
> > problem is caused by usb0 being removed just by role-switching the
> > port by the UDC driver, this behavior is incorrect.
> >
> > There is no behavior in the Linux kernel that keeps the network
> > interface and removes the related sysfs entry. THIS IS WRONG and has
> > been FIXED without reverting any mainline patch.
> >
> > Please don't encourage reverting any mainstream patches to avoid
> > investigating a (probably) faulty driver. This is not how the
> > community should work.
>
> I only skimmed the thread and this is not my area of expertise, so
> correct me if I'm wrong anywhere here:
>
> But from what I see I tend to disagree: the patch mentioned in the
> subject apparently regressed something. This afaics was reported 16
> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
> regressions" rule[2] the change causing this ideally should have been
> reverted before 6.8 was release -- it does not matter that it just
> exposed an existing problem rooted somewhere else. That revert did not
> happen. Then even a few more weeks went by and this is still not fixed.
> Then I'd guess reverting this might be the right course of action to
> motivate someone to fix the exposed problem. Unless of course we know or
> fear that a revert causes regressions for users of 6.8.y -- is that the
> case?
>
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
>
> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
> If I did something stupid, please tell me, as explained on that page.

I understand your perspective. Do you have documentation or guidelines available to ensure that the existing problem is resolved and the correct patch is reinstated?

As far as I know, the issue is currently only reproducible by the reporter. I also have a similar setup, but I have not been able to reproduce the issue myself (please refer to our previous discussions). This discrepancy could be due to some OEM modifications.

If the problem is only reproducible by the reporter, how do we motivate others to support and upstream the fix? Isn't it demotivating to revert the correct patch and retain the wrong behavior of the driver?

Additionally, is it possible to determine how many correct patches have been reverted in the past to address this particular problem?

Please correct me if I am wrong ?

>
> > Also, I'm a bit wondering: the patch has been in the mainline for
> > some time, and we haven't had any issues except this one, which is
> > due to the wrong behavior of the UDC driver.
> >>
> >>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> >>>
> >>>>> You may recall the whole issue did not occur before this patch got applied.
> >>>>>
> >>>>>>>>> According to current kernel architecture of u_ether driver, only
> >>>>>>>>> gether_cleanup should remove the usb0 interface along with its
> >>>>>>>>> kobject and sysfs interface.
> >>>>>>>>> I suggest sharing the analysis here to understand why this practice
> >>>>>>>>> is not followed in your use case or driver ?
> >>>>>>>>
> >>>>>>>> Yes, I'll try to trace where that happens.
> >>>>>>>>
> >>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
> >>>>>>>> harmless? But the usb: net device remaining after disconnect or role
> >>>>>>>> switch is not good, as the route remains.
> >>>>>>>>
> >>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
> >>>>>>>> happens if you make eem connection and then unplug?
> >>>>>>>>
> >>>>>>>>> I am curious why the driver was developed without adhering to the
> >>>>>>>>> kernel's gadget architecture.
> >>>>>>>
> >>>>>>> I don't know what you mean here. Which driver do you mean?
> >>>>>>>
> >>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
> >>>>>>>>>>>>> with this patch and
> >>>>>>>>>>>>> share result here. May be we need some fix in dwc3
> >>>>>>>>>>> Would have been nice if someone could test on other
> >>>>>>>>>>> controller as well. But
> >>>>>>>>>>> another instance of dwc3 is also very welcome.
> >>>>>>>>>>>> It's quite possible, please test on your side.
> >>>>>>>>>>>> We are happy to test any fixes if you come up with.
> >>
> >> --
> >> With Best Regards,
> >> Andy Shevchenko
> >>
> >>

2024-05-03 12:40:04

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On 03.05.24 11:15, Hardik Gajjar wrote:
> On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Greg, not totally sure, but this might be something where you might need
>> to make a judgement call. See below:
>>
>> On 02.05.24 18:16, Hardik Gajjar wrote:
>>> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>>>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>
>>
>>>> ...
>>>>
>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver net power sound subsystem suspended uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>> No such file
>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver power sound subsystem suspended uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>>
>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>>> two changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>>> I could not reproduce
>>>>>>>>>>> the issue.
>>>>>>>>>>>
>>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>
>>>>>>>>>>> Following are the logs:
>>>>>>>>>>> //Host to device
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>>> mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> //device to host
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>
>>>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>
>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>> <
>>>>>>>>>>> addr_assign_type duplex phys_port_name
>>>>>>>>>>> addr_len flags phys_switch_id
>>>>>>>>>>> address gro_flush_timeout power
>>>>>>>>>>> broadcast ifalias proto_down
>>>>>>>>>>> carrier ifindex queues
>>>>>>>>>>> carrier_changes iflink speed
>>>>>>>>>>> carrier_down_count link_mode statistics
>>>>>>>>>>> carrier_up_count mtu subsystem
>>>>>>>>>>> dev_id name_assign_type tx_queue_len
>>>>>>>>>>> dev_port netdev_group type
>>>>>>>>>>> device operstate uevent
>>>>>>>>>>> dormant phys_port_id waiting_for_supplier
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1
>>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>>> collisions:0 txqueuelen:1000
>>>>>>>>>>> RX bytes:0 TX bytes:0
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>
>>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>
>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>
>>>>>>>>>> I have while in host mode:
>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
>>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast
>>>>>>>>>> 10.42.0.255
>>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64
>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>
>>>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>
>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>> plugin 'function_graph'
>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>> 188 bytes in size (4096 uncompressed)
>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>> 0 bytes in size (0 uncompressed)
>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>> cpus=2
>>>>>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: #
>>>>>>>>>> 2079.480 us | gether_disconnect();
>>>>>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: +
>>>>>>>>>> 11.640 us | gether_disconnect();
>>>>>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: !
>>>>>>>>>> 116.520 us | gether_connect();
>>>>>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: #
>>>>>>>>>> 2057.260 us | gether_disconnect();
>>>>>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: !
>>>>>>>>>> 109.000 us | gether_connect();
>>>>>>>>>
>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>> # tracer: function
>>>>>>>>> #
>>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2
>>>>>>>>> #
>>>>>>>>> # _-----=> irqs-off/BH-disabled
>>>>>>>>> # / _----=> need-resched
>>>>>>>>> # | / _---=> hardirq/softirq
>>>>>>>>> # || / _--=> preempt-depth
>>>>>>>>> # ||| / _-=> migrate-disable
>>>>>>>>> # |||| / delay
>>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
>>>>>>>>> # | | | ||||| | |
>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
>>>>>>>>> <-__composite_disconnect
>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
>>>>>>>>> <-reset_config
>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
>>>>>>>>> <-reset_config
>>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>
>>>>>>>>> -> to device mode
>>>>>>>>>
>>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
>>>>>>>>> <-usb_add_function
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>
>>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>>
>>>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>
>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>
>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>>> for a little more targeted hint.
>>>>>>
>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>
>>>>>> TIMESTAMP FUNCTION
>>>>>> | |
>>>>>> 49.952477: eem_wrap <-eth_start_xmit
>>>>>> 55.072455: eem_wrap <-eth_start_xmit
>>>>>> 55.072621: eem_unwrap <-rx_complete
>>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>> 59.011545: composite_reset <-configfs_composite_reset
>>>>>> 59.011548: reset_config <-__composite_disconnect
>>>>>> 59.013565: eem_disable <-reset_config
>>>>>> 59.013567: gether_disconnect <-reset_config
>>>>>> 59.049560: device_remove_file <-device_remove
>>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>> 59.051189: composite_disconnect <-configfs_composite_discon
>>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>> 59.052519: eem_unbind <-purge_configs_funcs
>>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>> 59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>
>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>>> (removing the gadget)
>>>>>
>>>>> I believe that the device_remove_file function is only removing
>>>>> suspend-specific attributes, not the complete gadget. Typically, when you
>>>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>>>> called. These functions should not delete the gadget
>>>>>
>>>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>>>> and check who is removing the gadget? I can also enable this on my system
>>>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>>>> and may not be able to provide immediate support.
>>>>
>>>> Since we are almost at -rc7, I propose to revert and try again next cycle.
>>>
>>> Why? There is no problem with this patch!
>>>
>>> I don't know why you are so excited to revert the patch instead of
>>> investigating the original issue. Even though I have proved that the
>>> problem is caused by usb0 being removed just by role-switching the
>>> port by the UDC driver, this behavior is incorrect.
>>>
>>> There is no behavior in the Linux kernel that keeps the network
>>> interface and removes the related sysfs entry. THIS IS WRONG and has
>>> been FIXED without reverting any mainline patch.
>>>
>>> Please don't encourage reverting any mainstream patches to avoid
>>> investigating a (probably) faulty driver. This is not how the
>>> community should work.
>>
>> I only skimmed the thread and this is not my area of expertise, so
>> correct me if I'm wrong anywhere here:
>>
>> But from what I see I tend to disagree: the patch mentioned in the
>> subject apparently regressed something. This afaics was reported 16
>> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
>> regressions" rule[2] the change causing this ideally should have been
>> reverted before 6.8 was release -- it does not matter that it just
>> exposed an existing problem rooted somewhere else. That revert did not
>> happen. Then even a few more weeks went by and this is still not fixed.
>> Then I'd guess reverting this might be the right course of action to
>> motivate someone to fix the exposed problem. Unless of course we know or
>> fear that a revert causes regressions for users of 6.8.y -- is that the
>> case?
>>
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
>>
>> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
>> If I did something stupid, please tell me, as explained on that page.
>
> I understand your perspective. Do you have documentation or
> guidelines available to ensure that the existing problem is resolved
> and the correct patch is reinstated?

The second link I gave (e.g. to a rendered version of
Documentation/process/handling-regressions.rst ) has quite a few quotes
from Linus which iirc should support this. Apart from this there is only
the lore archive and the the git log afaik.

> As far as I know, the issue is currently only reproducible by the
> reporter. I also have a similar setup, but I have not been able to
> reproduce the issue myself (please refer to our previous
> discussions). This discrepancy could be due to some OEM
> modifications.

To the board/machine in question or the kernel used beforehand? If the
former: that does not matter much. In the latter: than of course it's
not a regression.

> If the problem is only reproducible by the reporter, how do we
> motivate others to support and upstream the fix?

The general idea afaics is: there was someone that for some reason was
motivated to bring in the patch that caused the regression (indirectly
in this case); with a bit of luck that someone after the revert wants it
to be included again and thus is motivated to fix it (or search for the
underlying issue and fix that in this case).

Of course that does not always work out. OTOH with a bit of luck the
other developers will help with that, too.

And there are gray areas, but they are not that big. Not sure if this is
one, that in the end is up to Greg or Linus to decide.

> Isn't it
> demotivating to revert the correct patch and retain the wrong
> behavior of the driver?

Sure, but I'd say it's even more demotivating if things break for users
when they update their kernel, as then they won't update the next time
and stay on old versions with bugs and vulnerabilities. As that's afaics
what Linus wants to prevent -- and why he made the rule and established
current practices.

> Additionally, is it possible to determine how many correct patches
> have been reverted in the past to address this particular problem?

git log?

> Please correct me if I am wrong ?

HTH, Ciao, Thorsten

>>> Also, I'm a bit wondering: the patch has been in the mainline for
>>> some time, and we haven't had any issues except this one, which is
>>> due to the wrong behavior of the UDC driver.
>>>>
>>>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>>>
>>>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>>>
>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>
>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>
>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>
>>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>
>>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>
>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>
>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3
>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>
>>>> --
>>>> With Best Regards,
>>>> Andy Shevchenko
>>>>
>>>>
>
>

2024-05-10 09:45:51

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
> Oops, sorry, wrong file attached . Now correct one.
>
> Op 02-05-2024 om 22:13 schreef Ferry Toth:
> > Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > > Hi,
> > > >
> > > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > > Hi,
> > > > >
> > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > > > On Wed, Apr 10, 2024 at
> > > > > > > > > > > > > 08:37:42PM +0300, Andy
> > > > > > > > > > > > > Shevchenko wrote:
> > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > >
> > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > > symlink should become active again
> > > > > > > > > > >
> > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > >
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > driver net power sound subsystem suspended uevent
> > > > > > > > > > >
> > > > > > > > > > > Then switching to host mode:
> > > > > > > > > > >
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > ls: cannot access
> > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > > No such file
> > > > > > > > > > > or directory
> > > > > > > > > > >
> > > > > > > > > > > Then back to device mode:
> > > > > > > > > > >
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > driver power sound subsystem suspended uevent
> > > > > > > > > > >
> > > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > >
> > > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > >
> > > > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > > >
> > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > > > two changes?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > > > I could not reproduce
> > > > > > > > > the issue.
> > > > > > > > >
> > > > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > >
> > > > > > > > > Following are the logs:
> > > > > > > > > //Host to device
> > > > > > > > >
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > > > mode
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > usb0
> > > > > > > > >
> > > > > > > > > //device to host
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > # echo "host" > mode
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > usb0
> > > > > > > >
> > > > > > > > That is weird. When I switch to host mode (using
> > > > > > > > the physical switch),
> > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > >
> > > > > > > > Switching back to device mode, that gadget
> > > > > > > > directory is recreated. And
> > > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > >
> > > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > > <
> > > > > > > > > addr_assign_type duplex phys_port_name
> > > > > > > > > addr_len flags phys_switch_id
> > > > > > > > > address gro_flush_timeout power
> > > > > > > > > broadcast ifalias proto_down
> > > > > > > > > carrier ifindex queues
> > > > > > > > > carrier_changes iflink speed
> > > > > > > > > carrier_down_count link_mode statistics
> > > > > > > > > carrier_up_count mtu subsystem
> > > > > > > > > dev_id name_assign_type tx_queue_len
> > > > > > > > > dev_port netdev_group type
> > > > > > > > > device operstate uevent
> > > > > > > > > dormant phys_port_id waiting_for_supplier
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
> > > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1
> > > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > > > collisions:0 txqueuelen:1000
> > > > > > > > > RX bytes:0 TX bytes:0
> > > > > > > > >
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > >
> > > > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > > > the usb0 interface remains available.
> > > > > > > >
> > > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > >
> > > > > > > > I have while in host mode:
> > > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
> > > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast
> > > > > > > > 10.42.0.255
> > > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64
> > > > > > > > scopeid 0x20<link>
> > > > > > > >
> > > > > > > >
> > > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > >
> > > > > > > > > Instead, I recommend enabling FTRACE to
> > > > > > > > > trace the functions involved
> > > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > >
> > > > > > > > Switching from device -> host -> device:
> > > > > > > >
> > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > > > plugin 'function_graph'
> > > > > > > > Hit Ctrl^C to stop recording
> > > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > > > 188 bytes in size (4096 uncompressed)
> > > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > > > 0 bytes in size (0 uncompressed)
> > > > > > > > root@yuna:~# trace-cmd report
> > > > > > > > cpus=2
> > > > > > > > irq/68-dwc3-725 [000] 514.575337:
> > > > > > > > funcgraph_entry: #
> > > > > > > > 2079.480 us | gether_disconnect();
> > > > > > > > irq/68-dwc3-946 [000] 524.263731:
> > > > > > > > funcgraph_entry: +
> > > > > > > > 11.640 us | gether_disconnect();
> > > > > > > > irq/68-dwc3-946 [000] 524.263743:
> > > > > > > > funcgraph_entry: !
> > > > > > > > 116.520 us | gether_connect();
> > > > > > > > irq/68-dwc3-946 [000] 524.268029:
> > > > > > > > funcgraph_entry: #
> > > > > > > > 2057.260 us | gether_disconnect();
> > > > > > > > irq/68-dwc3-946 [000] 524.270089:
> > > > > > > > funcgraph_entry: !
> > > > > > > > 109.000 us | gether_connect();
> > > > > > >
> > > > > > > I tried to get a more useful trace:
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config'
> > > > > > > >> set_ftrace_filter
> > > > > > > -> switch to host mode then back to device
> > > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > > # tracer: function
> > > > > > > #
> > > > > > > # entries-in-buffer/entries-written: 53/53 #P:2
> > > > > > > #
> > > > > > > # _-----=> irqs-off/BH-disabled
> > > > > > > # / _----=> need-resched
> > > > > > > # | / _---=> hardirq/softirq
> > > > > > > # || / _--=> preempt-depth
> > > > > > > # ||| / _-=> migrate-disable
> > > > > > > # |||| / delay
> > > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > > > > > > # | | | ||||| | |
> > > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
> > > > > > > <-__composite_disconnect
> > > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
> > > > > > > <-reset_config
> > > > > > > irq/68-dwc3-523 [000] D..3. 133.992276:
> > > > > > > gether_disconnect
> > > > > > > <-reset_config
> > > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
> > > > > > > <-purge_configs_funcs
> > > > > > >
> > > > > > > -> to device mode
> > > > > > >
> > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind
> > > > > > > <-usb_add_function
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
> > > > > > > <-composite_setup
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.155215:
> > > > > > > gether_disconnect
> > > > > > > <-eem_set_alt
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
> > > > > > > <-eem_set_alt
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
> > > > > > > <-composite_setup
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.157292:
> > > > > > > gether_disconnect
> > > > > > > <-eem_set_alt
> > > > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
> > > > > > > <-eem_set_alt
> > > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap
> > > > > > > <-rx_complete
> > > > > > > ...
> > > > > > >
> > > > > > > I don't know where to look exactly. Any hints?
> > > > > >
> > > > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > > >
> > > > > Nope. It's a pitty that the trace formatting got messed up
> > > > > above. But as
> > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > > function is called, until I flip the switch to device mode.
> > > > > The ... at the end is where I cut uninteresting eem_unwrap
> > > > > <-rx_complete
> > > > > and eem_wrap <-eth_start_xmit lines.
> > > > >
> > > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > > network side to check who deleting the sysfs entry
> > > > >
> > > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > > that would be related to the patch we're discussing here. I was hoping
> > > > > for a little more targeted hint.
> > > >
> > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
> > > > 'usb_fun*',
> > > > 'reset_config' and 'device_remove_file' leads me to:
> > > >
> > > > TIMESTAMP FUNCTION
> > > > | |
> > > > 49.952477: eem_wrap <-eth_start_xmit
> > > > 55.072455: eem_wrap <-eth_start_xmit
> > > > 55.072621: eem_unwrap <-rx_complete
> > > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > > > 59.011545: composite_reset <-configfs_composite_reset
> > > > 59.011548: reset_config <-__composite_disconnect
> > > > 59.013565: eem_disable <-reset_config
> > > > 59.013567: gether_disconnect <-reset_config
> > > > 59.049560: device_remove_file <-device_remove
> > > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > > > 59.051189: composite_disconnect <-configfs_composite_discon
> > > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > > > 59.052519: eem_unbind <-purge_configs_funcs
> > > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > > > 59.052537: device_remove_file <-composite_dev_cleanup
> > > >
> > > > device_remove_file gets called twice, once by device_remove after
> > > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > > > (removing the gadget)
> > >
> > > I believe that the device_remove_file function is only removing
> > > suspend-specific attributes, not the complete gadget.
> > > Typically, when you perform the role switch, the Gadget start/stop
> > > function in your UDC driver is called. These functions should not
> > > delete the gadget
> > >
> > > To investigate further, could you please enable the DWC3 functions
> > > in ftrace and check who is removing the gadget?
> > > I can also enable this on my system and compare the logs with yours,
> > > but I will be in PI planning for 1.5 weeks and may not be able to
> > > provide immediate support.
> >
> > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
> > trace is 600kb. I cut uninteresting stuff before
> > configfs_composite_reset <-usb_gadget_udc_reset and after
> > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
> > compare.
> >
Could you please try with the following patch and see if your issue resolves ?

diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
index 3b445bd88498..c2a904475083 100644
--- a/drivers/usb/gadget/function/f_eem.c
+++ b/drivers/usb/gadget/function/f_eem.c
@@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
struct usb_composite_dev *cdev = c->cdev;
struct f_eem *eem = func_to_eem(f);
struct usb_string *us;
- int status;
+ int status = 0;
struct usb_ep *ep;

struct f_eem_opts *eem_opts;
@@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
* with list_for_each_entry, so we assume no race condition
* with regard to eem_opts->bound access
*/
+ mutex_lock(&eem_opts->lock);
+ gether_set_gadget(eem_opts->net, cdev->gadget);
+
if (!eem_opts->bound) {
- mutex_lock(&eem_opts->lock);
- gether_set_gadget(eem_opts->net, cdev->gadget);
status = gether_register_netdev(eem_opts->net);
- mutex_unlock(&eem_opts->lock);
- if (status)
- return status;
- eem_opts->bound = true;
}

+ mutex_unlock(&eem_opts->lock);
+
+ if (status)
+ goto fail;
+
+ eem_opts->bound = true;
+
us = usb_gstrings_attach(cdev, eem_strings,
ARRAY_SIZE(eem_string_defs));

> > > Additionally, please check if you have any customized DWC patches
> > > that may be causing this problem.
> > >
> > > >
> > > > > You may recall the whole issue did not occur before this
> > > > > patch got applied.
> > > > >
> > > > > > Hardik
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > > kobject and sysfs interface.
> > > > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > > > is not followed in your use case or driver ?
> > > > > > > >
> > > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > >
> > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > > > switch is not good, as the route remains.
> > > > > > > >
> > > > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > > > happens if you make eem connection and then unplug?
> > > > > > > >
> > > > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > > > kernel's gadget architecture.
> > > > > > >
> > > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > > with this patch and
> > > > > > > > > > > > > share result here. May be we need some fix in dwc3
> > > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > > controller as well. But
> > > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > With Best Regards,
> > > > > > > > > > Andy Shevchenko
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >



2024-05-15 18:47:16

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Sorry to have kept you waiting, I was away for a short break.

Patch tested below.

Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>> Oops, sorry, wrong file attached . Now correct one.
>>
>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver net power sound subsystem suspended uevent
>>>>>>>>>>>>
>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>> No such file
>>>>>>>>>>>> or directory
>>>>>>>>>>>>
>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver power sound subsystem suspended uevent
>>>>>>>>>>>>
>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>
>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>
>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>> two changes?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>> I could not reproduce
>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>
>>>>>>>>>> Following are the logs:
>>>>>>>>>> //Host to device
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>> mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> //device to host
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>> # echo "host" > mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>> the physical switch),
>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>
>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>> directory is recreated. And
>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>
>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>> <
>>>>>>>>>> addr_assign_type duplex phys_port_name
>>>>>>>>>> addr_len flags phys_switch_id
>>>>>>>>>> address gro_flush_timeout power
>>>>>>>>>> broadcast ifalias proto_down
>>>>>>>>>> carrier ifindex queues
>>>>>>>>>> carrier_changes iflink speed
>>>>>>>>>> carrier_down_count link_mode statistics
>>>>>>>>>> carrier_up_count mtu subsystem
>>>>>>>>>> dev_id name_assign_type tx_queue_len
>>>>>>>>>> dev_port netdev_group type
>>>>>>>>>> device operstate uevent
>>>>>>>>>> dormant phys_port_id waiting_for_supplier
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1
>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>> collisions:0 txqueuelen:1000
>>>>>>>>>> RX bytes:0 TX bytes:0
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>
>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>
>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>
>>>>>>>>> I have while in host mode:
>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast
>>>>>>>>> 10.42.0.255
>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64
>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>
>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>> trace the functions involved
>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>
>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>
>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>> plugin 'function_graph'
>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>> 188 bytes in size (4096 uncompressed)
>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>> 0 bytes in size (0 uncompressed)
>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>> cpus=2
>>>>>>>>> irq/68-dwc3-725 [000] 514.575337:
>>>>>>>>> funcgraph_entry: #
>>>>>>>>> 2079.480 us | gether_disconnect();
>>>>>>>>> irq/68-dwc3-946 [000] 524.263731:
>>>>>>>>> funcgraph_entry: +
>>>>>>>>> 11.640 us | gether_disconnect();
>>>>>>>>> irq/68-dwc3-946 [000] 524.263743:
>>>>>>>>> funcgraph_entry: !
>>>>>>>>> 116.520 us | gether_connect();
>>>>>>>>> irq/68-dwc3-946 [000] 524.268029:
>>>>>>>>> funcgraph_entry: #
>>>>>>>>> 2057.260 us | gether_disconnect();
>>>>>>>>> irq/68-dwc3-946 [000] 524.270089:
>>>>>>>>> funcgraph_entry: !
>>>>>>>>> 109.000 us | gether_connect();
>>>>>>>>
>>>>>>>> I tried to get a more useful trace:
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>> set_ftrace_filter
>>>>>>>> -> switch to host mode then back to device
>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>> # tracer: function
>>>>>>>> #
>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2
>>>>>>>> #
>>>>>>>> # _-----=> irqs-off/BH-disabled
>>>>>>>> # / _----=> need-resched
>>>>>>>> # | / _---=> hardirq/softirq
>>>>>>>> # || / _--=> preempt-depth
>>>>>>>> # ||| / _-=> migrate-disable
>>>>>>>> # |||| / delay
>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
>>>>>>>> # | | | ||||| | |
>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
>>>>>>>> <-__composite_disconnect
>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
>>>>>>>> <-reset_config
>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276:
>>>>>>>> gether_disconnect
>>>>>>>> <-reset_config
>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
>>>>>>>> <-purge_configs_funcs
>>>>>>>>
>>>>>>>> -> to device mode
>>>>>>>>
>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
>>>>>>>> <-usb_add_function
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap
>>>>>>>> <-rx_complete
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>
>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>
>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>> above. But as
>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>> function is called, until I flip the switch to device mode.
>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>> <-rx_complete
>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>
>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>> network side to check who deleting the sysfs entry
>>>>>>
>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>> for a little more targeted hint.
>>>>>
>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>> 'usb_fun*',
>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>
>>>>> TIMESTAMP FUNCTION
>>>>> | |
>>>>> 49.952477: eem_wrap <-eth_start_xmit
>>>>> 55.072455: eem_wrap <-eth_start_xmit
>>>>> 55.072621: eem_unwrap <-rx_complete
>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>> 59.011545: composite_reset <-configfs_composite_reset
>>>>> 59.011548: reset_config <-__composite_disconnect
>>>>> 59.013565: eem_disable <-reset_config
>>>>> 59.013567: gether_disconnect <-reset_config
>>>>> 59.049560: device_remove_file <-device_remove
>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>> 59.051189: composite_disconnect <-configfs_composite_discon
>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>> 59.052519: eem_unbind <-purge_configs_funcs
>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>> 59.052537: device_remove_file <-composite_dev_cleanup
>>>>>
>>>>> device_remove_file gets called twice, once by device_remove after
>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>> (removing the gadget)
>>>>
>>>> I believe that the device_remove_file function is only removing
>>>> suspend-specific attributes, not the complete gadget.
>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>> function in your UDC driver is called. These functions should not
>>>> delete the gadget
>>>>
>>>> To investigate further, could you please enable the DWC3 functions
>>>> in ftrace and check who is removing the gadget?
>>>> I can also enable this on my system and compare the logs with yours,
>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>> provide immediate support.
>>>
>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>> trace is 600kb. I cut uninteresting stuff before
>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>> compare.
>>>
> Could you please try with the following patch and see if your issue resolves ?
>
> diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
> index 3b445bd88498..c2a904475083 100644
> --- a/drivers/usb/gadget/function/f_eem.c
> +++ b/drivers/usb/gadget/function/f_eem.c
> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
> struct usb_composite_dev *cdev = c->cdev;
> struct f_eem *eem = func_to_eem(f);
> struct usb_string *us;
> - int status;
> + int status = 0;
> struct usb_ep *ep;
>
> struct f_eem_opts *eem_opts;
> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
> * with list_for_each_entry, so we assume no race condition
> * with regard to eem_opts->bound access
> */
> + mutex_lock(&eem_opts->lock);
> + gether_set_gadget(eem_opts->net, cdev->gadget);
> +
> if (!eem_opts->bound) {
> - mutex_lock(&eem_opts->lock);
> - gether_set_gadget(eem_opts->net, cdev->gadget);
> status = gether_register_netdev(eem_opts->net);
> - mutex_unlock(&eem_opts->lock);
> - if (status)
> - return status;
> - eem_opts->bound = true;
> }
>
> + mutex_unlock(&eem_opts->lock);
> +
> + if (status)
> + goto fail;
> +
> + eem_opts->bound = true;
> +
> us = usb_gstrings_attach(cdev, eem_strings,
> ARRAY_SIZE(eem_string_defs));
>

After switching from host -> device -> host the usb0 device as seen by
ifconfig remains "RUNNING" and in the routing table.

FYI This is the regression.

Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
still missing.

After deeper diving into v6.1.0, I found the latter a longer existing
bug, not caused by your patch.

More over, this doesn't seem to do any harm.

The first issue does, as we try to route traffic over a device that no
longer exists.

>>>> Additionally, please check if you have any customized DWC patches
>>>> that may be causing this problem.
>>>>
>>>>>
>>>>>> You may recall the whole issue did not occur before this
>>>>>> patch got applied.
>>>>>>
>>>>>>> Hardik
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>
>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>
>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>
>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>
>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>
>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3
>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> With Best Regards,
>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>
>


2024-05-26 20:53:09

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi

Op 15-05-2024 om 20:38 schreef Ferry Toth:
> Hi,
>
> Sorry to have kept you waiting, I was away for a short break.
>
> Patch tested below.
>
> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>> Oops, sorry, wrong file attached . Now correct one.
>>>
>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>> No such file
>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mass storage device is created and removed each time as
>>>>>>>>>>>>> expected.
>>>>>>>>>>>>
>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of
>>>>>>>>>>>> those
>>>>>>>>>>>> two changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3
>>>>>>>>>>> gadget.
>>>>>>>>>>> I could not reproduce
>>>>>>>>>>> the issue.
>>>>>>>>>>>
>>>>>>>>>>> I can see the usb0 exist all the time and accessible
>>>>>>>>>>> regardless of
>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>
>>>>>>>>>>> Following are the logs:
>>>>>>>>>>> //Host to device
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo
>>>>>>>>>>> "peripheral"
>>>>>>>>>>>> mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> //device to host
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>> the physical switch),
>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>
>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>> directory is recreated. And
>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>
>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>> <
>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a
>>>>>>>>>>> usb0
>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0
>>>>>>>>>>> carrier:0
>>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>
>>>>>>>>>>> I strongly advise against reverting the patch solely based on
>>>>>>>>>>> the
>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory
>>>>>>>>>>> while
>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>
>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>
>>>>>>>>>> I have while in host mode:
>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>
>>>>>>>>>> mtu 1500
>>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>> 10.42.0.255
>>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>
>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>> trace the functions involved
>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>
>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>       plugin 'function_graph'
>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>> cpus=2
>>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>
>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>> set_ftrace_filter
>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>> # tracer: function
>>>>>>>>> #
>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>> #
>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>> #                              |||| /     delay
>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>> <-__composite_disconnect
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>> <-reset_config
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-reset_config
>>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>
>>>>>>>>> -> to device mode
>>>>>>>>>
>>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>> <-usb_add_function
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220:
>>>>>>>>> gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338:
>>>>>>>>> gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>
>>>>>>>> do you see anything related to gether_cleanup() after
>>>>>>>> eem_unbind() ?
>>>>>>>
>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>> above. But as
>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>> <-rx_complete
>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>
>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>
>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>> that would be related to the patch we're discussing here. I was
>>>>>>> hoping
>>>>>>> for a little more targeted hint.
>>>>>>
>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>> 'usb_fun*',
>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>
>>>>>> TIMESTAMP  FUNCTION
>>>>>>      |         |
>>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>>     59.013565: eem_disable <-reset_config
>>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>>     59.049560: device_remove_file <-device_remove
>>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>
>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>> gether_disconnect (that the one). The 2nd time by
>>>>>> composite_dev_cleanup
>>>>>> (removing the gadget)
>>>>>
>>>>> I believe that the device_remove_file function is only removing
>>>>> suspend-specific attributes, not the complete gadget.
>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>> function in your UDC driver is called. These functions should not
>>>>> delete the gadget
>>>>>
>>>>> To investigate further, could you please enable the DWC3 functions
>>>>> in ftrace and check who is removing the gadget?
>>>>> I can also enable this on my system and compare the logs with yours,
>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>> provide immediate support.
>>>>
>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>> trace is 600kb. I cut uninteresting stuff before
>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>> __dwc3_set_mode,
>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>> compare.
>>>>
>> Could you please try with the following patch and see if your issue
>> resolves ?
>>
>> diff --git a/drivers/usb/gadget/function/f_eem.c
>> b/drivers/usb/gadget/function/f_eem.c
>> index 3b445bd88498..c2a904475083 100644
>> --- a/drivers/usb/gadget/function/f_eem.c
>> +++ b/drivers/usb/gadget/function/f_eem.c
>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
>> struct usb_function *f)
>>       struct usb_composite_dev *cdev = c->cdev;
>>       struct f_eem        *eem = func_to_eem(f);
>>       struct usb_string    *us;
>> -    int            status;
>> +    int            status = 0;
>>       struct usb_ep        *ep;
>>       struct f_eem_opts    *eem_opts;
>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c,
>> struct usb_function *f)
>>        * with list_for_each_entry, so we assume no race condition
>>        * with regard to eem_opts->bound access
>>        */
>> +    mutex_lock(&eem_opts->lock);
>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>> +
>>       if (!eem_opts->bound) {
>> -        mutex_lock(&eem_opts->lock);
>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>           status = gether_register_netdev(eem_opts->net);
>> -        mutex_unlock(&eem_opts->lock);
>> -        if (status)
>> -            return status;
>> -        eem_opts->bound = true;
>>       }
>> +    mutex_unlock(&eem_opts->lock);
>> +
>> +    if (status)
>> +        goto fail;
>> +
>> +    eem_opts->bound = true;
>> +
>>       us = usb_gstrings_attach(cdev, eem_strings,
>>                    ARRAY_SIZE(eem_string_defs));
>>
>
> After switching from host -> device -> host the usb0 device as seen by
> ifconfig remains "RUNNING" and in the routing table.
>
> FYI This is the regression.

For us, I don't see any advantages in having this patch and only the
downside. Do you have any further ideas / patches that we could test.

Otherwise, it might be best to go with the original suggestion and
revert until it gets sorted out?

> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
> still missing.
>
> After deeper diving into v6.1.0, I found the latter a longer existing
> bug, not caused by your patch.
>
> More over, this doesn't seem to do any harm.
>
> The first issue does, as we try to route traffic over a device that no
> longer exists.
>
>>>>> Additionally, please check if you have any customized DWC patches
>>>>> that may be causing this problem.
>>>>>
>>>>>>
>>>>>>> You may recall the whole issue did not occur before this
>>>>>>> patch got applied.
>>>>>>>
>>>>>>>> Hardik
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>> I suggest sharing the analysis here to understand why this
>>>>>>>>>>> practice
>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>
>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>
>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>> harmless? But the usb: net device remaining after disconnect
>>>>>>>>>> or role
>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>
>>>>>>>>>> May be they are 2 separate problems. Could you try to
>>>>>>>>>> reproduce what
>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>
>>>>>>>>>>> I am curious why the driver was developed without adhering to
>>>>>>>>>>> the
>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>
>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>
>>
>

2024-05-27 06:29:40

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Seems more that four months after the report[1] there is still no
resolution for this regression in sight. This is not how it should be.

Greg, could you maybe chime in here and share your opinion? From my "I
do not know anything about the details of this problem" point of view it
looks like reverting the culprit (f49449fbc21e7e ("usb: gadget: u_ether:
Replace netif_stop_queue with netif_device_detach")) might be the best
option for now, unless of course that creates a regression as well --
but you are the best to judge here.

[1] https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

On 26.05.24 22:52, Ferry Toth wrote:
> Op 15-05-2024 om 20:38 schreef Ferry Toth:
>>
>> Sorry to have kept you waiting, I was away for a short break.
>>
>> Patch tested below.
>>
>> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>>> Oops, sorry, wrong file attached . Now correct one.
>>>>
>>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is
>>>>>>>>>>>>>> activated):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>>> No such file
>>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mass storage device is created and removed each time as
>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of
>>>>>>>>>>>>> those
>>>>>>>>>>>>> two changes?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3
>>>>>>>>>>>> gadget.
>>>>>>>>>>>> I could not reproduce
>>>>>>>>>>>> the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> I can see the usb0 exist all the time and accessible
>>>>>>>>>>>> regardless of
>>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>>
>>>>>>>>>>>> Following are the logs:
>>>>>>>>>>>> //Host to device
>>>>>>>>>>>>
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo
>>>>>>>>>>>> "peripheral"
>>>>>>>>>>>>> mode
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>> usb0
>>>>>>>>>>>>
>>>>>>>>>>>> //device to host
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>>> the physical switch),
>>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>>
>>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>>> directory is recreated. And
>>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>>
>>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>>> <
>>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig
>>>>>>>>>>>> -a usb0
>>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0
>>>>>>>>>>>> frame:0
>>>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0
>>>>>>>>>>>> carrier:0
>>>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>>>
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>>
>>>>>>>>>>>> I strongly advise against reverting the patch solely based
>>>>>>>>>>>> on the
>>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory
>>>>>>>>>>>> while
>>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>>
>>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>>
>>>>>>>>>>> I have while in host mode:
>>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> 
>>>>>>>>>>> mtu 1500
>>>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>>> 10.42.0.255
>>>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You don't see that because you didn't create a connection at
>>>>>>>>>>> all.
>>>>>>>>>>>
>>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>>> trace the functions involved
>>>>>>>>>>>> and identify which faulty call is responsible for removing
>>>>>>>>>>>> usb0.
>>>>>>>>>>>
>>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>>       plugin 'function_graph'
>>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>>> cpus=2
>>>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>>
>>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' >
>>>>>>>>>> set_ftrace_filter
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>>> set_ftrace_filter
>>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>>> # tracer: function
>>>>>>>>>> #
>>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>>> #
>>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>>> #                              |||| /     delay
>>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254:
>>>>>>>>>> reset_config
>>>>>>>>>> <-__composite_disconnect
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>>> <-reset_config
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-reset_config
>>>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>>
>>>>>>>>>> -> to device mode
>>>>>>>>>>
>>>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>>> <-usb_add_function
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>>> <-composite_setup
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220:
>>>>>>>>>> gether_connect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>>> <-composite_setup
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338:
>>>>>>>>>> gether_connect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>>> <-rx_complete
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>>
>>>>>>>>> do you see anything related to gether_cleanup() after
>>>>>>>>> eem_unbind() ?
>>>>>>>>
>>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>>> above. But as
>>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>>> <-rx_complete
>>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>>
>>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>>
>>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see
>>>>>>>> how
>>>>>>>> that would be related to the patch we're discussing here. I was
>>>>>>>> hoping
>>>>>>>> for a little more targeted hint.
>>>>>>>
>>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>>> 'usb_fun*',
>>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>>
>>>>>>> TIMESTAMP  FUNCTION
>>>>>>>      |         |
>>>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>>>     59.013565: eem_disable <-reset_config
>>>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>>>     59.049560: device_remove_file <-device_remove
>>>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>>
>>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>>> gether_disconnect (that the one). The 2nd time by
>>>>>>> composite_dev_cleanup
>>>>>>> (removing the gadget)
>>>>>>
>>>>>> I believe that the device_remove_file function is only removing
>>>>>> suspend-specific attributes, not the complete gadget.
>>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>>> function in your UDC driver is called. These functions should not
>>>>>> delete the gadget
>>>>>>
>>>>>> To investigate further, could you please enable the DWC3 functions
>>>>>> in ftrace and check who is removing the gadget?
>>>>>> I can also enable this on my system and compare the logs with yours,
>>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>>> provide immediate support.
>>>>>
>>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>>> trace is 600kb. I cut uninteresting stuff before
>>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>>> __dwc3_set_mode,
>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>>> compare.
>>>>>
>>> Could you please try with the following patch and see if your issue
>>> resolves ?
>>>
>>> diff --git a/drivers/usb/gadget/function/f_eem.c
>>> b/drivers/usb/gadget/function/f_eem.c
>>> index 3b445bd88498..c2a904475083 100644
>>> --- a/drivers/usb/gadget/function/f_eem.c
>>> +++ b/drivers/usb/gadget/function/f_eem.c
>>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
>>> struct usb_function *f)
>>>       struct usb_composite_dev *cdev = c->cdev;
>>>       struct f_eem        *eem = func_to_eem(f);
>>>       struct usb_string    *us;
>>> -    int            status;
>>> +    int            status = 0;
>>>       struct usb_ep        *ep;
>>>       struct f_eem_opts    *eem_opts;
>>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
>>> *c, struct usb_function *f)
>>>        * with list_for_each_entry, so we assume no race condition
>>>        * with regard to eem_opts->bound access
>>>        */
>>> +    mutex_lock(&eem_opts->lock);
>>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>>> +
>>>       if (!eem_opts->bound) {
>>> -        mutex_lock(&eem_opts->lock);
>>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>>           status = gether_register_netdev(eem_opts->net);
>>> -        mutex_unlock(&eem_opts->lock);
>>> -        if (status)
>>> -            return status;
>>> -        eem_opts->bound = true;
>>>       }
>>> +    mutex_unlock(&eem_opts->lock);
>>> +
>>> +    if (status)
>>> +        goto fail;
>>> +
>>> +    eem_opts->bound = true;
>>> +
>>>       us = usb_gstrings_attach(cdev, eem_strings,
>>>                    ARRAY_SIZE(eem_string_defs));
>>>
>>
>> After switching from host -> device -> host the usb0 device as seen by
>> ifconfig remains "RUNNING" and in the routing table.
>>
>> FYI This is the regression.
>
> For us, I don't see any advantages in having this patch and only the
> downside. Do you have any further ideas / patches that we could test.
>
> Otherwise, it might be best to go with the original suggestion and
> revert until it gets sorted out?
>
>> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
>> still missing.
>>
>> After deeper diving into v6.1.0, I found the latter a longer existing
>> bug, not caused by your patch.
>>
>> More over, this doesn't seem to do any harm.
>>
>> The first issue does, as we try to route traffic over a device that no
>> longer exists.
>>
>>>>>> Additionally, please check if you have any customized DWC patches
>>>>>> that may be causing this problem.
>>>>>>
>>>>>>>
>>>>>>>> You may recall the whole issue did not occur before this
>>>>>>>> patch got applied.
>>>>>>>>
>>>>>>>>> Hardik
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> According to current kernel architecture of u_ether driver,
>>>>>>>>>>>> only
>>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>>> I suggest sharing the analysis here to understand why this
>>>>>>>>>>>> practice
>>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>>
>>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>>
>>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>>> harmless? But the usb: net device remaining after disconnect
>>>>>>>>>>> or role
>>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>>
>>>>>>>>>>> May be they are 2 separate problems. Could you try to
>>>>>>>>>>> reproduce what
>>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>>
>>>>>>>>>>>> I am curious why the driver was developed without adhering
>>>>>>>>>>>> to the
>>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>>
>>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>>> Andy Shevchenko

2024-05-28 07:25:27

by Hardik Gajjar

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote:
> Hi
>
> Op 15-05-2024 om 20:38 schreef Ferry Toth:
> > Hi,
> >
> > Sorry to have kept you waiting, I was away for a short break.
> >
> > Patch tested below.
> >
> > Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
> > > On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
> > > > Oops, sorry, wrong file attached . Now correct one.
> > > >
> > > > Op 02-05-2024 om 22:13 schreef Ferry Toth:
> > > > > Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> > > > > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > > > > > On Thu, Apr 11, 2024
> > > > > > > > > > > > > > > at 04:26:37PM +0200,
> > > > > > > > > > > > > > > Hardik Gajjar wrote:
> > > > > > > > > > > > > > > > On Wed, Apr 10, 2024 at
> > > > > > > > > > > > > > > > 08:37:42PM +0300, Andy
> > > > > > > > > > > > > > > > Shevchenko wrote:
> > > > > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > > > > > > > >
> > > > > > > > > > > > > ...
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > > > > > symlink should become active again
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then switching to host mode:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > ls: cannot access
> > > > > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > > > > > No such file
> > > > > > > > > > > > > > or directory
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then back to device mode:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Mass storage device is
> > > > > > > > > > > > > > created and removed each
> > > > > > > > > > > > > > time as expected.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, what's the conclusion?
> > > > > > > > > > > > > Shall we move towards revert
> > > > > > > > > > > > > of those
> > > > > > > > > > > > > two changes?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > As promised, I have the tested
> > > > > > > > > > > > the this patch with the dwc3
> > > > > > > > > > > > gadget.
> > > > > > > > > > > > I could not reproduce
> > > > > > > > > > > > the issue.
> > > > > > > > > > > >
> > > > > > > > > > > > I can see the usb0 exist all the
> > > > > > > > > > > > time and accessible regardless
> > > > > > > > > > > > of
> > > > > > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > > > > >
> > > > > > > > > > > > Following are the logs:
> > > > > > > > > > > > //Host to device
> > > > > > > > > > > >
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > # echo "peripheral"
> > > > > > > > > > > > > mode
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > > > > usb0
> > > > > > > > > > > >
> > > > > > > > > > > > //device to host
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > # echo "host" > mode
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > > > > usb0
> > > > > > > > > > >
> > > > > > > > > > > That is weird. When I switch to host mode (using
> > > > > > > > > > > the physical switch),
> > > > > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > > > > >
> > > > > > > > > > > Switching back to device mode, that gadget
> > > > > > > > > > > directory is recreated. And
> > > > > > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > > > > >
> > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > > > > > <
> > > > > > > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > > > > > > addr_len            flags              phys_switch_id
> > > > > > > > > > > > address             gro_flush_timeout  power
> > > > > > > > > > > > broadcast           ifalias            proto_down
> > > > > > > > > > > > carrier             ifindex            queues
> > > > > > > > > > > > carrier_changes     iflink             speed
> > > > > > > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > > > > > > dev_port            netdev_group       type
> > > > > > > > > > > > device              operstate          uevent
> > > > > > > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > # ifconfig -a usb0
> > > > > > > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > > > > > > >               BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > > > > > > >               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > > > > > >               TX packets:0
> > > > > > > > > > > > errors:0 dropped:0 overruns:0
> > > > > > > > > > > > carrier:0
> > > > > > > > > > > >               collisions:0 txqueuelen:1000
> > > > > > > > > > > >               RX bytes:0 TX bytes:0
> > > > > > > > > > > >
> > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > > > > >
> > > > > > > > > > > > I strongly advise against
> > > > > > > > > > > > reverting the patch solely based
> > > > > > > > > > > > on the
> > > > > > > > > > > > observed issue of removing the
> > > > > > > > > > > > /sys/class/net/usb0 directory
> > > > > > > > > > > > while
> > > > > > > > > > > > the usb0 interface remains available.
> > > > > > > > > > >
> > > > > > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > > > > >
> > > > > > > > > > > I have while in host mode:
> > > > > > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > > > > > usb0:
> > > > > > > > > > > flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>
> > > > > > > > > > > mtu 1500
> > > > > > > > > > >             inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > > > > > > 10.42.0.255
> > > > > > > > > > >             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
> > > > > > > > > > > scopeid 0x20<link>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > > > > >
> > > > > > > > > > > > Instead, I recommend enabling FTRACE to
> > > > > > > > > > > > trace the functions involved
> > > > > > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > > > > >
> > > > > > > > > > > Switching from device -> host -> device:
> > > > > > > > > > >
> > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > > > > > >       plugin 'function_graph'
> > > > > > > > > > > Hit Ctrl^C to stop recording
> > > > > > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > > > > > >         188 bytes in size (4096 uncompressed)
> > > > > > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > > > > > >         0 bytes in size (0 uncompressed)
> > > > > > > > > > > root@yuna:~# trace-cmd report
> > > > > > > > > > > cpus=2
> > > > > > > > > > >          irq/68-dwc3-725   [000]   514.575337:
> > > > > > > > > > > funcgraph_entry:      #
> > > > > > > > > > > 2079.480 us |  gether_disconnect();
> > > > > > > > > > >          irq/68-dwc3-946   [000]   524.263731:
> > > > > > > > > > > funcgraph_entry:      +
> > > > > > > > > > > 11.640 us  |  gether_disconnect();
> > > > > > > > > > >          irq/68-dwc3-946   [000]   524.263743:
> > > > > > > > > > > funcgraph_entry:      !
> > > > > > > > > > > 116.520 us |  gether_connect();
> > > > > > > > > > >          irq/68-dwc3-946   [000]   524.268029:
> > > > > > > > > > > funcgraph_entry:      #
> > > > > > > > > > > 2057.260 us |  gether_disconnect();
> > > > > > > > > > >          irq/68-dwc3-946   [000]   524.270089:
> > > > > > > > > > > funcgraph_entry:      !
> > > > > > > > > > > 109.000 us |  gether_connect();
> > > > > > > > > >
> > > > > > > > > > I tried to get a more useful trace:
> > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config'
> > > > > > > > > > > > set_ftrace_filter
> > > > > > > > > > -> switch to host mode then back to device
> > > > > > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > > > > > # tracer: function
> > > > > > > > > > #
> > > > > > > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > > > > > > #
> > > > > > > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > > > > > > #                               / _----=> need-resched
> > > > > > > > > > #                              | / _---=> hardirq/softirq
> > > > > > > > > > #                              || / _--=> preempt-depth
> > > > > > > > > > #                              ||| / _-=> migrate-disable
> > > > > > > > > > #                              |||| /     delay
> > > > > > > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > > > > > > #              | |         |   |||||     |         |
> > > > > > > > > >         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > > > > > > <-__composite_disconnect
> > > > > > > > > >         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > > > > > > <-reset_config
> > > > > > > > > >         irq/68-dwc3-523     [000] D..3.   133.992276:
> > > > > > > > > > gether_disconnect
> > > > > > > > > > <-reset_config
> > > > > > > > > >         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > > > > > > <-purge_configs_funcs
> > > > > > > > > >
> > > > > > > > > > -> to device mode
> > > > > > > > > >
> > > > > > > > > >         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > > > > > > <-usb_add_function
> > > > > > > > > >         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > > > > > > <-composite_setup
> > > > > > > > > >         irq/68-dwc3-734     [000] D..3.   149.155215:
> > > > > > > > > > gether_disconnect
> > > > > > > > > > <-eem_set_alt
> > > > > > > > > >         irq/68-dwc3-734     [000]
> > > > > > > > > > D..3.   149.155220: gether_connect
> > > > > > > > > > <-eem_set_alt
> > > > > > > > > >         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > > > > > > <-composite_setup
> > > > > > > > > >         irq/68-dwc3-734     [000] D..3.   149.157292:
> > > > > > > > > > gether_disconnect
> > > > > > > > > > <-eem_set_alt
> > > > > > > > > >         irq/68-dwc3-734     [000]
> > > > > > > > > > D..3.   149.159338: gether_connect
> > > > > > > > > > <-eem_set_alt
> > > > > > > > > >         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > > > > > > <-rx_complete
> > > > > > > > > > ...
> > > > > > > > > >
> > > > > > > > > > I don't know where to look exactly. Any hints?
> > > > > > > > >
> > > > > > > > > do you see anything related to
> > > > > > > > > gether_cleanup() after eem_unbind() ?
> > > > > > > >
> > > > > > > > Nope. It's a pitty that the trace formatting got messed up
> > > > > > > > above. But as
> > > > > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > > > > > function is called, until I flip the switch to device mode.
> > > > > > > > The ... at the end is where I cut uninteresting eem_unwrap
> > > > > > > > <-rx_complete
> > > > > > > > and eem_wrap <-eth_start_xmit lines.
> > > > > > > >
> > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > > > > > network side to check who deleting the sysfs entry
> > > > > > > >
> > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > > > > > that would be related to the patch we're
> > > > > > > > discussing here. I was hoping
> > > > > > > > for a little more targeted hint.
> > > > > > >
> > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
> > > > > > > 'usb_fun*',
> > > > > > > 'reset_config' and 'device_remove_file' leads me to:
> > > > > > >
> > > > > > > TIMESTAMP  FUNCTION
> > > > > > >      |         |
> > > > > > >     49.952477: eem_wrap <-eth_start_xmit
> > > > > > >     55.072455: eem_wrap <-eth_start_xmit
> > > > > > >     55.072621: eem_unwrap <-rx_complete
> > > > > > >     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > > > > > >     59.011545: composite_reset <-configfs_composite_reset
> > > > > > >     59.011548: reset_config <-__composite_disconnect
> > > > > > >     59.013565: eem_disable <-reset_config
> > > > > > >     59.013567: gether_disconnect <-reset_config
> > > > > > >     59.049560: device_remove_file <-device_remove
> > > > > > >     59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > > > > > >     59.051189: composite_disconnect <-configfs_composite_discon
> > > > > > >     59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > > > > > >     59.052519: eem_unbind <-purge_configs_funcs
> > > > > > >     59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > > > > > >     59.052537: device_remove_file <-composite_dev_cleanup
> > > > > > >
> > > > > > > device_remove_file gets called twice, once by device_remove after
> > > > > > > gether_disconnect (that the one). The 2nd time by
> > > > > > > composite_dev_cleanup
> > > > > > > (removing the gadget)
> > > > > >
> > > > > > I believe that the device_remove_file function is only removing
> > > > > > suspend-specific attributes, not the complete gadget.
> > > > > > Typically, when you perform the role switch, the Gadget start/stop
> > > > > > function in your UDC driver is called. These functions should not
> > > > > > delete the gadget
> > > > > >
> > > > > > To investigate further, could you please enable the DWC3 functions
> > > > > > in ftrace and check who is removing the gadget?
> > > > > > I can also enable this on my system and compare the logs with yours,
> > > > > > but I will be in PI planning for 1.5 weeks and may not be able to
> > > > > > provide immediate support.
> > > > >
> > > > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
> > > > > trace is 600kb. I cut uninteresting stuff before
> > > > > configfs_composite_reset <-usb_gadget_udc_reset and after
> > > > > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
> > > > > compare.
> > > > >
> > > Could you please try with the following patch and see if your issue
> > > resolves ?
> > >
> > > diff --git a/drivers/usb/gadget/function/f_eem.c
> > > b/drivers/usb/gadget/function/f_eem.c
> > > index 3b445bd88498..c2a904475083 100644
> > > --- a/drivers/usb/gadget/function/f_eem.c
> > > +++ b/drivers/usb/gadget/function/f_eem.c
> > > @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
> > > struct usb_function *f)
> > >       struct usb_composite_dev *cdev = c->cdev;
> > >       struct f_eem        *eem = func_to_eem(f);
> > >       struct usb_string    *us;
> > > -    int            status;
> > > +    int            status = 0;
> > >       struct usb_ep        *ep;
> > >       struct f_eem_opts    *eem_opts;
> > > @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
> > > *c, struct usb_function *f)
> > >        * with list_for_each_entry, so we assume no race condition
> > >        * with regard to eem_opts->bound access
> > >        */
> > > +    mutex_lock(&eem_opts->lock);
> > > +    gether_set_gadget(eem_opts->net, cdev->gadget);
> > > +
> > >       if (!eem_opts->bound) {
> > > -        mutex_lock(&eem_opts->lock);
> > > -        gether_set_gadget(eem_opts->net, cdev->gadget);
> > >           status = gether_register_netdev(eem_opts->net);
> > > -        mutex_unlock(&eem_opts->lock);
> > > -        if (status)
> > > -            return status;
> > > -        eem_opts->bound = true;
> > >       }
> > > +    mutex_unlock(&eem_opts->lock);
> > > +
> > > +    if (status)
> > > +        goto fail;
> > > +
> > > +    eem_opts->bound = true;
> > > +
> > >       us = usb_gstrings_attach(cdev, eem_strings,
> > >                    ARRAY_SIZE(eem_string_defs));
> > >
> >
> > After switching from host -> device -> host the usb0 device as seen by
> > ifconfig remains "RUNNING" and in the routing table.
> >
> > FYI This is the regression.
>
> For us, I don't see any advantages in having this patch and only the
> downside. Do you have any further ideas / patches that we could test.
>
> Otherwise, it might be best to go with the original suggestion and revert
> until it gets sorted out?

Hello,

Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?

However, We are very much intrested to fix your problem and upstream the v2.
Therefore could you please share the following details to reproduce the issue
- How did you created the gadget ? Could you please share the exact commands to create the eem gadget ?
- Arch of your test bench (arm, x86)
- How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ?

Hardik

>
> > Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
> > still missing.
> >
> > After deeper diving into v6.1.0, I found the latter a longer existing
> > bug, not caused by your patch.
> >
> > More over, this doesn't seem to do any harm.
> >
> > The first issue does, as we try to route traffic over a device that no
> > longer exists.
> >
> > > > > > Additionally, please check if you have any customized DWC patches
> > > > > > that may be causing this problem.
> > > > > >
> > > > > > >
> > > > > > > > You may recall the whole issue did not occur before this
> > > > > > > > patch got applied.
> > > > > > > >
> > > > > > > > > Hardik
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > > > > > kobject and sysfs interface.
> > > > > > > > > > > > I suggest sharing the analysis
> > > > > > > > > > > > here to understand why this
> > > > > > > > > > > > practice
> > > > > > > > > > > > is not followed in your use case or driver ?
> > > > > > > > > > >
> > > > > > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > > > > >
> > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > > > > > harmless? But the usb: net device
> > > > > > > > > > > remaining after disconnect or role
> > > > > > > > > > > switch is not good, as the route remains.
> > > > > > > > > > >
> > > > > > > > > > > May be they are 2 separate problems.
> > > > > > > > > > > Could you try to reproduce what
> > > > > > > > > > > happens if you make eem connection and then unplug?
> > > > > > > > > > >
> > > > > > > > > > > > I am curious why the driver was
> > > > > > > > > > > > developed without adhering to
> > > > > > > > > > > > the
> > > > > > > > > > > > kernel's gadget architecture.
> > > > > > > > > >
> > > > > > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > > > > > with this patch and
> > > > > > > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > > > > > controller as well. But
> > > > > > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > With Best Regards,
> > > > > > > > > > > > > Andy Shevchenko
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > >
> > >
> >

2024-05-29 09:23:27

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On Tue, May 28, 2024 at 09:18:51AM +0200, Hardik Gajjar wrote:
> On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote:
> > Op 15-05-2024 om 20:38 schreef Ferry Toth:
> > > Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
> > > > On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
> > > > > Op 02-05-2024 om 22:13 schreef Ferry Toth:
> > > > > > Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> > > > > > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > > > > > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > > > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > > > > > > On Thu, Apr 11, 2024
> > > > > > > > > > > > > > > > at 04:26:37PM +0200,
> > > > > > > > > > > > > > > > Hardik Gajjar wrote:
> > > > > > > > > > > > > > > > > On Wed, Apr 10, 2024 at
> > > > > > > > > > > > > > > > > 08:37:42PM +0300, Andy
> > > > > > > > > > > > > > > > > Shevchenko wrote:
> > > > > > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

> > > > > > > > > > > > > > ...

> > > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > > > > > > symlink should become active again
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > > driver? net? power? sound? subsystem? suspended? uevent
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Then switching to host mode:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > > ls: cannot access
> > > > > > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > > > > > > No such file
> > > > > > > > > > > > > > > or directory
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Then back to device mode:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > > > > > driver? power? sound? subsystem? suspended? uevent
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Mass storage device is
> > > > > > > > > > > > > > > created and removed each
> > > > > > > > > > > > > > > time as expected.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So, what's the conclusion?
> > > > > > > > > > > > > > Shall we move towards revert
> > > > > > > > > > > > > > of those
> > > > > > > > > > > > > > two changes?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > As promised, I have the tested
> > > > > > > > > > > > > the this patch with the dwc3
> > > > > > > > > > > > > gadget.
> > > > > > > > > > > > > I could not reproduce
> > > > > > > > > > > > > the issue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can see the usb0 exist all the
> > > > > > > > > > > > > time and accessible regardless
> > > > > > > > > > > > > of
> > > > > > > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Following are the logs:
> > > > > > > > > > > > > //Host to device
> > > > > > > > > > > > >
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > > # echo "peripheral"
> > > > > > > > > > > > > > mode
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > > > > > usb0
> > > > > > > > > > > > >
> > > > > > > > > > > > > //device to host
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > > # echo "host" > mode
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > > > > > usb0
> > > > > > > > > > > >
> > > > > > > > > > > > That is weird. When I switch to host mode (using
> > > > > > > > > > > > the physical switch),
> > > > > > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > > > > > >
> > > > > > > > > > > > Switching back to device mode, that gadget
> > > > > > > > > > > > directory is recreated. And
> > > > > > > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > > > > > >
> > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > > > > > > <
> > > > > > > > > > > > > addr_assign_type??? duplex???????????? phys_port_name
> > > > > > > > > > > > > addr_len??????????? flags????????????? phys_switch_id
> > > > > > > > > > > > > address???????????? gro_flush_timeout? power
> > > > > > > > > > > > > broadcast?????????? ifalias??????????? proto_down
> > > > > > > > > > > > > carrier???????????? ifindex??????????? queues
> > > > > > > > > > > > > carrier_changes???? iflink???????????? speed
> > > > > > > > > > > > > carrier_down_count? link_mode????????? statistics
> > > > > > > > > > > > > carrier_up_count??? mtu??????????????? subsystem
> > > > > > > > > > > > > dev_id????????????? name_assign_type?? tx_queue_len
> > > > > > > > > > > > > dev_port??????????? netdev_group?????? type
> > > > > > > > > > > > > device????????????? operstate????????? uevent
> > > > > > > > > > > > > dormant???????????? phys_port_id?????? waiting_for_supplier
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > > > > > # ifconfig -a usb0
> > > > > > > > > > > > > usb0????? Link encap:Ethernet? HWaddr 3a:8b:63:97:1a:9a
> > > > > > > > > > > > > ????????????? BROADCAST MULTICAST? MTU:1500? Metric:1
> > > > > > > > > > > > > ????????????? RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > > > > > > > ????????????? TX packets:0
> > > > > > > > > > > > > errors:0 dropped:0 overruns:0
> > > > > > > > > > > > > carrier:0
> > > > > > > > > > > > > ????????????? collisions:0 txqueuelen:1000
> > > > > > > > > > > > > ????????????? RX bytes:0 TX bytes:0
> > > > > > > > > > > > >
> > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > > > > > >
> > > > > > > > > > > > > I strongly advise against
> > > > > > > > > > > > > reverting the patch solely based
> > > > > > > > > > > > > on the
> > > > > > > > > > > > > observed issue of removing the
> > > > > > > > > > > > > /sys/class/net/usb0 directory
> > > > > > > > > > > > > while
> > > > > > > > > > > > > the usb0 interface remains available.
> > > > > > > > > > > >
> > > > > > > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > > > > > >
> > > > > > > > > > > > I have while in host mode:
> > > > > > > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > > > > > > usb0:
> > > > > > > > > > > > flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>
> > > > > > > > > > > > mtu 1500
> > > > > > > > > > > > ??????????? inet 10.42.0.221? netmask 255.255.255.0? broadcast
> > > > > > > > > > > > 10.42.0.255
> > > > > > > > > > > > ??????????? inet6 fe80::a8bb:ccff:fedd:eef1? prefixlen 64
> > > > > > > > > > > > scopeid 0x20<link>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > > > > > >
> > > > > > > > > > > > > Instead, I recommend enabling FTRACE to
> > > > > > > > > > > > > trace the functions involved
> > > > > > > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > > > > > >
> > > > > > > > > > > > Switching from device -> host -> device:
> > > > > > > > > > > >
> > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > > > > > > > ????? plugin 'function_graph'
> > > > > > > > > > > > Hit Ctrl^C to stop recording
> > > > > > > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > > > > > > > ??????? 188 bytes in size (4096 uncompressed)
> > > > > > > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > > > > > > > ??????? 0 bytes in size (0 uncompressed)
> > > > > > > > > > > > root@yuna:~# trace-cmd report
> > > > > > > > > > > > cpus=2
> > > > > > > > > > > > ???????? irq/68-dwc3-725?? [000]?? 514.575337:
> > > > > > > > > > > > funcgraph_entry:????? #
> > > > > > > > > > > > 2079.480 us |? gether_disconnect();
> > > > > > > > > > > > ???????? irq/68-dwc3-946?? [000]?? 524.263731:
> > > > > > > > > > > > funcgraph_entry:????? +
> > > > > > > > > > > > 11.640 us? |? gether_disconnect();
> > > > > > > > > > > > ???????? irq/68-dwc3-946?? [000]?? 524.263743:
> > > > > > > > > > > > funcgraph_entry:????? !
> > > > > > > > > > > > 116.520 us |? gether_connect();
> > > > > > > > > > > > ???????? irq/68-dwc3-946?? [000]?? 524.268029:
> > > > > > > > > > > > funcgraph_entry:????? #
> > > > > > > > > > > > 2057.260 us |? gether_disconnect();
> > > > > > > > > > > > ???????? irq/68-dwc3-946?? [000]?? 524.270089:
> > > > > > > > > > > > funcgraph_entry:????? !
> > > > > > > > > > > > 109.000 us |? gether_connect();
> > > > > > > > > > >
> > > > > > > > > > > I tried to get a more useful trace:
> > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config'
> > > > > > > > > > > > > set_ftrace_filter
> > > > > > > > > > > -> switch to host mode then back to device
> > > > > > > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > > > > > > # tracer: function
> > > > > > > > > > > #
> > > > > > > > > > > # entries-in-buffer/entries-written: 53/53?? #P:2
> > > > > > > > > > > #
> > > > > > > > > > > #??????????????????????????????? _-----=> irqs-off/BH-disabled
> > > > > > > > > > > #?????????????????????????????? / _----=> need-resched
> > > > > > > > > > > #????????????????????????????? | / _---=> hardirq/softirq
> > > > > > > > > > > #????????????????????????????? || / _--=> preempt-depth
> > > > > > > > > > > #????????????????????????????? ||| / _-=> migrate-disable
> > > > > > > > > > > #????????????????????????????? |||| /???? delay
> > > > > > > > > > > #?????????? TASK-PID???? CPU#? |||||? TIMESTAMP? FUNCTION
> > > > > > > > > > > #????????????? | |???????? |?? |||||???? |???????? |
> > > > > > > > > > > ??????? irq/68-dwc3-523???? [000] D..3.?? 133.990254: reset_config
> > > > > > > > > > > <-__composite_disconnect
> > > > > > > > > > > ??????? irq/68-dwc3-523???? [000] D..3.?? 133.992274: eem_disable
> > > > > > > > > > > <-reset_config
> > > > > > > > > > > ??????? irq/68-dwc3-523???? [000] D..3.?? 133.992276:
> > > > > > > > > > > gether_disconnect
> > > > > > > > > > > <-reset_config
> > > > > > > > > > > ??????? kworker/1:3-443???? [001] ...1.?? 134.022453: eem_unbind
> > > > > > > > > > > <-purge_configs_funcs
> > > > > > > > > > >
> > > > > > > > > > > -> to device mode
> > > > > > > > > > >
> > > > > > > > > > > ??????? kworker/1:3-443???? [001] ...1.?? 148.630773: eem_bind
> > > > > > > > > > > <-usb_add_function
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000] D..3.?? 149.155209: eem_set_alt
> > > > > > > > > > > <-composite_setup
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000] D..3.?? 149.155215:
> > > > > > > > > > > gether_disconnect
> > > > > > > > > > > <-eem_set_alt
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000]
> > > > > > > > > > > D..3.?? 149.155220: gether_connect
> > > > > > > > > > > <-eem_set_alt
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000] D..3.?? 149.157287: eem_set_alt
> > > > > > > > > > > <-composite_setup
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000] D..3.?? 149.157292:
> > > > > > > > > > > gether_disconnect
> > > > > > > > > > > <-eem_set_alt
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000]
> > > > > > > > > > > D..3.?? 149.159338: gether_connect
> > > > > > > > > > > <-eem_set_alt
> > > > > > > > > > > ??????? irq/68-dwc3-734???? [000] D..2.?? 149.239625: eem_unwrap
> > > > > > > > > > > <-rx_complete
> > > > > > > > > > > ...
> > > > > > > > > > >
> > > > > > > > > > > I don't know where to look exactly. Any hints?
> > > > > > > > > >
> > > > > > > > > > do you see anything related to
> > > > > > > > > > gether_cleanup() after eem_unbind() ?
> > > > > > > > >
> > > > > > > > > Nope. It's a pitty that the trace formatting got messed up
> > > > > > > > > above. But as
> > > > > > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > > > > > > function is called, until I flip the switch to device mode.
> > > > > > > > > The ... at the end is where I cut uninteresting eem_unwrap
> > > > > > > > > <-rx_complete
> > > > > > > > > and eem_wrap <-eth_start_xmit lines.
> > > > > > > > >
> > > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > > > > > > network side to check who deleting the sysfs entry
> > > > > > > > >
> > > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > > > > > > that would be related to the patch we're
> > > > > > > > > discussing here. I was hoping
> > > > > > > > > for a little more targeted hint.
> > > > > > > >
> > > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
> > > > > > > > 'usb_fun*',
> > > > > > > > 'reset_config' and 'device_remove_file' leads me to:
> > > > > > > >
> > > > > > > > TIMESTAMP? FUNCTION
> > > > > > > > ???? |???????? |
> > > > > > > > ??? 49.952477: eem_wrap <-eth_start_xmit
> > > > > > > > ??? 55.072455: eem_wrap <-eth_start_xmit
> > > > > > > > ??? 55.072621: eem_unwrap <-rx_complete
> > > > > > > > ??? 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > > > > > > > ??? 59.011545: composite_reset <-configfs_composite_reset
> > > > > > > > ??? 59.011548: reset_config <-__composite_disconnect
> > > > > > > > ??? 59.013565: eem_disable <-reset_config
> > > > > > > > ??? 59.013567: gether_disconnect <-reset_config
> > > > > > > > ??? 59.049560: device_remove_file <-device_remove
> > > > > > > > ??? 59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > > > > > > > ??? 59.051189: composite_disconnect <-configfs_composite_discon
> > > > > > > > ??? 59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > > > > > > > ??? 59.052519: eem_unbind <-purge_configs_funcs
> > > > > > > > ??? 59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > > > > > > > ??? 59.052537: device_remove_file <-composite_dev_cleanup
> > > > > > > >
> > > > > > > > device_remove_file gets called twice, once by device_remove after
> > > > > > > > gether_disconnect (that the one). The 2nd time by
> > > > > > > > composite_dev_cleanup
> > > > > > > > (removing the gadget)
> > > > > > >
> > > > > > > I believe that the device_remove_file function is only removing
> > > > > > > suspend-specific attributes, not the complete gadget.
> > > > > > > Typically, when you perform the role switch, the Gadget start/stop
> > > > > > > function in your UDC driver is called. These functions should not
> > > > > > > delete the gadget
> > > > > > >
> > > > > > > To investigate further, could you please enable the DWC3 functions
> > > > > > > in ftrace and check who is removing the gadget?
> > > > > > > I can also enable this on my system and compare the logs with yours,
> > > > > > > but I will be in PI planning for 1.5 weeks and may not be able to
> > > > > > > provide immediate support.
> > > > > >
> > > > > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
> > > > > > trace is 600kb. I cut uninteresting stuff before
> > > > > > configfs_composite_reset <-usb_gadget_udc_reset and after
> > > > > > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
> > > > > > compare.
> > > > > >
> > > > Could you please try with the following patch and see if your issue
> > > > resolves ?
> > > >
> > > > diff --git a/drivers/usb/gadget/function/f_eem.c
> > > > b/drivers/usb/gadget/function/f_eem.c
> > > > index 3b445bd88498..c2a904475083 100644
> > > > --- a/drivers/usb/gadget/function/f_eem.c
> > > > +++ b/drivers/usb/gadget/function/f_eem.c
> > > > @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
> > > > struct usb_function *f)
> > > > ????? struct usb_composite_dev *cdev = c->cdev;
> > > > ????? struct f_eem??????? *eem = func_to_eem(f);
> > > > ????? struct usb_string??? *us;
> > > > -??? int??????????? status;
> > > > +??? int??????????? status = 0;
> > > > ????? struct usb_ep??????? *ep;
> > > > ????? struct f_eem_opts??? *eem_opts;
> > > > @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
> > > > *c, struct usb_function *f)
> > > > ?????? * with list_for_each_entry, so we assume no race condition
> > > > ?????? * with regard to eem_opts->bound access
> > > > ?????? */
> > > > +??? mutex_lock(&eem_opts->lock);
> > > > +??? gether_set_gadget(eem_opts->net, cdev->gadget);
> > > > +
> > > > ????? if (!eem_opts->bound) {
> > > > -??????? mutex_lock(&eem_opts->lock);
> > > > -??????? gether_set_gadget(eem_opts->net, cdev->gadget);
> > > > ????????? status = gether_register_netdev(eem_opts->net);
> > > > -??????? mutex_unlock(&eem_opts->lock);
> > > > -??????? if (status)
> > > > -??????????? return status;
> > > > -??????? eem_opts->bound = true;
> > > > ????? }
> > > > +??? mutex_unlock(&eem_opts->lock);
> > > > +
> > > > +??? if (status)
> > > > +??????? goto fail;
> > > > +
> > > > +??? eem_opts->bound = true;
> > > > +
> > > > ????? us = usb_gstrings_attach(cdev, eem_strings,
> > > > ?????????????????? ARRAY_SIZE(eem_string_defs));
> > > >
> > >
> > > After switching from host -> device -> host the usb0 device as seen by
> > > ifconfig remains "RUNNING" and in the routing table.
> > >
> > > FYI This is the regression.
> >
> > For us, I don't see any advantages in having this patch and only the
> > downside. Do you have any further ideas / patches that we could test.
> >
> > Otherwise, it might be best to go with the original suggestion and revert
> > until it gets sorted out?
>
> Hello,
>
> Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?

Ferry or I can do it, but I will be able to do that next week.

> However, We are very much intrested to fix your problem and upstream the v2.
> Therefore could you please share the following details to reproduce the issue
> - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ?

Ferry will answer to this one.

> - Arch of your test bench (arm, x86)

x86, the SoC is Intel Tangier on the Intel Merrifield (platform).

> - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ?

Yes, it's PCI connected device (which you may have already deducted from the
thread a lot of times). It's part of the SoC (also can be easily deducted).

The IDs are available in the driver:
https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/dwc3-pci.c#L24

> > > Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
> > > still missing.
> > >
> > > After deeper diving into v6.1.0, I found the latter a longer existing
> > > bug, not caused by your patch.
> > >
> > > More over, this doesn't seem to do any harm.
> > >
> > > The first issue does, as we try to route traffic over a device that no
> > > longer exists.
> > >
> > > > > > > Additionally, please check if you have any customized DWC patches
> > > > > > > that may be causing this problem.
> > > > > > >
> > > > > > > > > You may recall the whole issue did not occur before this
> > > > > > > > > patch got applied.

> > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > > > > > > kobject and sysfs interface.
> > > > > > > > > > > > > I suggest sharing the analysis
> > > > > > > > > > > > > here to understand why this
> > > > > > > > > > > > > practice
> > > > > > > > > > > > > is not followed in your use case or driver ?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > > > > > >
> > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > > > > > > harmless? But the usb: net device
> > > > > > > > > > > > remaining after disconnect or role
> > > > > > > > > > > > switch is not good, as the route remains.
> > > > > > > > > > > >
> > > > > > > > > > > > May be they are 2 separate problems.
> > > > > > > > > > > > Could you try to reproduce what
> > > > > > > > > > > > happens if you make eem connection and then unplug?
> > > > > > > > > > > >
> > > > > > > > > > > > > I am curious why the driver was
> > > > > > > > > > > > > developed without adhering to
> > > > > > > > > > > > > the
> > > > > > > > > > > > > kernel's gadget architecture.
> > > > > > > > > > >
> > > > > > > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > > > > > > with this patch and
> > > > > > > > > > > > > > > > > share result here.? May be we need some fix in dwc3
> > > > > > > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > > > > > > controller as well. But
> > > > > > > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > > > > > > We are happy to test any fixes if you come up with.

--
With Best Regards,
Andy Shevchenko



2024-05-30 19:07:38

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Hi,

Op 28-05-2024 om 09:18 schreef Hardik Gajjar:
> On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote:
>> Hi
>>
>> Op 15-05-2024 om 20:38 schreef Ferry Toth:
>>> Hi,
>>>
>>> Sorry to have kept you waiting, I was away for a short break.
>>>
>>> Patch tested below.
>>>
>>> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>>>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>>>> Oops, sorry, wrong file attached . Now correct one.
>>>>>
>>>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>>>> On Thu, Apr 11, 2024
>>>>>>>>>>>>>>>> at 04:26:37PM +0200,
>>>>>>>>>>>>>>>> Hardik Gajjar wrote:
>>>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>>>> No such file
>>>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mass storage device is
>>>>>>>>>>>>>>> created and removed each
>>>>>>>>>>>>>>> time as expected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, what's the conclusion?
>>>>>>>>>>>>>> Shall we move towards revert
>>>>>>>>>>>>>> of those
>>>>>>>>>>>>>> two changes?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> As promised, I have the tested
>>>>>>>>>>>>> the this patch with the dwc3
>>>>>>>>>>>>> gadget.
>>>>>>>>>>>>> I could not reproduce
>>>>>>>>>>>>> the issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can see the usb0 exist all the
>>>>>>>>>>>>> time and accessible regardless
>>>>>>>>>>>>> of
>>>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Following are the logs:
>>>>>>>>>>>>> //Host to device
>>>>>>>>>>>>>
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # echo "peripheral"
>>>>>>>>>>>>>> mode
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>>> usb0
>>>>>>>>>>>>>
>>>>>>>>>>>>> //device to host
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>>> usb0
>>>>>>>>>>>>
>>>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>>>> the physical switch),
>>>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>>>
>>>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>>>> directory is recreated. And
>>>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>>>
>>>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>>>> <
>>>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # ifconfig -a usb0
>>>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>>>               TX packets:0
>>>>>>>>>>>>> errors:0 dropped:0 overruns:0
>>>>>>>>>>>>> carrier:0
>>>>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>>>>
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>>>
>>>>>>>>>>>>> I strongly advise against
>>>>>>>>>>>>> reverting the patch solely based
>>>>>>>>>>>>> on the
>>>>>>>>>>>>> observed issue of removing the
>>>>>>>>>>>>> /sys/class/net/usb0 directory
>>>>>>>>>>>>> while
>>>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>>>
>>>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>>>
>>>>>>>>>>>> I have while in host mode:
>>>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>>>> usb0:
>>>>>>>>>>>> flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>
>>>>>>>>>>>> mtu 1500
>>>>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>>>> 10.42.0.255
>>>>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>>>
>>>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>>>> trace the functions involved
>>>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>>>
>>>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>>>       plugin 'function_graph'
>>>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>>>> cpus=2
>>>>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>>>
>>>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>>>> set_ftrace_filter
>>>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>>>> # tracer: function
>>>>>>>>>>> #
>>>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>>>> #
>>>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>>>> #                              |||| /     delay
>>>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>>>> <-__composite_disconnect
>>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>>>> <-reset_config
>>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-reset_config
>>>>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>>>
>>>>>>>>>>> -> to device mode
>>>>>>>>>>>
>>>>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>>>> <-usb_add_function
>>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>>>> <-composite_setup
>>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>         irq/68-dwc3-734     [000]
>>>>>>>>>>> D..3.   149.155220: gether_connect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>>>> <-composite_setup
>>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>         irq/68-dwc3-734     [000]
>>>>>>>>>>> D..3.   149.159338: gether_connect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>>>> <-rx_complete
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>>>
>>>>>>>>>> do you see anything related to
>>>>>>>>>> gether_cleanup() after eem_unbind() ?
>>>>>>>>>
>>>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>>>> above. But as
>>>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>>>
>>>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>>>
>>>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>>>> that would be related to the patch we're
>>>>>>>>> discussing here. I was hoping
>>>>>>>>> for a little more targeted hint.
>>>>>>>>
>>>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>>>> 'usb_fun*',
>>>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>>>
>>>>>>>> TIMESTAMP  FUNCTION
>>>>>>>>      |         |
>>>>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>>>>     59.013565: eem_disable <-reset_config
>>>>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>>>>     59.049560: device_remove_file <-device_remove
>>>>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>>>
>>>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>>>> gether_disconnect (that the one). The 2nd time by
>>>>>>>> composite_dev_cleanup
>>>>>>>> (removing the gadget)
>>>>>>>
>>>>>>> I believe that the device_remove_file function is only removing
>>>>>>> suspend-specific attributes, not the complete gadget.
>>>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>>>> function in your UDC driver is called. These functions should not
>>>>>>> delete the gadget
>>>>>>>
>>>>>>> To investigate further, could you please enable the DWC3 functions
>>>>>>> in ftrace and check who is removing the gadget?
>>>>>>> I can also enable this on my system and compare the logs with yours,
>>>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>>>> provide immediate support.
>>>>>>
>>>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>>>> trace is 600kb. I cut uninteresting stuff before
>>>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>>>> compare.
>>>>>>
>>>> Could you please try with the following patch and see if your issue
>>>> resolves ?
>>>>
>>>> diff --git a/drivers/usb/gadget/function/f_eem.c
>>>> b/drivers/usb/gadget/function/f_eem.c
>>>> index 3b445bd88498..c2a904475083 100644
>>>> --- a/drivers/usb/gadget/function/f_eem.c
>>>> +++ b/drivers/usb/gadget/function/f_eem.c
>>>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
>>>> struct usb_function *f)
>>>>       struct usb_composite_dev *cdev = c->cdev;
>>>>       struct f_eem        *eem = func_to_eem(f);
>>>>       struct usb_string    *us;
>>>> -    int            status;
>>>> +    int            status = 0;
>>>>       struct usb_ep        *ep;
>>>>       struct f_eem_opts    *eem_opts;
>>>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
>>>> *c, struct usb_function *f)
>>>>        * with list_for_each_entry, so we assume no race condition
>>>>        * with regard to eem_opts->bound access
>>>>        */
>>>> +    mutex_lock(&eem_opts->lock);
>>>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>>>> +
>>>>       if (!eem_opts->bound) {
>>>> -        mutex_lock(&eem_opts->lock);
>>>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>>>           status = gether_register_netdev(eem_opts->net);
>>>> -        mutex_unlock(&eem_opts->lock);
>>>> -        if (status)
>>>> -            return status;
>>>> -        eem_opts->bound = true;
>>>>       }
>>>> +    mutex_unlock(&eem_opts->lock);
>>>> +
>>>> +    if (status)
>>>> +        goto fail;
>>>> +
>>>> +    eem_opts->bound = true;
>>>> +
>>>>       us = usb_gstrings_attach(cdev, eem_strings,
>>>>                    ARRAY_SIZE(eem_string_defs));
>>>>
>>>
>>> After switching from host -> device -> host the usb0 device as seen by
>>> ifconfig remains "RUNNING" and in the routing table.
>>>
>>> FYI This is the regression.
>>
>> For us, I don't see any advantages in having this patch and only the
>> downside. Do you have any further ideas / patches that we could test.
>>
>> Otherwise, it might be best to go with the original suggestion and revert
>> until it gets sorted out?
>
> Hello,
>
> Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?
>
> However, We are very much intrested to fix your problem and upstream the v2.
> Therefore could you please share the following details to reproduce the issue

Instruction to establish the eem connection on the device (connman) and
the host (networkmanager) here:
https://edison-fw.github.io/meta-intel-edison/4.2-networking.html#gadget

> - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ?

Sure, you can find the u-dev rules and the script here on GH:
https://github.com/edison-fw/meta-intel-edison/tree/kirkstone/meta-intel-edison-bsp/recipes-support/gadget/files

> - Arch of your test bench (arm, x86)

This is x86_64 although we can just as easily build i686 if needed.

> - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ?

Like Andy mentioned it is in the SOC. It's described in ACPI (in U-Boot
together with other SOC level devices) here:
https://github.com/u-boot/u-boot/blob/ea722aa5eb33740ae77e8816aeb72b385e621cd0/arch/x86/include/asm/arch-tangier/acpi/southcluster.asl#L333

> Hardik
>
>>
>>> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
>>> still missing.
>>>
>>> After deeper diving into v6.1.0, I found the latter a longer existing
>>> bug, not caused by your patch.
>>>
>>> More over, this doesn't seem to do any harm.
>>>
>>> The first issue does, as we try to route traffic over a device that no
>>> longer exists.
>>>
>>>>>>> Additionally, please check if you have any customized DWC patches
>>>>>>> that may be causing this problem.
>>>>>>>
>>>>>>>>
>>>>>>>>> You may recall the whole issue did not occur before this
>>>>>>>>> patch got applied.
>>>>>>>>>
>>>>>>>>>> Hardik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>>>> I suggest sharing the analysis
>>>>>>>>>>>>> here to understand why this
>>>>>>>>>>>>> practice
>>>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>>>> harmless? But the usb: net device
>>>>>>>>>>>> remaining after disconnect or role
>>>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>>>
>>>>>>>>>>>> May be they are 2 separate problems.
>>>>>>>>>>>> Could you try to reproduce what
>>>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>>>
>>>>>>>>>>>>> I am curious why the driver was
>>>>>>>>>>>>> developed without adhering to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>>>
>>>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>
>>>>
>>>


2024-06-06 07:59:04

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

On 28.05.24 09:18, Hardik Gajjar wrote:
>
> Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?

From what I can see it seems nobody submitted the revert since then.
Unless I'm mistaken could you thus please take care of that?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.