2023-11-01 21:08:45

by Michael Chan

[permalink] [raw]
Subject: Re: [PATCH 2/2] tg3: Fix the TX ring stall

On Wed, Nov 1, 2023 at 12:19 PM <[email protected]> wrote:
>
> From: Alex Pakhunov <[email protected]>
>
> The TX ring maintained by the tg3 driver can end up in a state, when it
> has packets queued for sending but the NIC hardware is not informed, so no
> progress is made. This leads to a multi-second interruption in network
> traffic followed by dev_watchdog() firing and resetting the queue.
>
> The specific sequence of steps is:
>
> 1. tg3_start_xmit() is called at least once and queues packet(s) without
> updating tnapi->prodmbox (netdev_xmit_more() returns true)
> 2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be
> called.
> 3. tg3_tso_bug() determines that the SKB is too large, ...
>
> if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
>
> ... stops the queue, and returns NETDEV_TX_BUSY:
>
> netif_tx_stop_queue(txq);
> ...
> if (tg3_tx_avail(tnapi) <= frag_cnt_est)
> return NETDEV_TX_BUSY;
>
> 4. Since all tg3_tso_bug() call sites directly return, the code updating
> tnapi->prodmbox is skipped.

Thanks for the patch. An alternative fix that may be simpler is to
add a goto after calling tg3_tso_bug(). Something like this:

tg3_tso_bug();
goto update_tx_mbox;
...

update_tx_mbox:
if (!netdev_xmit_more() || netif_xmit_stopped())
tw32_tx_mbox();
...


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-11-02 04:12:35

by Alex Pakhunov

[permalink] [raw]
Subject: Re: [PATCH 2/2] tg3: Fix the TX ring stall

Hi,

> Thanks for the patch. An alternative fix that may be simpler is to
> add a goto after calling tg3_tso_bug(). Something like this:
>
> tg3_tso_bug();
> goto update_tx_mbox;
> ...
>
> update_tx_mbox:
> if (!netdev_xmit_more() || netif_xmit_stopped())
> tw32_tx_mbox();
> ...

Yes, I considered this approach but in the end it seemed more fragile. All
future updates to tg3_start_xmit() would need to be careful to make sure
all return paths go through "update_tx_mbox". This is much more
straightforward with a separate wrapper function.

The sizes of both patches are roughly the same. The wrapper function
version:

drivers/net/ethernet/broadcom/tg3.c | 46 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 11 deletions(-)

The goto version touches four different return paths: three tg3_tso_bug()
calls and the return at the very top of the function:

drivers/net/ethernet/broadcom/tg3.c | 46 +++++++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 13 deletions(-)

Let me re-test the goto version and resubmit it as v2. Please let me know
which version of the patch you prefer more.

Alex.

2023-11-02 04:42:35

by Michael Chan

[permalink] [raw]
Subject: Re: [PATCH 2/2] tg3: Fix the TX ring stall

On Wed, Nov 1, 2023 at 9:11 PM <[email protected]> wrote:
>
> Hi,
>
> > Thanks for the patch. An alternative fix that may be simpler is to
> > add a goto after calling tg3_tso_bug(). Something like this:
> >
> > tg3_tso_bug();
> > goto update_tx_mbox;
> > ...
> >
> > update_tx_mbox:
> > if (!netdev_xmit_more() || netif_xmit_stopped())
> > tw32_tx_mbox();
> > ...
>
> Yes, I considered this approach but in the end it seemed more fragile. All
> future updates to tg3_start_xmit() would need to be careful to make sure
> all return paths go through "update_tx_mbox". This is much more
> straightforward with a separate wrapper function.
>
> The sizes of both patches are roughly the same. The wrapper function
> version:
>
> drivers/net/ethernet/broadcom/tg3.c | 46 +++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 11 deletions(-)
>
> The goto version touches four different return paths: three tg3_tso_bug()
> calls and the return at the very top of the function:
>
> drivers/net/ethernet/broadcom/tg3.c | 46 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 33 insertions(+), 13 deletions(-)
>
> Let me re-test the goto version and resubmit it as v2. Please let me know
> which version of the patch you prefer more.
>

I did not realize the goto version is almost as big. In that case,
your original version is fine.

You might want to declare the variables in reverse Xmas tree style for
any new code. This driver is old and most of the existing code does
not follow that style.

Thanks.


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-11-02 05:29:35

by Alex Pakhunov

[permalink] [raw]
Subject: Re: [PATCH 2/2] tg3: Fix the TX ring stall

> > Let me re-test the goto version and resubmit it as v2. Please let me know
> > which version of the patch you prefer more.
> >
>
> I did not realize the goto version is almost as big. In that case,
> your original version is fine.
>
> You might want to declare the variables in reverse Xmas tree style for
> any new code. This driver is old and most of the existing code does
> not follow that style.

Copy, thanks. I'll reorder the locals in tg3_start_xmit() and resubmit that
as v2.

Alex.