2023-09-24 04:44:49

by Haiyang Zhang

Subject: [PATCH net, 1/3] net: mana: Fix TX CQE error handling

For an unknown TX CQE error type (probably from newer hardware),
still free the SKB, update the queue tail, etc., otherwise the
accounting will be wrong.

Also, TX errors can be triggered by injecting corrupted packets, so
replace WARN_ONCE with rate-limited error logging, since we don't
need a stack trace here.

Cc: [email protected]
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <[email protected]>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 4a16ebff3d1d..5cdcf7561b38 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1317,19 +1317,23 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
 		case CQE_TX_VPORT_IDX_OUT_OF_RANGE:
 		case CQE_TX_VPORT_DISABLED:
 		case CQE_TX_VLAN_TAGGING_VIOLATION:
-			WARN_ONCE(1, "TX: CQE error %d: ignored.\n",
-				  cqe_oob->cqe_hdr.cqe_type);
+			if (net_ratelimit())
+				netdev_err(ndev, "TX: CQE error %d\n",
+					   cqe_oob->cqe_hdr.cqe_type);
+
 			apc->eth_stats.tx_cqe_err++;
 			break;
 
 		default:
-			/* If the CQE type is unexpected, log an error, assert,
-			 * and go through the error path.
+			/* If the CQE type is unknown, log an error,
+			 * and still free the SKB, update tail, etc.
 			 */
-			WARN_ONCE(1, "TX: Unexpected CQE type %d: HW BUG?\n",
-				  cqe_oob->cqe_hdr.cqe_type);
+			if (net_ratelimit())
+				netdev_err(ndev, "TX: unknown CQE type %d\n",
+					   cqe_oob->cqe_hdr.cqe_type);
+
 			apc->eth_stats.tx_cqe_unknown_type++;
-			return;
+			break;
 		}
 
 		if (WARN_ON_ONCE(txq->gdma_txq_id != completions[i].wq_num))
--
2.25.1
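To illustrate the accounting argument in the commit message, here is a simplified userspace model. It is not driver code: every name below (`txq_model`, `poll_tx_model`, `CQE_OK`, `CQE_KNOWN_ERR`) is hypothetical. It contrasts the old `return` on an unknown CQE type with the patched `break`. In this model, `return` skips the common per-completion cleanup (freeing the SKB, advancing the tail) for the unknown completion and for every later completion in the batch, so the pending count never drains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CQE type values for this model only; any other value is
 * treated as an unknown type (e.g. one reported by newer hardware). */
enum { CQE_OK = 0, CQE_KNOWN_ERR = 1 };

struct txq_model {
	unsigned int pending;   /* packets posted but not yet completed */
	unsigned int freed;     /* SKBs released by the common cleanup */
	unsigned int known_err; /* models apc->eth_stats.tx_cqe_err */
	unsigned int unknown;   /* models apc->eth_stats.tx_cqe_unknown_type */
};

/* Process n completions. bail_on_unknown = true mimics the old code,
 * false mimics the patched code. */
static void poll_tx_model(struct txq_model *q, const int *cqes, size_t n,
			  bool bail_on_unknown)
{
	for (size_t i = 0; i < n; i++) {
		switch (cqes[i]) {
		case CQE_OK:
			break;
		case CQE_KNOWN_ERR:
			q->known_err++;
			break;
		default:
			q->unknown++;
			if (bail_on_unknown)
				return; /* old: skip cleanup for this and later CQEs */
			break;          /* patched: continue to the common cleanup */
		}

		/* Common path for every completion: free the SKB, advance tail. */
		assert(q->pending > 0);
		q->pending--;
		q->freed++;
	}
}
```

For example, with four posted packets and completions `{CQE_OK, 99, CQE_KNOWN_ERR, CQE_OK}`, the `bail_on_unknown = true` path leaves three packets pending, while the `break` path drains all four and still counts both errors.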


2023-09-29 05:57:35

by Simon Horman

Subject: Re: [PATCH net, 1/3] net: mana: Fix TX CQE error handling

On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> For an unknown TX CQE error type (probably from newer hardware),
> still free the SKB, update the queue tail, etc., otherwise the
> accounting will be wrong.
>
> Also, TX errors can be triggered by injecting corrupted packets, so
> replace WARN_ONCE with rate-limited error logging, since we don't
> need a stack trace here.
>
> Cc: [email protected]
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> Signed-off-by: Haiyang Zhang <[email protected]>

Reviewed-by: Simon Horman <[email protected]>

2023-09-29 06:09:17

by Simon Horman

Subject: Re: [PATCH net, 1/3] net: mana: Fix TX CQE error handling

On Fri, Sep 29, 2023 at 07:47:57AM +0200, Simon Horman wrote:
> On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> > For an unknown TX CQE error type (probably from newer hardware),
> > still free the SKB, update the queue tail, etc., otherwise the
> > accounting will be wrong.
> >
> > Also, TX errors can be triggered by injecting corrupted packets, so
> > replace WARN_ONCE with rate-limited error logging, since we don't
> > need a stack trace here.
> >
> > Cc: [email protected]
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> > Signed-off-by: Haiyang Zhang <[email protected]>
>
> Reviewed-by: Simon Horman <[email protected]>

Sorry, one late question.

The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
But I do wonder if, as a fix, netdev_err_once() would be more appropriate.
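A minimal userspace sketch of the trade-off raised here, assuming "once" semantics mean a single static flag per call site (roughly how the kernel's `netdev_*_once()` helpers behave). The function `log_cqe_err_once()` and the counter `once_logged` are hypothetical. With this pattern only the first CQE error is ever reported, even when a later error has a different type.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int once_logged; /* how many messages were actually printed */

/* Log the first CQE error only; every later call is silently dropped,
 * whatever its type. Returns true if a message was printed. */
static bool log_cqe_err_once(int cqe_type)
{
	static bool printed; /* one flag per call site, never reset */

	if (printed)
		return false;
	printed = true;
	once_logged++;
	fprintf(stderr, "TX: CQE error %d\n", cqe_type);
	return true;
}
```

Calling it with type 5 and then type 7 prints only the first message; the second, different error type is never seen in the log.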

2023-09-29 18:52:31

by Haiyang Zhang

Subject: RE: [PATCH net, 1/3] net: mana: Fix TX CQE error handling



> -----Original Message-----
> From: Simon Horman <[email protected]>
> Sent: Friday, September 29, 2023 1:51 AM
> To: Haiyang Zhang <[email protected]>
> Subject: Re: [PATCH net, 1/3] net: mana: Fix TX CQE error handling
>
> On Fri, Sep 29, 2023 at 07:47:57AM +0200, Simon Horman wrote:
> > On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> > > For an unknown TX CQE error type (probably from newer hardware),
> > > still free the SKB, update the queue tail, etc., otherwise the
> > > accounting will be wrong.
> > >
> > > Also, TX errors can be triggered by injecting corrupted packets, so
> > > replace WARN_ONCE with rate-limited error logging, since we don't
> > > need a stack trace here.
> > >
> > > Cc: [email protected]
> > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> > > Signed-off-by: Haiyang Zhang <[email protected]>
> >
> > Reviewed-by: Simon Horman <[email protected]>
>
> Sorry, one late question.
>
> The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
> But I do wonder if, as a fix, netdev_err_once() would be more appropriate.

This error can occur with different CQE error types, so I use netdev_err()
to display each of them, and added rate limiting.

Thanks
- Haiyang
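By contrast, a rate limiter keeps reporting errors, including ones of different types, but caps their volume. The sketch below roughly models the `ratelimit_state` logic behind `net_ratelimit()`, which by default allows a burst of 10 messages per 5-second window (the `net.core.message_burst` and `net.core.message_cost` sysctls). The struct `rl_model` and function `rl_allow()` are hypothetical simplifications: time is passed in explicitly so the behavior is deterministic, and the real implementation also takes a lock and reports how many messages it suppressed.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of a burst-per-interval rate limiter. */
struct rl_model {
	unsigned long interval; /* window length, in arbitrary time units */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* start of the current window */
	unsigned int printed;   /* messages allowed so far in this window */
	bool started;
};

/* Return true if a message may be printed at time `now`. */
static bool rl_allow(struct rl_model *rs, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false; /* suppressed until the window expires */
}
```

Used the way the patch uses `if (net_ratelimit()) netdev_err(...)`, a flood of corrupted packets yields at most one burst of log lines per window, while a newly seen CQE type can still be logged once the window resets.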

2023-09-30 18:17:22

by Simon Horman

Subject: Re: [PATCH net, 1/3] net: mana: Fix TX CQE error handling

On Fri, Sep 29, 2023 at 03:51:48PM +0000, Haiyang Zhang wrote:
> > Sorry, one late question.
> >
> > The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
> > But I do wonder if, as a fix, netdev_err_once() would be more appropriate.
>
> This error can occur with different CQE error types, so I use netdev_err()
> to display each of them, and added rate limiting.

Thanks for the clarification.