2020-12-15 02:20:57

by Ivan Babrou

Subject: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

Without this change the driver tries to allocate too many queues,
breaching the number of available msi-x interrupts on machines
with many logical cpus and default adapter settings:

Insufficient resources for 12 XDP event queues (24 other channels, max 32)

Which in turn triggers EINVAL on XDP processing:

sfc 0000:86:00.0 ext0: XDP TX failed (-22)

Signed-off-by: Ivan Babrou <[email protected]>
---
drivers/net/ethernet/sfc/efx_channels.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a4a626e9cd9a..1bfeee283ea9 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -17,6 +17,7 @@
 #include "rx_common.h"
 #include "nic.h"
 #include "sriov.h"
+#include "workarounds.h"
 
 /* This is the first interrupt mode to try out of:
  * 0 => MSI-X
@@ -137,6 +138,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 {
 	unsigned int n_channels = parallelism;
 	int vec_count;
+	int tx_per_ev;
 	int n_xdp_tx;
 	int n_xdp_ev;
 
@@ -149,9 +151,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 	 * multiple tx queues, assuming tx and ev queues are both
 	 * maximum size.
 	 */
-
+	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
 	n_xdp_tx = num_possible_cpus();
-	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
+	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
 
 	vec_count = pci_msix_vec_count(efx->pci_dev);
 	if (vec_count < 0)
--
2.29.2
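
To make the arithmetic behind the commit message concrete, here is a small stand-alone sketch of the before/after calculation. All of the numbers are assumptions picked to line up with the error message above (48 possible CPUs, EFX_MAX_TXQ_PER_CHANNEL of 4, EFX_MAX_EVQ_SIZE of 16384, TX queue size of 2048); they are not values read from any particular adapter.

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	/* Assumed, illustrative values. */
	int n_xdp_tx = 48;		/* num_possible_cpus() */
	int txq_per_channel = 4;	/* EFX_MAX_TXQ_PER_CHANNEL */
	int tx_per_ev = 16384 / 2048;	/* EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx) */

	/* Before the patch: one event queue per EFX_MAX_TXQ_PER_CHANNEL TX queues. */
	int n_xdp_ev_old = DIV_ROUND_UP(n_xdp_tx, txq_per_channel);

	/* After the patch: pack as many TX queues onto one event queue as a
	 * maximum-size event queue can service.
	 */
	int n_xdp_ev_new = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);

	printf("XDP event queues: before=%d after=%d\n",
	       n_xdp_ev_old, n_xdp_ev_new);
	return 0;
}

With these assumed values the request drops from 12 XDP event queues (which does not fit in the 8 MSI-X vectors left after 24 other channels, max 32) to 6 (which does).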


2020-12-15 09:47:40

by Jesper Dangaard Brouer

Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

On Mon, 14 Dec 2020 17:29:06 -0800
Ivan Babrou <[email protected]> wrote:

> Without this change the driver tries to allocate too many queues,
> breaching the number of available msi-x interrupts on machines
> with many logical cpus and default adapter settings:
>
> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
>
> Which in turn triggers EINVAL on XDP processing:
>
> sfc 0000:86:00.0 ext0: XDP TX failed (-22)

I have a similar QA report with XDP_REDIRECT:
sfc 0000:05:00.0 ens1f0np0: XDP redirect failed (-22)

Here we are back to the issue we discussed with ixgbe: the NIC's MSI-X
interrupt hardware resources are not enough on machines with many
logical CPUs.

After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count)?
(The efx_xdp_tx_buffers code is copied below my signature.)

This leads to the question: does this driver need a fallback mechanism
for when HW resources or the number of logical CPUs in the system exceed
the one-TX-queue-per-CPU assumption? (One possible shape for such a
fallback is sketched after the copied function below.)


> Signed-off-by: Ivan Babrou <[email protected]>
> ---
> drivers/net/ethernet/sfc/efx_channels.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index a4a626e9cd9a..1bfeee283ea9 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -17,6 +17,7 @@
>  #include "rx_common.h"
>  #include "nic.h"
>  #include "sriov.h"
> +#include "workarounds.h"
>
>  /* This is the first interrupt mode to try out of:
>   * 0 => MSI-X
> @@ -137,6 +138,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>  {
>  	unsigned int n_channels = parallelism;
>  	int vec_count;
> +	int tx_per_ev;
>  	int n_xdp_tx;
>  	int n_xdp_ev;
>
> @@ -149,9 +151,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>  	 * multiple tx queues, assuming tx and ev queues are both
>  	 * maximum size.
>  	 */
> -
> +	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
>  	n_xdp_tx = num_possible_cpus();
> -	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
> +	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
>
>  	vec_count = pci_msix_vec_count(efx->pci_dev);
>  	if (vec_count < 0)



--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


/* Transmit a packet from an XDP buffer
 *
 * Returns number of packets sent on success, error code otherwise.
 * Runs in NAPI context, either in our poll (for XDP TX) or a different NIC
 * (for XDP redirect).
 */
int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
		       bool flush)
{
	struct efx_tx_buffer *tx_buffer;
	struct efx_tx_queue *tx_queue;
	struct xdp_frame *xdpf;
	dma_addr_t dma_addr;
	unsigned int len;
	int space;
	int cpu;
	int i;

	cpu = raw_smp_processor_id();

	if (!efx->xdp_tx_queue_count ||
	    unlikely(cpu >= efx->xdp_tx_queue_count))
		return -EINVAL;

	tx_queue = efx->xdp_tx_queues[cpu];
	if (unlikely(!tx_queue))
		return -EINVAL;

	if (unlikely(n && !xdpfs))
		return -EINVAL;

	if (!n)
		return 0;

	/* Check for available space. We should never need multiple
	 * descriptors per frame.
	 */
	space = efx->txq_entries +
		tx_queue->read_count - tx_queue->insert_count;

	for (i = 0; i < n; i++) {
		xdpf = xdpfs[i];

		if (i >= space)
			break;

		/* We'll want a descriptor for this tx. */
		prefetchw(__efx_tx_queue_get_insert_buffer(tx_queue));

		len = xdpf->len;

		/* Map for DMA. */
		dma_addr = dma_map_single(&efx->pci_dev->dev,
					  xdpf->data, len,
					  DMA_TO_DEVICE);
		if (dma_mapping_error(&efx->pci_dev->dev, dma_addr))
			break;

		/* Create descriptor and set up for unmapping DMA. */
		tx_buffer = efx_tx_map_chunk(tx_queue, dma_addr, len);
		tx_buffer->xdpf = xdpf;
		tx_buffer->flags = EFX_TX_BUF_XDP |
				   EFX_TX_BUF_MAP_SINGLE;
		tx_buffer->dma_offset = 0;
		tx_buffer->unmap_len = len;
		tx_queue->tx_packets++;
	}

	/* Pass mapped frames to hardware. */
	if (flush && i > 0)
		efx_nic_push_buffers(tx_queue);

	if (i == 0)
		return -EIO;

	efx_xdp_return_frames(n - i, xdpfs + i);

	return i;
}
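
For what it is worth, one possible shape for such a fallback, only as a sketch in the same context as the function above and not something the driver implements today, would be to wrap the CPU index onto the available XDP TX queues and tell the caller when the queue may be shared (so it can take a lock). efx_xdp_pick_tx_queue() below is a made-up helper name.

/* Hypothetical helper, for discussion only: pick an XDP TX queue even when
 * there are fewer queues than logical CPUs, instead of failing with -EINVAL
 * when cpu >= efx->xdp_tx_queue_count. Sets *shared when the chosen queue
 * may be used by more than one CPU, so the caller knows locking is needed.
 */
static struct efx_tx_queue *efx_xdp_pick_tx_queue(struct efx_nic *efx,
						  int cpu, bool *shared)
{
	if (!efx->xdp_tx_queue_count)
		return NULL;

	*shared = efx->xdp_tx_queue_count < num_possible_cpus();

	return efx->xdp_tx_queues[cpu % efx->xdp_tx_queue_count];
}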

2020-12-15 18:54:45

by Edward Cree

Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

On 15/12/2020 09:43, Jesper Dangaard Brouer wrote:
> On Mon, 14 Dec 2020 17:29:06 -0800
> Ivan Babrou <[email protected]> wrote:
>
>> Without this change the driver tries to allocate too many queues,
>> breaching the number of available msi-x interrupts on machines
>> with many logical cpus and default adapter settings:
>>
>> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
>>
>> Which in turn triggers EINVAL on XDP processing:
>>
>> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
>
> I have a similar QA report with XDP_REDIRECT:
> sfc 0000:05:00.0 ens1f0np0: XDP redirect failed (-22)
>
> Here we are back to the issue we discussed with ixgbe, that NIC / msi-x
> interrupts hardware resources are not enough on machines with many
> logical cpus.
>
> After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count) ?
Same as happened before: the "failed -22". But this fix will make that
less likely to happen, because it ties more TXQs to each EVQ, and it's
the EVQs that are in short supply.
(Strictly speaking, I believe the limitation is a software one, that
comes from the driver's channel structures having been designed a
decade ago when 32 cpus ought to be enough for anybody... AFAIR the
hardware is capable of giving us something like 1024 evqs if we ask
for them, it just might not have that many msi-x vectors for us.)
Anyway, the patch looks correct, so
Acked-by: Edward Cree <[email protected]>

-ed

2020-12-17 18:16:19

by Jakub Kicinski

Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> Without this change the driver tries to allocate too many queues,
> breaching the number of available msi-x interrupts on machines
> with many logical cpus and default adapter settings:
>
> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
>
> Which in turn triggers EINVAL on XDP processing:
>
> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
>
> Signed-off-by: Ivan Babrou <[email protected]>

Looks like the discussion may have concluded, but we don't take -next
patches during the merge window, so please repost when net-next reopens.

Thanks!
--
# Form letter - net-next is closed

We have already sent the networking pull request for 5.11 and therefore
net-next is closed for new drivers, features, code refactoring and
optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after 5.11-rc1 is cut.

Look out for the announcement on the mailing list or check:
http://vger.kernel.org/~davem/net-next.html

RFC patches sent for review only are obviously welcome at any time.

2021-01-19 23:48:58

by Ivan Babrou

Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

On Thu, Dec 17, 2020 at 10:14 AM Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> > Without this change the driver tries to allocate too many queues,
> > breaching the number of available msi-x interrupts on machines
> > with many logical cpus and default adapter settings:
> >
> > Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> >
> > Which in turn triggers EINVAL on XDP processing:
> >
> > sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> >
> > Signed-off-by: Ivan Babrou <[email protected]>
>
> Looks like the discussion may have concluded, but we don't take -next
> patches during the merge window, so please repost when net-next reopens.
>
> Thanks!
> --
> # Form letter - net-next is closed
>
> We have already sent the networking pull request for 5.11 and therefore
> net-next is closed for new drivers, features, code refactoring and
> optimizations. We are currently accepting bug fixes only.
>
> Please repost when net-next reopens after 5.11-rc1 is cut.

Should I resend my patch now that the window is open, or is bumping
this thread enough?

> Look out for the announcement on the mailing list or check:
> http://vger.kernel.org/~davem/net-next.html
>
> RFC patches sent for review only are obviously welcome at any time.

2021-01-20 00:34:14

by Jakub Kicinski

Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

On Tue, 19 Jan 2021 15:43:43 -0800 Ivan Babrou wrote:
> On Thu, Dec 17, 2020 at 10:14 AM Jakub Kicinski <[email protected]> wrote:
> > On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> > > Without this change the driver tries to allocate too many queues,
> > > breaching the number of available msi-x interrupts on machines
> > > with many logical cpus and default adapter settings:
> > >
> > > Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> > >
> > > Which in turn triggers EINVAL on XDP processing:
> > >
> > > sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> > >
> > > Signed-off-by: Ivan Babrou <[email protected]>
> >
> > Looks like the discussion may have concluded, but we don't take -next
> > patches during the merge window, so please repost when net-next reopens.
> >
> > Thanks!
> > --
> > # Form letter - net-next is closed
> >
> > We have already sent the networking pull request for 5.11 and therefore
> > net-next is closed for new drivers, features, code refactoring and
> > optimizations. We are currently accepting bug fixes only.
> >
> > Please repost when net-next reopens after 5.11-rc1 is cut.
>
> Should I resend my patch now that the window is open or is bumping
> this thread enough?

You need to resend, thanks!