2021-07-07 09:43:06

by Jason Xing

Subject: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

From: Jason Xing <[email protected]>

The kernel will crash when loading xdpdrv with more than 64 cores and
an Intel X722 NIC. I previously submitted a fix (commit 4e39a072a), but
it only covers the similar issue on machines with fewer than 64 cores.

Crash log:
[305797.917411] Using feature eBPF/xdp.
[305798.054336] i40e 0000:1a:00.0: failed to get tracking for 256 queues
for VSI 0 err -12
[305798.055448] i40e 0000:1a:00.0: setup of MAIN VSI failed
[305798.056130] i40e 0000:1a:00.0: can't remove VEB 160 with 0 VSIs left
[305798.056856] BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
[305798.061190] RIP: 0010:i40e_xdp+0xae/0x110 [i40e]
[305798.075139] Call Trace:
[305798.075666] dev_xdp_install+0x4f/0x70
[305798.076253] dev_change_xdp_fd+0x11f/0x230
[305798.085182] do_setlink+0xac7/0xe70
[305798.086344] rtnl_newlink+0x72d/0x850

Here's how to reproduce:
1) prepare a machine with more than 64 cores and an Intel X722 10G
NIC.
2) # mount bpffs -t bpf /sys/fs/bpf
3) # ip link set dev eth01 xdpdrv obj ./bpf_xdp.o sec from-netdev
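
(The core count can be confirmed with nproc and the NIC model with
lspci, as shown in the follow-up below.)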

The reason is that @rss_size_max is computed too large, which in turn
drives @vsi->alloc_queue_pairs above 64. Later, when xdpdrv is loaded,
the driver reinitializes the VSI and assigns double the old value to
@alloc_queue_pairs, which triggers the failure to find a free lump of
queues (e.g. with 128 online CPUs the rebuilt VSI requests 128 * 2 =
256 queue pairs, matching the log above).

We cannot simply clamp the calculation of @rss_size_max, because
@pf->num_lan_qps will still pick up the maximum (that is, the number
of CPUs). In fact, we do not need to allocate resources in proportion
to the number of CPUs; the hardware itself is the limit. So this patch
introduces a pseudo number of CPUs to replace the real number and
avoid the imbalance.

After applying this patch, a machine with an X722 NIC will use at
most 64 cores no matter how many CPUs are online. The pseudo number
actually depends on @num_tx_qp.
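
To illustrate the arithmetic, here is a minimal user-space sketch of
the proposed calculation (not part of the patch; the two pow2 helpers
are local stand-ins for the kernel's roundup_pow_of_two() and
rounddown_pow_of_two()), fed with the values from this report,
@num_tx_qp = 384 and an assumed 128 online CPUs:

  #include <stdio.h>

  /* Stand-in for the kernel's roundup_pow_of_two(). */
  static unsigned int roundup_pow2(unsigned int n)
  {
          unsigned int p = 1;

          while (p < n)
                  p <<= 1;
          return p;
  }

  /* Stand-in for the kernel's rounddown_pow_of_two(). */
  static unsigned int rounddown_pow2(unsigned int n)
  {
          unsigned int p = 1;

          while (p * 2 <= n)
                  p <<= 1;
          return p;
  }

  int main(void)
  {
          unsigned int num_tx_qp = 384;   /* X722 10GbE + 1GbE, from this report */
          unsigned int online_cpus = 128; /* assumed core count */
          unsigned int limit = num_tx_qp / 3;             /* 128 */
          unsigned int pow = roundup_pow2(online_cpus);   /* 128 */

          /* Shrink one power of two at a time until safely below the
           * hardware-derived limit (assumes num_tx_qp >= 6 so limit >= 2).
           */
          while (pow >= limit)
                  pow = rounddown_pow2(pow - 1);          /* 128 -> 64 */

          printf("pseudo cpus = %u\n", pow);              /* prints 64 */
          return 0;
  }

This prints 64: limit = 384 / 3 = 128, the 128 online CPUs round up to
128, and a single round-down step lands on 64, which is where the
64-core cap above comes from.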

Fixes: 41c445ff0f48 ("i40e: main driver core")
Co-developed-by: Shujin Li <[email protected]>
Signed-off-by: Shujin Li <[email protected]>
Signed-off-by: Jason Xing <[email protected]>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 47 +++++++++++++++++++++++------
1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 861e59a..26de518 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -100,6 +100,35 @@ static int i40e_get_capabilities(struct i40e_pf *pf,
static struct workqueue_struct *i40e_wq;

/**
+ * i40e_pseudo_num_online_cpus - limit the usable cpus to hw capabilities
+ * @pf: board private structure
+ *
+ * We might not use all the cores because some old NICs are not
+ * compatible with machines that have a large number of cores, say,
+ * 128 cores combined with an X722 NIC.
+ *
+ * We should avoid the situation where the number of cores is too
+ * large while the NIC is relatively old. Therefore, limit the actual
+ * number of CPUs we use based on the parameters reported by the
+ * hardware.
+ *
+ * The algorithm is blunt: it shrinks the number one power of two at a
+ * time until it is safely below the hardware limit, so that the
+ * driver is guaranteed to work.
+static u16 i40e_pseudo_num_online_cpus(struct i40e_pf *pf)
+{
+ u32 limit;
+ u16 pow;
+
+ limit = pf->hw.func_caps.num_tx_qp / 3;
+ pow = roundup_pow_of_two(num_online_cpus());
+ while (pow >= limit)
+ pow = rounddown_pow_of_two(pow - 1);
+
+ return pow;
+}
+
+/**
* i40e_allocate_dma_mem_d - OS specific memory alloc for shared code
* @hw: pointer to the HW structure
* @mem: ptr to mem struct to fill out
@@ -11395,7 +11424,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
* will use any remaining vectors to reach as close as we can to the
* number of online CPUs.
*/
- cpus = num_online_cpus();
+ cpus = i40e_pseudo_num_online_cpus(pf);
pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
vectors_left -= pf->num_lan_msix;

@@ -11457,7 +11486,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
* the number of vectors for num_lan_msix to be at most 50% of the
* available vectors, to allow for other features. Now, we add back
* the remaining vectors. However, we ensure that the total
- * num_lan_msix will not exceed num_online_cpus(). To do this, we
+ * num_lan_msix will not exceed i40e_pseudo_num_online_cpus(). To do this, we
* calculate the number of vectors we can add without going over the
* cap of CPUs. For systems with a small number of CPUs this will be
* zero.
@@ -12102,7 +12131,7 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
if (!(pf->flags & I40E_FLAG_RSS_ENABLED))
return 0;

- queue_count = min_t(int, queue_count, num_online_cpus());
+ queue_count = min_t(int, queue_count, i40e_pseudo_num_online_cpus(pf));
new_rss_size = min_t(int, queue_count, pf->rss_size_max);

if (queue_count != vsi->num_queue_pairs) {
@@ -12348,13 +12377,13 @@ static int i40e_sw_init(struct i40e_pf *pf)
pf->hw.func_caps.num_tx_qp);

/* find the next higher power-of-2 of num cpus */
- pow = roundup_pow_of_two(num_online_cpus());
+ pow = roundup_pow_of_two(i40e_pseudo_num_online_cpus(pf));
pf->rss_size_max = min_t(int, pf->rss_size_max, pow);

if (pf->hw.func_caps.rss) {
pf->flags |= I40E_FLAG_RSS_ENABLED;
pf->alloc_rss_size = min_t(int, pf->rss_size_max,
- num_online_cpus());
+ i40e_pseudo_num_online_cpus(pf));
}

/* MFP mode enabled */
@@ -12446,16 +12475,16 @@ static int i40e_sw_init(struct i40e_pf *pf)
pf->hw.aq.fw_maj_ver >= 6)
pf->hw_features |= I40E_HW_PTP_L4_CAPABLE;

- if (pf->hw.func_caps.vmdq && num_online_cpus() != 1) {
+ if (pf->hw.func_caps.vmdq && i40e_pseudo_num_online_cpus(pf) != 1) {
pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
pf->flags |= I40E_FLAG_VMDQ_ENABLED;
pf->num_vmdq_qps = i40e_default_queues_per_vmdq(pf);
}

- if (pf->hw.func_caps.iwarp && num_online_cpus() != 1) {
+ if (pf->hw.func_caps.iwarp && i40e_pseudo_num_online_cpus(pf) != 1) {
pf->flags |= I40E_FLAG_IWARP_ENABLED;
/* IWARP needs one extra vector for CQP just like MISC.*/
- pf->num_iwarp_msix = (int)num_online_cpus() + 1;
+ pf->num_iwarp_msix = (int)i40e_pseudo_num_online_cpus(pf) + 1;
}
/* Stopping FW LLDP engine is supported on XL710 and X722
* starting from FW versions determined in i40e_init_adminq.
@@ -14805,7 +14834,7 @@ static void i40e_determine_queue_usage(struct i40e_pf *pf)
}

/* limit lan qps to the smaller of qps, cpus or msix */
- q_max = max_t(int, pf->rss_size_max, num_online_cpus());
+ q_max = max_t(int, pf->rss_size_max, i40e_pseudo_num_online_cpus(pf));
q_max = min_t(int, q_max, pf->hw.func_caps.num_tx_qp);
q_max = min_t(int, q_max, pf->hw.func_caps.num_msix_vectors);
pf->num_lan_qps = q_max;
--
1.8.3.1


2021-07-09 07:14:49

by Jason Xing

Subject: Re: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

Oh, one more thing I missed in the last email: all the failures happen
on the combination of X722 10GbE and 1GbE ports. In that case the
driver fetches @num_tx_qp = 384, whereas it is 768 without the X722
1GbE ports.
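
With the helper introduced in the patch, @num_tx_qp = 384 gives
limit = 384 / 3 = 128, so the pseudo number of CPUs lands at 64 in the
failing configuration.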

Here is how I got that information:
$ lspci | grep -i ether
5a:00.0 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 10GbE SFP+ (rev 09)
5a:00.1 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 10GbE SFP+ (rev 09)
5a:00.2 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 1GbE (rev 09)
5a:00.3 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 1GbE (rev 09)

I know it's really stupid to control the number of online CPUs, but
finding a good way to limit only @alloc_queue_pairs is not easy. So
could someone point out a better way to fix this issue that takes care
of relatively old NICs as the number of CPUs keeps increasing?

Thanks,
Jason

On Wed, Jul 7, 2021 at 5:41 PM <[email protected]> wrote:
> [snip]

2021-07-14 21:46:45

by Tony Nguyen

Subject: Re: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

On Fri, 2021-07-09 at 15:13 +0800, Jason Xing wrote:
> Oh, one more thing I missed in the last email: all the failures
> happen on the combination of X722 10GbE and 1GbE ports. In that case
> the driver fetches @num_tx_qp = 384, whereas it is 768 without the
> X722 1GbE ports.
>
> Here is how I got that information:
> $ lspci | grep -i ether
> 5a:00.0 Ethernet controller: Intel Corporation Ethernet Connection
> X722 for 10GbE SFP+ (rev 09)
> 5a:00.1 Ethernet controller: Intel Corporation Ethernet Connection
> X722 for 10GbE SFP+ (rev 09)
> 5a:00.2 Ethernet controller: Intel Corporation Ethernet Connection
> X722 for 1GbE (rev 09)
> 5a:00.3 Ethernet controller: Intel Corporation Ethernet Connection
> X722 for 1GbE (rev 09)
>
> I know it's really stupid to control the number of online CPUs, but
> finding a good way to limit only @alloc_queue_pairs is not easy. So
> could someone point out a better way to fix this issue that takes
> care of relatively old NICs as the number of CPUs keeps increasing?

Hi Jason,

Sorry for the slow response; I was trying to talk to the i40e team
about this.

I agree, limiting the number of online CPUs doesn't seem like a
solution we want to pursue. The team is working on a patch that deals
with the same, or a similar, issue; it reworks the allocation of the
queue pile. I'll make sure they add you on the patch when it is sent
so that you can try it and see if it resolves your issue.

Thanks,
Tony

2021-07-15 02:38:35

by Jason Xing

Subject: Re: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

On Thu, Jul 15, 2021 at 4:52 AM Nguyen, Anthony L
<[email protected]> wrote:
>
> On Fri, 2021-07-09 at 15:13 +0800, Jason Xing wrote:
> > Oh, one more thing I missed in the last email: all the failures
> > happen on the combination of X722 10GbE and 1GbE ports. In that
> > case the driver fetches @num_tx_qp = 384, whereas it is 768
> > without the X722 1GbE ports.
> >
> > Here is how I got that information:
> > $ lspci | grep -i ether
> > 5a:00.0 Ethernet controller: Intel Corporation Ethernet Connection
> > X722 for 10GbE SFP+ (rev 09)
> > 5a:00.1 Ethernet controller: Intel Corporation Ethernet Connection
> > X722 for 10GbE SFP+ (rev 09)
> > 5a:00.2 Ethernet controller: Intel Corporation Ethernet Connection
> > X722 for 1GbE (rev 09)
> > 5a:00.3 Ethernet controller: Intel Corporation Ethernet Connection
> > X722 for 1GbE (rev 09)
> >
> > I know it's really stupid to control the number of online CPUs,
> > but finding a good way to limit only @alloc_queue_pairs is not
> > easy. So could someone point out a better way to fix this issue
> > that takes care of relatively old NICs as the number of CPUs keeps
> > increasing?
>
> Hi Jason,
>
> Sorry for the slow response; I was trying to talk to the i40e team
> about this.

Thanks for your kind help. This issue really does have a big impact on
thousands of machines.

>
> I agree, limiting the number of online CPUs doesn't seem like a
> solution we want to pursue. The team is working on a patch that deals

As I said above, if the machine is equipped with only the 10GbE NIC,
the maximum number of online CPUs would be 256, and so on. For now,
the allocation depends on the number of CPUs.

> with the same, or a similar, issue; it reworks the allocation of the
> queue pile. I'll make sure they add you on the patch when it is sent

It's not easy to cover every case, but I still believe that is the
only proper way to fix the issue. Looking forward to your patch :)

> so that you can try it and see if it resolves your issue.
>

Yeah, sure, I will double-check and then see if it's really fixed.

Thanks,
Jason

> Thanks,
> Tony
>

2021-07-26 02:44:10

by Jason Xing

Subject: Re: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

Hi Anthony,

Is there any progress or any idea on the final patch? Alternatively,
could you point out a more detailed method for reworking the
calculation of the queue pile?

I think this is critical and affects all the older NICs, which means
thousands of machines would crash if an xdp-drv program is loaded.

Thanks,
Jason

On Thu, Jul 15, 2021 at 10:33 AM Jason Xing <[email protected]> wrote:
> [snip]