2022-10-21 12:27:17

by Valentin Schneider

Subject: [PATCH v5 0/3] sched, net: NUMA-aware CPU spreading interface

Hi folks,

Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut
it).

The proposed interface involved an array of CPUs and a temporary cpumask.
Being my difficult self, what I'm proposing here is an interface that
doesn't require any temporary storage other than some stack variables (at
the cost of one wild macro).

[1]: https://lore.kernel.org/all/[email protected]/

Revisions
=========

v4 -> v5
++++++++
o Rebased onto 6.1-rc1
o Ditched the CPU iterator, moved to a cpumask iterator (Yury)

v3 -> v4
++++++++

o Rebased on top of Yury's bitmap-for-next
o Added Tariq's mlx5e patch
o Made sched_numa_hop_mask() return cpu_online_mask for the NUMA_NO_NODE &&
hops=0 case

v2 -> v3
++++++++

o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury)
o New patches to fix issues raised by running the above
o New patch to use for_each_cpu_andnot() in sched/core.c (Yury)

v1 -> v2
++++++++

o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury)
o Rebased onto v6.0-rc1

Cheers,
Valentin

Tariq Toukan (1):
net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
hints

Valentin Schneider (2):
sched/topology: Introduce sched_numa_hop_mask()
sched/topology: Introduce for_each_numa_hop_mask()

drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++++++++--
include/linux/topology.h                     | 32 ++++++++++++++++++++
kernel/sched/topology.c                      | 31 +++++++++++++++++++
3 files changed, 79 insertions(+), 2 deletions(-)

--
2.31.1


2022-10-21 12:28:02

by Valentin Schneider

Subject: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints

From: Tariq Toukan <[email protected]>

In the IRQ affinity hints, replace the binary NUMA preference (local /
remote) with the improved for_each_numa_hop_mask() API that minds the
actual distances, so that remote NUMA nodes at a short distance are
preferred over farther ones.

This has significant performance implications when using NUMA-aware
allocated memory (see [1] and its derivatives for an example).

[1]
drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));

Performance tests:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+-------------------+-------------------+
| Configuration           | BW (Gbps) | TX side CPU util* | RX side CPU util* |
+-------------------------+-----------+-------------------+-------------------+
| Baseline                |      52.3 |             6.4 % |            17.9 % |
+-------------------------+-----------+-------------------+-------------------+
| Applied on TX side only |      52.6 |             5.2 % |            18.5 % |
+-------------------------+-----------+-------------------+-------------------+
| Applied on RX side only |      94.9 |            11.9 % |            27.2 % |
+-------------------------+-----------+-------------------+-------------------+
| Applied on both sides   |      95.1 |             8.4 % |            27.3 % |
+-------------------------+-----------+-------------------+-------------------+

The RX-side bottleneck is removed and line rate is reached (~1.8x speedup).
~30% less TX CPU utilization per Gbps (8.4% at 95.1 Gbps vs. 6.4% at 52.3 Gbps).

* CPU util on active cores only.
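Both summary figures can be recomputed from the Baseline and "Applied on both sides" rows. The sketch below uses illustrative helper names (not from the patch) and shows that the ~30% TX saving is per Gbps of traffic: absolute TX utilization goes from 6.4% to 8.4%, while bandwidth goes from 52.3 to 95.1 Gbps.

```c
#include <assert.h>

/* Throughput ratio between two runs. */
static double speedup(double bw_before_gbps, double bw_after_gbps)
{
	assert(bw_before_gbps > 0.0);
	return bw_after_gbps / bw_before_gbps;
}

/* CPU utilization normalized by throughput, in percent per Gbps. */
static double util_per_gbps(double util_pct, double bw_gbps)
{
	assert(bw_gbps > 0.0);
	return util_pct / bw_gbps;
}

/* Fractional drop in per-Gbps CPU cost (0.25 means 25% cheaper). */
static double util_reduction(double util_before, double bw_before,
			     double util_after, double bw_after)
{
	return 1.0 - util_per_gbps(util_after, bw_after) /
		     util_per_gbps(util_before, bw_before);
}
```

With the table values, speedup(52.3, 95.1) is about 1.82 and util_reduction(6.4, 52.3, 8.4, 95.1) is about 0.28.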

Setup details (similar on both sides):

NIC: ConnectX-6 Dx dual port, 100 Gbps each.
Only a single port was used in the tests.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 16
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7763 64-Core Processor
Stepping: 1
CPU MHz: 2594.804
BogoMIPS: 4890.73
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-7,128-135
NUMA node1 CPU(s): 8-15,136-143
NUMA node2 CPU(s): 16-23,144-151
NUMA node3 CPU(s): 24-31,152-159
NUMA node4 CPU(s): 32-39,160-167
NUMA node5 CPU(s): 40-47,168-175
NUMA node6 CPU(s): 48-55,176-183
NUMA node7 CPU(s): 56-63,184-191
NUMA node8 CPU(s): 64-71,192-199
NUMA node9 CPU(s): 72-79,200-207
NUMA node10 CPU(s): 80-87,208-215
NUMA node11 CPU(s): 88-95,216-223
NUMA node12 CPU(s): 96-103,224-231
NUMA node13 CPU(s): 104-111,232-239
NUMA node14 CPU(s): 112-119,240-247
NUMA node15 CPU(s): 120-127,248-255
..

$ numactl -H
..
node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
  3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
  4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
  5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
  6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
  7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
  8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
  9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
 10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
 11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
 12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
 13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
 14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
 15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10

$ cat /sys/class/net/ens5f0/device/numa_node
14

Affinity hints (127 IRQs):
Before:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000

After:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000
363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000
364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000
365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000
366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000
367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000
368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000
369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000
370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000
371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000
372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000
373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000
374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000
375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000
376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000
377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000
378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000
379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000
380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000
381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000
382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000
383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000
428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000
429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000
430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000
431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000
432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000
433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000
434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000
435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000
436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000
437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000
438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000
439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000
440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000
441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000
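Each hint above is a comma-separated list of 32-bit hex words, most significant word first, so the rightmost word covers CPUs 0-31. The small decoder below is an illustrative helper (not a kernel or procfs API) that turns a single-CPU mask into a CPU number. For example, IRQ 331 maps to CPU 112 (node 14) in both lists, while IRQ 347 maps to CPU 0 (node 0, distance 32) before the change and to CPU 96 (node 12, distance 11) after it.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_WORDS 64

/*
 * Return the lowest CPU set in a mask such as "00000000,00010000", where
 * each comma-separated field is a 32-bit hex word and the last field holds
 * CPUs 0-31. Returns -1 for an empty or malformed mask.
 */
static int mask_first_cpu(const char *s)
{
	unsigned long words[MAX_WORDS];
	int n = 0;

	assert(s != NULL);
	while (*s != '\0') {
		char *end;

		if (n == MAX_WORDS)
			return -1;              /* more words than we track */
		words[n++] = strtoul(s, &end, 16);
		if (end == s)
			return -1;              /* field is not hex */
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;              /* unexpected separator */
		s = end;
	}

	/* Walk from the least significant (rightmost) word upwards. */
	for (int i = n - 1; i >= 0; i--)
		for (int bit = 0; bit < 32; bit++)
			if (words[i] & (1UL << bit))
				return (n - 1 - i) * 32 + bit;
	return -1;
}
```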

Signed-off-by: Tariq Toukan <[email protected]>
[Tweaked API use]
Signed-off-by: Valentin Schneider <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a0242dc15741c..7acbeb3d51846 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
+	const struct cpumask *prev = cpu_none_mask;
+	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
+	int cpu;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 		ret = -ENOMEM;
 		goto free_irqs;
 	}
-	for (i = 0; i < ncomp_eqs; i++)
-		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
+
+	i = 0;
+	rcu_read_lock();
+	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
+		for_each_cpu_andnot(cpu, mask, prev) {
+			cpus[i] = cpu;
+			if (++i == ncomp_eqs)
+				goto spread_done;
+		}
+		prev = mask;
+	}
+spread_done:
+	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
 	if (ret < 0)
--
2.31.1

2022-10-21 12:39:49

by Valentin Schneider

Subject: [PATCH v5 1/3] sched/topology: Introduce sched_numa_hop_mask()

Tariq has pointed out that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness - cpumask_local_spread() only knows
about the local node and everything outside is in the same bucket.

sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
of CPUs reachable within a given distance budget), so introduce
sched_numa_hop_mask() to export those cpumasks.
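To illustrate what "CPUs reachable within a given distance budget" means, here is a userspace model in plain C. The toy topology, the uint64_t masks and all helper names are assumptions for illustration. It is a simplification of how we understand sched_init_numa() to lay out sched_domains_numa_masks[hops][node] (one level per distinct distance in the table, each mask being the union of the nodes within that level's distance), and it ignores offline nodes, RCU and locking.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES      4
#define CPUS_PER_NODE 2                      /* node n owns CPUs 2n and 2n+1 */
#define MAX_LEVELS    (NR_NODES * NR_NODES)  /* worst case: all distances differ */

/* Hypothetical two-socket toy machine, two nodes per socket. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 20 },
	{ 12, 10, 20, 20 },
	{ 20, 20, 10, 12 },
	{ 20, 20, 12, 10 },
};

static int levels[MAX_LEVELS];               /* distinct distances, ascending */
static int nr_levels;
static uint64_t masks[MAX_LEVELS][NR_NODES]; /* masks[hops][node] */

static uint64_t node_cpus(int node)
{
	return ((UINT64_C(1) << CPUS_PER_NODE) - 1) << (node * CPUS_PER_NODE);
}

/* Collect the distinct distances of the whole table in ascending order. */
static void init_levels(void)
{
	nr_levels = 0;
	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			int d = dist[i][j], k = 0;

			while (k < nr_levels && levels[k] < d)
				k++;
			if (k < nr_levels && levels[k] == d)
				continue;               /* already known */
			assert(nr_levels < MAX_LEVELS);
			for (int m = nr_levels; m > k; m--)
				levels[m] = levels[m - 1];
			levels[k] = d;
			nr_levels++;
		}
	}
}

/* masks[h][n] = CPUs of every node within levels[h] of node n. */
static void init_masks(void)
{
	for (int h = 0; h < nr_levels; h++) {
		for (int n = 0; n < NR_NODES; n++) {
			masks[h][n] = 0;
			for (int m = 0; m < NR_NODES; m++)
				if (dist[n][m] <= levels[h])
					masks[h][n] |= node_cpus(m);
		}
	}
}

/*
 * Toy analogue of sched_numa_hop_mask(): returns 0 where the kernel would
 * return ERR_PTR(-EINVAL). With unsigned parameters, NUMA_NO_NODE (-1)
 * wraps to a huge value and is rejected by the bounds check.
 */
static uint64_t hop_mask(unsigned int node, unsigned int hops)
{
	if (node >= NR_NODES || hops >= (unsigned int)nr_levels)
		return 0;
	return masks[hops][node];
}
```

For a given node, each level's mask contains the previous level's mask, which is what lets a caller walk outwards one hop at a time.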

Link: http://lore.kernel.org/r/[email protected]
Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/topology.h | 12 ++++++++++++
kernel/sched/topology.c  | 31 +++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4564faafd0e12..3e91ae6d0ad58 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,5 +245,17 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifdef CONFIG_NUMA
+extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
+#else
+static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
+{
+	if (node == NUMA_NO_NODE && !hops)
+		return cpu_online_mask;
+
+	return ERR_PTR(-EOPNOTSUPP);
+}
+#endif /* CONFIG_NUMA */
+
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54ea..e3cb8cc375204 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2067,6 +2067,37 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 	return found;
 }
 
+/**
+ * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
+ *                         @node
+ * @node: The node to count hops from.
+ * @hops: Include CPUs up to that many hops away. 0 means local node.
+ *
+ * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
+ * @node, an error value otherwise.
+ *
+ * Requires rcu_lock to be held. Returned cpumask is only valid within that
+ * read-side section, copy it if required beyond that.
+ *
+ * Note that not all hops are equal in distance; see sched_init_numa() for how
+ * distances and masks are handled.
+ * Also note that this is a reflection of sched_domains_numa_masks, which may change
+ * during the lifetime of the system (offline nodes are taken out of the masks).
+ */
+const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
+{
+	struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
+
+	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
+		return ERR_PTR(-EINVAL);
+
+	if (!masks)
+		return ERR_PTR(-EBUSY);
+
+	return masks[hops][node];
+}
+EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
+
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
--
2.31.1

2022-10-21 12:40:00

by Valentin Schneider

Subject: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()

The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
reachable within a given distance budget; wrap the logic for iterating over
all (distance, mask) values inside an iterator macro.
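The iterator is meant to be paired with for_each_cpu_andnot(), as the mlx5 patch in this series does: because each hop mask contains the previous one, masking out prev leaves only the CPUs first reached at the current hop. Below is a userspace mirror of that pattern. The toy topology, the uint64_t masks, toy_hop_mask() and spread_cpus() are hypothetical stand-ins rather than kernel APIs, and RCU is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define NR_NODES  4
#define NR_LEVELS 3

/* Hypothetical toy: node n owns CPUs 2n and 2n+1; nodes 0-1 and 2-3 pair up. */
static const uint64_t online_mask = 0xFF;
static const uint64_t toy_masks[NR_LEVELS][NR_NODES] = {
	{ 0x03, 0x0C, 0x30, 0xC0 },   /* hops = 0: local node            */
	{ 0x0F, 0x0F, 0xF0, 0xF0 },   /* hops = 1: plus the sibling node */
	{ 0xFF, 0xFF, 0xFF, 0xFF },   /* hops = 2: everything            */
};

/* Stand-in for sched_numa_hop_mask(); NULL plays the role of ERR_PTR(). */
static const uint64_t *toy_hop_mask(unsigned int node, unsigned int hops)
{
	if (node >= NR_NODES || hops >= NR_LEVELS)
		return NULL;
	return &toy_masks[hops][node];
}

/* Same shape as the for_each_numa_hop_mask() macro in this patch. */
#define for_each_toy_hop_mask(mask, node)                              \
	for (unsigned int hops_ = 0;                                   \
	     mask = hops_ > 0 ? mask :                                 \
		    (node) == NUMA_NO_NODE ? &online_mask :            \
					     toy_hop_mask((node), 0),  \
	     mask != NULL;                                             \
	     hops_++, mask = toy_hop_mask((node), hops_))

/*
 * Mirror of the mlx5 loop: fill cpus[] with up to @n CPUs, nearest hop
 * first. "fresh" plays the role of for_each_cpu_andnot(cpu, mask, prev).
 * Returns how many entries were filled.
 */
static int spread_cpus(int node, int *cpus, int n)
{
	const uint64_t *mask;
	uint64_t prev = 0;                    /* cpu_none_mask */
	int i = 0;

	if (n <= 0)
		return 0;
	assert(cpus != NULL);
	for_each_toy_hop_mask(mask, node) {
		uint64_t fresh = *mask & ~prev;   /* CPUs new at this hop */

		for (int cpu = 0; cpu < 64; cpu++) {
			if (!(fresh & (UINT64_C(1) << cpu)))
				continue;
			cpus[i] = cpu;
			if (++i == n)
				return i;
		}
		prev = *mask;
	}
	return i;
}
```

For node 2 the walk yields CPUs 4 and 5 (local), then 6 and 7 (sibling node), then 0-3. For NUMA_NO_NODE it yields the online CPUs in order and then stops, because hop 1 for an out-of-range node is rejected.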

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/topology.h | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 3e91ae6d0ad58..8185e12ec1ccc 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -246,16 +246,36 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 }
 
 #ifdef CONFIG_NUMA
-extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
+extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
-static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
+static inline const struct cpumask *
+sched_numa_hop_mask(unsigned int node, unsigned int hops)
 {
-	if (node == NUMA_NO_NODE && !hops)
-		return cpu_online_mask;
-
 	return ERR_PTR(-EOPNOTSUPP);
 }
 #endif /* CONFIG_NUMA */
 
+/**
+ * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
+ *                          from a given node.
+ * @mask: the iteration variable.
+ * @node: the NUMA node to start the search from.
+ *
+ * Requires rcu_lock to be held.
+ *
+ * Yields cpu_online_mask for @node == NUMA_NO_NODE.
+ */
+#define for_each_numa_hop_mask(mask, node)                                   \
+	for (unsigned int __hops = 0;                                         \
+	     /*                                                               \
+	      * Unsightly trickery required as we can't both initialize       \
+	      * @mask and declare __hops in for()'s first clause              \
+	      */                                                              \
+	     mask = __hops > 0 ? mask :                                       \
+		    node == NUMA_NO_NODE ?                                    \
+		    cpu_online_mask : sched_numa_hop_mask(node, 0),           \
+	     !IS_ERR_OR_NULL(mask);                                           \
+	     __hops++,                                                        \
+	     mask = sched_numa_hop_mask(node, __hops))
 
 #endif /* _LINUX_TOPOLOGY_H */
--
2.31.1

2022-10-21 13:32:39

by Andy Shevchenko

Subject: Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()

On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
> reachable within a given distance budget, wrap the logic for iterating over
> all (distance, mask) values inside an iterator macro.

...

>  #ifdef CONFIG_NUMA
> -extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
> +extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
>  #else
> -static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +static inline const struct cpumask *
> +sched_numa_hop_mask(unsigned int node, unsigned int hops)
>  {
> -	if (node == NUMA_NO_NODE && !hops)
> -		return cpu_online_mask;
> -
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
>  #endif /* CONFIG_NUMA */

I didn't get how the above two changes are related to the 3rd one which
introduces a for_each type of macro.

If you need to change int --> unsigned int, perhaps it can be done in a separate
patch.

The change inside inliner I dunno about. Not an expert.

...

> +#define for_each_numa_hop_mask(mask, node) \
> + for (unsigned int __hops = 0; \
> + /* \
> + * Unsightly trickery required as we can't both initialize \
> + * @mask and declare __hops in for()'s first clause \
> + */ \
> + mask = __hops > 0 ? mask : \
> + node == NUMA_NO_NODE ? \
> + cpu_online_mask : sched_numa_hop_mask(node, 0), \
> + !IS_ERR_OR_NULL(mask); \

> + __hops++, \
> + mask = sched_numa_hop_mask(node, __hops))

This can be unified with the conditional; see for_each_gpio_desc_with_flag()
for an example of how.

--
With Best Regards,
Andy Shevchenko


2022-10-21 14:24:48

by Andy Shevchenko

Subject: Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()

On Fri, Oct 21, 2022 at 04:16:17PM +0300, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:

...

> > +#define for_each_numa_hop_mask(mask, node) \
> > + for (unsigned int __hops = 0; \
> > + /* \
> > + * Unsightly trickery required as we can't both initialize \
> > + * @mask and declare __hops in for()'s first clause \
> > + */ \
> > + mask = __hops > 0 ? mask : \
> > + node == NUMA_NO_NODE ? \
> > + cpu_online_mask : sched_numa_hop_mask(node, 0), \
> > + !IS_ERR_OR_NULL(mask); \
>
> > + __hops++, \
> > + mask = sched_numa_hop_mask(node, __hops))
>
> This can be unified with the conditional; see for_each_gpio_desc_with_flag()
> for an example of how.

Something like

mask = (__hops || node != NUMA_NO_NODE) ? sched_numa_hop_mask(node, __hops) : cpu_online_mask

--
With Best Regards,
Andy Shevchenko


2022-10-21 14:30:24

by Valentin Schneider

Subject: Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()

On 21/10/22 16:16, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
>> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
>> reachable within a given distance budget; wrap the logic for iterating over
>> all (distance, mask) values inside an iterator macro.
>
> ...
>
>> #ifdef CONFIG_NUMA
>> -extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
>> +extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
>> #else
>> -static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
>> +static inline const struct cpumask *
>> +sched_numa_hop_mask(unsigned int node, unsigned int hops)
>> {
>> - if (node == NUMA_NO_NODE && !hops)
>> - return cpu_online_mask;
>> -
>> return ERR_PTR(-EOPNOTSUPP);
>> }
>> #endif /* CONFIG_NUMA */
>
> I didn't get how the above two changes are related to the 3rd one which
> introduces a for_each type of macro.
>
> If you need to change int --> unsigned int, perhaps it can be done in a separate
> patch.
>
> The change inside inliner I dunno about. Not an expert.
>

That's a rebase fail, this should all be in the first patch, my bad.

2022-10-21 14:45:39

by Valentin Schneider

Subject: Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()

On 21/10/22 16:34, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 04:16:17PM +0300, Andy Shevchenko wrote:
>> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
>
> ...
>
>> > +#define for_each_numa_hop_mask(mask, node) \
>> > + for (unsigned int __hops = 0; \
>> > + /* \
>> > + * Unsightly trickery required as we can't both initialize \
>> > + * @mask and declare __hops in for()'s first clause \
>> > + */ \
>> > + mask = __hops > 0 ? mask : \
>> > + node == NUMA_NO_NODE ? \
>> > + cpu_online_mask : sched_numa_hop_mask(node, 0), \
>> > + !IS_ERR_OR_NULL(mask); \
>>
>> > + __hops++, \
>> > + mask = sched_numa_hop_mask(node, __hops))
>>
>> This can be unified with the conditional; see for_each_gpio_desc_with_flag()
>> for an example of how.
>
> Something like
>
> mask = (__hops || node != NUMA_NO_NODE) ? sched_numa_hop_mask(node, __hops) : cpu_online_mask
>

That does simplify things somewhat, thanks!

> --
> With Best Regards,
> Andy Shevchenko

2022-10-24 11:32:39

by Tariq Toukan

Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints



On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <[email protected]>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API that minds the
> actual distances, so that remote NUMA nodes at a short distance are
> preferred over farther ones.
>
> This has significant performance implications when using NUMA-aware
> memory allocation (see [1] and its derivatives, for example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
> int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> | | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline | 52.3 | 6.4 % | 17.9 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides | 95.1 | 8.4 % | 27.3 % |
> +-------------------------+-----------+------------------+------------------+
>
> The RX-side bottleneck is released and line rate is reached (~1.8x speedup).
> ~30% less CPU util on TX.
>
> * CPU util on active cores only.
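Worked from the table above. The raw TX utilization actually rises from 6.4 % to 8.4 %, so the "~30% less" figure presumably refers to utilization per unit of throughput; that normalization is an assumption, not something the commit message states:

```latex
\text{speedup} = \frac{94.9}{52.3} \approx 1.81 \ (\text{RX side only}), \qquad
\frac{95.1}{52.3} \approx 1.82 \ (\text{both sides})

\frac{\text{TX util}}{\text{BW}}:\quad
\frac{6.4}{52.3} \approx 0.122\ \%/\text{Gbps}
\;\longrightarrow\;
\frac{8.4}{95.1} \approx 0.088\ \%/\text{Gbps},
\qquad \frac{0.088}{0.122} \approx 0.72
```

That is roughly 28% less TX CPU per Gbps, consistent with the stated ~30% under this reading.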
>
> Setup details (similar for both sides):
>
> NIC: ConnectX6-DX dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 2
> Core(s) per socket: 64
> Socket(s): 2
> NUMA node(s): 16
> Vendor ID: AuthenticAMD
> CPU family: 25
> Model: 1
> Model name: AMD EPYC 7763 64-Core Processor
> Stepping: 1
> CPU MHz: 2594.804
> BogoMIPS: 4890.73
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-7,128-135
> NUMA node1 CPU(s): 8-15,136-143
> NUMA node2 CPU(s): 16-23,144-151
> NUMA node3 CPU(s): 24-31,152-159
> NUMA node4 CPU(s): 32-39,160-167
> NUMA node5 CPU(s): 40-47,168-175
> NUMA node6 CPU(s): 48-55,176-183
> NUMA node7 CPU(s): 56-63,184-191
> NUMA node8 CPU(s): 64-71,192-199
> NUMA node9 CPU(s): 72-79,200-207
> NUMA node10 CPU(s): 80-87,208-215
> NUMA node11 CPU(s): 88-95,216-223
> NUMA node12 CPU(s): 96-103,224-231
> NUMA node13 CPU(s): 104-111,232-239
> NUMA node14 CPU(s): 112-119,240-247
> NUMA node15 CPU(s): 120-127,248-255
> ..
...
>
> Signed-off-by: Tariq Toukan <[email protected]>
> [Tweaked API use]

Thanks for your modification.
It looks good to me.

Signed-off-by: Tariq Toukan <[email protected]>

> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
> static int comp_irqs_request(struct mlx5_core_dev *dev)
> {
> struct mlx5_eq_table *table = dev->priv.eq_table;
> + const struct cpumask *prev = cpu_none_mask;
> + const struct cpumask *mask;
> int ncomp_eqs = table->num_comp_eqs;
> u16 *cpus;
> int ret;
> + int cpu;
> int i;
>
> ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> ret = -ENOMEM;
> goto free_irqs;
> }
> - for (i = 0; i < ncomp_eqs; i++)
> - cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> + i = 0;
> + rcu_read_lock();
> + for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> + for_each_cpu_andnot(cpu, mask, prev) {
> + cpus[i] = cpu;
> + if (++i == ncomp_eqs)
> + goto spread_done;
> + }
> + prev = mask;
> + }
> +spread_done:
> + rcu_read_unlock();
> ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
> kfree(cpus);
> if (ret < 0)

2022-10-25 00:43:18

by Yury Norov

Subject: Re: [PATCH v5 1/3] sched/topology: Introduce sched_numa_hop_mask()

On Fri, Oct 21, 2022 at 01:19:25PM +0100, Valentin Schneider wrote:
> Tariq has pointed out that drivers allocating IRQ vectors would benefit
> from having smarter NUMA-awareness - cpumask_local_spread() only knows
> about the local node and everything outside is in the same bucket.

Can you keep 1st-person references in a cover letter?

> sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
> of CPUs reachable within a given distance budget); introduce
> sched_numa_hop_mask() to export those cpumasks.
>
> Link: http://lore.kernel.org/r/[email protected]
> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> include/linux/topology.h | 12 ++++++++++++
> kernel/sched/topology.c | 31 +++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4564faafd0e12..3e91ae6d0ad58 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -245,5 +245,17 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
> return cpumask_of_node(cpu_to_node(cpu));
> }
>
> +#ifdef CONFIG_NUMA
> +extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
> +#else
> +static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +{
> + if (node == NUMA_NO_NODE && !hops)
> + return cpu_online_mask;
> +
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +#endif /* CONFIG_NUMA */
> +
>
> #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8739c2a5a54ea..e3cb8cc375204 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2067,6 +2067,37 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
> return found;
> }
>
> +/**
> + * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
> + * @node
> + * @node: The node to count hops from.
> + * @hops: Include CPUs up to that many hops away. 0 means local node.
> + *
> + * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
> + * @node, an error value otherwise.
> + *
> + * Requires rcu_lock to be held. Returned cpumask is only valid within that
> + * read-side section, copy it if required beyond that.
> + *
> + * Note that not all hops are equal in distance; see sched_init_numa() for how
> + * distances and masks are handled.
> + * Also note that this is a reflection of sched_domains_numa_masks, which may change
> + * during the lifetime of the system (offline nodes are taken out of the masks).
> + */
> +const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
> +{
> + struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
> +
> + if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
> + return ERR_PTR(-EINVAL);

Can you do the rcu_dereference() after the sanity checks?

> + if (!masks)
> + return ERR_PTR(-EBUSY);
> +
> + return masks[hops][node];
> +}
> +EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
> +
> #endif /* CONFIG_NUMA */
>
> static int __sdt_alloc(const struct cpumask *cpu_map)
> --
> 2.31.1

2022-10-25 01:08:27

by Yury Norov

Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints

On Fri, Oct 21, 2022 at 01:19:27PM +0100, Valentin Schneider wrote:
> From: Tariq Toukan <[email protected]>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API that minds the
> actual distances, so that remote NUMA nodes at a short distance are
> preferred over farther ones.
>
> This has significant performance implications when using NUMA-aware
> memory allocation (see [1] and its derivatives, for example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
> int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> | | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline | 52.3 | 6.4 % | 17.9 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides | 95.1 | 8.4 % | 27.3 % |
> +-------------------------+-----------+------------------+------------------+
>
> The RX-side bottleneck is released and line rate is reached (~1.8x speedup).
> ~30% less CPU util on TX.
>
> * CPU util on active cores only.
>
> Setup details (similar for both sides):
>
> NIC: ConnectX6-DX dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 2
> Core(s) per socket: 64
> Socket(s): 2
> NUMA node(s): 16
> Vendor ID: AuthenticAMD
> CPU family: 25
> Model: 1
> Model name: AMD EPYC 7763 64-Core Processor
> Stepping: 1
> CPU MHz: 2594.804
> BogoMIPS: 4890.73
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-7,128-135
> NUMA node1 CPU(s): 8-15,136-143
> NUMA node2 CPU(s): 16-23,144-151
> NUMA node3 CPU(s): 24-31,152-159
> NUMA node4 CPU(s): 32-39,160-167
> NUMA node5 CPU(s): 40-47,168-175
> NUMA node6 CPU(s): 48-55,176-183
> NUMA node7 CPU(s): 56-63,184-191
> NUMA node8 CPU(s): 64-71,192-199
> NUMA node9 CPU(s): 72-79,200-207
> NUMA node10 CPU(s): 80-87,208-215
> NUMA node11 CPU(s): 88-95,216-223
> NUMA node12 CPU(s): 96-103,224-231
> NUMA node13 CPU(s): 104-111,232-239
> NUMA node14 CPU(s): 112-119,240-247
> NUMA node15 CPU(s): 120-127,248-255
> ..
>
> $ numactl -H
> ..
> node distances:
> node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 0: 10 11 11 11 12 12 12 12 32 32 32 32 32 32 32 32
> 1: 11 10 11 11 12 12 12 12 32 32 32 32 32 32 32 32
> 2: 11 11 10 11 12 12 12 12 32 32 32 32 32 32 32 32
> 3: 11 11 11 10 12 12 12 12 32 32 32 32 32 32 32 32
> 4: 12 12 12 12 10 11 11 11 32 32 32 32 32 32 32 32
> 5: 12 12 12 12 11 10 11 11 32 32 32 32 32 32 32 32
> 6: 12 12 12 12 11 11 10 11 32 32 32 32 32 32 32 32
> 7: 12 12 12 12 11 11 11 10 32 32 32 32 32 32 32 32
> 8: 32 32 32 32 32 32 32 32 10 11 11 11 12 12 12 12
> 9: 32 32 32 32 32 32 32 32 11 10 11 11 12 12 12 12
> 10: 32 32 32 32 32 32 32 32 11 11 10 11 12 12 12 12
> 11: 32 32 32 32 32 32 32 32 11 11 11 10 12 12 12 12
> 12: 32 32 32 32 32 32 32 32 12 12 12 12 10 11 11 11
> 13: 32 32 32 32 32 32 32 32 12 12 12 12 11 10 11 11
> 14: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 10 11
> 15: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 11 10
>
> $ cat /sys/class/net/ens5f0/device/numa_node
> 14
>
> Affinity hints (127 IRQs):
> Before:
> 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
> 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
> 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
> 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
> 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
> 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
> 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
> 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
> 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
> 348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
> 349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
> 350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
> 351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
> 352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
> 353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
> 354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
> 355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
> 356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
> 357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
> 358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
> 359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
> 360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
> 361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
> 362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
> 363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
> 364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
> 365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
> 366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
> 367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
> 368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
> 369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
> 370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
> 371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
> 372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
> 373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
> 374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
> 375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
> 376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
> 377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
> 378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
> 379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
> 380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
> 381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
> 382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
> 383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
> 384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
> 385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
> 386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
> 387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
> 388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
> 389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
> 390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
> 391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
> 392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
> 393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
> 394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
> 395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
> 396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
> 397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
> 398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
> 399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
> 400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
> 401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
> 402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
> 403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
> 404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
> 405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
> 406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
> 407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
> 408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
> 409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
> 410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
> 411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
> 412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
> 413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
> 414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
> 415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
> 416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
> 417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
> 418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
> 419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
> 420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
> 421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
> 422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
> 423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
> 424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
> 425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
> 426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
> 427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
> 428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
> 429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
> 430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
> 431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
> 432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
> 433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
> 434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
> 435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
> 436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
> 437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
> 438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
> 439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
> 440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
> 441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
> 442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
> 443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
> 444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
> 445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
> 446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
> 447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
> 448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
> 449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
> 450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
> 451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
> 452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
> 453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
> 454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
> 455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
> 456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
> 457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
>
> After:
> 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
> 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
> 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
> 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
> 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
> 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
> 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
> 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
> 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
> 348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
> 349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
> 350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
> 351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
> 352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
> 353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
> 354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
> 355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
> 356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
> 357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
> 358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
> 359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
> 360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
> 361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
> 362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000
> 363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000
> 364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000
> 365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000
> 366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000
> 367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000
> 368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000
> 369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000
> 370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000
> 371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
> 396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
> 397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
> 398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
> 399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
> 400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
> 401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
> 402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
> 403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
> 404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
> 405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
> 406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
> 407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
> 408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
> 409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
> 410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
> 411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
> 412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
> 413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
> 414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
> 415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
> 416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
> 417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
> 418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
> 419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
> 420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
> 421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
> 422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
> 423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
> 424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
> 425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
> 426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
> 427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000
> 428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000
> 429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000
> 430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000
> 431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000
> 432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000
> 433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000
> 434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000
> 435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000
> 436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000
> 437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000
> 438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000
> 439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000
> 440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000
> 441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
> 442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
> 443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
> 444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
> 445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
> 446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
> 447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
> 448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
> 449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
> 450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
> 451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
> 452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
> 453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
> 454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
> 455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
> 456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
> 457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000
>
> Signed-off-by: Tariq Toukan <[email protected]>
> [Tweaked API use]
> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
> static int comp_irqs_request(struct mlx5_core_dev *dev)
> {
> struct mlx5_eq_table *table = dev->priv.eq_table;
> + const struct cpumask *prev = cpu_none_mask;
> + const struct cpumask *mask;
> int ncomp_eqs = table->num_comp_eqs;
> u16 *cpus;
> int ret;
> + int cpu;
> int i;
>
> ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> ret = -ENOMEM;
> goto free_irqs;
> }
> - for (i = 0; i < ncomp_eqs; i++)
> - cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> + i = 0;
> + rcu_read_lock();
> + for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> + for_each_cpu_andnot(cpu, mask, prev) {
> + cpus[i] = cpu;
> + if (++i == ncomp_eqs)
> + goto spread_done;
> + }
> + prev = mask;
> + }

I think it was me who suggested splitting the for_each_numa_hop_cpu()
from v4 into for_each_cpu_andnot() and for_each_numa_hop_mask(), in my
email from Sep 25. So, for this part:

Suggested-by: Yury Norov <[email protected]>

I'm also glad to see that the anonymous structure has disappeared. Nice work.

For the series:

Reviewed-by: Yury Norov <[email protected]>

> +spread_done:
> + rcu_read_unlock();
> ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
> kfree(cpus);
> if (ret < 0)
> --
> 2.31.1
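Below is a minimal userspace sketch of the spreading loop in comp_irqs_request(). It is not kernel code. A plain uint64_t (bit i = CPU i) stands in for struct cpumask, and an array of such words stands in for the masks that for_each_numa_hop_mask() yields. The helper name spread_cpus() is hypothetical. The sketch assumes the yielded masks are cumulative, meaning each hop's mask is a superset of the previous one. That is why the patch starts prev at cpu_none_mask and walks only mask & ~prev with for_each_cpu_andnot(), so each CPU is handed out once, nearest hops first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the comp_irqs_request() loop.
 * hop_masks[h] plays the role of the h-th mask yielded by
 * for_each_numa_hop_mask(): all CPUs within h hops of the device's node.
 * Fills cpus[] with up to ncomp CPU ids, closest hops first, and returns
 * how many entries were filled (fewer than ncomp if CPUs run out).
 */
static int spread_cpus(const uint64_t *hop_masks, int nr_hops,
		       int ncomp, uint16_t *cpus)
{
	uint64_t prev = 0;	/* stands in for cpu_none_mask */
	int i = 0;

	if (ncomp <= 0)
		return 0;

	for (int h = 0; h < nr_hops; h++) {
		uint64_t mask = hop_masks[h];

		/* Sanity check of the cumulative-mask assumption. */
		assert((mask & prev) == prev);

		/* for_each_cpu_andnot(cpu, mask, prev): CPUs new at this hop */
		for (int cpu = 0; cpu < 64; cpu++) {
			if (!((mask & ~prev) & (UINT64_C(1) << cpu)))
				continue;
			cpus[i] = (uint16_t)cpu;
			if (++i == ncomp)
				return i;	/* the patch's "goto spread_done" */
		}
		prev = mask;
	}
	return i;
}
```

For example, with hop masks 0x00F, 0x0FF and 0xFFF (the local node holds CPUs 0-3, one hop adds CPUs 4-7, two hops add CPUs 8-11), asking for 6 vectors yields CPUs 0, 1, 2, 3, 4, 5. The local node is exhausted before any remote CPU is used. This binary local/remote split could not make the same distinction between the 1-hop and 2-hop nodes.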