Hi,
I'm not subscribed to the list, so please cc me on replies.
I have a CentOS 7 Linux system with 48 logical CPUs and a number of
Intel NICs running the i40e driver. It was booted with
irqaffinity=0-1,24-25 in the kernel boot args, resulting in
/proc/irq/default_smp_affinity showing "0000,03000003". CPUs 2-11 are
set as "isolated" in the kernel boot args. The irqbalance daemon is not
running.
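As a quick sanity check on the mask quoted above, the following is a minimal sketch (not part of the original report) that decodes a /proc/irq-style hex mask into a CPU list. The helper names `mask_to_cpus` and `cpus_to_ranges` are made up for illustration. It assumes the usual layout of comma-separated hex words with the most significant word first.

```python
def mask_to_cpus(mask):
    """Decode a /proc/irq-style hex CPU mask into a sorted list of CPU numbers."""
    # Assumption: the mask is printed as comma-separated hex words, most
    # significant word first, so dropping the commas yields one hex number.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def cpus_to_ranges(cpus):
    """Format CPU numbers the way smp_affinity_list does, e.g. '0-1,24-25'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(cpus_to_ranges(mask_to_cpus("0000,03000003")))  # prints 0-1,24-25
```

For the value in this report, the decoded list matches the irqaffinity=0-1,24-25 boot argument.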
The problem I'm seeing is that /proc/interrupts shows iavf interrupts
(associated with physical devices running the i40e driver) on CPUs
outside the expected affinity. For example, here are some iavf interrupts
on CPU 4, where I would not expect to see any interrupts given that "cat
/proc/irq/<NUM>/smp_affinity_list" reports "0-1,24-25" for all these
interrupts.
cat /proc/interrupts | grep -e CPU -e 941: -e 942: -e 943: -e 944: -e 945: -e 961: -e 962: -e 963: -e 964: -e 965:
      CPU0  CPU1  CPU2  CPU3    CPU4  CPU5
941:     0     0     0     0   28490     0  IR-PCI-MSI-edge  iavf-0000:b5:03.6:mbx
942:     0     0     0     0  333832     0  IR-PCI-MSI-edge  iavf-net1-TxRx-0
943:     0     0     0     0  300842     0  IR-PCI-MSI-edge  iavf-net1-TxRx-1
944:     0     0     0     0  333845     0  IR-PCI-MSI-edge  iavf-net1-TxRx-2
945:     0     0     0     0  333822     0  IR-PCI-MSI-edge  iavf-net1-TxRx-3
961:     0     0     0     0   28492     0  IR-PCI-MSI-edge  iavf-0000:b5:02.7:mbx
962:     0     0     0     0  435608     0  IR-PCI-MSI-edge  iavf-net1-TxRx-0
963:     0     0     0     0  394832     0  IR-PCI-MSI-edge  iavf-net1-TxRx-1
964:     0     0     0     0  398414     0  IR-PCI-MSI-edge  iavf-net1-TxRx-2
965:     0     0     0     0  192847     0  IR-PCI-MSI-edge  iavf-net1-TxRx-3
There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at
roughly 1 per second without any traffic, while the interrupt rate on
the "iavf-net1-TxRx-<X>" interrupts seemed to be related to traffic.
Is this expected? It seems like the IRQ subsystem is not respecting the
configured SMP affinity for the interrupt in question. I've also seen
the same behaviour with igb interrupts.
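To make the observation above checkable across many IRQs, here is a rough sketch (not from the original report) that scans /proc/interrupts-style text and lists numbered IRQs with non-zero counts on CPUs outside an allowed set. The function name `irqs_outside` and the two-row sample are illustrative only. The counters are cumulative since boot, so a non-zero count may predate an affinity change; comparing two samples taken some time apart would be more conclusive.

```python
def irqs_outside(interrupts_text, allowed_cpus):
    """Return {irq: {cpu: count}} for counts seen on CPUs not in allowed_cpus."""
    lines = interrupts_text.strip().splitlines()
    # Header row looks like "CPU0 CPU1 ..."; strip the "CPU" prefix.
    cpus = [int(name[3:]) for name in lines[0].split()]
    stray = {}
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue  # skip non-numbered rows such as NMI or LOC
        counts = fields[1:1 + len(cpus)]
        hits = {
            cpu: int(n)
            for cpu, n in zip(cpus, counts)
            if int(n) and cpu not in allowed_cpus
        }
        if hits:
            stray[irq] = hits
    return stray


# Two rows taken from the output quoted in this message (unwrapped).
SAMPLE = """
 CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
941: 0 0 0 0 28490 0 IR-PCI-MSI-edge iavf-0000:b5:03.6:mbx
942: 0 0 0 0 333832 0 IR-PCI-MSI-edge iavf-net1-TxRx-0
"""

print(irqs_outside(SAMPLE, allowed_cpus={0, 1, 24, 25}))
```

On a live system the text would come from reading /proc/interrupts instead of the embedded sample.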
Anyone have any ideas?
Thanks,
Chris
On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
> I have a CentOS 7 linux system with 48 logical CPUs and a number of
Kernel version?
> Intel NICs running the i40e driver. It was booted with
> irqaffinity=0-1,24-25 in the kernel boot args, resulting in
> /proc/irq/default_smp_affinity showing "0000,03000003". CPUs 2-11 are
> set as "isolated" in the kernel boot args. The irqbalance daemon is not
> running.
>
> The problem I'm seeing is that /proc/interrupts shows iavf interrupts
> (associated with physical devices running the i40e driver) on other CPUs
> than the expected affinity. For example, here are some iavf interrupts
> on CPU 4 where I would not expect to see any interrupts given that "cat
> /proc/irq/<NUM>/smp_affinity_list" reports "0-1,24-25" for all these
> interrupts. (Sorry for the line wrapping.)
>
> cat /proc/interrupts | grep -e CPU -e 941: -e 942: -e 943: -e 944: -e
> 945: -e 961: -e 962: -e 963: -e 964: -e 965:
>
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
> 941: 0 0 0 0 28490 0
> IR-PCI-MSI-edge iavf-0000:b5:03.6:mbx
> 942: 0 0 0 0 333832 0
> IR-PCI-MSI-edge iavf-net1-TxRx-0
> 943: 0 0 0 0 300842 0
> IR-PCI-MSI-edge iavf-net1-TxRx-1
> 944: 0 0 0 0 333845 0
> IR-PCI-MSI-edge iavf-net1-TxRx-2
> 945: 0 0 0 0 333822 0
> IR-PCI-MSI-edge iavf-net1-TxRx-3
> 961: 0 0 0 0 28492 0
> IR-PCI-MSI-edge iavf-0000:b5:02.7:mbx
> 962: 0 0 0 0 435608 0
> IR-PCI-MSI-edge iavf-net1-TxRx-0
> 963: 0 0 0 0 394832 0
> IR-PCI-MSI-edge iavf-net1-TxRx-1
> 964: 0 0 0 0 398414 0
> IR-PCI-MSI-edge iavf-net1-TxRx-2
> 965: 0 0 0 0 192847 0
> IR-PCI-MSI-edge iavf-net1-TxRx-3
>
> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at
> roughly 1 per second without any traffic, while the interrupt rate on
> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>
> Is this expected? It seems like the IRQ subsystem is not respecting the
> configured SMP affinity for the interrupt in question. I've also seen
> the same behaviour with igb interrupts.
No, it's not expected. Do you see the same behaviour with a recent
mainline kernel, i.e. 5.10 or 5.11?
Thanks,
tglx
On 3/28/21 2:45 PM, Thomas Gleixner wrote:
> On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
>> I have a CentOS 7 linux system with 48 logical CPUs and a number of
<snip>
>> IR-PCI-MSI-edge iavf-net1-TxRx-3
>> 961: 0 0 0 0 28492 0
>> IR-PCI-MSI-edge iavf-0000:b5:02.7:mbx
>> 962: 0 0 0 0 435608 0
>> IR-PCI-MSI-edge iavf-net1-TxRx-0
>> 963: 0 0 0 0 394832 0
>> IR-PCI-MSI-edge iavf-net1-TxRx-1
>> 964: 0 0 0 0 398414 0
>> IR-PCI-MSI-edge iavf-net1-TxRx-2
>> 965: 0 0 0 0 192847 0
>> IR-PCI-MSI-edge iavf-net1-TxRx-3
>>
>> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at
>> roughly 1 per second without any traffic, while the interrupt rate on
>> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>>
>> Is this expected? It seems like the IRQ subsystem is not respecting the
>> configured SMP affinity for the interrupt in question. I've also seen
>> the same behaviour with igb interrupts.
> No it's not expected. Do you see the same behaviour with a recent
> mainline kernel, i.e. 5.10 or 5.11?
>
>
Jesse pointed me to this thread; apologies that it took me a while
to respond here.
I agree it will be interesting to see with which kernel version Chris is
reproducing the issue.
Initially, I thought that this issue was the same as the one that we have
been discussing in another thread [1].
However, in that case, the smp affinity mask itself is incorrect and doesn't
follow the default smp affinity mask (with irqbalance disabled).
[1] https://lore.kernel.org/lkml/[email protected]/
--
Thanks
Nitesh
On Wed, Apr 21 2021 at 09:31, Nitesh Narayan Lal wrote:
> On 3/28/21 2:45 PM, Thomas Gleixner wrote:
>> On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
>>> I have a CentOS 7 linux system with 48 logical CPUs and a number of
>
> <snip>
>
>>> IR-PCI-MSI-edge iavf-net1-TxRx-3
>>> 961: 0 0 0 0 28492 0
>>> IR-PCI-MSI-edge iavf-0000:b5:02.7:mbx
>>> 962: 0 0 0 0 435608 0
>>> IR-PCI-MSI-edge iavf-net1-TxRx-0
>>> 963: 0 0 0 0 394832 0
>>> IR-PCI-MSI-edge iavf-net1-TxRx-1
>>> 964: 0 0 0 0 398414 0
>>> IR-PCI-MSI-edge iavf-net1-TxRx-2
>>> 965: 0 0 0 0 192847 0
>>> IR-PCI-MSI-edge iavf-net1-TxRx-3
>>>
>>> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at
>>> roughly 1 per second without any traffic, while the interrupt rate on
>>> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>>>
>>> Is this expected? It seems like the IRQ subsystem is not respecting the
>>> configured SMP affinity for the interrupt in question. I've also seen
>>> the same behaviour with igb interrupts.
>> No it's not expected. Do you see the same behaviour with a recent
>> mainline kernel, i.e. 5.10 or 5.11?
>>
>>
> Jesse pointed me to this thread and apologies that it took a while for me
> to respond here.
>
> I agree it will be interesting to see with which kernel version Chris is
> reproducing the issue.
And the output of
/proc/irq/$NUMBER/smp_affinity_list
/proc/irq/$NUMBER/effective_affinity_list
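One way to collect the two files requested above for several IRQs at once is sketched below. This is an illustration, not something posted in the thread: the function name `read_affinity` and its `proc_irq` parameter are made up, and the IRQ numbers are the ones from the original report. The effective_affinity_list file may not exist on older kernels, so a missing file is reported as None rather than treated as an error.

```python
from pathlib import Path


def read_affinity(irq, proc_irq="/proc/irq"):
    """Read the configured and effective affinity lists for one IRQ."""
    result = {}
    for name in ("smp_affinity_list", "effective_affinity_list"):
        path = Path(proc_irq) / str(irq) / name
        try:
            result[name] = path.read_text().strip()
        except OSError:
            # File or IRQ directory is absent (for example on older kernels).
            result[name] = None
    return result


if __name__ == "__main__":
    # IRQ numbers taken from the /proc/interrupts output in the report.
    for irq in (941, 942, 943, 944, 945, 961, 962, 963, 964, 965):
        print(irq, read_affinity(irq))
```

If the two lists differ for an IRQ, that difference is the detail to include in a reply.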
> Initially, I thought that this issue is the same as the one that we have
> been discussing in another thread [1].
>
> However, in that case, the smp affinity mask itself is incorrect and doesn't
> follow the default smp affinity mask (with irqbalance disabled).
That's the question...
Thanks,
tglx
On 4/22/2021 9:42 AM, Thomas Gleixner wrote:
> On Wed, Apr 21 2021 at 09:31, Nitesh Narayan Lal wrote:
>> I agree it will be interesting to see with which kernel version Chris is
>> reproducing the issue.
>
> And the output of
>
> /proc/irq/$NUMBER/smp_affinity_list
> /proc/irq/$NUMBER/effective_affinity_list
I haven't forgotten about this, but I've had other priorities. Hoping
to get back to it in May sometime.
Chris