2015-04-08 08:33:46

by Urban Loesch

Subject: Kernel 3.18.11 hangs when inserting netconsole module on a DELL M620 VRTX Blade

Hi,

I have installed a new DELL VRTX M620 blade with kernel 3.18.11.
After system startup, I tried to enable netconsole for remote kernel logging.

I executed the following command, and the shell I issued it from became unresponsive and hung.

# modprobe netconsole netconsole="@/eth0,[email protected]/00:10:db:fc:60:0c"

The system load rises slowly and CPU #11 spends 100% of its time in softirq. The only remedy is a soft reset,
after which the system behaves normally as long as the netconsole module is not loaded.

# mpstat -P 11
09:23:52 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
09:23:53 11 0,00 0,00 0,00 0,00 0,00 100,00 0,00 0,00 0,00


I found the following error in the kernel log:

...
Apr 8 09:22:27 server2 kernel: [ 216.788670] ------------[ cut here ]------------
Apr 8 09:22:27 server2 kernel: [ 216.788676] WARNING: CPU: 11 PID: 2929 at kernel/softirq.c:147 __local_bh_enable_ip+0x72/0xa0()
Apr 8 09:22:27 server2 kernel: [ 216.788687] CPU: 11 PID: 2929 Comm: modprobe Not tainted 3.18.11-em64t-efigpt #1
Apr 8 09:22:27 server2 kernel: [ 216.788688] Hardware name: Dell Inc. PowerEdge M620/0NJVT7, BIOS 2.4.3 07/02/2014
Apr 8 09:22:27 server2 kernel: [ 216.788690] 0000000000000009 ffff881fcfaa39e8 ffffffff8174434a 0000000019af19af
Apr 8 09:22:27 server2 kernel: [ 216.788690] 0000000000000000 ffff881fcfaa3a28 ffffffff81051fac ffffffff81f4a080
Apr 8 09:22:27 server2 kernel: [ 216.788691] 0000000000000200 ffff881fcf624dd4 ffff881fcf624d58 0000000000000000
Apr 8 09:22:27 server2 kernel: [ 216.788692] Call Trace:
Apr 8 09:22:27 server2 kernel: [ 216.788696] [<ffffffff8174434a>] dump_stack+0x46/0x58
Apr 8 09:22:27 server2 kernel: [ 216.788698] [<ffffffff81051fac>] warn_slowpath_common+0x8c/0xc0
Apr 8 09:22:27 server2 kernel: [ 216.788699] [<ffffffff81051ffa>] warn_slowpath_null+0x1a/0x20
Apr 8 09:22:27 server2 kernel: [ 216.788701] [<ffffffff81055fc2>] __local_bh_enable_ip+0x72/0xa0
Apr 8 09:22:27 server2 kernel: [ 216.788704] [<ffffffff8174a3cb>] _raw_spin_unlock_bh+0x1b/0x20
Apr 8 09:22:27 server2 kernel: [ 216.788716] [<ffffffffa00b8f43>] bnx2x_poll+0x83/0x3e0 [bnx2x]
Apr 8 09:22:27 server2 kernel: [ 216.788720] [<ffffffff81667de0>] netpoll_poll_dev+0x110/0x1b0
Apr 8 09:22:27 server2 kernel: [ 216.788721] [<ffffffff81667fe7>] netpoll_send_skb_on_dev+0x167/0x240
Apr 8 09:22:27 server2 kernel: [ 216.788722] [<ffffffff81668392>] netpoll_send_udp+0x2d2/0x400
Apr 8 09:22:27 server2 kernel: [ 216.788724] [<ffffffffa018685f>] write_msg+0xcf/0x110 [netconsole]
Apr 8 09:22:27 server2 kernel: [ 216.788728] [<ffffffff8109e32b>] call_console_drivers.constprop.27+0x9b/0x100
Apr 8 09:22:27 server2 kernel: [ 216.788730] [<ffffffff8109f39a>] console_unlock+0x3ca/0x450
Apr 8 09:22:27 server2 kernel: [ 216.788731] [<ffffffff810a073a>] register_console+0x29a/0x360
Apr 8 09:22:27 server2 kernel: [ 216.788733] [<ffffffffa0191000>] ? 0xffffffffa0191000
Apr 8 09:22:27 server2 kernel: [ 216.788735] [<ffffffffa01911c5>] init_netconsole+0x1c5/0x1000 [netconsole]
Apr 8 09:22:27 server2 kernel: [ 216.788737] [<ffffffff810002dc>] do_one_initcall+0x8c/0x1c0
Apr 8 09:22:27 server2 kernel: [ 216.788740] [<ffffffff81181042>] ? __vunmap+0xc2/0x110
Apr 8 09:22:27 server2 kernel: [ 216.788743] [<ffffffff810d7f8d>] load_module+0x1dbd/0x25b0
Apr 8 09:22:27 server2 kernel: [ 216.788744] [<ffffffff810d4770>] ? show_initstate+0x60/0x60
Apr 8 09:22:27 server2 kernel: [ 216.788746] [<ffffffff8174c49f>] ? page_fault+0x1f/0x30
Apr 8 09:22:27 server2 kernel: [ 216.788747] [<ffffffff810d881a>] SyS_init_module+0x9a/0xc0
Apr 8 09:22:27 server2 kernel: [ 216.788749] [<ffffffff8174ab72>] system_call_fastpath+0x12/0x17
Apr 8 09:22:27 server2 kernel: [ 216.788750] ---[ end trace 224709e18793096d ]---
...

I installed the latest firmware from DELL for the Broadcom NICs, but the problem persists.
I don't know whether only the netconsole module is affected or something else as well.

Loaded modules:
# lsmod
Module Size Used by
netconsole 23883 1
configfs 30744 2 netconsole
iTCO_wdt 13480 0
iTCO_vendor_support 13718 1 iTCO_wdt
ipmi_si 53458 0
ipmi_msghandler 45284 1 ipmi_si
tpm_tis 18227 0
tpm 35790 1 tpm_tis
sb_edac 26792 0
lpc_ich 21093 0
edac_core 57597 1 sb_edac
dcdbas 14478 0
shpchp 37047 0
pcspkr 12718 0
joydev 17389 0
hed 13247 0
acpi_pad 17942 0
evbug 12672 0
hid_generic 12559 0
usbkbd 12926 0
usbmouse 12789 0
usbhid 46465 0
hid 110129 2 hid_generic,usbhid
ahci 34019 0
libahci 32177 1 ahci
bnx2x 726130 0
ptp 19445 1 bnx2x
megaraid_sas 113654 3
pps_core 14386 1 ptp
mdio 13561 1 bnx2x


The system has 256 GB of RAM:
# free -m
total used free shared buffers cached
Mem: 257918 1834 256084 0 19 44
-/+ buffers/cache: 1770 256148
Swap: 7627 0 7627

It has two six-core CPUs with two threads per core:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Stepping: 4
CPU MHz: 2599.966
BogoMIPS: 5200.39
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23


Kernel 3.10.40 works correctly, but I need a newer kernel, because the Linux driver
for the DELL VRTX Shared PERC 8 has only been available since version 3.15.

Do you have any idea how I can solve this? If you need more information, please let me know.
Please cc me, because I'm not a member of lkml.

Many thanks
Urban Loesch


2015-04-08 10:50:54

by Peter Hurley

Subject: [bnx2x] Re: Kernel 3.18.11 hangs when inserting netconsole module on a DELL M620 VRTX Blade

[ + Ariel Elior for bnx2x driver, netdev ]

On 04/08/2015 04:27 AM, Urban Loesch wrote:
> [...]

2015-04-08 14:42:28

by Yuval Mintz

Subject: RE: [bnx2x] Re: Kernel 3.18.11 hangs when inserting netconsole module on a DELL M620 VRTX Blade

> > I have installed a new DELL VRTX M620 Blade with kernel 3.18.11.
> > After system startup I tried to activate the kernel netconsole with remote logging enabled.
> >
> > I executed the following command and the shell I issued it becomes unresponsive and hangs.
> >
> > # modprobe netconsole netconsole="@/eth0,[email protected]/00:10:db:fc:60:0c"
> >
> > The system load increases slowly and the CPU #11 uses 100% of soft
> > irq. Only a soft reset without loading the netconsole module after startup solves the issue.

I suspect this is a regression introduced by 9a2620c87745
"bnx2x: prevent WARN during driver unload".

bnx2x takes and releases a lock with spin_lock_bh()/spin_unlock_bh() during the NAPI poll,
which must not be done while interrupts are disabled. This breaks interoperability with
netpoll, which disables IRQs before sending the skb on the bnx2x interface.

Can you please try compiling your kernel without CONFIG_NET_RX_BUSY_POLL?
I suspect that might solve your issue.

Regardless, we'll investigate this further and hopefully come up with a fix soon.

Thanks,
Yuval

2015-04-09 12:35:06

by Urban Loesch

Subject: Re: [bnx2x] Re: Kernel 3.18.11 hangs when inserting netconsole module on a DELL M620 VRTX Blade

Hi,

thanks for your help.

On 08.04.2015 at 16:42, Yuval Mintz wrote:
>>> I have installed a new DELL VRTX M620 Blade with kernel 3.18.11.
>>> After system startup I tried to activate the kernel netconsole with remote logging enabled.
>>>
>>> I executed the following command and the shell I issued it becomes unresponsive and hangs.
>>>
>>> # modprobe netconsole netconsole="@/eth0,[email protected]/00:10:db:fc:60:0c"
>>>
>>> The system load increases slowly and the CPU #11 uses 100% of soft
>>> irq. Only a soft reset without loading the netconsole module after startup solves the issue.
>
> I suspect this is a regression introduced by 9a2620c87745
> "bnx2x: prevent WARN during driver unload".
>
> bnx2x takes and releases a lock with spin_lock_bh()/spin_unlock_bh() during the NAPI poll,
> which must not be done while interrupts are disabled. This breaks interoperability with
> netpoll, which disables IRQs before sending the skb on the bnx2x interface.
>
> Can you please try compiling your kernel without CONFIG_NET_RX_BUSY_POLL?
> I suspect that might solve your issue.

I compiled my kernel without CONFIG_NET_RX_BUSY_POLL.
...
# CONFIG_NET_RX_BUSY_POLL is not set
...

I inserted and removed the netconsole module multiple times.
The error did not occur anymore.

Compiling the kernel without CONFIG_NET_RX_BUSY_POLL solves the issue,
at least for me.

Thanks
Urban