2024-04-28 10:55:50

by Erhard Furtner

[permalink] [raw]
Subject: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

Greetings!

With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP. Happens more often than not (6-7 out of 10 times booting):

[...]
netpoll: netconsole: local port 6666
netpoll: netconsole: local IPv4 address 192.168.2.8
netpoll: netconsole: interface 'eth0'
netpoll: netconsole: remote port 6666
netpoll: netconsole: remote IPv4 address 192.168.2.3
netpoll: netconsole: remote ethernet address a8:a1:59:16:4f:ea
netpoll: netconsole: device eth0 not up yet, forcing it
gem 0002:00:0f.0 eth0: Found BCM5421 PHY
gem 0002:00:0f.0 eth0: Link is up at 1000 Mbps, full-duplex
gem 0002:00:0f.0 eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
printk: legacy console [netcon0] enabled
------------[ cut here ]------------
netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (gem_start_xmit+0x0/0x398)
WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
NIP: c0699828 LR: c0699828 CTR: 00000000
REGS: f1021b30 TRAP: 0700 Not tainted (6.9.0-rc5-PMacG4)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24004222 XER: 00000000

GPR00: c0699828 f1021bf0 c1180ba0 00000055 000001d6 f1021a24 f1021a18 00000000
GPR08: 2e078000 00000027 00000027 f1021bf0 28004222 00000000 00000000 00000000
GPR16: 00000000 00009032 c0df6022 c0df6022 00000000 00000001 c1df8f8c c0da8058
GPR24: 00000000 00001032 c1eae680 c1df8f8c 00000041 c22b3640 c10dcb00 c1f69000
NIP [c0699828] netpoll_send_skb+0x1fc/0x20c
LR [c0699828] netpoll_send_skb+0x1fc/0x20c
Call Trace:
[f1021bf0] [c0699828] netpoll_send_skb+0x1fc/0x20c (unreliable)
[f1021c20] [c05b6790] write_msg+0x10c/0x120
[f1021c60] [c0094d14] console_flush_all+0x20c/0x2c0
[f1021ce0] [c0094e34] console_unlock+0x6c/0x108
[f1021d20] [c0095cdc] vprintk_emit+0x1a0/0x1a4
[f1021d50] [c0091ff0] _printk+0x6c/0x74
[f1021da0] [c00956d8] register_console+0x2f0/0x45c
[f1021de0] [c0c47c24] init_netconsole+0xc8/0x4a0
[f1021e50] [c0007bd8] do_one_initcall+0x84/0x294
[f1021ec0] [c0c03dd4] kernel_init_freeable+0x2e0/0x398
[f1021f10] [c0007ec4] kernel_init+0x24/0x138
[f1021f30] [c00152ec] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Code: 39295fdf 89490002 2c0a0000 40a20028 39400001 3c60c09a 389f00e0 99490002 3863d208 813f0008 80a90010 4b9b16c5 <0fe00000> 2c18000e 4081fec8 4bfffea0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 6.9.0-rc5-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
NIP: c07b3280 LR: c07b3280 CTR: 00000000
REGS: f1021b20 TRAP: 0700 Tainted: G W (6.9.0-rc5-PMacG4)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24004222 XER: 00000000

GPR00: c07b3280 f1021be0 c1180ba0 00000030 000001f2 f1021a14 f1021a08 00000000
GPR08: 2e078000 00000027 00000027 f1021be0 28004222 00000000 00000000 00000000
GPR16: 00000000 00009032 c0df6022 c0df6022 00000000 00000001 c1df8f8c c0da8058
GPR24: 00000000 00001032 c1eae680 c1df8f8c 00000041 c22b3640 c10dcb00 00000000
NIP [c07b3280] warn_bogus_irq_restore+0x30/0x44
LR [c07b3280] warn_bogus_irq_restore+0x30/0x44
Call Trace:
[f1021be0] [c07b3280] warn_bogus_irq_restore+0x30/0x44 (unreliable)
[f1021bf0] [c06996a0] netpoll_send_skb+0x74/0x20c
[f1021c20] [c05b6790] write_msg+0x10c/0x120
[f1021c60] [c0094d14] console_flush_all+0x20c/0x2c0
[f1021ce0] [c0094e34] console_unlock+0x6c/0x108
[f1021d20] [c0095cdc] vprintk_emit+0x1a0/0x1a4
[f1021d50] [c0091ff0] _printk+0x6c/0x74
[f1021da0] [c00956d8] register_console+0x2f0/0x45c
[f1021de0] [c0c47c24] init_netconsole+0xc8/0x4a0
[f1021e50] [c0007bd8] do_one_initcall+0x84/0x294
[f1021ec0] [c0c03dd4] kernel_init_freeable+0x2e0/0x398
[f1021f10] [c0007ec4] kernel_init+0x24/0x138
[f1021f30] [c00152ec] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Code: 3d20c0db 89495e72 2c0a0000 4ca20020 9421fff0 3c60c092 7c0802a6 39400001 38634c9f 99495e72 90010014 4b897c6d <0fe00000> 80010014 38210010 7c0803a6
---[ end trace 0000000000000000 ]---
netconsole: network logging started
[...]

This dmesg is from kernel v6.9-rc5 but I get these warnings on stable v6.8.x and v6.6.x too.

Some data about the machine and the network card:

# inxi -bz
System:
Kernel: 6.8.7-gentoo-PMacG4 arch: ppc bits: 32
Desktop: WindowMaker v: 0.95.9 Distro: Gentoo Base System release 2.14
Machine:
Type: PPC System: PowerMac3 6 details: N/A mobo: PowerMac3,6 MacRISC3 Power
Macintosh serial: <filter>
CPU:
Info: single core 7455 altivec supported [MT MCM] speed (MHz): avg: 1417
Graphics:
Device-1: AMD RV350 [Radeon 9550/9600/X1050 Series] driver: radeon v: kernel
Display: x11 server: X.org v: 1.21.1.13 driver: X: loaded: radeon
unloaded: fbdev,modesetting dri: r300 gpu: radeon
resolution: <missing: xdpyinfo/xrandr> resolution: 1920x1080
API: OpenGL v: 4.5 vendor: mesa v: 24.0.4 renderer: llvmpipe (LLVM 17.0.6
128 bits)
Network:
Device-1: Broadcom BCM4306 802.11b/g Wireless LAN driver: b43-pci-bridge
Device-2: Apple UniNorth 2 GMAC driver: gem
Device-3: gmac driver: gem
Drives:
Local Storage: total: 689.82 GiB used: 7.62 GiB (1.1%)
Info:
Memory: total: 2 GiB available: 1.96 GiB used: 166.3 MiB (8.3%)
Processes: 159 Uptime: 8m Shell: Bash inxi: 3.3.33
# lspci
0000:00:0b.0 Host bridge: Apple Inc. UniNorth 2 AGP
0000:00:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV350 [Radeon 9550/9600/X1050 Series]
0001:00:0b.0 Host bridge: Apple Inc. UniNorth 2 PCI
0001:00:12.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:12.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:12.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
0001:00:13.0 Mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0001:00:15.0 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9865 Multi-I/O Controller
0001:00:15.1 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9865 Multi-I/O Controller
0001:00:15.2 Parallel controller: Illegal Vendor ID Device 9865
0001:00:16.0 Network controller: Broadcom Inc. and subsidiaries BCM4306 802.11b/g Wireless LAN Controller (rev 02)
0001:00:17.0 Unassigned class [ff00]: Apple Inc. KeyLargo Mac I/O (rev 03)
0001:00:18.0 USB controller: Apple Inc. KeyLargo USB
0001:00:19.0 USB controller: Apple Inc. KeyLargo USB
0001:00:1b.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:1b.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:1b.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
0002:00:0b.0 Host bridge: Apple Inc. UniNorth 2 Internal PCI
0002:00:0d.0 Unassigned class [ff00]: Apple Inc. UniNorth 2 ATA/100
0002:00:0e.0 FireWire (IEEE 1394): Apple Inc. UniNorth 2 FireWire (rev 01)
0002:00:0f.0 Ethernet controller: Apple Inc. UniNorth 2 GMAC (Sun GEM)
# lspci -s 0002:00:0f.0 -vv
0002:00:0f.0 Ethernet controller: Apple Inc. UniNorth 2 GMAC (Sun GEM)
Device tree node: /sys/firmware/devicetree/base/pci@f4000000/ethernet@f
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR+ INTx-
Latency: 16 (16000ns min, 16000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 41
Region 0: Memory at f5200000 (32-bit, non-prefetchable) [size=2M]
Expansion ROM at f5100000 [disabled] [size=1M]
Kernel driver in use: gem

Full dmesg an kernel .config attached.

Regards,
Erhard


Attachments:
(No filename) (7.95 kB)
config_69-rc5_g4 (113.60 kB)
dmesg_69-rc5_g4 (69.25 kB)
Download all attachments

2024-04-30 01:36:46

by Jakub Kicinski

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> Happens more often than not (6-7 out of 10 times booting):

Could you try with LOCKDEP enabled?
I wonder if irqs_disabled() behaves differently than we expect.

2024-05-04 15:00:49

by Erhard Furtner

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Mon, 29 Apr 2024 18:36:30 -0700
Jakub Kicinski <[email protected]> wrote:

> On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > Happens more often than not (6-7 out of 10 times booting):
>
> Could you try with LOCKDEP enabled?
> I wonder if irqs_disabled() behaves differently than we expect.

Enabled LOCK_ALLOC and LOCKDEP and re-tried with v6.9-rc6. I can reproduce the WARNING but got no additional dmesg.

I'll check whether PROVE_LOCKING makes a difference or any of the locking tests show something interesting.

Full dmesg attached.

Regards,
Erhard


Attachments:
(No filename) (853.00 B)
dmesg_69-rc6_g4_00 (46.90 kB)
Download all attachments

2024-05-05 21:27:48

by Erhard Furtner

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Mon, 29 Apr 2024 18:36:30 -0700
Jakub Kicinski <[email protected]> wrote:

> On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > Happens more often than not (6-7 out of 10 times booting):
>
> Could you try with LOCKDEP enabled?
> I wonder if irqs_disabled() behaves differently than we expect.

Ok, after a few tries I got a "BUG: spinlock wrong CPU on CPU#0, swapper/0/1" LOCKDEP hit. But this does not happen every time when I get the netpoll_send WARNING:

[...]
printk: legacy console [netcon0] enabled
------------[ cut here ]------------
netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (gem_start_xmit+0x0/0x378)
WARNING: CPU: 0 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc6-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
NIP: c06c1198 LR: c06c1198 CTR: 00000000
REGS: f1021b20 TRAP: 0700 Not tainted (6.9.0-rc6-PMacG4)
MSR: 02029032 <VEC,EE,ME,IR,DR,RI> CR: 24004222 XER: 00000000

GPR00: c06c1198 f1021be0 c15ab6e0 00000055 5f96410c 1bc90275 f1021b1c 0000001a
GPR08: 2e045000 00000000 c0d63998 f1021be0 24004222 00000000 c0d63a58 00000001
GPR16: 00009032 c00a0000 00000000 c009d69c 00000000 00000001 c22ccd34 c0ff485a
GPR24: 00000000 00001032 c2139c00 c22ccd34 00000041 c283a560 c1581f00 c2481000
NIP [c06c1198] netpoll_send_skb+0x1fc/0x20c
LR [c06c1198] netpoll_send_skb+0x1fc/0x20c
Call Trace:
[f1021be0] [c06c1198] netpoll_send_skb+0x1fc/0x20c (unreliable)
[f1021c10] [c05da3b8] write_msg+0x10c/0x120
[f1021c50] [c009d6d4] console_flush_all+0x244/0x330
[f1021cd0] [c009d834] console_unlock+0x74/0x124
[f1021d20] [c009e7e0] vprintk_emit+0x1fc/0x204
[f1021d50] [c009a6b4] _printk+0x6c/0x74
[f1021da0] [c009e13c] register_console+0x300/0x490
[f1021de0] [c0c486d8] init_netconsole+0xc8/0x498
[f1021e50] [c0007c08] do_one_initcall+0x84/0x294
[f1021ec0] [c0c03e70] kernel_init_freeable+0x378/0x470
[f1021f10] [c0007f34] kernel_init+0x24/0x138
[f1021f30] [c00152ec] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Code: 39299707 89490002 2c0a0000 40a20028 39400001 3c60c0a5 389f00e0 99490002 386393d6 813f0008 80a90010 4b989fe1 <0fe00000> 2c18000e 4081fec8 4bfffea0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.9.0-rc6-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
NIP: c07e3bf4 LR: c07e3bf4 CTR: 00000000
REGS: f1021b10 TRAP: 0700 Tainted: G W (6.9.0-rc6-PMacG4)
MSR: 02029032 <VEC,EE,ME,IR,DR,RI> CR: 24004222 XER: 00000000

GPR00: c07e3bf4 f1021bd0 c15ab6e0 00000030 5f96410c 1bc90275 f1021b0c 0000001a
GPR08: 2e045000 00000000 c0d63998 f1021bd0 24004222 00000000 c0d63a58 00000001
GPR16: 00009032 c00a0000 00000000 c009d69c 00000000 00000001 c22ccd34 c0ff485a
GPR24: 00000000 00001032 c2139c00 c22ccd34 00000041 c283a560 c1581f00 00000000
NIP [c07e3bf4] warn_bogus_irq_restore+0x30/0x44
LR [c07e3bf4] warn_bogus_irq_restore+0x30/0x44
Call Trace:
[f1021bd0] [c07e3bf4] warn_bogus_irq_restore+0x30/0x44 (unreliable)
[f1021be0] [c06c1010] netpoll_send_skb+0x74/0x20c
[f1021c10] [c05da3b8] write_msg+0x10c/0x120
[f1021c50] [c009d6d4] console_flush_all+0x244/0x330
[f1021cd0] [c009d834] console_unlock+0x74/0x124
[f1021d20] [c009e7e0] vprintk_emit+0x1fc/0x204
[f1021d50] [c009a6b4] _printk+0x6c/0x74
[f1021da0] [c009e13c] register_console+0x300/0x490
[f1021de0] [c0c486d8] init_netconsole+0xc8/0x498
[f1021e50] [c0007c08] do_one_initcall+0x84/0x294
[f1021ec0] [c0c03e70] kernel_init_freeable+0x378/0x470
[f1021f10] [c0007f34] kernel_init+0x24/0x138
[f1021f30] [c00152ec] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Code: 3d20c0dd 89499593 2c0a0000 4ca20020 9421fff0 3c60c09d 7c0802a6 39400001 386382d5 99499593 90010014 4b867585 <0fe00000> 80010014 38210010 7c0803a6
---[ end trace 0000000000000000 ]---
BUG: spinlock wrong CPU on CPU#0, swapper/0/1
lock: target_list_lock+0x0/0x24, .magic: dead4ead, .owner: swapper/0/1, .owner_cpu: 1
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.9.0-rc6-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f1021be0] [c07c2c2c] dump_stack_lvl+0x80/0xac (unreliable)
[f1021c00] [c0099474] do_raw_spin_unlock+0x90/0xb0
[f1021c30] [c07e9cf0] _raw_spin_unlock_irqrestore+0x2c/0x58
[f1021c50] [c009d6d4] console_flush_all+0x244/0x330
[f1021cd0] [c009d834] console_unlock+0x74/0x124
[f1021d20] [c009e7e0] vprintk_emit+0x1fc/0x204
[f1021d50] [c009a6b4] _printk+0x6c/0x74
[f1021da0] [c009e13c] register_console+0x300/0x490
[f1021de0] [c0c486d8] init_netconsole+0xc8/0x498
[f1021e50] [c0007c08] do_one_initcall+0x84/0x294
[f1021ec0] [c0c03e70] kernel_init_freeable+0x378/0x470
[f1021f10] [c0007f34] kernel_init+0x24/0x138
[f1021f30] [c00152ec] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
netconsole: network logging started
[...]

The G4 passed the slub, locking KUnit tests I tried. But at running dev_addr_lists_test KUnit test I get most of the time:

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: dev-addr-list-test
# module: dev_addr_lists_test
1..6

====================================
WARNING: kunit_try_catch/1778 still has locks held!
6.9.0-rc6-PMacG4 #4 Tainted: G N
------------------------------------
1 lock held by kunit_try_catch/1778:
#0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]

stack backtrace:
CPU: 0 PID: 1778 Comm: kunit_try_catch Tainted: G N 6.9.0-rc6-PMacG4 #4
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f39a9ef0] [c07c2c2c] dump_stack_lvl+0x80/0xac (unreliable)
[f39a9f10] [c004fe64] do_exit+0x5b4/0x834
[f39a9f60] [c006d848] kthread_complete_and_exit+0x0/0x28
[f39a9f80] [c006d870] kthread+0x0/0xe8
[f39a9fa0] [bebf2cf4] kunit_try_catch_run+0x0/0x15c [kunit]
[f39a9fc0] [c006d954] kthread+0xe4/0xe8
[f39a9ff0] [c0015304] start_kernel_thread+0x10/0x14
ok 1 dev_addr_test_basic
ok 2 dev_addr_test_sync_one
ok 3 dev_addr_test_add_del
ok 4 dev_addr_test_del_main
ok 5 dev_addr_test_add_set
ok 6 dev_addr_test_add_excl
# dev-addr-list-test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 dev-addr-list-test
[...]

Full dmesg + kernel .config attached.

Regards,
Erhard


Attachments:
(No filename) (6.76 kB)
config_69-rc6_g4 (113.81 kB)
dmesg_69-rc6_g4_04 (46.03 kB)
Download all attachments

2024-05-06 14:27:30

by Jakub Kicinski

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Sun, 5 May 2024 23:27:13 +0200 Erhard Furtner wrote:
> > On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> > > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > > Happens more often than not (6-7 out of 10 times booting):
> >
> > Could you try with LOCKDEP enabled?
> > I wonder if irqs_disabled() behaves differently than we expect.
>
> Ok, after a few tries I got a "BUG: spinlock wrong CPU on CPU#0, swapper/0/1" LOCKDEP hit. But this does not happen every time when I get the netpoll_send WARNING:

Oh, can you try deleting the gem_poll_controller() function?
Unhook it from ndo_poll_controller and remove it completely.

2024-05-07 00:47:01

by Erhard Furtner

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Mon, 6 May 2024 07:26:45 -0700
Jakub Kicinski <[email protected]> wrote:

> On Sun, 5 May 2024 23:27:13 +0200 Erhard Furtner wrote:
> > > On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> > > > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > > > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > > > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > > > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > > > Happens more often than not (6-7 out of 10 times booting):
> > >
> > > Could you try with LOCKDEP enabled?
> > > I wonder if irqs_disabled() behaves differently than we expect.
> >
> > Ok, after a few tries I got a "BUG: spinlock wrong CPU on CPU#0, swapper/0/1" LOCKDEP hit. But this does not happen every time when I get the netpoll_send WARNING:
>
> Oh, can you try deleting the gem_poll_controller() function?
> Unhook it from ndo_poll_controller and remove it completely.

Ok, this is the resulting diff:

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 9bd1df8308d2..d3a2fbb14140 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -949,17 +949,6 @@ static irqreturn_t gem_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
}

-#ifdef CONFIG_NET_POLL_CONTROLLER
-static void gem_poll_controller(struct net_device *dev)
-{
- struct gem *gp = netdev_priv(dev);
-
- disable_irq(gp->pdev->irq);
- gem_interrupt(gp->pdev->irq, dev);
- enable_irq(gp->pdev->irq);
-}
-#endif
-
static void gem_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct gem *gp = netdev_priv(dev);
@@ -2839,9 +2828,6 @@ static const struct net_device_ops gem_netdev_ops = {
.ndo_change_mtu = gem_change_mtu,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = gem_set_mac_address,
-#ifdef CONFIG_NET_POLL_CONTROLLER
- .ndo_poll_controller = gem_poll_controller,
-#endif
};

static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)


And indeed without gem_poll_controller() I don't hit the "WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44" or the according lockdep bug at boot!

Re-booted the machine about 20 times without anything suspicious showing up in the dmesg. With the unpatched kernel I got the WARNING at the 2nd reboot.

What I still get with 'modprobe -v dev_addr_lists_test', even with gem_poll_controller() removed is:

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: dev-addr-list-test
# module: dev_addr_lists_test
1..6

====================================
WARNING: kunit_try_catch/1770 still has locks held!
6.9.0-rc6-PMacG4-dirty #5 Tainted: G W N
------------------------------------
1 lock held by kunit_try_catch/1770:
#0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]

stack backtrace:
CPU: 0 PID: 1770 Comm: kunit_try_catch Tainted: G W N 6.9.0-rc6-PMacG4-dirty #5
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f3749ef0] [c07c2bec] dump_stack_lvl+0x80/0xac (unreliable)
[f3749f10] [c004fe64] do_exit+0x5b4/0x834
[f3749f60] [c006d848] kthread_complete_and_exit+0x0/0x28
[f3749f80] [c006d870] kthread+0x0/0xe8
[f3749fa0] [bebf0cf4] kunit_try_catch_run+0x0/0x15c [kunit]
[f3749fc0] [c006d954] kthread+0xe4/0xe8
[f3749ff0] [c0015304] start_kernel_thread+0x10/0x14
ok 1 dev_addr_test_basic
ok 2 dev_addr_test_sync_one
ok 3 dev_addr_test_add_del
ok 4 dev_addr_test_del_main
ok 5 dev_addr_test_add_set
ok 6 dev_addr_test_add_excl
# dev-addr-list-test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 dev-addr-list-test
[...]

Regards,
Erhard

2024-05-07 01:10:31

by Jakub Kicinski

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Tue, 7 May 2024 02:42:58 +0200 Erhard Furtner wrote:
> And indeed without gem_poll_controller() I don't hit the "WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44" or the according lockdep bug at boot!
>
> Re-booted the machine about 20 times without anything suspicious showing up in the dmesg. With the unpatched kernel I got the WARNING at the 2nd reboot.

Excellent! Do you want to submit that as an official patch?
The explanation is that we can't call disable_irq() from atomic
context (which which netpoll runs). But the callback is no longer
necessary as we can depend on NAPI to do the polling these days.

> What I still get with 'modprobe -v dev_addr_lists_test', even with gem_poll_controller() removed is:
>
> [...]
> KTAP version 1
> 1..1
> KTAP version 1
> # Subtest: dev-addr-list-test
> # module: dev_addr_lists_test
> 1..6
>
> ====================================
> WARNING: kunit_try_catch/1770 still has locks held!
> 6.9.0-rc6-PMacG4-dirty #5 Tainted: G W N
> ------------------------------------
> 1 lock held by kunit_try_catch/1770:
> #0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]

I think that's fixed in net-next.

2024-05-08 09:02:04

by Erhard Furtner

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Mon, 6 May 2024 18:10:20 -0700
Jakub Kicinski <[email protected]> wrote:

> Excellent! Do you want to submit that as an official patch?
> The explanation is that we can't call disable_irq() from atomic
> context (which which netpoll runs). But the callback is no longer
> necessary as we can depend on NAPI to do the polling these days.

I could do that with the explanation you stated. But should any further questions arise in this process I would also lack the technical background to deal with them. ;)

I also noticed a similar #ifdef CONFIG_NET_POLL_CONTROLLER logic shows up in
many network drivers, e.g. net/ethernet/realtek/8139too.c:

#ifdef CONFIG_NET_POLL_CONTROLLER
static void rtl8139_poll_controller(struct net_device *dev);
#endif
[...]
#ifdef CONFIG_NET_POLL_CONTROLLER
/*
* Polling receive - used by netconsole and other diagnostic tools
* to allow network i/o with interrupts disabled.
*/
static void rtl8139_poll_controller(struct net_device *dev)
{
struct rtl8139_private *tp = netdev_priv(dev);
const int irq = tp->pci_dev->irq;

disable_irq_nosync(irq);
rtl8139_interrupt(irq, dev);
enable_irq(irq);
}
#endif
[...]
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = rtl8139_poll_controller,
#endif


Should it be removed here too? This would be more cards I can test. So far I only see this on my G4 and I think something similar on an old Pentium4 box I no longer have.

> > What I still get with 'modprobe -v dev_addr_lists_test', even with gem_poll_controller() removed is:
> >
> > [...]
> > KTAP version 1
> > 1..1
> > KTAP version 1
> > # Subtest: dev-addr-list-test
> > # module: dev_addr_lists_test
> > 1..6
> >
> > ====================================
> > WARNING: kunit_try_catch/1770 still has locks held!
> > 6.9.0-rc6-PMacG4-dirty #5 Tainted: G W N
> > ------------------------------------
> > 1 lock held by kunit_try_catch/1770:
> > #0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]
>
> I think that's fixed in net-next.

Ah, good to hear!

Regards,
Erhard F.

2024-05-08 13:54:37

by Jakub Kicinski

[permalink] [raw]
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Wed, 8 May 2024 10:55:05 +0200 Erhard Furtner wrote:
> I could do that with the explanation you stated. But should any
> further questions arise in this process I would also lack the
> technical background to deal with them. ;)

Alright, submitted :)

> I also noticed a similar #ifdef CONFIG_NET_POLL_CONTROLLER logic shows up in
> many network drivers, e.g. net/ethernet/realtek/8139too.c:
>
> #ifdef CONFIG_NET_POLL_CONTROLLER
> static void rtl8139_poll_controller(struct net_device *dev);
> #endif
> [...]
> #ifdef CONFIG_NET_POLL_CONTROLLER
> /*
> * Polling receive - used by netconsole and other diagnostic tools
> * to allow network i/o with interrupts disabled.
> */
> static void rtl8139_poll_controller(struct net_device *dev)
> {
> struct rtl8139_private *tp = netdev_priv(dev);
> const int irq = tp->pci_dev->irq;
>
> disable_irq_nosync(irq);
> rtl8139_interrupt(irq, dev);
> enable_irq(irq);
> }
> #endif
> [...]
> #ifdef CONFIG_NET_POLL_CONTROLLER
> .ndo_poll_controller = rtl8139_poll_controller,
> #endif
>
>
> Should it be removed here too? This would be more cards I can test.
> So far I only see this on my G4 and I think something similar on an
> old Pentium4 box I no longer have.

That one looks legit. Note the _nosync() which solves the immediate
IRQ masking / deadlock issue. And the rtl8139_interrupt() function
actually does the packet cleanup in case of 8139too.