Dear Linux folks,
I do not know if this is an rcutorture issue or if rcutorture found a
bug in `rtmsg_ifinfo_build_skb()`.
Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with
CONFIG_TORTURE_TEST=y
CONFIG_RCU_TORTURE_TEST=y
and
$ clang --version
Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg
and booting it on an IBM S822LC, Linux panicked with a NULL pointer
dereference. The watchdog rebooted the system, and I found the message below in
`/sys/fs/pstore/dmesg-nvram-2.enc.z`.
```
[ T1] Key type id_legacy registered
[ T1] SGI XFS with ACLs, security attributes, no debug enabled
[ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[ T1] io scheduler mq-deadline registered
[ T1] io scheduler kyber registered
[ T198] cryptomgr_test (198) used greatest stack depth: 13536 bytes left
[ T1] pci 0021:10:00.0: enabling device (0141 -> 0143)
[ T1] Using unsupported 1024x768 (null) at 3fe882010000, depth=32, pitch=4096
[ T1] Console: switching to colour frame buffer device 128x48
[ T1] fb0: Open Firmware frame buffer device on /pciex@3fffe41100000/pci@0/pci@0/pci@b/pci@0/vga@0
[ T1] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
[ T1] hvc0: No interrupts property, using OPAL event
[ T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ T1] Non-volatile memory driver v1.3
[ T1] brd: module loaded
[ T1] loop: module loaded
[ T1] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[ T1] ahci 0021:0e:00.0: version 3.0
[ T1] ahci 0021:0e:00.0: enabling device (0141 -> 0143)
[ T1] ahci 0021:0e:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[ T1] ahci 0021:0e:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
[ T1] scsi host0: ahci
[ T1] scsi host1: ahci
[ T1] scsi host2: ahci
[ T1] scsi host3: ahci
[ T1] ata1: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000100 irq 39
[ T1] ata2: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000180 irq 39
[ T1] ata3: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000200 irq 39
[ T1] ata4: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000280 irq 39
[ T1] e100: Intel(R) PRO/100 Network Driver
[ T1] e100: Copyright(c) 1999-2006 Intel Corporation
[ T1] e1000: Intel(R) PRO/1000 Network Driver
[ T1] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ T1] e1000e: Intel(R) PRO/1000 Network Driver
[ T1] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ T1] ehci-pci: EHCI PCI platform driver
[ T1] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ T1] ohci-pci: OHCI PCI platform driver
[ T1] rtc-opal opal-rtc: registered as rtc0
[ T1] rtc-opal opal-rtc: setting system clock to 2022-01-24T18:21:45 UTC (1643048505)
[ T1] i2c_dev: i2c /dev entries driver
[ T1] device-mapper: uevent: version 1.0.3
[ T1] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[ T1] powernv-cpufreq: cpufreq pstate min 0xffffffd5 nominal 0xffffffef max 0x0
[ T1] powernv-cpufreq: Workload Optimized Frequency is disabled in the platform
[ T1] powernv_idle_driver registered
[ T1] nx_compress_powernv: coprocessor found on chip 0, CT 3 CI 1
[ T1] nx_compress_powernv: coprocessor found on chip 8, CT 3 CI 9
[ T1] usbcore: registered new interface driver usbhid
[ T1] usbhid: USB HID core driver
[ T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
[ T1] NET: Registered PF_INET6 protocol family
[ T1] Segment Routing with IPv6
[ T1] In-situ OAM (IOAM) with IPv6
[ T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
[ T1] Faulting instruction address: 0xc0000000008e2400
[ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[ T1] Modules linked in:
[ T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00032-gdd81e1c7d5fb #29
[ T1] NIP: c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
[ T1] REGS: c0000000125033e0 TRAP: 0380 Not tainted (5.17.0-rc1-00032-gdd81e1c7d5fb)
[ T1] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42800c40 XER: 00000000
[ T1] CFAR: c000000000d65dac IRQMASK: 0
[ T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600 0000000000000000
[ T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000 0000000000000cc0
[ T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000001
[ T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478 0000000000000000
[ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0 0000000000000000
[ T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000 0000000000000000
[ T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000 c000000012503680
[ T1] NIP [c0000000008e2400] strlen+0x10/0x30
[ T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
[ T1] Call Trace:
[ T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0 (unreliable)
[ T1] [c0000000125036f0] [c000000000d65b40] rtmsg_ifinfo_build_skb+0x80/0x1a0
[ T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
[ T1] [c000000012503800] [c000000000d4de50] register_netdevice+0x690/0x770
[ T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
[ T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
[ T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
[ T1] [c000000012503970] [c000000000d331bc] register_pernet_operations+0xec/0x1e0
[ T1] [c0000000125039d0] [c000000000d33440] register_pernet_device+0x60/0xd0
[ T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
[ T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
[ T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
[ T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
[ T1] [c000000012503d40] [c000000002005c7c] kernel_init_freeable+0x160/0x1ec
[ T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
[ T1] [c000000012503e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[ T1] Instruction dump:
[ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[ T1] ---[ end trace 0000000000000000 ]---
[ T1]
[ T206] ata4: SATA link down (SStatus 0 SControl 300)
[ T204] ata3: SATA link down (SStatus 0 SControl 300)
[ T200] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ T200] ata1.00: ATA-10: ST1000NX0313 00LY266 00LY265IBM, BE33, max UDMA/133
[ T200] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[ T200] ata1.00: configured for UDMA/133
[ T7] scsi 0:0:0:0: Direct-Access ATA ST1000NX0313 BE33 PQ: 0 ANSI: 5
[ T7] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ T209] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ T209] sd 0:0:0:0: [sda] 4096-byte physical blocks
[ T209] sd 0:0:0:0: [sda] Write Protect is off
[ T209] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ T209] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ T209] sda: sda1 sda2
[ T209] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
```
Kind regards,
Paul
Dear Paul,
I am also very interested in RCU tests ;-)
First of all, thank you for your email; it taught me how to build a
kernel Debian package with Clang ;-)
I built and tested linux-next on x86_64, but the kernel does not
panic. I guess our kernel configurations may be different. These are
my steps:
1. git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/next/linux-next.git
2. git describe: next-20220125
3. make menuconfig CC=clang-12 (CONFIG_TORTURE_TEST=y, CONFIG_RCU_TORTURE_TEST=y)
   My configuration file is uploaded to my VPS cloud server:
   http://154.223.142.244/config-5.17.0-rc1-next-20220125+
4. make CC=clang-12 -j 16 bindeb-pkg
5. Install the kernel and reboot.
6. The kernel does not panic (it has been running for 30 minutes so far).
I hope I can be of more help ;-)
Thanks
Sincerely
Zhouyi
On Wed, Jan 26, 2022 at 3:24 PM Paul Menzel <[email protected]> wrote:
>
> Dear Linux folks,
>
>
> I do not know, if this is an rcutorture issue, or if rcutorture found a
> bug with `rtmsg_ifinfo_build_skb()`.
Dear Paul,
I don't have an IBM machine, so I tried to analyze the problem in my
x86_64 KVM virtual machine, but I can't reproduce the bug there.
According to the call trace, the panic happens while the sit device is
being registered. (A sit device is a virtual network device that takes
IPv6 traffic, encapsulates/decapsulates it in IPv4 packets, and
sends/receives it over the IPv4 Internet to/from another host.)
The sit device is registered in sit_init_net():
1895 static int __net_init sit_init_net(struct net *net)
1896 {
1897         struct sit_net *sitn = net_generic(net, sit_net_id);
1898         struct ip_tunnel *t;
1899         int err;
1900
1901         sitn->tunnels[0] = sitn->tunnels_wc;
1902         sitn->tunnels[1] = sitn->tunnels_l;
1903         sitn->tunnels[2] = sitn->tunnels_r;
1904         sitn->tunnels[3] = sitn->tunnels_r_l;
1905
1906         if (!net_has_fallback_tunnels(net))
1907                 return 0;
1908
1909         sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
1910                                            NET_NAME_UNKNOWN,
1911                                            ipip6_tunnel_setup);
1912         if (!sitn->fb_tunnel_dev) {
1913                 err = -ENOMEM;
1914                 goto err_alloc_dev;
1915         }
1916         dev_net_set(sitn->fb_tunnel_dev, net);
1917         sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
1918         /* FB netdevice is special: we have one, and only one per netns.
1919          * Allowing to move it to another netns is clearly unsafe.
1920          */
1921         sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
1922
1923         err = register_netdev(sitn->fb_tunnel_dev);
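register_netdev() on line 1923 calls if_nlmsg_size() indirectly. According
to the call trace, the path is register_netdevice() -> rtmsg_ifinfo() ->
rtmsg_ifinfo_build_skb() -> if_nlmsg_size().
On the other hand, the function that calls the faulting strlen() is
if_nlmsg_size(). In my x86_64 build it looks like this: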
(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
0xffffffff81a0dc25 <+5>: push %rbp
0xffffffff81a0dc26 <+6>: push %r15
0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
...
=> 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
0xffffffff81a0dd13 <+243>: add $0x10,%eax
0xffffffff81a0dd16 <+246>: movslq %eax,%r12
The C code for 0xffffffff81a0dd0e is the following (line 524):
515 static size_t rtnl_link_get_size(const struct net_device *dev)
516 {
517         const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
518         size_t size;
519
520         if (!ops)
521                 return 0;
522
523         size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
524                nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
But dev->rtnl_link_ops is set to &sit_link_ops in sit_init_net() (line
1917), so I guess something must have happened between these two calls.
Is KASAN available on the IBM machine? Would KASAN help us find out what
happened in between?
I hope I can be of more help.
Thanks
Sincerely
Zhouyi
On Wed, Jan 26, 2022 at 3:24 PM Paul Menzel <[email protected]> wrote:
>
> Dear Linux folks,
>
>
> I do not know, if this is an rcutorture issue, or if rcutorture found a
> bug with `rtmsg_ifinfo_build_skb()`.
>
>
> Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with
>
> CONFIG_TORTURE_TEST=y
> CONFIG_RCU_TORTURE_TEST=y
>
> and
>
> $ clang --version
> Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
> Target: powerpc64le-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg
I built the kernel with LLVM/Clang, too.
On Wed, Jan 26, 2022 at 3:24 PM Paul Menzel <[email protected]> wrote:
>
> Dear Linux folks,
>
>
> I do not know, if this is an rcutorture issue, or if rcutorture found a
> bug with `rtmsg_ifinfo_build_skb()`.
>
>
> Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with
>
> CONFIG_TORTURE_TEST=y
> CONFIG_RCU_TORTURE_TEST=y
>
> and
>
> $ clang --version
> Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
> Target: powerpc64le-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg
I build the kernel in LLVM/Clang also
>
> and booting it on an IBM S822LC, Linux paniced with a null pointer
> dereference, and the watchdog rebooted, and I found the message below in
> `/sys/fs/pstore/dmesg-nvram-2.enc.z`.
>
> ```
> [ T1] Key type id_legacy registered
> [ T1] SGI XFS with ACLs, security attributes, no debug enabled
> [ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major
> 248)
> [ T1] io scheduler mq-deadline registered
> [ T1] io scheduler kyber registered
> [ T198] cryptomgr_test (198) used greatest stack depth: 13536 bytes left
> [ T1] pci 0021:10:00.0: enabling device (0141 -> 0143)
> [ T1] Using unsupported 1024x768 (null) at 3fe882010000, depth=32,
> pitch=4096
> [ T1] Console: switching to colour frame buffer device 128x48
> [ T1] fb0: Open Firmware frame buffer device on
> /pciex@3fffe41100000/pci@0/pci@0/pci@b/pci@0/vga@0
> [ T1] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
> [ T1] hvc0: No interrupts property, using OPAL event
> [ T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [ T1] Non-volatile memory driver v1.3
> [ T1] brd: module loaded
> [ T1] loop: module loaded
> [ T1] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March
> 14, 2017)
> [ T1] ahci 0021:0e:00.0: version 3.0
> [ T1] ahci 0021:0e:00.0: enabling device (0141 -> 0143)
> [ T1] ahci 0021:0e:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf
> impl SATA mode
> [ T1] ahci 0021:0e:00.0: flags: 64bit ncq sntf led only pmp fbs pio
> slum part sxs
> [ T1] scsi host0: ahci
> [ T1] scsi host1: ahci
> [ T1] scsi host2: ahci
> [ T1] scsi host3: ahci
> [ T1] ata1: SATA max UDMA/133 abar m2048@0x3fe881000000 port
> 0x3fe881000100 irq 39
> [ T1] ata2: SATA max UDMA/133 abar m2048@0x3fe881000000 port
> 0x3fe881000180 irq 39
> [ T1] ata3: SATA max UDMA/133 abar m2048@0x3fe881000000 port
> 0x3fe881000200 irq 39
> [ T1] ata4: SATA max UDMA/133 abar m2048@0x3fe881000000 port
> 0x3fe881000280 irq 39
> [ T1] e100: Intel(R) PRO/100 Network Driver
> [ T1] e100: Copyright(c) 1999-2006 Intel Corporation
> [ T1] e1000: Intel(R) PRO/1000 Network Driver
> [ T1] e1000: Copyright (c) 1999-2006 Intel Corporation.
> [ T1] e1000e: Intel(R) PRO/1000 Network Driver
> [ T1] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [ T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [ T1] ehci-pci: EHCI PCI platform driver
> [ T1] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [ T1] ohci-pci: OHCI PCI platform driver
> [ T1] rtc-opal opal-rtc: registered as rtc0
> [ T1] rtc-opal opal-rtc: setting system clock to 2022-01-24T18:21:45
> UTC (1643048505)
> [ T1] i2c_dev: i2c /dev entries driver
> [ T1] device-mapper: uevent: version 1.0.3
> [ T1] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised:
> [email protected]
> [ T1] powernv-cpufreq: cpufreq pstate min 0xffffffd5 nominal
> 0xffffffef max 0x0
> [ T1] powernv-cpufreq: Workload Optimized Frequency is disabled in
> the platform
> [ T1] powernv_idle_driver registered
> [ T1] nx_compress_powernv: coprocessor found on chip 0, CT 3 CI 1
> [ T1] nx_compress_powernv: coprocessor found on chip 8, CT 3 CI 9
> [ T1] usbcore: registered new interface driver usbhid
> [ T1] usbhid: USB HID core driver
> [ T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
> [ T1] NET: Registered PF_INET6 protocol family
> [ T1] Segment Routing with IPv6
> [ T1] In-situ OAM (IOAM) with IPv6
> [ T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> [ T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
> [ T1] Faulting instruction address: 0xc0000000008e2400
> [ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> [ T1] Modules linked in:
> [ T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00032-gdd81e1c7d5fb #29
> [ T1] NIP: c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
> [ T1] REGS: c0000000125033e0 TRAP: 0380 Not tainted (5.17.0-rc1-00032-gdd81e1c7d5fb)
> [ T1] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42800c40 XER: 00000000
> [ T1] CFAR: c000000000d65dac IRQMASK: 0
> [ T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600 0000000000000000
> [ T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000 0000000000000cc0
> [ T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000001
> [ T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478 0000000000000000
> [ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0 0000000000000000
> [ T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000 0000000000000000
> [ T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000 c000000012503680
> [ T1] NIP [c0000000008e2400] strlen+0x10/0x30
> [ T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
> [ T1] Call Trace:
> [ T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0 (unreliable)
> [ T1] [c0000000125036f0] [c000000000d65b40] rtmsg_ifinfo_build_skb+0x80/0x1a0
> [ T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
> [ T1] [c000000012503800] [c000000000d4de50] register_netdevice+0x690/0x770
> [ T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
> [ T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
> [ T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
> [ T1] [c000000012503970] [c000000000d331bc] register_pernet_operations+0xec/0x1e0
> [ T1] [c0000000125039d0] [c000000000d33440] register_pernet_device+0x60/0xd0
> [ T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
> [ T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
> [ T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
> [ T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
> [ T1] [c000000012503d40] [c000000002005c7c] kernel_init_freeable+0x160/0x1ec
> [ T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
> [ T1] [c000000012503e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> [ T1] Instruction dump:
> [ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> [ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> [ T1] ---[ end trace 0000000000000000 ]---
> [ T1]
> [ T206] ata4: SATA link down (SStatus 0 SControl 300)
> [ T204] ata3: SATA link down (SStatus 0 SControl 300)
> [ T200] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ T200] ata1.00: ATA-10: ST1000NX0313 00LY266 00LY265IBM, BE33, max UDMA/133
> [ T200] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [ T200] ata1.00: configured for UDMA/133
> [ T7] scsi 0:0:0:0: Direct-Access ATA ST1000NX0313 BE33 PQ: 0 ANSI: 5
> [ T7] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ T209] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
> [ T209] sd 0:0:0:0: [sda] 4096-byte physical blocks
> [ T209] sd 0:0:0:0: [sda] Write Protect is off
> [ T209] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ T209] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ T209] sda: sda1 sda2
> [ T209] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ```
>
>
> Kind regards,
>
> Paul
Dear Zhouyi,
Thank you for taking the time.
On 29.01.22 at 03:23, Zhouyi Zhou wrote:
> I don't have an IBM machine, but I tried to analyze the problem using
> my x86_64 KVM virtual machine, and I can't reproduce the bug there.
No idea if it’s architecture-specific.
> As far as I can see, the panic is caused by the registration of the sit
> device. (A sit device is a type of virtual network device that takes
> IPv6 traffic, encapsulates/decapsulates it in IPv4 packets, and
> sends/receives it over the IPv4 Internet to another host.)
>
> The sit device is registered in the function sit_init_net():
> 1895 static int __net_init sit_init_net(struct net *net)
> 1896 {
> 1897 	struct sit_net *sitn = net_generic(net, sit_net_id);
> 1898 	struct ip_tunnel *t;
> 1899 	int err;
> 1900
> 1901 	sitn->tunnels[0] = sitn->tunnels_wc;
> 1902 	sitn->tunnels[1] = sitn->tunnels_l;
> 1903 	sitn->tunnels[2] = sitn->tunnels_r;
> 1904 	sitn->tunnels[3] = sitn->tunnels_r_l;
> 1905
> 1906 	if (!net_has_fallback_tunnels(net))
> 1907 		return 0;
> 1908
> 1909 	sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> 1910 					   NET_NAME_UNKNOWN,
> 1911 					   ipip6_tunnel_setup);
> 1912 	if (!sitn->fb_tunnel_dev) {
> 1913 		err = -ENOMEM;
> 1914 		goto err_alloc_dev;
> 1915 	}
> 1916 	dev_net_set(sitn->fb_tunnel_dev, net);
> 1917 	sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> 1918 	/* FB netdevice is special: we have one, and only one per netns.
> 1919 	 * Allowing to move it to another netns is clearly unsafe.
> 1920 	 */
> 1921 	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> 1922
> 1923 	err = register_netdev(sitn->fb_tunnel_dev);
> register_netdev() on line 1923 calls if_nlmsg_size() indirectly.
>
> On the other hand, the strlen() call that panicked is made from if_nlmsg_size():
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
> 0xffffffff81a0dc25 <+5>: push %rbp
> 0xffffffff81a0dc26 <+6>: push %r15
> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
> ...
> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
Excuse my ignorance: would that look the same on ppc64le?
Unfortunately, I didn’t save the problematic `vmlinuz` file, but in a
current build (without rcutorture) I have the line below, where strlen
shows up.
(gdb) disassemble if_nlmsg_size
[…]
0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
[…]
> and the C code for 0xffffffff81a0dd0e is the following (line 524):
> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> 516 {
> 517 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> 518 	size_t size;
> 519
> 520 	if (!ops)
> 521 		return 0;
> 522
> 523 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> 524 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
How do I connect the disassembly output to the corresponding source line?
> But ops is assigned the value of sit_link_ops in sit_init_net() on
> line 1917, so I guess something must have happened between the two calls.
>
> Do we have KASAN on IBM machines? Would KASAN help us find out what
> happened in between?
Unfortunately, as far as I can see, KASAN is not supported on 64-bit
Power, which is what I have. From `arch/powerpc/Kconfig`:
	select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
	select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> Hope I can be of more help.
Some distributions support multi-arch, so they make it easy to
cross-compile for different architectures.
Kind regards,
Paul
Dear Paul,
Thank you for your instructions, I learned a lot from this process.
On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> How do I connect the disassembly output to the corresponding source line?
I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
for powerpc64le on my Ubuntu 20.04 x86_64 machine.
gdb-multiarch ./vmlinux
(gdb) disassemble if_nlmsg_size
[...]
0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
[...]
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
But at include/net/netlink.h:1112 I can't find the call to strlen:
1110 static inline int nla_total_size(int payload)
1111 {
1112 	return NLA_ALIGN(nla_attr_size(payload));
1113 }
I guess this may be due to the compiler encoding the debug information wrongly.
>
> > But ops is assigned the value of sit_link_ops in function sit_init_net
> > on line 1917, so I guess something must have happened between the calls.
> >
> > Do we have KASAN on IBM machines? Would KASAN help us find out what
> > happened in between?
>
> Unfortunately, as far as I can see, KASAN is not supported on 64-bit
> Power, which is what I have. From `arch/powerpc/Kconfig`:
>
> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
>
Yes, agreed. I invoked "make menuconfig ARCH=powerpc
CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
16" but couldn't find KASAN under Memory Debugging, so I guess we should
find the bug by bisecting instead.
> > Hope I can be of more help.
>
> Some distributions support multi-arch, so they make it easy to
> cross-compile for different architectures.
I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
for powerpc64le on my Ubuntu 20.04 x86_64 machine. But I can't boot the
compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
to explore it.
Kind regards
Zhouyi
>
>
> Kind regards,
>
> Paul
Dear Zhouyi,
On 30.01.22 at 01:21, Zhouyi Zhou wrote:
> Thank you for your instructions, I learned a lot from this process.
Same on my end.
> On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
>> How do I connect the disassembly output to the corresponding source line?
> I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
> for powerpc64le on my Ubuntu 20.04 x86_64 machine.
>
> gdb-multiarch ./vmlinux
> (gdb) disassemble if_nlmsg_size
> [...]
> 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> [...]
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
>
> But at include/net/netlink.h:1112 I can't find the call to strlen:
> 1110 static inline int nla_total_size(int payload)
> 1111 {
> 1112 	return NLA_ALIGN(nla_attr_size(payload));
> 1113 }
> I guess this may be due to the compiler encoding the debug information wrongly.
`rtnl_link_get_size()` contains:
	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
Is that inlined, and is that the code at fault?
>>> But ops is assigned the value of sit_link_ops in function sit_init_net
>>> on line 1917, so I guess something must have happened between the calls.
>>>
>>> Do we have KASAN on IBM machines? Would KASAN help us find out what
>>> happened in between?
>>
>> Unfortunately, as far as I can see, KASAN is not supported on 64-bit
>> Power, which is what I have. From `arch/powerpc/Kconfig`:
>>
>> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
>> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
>>
> Yes, agreed. I invoked "make menuconfig ARCH=powerpc
> CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> 16" but couldn't find KASAN under Memory Debugging, so I guess we should
> find the bug by bisecting instead.
I do not know if it is a regression, as this was the first time I tried
to run a Linux kernel built with rcutorture on real hardware.
>>> Hope I can be of more help.
>>
>> Some distributions support multi-arch, so they make it easy to
>> cross-compile for different architectures.
> I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
> for powerpc64le on my Ubuntu 20.04 x86_64 machine. But I can't boot the
> compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> to explore it.
Oh, that does not sound good. But I have not tried that in a long time
either. It’s a separate issue, but maybe some of the PPC
maintainers/folks could help.
Kind regards,
Paul
Dear Paul
On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
>
> Dear Zhouyi,
>
>
> On 30.01.22 at 01:21, Zhouyi Zhou wrote:
>
> > Thank you for your instructions, I learned a lot from this process.
>
> Same on my end.
>
> > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> >> How do I connect the disassembly output to the corresponding source line?
> > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
> > for powerpc64le on my Ubuntu 20.04 x86_64 machine.
> >
> > gdb-multiarch ./vmlinux
> > (gdb) disassemble if_nlmsg_size
> > [...]
> > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > [...]
> > (gdb) break *0xc00000000191bf40
> > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> >
> > But at include/net/netlink.h:1112 I can't find the call to strlen:
> > 1110 static inline int nla_total_size(int payload)
> > 1111 {
> > 1112 	return NLA_ALIGN(nla_attr_size(payload));
> > 1113 }
> > I guess this may be due to the compiler encoding the debug information wrongly.
>
> `rtnl_link_get_size()` contains:
>
> 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>
> Is that inlined, and is that the code at fault?
Yes, it is inlined, because:
(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
[...]
0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
0xc00000000191bf3c <+108>: ld r3,16(r31)
0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
[...]
(gdb)
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
(gdb) break *0xc00000000191bf38
Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
>
> >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> >>> on line 1917, so I guess something must have happened between the calls.
> >>>
> >>> Do we have KASAN on IBM machines? Would KASAN help us find out what
> >>> happened in between?
> >>
> >> Unfortunately, as far as I can see, KASAN is not supported on 64-bit
> >> Power, which is what I have. From `arch/powerpc/Kconfig`:
> >>
> >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> >>
> > Yes, agreed. I invoked "make menuconfig ARCH=powerpc
> > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > 16" but couldn't find KASAN under Memory Debugging, so I guess we should
> > find the bug by bisecting instead.
>
> I do not know if it is a regression, as this was the first time I tried
> to run a Linux kernel built with rcutorture on real hardware.
I added some debug statements to the kernel to locate the bug more
precisely. You can try them when you have time, or just ignore them if
the following patch doesn't look very effective ;-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f6..969ac7c540cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
* Prevent userspace races by waiting until the network
* device is fully setup before sending notifications.
*/
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
@@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
if (rtnl_lock_killable())
return -EINTR;
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
err = register_netdevice(dev);
rtnl_unlock();
return err;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e476403231f0..e08986ae6238 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
if (!ops)
return 0;
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", ops,
+ ops->kind, __func__);
size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
@@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct net_device *dev)
static noinline size_t if_nlmsg_size(const struct net_device *dev,
u32 ext_filter_mask)
{
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
return NLMSG_ALIGN(sizeof(struct ifinfomsg))
+ nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
+ nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
@@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
struct net *net = dev_net(dev);
struct sk_buff *skb;
int err = -ENOBUFS;
-
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
if (skb == NULL)
goto errout;
@@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,
if (dev->reg_state != NETREG_REGISTERED)
return;
-
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
new_ifindex);
if (skb)
@@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,
void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
gfp_t flags)
{
+ if (dev->rtnl_link_ops)
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+ dev->rtnl_link_ops->kind, __func__);
rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
NULL, 0);
}
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..fa5b2725811c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
* Allowing to move it to another netns is clearly unsafe.
*/
sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
-
+ printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n",
+ sitn->fb_tunnel_dev->rtnl_link_ops,
+ sitn->fb_tunnel_dev->rtnl_link_ops->kind, __func__);
err = register_netdev(sitn->fb_tunnel_dev);
if (err)
goto err_reg_dev;
>
> >>> Hope I can be of more help.
> >>
> >> Some distributions support multi-arch, so they make it easy to
> >> cross-compile for different architectures.
> > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
> > for powerpc64le on my Ubuntu 20.04 x86_64 machine. But I can't boot the
> > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > to explore it.
>
> Oh, that does not sound good. But I have not tried that in a long time
> either. It’s a separate issue, but maybe some of the PPC
> maintainers/folks could help.
I will do further research on this later.
Thanks for your time
Kind regards
Zhouyi
>
>
> Kind regards,
>
> Paul
On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> Dear Paul
>
> On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
> > `rtnl_link_get_size()` contains:
> >
> > size = nla_total_size(sizeof(struct nlattr)) + /*
> > IFLA_LINKINFO */
> > nla_total_size(strlen(ops->kind) + 1); /*
> > IFLA_INFO_KIND */
> >
> > Is that inlined(?) and the code at fault?
> Yes, that is inlined! because
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
> [...]
> 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> 0xc00000000191bf3c <+108>: ld r3,16(r31)
> 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> [...]
> (gdb)
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> (gdb) break *0xc00000000191bf38
> Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
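Both disassemblies load strlen()'s argument from offset 16 of a base register, presumably holding `ops`: `mov 0x10(%rbp),%rdi` on x86_64 and `ld r3,16(r31)` on ppc64le. That matches reading `ops->kind` on a 64-bit build, because `kind` follows the 16-byte `struct list_head list` at the start of `struct rtnl_link_ops`. The inlined arithmetic can be modelled in userspace.

This is an illustration under stated assumptions, not the kernel code:

* It copies the uapi netlink alignment rules (NLA_ALIGNTO = 4, a 4-byte attribute header).
* It uses the hypothetical names `struct fake_link_ops` and `link_kind_size()`.
* It reproduces only the two terms on lines 523-524 of rtnl_link_get_size(). The real function also adds terms for ops->get_size(), ops->get_xstats_size() and slave data.
* Unlike the kernel, it returns -1 for a NULL `kind` instead of passing it to strlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Alignment rules copied from the uapi header <linux/netlink.h>. */
#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))

/* Same layout as the uapi struct nlattr: a 4-byte attribute header. */
struct nlattr {
	unsigned short nla_len;
	unsigned short nla_type;
};

#define NLA_HDRLEN	((int)NLA_ALIGN(sizeof(struct nlattr)))

/* Userspace copies of the inline helpers in include/net/netlink.h. */
static int nla_attr_size(int payload)
{
	assert(payload >= 0);	/* attribute payloads are never negative */
	return NLA_HDRLEN + payload;
}

static int nla_total_size(int payload)
{
	return NLA_ALIGN(nla_attr_size(payload));
}

/* Hypothetical stand-in for struct rtnl_link_ops; only 'kind' is modelled. */
struct fake_link_ops {
	const char *kind;
};

/*
 * Mirrors lines 520-524 of rtnl_link_get_size(). The kernel calls
 * strlen(ops->kind) unconditionally, so a NULL or stale kind faults inside
 * strlen(); this sketch returns -1 for a NULL kind instead.
 */
static long link_kind_size(const struct fake_link_ops *ops)
{
	if (!ops)
		return 0;
	if (!ops->kind)
		return -1;

	return nla_total_size((int)sizeof(struct nlattr)) +	/* IFLA_LINKINFO */
	       nla_total_size((int)strlen(ops->kind) + 1);	/* IFLA_INFO_KIND */
}
```

With kind "sit", this gives 8 + 8 = 16 bytes. With "ip6tnl", it gives 8 + 12 = 20 bytes.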
I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
already doing so. That gives gdb a lot more information about things
like inlining.
Thanx, Paul
> > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > >>> line 1917, so I guess something must happened between the calls.
> > >>>
> > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > >>> happened in between?
> > >>
> > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > >> see. From `arch/powerpc/Kconfig`:
> > >>
> > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > >>
> > > en, agree, I invoke "make menuconfig ARCH=powerpc
> > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > the bug by bisecting instead.
> >
> > I do not know, if it is a regression, as it was the first time I tried
> > to run a Linux kernel built with rcutorture on real hardware.
> I tried to add some debug statements to the kernel to locate the bug
> more accurately, you can try it when you're not busy in the future,
> or just ignore it if the following patch looks not very effective ;-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1baab07820f6..969ac7c540cc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> * Prevent userspace races by waiting until the network
> * device is fully setup before sending notifications.
> */
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> if (!dev->rtnl_link_ops ||
> dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
>
> if (rtnl_lock_killable())
> return -EINTR;
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> err = register_netdevice(dev);
> rtnl_unlock();
> return err;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index e476403231f0..e08986ae6238 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> net_device *dev)
> if (!ops)
> return 0;
>
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> + ops->kind, __FUNCTION__);
> size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
>
> @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> net_device *dev)
> static noinline size_t if_nlmsg_size(const struct net_device *dev,
> u32 ext_filter_mask)
> {
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> struct net_device *dev,
> struct net *net = dev_net(dev);
> struct sk_buff *skb;
> int err = -ENOBUFS;
> -
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> if (skb == NULL)
> goto errout;
> @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
>
> if (dev->reg_state != NETREG_REGISTERED)
> return;
> -
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> new_ifindex);
> if (skb)
> @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
> void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> gfp_t flags)
> {
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> NULL, 0);
> }
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index c0b138c20992..fa5b2725811c 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> * Allowing to move it to another netns is clearly unsafe.
> */
> sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> -
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> + sitn->fb_tunnel_dev->rtnl_link_ops,
> + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> err = register_netdev(sitn->fb_tunnel_dev);
> if (err)
> goto err_reg_dev;
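Two details of the debug patch above:

* Each printk passes the `rtnl_link_ops` pointer to `%lx`, which expects an `unsigned long`, so -Wformat will likely warn. The kernel's `%px` prints a raw pointer value, while plain `%p` prints a hashed one.
* The kernel's `%s` already prints "(null)" for a NULL string, so the trace line itself survives a NULL `kind`. Dereferencing a garbage `ops` pointer would still fault inside the printk, though.

The userspace sketch below shows the same trace format with those points applied, under these assumptions:

* `struct fake_link_ops` and `format_kind_trace()` are hypothetical names.
* `%p` stands in for the kernel's `%px`.
* The NULL guard is needed because userspace printf() makes no promise about a NULL `%s` argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct rtnl_link_ops; only 'kind' is modelled. */
struct fake_link_ops {
	const char *kind;
};

/*
 * Format one trace line in the style of the proposed printk():
 *   "<ops pointer> IFLA_INFO_KIND <kind> <caller>"
 * The pointer goes through %p instead of %lx, and a NULL kind is replaced
 * by "(null)" before it reaches %s. Returns snprintf()'s result.
 */
static int format_kind_trace(char *buf, size_t len,
			     const struct fake_link_ops *ops,
			     const char *caller)
{
	assert(buf != NULL && len > 0);

	if (!ops)
		return snprintf(buf, len, "(no ops) IFLA_INFO_KIND - %s", caller);

	return snprintf(buf, len, "%p IFLA_INFO_KIND %s %s", (const void *)ops,
			ops->kind ? ops->kind : "(null)", caller);
}
```

For example, suppose sit_init_net() calls `format_kind_trace(buf, sizeof(buf), ops, __func__)` with kind "sit". The resulting line ends in "IFLA_INFO_KIND sit sit_init_net".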
> >
> > >>> Hope I can be of more helpful.
> > >>
> > >> Some distributions support multi-arch, so they easily allow
> > >> crosscompiling for different architectures.
> > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > to explore it.
> >
> > Oh, that does not sound good. But I have not tried that in a long time
> > either. It’s a separate issue, but maybe some of the PPC
> > maintainers/folks could help.
> I will do further research on this later.
>
> Thanks for your time
> Kind regards
> Zhouyi
> >
> >
> > Kind regards,
> >
> > Paul
Thank you, Paul, for joining us!
On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <[email protected]> wrote:
>
> On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > Dear Paul
> >
> > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
> > >
> > > Dear Zhouyi,
> > >
> > >
> > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > >
> > > > Thank you for your instructions, I learned a lot from this process.
> > >
> > > Same on my end.
> > >
> > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> > >
> > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > >>
> > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > >>> x86_64 kvm virtual machine.
> > > >>
> > > >> No idea, if it’s architecture specific.
> > > >>
> > > >>> I saw the panic is caused by registration of sit device (A sit device
> > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > >>> over the IPv4 Internet to another host)
> > > >>>
> > > >>> sit device is registered in function sit_init_net:
> > > >>> 1895 static int __net_init sit_init_net(struct net *net)
> > > >>> 1896 {
> > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id);
> > > >>> 1898 struct ip_tunnel *t;
> > > >>> 1899 int err;
> > > >>> 1900
> > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc;
> > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l;
> > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r;
> > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l;
> > > >>> 1905
> > > >>> 1906 if (!net_has_fallback_tunnels(net))
> > > >>> 1907 return 0;
> > > >>> 1908
> > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > >>> 1910 NET_NAME_UNKNOWN,
> > > >>> 1911 ipip6_tunnel_setup);
> > > >>> 1912 if (!sitn->fb_tunnel_dev) {
> > > >>> 1913 err = -ENOMEM;
> > > >>> 1914 goto err_alloc_dev;
> > > >>> 1915 }
> > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net);
> > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns.
> > > >>> 1919 * Allowing to move it to another netns is clearly unsafe.
> > > >>> 1920 */
> > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > >>> 1922
> > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev);
> > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > >>>
> > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> > > >>> (gdb) disassemble if_nlmsg_size
> > > >>> Dump of assembler code for function if_nlmsg_size:
> > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
> > > >>> 0xffffffff81a0dc25 <+5>: push %rbp
> > > >>> 0xffffffff81a0dc26 <+6>: push %r15
> > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
> > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
> > > >>> ...
> > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
> > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
> > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
> > > >>
> > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > >> current build (without rcutorture) I have the line below, where strlen
> > > >> shows up.
> > > >>
> > > >> (gdb) disassemble if_nlmsg_size
> > > >> […]
> > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
> > > >> […]
> > > >>
> > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> > > >>> 516 {
> > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > >>> 518 size_t size;
> > > >>> 519
> > > >>> 520 if (!ops)
> > > >>> 521 return 0;
> > > >>> 522
> > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > >>
> > > >> How do I connect the disassemby output with the corresponding line?
> > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > >
> > > > gdb-multiarch ./vmlinux
> > > > (gdb)disassemble if_nlmsg_size
> > > > [...]
> > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > [...]
> > > > (gdb) break *0xc00000000191bf40
> > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > >
> > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > 1110static inline int nla_total_size(int payload)
> > > > 1111{
> > > > 1112 return NLA_ALIGN(nla_attr_size(payload));
> > > > 1113}
> > > > This may be due to the compiler wrongly encode the debug information, I guess.
> > >
> > > `rtnl_link_get_size()` contains:
> > >
> > > size = nla_total_size(sizeof(struct nlattr)) + /*
> > > IFLA_LINKINFO */
> > > nla_total_size(strlen(ops->kind) + 1); /*
> > > IFLA_INFO_KIND */
> > >
> > > Is that inlined(?) and the code at fault?
> > Yes, that is inlined! because
> > (gdb) disassemble if_nlmsg_size
> > Dump of assembler code for function if_nlmsg_size:
> > [...]
> > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> > 0xc00000000191bf3c <+108>: ld r3,16(r31)
> > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > [...]
> > (gdb)
> > (gdb) break *0xc00000000191bf40
> > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > (gdb) break *0xc00000000191bf38
> > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
>
> I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> already doing so. That gives gdb a lot more information about things
> like inlining.
I checked my .config file, and CONFIG_DEBUG_INFO=y is already set:
linux-next$ grep CONFIG_DEBUG_INFO .config
CONFIG_DEBUG_INFO=y
Then I ran "make clean" and rebuilt the kernel, but gdb behaves the
same on the new vmlinux, sorry about that.
I will try to reproduce the bug on my bare-metal x86_64 machines in
the coming days, and will also work with Mr. Menzel once he is back
in the office.
Thanks
Zhouyi
>
> Thanx, Paul
>
> > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > >>> line 1917, so I guess something must happened between the calls.
> > > >>>
> > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > > >>> happened in between?
> > > >>
> > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > > >> see. From `arch/powerpc/Kconfig`:
> > > >>
> > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > > >>
> > > > en, agree, I invoke "make menuconfig ARCH=powerpc
> > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > > the bug by bisecting instead.
> > >
> > > I do not know, if it is a regression, as it was the first time I tried
> > > to run a Linux kernel built with rcutorture on real hardware.
> > I tried to add some debug statements to the kernel to locate the bug
> > more accurately, you can try it when you're not busy in the future,
> > or just ignore it if the following patch looks not very effective ;-)
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1baab07820f6..969ac7c540cc 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > * Prevent userspace races by waiting until the network
> > * device is fully setup before sending notifications.
> > */
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > if (!dev->rtnl_link_ops ||
> > dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> >
> > if (rtnl_lock_killable())
> > return -EINTR;
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > err = register_netdevice(dev);
> > rtnl_unlock();
> > return err;
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index e476403231f0..e08986ae6238 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > net_device *dev)
> > if (!ops)
> > return 0;
> >
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > + ops->kind, __FUNCTION__);
> > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> >
> > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > net_device *dev)
> > static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > u32 ext_filter_mask)
> > {
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > struct net_device *dev,
> > struct net *net = dev_net(dev);
> > struct sk_buff *skb;
> > int err = -ENOBUFS;
> > -
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > if (skb == NULL)
> > goto errout;
> > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > net_device *dev,
> >
> > if (dev->reg_state != NETREG_REGISTERED)
> > return;
> > -
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > new_ifindex);
> > if (skb)
> > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > net_device *dev,
> > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > gfp_t flags)
> > {
> > + if (dev->rtnl_link_ops)
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > NULL, 0);
> > }
> > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > index c0b138c20992..fa5b2725811c 100644
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > * Allowing to move it to another netns is clearly unsafe.
> > */
> > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > -
> > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > + sitn->fb_tunnel_dev->rtnl_link_ops,
> > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > err = register_netdev(sitn->fb_tunnel_dev);
> > if (err)
> > goto err_reg_dev;
> > >
> > > >>> Hope I can be of more helpful.
> > > >>
> > > >> Some distributions support multi-arch, so they easily allow
> > > >> crosscompiling for different architectures.
> > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > to explore it.
> > >
> > > Oh, that does not sound good. But I have not tried that in a long time
> > > either. It’s a separate issue, but maybe some of the PPC
> > > maintainers/folks could help.
> > I will do further research on this later.
> >
> > Thanks for your time
> > Kind regards
> > Zhouyi
> > >
> > >
> > > Kind regards,
> > >
> > > Paul
On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> Thank Paul for joining us!
>
> On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > Dear Paul
> > >
> > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
> > > >
> > > > Dear Zhouyi,
> > > >
> > > >
> > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > >
> > > > > Thank you for your instructions, I learned a lot from this process.
> > > >
> > > > Same on my end.
> > > >
> > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> > > >
> > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > >>
> > > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > > >>> x86_64 kvm virtual machine.
> > > > >>
> > > > >> No idea, if it’s architecture specific.
> > > > >>
> > > > >>> I saw the panic is caused by registration of sit device (A sit device
> > > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > >>> over the IPv4 Internet to another host)
> > > > >>>
> > > > >>> sit device is registered in function sit_init_net:
> > > > >>> 1895 static int __net_init sit_init_net(struct net *net)
> > > > >>> 1896 {
> > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > >>> 1898 struct ip_tunnel *t;
> > > > >>> 1899 int err;
> > > > >>> 1900
> > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc;
> > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l;
> > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r;
> > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > >>> 1905
> > > > >>> 1906 if (!net_has_fallback_tunnels(net))
> > > > >>> 1907 return 0;
> > > > >>> 1908
> > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > >>> 1910 NET_NAME_UNKNOWN,
> > > > >>> 1911 ipip6_tunnel_setup);
> > > > >>> 1912 if (!sitn->fb_tunnel_dev) {
> > > > >>> 1913 err = -ENOMEM;
> > > > >>> 1914 goto err_alloc_dev;
> > > > >>> 1915 }
> > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net);
> > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns.
> > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe.
> > > > >>> 1920 */
> > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > >>> 1922
> > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev);
> > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > > >>>
> > > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> > > > >>> (gdb) disassemble if_nlmsg_size
> > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
> > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp
> > > > >>> 0xffffffff81a0dc26 <+6>: push %r15
> > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
> > > > >>> ...
> > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
> > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
> > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
> > > > >>
> > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > >> shows up.
> > > > >>
> > > > >> (gdb) disassemble if_nlmsg_size
> > > > >> […]
> > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
> > > > >> […]
> > > > >>
> > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > >>> 516 {
> > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > >>> 518 size_t size;
> > > > >>> 519
> > > > >>> 520 if (!ops)
> > > > >>> 521 return 0;
> > > > >>> 522
> > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > >>
> > > > >> How do I connect the disassemby output with the corresponding line?
> > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > > >
> > > > > gdb-multiarch ./vmlinux
> > > > > (gdb)disassemble if_nlmsg_size
> > > > > [...]
> > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > > [...]
> > > > > (gdb) break *0xc00000000191bf40
> > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > >
> > > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > > 1110static inline int nla_total_size(int payload)
> > > > > 1111{
> > > > > 1112 return NLA_ALIGN(nla_attr_size(payload));
> > > > > 1113}
> > > > > This may be due to the compiler wrongly encode the debug information, I guess.
> > > >
> > > > `rtnl_link_get_size()` contains:
> > > >
> > > > size = nla_total_size(sizeof(struct nlattr)) + /*
> > > > IFLA_LINKINFO */
> > > > nla_total_size(strlen(ops->kind) + 1); /*
> > > > IFLA_INFO_KIND */
> > > >
> > > > Is that inlined(?) and the code at fault?
> > > Yes, that is inlined! because
> > > (gdb) disassemble if_nlmsg_size
> > > Dump of assembler code for function if_nlmsg_size:
> > > [...]
> > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> > > 0xc00000000191bf3c <+108>: ld r3,16(r31)
> > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > [...]
> > > (gdb)
> > > (gdb) break *0xc00000000191bf40
> > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > (gdb) break *0xc00000000191bf38
> > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> >
> > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > already doing so. That gives gdb a lot more information about things
> > like inlining.
> I check my .config file, CONFIG_DEBUG_INFO=y is here:
> linux-next$ grep CONFIG_DEBUG_INFO .config
> CONFIG_DEBUG_INFO=y
> Then I invoke "make clean" and rebuild the kernel, the behavior of gdb
> and vmlinux remain unchanged, sorry for that
Glad you were already on top of this one!
> I am trying to reproduce the bug on my bare metal x86_64 machines in
> the coming days, and am also trying to work with Mr Menzel after he
> comes back to the office.
This URL used to allow community members such as yourself to request
access to Power systems: https://osuosl.org/services/powerdev/
In case that helps.
Thanx, Paul
> Thanks
> Zhouyi
> >
> > Thanx, Paul
> >
> > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > > >>> line 1917, so I guess something must happened between the calls.
> > > > >>>
> > > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > > > >>> happened in between?
> > > > >>
> > > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > > > >> see. From `arch/powerpc/Kconfig`:
> > > > >>
> > > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > >>
> > > > > en, agree, I invoke "make menuconfig ARCH=powerpc
> > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > > > the bug by bisecting instead.
> > > >
> > > > I do not know, if it is a regression, as it was the first time I tried
> > > > to run a Linux kernel built with rcutorture on real hardware.
> > > I tried to add some debug statements to the kernel to locate the bug
> > > more accurately, you can try it when you're not busy in the future,
> > > or just ignore it if the following patch looks not very effective ;-)
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 1baab07820f6..969ac7c540cc 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > > * Prevent userspace races by waiting until the network
> > > * device is fully setup before sending notifications.
> > > */
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > if (!dev->rtnl_link_ops ||
> > > dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > >
> > > if (rtnl_lock_killable())
> > > return -EINTR;
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > err = register_netdevice(dev);
> > > rtnl_unlock();
> > > return err;
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index e476403231f0..e08986ae6238 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > net_device *dev)
> > > if (!ops)
> > > return 0;
> > >
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > + ops->kind, __FUNCTION__);
> > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > >
> > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > net_device *dev)
> > > static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > > u32 ext_filter_mask)
> > > {
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > struct net_device *dev,
> > > struct net *net = dev_net(dev);
> > > struct sk_buff *skb;
> > > int err = -ENOBUFS;
> > > -
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > > if (skb == NULL)
> > > goto errout;
> > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > >
> > > if (dev->reg_state != NETREG_REGISTERED)
> > > return;
> > > -
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > > new_ifindex);
> > > if (skb)
> > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > > gfp_t flags)
> > > {
> > > + if (dev->rtnl_link_ops)
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > > NULL, 0);
> > > }
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..fa5b2725811c 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > > * Allowing to move it to another netns is clearly unsafe.
> > > */
> > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > -
> > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > + sitn->fb_tunnel_dev->rtnl_link_ops,
> > > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > > err = register_netdev(sitn->fb_tunnel_dev);
> > > if (err)
> > > goto err_reg_dev;
> > > >
> > > > >>> Hope I can be of more helpful.
> > > > >>
> > > > >> Some distributions support multi-arch, so they easily allow
> > > > >> crosscompiling for different architectures.
> > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > to explore it.
> > > >
> > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > either. It’s a separate issue, but maybe some of the PPC
> > > > maintainers/folks could help.
> > > I will do further research on this later.
> > >
> > > Thanks for your time
> > > Kind regards
> > > Zhouyi
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Paul
Thank you, Paul, for your encouragement!
On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > Thank you, Paul, for joining us!
> >
> > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > Dear Paul
> > > >
> > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
> > > > >
> > > > > Dear Zhouyi,
> > > > >
> > > > >
> > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > > >
> > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > >
> > > > > Same on my end.
> > > > >
> > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> > > > >
> > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > > >>
> > > > > >>> I don't have an IBM machine, so I tried to analyze the problem using
> > > > > >>> my x86_64 KVM virtual machine, but I can't reproduce the bug there.
> > > > > >>
> > > > > >> No idea if it’s architecture-specific.
> > > > > >>
> > > > > >>> I see that the panic is caused by the registration of the sit device (a sit device
> > > > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > > >>> over the IPv4 Internet to another host)
> > > > > >>>
> > > > > >>> sit device is registered in function sit_init_net:
> > > > > >>> 1895 static int __net_init sit_init_net(struct net *net)
> > > > > >>> 1896 {
> > > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > > >>> 1898 struct ip_tunnel *t;
> > > > > >>> 1899 int err;
> > > > > >>> 1900
> > > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc;
> > > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l;
> > > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r;
> > > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > > >>> 1905
> > > > > >>> 1906 if (!net_has_fallback_tunnels(net))
> > > > > >>> 1907 return 0;
> > > > > >>> 1908
> > > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > > >>> 1910 NET_NAME_UNKNOWN,
> > > > > >>> 1911 ipip6_tunnel_setup);
> > > > > >>> 1912 if (!sitn->fb_tunnel_dev) {
> > > > > >>> 1913 err = -ENOMEM;
> > > > > >>> 1914 goto err_alloc_dev;
> > > > > >>> 1915 }
> > > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net);
> > > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns.
> > > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe.
> > > > > >>> 1920 */
> > > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > >>> 1922
> > > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev);
> > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > > > >>>
> > > > > >>> On the other hand, the function that calls the faulting strlen is if_nlmsg_size:
> > > > > >>> (gdb) disassemble if_nlmsg_size
> > > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
> > > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp
> > > > > >>> 0xffffffff81a0dc26 <+6>: push %r15
> > > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
> > > > > >>> ...
> > > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
> > > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
> > > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
> > > > > >>
> > > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > > >> shows up.
> > > > > >>
> > > > > >> (gdb) disassemble if_nlmsg_size
> > > > > >> […]
> > > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
> > > > > >> […]
> > > > > >>
> > > > > >>> and the C code for 0xffffffff81a0dd0e is the following (line 524):
> > > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > > >>> 516 {
> > > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > > >>> 518 size_t size;
> > > > > >>> 519
> > > > > >>> 520 if (!ops)
> > > > > >>> 521 return 0;
> > > > > >>> 522
> > > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > > >>
> > > > > >> How do I connect the disassembly output with the corresponding line?
> > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > > > >
> > > > > > gdb-multiarch ./vmlinux
> > > > > > (gdb)disassemble if_nlmsg_size
> > > > > > [...]
> > > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > > > [...]
> > > > > > (gdb) break *0xc00000000191bf40
> > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > >
> > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > > > 1110 static inline int nla_total_size(int payload)
> > > > > > 1111 {
> > > > > > 1112 return NLA_ALIGN(nla_attr_size(payload));
> > > > > > 1113}
> > > > > > This may be due to the compiler encoding the debug information incorrectly, I guess.
> > > > >
> > > > > `rtnl_link_get_size()` contains:
> > > > >
> > > > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > >
> > > > > Is that inlined(?) and the code at fault?
> > > > Yes, it is inlined, because:
> > > > (gdb) disassemble if_nlmsg_size
> > > > Dump of assembler code for function if_nlmsg_size:
> > > > [...]
> > > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> > > > 0xc00000000191bf3c <+108>: ld r3,16(r31)
> > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > [...]
> > > > (gdb)
> > > > (gdb) break *0xc00000000191bf40
> > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > (gdb) break *0xc00000000191bf38
> > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> > >
> > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > > already doing so. That gives gdb a lot more information about things
> > > like inlining.
> > I checked my .config file; CONFIG_DEBUG_INFO=y is set:
> > linux-next$ grep CONFIG_DEBUG_INFO .config
> > CONFIG_DEBUG_INFO=y
> > Then I ran "make clean" and rebuilt the kernel; the behavior of gdb
> > and vmlinux remains unchanged, sorry about that.
>
> Glad you were already on top of this one!
I am very pleased to contribute my tiny effort to the process of
making Linux better ;-)
>
> > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > the coming days, and am also trying to work with Mr Menzel after he
> > comes back to the office.
>
> This URL used to allow community members such as yourself to request
> access to Power systems: https://osuosl.org/services/powerdev/
I have filled in the request form at
https://osuosl.org/services/powerdev/ and am now waiting for them to deploy
the environment for me.
Thanks again
Zhouyi
>
> In case that helps.
>
> Thanx, Paul
>
> > Thanks
> > Zhouyi
> > >
> > > Thanx, Paul
> > >
> > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > > > >>> line 1917, so I guess something must have happened between the calls.
> > > > > >>>
> > > > > >>> Do we have KASAN on IBM machines? Would KASAN help us find out what
> > > > > >>> happened in between?
> > > > > >>
> > > > > >> Unfortunately, KASAN is not supported on Power, as far as I can
> > > > > >> see. From `arch/powerpc/Kconfig`:
> > > > > >>
> > > > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > >>
> > > > > > Yes, I agree. I invoked "make menuconfig ARCH=powerpc
> > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > 16" and can't find KASAN under Memory Debugging, so I guess we should find
> > > > > > the bug by bisecting instead.
> > > > >
> > > > > I do not know if it is a regression, as it was the first time I tried
> > > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > I added some debug statements to the kernel to locate the bug
> > > > more accurately. You can try them when you're not busy,
> > > > or just ignore them if the following patch doesn't look very effective ;-)
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 1baab07820f6..969ac7c540cc 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > > > * Prevent userspace races by waiting until the network
> > > > * device is fully setup before sending notifications.
> > > > */
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > if (!dev->rtnl_link_ops ||
> > > > dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > > > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > > >
> > > > if (rtnl_lock_killable())
> > > > return -EINTR;
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > err = register_netdevice(dev);
> > > > rtnl_unlock();
> > > > return err;
> > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > > index e476403231f0..e08986ae6238 100644
> > > > --- a/net/core/rtnetlink.c
> > > > +++ b/net/core/rtnetlink.c
> > > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > > net_device *dev)
> > > > if (!ops)
> > > > return 0;
> > > >
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > > + ops->kind, __FUNCTION__);
> > > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > >
> > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > > net_device *dev)
> > > > static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > > > u32 ext_filter_mask)
> > > > {
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > > > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > > > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > > struct net_device *dev,
> > > > struct net *net = dev_net(dev);
> > > > struct sk_buff *skb;
> > > > int err = -ENOBUFS;
> > > > -
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > > > if (skb == NULL)
> > > > goto errout;
> > > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > net_device *dev,
> > > >
> > > > if (dev->reg_state != NETREG_REGISTERED)
> > > > return;
> > > > -
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > > > new_ifindex);
> > > > if (skb)
> > > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > net_device *dev,
> > > > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > > > gfp_t flags)
> > > > {
> > > > + if (dev->rtnl_link_ops)
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > > > NULL, 0);
> > > > }
> > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > > index c0b138c20992..fa5b2725811c 100644
> > > > --- a/net/ipv6/sit.c
> > > > +++ b/net/ipv6/sit.c
> > > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > > > * Allowing to move it to another netns is clearly unsafe.
> > > > */
> > > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > -
> > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > > + sitn->fb_tunnel_dev->rtnl_link_ops,
> > > > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > err = register_netdev(sitn->fb_tunnel_dev);
> > > > if (err)
> > > > goto err_reg_dev;
> > > > >
> > > > > >>> Hope I can be of more help.
> > > > > >>
> > > > > >> Some distributions support multi-arch, so they easily allow
> > > > > >> crosscompiling for different architectures.
> > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > > to explore it.
> > > > >
> > > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > > either. It’s a separate issue, but maybe some of the PPC
> > > > > maintainers/folks could help.
> > > > I will do further research on this later.
> > > >
> > > > Thanks for your time
> > > > Kind regards
> > > > Zhouyi
> > > > >
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Paul
Hi Paul
Below are my preliminary test results, obtained on a PPC VM supplied by
the Open Source Lab of Oregon State University. Thank you for your
support!
[Preliminary test results on ppc64le virtual guest]
1. Conclusion
Some kernel configuration other than RCU (in these tests, CONFIG_DRM_BOCHS=m)
may lead to "BUG: Kernel NULL pointer dereference" at boot.
2. Test Environment
2.1 host hardware
8-core ppc64le virtual guest with 16 GB of RAM and a 160 GB disk
cpu : POWER9 (architected), altivec supported
clock : 2200.000000MHz
revision : 2.2 (pvr 004e 1202)
2.2 host software
Operating System: Ubuntu 20.04.3 LTS, Compiler: gcc version 9.3.0
3. Test Procedure
3.1 kernel source
next-20220203
3.2 build and boot the kernel with CONFIG_DRM_BOCHS=m and
CONFIG_RCU_TORTURE_TEST=y
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture.bochs
3.3 build and boot the kernel with CONFIG_DRM_BOCHS=m
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs
boot msg: http://154.223.142.244/Feb2022/dmesg.bochs
3.4 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=y (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture
3.5 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=m (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next
boot msg: http://154.223.142.244/Feb2022/dmesg
4. Acknowledgement
Thanks to the Open Source Lab of Oregon State University, Paul Menzel, and
all other community members who support my small research effort.
Thanks
Zhouyi
On Wed, Feb 2, 2022 at 10:39 AM Zhouyi Zhou <[email protected]> wrote:
>
> Thank you, Paul, for your encouragement!
>
> On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > > Thank you, Paul, for joining us!
> > >
> > > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > > Dear Paul
> > > > >
> > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <[email protected]> wrote:
> > > > > >
> > > > > > Dear Zhouyi,
> > > > > >
> > > > > >
> > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > > > >
> > > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > > >
> > > > > > Same on my end.
> > > > > >
> > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
> > > > > >
> > > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > > > >>
> > > > > > >>> I don't have an IBM machine, so I tried to analyze the problem using
> > > > > > >>> my x86_64 KVM virtual machine, but I can't reproduce the bug there.
> > > > > > >>
> > > > > > >> No idea if it’s architecture-specific.
> > > > > > >>
> > > > > > >>> I see that the panic is caused by the registration of the sit device (a sit device
> > > > > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > > > >>> over the IPv4 Internet to another host)
> > > > > > >>>
> > > > > > >>> sit device is registered in function sit_init_net:
> > > > > > >>> 1895 static int __net_init sit_init_net(struct net *net)
> > > > > > >>> 1896 {
> > > > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > > > >>> 1898 struct ip_tunnel *t;
> > > > > > >>> 1899 int err;
> > > > > > >>> 1900
> > > > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc;
> > > > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l;
> > > > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r;
> > > > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > > > >>> 1905
> > > > > > >>> 1906 if (!net_has_fallback_tunnels(net))
> > > > > > >>> 1907 return 0;
> > > > > > >>> 1908
> > > > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > > > >>> 1910 NET_NAME_UNKNOWN,
> > > > > > >>> 1911 ipip6_tunnel_setup);
> > > > > > >>> 1912 if (!sitn->fb_tunnel_dev) {
> > > > > > >>> 1913 err = -ENOMEM;
> > > > > > >>> 1914 goto err_alloc_dev;
> > > > > > >>> 1915 }
> > > > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net);
> > > > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns.
> > > > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe.
> > > > > > >>> 1920 */
> > > > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > > >>> 1922
> > > > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev);
> > > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > > > > >>>
> > > > > > >>> On the other hand, the function that calls the faulting strlen is if_nlmsg_size:
> > > > > > >>> (gdb) disassemble if_nlmsg_size
> > > > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
> > > > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp
> > > > > > >>> 0xffffffff81a0dc26 <+6>: push %r15
> > > > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
> > > > > > >>> ...
> > > > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
> > > > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
> > > > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
> > > > > > >>
> > > > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > > > >> shows up.
> > > > > > >>
> > > > > > >> (gdb) disassemble if_nlmsg_size
> > > > > > >> […]
> > > > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
> > > > > > >> […]
> > > > > > >>
> > > > > > >>> and the C code for 0xffffffff81a0dd0e is the following (line 524):
> > > > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > > > >>> 516 {
> > > > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > > > >>> 518 size_t size;
> > > > > > >>> 519
> > > > > > >>> 520 if (!ops)
> > > > > > >>> 521 return 0;
> > > > > > >>> 522
> > > > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > > > >>
> > > > > > >> How do I connect the disassembly output with the corresponding line?
> > > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > > > > >
> > > > > > > gdb-multiarch ./vmlinux
> > > > > > > (gdb)disassemble if_nlmsg_size
> > > > > > > [...]
> > > > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > > > > [...]
> > > > > > > (gdb) break *0xc00000000191bf40
> > > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > > >
> > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > > > > 1110 static inline int nla_total_size(int payload)
> > > > > > > 1111 {
> > > > > > > 1112 return NLA_ALIGN(nla_attr_size(payload));
> > > > > > > 1113}
> > > > > > > This may be due to the compiler encoding the debug information incorrectly, I guess.
> > > > > >
> > > > > > `rtnl_link_get_size()` contains:
> > > > > >
> > > > > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > > >
> > > > > > Is that inlined(?) and the code at fault?
> > > > > Yes, it is inlined, because:
> > > > > (gdb) disassemble if_nlmsg_size
> > > > > Dump of assembler code for function if_nlmsg_size:
> > > > > [...]
> > > > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> > > > > 0xc00000000191bf3c <+108>: ld r3,16(r31)
> > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> > > > > [...]
> > > > > (gdb)
> > > > > (gdb) break *0xc00000000191bf40
> > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > (gdb) break *0xc00000000191bf38
> > > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> > > >
> > > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > > > already doing so. That gives gdb a lot more information about things
> > > > like inlining.
> > > I checked my .config file; CONFIG_DEBUG_INFO=y is set:
> > > linux-next$ grep CONFIG_DEBUG_INFO .config
> > > CONFIG_DEBUG_INFO=y
> > > Then I ran "make clean" and rebuilt the kernel; the behavior of gdb
> > > and vmlinux remains unchanged, sorry about that.
> >
> > Glad you were already on top of this one!
> I am very pleased to contribute my tiny effort to the process of
> making Linux better ;-)
> >
> > > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > > the coming days, and am also trying to work with Mr Menzel after he
> > > comes back to the office.
> >
> > This URL used to allow community members such as yourself to request
> > access to Power systems: https://osuosl.org/services/powerdev/
> I have filled in the request form at
> https://osuosl.org/services/powerdev/ and am now waiting for them to deploy
> the environment for me.
>
> Thanks again
> Zhouyi
> >
> > In case that helps.
> >
> > Thanx, Paul
> >
> > > Thanks
> > > Zhouyi
> > > >
> > > > Thanx, Paul
> > > >
> > > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > > > > >>> line 1917, so I guess something must have happened between the calls.
> > > > > > >>>
> > > > > > >>> Do we have KASAN on IBM machines? Would KASAN help us find out what
> > > > > > >>> happened in between?
> > > > > > >>
> > > > > > >> Unfortunately, KASAN is not supported on Power, as far as I can
> > > > > > >> see. From `arch/powerpc/Kconfig`:
> > > > > > >>
> > > > > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >>
> > > > > > > Yes, I agree. I invoked "make menuconfig ARCH=powerpc
> > > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > > 16" and can't find KASAN under Memory Debugging, so I guess we should find
> > > > > > > the bug by bisecting instead.
> > > > > >
> > > > > > I do not know if it is a regression, as it was the first time I tried
> > > > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > > I added some debug statements to the kernel to locate the bug
> > > > > more accurately. You can try them when you're not busy,
> > > > > or just ignore them if the following patch doesn't look very effective ;-)
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 1baab07820f6..969ac7c540cc 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > > > > * Prevent userspace races by waiting until the network
> > > > > * device is fully setup before sending notifications.
> > > > > */
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > if (!dev->rtnl_link_ops ||
> > > > > dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > > > > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > > > >
> > > > > if (rtnl_lock_killable())
> > > > > return -EINTR;
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > err = register_netdevice(dev);
> > > > > rtnl_unlock();
> > > > > return err;
> > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > > > index e476403231f0..e08986ae6238 100644
> > > > > --- a/net/core/rtnetlink.c
> > > > > +++ b/net/core/rtnetlink.c
> > > > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > > > net_device *dev)
> > > > > if (!ops)
> > > > > return 0;
> > > > >
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > > > + ops->kind, __FUNCTION__);
> > > > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > > > >
> > > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > > > net_device *dev)
> > > > > static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > > > > u32 ext_filter_mask)
> > > > > {
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > > > > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > > > > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > > > struct net_device *dev,
> > > > > struct net *net = dev_net(dev);
> > > > > struct sk_buff *skb;
> > > > > int err = -ENOBUFS;
> > > > > -
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > > > > if (skb == NULL)
> > > > > goto errout;
> > > > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > >
> > > > > if (dev->reg_state != NETREG_REGISTERED)
> > > > > return;
> > > > > -
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > > > > new_ifindex);
> > > > > if (skb)
> > > > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > > > > gfp_t flags)
> > > > > {
> > > > > + if (dev->rtnl_link_ops)
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > + dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > > > > NULL, 0);
> > > > > }
> > > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > > > index c0b138c20992..fa5b2725811c 100644
> > > > > --- a/net/ipv6/sit.c
> > > > > +++ b/net/ipv6/sit.c
> > > > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > > > > * Allowing to move it to another netns is clearly unsafe.
> > > > > */
> > > > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > -
> > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > > > + sitn->fb_tunnel_dev->rtnl_link_ops,
> > > > > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > > err = register_netdev(sitn->fb_tunnel_dev);
> > > > > if (err)
> > > > > goto err_reg_dev;
> > > > > >
> > > > > > >>> Hope I can be of more help.
> > > > > > >>
> > > > > > >> Some distributions support multi-arch, so they easily allow
> > > > > > >> crosscompiling for different architectures.
> > > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > > > to explore it.
> > > > > >
> > > > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > > > either. It’s a separate issue, but maybe some of the PPC
> > > > > > maintainers/folks could help.
> > > > > I will do further research on this later.
> > > > >
> > > > > Thanks for your time
> > > > > Kind regards
> > > > > Zhouyi
> > > > > >
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Paul
[Cc: +LLVM/clang build support folks]
Dear Zhouyi, dear Nathan, dear Nick,
To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with
*llvm* and *clang* 1:13.0-53~exp1
$ clang --version
Ubuntu clang version 13.0.0-2
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
results in a segmentation fault, while it works when building with GCC.
$ gcc --version
gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
```
[…]
[ T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
[ T1] NET: Registered PF_INET6 protocol family
[ T1] Segment Routing with IPv6
[ T1] In-situ OAM (IOAM) with IPv6
[ T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
[ T1] Faulting instruction address: 0xc0000000008e2400
[ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[ T1] Modules linked in:
[ T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00032-gdd81e1c7d5fb #29
[ T1] NIP: c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
[ T1] REGS: c0000000125033e0 TRAP: 0380 Not tainted (5.17.0-rc1-00032-gdd81e1c7d5fb)
[ T1] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42800c40 XER: 00000000
[ T1] CFAR: c000000000d65dac IRQMASK: 0
[ T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600 0000000000000000
[ T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000 0000000000000cc0
[ T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000001
[ T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478 0000000000000000
[ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0 0000000000000000
[ T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000 0000000000000000
[ T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000 c000000012503680
[ T1] NIP [c0000000008e2400] strlen+0x10/0x30
[ T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
[ T1] Call Trace:
[ T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0 (unreliable)
[ T1] [c0000000125036f0] [c000000000d65b40] rtmsg_ifinfo_build_skb+0x80/0x1a0
[ T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
[ T1] [c000000012503800] [c000000000d4de50] register_netdevice+0x690/0x770
[ T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
[ T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
[ T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
[ T1] [c000000012503970] [c000000000d331bc] register_pernet_operations+0xec/0x1e0
[ T1] [c0000000125039d0] [c000000000d33440] register_pernet_device+0x60/0xd0
[ T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
[ T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
[ T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
[ T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
[ T1] [c000000012503d40] [c000000002005c7c] kernel_init_freeable+0x160/0x1ec
[ T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
[ T1] [c000000012503e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[ T1] Instruction dump:
[ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[ T1] ---[ end trace 0000000000000000 ]---
[ T1]
[ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[…]
```
On 30.01.22 at 14:24, Zhouyi Zhou wrote:
> On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel wrote:
>> On 30.01.22 at 01:21, Zhouyi Zhou wrote:
>>
>>> Thank you for your instructions, I learned a lot from this process.
>>
>> Same on my end.
>>
>>> On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <[email protected]> wrote:
>>
>>>> On 29.01.22 at 03:23, Zhouyi Zhou wrote:
>>>>
>>>>> I don't have an IBM machine, so I tried to analyze the problem using
>>>>> my x86_64 KVM virtual machine, but I can't reproduce the bug there.
>>>>
>>>> No idea if it’s architecture-specific.
>>>>
>>>>> I saw that the panic is caused by the registration of the sit device (a
>>>>> sit device is a type of virtual network device that takes IPv6 traffic,
>>>>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
>>>>> over the IPv4 Internet to another host).
>>>>>
>>>>> The sit device is registered in the function sit_init_net:
>>>>> 1895 static int __net_init sit_init_net(struct net *net)
>>>>> 1896 {
>>>>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id);
>>>>> 1898 struct ip_tunnel *t;
>>>>> 1899 int err;
>>>>> 1900
>>>>> 1901 sitn->tunnels[0] = sitn->tunnels_wc;
>>>>> 1902 sitn->tunnels[1] = sitn->tunnels_l;
>>>>> 1903 sitn->tunnels[2] = sitn->tunnels_r;
>>>>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l;
>>>>> 1905
>>>>> 1906 if (!net_has_fallback_tunnels(net))
>>>>> 1907 return 0;
>>>>> 1908
>>>>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
>>>>> 1910 NET_NAME_UNKNOWN,
>>>>> 1911 ipip6_tunnel_setup);
>>>>> 1912 if (!sitn->fb_tunnel_dev) {
>>>>> 1913 err = -ENOMEM;
>>>>> 1914 goto err_alloc_dev;
>>>>> 1915 }
>>>>> 1916 dev_net_set(sitn->fb_tunnel_dev, net);
>>>>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
>>>>> 1918 /* FB netdevice is special: we have one, and only one per netns.
>>>>> 1919 * Allowing to move it to another netns is clearly unsafe.
>>>>> 1920 */
>>>>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>>>>> 1922
>>>>> 1923 err = register_netdev(sitn->fb_tunnel_dev);
>>>>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
>>>>>
>>>>> On the other hand, the function whose strlen call panicked is if_nlmsg_size:
>>>>> (gdb) disassemble if_nlmsg_size
>>>>> Dump of assembler code for function if_nlmsg_size:
>>>>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1)
>>>>> 0xffffffff81a0dc25 <+5>: push %rbp
>>>>> 0xffffffff81a0dc26 <+6>: push %r15
>>>>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512>
>>>>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi
>>>>> ...
>>>>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen>
>>>>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax
>>>>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12
>>>>
>>>> Excuse my ignorance, would that look the same for ppc64le?
>>>> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
>>>> current build (without rcutorture) I have the line below, where strlen
>>>> shows up.
>>>>
>>>> (gdb) disassemble if_nlmsg_size
>>>> […]
>>>> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen>
>>>> […]
>>>>
>>>>> and the C code for 0xffffffff81a0dd0e is the following (line 524):
>>>>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
>>>>> 516 {
>>>>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
>>>>> 518 size_t size;
>>>>> 519
>>>>> 520 if (!ops)
>>>>> 521 return 0;
>>>>> 522
>>>>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>>>>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
>>>>
>>>> How do I connect the disassembly output with the corresponding line?
>>> I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
>>> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
>>> for powerpc64le on my Ubuntu 20.04 x86_64 machine.
>>>
>>> gdb-multiarch ./vmlinux
>>> (gdb)disassemble if_nlmsg_size
>>> [...]
>>> 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
>>> [...]
>>> (gdb) break *0xc00000000191bf40
>>> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
>>>
>>> But in include/net/netlink.h:1112, I can't find the call to strlen
>>> 1110static inline int nla_total_size(int payload)
>>> 1111{
>>> 1112 return NLA_ALIGN(nla_attr_size(payload));
>>> 1113}
>>> I guess this may be due to the compiler encoding the debug information wrongly.
>>
>> `rtnl_link_get_size()` contains:
>>
>> size = nla_total_size(sizeof(struct nlattr)) + /*
>> IFLA_LINKINFO */
>> nla_total_size(strlen(ops->kind) + 1); /*
>> IFLA_INFO_KIND */
>>
>> Is that inlined(?) and the code at fault?
> Yes, that is inlined, because:
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
> [...]
> 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800>
> 0xc00000000191bf3c <+108>: ld r3,16(r31)
> 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen>
> [...]
> (gdb)
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> (gdb) break *0xc00000000191bf38
> Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
>
>>
>>>>> But ops is assigned the address of sit_link_ops in function sit_init_net
>>>>> at line 1917, so I guess something must have happened between the calls.
>>>>>
>>>>> Do we have KASAN on IBM machines? Would KASAN help us find out what
>>>>> happened in between?
>>>>
>>>> Unfortunately, as far as I can see, KASAN is not supported on the
>>>> 64-bit Power system I have. From `arch/powerpc/Kconfig`:
>>>>
>>>> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
>>>> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
>>>>
>>> Yes, agreed. When I invoke "make menuconfig ARCH=powerpc
>>> CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
>>> 16", I can't find KASAN under Memory Debugging. I guess we should find
>>> the bug by bisecting instead.
>>
>> I do not know if it is a regression, as it was the first time I tried
>> to run a Linux kernel built with rcutorture on real hardware.
> I added some debug statements to the kernel to locate the bug more
> precisely. You can try them when you're not busy, or just ignore them
> if the following patch doesn't look very useful ;-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1baab07820f6..969ac7c540cc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> * Prevent userspace races by waiting until the network
> * device is fully setup before sending notifications.
> */
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> if (!dev->rtnl_link_ops ||
> dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
>
> if (rtnl_lock_killable())
> return -EINTR;
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> err = register_netdevice(dev);
> rtnl_unlock();
> return err;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index e476403231f0..e08986ae6238 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> net_device *dev)
Google Mail unfortunately wraps lines, so it’s better to attach patches.
> if (!ops)
> return 0;
>
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> + ops->kind, __FUNCTION__);
> size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
>
> @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> net_device *dev)
> static noinline size_t if_nlmsg_size(const struct net_device *dev,
> u32 ext_filter_mask)
> {
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> struct net_device *dev,
> struct net *net = dev_net(dev);
> struct sk_buff *skb;
> int err = -ENOBUFS;
> -
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> if (skb == NULL)
> goto errout;
> @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
>
> if (dev->reg_state != NETREG_REGISTERED)
> return;
> -
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> new_ifindex);
> if (skb)
> @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
> void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> gfp_t flags)
> {
> + if (dev->rtnl_link_ops)
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> + dev->rtnl_link_ops->kind, __FUNCTION__);
> rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> NULL, 0);
> }
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index c0b138c20992..fa5b2725811c 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> * Allowing to move it to another netns is clearly unsafe.
> */
> sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> -
> + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> + sitn->fb_tunnel_dev->rtnl_link_ops,
> + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> err = register_netdev(sitn->fb_tunnel_dev);
> if (err)
> goto err_reg_dev;
Thank you for the diff. I *am* also able to reproduce the crash in a
QEMU/KVM virtual machine. The config and Linux log are attached. Here is
the excerpt with your added messages:
```
$ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net
none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
stdio -m 512 -kernel /dev/shm/linux/vmlinux -append
"debug_boot_weak_hash panic=-1 console=ttyS0
rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot
rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1
rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1
rcutorture.verbose=1"
[…]
[ 0.445514][ T1] c00000000295c988 IFLA_INFO_KIND ipip register_netdevice
[ 0.446330][ T1] c00000000295c988 IFLA_INFO_KIND ipip rtmsg_ifinfo
[ 0.447107][ T1] c00000000295c988 IFLA_INFO_KIND ipip rtmsg_ifinfo_event
[ 0.447935][ T1] c00000000295c988 IFLA_INFO_KIND ipip rtmsg_ifinfo_build_skb
[ 0.448789][ T1] c00000000295c988 IFLA_INFO_KIND ipip if_nlmsg_size
[ 0.449563][ T1] c00000000295c988 IFLA_INFO_KIND ipip rtnl_link_get_size
[ 0.450497][ T1] NET: Registered PF_INET6 protocol family
[ 0.451402][ T1] Segment Routing with IPv6
[ 0.451922][ T1] In-situ OAM (IOAM) with IPv6
[ 0.452480][ T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 0.453259][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) sit_init_net
[ 0.454035][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) register_netdev
[ 0.454939][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) register_netdevice
[ 0.455780][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) rtmsg_ifinfo
[ 0.456563][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) rtmsg_ifinfo_event
[ 0.457409][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) rtmsg_ifinfo_build_skb
[ 0.458288][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) if_nlmsg_size
[ 0.459085][ T1] c00000000295cfd8 IFLA_INFO_KIND (null) rtnl_link_get_size
[ 0.459921][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.460766][ T1] Faulting instruction address: 0xc00000000090b640
[ 0.461513][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.462225][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA pSeries
[ 0.463108][ T1] Modules linked in:
[ 0.463549][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00219-g43a6dd55dd9d #28
[ 0.464584][ T1] NIP: c00000000090b640 LR: c000000000d9785c CTR: 0000000000000000
[ 0.465499][ T1] REGS: c0000000073e32b0 TRAP: 0380 Not tainted (5.17.0-rc4-00219-g43a6dd55dd9d)
[ 0.466581][ T1] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 22800c47 XER: 00000000
[ 0.467642][ T1] CFAR: c000000000d97858 IRQMASK: 0
[ 0.467642][ T1] GPR00: c000000000d97850 c0000000073e3550 c000000002919d00 0000000000000000
[ 0.467642][ T1] GPR04: ffffffffffffffff ffffffffff1e7ef8 ffffffffff1e9984 c00000000267ae88
[ 0.467642][ T1] GPR08: 0000000000000003 0000000000000004 c00000000267ae88 0000000000000000
[ 0.467642][ T1] GPR12: 0000000000880000 c000000002ac0000 c000000000012518 0000000000000000
[ 0.467642][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.467642][ T1] GPR20: 0000000000000000 c00000000281dc00 0000000000000000 0000000000000cc0
[ 0.467642][ T1] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.467642][ T1] GPR28: c00000000295cfd8 c0000000079d3000 0000000000000000 c0000000073e3550
[ 0.476429][ T1] NIP [c00000000090b640] strlen+0x10/0x30
[ 0.477085][ T1] LR [c000000000d9785c] if_nlmsg_size+0x2dc/0x3b0
[ 0.477822][ T1] Call Trace:
[ 0.478190][ T1] [c0000000073e3550] [c000000000d97850] if_nlmsg_size+0x2d0/0x3b0 (unreliable)
[ 0.479226][ T1] [c0000000073e3600] [c000000000d9743c] rtmsg_ifinfo_build_skb+0x8c/0x1d0
[ 0.480210][ T1] [c0000000073e36c0] [c000000000d98298] rtmsg_ifinfo+0x88/0x130
[ 0.481086][ T1] [c0000000073e3750] [c000000000d7e118] register_netdevice+0x5c8/0x690
[ 0.482037][ T1] [c0000000073e37e0] [c000000000d7e578] register_netdev+0x58/0xb0
[ 0.482946][ T1] [c0000000073e3850] [c000000000f83ad0] sit_init_net+0x150/0x1a0
[ 0.483838][ T1] [c0000000073e38d0] [c000000000d6469c] ops_init+0x13c/0x1b0
[ 0.484691][ T1] [c0000000073e3930] [c000000000d63c4c] register_pernet_operations+0xec/0x1e0
[ 0.485714][ T1] [c0000000073e3990] [c000000000d63ed0] register_pernet_device+0x60/0xd0
[ 0.486689][ T1] [c0000000073e39e0] [c000000002085228] sit_init+0x54/0x160
[ 0.487530][ T1] [c0000000073e3a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
[ 0.488455][ T1] [c0000000073e3c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
[ 0.489389][ T1] [c0000000073e3cc0] [c00000000200604c] do_initcalls+0x84/0xe4
[ 0.490260][ T1] [c0000000073e3d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
[ 0.491236][ T1] [c0000000073e3da0] [c00000000001254c] kernel_init+0x3c/0x270
[ 0.492108][ T1] [c0000000073e3e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[ 0.493078][ T1] Instruction dump:
[ 0.493509][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[ 0.494524][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[ 0.495542][ T1] ---[ end trace 0000000000000000 ]---
```
[…]
Kind regards,
Paul
Hi Paul,
On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> [Cc: +LLVM/clang build support folks]
>
>
> Dear Zhouyi, dear Nathan, dear Nick,
>
>
> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
> and *clang* 1:13.0-53~exp1
>
> $ clang --version
> Ubuntu clang version 13.0.0-2
> Target: powerpc64le-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
>
> results in a segmentation fault, while it works when building with GCC.
>
> $ gcc --version
> gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Thank you for keying us in. I am going to have a bit of a brain dump
here based on the information I have uncovered after a couple of hours
of debugging.
TL;DR: It seems like something is broken with __read_mostly + ld.lld
before 14.0.0.
My initial reproduction steps (boot-qemu.sh comes from
https://github.com/ClangBuiltLinux/boot-utils):
$ clang --version
clang version 13.0.1 (Fedora 13.0.1-1.fc37)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ powerpc64le-linux-gnu-as --version
GNU assembler version 2.37-2.fc36
Copyright (C) 2021 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `powerpc64le-linux-gnu'.
$ curl -LSso .config https://lore.kernel.org/all/[email protected]/3-linux-5.17-rc4-rcu-dev-config.txt
$ scripts/config --set-val INITRAMFS_SOURCE '""'
$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
$ boot-qemu.sh -a ppc64le -k . -t 45s
QEMU location: /usr/bin
QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
+ timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
/home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
/home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
-machine powernv8 -display none -kernel \
/home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
-nodefaults -serial mon:stdio
...
[ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0
[ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[ 1.480853][ T1] Modules linked in:
[ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
[ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
[ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f)
[ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000
[ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0
[ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
[ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
[ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
[ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
[ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
[ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
[ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
[ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30
[ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
[ 1.491319][ T1] Call Trace:
[ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
[ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
[ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
[ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
[ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
[ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
[ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
[ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
[ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
[ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
[ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
[ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
[ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
[ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
[ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
[ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[ 1.501721][ T1] Instruction dump:
[ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
...
The first thing was figuring out where the NULL pointer dereference happens,
which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
static size_t rtnl_link_get_size(const struct net_device *dev)
{
	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
	size_t size;

	if (!ops)
		return 0;

	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
which I confirmed with some really rudimentary printk debugging:
[ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 710da8a36729..c8d928e83aec 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
if (!ops)
return 0;
+ pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
+ dev->name, ops, ops->kind);
+
size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
Okay... how did sit0 end up with a NULL kind...? It is very clearly
defined as "sit":
static struct rtnl_link_ops sit_link_ops __read_mostly = {
	.kind = "sit",
Adding some more debug prints to net/ipv6/sit.c:
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..7b9edbed2fcd 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
*/
sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
+ pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
+ pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
+ pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
+ pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
+ pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
+
err = register_netdev(sitn->fb_tunnel_dev);
if (err)
goto err_reg_dev;
reveals:
[ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
[ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
[ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
[ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
This is super bizarre: the maxtype member appears to have the correct
value, so how is kind's initializer getting dropped on the floor?
Removing the __read_mostly annotation "fixes" it:
[ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
[ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
[ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
...
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7b9edbed2fcd..f109c7a0233b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
static void ipip6_dev_free(struct net_device *dev);
static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
__be32 *v4dst);
-static struct rtnl_link_ops sit_link_ops __read_mostly;
+static struct rtnl_link_ops sit_link_ops;
static unsigned int sit_net_id __read_mostly;
struct sit_net {
@@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
unregister_netdevice_queue(dev, head);
}
-static struct rtnl_link_ops sit_link_ops __read_mostly = {
+static struct rtnl_link_ops sit_link_ops = {
.kind = "sit",
.maxtype = IFLA_IPTUN_MAX,
.policy = ipip6_policy,
Switching to ld.bfd also resolves it:
[ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
[ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
[ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31 ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
...
I tested with ToT LLVM (or at least close to it, since there is an
unrelated ld.lld regression there) and could not reproduce it, so I did
a reverse bisect to see which commit fixes this issue in LLVM 14, and I
landed on:
commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
Author: Fangrui Song <[email protected]>
Date: Thu Nov 25 14:12:34 2021 -0800
[ELF] Simplify DynamicSection content computation. NFC
The new code computes the content twice, but avoides the tricky
std::function<uint64_t()>. Removed 13KiB code in a Release build.
lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
lld/ELF/SyntheticSections.h | 12 +----
2 files changed, 44 insertions(+), 85 deletions(-)
That's... interesting, given that the commit title says "No Functional
Change" (NFC), even though there clearly is one. That commit has a couple
of mentions of PowerPC synthetic sections, so is it possible that the
new content calculation lines up with ld.bfd?
I am not really sure where to go from here, as I don't fully understand
what the problem was before that LLD change. I'll see if I can do some
more investigation tomorrow (unless someone wants to beat me to it ;)
Cheers,
Nathan
[Cc: +Fangrui]
Dear Nathan,
On 17.02.22 at 02:16, Nathan Chancellor wrote:
> On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
>> [Cc: +LLVM/clang build support folks]
[…]
>> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
>> and *clang* 1:13.0-53~exp1
>>
>> $ clang --version
>> Ubuntu clang version 13.0.0-2
>> Target: powerpc64le-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /usr/bin
>>
>> results in a segmentation fault, while it works when building with GCC.
>>
>> $ gcc --version
>> gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
>
> Thank you for keying us in. I am going to have a bit of a brain dump
> here based on the information I have uncovered after a couple of hours
> of debugging.
>
> TL;DR: It seems like something is broken with __read_mostly + ld.lld
> before 14.0.0.
>
> My initial reproduction steps (boot-qemu.sh comes from
> https://github.com/ClangBuiltLinux/boot-utils):
>
> $ clang --version
> clang version 13.0.1 (Fedora 13.0.1-1.fc37)
> Target: x86_64-redhat-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
>
> $ powerpc64le-linux-gnu-as --version
> GNU assembler version 2.37-2.fc36
> Copyright (C) 2021 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or later.
> This program has absolutely no warranty.
> This assembler was configured for a target of `powerpc64le-linux-gnu'.
>
> $ curl -LSso .config https://lore.kernel.org/all/[email protected]/3-linux-5.17-rc4-rcu-dev-config.txt
>
> $ scripts/config --set-val INITRAMFS_SOURCE '""'
>
> $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
>
> $ boot-qemu.sh -a ppc64le -k . -t 45s
> QEMU location: /usr/bin
>
> QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
>
> + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
> ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
> -machine powernv8 -display none -kernel \
> /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
> -nodefaults -serial mon:stdio
> ...
> [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0
> [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> [ 1.480853][ T1] Modules linked in:
> [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f)
> [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000
> [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0
> [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30
> [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> [ 1.491319][ T1] Call Trace:
> [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> [ 1.501721][ T1] Instruction dump:
> [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> [ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
> ...
>
> The first thing was figuring out where the NULL pointer dereference happens,
> which appears to be the "strlen(ops->kind)" call in rtnl_link_get_size():
>
> static size_t rtnl_link_get_size(const struct net_device *dev)
> {
>     const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
>     size_t size;
>
>     if (!ops)
>         return 0;
>
>     size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>            nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>
> which I confirmed with some really rudimentary printk debugging:
>
> [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 710da8a36729..c8d928e83aec 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> if (!ops)
> return 0;
>
> + pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> + dev->name, ops, ops->kind);
> +
> size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
>
>
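For illustration only, here is a rough user-space model of the first part of that size calculation. The `_model` helpers are hypothetical, simplified stand-ins for the kernel's nla_total_size() and struct nlattr (4-byte attribute header, 4-byte alignment), and the NULL check is an assumption added for the sketch: the kernel function has no such check, so a NULL kind goes straight into strlen(), which matches the fault in strlen+0x10 in the trace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct nlattr (4-byte header). */
struct nlattr_model {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Simplified copies of the netlink alignment rules (4-byte alignment). */
#define NLA_ALIGNTO_MODEL 4
#define NLA_ALIGN_MODEL(len) \
	(((len) + NLA_ALIGNTO_MODEL - 1) & ~(NLA_ALIGNTO_MODEL - 1))
#define NLA_HDRLEN_MODEL ((int)NLA_ALIGN_MODEL(sizeof(struct nlattr_model)))

/* Attribute header plus payload, rounded up to the alignment. */
int nla_total_size_model(int payload)
{
	return NLA_ALIGN_MODEL(NLA_HDRLEN_MODEL + payload);
}

/*
 * Mirrors the first statement of rtnl_link_get_size(): an IFLA_LINKINFO
 * nest header plus the IFLA_INFO_KIND string including its NUL byte.
 * Returns -1 for a NULL kind instead of calling strlen(NULL), which is
 * undefined behaviour and is what faulted in the kernel.
 */
int link_info_size_model(const char *kind)
{
	if (kind == NULL)
		return -1; /* hypothetical guard, not present in the kernel */

	return nla_total_size_model((int)sizeof(struct nlattr_model)) +
	       nla_total_size_model((int)strlen(kind) + 1);
}
```

With kind = "sit" this gives 8 + 8 = 16 bytes; with the NULL kind seen in sit0 the real code never gets that far.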
> Okay... how did sit0 end up with a NULL kind...? It is very clearly
> defined as "sit":
>
> static struct rtnl_link_ops sit_link_ops __read_mostly = {
>     .kind = "sit",
>
> Adding some more debug prints to net/ipv6/sit.c:
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index c0b138c20992..7b9edbed2fcd 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> */
> sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>
> + pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> + pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> + pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> +
> err = register_netdev(sitn->fb_tunnel_dev);
> if (err)
> goto err_reg_dev;
>
> reveals:
>
> [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
>
> This is super bizarre, as the maxtype member appears to have the correct
> value, but how is kind's initializer getting dropped on the floor?
>
> Removing the __read_mostly annotation "fixes" it:
>
> [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> ...
> Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> ...
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 7b9edbed2fcd..f109c7a0233b 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
> static void ipip6_dev_free(struct net_device *dev);
> static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
> __be32 *v4dst);
> -static struct rtnl_link_ops sit_link_ops __read_mostly;
> +static struct rtnl_link_ops sit_link_ops;
>
> static unsigned int sit_net_id __read_mostly;
> struct sit_net {
> @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> unregister_netdevice_queue(dev, head);
> }
>
> -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> +static struct rtnl_link_ops sit_link_ops = {
> .kind = "sit",
> .maxtype = IFLA_IPTUN_MAX,
> .policy = ipip6_policy,
>
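To compare linkers outside the kernel, one could try a small probe like the one below. It assumes that __read_mostly on powerpc amounts to placing the object in a ".data..read_mostly" section; the macro, struct and function names are hypothetical. It keeps the detail that kind is a pointer (so its value has to come from a relocation) while maxtype is a plain integer, which is the member that survived. A user-space program is loaded by the normal dynamic loader rather than through the kernel's boot path, so a passing result here would not by itself rule anything out.

```c
#include <assert.h>
#include <string.h>

/* Assumption: roughly what __read_mostly expands to on powerpc. */
#define read_mostly_model __attribute__((__section__(".data..read_mostly")))

/* Hypothetical two-member stand-in for struct rtnl_link_ops. */
struct link_ops_model {
	const char *kind;     /* pointer: its value comes from a relocation */
	unsigned int maxtype; /* plain integer: stored directly in the data */
};

static struct link_ops_model sit_link_ops_model read_mostly_model = {
	.kind = "sit",
	.maxtype = 20,
};

/* Returns 1 if both members still hold their static initializers. */
int sit_link_ops_model_intact(void)
{
	return sit_link_ops_model.kind != NULL &&
	       strcmp(sit_link_ops_model.kind, "sit") == 0 &&
	       sit_link_ops_model.maxtype == 20;
}
```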
> Switching to ld.bfd also resolves it:
>
> [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> [ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> ...
> Linux version 5.17.0-rc4-00001-g956f02ad5c31 ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> ...
>
> I tested with ToT LLVM (or at least, close to it, since there is an
> unrelated ld.lld regression there) and I could not reproduce it there,
> so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> and I landed on:
>
> commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> Author: Fangrui Song <[email protected]>
> Date: Thu Nov 25 14:12:34 2021 -0800
>
> [ELF] Simplify DynamicSection content computation. NFC
>
> The new code computes the content twice, but avoides the tricky
> std::function<uint64_t()>. Removed 13KiB code in a Release build.
>
> lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
> lld/ELF/SyntheticSections.h | 12 +----
> 2 files changed, 44 insertions(+), 85 deletions(-)
>
> That's... interesting, given that the commit title says No Functional
> Change, even though there clearly is one. That commit has a couple of
> mentions of PowerPC synthetic sections, so is it possible that the
> new content calculation lines up with ld.bfd?
>
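The reverse bisect described above is an ordinary binary search with the meaning of good and bad swapped: the oldest commit is known to crash and the newest is known to boot. A minimal sketch, assuming the result is monotonic (once fixed, it stays fixed); the array stands in for actually building and booting each commit, and the names are hypothetical. With git itself, `git bisect start --term-old=broken --term-new=fixed` lets the terms match this direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Reverse bisect over commits ordered oldest to newest.
 * fixed[i] stands in for "build and boot commit i: did it come up?".
 * Precondition: fixed[0] is false, fixed[n - 1] is true, and the
 * results are monotonic. Returns the index of the first fixed commit
 * and stores the number of probes in *probes.
 */
size_t first_fixed_commit(const bool *fixed, size_t n, size_t *probes)
{
	size_t lo = 0;     /* invariant: fixed[lo] is false */
	size_t hi = n - 1; /* invariant: fixed[hi] is true */

	assert(n >= 2 && !fixed[lo] && fixed[hi]);
	*probes = 0;
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		++*probes;
		if (fixed[mid])
			hi = mid;
		else
			lo = mid;
	}
	return hi;
}
```

Each probe halves the remaining range, so only about log2(n) builds and boots are needed.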
> I am not really sure where to go from here, as I don't fully understand
> what the problem was before that LLD change. I'll see if I can do some
> more investigation tomorrow (unless someone wants to beat me to it ;)
Thank you for looking into this and sharing your analysis.
I built LLVM/clang from the master branch and rebuilt Linux, but I can
still reproduce this.
$ git clone --depth=1 https://github.com/llvm/llvm-project.git
$ cd llvm-project/
$ git log --oneline
41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
$ mkdir build
$ cd build
$ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" \
    -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON \
    -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
$ make -j20
$ make -j20 clang-check
$ make install
$ /scratch/local2/llvm/bin/clang --version
clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratch/local2/llvm/bin
Then I built Linux, after `make clean`, with `/scratch/local2/llvm/bin`
in the `PATH`.
$ LLVM=1 LLVM_IAS=0 eatmydata make -j20
$ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 \
    -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi \
    -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux \
    -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
    rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot \
    rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 \
    rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 \
    rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 \
    rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 \
    rcutorture.verbose=1"
[…]
Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 ([email protected]) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon Feb 21 10:58:54 CET 2022
[…]
[ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
[ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[…]
Kind regards,
Paul
Hi Paul,
On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
> On 17.02.22 at 02:16, Nathan Chancellor wrote:
>
> > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> > > [Cc: +LLVM/clang build support folks]
>
> […]
>
> Thank you for looking into this, and sharing your analysis.
>
> I built LLVM/clang from the master branch, rebuilt, but can still reproduce
> this.
>
> $ git clone --depth=1 https://github.com/llvm/llvm-project.git
> $ cd llvm-project/
> $ git log --oneline
> 41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg
> Transforms
> $ mkdir build
> $ cd build
> $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
Since this is something related to ld.lld, not clang, this should be:
... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
> $ make -j20
> $ make -j20 clang-check
You can also run 'make check-lld' if you want.
> $ make install
> $ /scratch/local2/llvm/bin/clang --version
> clang version 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
> Target: powerpc64le-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /scratch/local2/llvm/bin
>
> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> path.
>
> $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
>
> $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net
> none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m
> 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1
> torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000
> rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000
> rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4
> rcutorture.stat_interval=15 rcutorture.shutdown_secs=420
> rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
> […]
> Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7
> ([email protected]) (clang version
> 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
^ still using ld.lld 13.0.0.
If you want to test the master branch, I would check out LLVM at
460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
a boot regression unrelated to this issue:
https://github.com/ClangBuiltLinux/linux/issues/1581
That should at least confirm that this is resolved in a newer release.
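Since the version banner records the linker, a quick check for the suspect setup could look like the sketch below. The helper names are hypothetical, and the "older than 14" cut-off simply mirrors the TL;DR earlier in the thread rather than a confirmed version range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the LLD major version named in a kernel
 * version banner such as "... (clang version ..., LLD 13.0.0) #29 SMP ...",
 * or -1 if no "LLD " token is present (for example with GNU ld).
 */
int lld_major_from_banner(const char *banner)
{
	const char *p = strstr(banner, "LLD ");
	int major;

	if (p == NULL || sscanf(p + strlen("LLD "), "%d", &major) != 1)
		return -1;
	return major;
}

/* Returns 1 if the banner names an LLD release older than 14. */
int banner_has_suspect_lld(const char *banner)
{
	int major = lld_major_from_banner(banner);

	return major >= 0 && major < 14;
}
```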
> Feb 21 10:58:54 CET 2022
> […]
> [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at
> 0x00000000
> [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
> [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> […]
I do intend to do further analysis over the next few days to figure out
exactly why the commit I mentioned above fixes the issue; then we can
look into what we should do about it in the kernel sources.
Cheers,
Nathan
Dear Nathan,
On 21.02.22 at 16:29, Nathan Chancellor wrote:
> On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
>> On 17.02.22 at 02:16, Nathan Chancellor wrote:
>>
>>> On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
>>>> [Cc: +LLVM/clang build support folks]
>>
>> […]
>>
>>>> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
>>>> and *clang* 1:13.0-53~exp1
>>>>
>>>> $ clang --version
>>>> Ubuntu clang version 13.0.0-2
>>>> Target: powerpc64le-unknown-linux-gnu
>>>> Thread model: posix
>>>> InstalledDir: /usr/bin
>>>>
>>>> results in a segmentation fault, while it works when building with GCC.
>>>>
>>>> $ gcc --version
>>>> gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
>>>
>>> Thank you for keying us in. I am going to have a bit of a brain dump
>>> here based on the information I have uncovered after a couple of hours
>>> of debugging.
>>>
>>> TL;DR: It seems like something is broken with __read_mostly + ld.lld
>>> before 14.0.0.
>>>
>>> My initial reproduction steps (boot-qemu.sh comes from
>>> https://github.com/ClangBuiltLinux/boot-utils):
>>>
>>> $ clang --version
>>> clang version 13.0.1 (Fedora 13.0.1-1.fc37)
>>> Target: x86_64-redhat-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/bin
>>>
>>> $ powerpc64le-linux-gnu-as --version
>>> GNU assembler version 2.37-2.fc36
>>> Copyright (C) 2021 Free Software Foundation, Inc.
>>> This program is free software; you may redistribute it under the terms of
>>> the GNU General Public License version 3 or later.
>>> This program has absolutely no warranty.
>>> This assembler was configured for a target of `powerpc64le-linux-gnu'.
>>>
>>> $ curl -LSso .config https://lore.kernel.org/all/[email protected]/3-linux-5.17-rc4-rcu-dev-config.txt
>>>
>>> $ scripts/config --set-val INITRAMFS_SOURCE '""'
>>>
>>> $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
>>>
>>> $ boot-qemu.sh -a ppc64le -k . -t 45s
>>> QEMU location: /usr/bin
>>>
>>> QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
>>>
>>> + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
>>> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
>>> ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
>>> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
>>> -machine powernv8 -display none -kernel \
>>> /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
>>> -nodefaults -serial mon:stdio
>>> ...
>>> [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>> [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0
>>> [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
>>> [ 1.480853][ T1] Modules linked in:
>>> [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
>>> [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
>>> [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f)
>>> [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000
>>> [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0
>>> [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
>>> [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
>>> [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
>>> [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
>>> [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
>>> [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
>>> [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
>>> [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30
>>> [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
>>> [ 1.491319][ T1] Call Trace:
>>> [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
>>> [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
>>> [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
>>> [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
>>> [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
>>> [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
>>> [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
>>> [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
>>> [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
>>> [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
>>> [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
>>> [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
>>> [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
>>> [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
>>> [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
>>> [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
>>> [ 1.501721][ T1] Instruction dump:
>>> [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
>>> [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
>>> [ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
>>> ...
>>>
>>> First thing was figuring out where the NULL pointer dereference happens,
>>> which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
>>>
>>> static size_t rtnl_link_get_size(const struct net_device *dev)
>>> {
>>> 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
>>> 	size_t size;
>>>
>>> 	if (!ops)
>>> 		return 0;
>>>
>>> 	size = nla_total_size(sizeof(struct nlattr)) +	/* IFLA_LINKINFO */
>>> 	       nla_total_size(strlen(ops->kind) + 1);	/* IFLA_INFO_KIND */
>>>
>>> which I confirmed with some really rudimentary printk debugging:
>>>
>>> [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
>>>
>>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>> index 710da8a36729..c8d928e83aec 100644
>>> --- a/net/core/rtnetlink.c
>>> +++ b/net/core/rtnetlink.c
>>> @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
>>> if (!ops)
>>> return 0;
>>> + pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
>>> + dev->name, ops, ops->kind);
>>> +
>>> size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>>> nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
>>>
>>> Okay... how did sit0 end up with a NULL kind...? It is very clearly
>>> defined as "sit":
>>>
>>> static struct rtnl_link_ops sit_link_ops __read_mostly = {
>>> 	.kind = "sit",
>>>
>>> Adding some more debug prints to net/ipv6/sit.c:
>>>
>>> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
>>> index c0b138c20992..7b9edbed2fcd 100644
>>> --- a/net/ipv6/sit.c
>>> +++ b/net/ipv6/sit.c
>>> @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
>>> */
>>> sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>>> + pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
>>> + pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
>>> + pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
>>> + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
>>> + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
>>> +
>>> err = register_netdev(sitn->fb_tunnel_dev);
>>> if (err)
>>> goto err_reg_dev;
>>>
>>> reveals:
>>>
>>> [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
>>> [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
>>> [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
>>> [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
>>>
>>> This is super bizarre, as the maxtype member appears to have the correct
>>> value, but how is kind's initializer getting dropped on the floor?
>>>
>>> Removing the __read_mostly annotation "fixes" it:
>>>
>>> [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
>>> [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
>>> [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
>>> [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
>>> ...
>>> Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
>>> ...
>>>
>>> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
>>> index 7b9edbed2fcd..f109c7a0233b 100644
>>> --- a/net/ipv6/sit.c
>>> +++ b/net/ipv6/sit.c
>>> @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
>>> static void ipip6_dev_free(struct net_device *dev);
>>> static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
>>> __be32 *v4dst);
>>> -static struct rtnl_link_ops sit_link_ops __read_mostly;
>>> +static struct rtnl_link_ops sit_link_ops;
>>> static unsigned int sit_net_id __read_mostly;
>>> struct sit_net {
>>> @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
>>> unregister_netdevice_queue(dev, head);
>>> }
>>> -static struct rtnl_link_ops sit_link_ops __read_mostly = {
>>> +static struct rtnl_link_ops sit_link_ops = {
>>> .kind = "sit",
>>> .maxtype = IFLA_IPTUN_MAX,
>>> .policy = ipip6_policy,
>>>
>>> Switching to ld.bfd also resolves it:
>>>
>>> [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
>>> [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
>>> [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
>>> [ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
>>> ...
>>> Linux version 5.17.0-rc4-00001-g956f02ad5c31 ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
>>> ...
>>>
>>> I tested with ToT LLVM (or at least, close to it, since there is an
>>> unrelated ld.lld regression there) and I could not reproduce it there,
>>> so I did a reverse bisect to see what commit fixes this issue in LLVM 14
>>> and I landed on:
>>>
>>> commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
>>> Author: Fangrui Song <[email protected]>
>>> Date: Thu Nov 25 14:12:34 2021 -0800
>>>
>>> [ELF] Simplify DynamicSection content computation. NFC
>>>
>>> The new code computes the content twice, but avoides the tricky
>>> std::function<uint64_t()>. Removed 13KiB code in a Release build.
>>>
>>> lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
>>> lld/ELF/SyntheticSections.h | 12 +----
>>> 2 files changed, 44 insertions(+), 85 deletions(-)
>>>
>>> That's... interesting, given that the commit title says No Functional
>>> Change, even though there clearly is one. That commit has a couple
>>> mentions of PowerPC synthetic sections, so it is possible that the
>>> new content calculation lines up with ld.bfd?
>>>
>>> I am not really sure where to go from here, as I don't fully understand
>>> what the problem was before that LLD change. I'll see if I can do some
>>> more investigation tomorrow (unless someone wants to beat me to it ;)
>>
>> Thank you for looking into this, and sharing your analysis.
>>
>> I built LLVM/clang from the master branch, rebuilt, but can still reproduce
>> this.
>>
>> $ git clone --depth=1 https://github.com/llvm/llvm-project.git
>> $ cd llvm-project/
>> $ git log --oneline
>> 41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
>> $ mkdir build
>> $ cd build
>> $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
>
> Since this is something related to ld.lld, not clang, this should be:
>
> ... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
>
>> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
>> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
>> $ make -j20
>> $ make -j20 clang-check
>
> You can also do 'check-lld' if you want.
>
>> $ make install
>> $ /scratch/local2/llvm/bin/clang --version
>> clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
>> Target: powerpc64le-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /scratch/local2/llvm/bin
>>
>> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
>> path.
>>
>> $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
>>
>> $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
>> […]
>> Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 ([email protected]) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
>
> ^ still using ld.lld 13.0.0.
>
> If you want to test the master branch, I would checkout LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
>
> https://github.com/ClangBuiltLinux/linux/issues/1581
>
> That should at least confirm this is resolved in a newer release.

Sorry, I forgot to update ld.lld. Indeed, with the commit you
mentioned, the segmentation fault is gone.
$ /scratch/local2/llvm/bin/ld.lld --version
LLD 14.0.0 (compatible with GNU linkers)
>> Feb 21 10:58:54 CET 2022
>> […]
>> [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
>> [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
>> [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
>> […]
>
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why that commit that I mentioned
> above fixes the issue then we can look into what we should do about it
> in the kernel sources.
Awesome. Thank you for working on that.
Kind regards,
Paul
On Mon, Feb 21, 2022 at 08:29:46AM -0700, Nathan Chancellor wrote:
> Hi Paul,
>
> On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
> > Am 17.02.22 um 02:16 schrieb Nathan Chancellor:
> >
> > > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> > > > [Cc: +LLVM/clang build support folks]
> >
> > […]
> >
> > > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
> > > > and *clang* 1:13.0-53~exp1
> > > >
> > > > $ clang --version
> > > > Ubuntu clang version 13.0.0-2
> > > > Target: powerpc64le-unknown-linux-gnu
> > > > Thread model: posix
> > > > InstalledDir: /usr/bin
> > > >
> > > > results in a segmentation fault, while it works when building with GCC.
> > > >
> > > > $ gcc --version
> > > > gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
> > >
> > > Thank you for keying us in. I am going to have a bit of a brain dump
> > > here based on the information I have uncovered after a couple of hours
> > > of debugging.
> > >
> > > TL;DR: It seems like something is broken with __read_mostly + ld.lld
> > > before 14.0.0.
> > >
> > > My initial reproduction steps (boot-qemu.sh comes from
> > > https://github.com/ClangBuiltLinux/boot-utils):
> > >
> > > $ clang --version
> > > clang version 13.0.1 (Fedora 13.0.1-1.fc37)
> > > Target: x86_64-redhat-linux-gnu
> > > Thread model: posix
> > > InstalledDir: /usr/bin
> > >
> > > $ powerpc64le-linux-gnu-as --version
> > > GNU assembler version 2.37-2.fc36
> > > Copyright (C) 2021 Free Software Foundation, Inc.
> > > This program is free software; you may redistribute it under the terms of
> > > the GNU General Public License version 3 or later.
> > > This program has absolutely no warranty.
> > > This assembler was configured for a target of `powerpc64le-linux-gnu'.
> > >
> > > $ curl -LSso .config https://lore.kernel.org/all/[email protected]/3-linux-5.17-rc4-rcu-dev-config.txt
> > >
> > > $ scripts/config --set-val INITRAMFS_SOURCE '""'
> > >
> > > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
> > >
> > > $ boot-qemu.sh -a ppc64le -k . -t 45s
> > > QEMU location: /usr/bin
> > >
> > > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
> > >
> > > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
> > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
> > > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
> > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
> > > -machine powernv8 -display none -kernel \
> > > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
> > > -nodefaults -serial mon:stdio
> > > ...
> > > [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0
> > > [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> > > [ 1.480853][ T1] Modules linked in:
> > > [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> > > [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> > > [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f)
> > > [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000
> > > [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0
> > > [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> > > [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> > > [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> > > [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> > > [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> > > [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> > > [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> > > [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30
> > > [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> > > [ 1.491319][ T1] Call Trace:
> > > [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> > > [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> > > [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> > > [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> > > [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> > > [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> > > [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> > > [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> > > [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> > > [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> > > [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> > > [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> > > [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> > > [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> > > [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> > > [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> > > [ 1.501721][ T1] Instruction dump:
> > > [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> > > [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> > > [ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
> > > ...
> > >
> > > First thing was figuring out where the NULL pointer dereference happens,
> > > which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> > >
> > > static size_t rtnl_link_get_size(const struct net_device *dev)
> > > {
> > > 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > 	size_t size;
> > >
> > > 	if (!ops)
> > > 		return 0;
> > >
> > > 	size = nla_total_size(sizeof(struct nlattr)) +	/* IFLA_LINKINFO */
> > > 	       nla_total_size(strlen(ops->kind) + 1);	/* IFLA_INFO_KIND */
> > >
> > > which I confirmed with some really rudimentary printk debugging:
> > >
> > > [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> > >
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index 710da8a36729..c8d928e83aec 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> > > if (!ops)
> > > return 0;
> > > + pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> > > + dev->name, ops, ops->kind);
> > > +
> > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > >
> > > Okay... how did sit0 end up with a NULL kind...? It is very clearly
> > > defined as "sit":
> > >
> > > static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > 	.kind = "sit",
> > >
> > > Adding some more debug prints to net/ipv6/sit.c:
> > >
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..7b9edbed2fcd 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> > > */
> > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > + pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> > > + pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> > > + pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> > > + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> > > + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> > > +
> > > err = register_netdev(sitn->fb_tunnel_dev);
> > > if (err)
> > > goto err_reg_dev;
> > >
> > > reveals:
> > >
> > > [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> > > [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> > > [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> > > [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> > >
> > > This is super bizarre, as the maxtype member appears to have the correct
> > > value, but how is kind's initializer getting dropped on the floor?
> > >
> > > Removing the __read_mostly annotation "fixes" it:
> > >
> > > [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> > > [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> > > [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> > > ...
> > >
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index 7b9edbed2fcd..f109c7a0233b 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
> > > static void ipip6_dev_free(struct net_device *dev);
> > > static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
> > > __be32 *v4dst);
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly;
> > > +static struct rtnl_link_ops sit_link_ops;
> > > static unsigned int sit_net_id __read_mostly;
> > > struct sit_net {
> > > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> > > unregister_netdevice_queue(dev, head);
> > > }
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > +static struct rtnl_link_ops sit_link_ops = {
> > > .kind = "sit",
> > > .maxtype = IFLA_IPTUN_MAX,
> > > .policy = ipip6_policy,
> > >
> > > Switching to ld.bfd also resolves it:
> > >
> > > [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> > > [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> > > [ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31 ([email protected]) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> > > ...
> > >
> > > I tested with ToT LLVM (or at least, close to it, since there is an
> > > unrelated ld.lld regression there) and I could not reproduce it there,
> > > so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> > > and I landed on:
> > >
> > > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> > > Author: Fangrui Song <[email protected]>
> > > Date: Thu Nov 25 14:12:34 2021 -0800
> > >
> > > [ELF] Simplify DynamicSection content computation. NFC
> > >
> > > The new code computes the content twice, but avoides the tricky
> > > std::function<uint64_t()>. Removed 13KiB code in a Release build.
> > >
> > > lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
> > > lld/ELF/SyntheticSections.h | 12 +----
> > > 2 files changed, 44 insertions(+), 85 deletions(-)
> > >
> > > That's... interesting, given that the commit title says No Functional
> > > Change, even though there clearly is one. That commit has a couple
> > > mentions of PowerPC synthetic sections, so it is possible that the
> > > new content calculation lines up with ld.bfd?
> > >
> > > I am not really sure where to go from here, as I don't fully understand
> > > what the problem was before that LLD change. I'll see if I can do some
> > > more investigation tomorrow (unless someone wants to beat me to it ;)
> >
> > Thank you for looking into this, and sharing your analysis.
> >
> > I built LLVM/clang from the master branch, rebuilt, but can still reproduce
> > this.
> >
> > $ git clone --depth=1 https://github.com/llvm/llvm-project.git
> > $ cd llvm-project/
> > $ git log --oneline
> > 41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
> > $ mkdir build
> > $ cd build
> > $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
>
> Since this is something related to ld.lld, not clang, this should be:
>
> ... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
>
> > -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> > -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
> > $ make -j20
> > $ make -j20 clang-check
>
> You can also do 'check-lld' if you want.
>
> > $ make install
> > $ /scratch/local2/llvm/bin/clang --version
> > clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
> > Target: powerpc64le-unknown-linux-gnu
> > Thread model: posix
> > InstalledDir: /scratch/local2/llvm/bin
> >
> > Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> > path.
> >
> > $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
> >
> > $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
> > […]
> > Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 ([email protected]) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
>
> ^ still using ld.lld 13.0.0.
>
> If you want to test the master branch, I would checkout LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
>
> https://github.com/ClangBuiltLinux/linux/issues/1581
>
> That should at least confirm this is resolved in a newer release.
>
> > Feb 21 10:58:54 CET 2022
> > […]
> > [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
> > [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > […]
>
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why that commit that I mentioned
> above fixes the issue then we can look into what we should do about it
> in the kernel sources.
Sorry for taking so long to get back to this. For me, commit
d79976918852 ("powerpc/64: Add UADDR64 relocation support") resolves
this for ld.lld 13.x. I have started a separate thread about whether or
not this commit is suitable for stable, specifically 5.17 and 5.15:
https://lore.kernel.org/[email protected]/
Cheers,
Nathan