2017-12-17 12:14:49

by Bronek Kozicki

Subject: PROBLEM: NULL pointer dereference in kernel 4.14.6

Hello


I noticed my system has been less stable since I upgraded from kernel
4.13 to 4.14.6 yesterday. There was a soft CPU lockup, a hard CPU lockup
and one kernel panic while shutting down the system, all on the same
day. Little diagnostic information was collected at the time, but since
then I've enabled some debugging options (SLUB_DEBUG_ON,
DEBUG_STACKOVERFLOW, DEBUG_LIST, BUG_ON_DATA_CORRUPTION, UBSAN,
UBSAN_SANITIZE_ALL and UBSAN_NULL; a full kernel config is attached).
My distribution is Arch Linux, but I configure and build my own kernel.
In case it matters, the hardware is two Xeon E5-2667 v2 @ 3.30GHz with
128GB RAM (ECC), of which 80GB is reserved as huge pages for virtual
machines; total memory utilisation is usually between 100GB and 110GB
(swap is configured but not used).

Today I captured the log below. It happened when I tried to run
"systemctl status" twice; the process was immediately killed on both
occasions. As you will notice, my kernel is tainted: I am running ZFS as
my root filesystem, built from
https://github.com/zfsonlinux/zfs/tree/zfs-0.7.4 (package based on
https://github.com/archzfs/archzfs ). I hope that does not make this
report *totally* useless. If it is considered useless, please tell me
so, as I do not want to create a distraction.

I will send a follow-up if I see further problems (unless told not to).
Please note I am not a subscriber to lkml, so a Cc will be appreciated.
I have not noticed similar problems with kernel 4.13, but I cannot run
a bisect (the machine is in use most of the time).


2017-12-17T11:06:45,938214+0000 ------------[ cut here ]------------
2017-12-17T11:06:45,938224+0000 WARNING: CPU: 9 PID: 22368 at
kernel/fork.c:414 __put_task_struct+0x160/0x230
2017-12-17T11:06:45,938225+0000 Modules linked in: ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter devlink joydev
hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd
glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto
nls_iso8859_1 nls_cp437 vfat fat evdev input_leds led_class
intel_rapl_perf hid_logitech_dj mac_hid pcspkr igb ptp pps_core i2c_i801
lpc_ich i2c_algo_bit tpm_tis mei_me ioatdma tpm_tis_core mei shpchp dca
tpm wmi button
sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO)
zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 ahci isci xhci_pci
libahci mpt3sas libsas ehci_pci raid_class xhci_hcd ehci_hcd libata
2017-12-17T11:06:45,938266+0000 scsi_transport_sas usbcore scsi_mod
usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core
bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd
vfio_iommu_type1 vfio
2017-12-17T11:06:45,938278+0000 CPU: 9 PID: 22368 Comm: systemctl
Tainted: P O 4.14.6-2-ARCH #1
2017-12-17T11:06:45,938279+0000 Hardware name: Supermicro
X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T11:06:45,938280+0000 task: ffffa03a7f2f3100 task.stack:
ffffaed1fcf64000
2017-12-17T11:06:45,938282+0000 RIP: 0010:__put_task_struct+0x160/0x230
2017-12-17T11:06:45,938283+0000 RSP: 0018:ffffaed1fcf67d50 EFLAGS:
00010246
2017-12-17T11:06:45,938284+0000 RAX: 0000000000000000 RBX:
ffffa03dc6cda018 RCX: 0000000000000001
2017-12-17T11:06:45,938284+0000 RDX: ffffaed1fcf67df8 RSI:
ffffa03dc6cda018 RDI: ffffa03dc6cda018
2017-12-17T11:06:45,938285+0000 RBP: ffffffffa11e51a0 R08:
0000000000ffff0a R09: 0000000000000008
2017-12-17T11:06:45,938286+0000 R10: ffffaed1fcf67cf8 R11:
0000000000000000 R12: ffffaed1fcf67df8
2017-12-17T11:06:45,938286+0000 R13: ffffa03dc6cda018 R14:
ffffa03dc6cda018 R15: ffffa048a5d893f8
2017-12-17T11:06:45,938287+0000 FS: 00007fbc477598c0(0000)
GS:ffffa0583fa40000(0000) knlGS:0000000000000000
2017-12-17T11:06:45,938288+0000 CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
2017-12-17T11:06:45,938289+0000 CR2: 000055acab282f88 CR3:
00000002da550005 CR4: 00000000001626e0
2017-12-17T11:06:45,938289+0000 Call Trace:
2017-12-17T11:06:45,938294+0000 ? seq_printf+0x4e/0x70
2017-12-17T11:06:45,938297+0000 css_task_iter_next+0x74/0x90
2017-12-17T11:06:45,938300+0000 kernfs_seq_next+0x58/0x110
2017-12-17T11:06:45,938301+0000 seq_read+0x36c/0x620
2017-12-17T11:06:45,938306+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T11:06:45,938309+0000 __vfs_read+0x54/0x2e0
2017-12-17T11:06:45,938310+0000 vfs_read+0x9d/0x200
2017-12-17T11:06:45,938311+0000 SyS_read+0x52/0xc0
2017-12-17T11:06:45,938317+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T11:06:45,938319+0000 RIP: 0033:0x7fbc47072a11
2017-12-17T11:06:45,938319+0000 RSP: 002b:00007fff8b1e5758 EFLAGS:
00000246 ORIG_RAX: 0000000000000000
2017-12-17T11:06:45,938320+0000 RAX: ffffffffffffffda RBX:
00007fbc4733daa0 RCX: 00007fbc47072a11
2017-12-17T11:06:45,938321+0000 RDX: 0000000000001000 RSI:
000055acab281f80 RDI: 0000000000000008
2017-12-17T11:06:45,938321+0000 RBP: 00007fbc4733db00 R08:
0000000000000003 R09: ffffffffffffffb0
2017-12-17T11:06:45,938322+0000 R10: 0000000000001000 R11:
0000000000000246 R12: 0000000000001010
2017-12-17T11:06:45,938323+0000 R13: 00007fbc4733db00 R14:
0000000000001000 R15: 0000000000000001
2017-12-17T11:06:45,938324+0000 Code: 44 24 10 65 48 33 04 25 28 00 00
00 0f 85 85 00 00 00 48 83 c4 18 48 89 df 5b 5d 41 5c 41 5d e9 27 fe ff
ff 0f ff e9 ee fe ff ff <0f> ff e9 d2 fe ff ff 0f ff e9 f2 fe ff ff 4d
8d ac 24 d0 03 00
2017-12-17T11:06:45,938342+0000 ---[ end trace 8979357ae8817e59 ]---
2017-12-17T11:06:45,938343+0000
================================================================================
2017-12-17T11:06:45,946841+0000 UBSAN: Undefined behaviour in
kernel/cgroup/pids.c:67:9
2017-12-17T11:06:45,953117+0000 member access within null pointer of
type 'struct pids_cgroup'
2017-12-17T11:06:45,960032+0000 CPU: 9 PID: 22368 Comm: systemctl
Tainted: P W O 4.14.6-2-ARCH #1
2017-12-17T11:06:45,960033+0000 Hardware name: Supermicro
X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T11:06:45,960033+0000 Call Trace:
2017-12-17T11:06:45,960038+0000 dump_stack+0x70/0xae
2017-12-17T11:06:45,960043+0000 ubsan_epilogue+0x9/0x40
2017-12-17T11:06:45,960045+0000
__ubsan_handle_type_mismatch+0x104/0x180
2017-12-17T11:06:45,960057+0000 pids_free+0x99/0xb0
2017-12-17T11:06:45,960058+0000 cgroup_free+0xaa/0x190
2017-12-17T11:06:45,960060+0000 __put_task_struct+0x68/0x230
2017-12-17T11:06:45,960062+0000 ? seq_printf+0x4e/0x70
2017-12-17T11:06:45,960063+0000 css_task_iter_next+0x74/0x90
2017-12-17T11:06:45,960064+0000 kernfs_seq_next+0x58/0x110
2017-12-17T11:06:45,960066+0000 seq_read+0x36c/0x620
2017-12-17T11:06:45,960068+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T11:06:45,960069+0000 __vfs_read+0x54/0x2e0
2017-12-17T11:06:45,960070+0000 vfs_read+0x9d/0x200
2017-12-17T11:06:45,960072+0000 SyS_read+0x52/0xc0
2017-12-17T11:06:45,960073+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T11:06:45,960074+0000 RIP: 0033:0x7fbc47072a11
2017-12-17T11:06:45,960075+0000 RSP: 002b:00007fff8b1e5758 EFLAGS:
00000246 ORIG_RAX: 0000000000000000
2017-12-17T11:06:45,960076+0000 RAX: ffffffffffffffda RBX:
00007fbc4733daa0 RCX: 00007fbc47072a11
2017-12-17T11:06:45,960077+0000 RDX: 0000000000001000 RSI:
000055acab281f80 RDI: 0000000000000008
2017-12-17T11:06:45,960077+0000 RBP: 00007fbc4733db00 R08:
0000000000000003 R09: ffffffffffffffb0
2017-12-17T11:06:45,960078+0000 R10: 0000000000001000 R11:
0000000000000246 R12: 0000000000001010
2017-12-17T11:06:45,960079+0000 R13: 00007fbc4733db00 R14:
0000000000001000 R15: 0000000000000001
2017-12-17T11:06:45,960080+0000
================================================================================
2017-12-17T11:06:45,969848+0000 BUG: unable to handle kernel NULL
pointer dereference at 00000000000000b0
2017-12-17T11:06:45,977727+0000 IP: pids_free+0x28/0xb0
2017-12-17T11:06:45,981213+0000 PGD 0 P4D 0
2017-12-17T11:06:45,983783+0000 Oops: 0000 [#1] SMP
2017-12-17T11:06:45,986934+0000 Modules linked in: ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter devlink joydev
hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd
glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto
nls_iso8859_1 nls_cp437 vfat fat evdev input_leds led_class
intel_rapl_perf hid_logitech_dj mac_hid pcspkr igb ptp pps_core i2c_i801
lpc_ich i2c_algo_bit tpm_tis mei_me ioatdma tpm_tis_core mei shpchp dca
tpm wmi button
sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO)
zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 ahci isci xhci_pci
libahci mpt3sas libsas ehci_pci raid_class xhci_hcd ehci_hcd libata
2017-12-17T11:06:46,057445+0000 scsi_transport_sas usbcore scsi_mod
usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core
bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd
vfio_iommu_type1 vfio
2017-12-17T11:06:46,076039+0000 CPU: 0 PID: 22368 Comm: systemctl
Tainted: P W O 4.14.6-2-ARCH #1
2017-12-17T11:06:46,084121+0000 Hardware name: Supermicro
X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T11:06:46,091081+0000 task: ffffa03a7f2f3100 task.stack:
ffffaed1fcf64000
2017-12-17T11:06:46,097000+0000 RIP: 0010:pids_free+0x28/0xb0
2017-12-17T11:06:46,101013+0000 RSP: 0018:ffffaed1fcf67ce8 EFLAGS:
00010282
2017-12-17T11:06:46,106239+0000 RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000000000006
2017-12-17T11:06:46,113371+0000 RDX: 0000000000000000 RSI:
0000000000000202 RDI: 0000000000000202
2017-12-17T11:06:46,120513+0000 RBP: ffffa03dc6cda018 R08:
00000000000007bd R09: 0000000000000000
2017-12-17T11:06:46,127645+0000 R10: 00000000001f6ed8 R11:
00000000000ea650 R12: 000000005ee1a648
2017-12-17T11:06:46,134777+0000 R13: ffffffffa11e59c0 R14:
ffffa03db233a900 R15: ffffffffa11d44a0
2017-12-17T11:06:46,141910+0000 FS: 00007fbc477598c0(0000)
GS:ffffa047ffa00000(0000) knlGS:0000000000000000
2017-12-17T11:06:46,150031+0000 CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
2017-12-17T11:06:46,155845+0000 CR2: 000055bf3e43f9f0 CR3:
00000002da550006 CR4: 00000000001626f0
2017-12-17T11:06:46,163004+0000 Call Trace:
2017-12-17T11:06:46,165514+0000 cgroup_free+0xaa/0x190
2017-12-17T11:06:46,169019+0000 __put_task_struct+0x68/0x230
2017-12-17T11:06:46,173028+0000 ? seq_printf+0x4e/0x70
2017-12-17T11:06:46,176527+0000 css_task_iter_next+0x74/0x90
2017-12-17T11:06:46,180541+0000 kernfs_seq_next+0x58/0x110
2017-12-17T11:06:46,184380+0000 seq_read+0x36c/0x620
2017-12-17T11:06:46,187702+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T11:06:46,192060+0000 __vfs_read+0x54/0x2e0
2017-12-17T11:06:46,195463+0000 vfs_read+0x9d/0x200
2017-12-17T11:06:46,198696+0000 SyS_read+0x52/0xc0
2017-12-17T11:06:46,201847+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T11:06:46,206463+0000 RIP: 0033:0x7fbc47072a11
2017-12-17T11:06:46,210039+0000 RSP: 002b:00007fff8b1e5758 EFLAGS:
00000246 ORIG_RAX: 0000000000000000
2017-12-17T11:06:46,217604+0000 RAX: ffffffffffffffda RBX:
00007fbc4733daa0 RCX: 00007fbc47072a11
2017-12-17T11:06:46,224737+0000 RDX: 0000000000001000 RSI:
000055acab281f80 RDI: 0000000000000008
2017-12-17T11:06:46,231861+0000 RBP: 00007fbc4733db00 R08:
0000000000000003 R09: ffffffffffffffb0
2017-12-17T11:06:46,238984+0000 R10: 0000000000001000 R11:
0000000000000246 R12: 0000000000001010
2017-12-17T11:06:46,246117+0000 R13: 00007fbc4733db00 R14:
0000000000001000 R15: 0000000000000001
2017-12-17T11:06:46,253253+0000 Code: 44 00 00 0f 1f 44 00 00 48 81 ff
c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48
8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 a2
1e a1 48 89 df e8
2017-12-17T11:06:46,272137+0000 RIP: pids_free+0x28/0xb0 RSP:
ffffaed1fcf67ce8
2017-12-17T11:06:46,277621+0000 CR2: 00000000000000b0
2017-12-17T11:06:46,281044+0000 ---[ end trace 8979357ae8817e5a ]---
2017-12-17T11:06:54,977759+0000 ------------[ cut here ]------------
2017-12-17T11:06:54,982387+0000 WARNING: CPU: 3 PID: 23507 at
kernel/fork.c:414 __put_task_struct+0x160/0x230
2017-12-17T11:06:54,990551+0000 Modules linked in: ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter devlink joydev
hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd
glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto
nls_iso8859_1 nls_cp437 vfat fat evdev input_leds led_class
intel_rapl_perf hid_logitech_dj mac_hid pcspkr igb ptp pps_core i2c_i801
lpc_ich i2c_algo_bit tpm_tis mei_me ioatdma tpm_tis_core mei shpchp dca
tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO)
zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 ahci isci
xhci_pci libahci mpt3sas libsas ehci_pci raid_class xhci_hcd ehci_hcd
libata
2017-12-17T11:06:55,060998+0000 scsi_transport_sas usbcore scsi_mod
usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core
bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd
vfio_iommu_type1 vfio
2017-12-17T11:06:55,079579+0000 CPU: 3 PID: 23507 Comm: systemctl
Tainted: P D W O 4.14.6-2-ARCH #1
2017-12-17T11:06:55,087662+0000 Hardware name: Supermicro
X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T11:06:55,094613+0000 task: ffffa03ace590040 task.stack:
ffffaed2046f4000
2017-12-17T11:06:55,100528+0000 RIP: 0010:__put_task_struct+0x160/0x230
2017-12-17T11:06:55,105402+0000 RSP: 0018:ffffaed2046f7d50 EFLAGS:
00010246
2017-12-17T11:06:55,110620+0000 RAX: 0000000000000000 RBX:
ffffa03dc3280598 RCX: 0000000000000001
2017-12-17T11:06:55,117743+0000 RDX: ffffaed2046f7df8 RSI:
ffffa03dc3280598 RDI: ffffa03dc3280598
2017-12-17T11:06:55,124869+0000 RBP: ffffffffa11e51a0 R08:
0000000000ffff0a R09: 0000000000000008
2017-12-17T11:06:55,131992+0000 R10: ffffaed2046f7cf8 R11:
0000000000000000 R12: ffffaed2046f7df8
2017-12-17T11:06:55,139116+0000 R13: ffffa03dc3280598 R14:
ffffa03dc3280598 R15: ffffa03b3b4d3bd8
2017-12-17T11:06:55,146240+0000 FS: 00007fd9e00178c0(0000)
GS:ffffa047ffac0000(0000) knlGS:0000000000000000
2017-12-17T11:06:55,154318+0000 CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
2017-12-17T11:06:55,160054+0000 CR2: 0000565393a1d038 CR3:
0000000310850003 CR4: 00000000001626e0
2017-12-17T11:06:55,167180+0000 Call Trace:
2017-12-17T11:06:55,169627+0000 ? seq_printf+0x4e/0x70
2017-12-17T11:06:55,173117+0000 css_task_iter_next+0x74/0x90
2017-12-17T11:06:55,177122+0000 kernfs_seq_next+0x58/0x110
2017-12-17T11:06:55,180959+0000 seq_read+0x36c/0x620
2017-12-17T11:06:55,184272+0000 __vfs_read+0x54/0x2e0
2017-12-17T11:06:55,187676+0000 vfs_read+0x9d/0x200
2017-12-17T11:06:55,190901+0000 SyS_read+0x52/0xc0
2017-12-17T11:06:55,194039+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T11:06:55,198657+0000 RIP: 0033:0x7fd9df930a11
2017-12-17T11:06:55,202228+0000 RSP: 002b:00007fff897b16e8 EFLAGS:
00000246 ORIG_RAX: 0000000000000000
2017-12-17T11:06:55,209783+0000 RAX: ffffffffffffffda RBX:
0000000000000008 RCX: 00007fd9df930a11
2017-12-17T11:06:55,216909+0000 RDX: 0000000000001000 RSI:
0000565393a1c020 RDI: 0000000000000008
2017-12-17T11:06:55,224032+0000 RBP: 0000565393a02860 R08:
0000000000000003 R09: ffffffffffffffb0
2017-12-17T11:06:55,231157+0000 R10: 0000000000001000 R11:
0000000000000246 R12: 0000000000000000
2017-12-17T11:06:55,238281+0000 R13: 0000000000000000 R14:
0000565393a023b0 R15: 0000000000000000
2017-12-17T11:06:55,245405+0000 Code: 44 24 10 65 48 33 04 25 28 00 00
00 0f 85 85 00 00 00 48 83 c4 18 48 89 df 5b 5d 41 5c 41 5d e9 27 fe ff
ff 0f ff e9 ee fe ff ff <0f> ff e9 d2 fe ff ff 0f ff e9 f2 fe ff ff 4d
8d ac 24 d0 03 00
2017-12-17T11:06:55,264247+0000 ---[ end trace 8979357ae8817e5b ]---
2017-12-17T11:06:55,268889+0000 BUG: unable to handle kernel NULL
pointer dereference at 00000000000000b0
2017-12-17T11:06:55,276710+0000 IP: pids_free+0x28/0xb0
2017-12-17T11:06:55,280201+0000 PGD 0 P4D 0
2017-12-17T11:06:55,282733+0000 Oops: 0000 [#2] SMP
2017-12-17T11:06:55,285870+0000 Modules linked in: ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter devlink joydev
hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd
glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto
nls_iso8859_1 nls_cp437 vfat fat evdev input_leds led_class
intel_rapl_perf hid_logitech_dj mac_hid pcspkr igb ptp pps_core i2c_i801
lpc_ich i2c_algo_bit tpm_tis mei_me ioatdma tpm_tis_core mei shpchp dca
tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO)
zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 ahci isci
xhci_pci libahci mpt3sas libsas ehci_pci raid_class xhci_hcd ehci_hcd
libata
2017-12-17T11:06:55,356310+0000 scsi_transport_sas usbcore scsi_mod
usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core
bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd
vfio_iommu_type1 vfio
2017-12-17T11:06:55,374886+0000 CPU: 3 PID: 23507 Comm: systemctl
Tainted: P D W O 4.14.6-2-ARCH #1
2017-12-17T11:06:55,382969+0000 Hardware name: Supermicro
X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T11:06:55,389923+0000 task: ffffa03ace590040 task.stack:
ffffaed2046f4000
2017-12-17T11:06:55,395844+0000 RIP: 0010:pids_free+0x28/0xb0
2017-12-17T11:06:55,399853+0000 RSP: 0018:ffffaed2046f7ce8 EFLAGS:
00010282
2017-12-17T11:06:55,405071+0000 RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000000000000
2017-12-17T11:06:55,412196+0000 RDX: 0000000000000058 RSI:
0000000000000000 RDI: ffffffffa11ea2a0
2017-12-17T11:06:55,419319+0000 RBP: ffffa03dc3280598 R08:
0000000000ffff0a R09: 0000000000000008
2017-12-17T11:06:55,426443+0000 R10: ffffaed2046f7cf8 R11:
0000000000000000 R12: 000000005ee1a648
2017-12-17T11:06:55,433567+0000 R13: ffffffffa11e59c0 R14:
ffffa03c147618c0 R15: ffffffffa11d44a0
2017-12-17T11:06:55,440691+0000 FS: 00007fd9e00178c0(0000)
GS:ffffa047ffac0000(0000) knlGS:0000000000000000
2017-12-17T11:06:55,448768+0000 CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
2017-12-17T11:06:55,454504+0000 CR2: 00000000000000b0 CR3:
0000000310850003 CR4: 00000000001626e0
2017-12-17T11:06:55,461630+0000 Call Trace:
2017-12-17T11:06:55,464077+0000 cgroup_free+0xaa/0x190
2017-12-17T11:06:55,467570+0000 __put_task_struct+0x68/0x230
2017-12-17T11:06:55,471581+0000 ? seq_printf+0x4e/0x70
2017-12-17T11:06:55,475071+0000 css_task_iter_next+0x74/0x90
2017-12-17T11:06:55,479077+0000 kernfs_seq_next+0x58/0x110
2017-12-17T11:06:55,482916+0000 seq_read+0x36c/0x620
2017-12-17T11:06:55,486228+0000 __vfs_read+0x54/0x2e0
2017-12-17T11:06:55,489631+0000 vfs_read+0x9d/0x200
2017-12-17T11:06:55,492856+0000 SyS_read+0x52/0xc0
2017-12-17T11:06:55,495997+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T11:06:55,500613+0000 RIP: 0033:0x7fd9df930a11
2017-12-17T11:06:55,504192+0000 RSP: 002b:00007fff897b16e8 EFLAGS:
00000246 ORIG_RAX: 0000000000000000
2017-12-17T11:06:55,511760+0000 RAX: ffffffffffffffda RBX:
0000000000000008 RCX: 00007fd9df930a11
2017-12-17T11:06:55,518891+0000 RDX: 0000000000001000 RSI:
0000565393a1c020 RDI: 0000000000000008
2017-12-17T11:06:55,526014+0000 RBP: 0000565393a02860 R08:
0000000000000003 R09: ffffffffffffffb0
2017-12-17T11:06:55,533138+0000 R10: 0000000000001000 R11:
0000000000000246 R12: 0000000000000000
2017-12-17T11:06:55,540262+0000 R13: 0000000000000000 R14:
0000565393a023b0 R15: 0000000000000000
2017-12-17T11:06:55,547388+0000 Code: 44 00 00 0f 1f 44 00 00 48 81 ff
c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48
8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 a2
1e a1 48 89 df e8
2017-12-17T11:06:55,566241+0000 RIP: pids_free+0x28/0xb0 RSP:
ffffaed2046f7ce8
2017-12-17T11:06:55,571722+0000 CR2: 00000000000000b0
2017-12-17T11:06:55,575101+0000 ---[ end trace 8979357ae8817e5c ]---



--
Bronek Kozicki


Attachments:
config.gz (47.08 kB)

2017-12-17 13:29:22

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

This has happened again, and hopefully the report is not as mangled as
the previous one. I was again trying to run "systemctl status", only
once this time. The kernel build is different because I've just disabled
the RCU tracing/debugging options. One more thing: this kernel was built
with gcc 7.2.1.


B.


2017-12-17T12:50:38,640725+0000 ------------[ cut here ]------------
2017-12-17T12:50:38,640741+0000 WARNING: CPU: 10 PID: 16921 at kernel/fork.c:414 __put_task_struct+0x160/0x230
2017-12-17T12:50:38,640742+0000 Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ext4 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc crc16 mbcache aesni_intel jbd2 aes_x86_64 crypto_simd glue_helper cryptd nls_iso8859_1 nls_cp437 vfat fscrypto fat intel_cstate evdev input_leds led_class intel_rapl_perf mac_hid pcspkr igb hid_logitech_dj ptp pps_core i2c_algo_bit tpm_tis ioatdma mei_me i2c_i801 tpm_tis_core lpc_ich mei dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ehci_pci ahci xhci_pci libsas libahci mpt3sas xhci_hcd ehci_hcd raid_class libata
2017-12-17T12:50:38,640812+0000 scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
2017-12-17T12:50:38,640833+0000 CPU: 10 PID: 16921 Comm: systemctl Tainted: P O 4.14.6-3-ARCH #1
2017-12-17T12:50:38,640835+0000 Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T12:50:38,640837+0000 task: ffff9c4b5475c140 task.stack: ffffb4bf8641c000
2017-12-17T12:50:38,640840+0000 RIP: 0010:__put_task_struct+0x160/0x230
2017-12-17T12:50:38,640841+0000 RSP: 0018:ffffb4bf8641fd50 EFLAGS: 00010246
2017-12-17T12:50:38,640843+0000 RAX: 0000000000000000 RBX: ffff9c4b4f2c33f8 RCX: 0000000000000001
2017-12-17T12:50:38,640845+0000 RDX: ffffb4bf8641fdf8 RSI: ffff9c4b4f2c33f8 RDI: ffff9c4b4f2c33f8
2017-12-17T12:50:38,640846+0000 RBP: ffffffffb21ddda0 R08: 0000000000ffff0a R09: 0000000000000008
2017-12-17T12:50:38,640847+0000 R10: ffffb4bf8641fcf8 R11: 0000000000000000 R12: ffffb4bf8641fdf8
2017-12-17T12:50:38,640849+0000 R13: ffff9c4b4f2c33f8 R14: ffff9c4b4f2c33f8 R15: ffff9c655f98c578
2017-12-17T12:50:38,640851+0000 FS: 00007fb1df5308c0(0000) GS:ffff9c65bfa80000(0000) knlGS:0000000000000000
2017-12-17T12:50:38,640852+0000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-12-17T12:50:38,640853+0000 CR2: 000055e957fd6f78 CR3: 00000006042fb001 CR4: 00000000001626e0
2017-12-17T12:50:38,640855+0000 Call Trace:
2017-12-17T12:50:38,640862+0000 ? seq_printf+0x4e/0x70
2017-12-17T12:50:38,640870+0000 css_task_iter_next+0x74/0x90
2017-12-17T12:50:38,640876+0000 kernfs_seq_next+0x58/0x110
2017-12-17T12:50:38,640878+0000 seq_read+0x36c/0x620
2017-12-17T12:50:38,640886+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T12:50:38,640889+0000 __vfs_read+0x54/0x2e0
2017-12-17T12:50:38,640891+0000 vfs_read+0x9d/0x200
2017-12-17T12:50:38,640893+0000 SyS_read+0x52/0xc0
2017-12-17T12:50:38,640899+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T12:50:38,640902+0000 RIP: 0033:0x7fb1dee49a11
2017-12-17T12:50:38,640903+0000 RSP: 002b:00007ffcf8aa5268 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
2017-12-17T12:50:38,640905+0000 RAX: ffffffffffffffda RBX: 00007fb1df114aa0 RCX: 00007fb1dee49a11
2017-12-17T12:50:38,640907+0000 RDX: 0000000000001000 RSI: 000055e957fd5f70 RDI: 0000000000000008
2017-12-17T12:50:38,640908+0000 RBP: 00007fb1df114b00 R08: 0000000000000003 R09: ffffffffffffffb0
2017-12-17T12:50:38,640909+0000 R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000001010
2017-12-17T12:50:38,640910+0000 R13: 00007fb1df114b00 R14: 0000000000001000 R15: 0000000000000001
2017-12-17T12:50:38,640912+0000 Code: 44 24 10 65 48 33 04 25 28 00 00 00 0f 85 85 00 00 00 48 83 c4 18 48 89 df 5b 5d 41 5c 41 5d e9 27 fe ff ff 0f ff e9 ee fe ff ff <0f> ff e9 d2 fe ff ff 0f ff e9 f2 fe ff ff 4d 8d ac 24 d0 03 00
2017-12-17T12:50:38,640950+0000 ---[ end trace bc939269a984f4e0 ]---
2017-12-17T12:50:38,640953+0000 ================================================================================
2017-12-17T12:50:38,649395+0000 UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
2017-12-17T12:50:38,655693+0000 member access within null pointer of type 'struct pids_cgroup'
2017-12-17T12:50:38,662630+0000 CPU: 10 PID: 16921 Comm: systemctl Tainted: P W O 4.14.6-3-ARCH #1
2017-12-17T12:50:38,662631+0000 Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T12:50:38,662632+0000 Call Trace:
2017-12-17T12:50:38,662638+0000 dump_stack+0x70/0xae
2017-12-17T12:50:38,662645+0000 ubsan_epilogue+0x9/0x40
2017-12-17T12:50:38,662648+0000 __ubsan_handle_type_mismatch+0x104/0x180
2017-12-17T12:50:38,662653+0000 pids_free+0x99/0xb0
2017-12-17T12:50:38,662657+0000 cgroup_free+0xaa/0x190
2017-12-17T12:50:38,662661+0000 __put_task_struct+0x68/0x230
2017-12-17T12:50:38,662664+0000 ? seq_printf+0x4e/0x70
2017-12-17T12:50:38,662668+0000 css_task_iter_next+0x74/0x90
2017-12-17T12:50:38,662671+0000 kernfs_seq_next+0x58/0x110
2017-12-17T12:50:38,662674+0000 seq_read+0x36c/0x620
2017-12-17T12:50:38,662678+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T12:50:38,662680+0000 __vfs_read+0x54/0x2e0
2017-12-17T12:50:38,662683+0000 vfs_read+0x9d/0x200
2017-12-17T12:50:38,662685+0000 SyS_read+0x52/0xc0
2017-12-17T12:50:38,662688+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T12:50:38,662691+0000 RIP: 0033:0x7fb1dee49a11
2017-12-17T12:50:38,662692+0000 RSP: 002b:00007ffcf8aa5268 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
2017-12-17T12:50:38,662695+0000 RAX: ffffffffffffffda RBX: 00007fb1df114aa0 RCX: 00007fb1dee49a11
2017-12-17T12:50:38,662696+0000 RDX: 0000000000001000 RSI: 000055e957fd5f70 RDI: 0000000000000008
2017-12-17T12:50:38,662698+0000 RBP: 00007fb1df114b00 R08: 0000000000000003 R09: ffffffffffffffb0
2017-12-17T12:50:38,662699+0000 R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000001010
2017-12-17T12:50:38,662700+0000 R13: 00007fb1df114b00 R14: 0000000000001000 R15: 0000000000000001
2017-12-17T12:50:38,662703+0000 ================================================================================
2017-12-17T12:50:38,671300+0000 BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
2017-12-17T12:50:38,679128+0000 IP: pids_free+0x28/0xb0
2017-12-17T12:50:38,682627+0000 PGD 0 P4D 0
2017-12-17T12:50:38,685166+0000 Oops: 0000 [#1] SMP
2017-12-17T12:50:38,688305+0000 Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ext4 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc crc16 mbcache aesni_intel jbd2 aes_x86_64 crypto_simd glue_helper cryptd nls_iso8859_1 nls_cp437 vfat fscrypto fat intel_cstate evdev input_leds led_class intel_rapl_perf mac_hid pcspkr igb hid_logitech_dj ptp pps_core i2c_algo_bit tpm_tis ioatdma mei_me i2c_i801 tpm_tis_core lpc_ich mei dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ehci_pci ahci xhci_pci libsas libahci mpt3sas xhci_hcd ehci_hcd raid_class libata
2017-12-17T12:50:38,758799+0000 scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
2017-12-17T12:50:38,777383+0000 CPU: 10 PID: 16921 Comm: systemctl Tainted: P W O 4.14.6-3-ARCH #1
2017-12-17T12:50:38,785554+0000 Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T12:50:38,792504+0000 task: ffff9c4b5475c140 task.stack: ffffb4bf8641c000
2017-12-17T12:50:38,798418+0000 RIP: 0010:pids_free+0x28/0xb0
2017-12-17T12:50:38,802426+0000 RSP: 0018:ffffb4bf8641fce8 EFLAGS: 00010282
2017-12-17T12:50:38,807644+0000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
2017-12-17T12:50:38,814767+0000 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
2017-12-17T12:50:38,821893+0000 RBP: ffff9c4b4f2c33f8 R08: 0000000000000790 R09: 0000000000000000
2017-12-17T12:50:38,829015+0000 R10: 00000000001f586a R11: 00000000000a1caf R12: 000000004de21a48
2017-12-17T12:50:38,836141+0000 R13: ffffffffb21de5c0 R14: ffff9c4b53afc980 R15: ffffffffb21cd0a0
2017-12-17T12:50:38,843273+0000 FS: 00007fb1df5308c0(0000) GS:ffff9c65bfa80000(0000) knlGS:0000000000000000
2017-12-17T12:50:38,851349+0000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-12-17T12:50:38,857088+0000 CR2: 00000000000000b0 CR3: 00000006042fb001 CR4: 00000000001626e0
2017-12-17T12:50:38,864211+0000 Call Trace:
2017-12-17T12:50:38,866661+0000 cgroup_free+0xaa/0x190
2017-12-17T12:50:38,870154+0000 __put_task_struct+0x68/0x230
2017-12-17T12:50:38,874164+0000 ? seq_printf+0x4e/0x70
2017-12-17T12:50:38,877654+0000 css_task_iter_next+0x74/0x90
2017-12-17T12:50:38,881670+0000 kernfs_seq_next+0x58/0x110
2017-12-17T12:50:38,885507+0000 seq_read+0x36c/0x620
2017-12-17T12:50:38,888831+0000 ? __handle_mm_fault+0xb10/0x1630
2017-12-17T12:50:38,893187+0000 __vfs_read+0x54/0x2e0
2017-12-17T12:50:38,896589+0000 vfs_read+0x9d/0x200
2017-12-17T12:50:38,899814+0000 SyS_read+0x52/0xc0
2017-12-17T12:50:38,902955+0000 entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T12:50:38,907573+0000 RIP: 0033:0x7fb1dee49a11
2017-12-17T12:50:38,911150+0000 RSP: 002b:00007ffcf8aa5268 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
2017-12-17T12:50:38,918707+0000 RAX: ffffffffffffffda RBX: 00007fb1df114aa0 RCX: 00007fb1dee49a11
2017-12-17T12:50:38,925839+0000 RDX: 0000000000001000 RSI: 000055e957fd5f70 RDI: 0000000000000008
2017-12-17T12:50:38,932963+0000 RBP: 00007fb1df114b00 R08: 0000000000000003 R09: ffffffffffffffb0
2017-12-17T12:50:38,940088+0000 R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000001010
2017-12-17T12:50:38,947211+0000 R13: 00007fb1df114b00 R14: 0000000000001000 R15: 0000000000000001
2017-12-17T12:50:38,954336+0000 Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e b2 48 89 df e8
2017-12-17T12:50:38,973186+0000 RIP: pids_free+0x28/0xb0 RSP: ffffb4bf8641fce8
2017-12-17T12:50:38,978663+0000 CR2: 00000000000000b0
2017-12-17T12:50:38,981994+0000 ---[ end trace bc939269a984f4e1 ]---



Attachments:
config-4.14.6-3.gz (47.09 kB)

2017-12-17 17:49:49

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see, it happens when systemctl tries to read a specific cgroup.procs entry under /sys/fs. In case it matters, the entry belongs to a small virtual machine running under qemu/kvm and managed by libvirt.

open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(5, /* 12 entries */, 32768) = 464
openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, <unfinished ...>) = ?
+++ killed by SIGKILL +++
[1] 12078 killed strace -- systemctl status


B.


[ 1889.226051] ================================================================================
[ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
[ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
[ 1889.249920] ================================================================================
[ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 1889.267524] IP: pids_free+0x28/0xb0
[ 1889.272394] PGD 0 P4D 0
[ 1889.274925] Oops: 0000 [#1] SMP
[ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
[ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P W O 4.14.7-1-ARCH #1
[ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
[ 1889.383474] task: ffff93149aaec140 task.stack: ffffa88c3836c000
[ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
[ 1889.393388] RSP: 0018:ffffa88c3836fcc8 EFLAGS: 00010282
[ 1889.398605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 1889.405731] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
[ 1889.412854] RBP: ffff931499ab2d58 R08: 000000000000079a R09: 0000000000000000
[ 1889.419979] R10: 00000000001f5954 R11: 000000000003d040 R12: 0000000056e21a48
[ 1889.427102] R13: ffffffffa91de5c0 R14: ffff93247b0598c0 R15: ffffffffa91cd0a0
[ 1889.434227] FS: 00007f18eee6b8c0(0000) GS:ffff931ebfa40000(0000) knlGS:0000000000000000
[ 1889.442302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1889.448041] CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
[ 1889.455164] Call Trace:
[ 1889.457610] cgroup_free+0xaa/0x190
[ 1889.461095] __put_task_struct+0x68/0x230
[ 1889.465105] ? seq_printf+0x4e/0x70
[ 1889.468591] css_task_iter_next+0x74/0x90
[ 1889.472594] kernfs_seq_next+0x58/0x110
[ 1889.476424] seq_read+0x36c/0x620
[ 1889.479735] __vfs_read+0x54/0x2e0
[ 1889.483134] vfs_read+0x9d/0x200
[ 1889.486358] SyS_read+0x52/0xc0
[ 1889.489494] do_syscall_64+0x69/0x1e0
[ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
[ 1889.497771] RIP: 0033:0x7f18ee784a11
[ 1889.501341] RSP: 002b:00007ffd56942618 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1889.508897] RAX: ffffffffffffffda RBX: 0000559a9ae6d260 RCX: 00007f18ee784a11
[ 1889.516022] RDX: 0000000000001000 RSI: 0000559a9ae80f70 RDI: 0000000000000008
[ 1889.523145] RBP: 0000000000000d68 R08: 0000000000000003 R09: ffffffffffffffb0
[ 1889.530270] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f18eea4b700
[ 1889.537395] R13: 00007f18eea4c240 R14: 0000559a9ae6d260 R15: 0000000000000000
[ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
[ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: ffffa88c3836fcc8
[ 1889.568846] CR2: 00000000000000b0
[ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---
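
The register dump above already narrows the fault down. The marked bytes in the Code line, `<48> 83 bb b0 00 00 00 00`, appear to decode to a quadword compare against memory at 0xb0(%rbx), and RBX is zero, which would explain why CR2 is exactly 0xb0: a member at offset 0xb0 is read through a NULL base pointer. The Python sketch below only illustrates that arithmetic on lines copied from the trace; `parse_oops` is a hypothetical helper, not an existing tool.

```python
import re

# A few lines copied from the oops above, timestamps dropped.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
RIP: 0010:pids_free+0x28/0xb0
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
"""


def parse_oops(text):
    """Extract the faulting symbol and any 64-bit register values found."""
    # Matches "RBX: 0000000000000000" or "CR2: 00000000000000b0".
    regs = {
        name: int(value, 16)
        for name, value in re.findall(
            r"\b(R[A-Z0-9]{2}|CR\d): ([0-9a-f]{16})\b", text
        )
    }
    # Matches "RIP: 0010:pids_free+0x28/0xb0" (segment:symbol+offset/size).
    rip = re.search(r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    return {
        "symbol": rip.group(1),
        "offset": int(rip.group(2), 16),
        "size": int(rip.group(3), 16),
        "regs": regs,
    }


info = parse_oops(OOPS)
# Fault address minus base register gives the member offset being accessed.
member_offset = info["regs"]["CR2"] - info["regs"]["RBX"]
print(info["symbol"], hex(member_offset))  # pids_free 0xb0
```

A zero base plus a small non-zero fault address is the usual signature of a structure member accessed through a NULL pointer, which agrees with the UBSAN message about `struct pids_cgroup`.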

2017-12-17 18:25:50

by Randy Dunlap

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>
> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768)    = 464
> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8,  <unfinished ...>)              = ?
> +++ killed by SIGKILL +++
> [1]    12078 killed     strace -- systemctl status
>
>
> B.
>

Hi,

Can you reproduce this without using (loading) the XFS modules?
They cause the kernel to be tainted.

Adding cgroups mailing list also.

>
> [ 1889.226051] ================================================================================
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920] ================================================================================
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: 0000 [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864]  scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P        W  O    4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: ffff93149aaec140 task.stack: ffffa88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:ffffa88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> [ 1889.405731] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
> [ 1889.412854] RBP: ffff931499ab2d58 R08: 000000000000079a R09: 0000000000000000
> [ 1889.419979] R10: 00000000001f5954 R11: 000000000003d040 R12: 0000000056e21a48
> [ 1889.427102] R13: ffffffffa91de5c0 R14: ffff93247b0598c0 R15: ffffffffa91cd0a0
> [ 1889.434227] FS:  00007f18eee6b8c0(0000) GS:ffff931ebfa40000(0000) knlGS:0000000000000000
> [ 1889.442302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1889.448041] CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610]  cgroup_free+0xaa/0x190
> [ 1889.461095]  __put_task_struct+0x68/0x230
> [ 1889.465105]  ? seq_printf+0x4e/0x70
> [ 1889.468591]  css_task_iter_next+0x74/0x90
> [ 1889.472594]  kernfs_seq_next+0x58/0x110
> [ 1889.476424]  seq_read+0x36c/0x620
> [ 1889.479735]  __vfs_read+0x54/0x2e0
> [ 1889.483134]  vfs_read+0x9d/0x200
> [ 1889.486358]  SyS_read+0x52/0xc0
> [ 1889.489494]  do_syscall_64+0x69/0x1e0
> [ 1889.493152]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:00007ffd56942618 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 1889.508897] RAX: ffffffffffffffda RBX: 0000559a9ae6d260 RCX: 00007f18ee784a11
> [ 1889.516022] RDX: 0000000000001000 RSI: 0000559a9ae80f70 RDI: 0000000000000008
> [ 1889.523145] RBP: 0000000000000d68 R08: 0000000000000003 R09: ffffffffffffffb0
> [ 1889.530270] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f18eea4b700
> [ 1889.537395] R13: 00007f18eea4c240 R14: 0000559a9ae6d260 R15: 0000000000000000
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: ffffa88c3836fcc8
> [ 1889.568846] CR2: 00000000000000b0
> [ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---
>


--
~Randy

2017-12-17 18:30:33

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On 17/12/2017 18:25, Randy Dunlap wrote:
> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768)    = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8,  <unfinished ...>)              = ?
>> +++ killed by SIGKILL +++
>> [1]    12078 killed     strace -- systemctl status
>>
>>
>> B.
>>
>
> Hi,
>
> Can you reproduce this without using (loading) the XFS modules?
> They cause the kernel to be tainted.

I think you mean ZFS - I cannot do that. It is my root filesystem.


B.

2017-12-17 18:31:42

by Randy Dunlap

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On 12/17/2017 10:30 AM, Bronek Kozicki wrote:
> On 17/12/2017 18:25, Randy Dunlap wrote:
>> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>>
>>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>> getdents(5, /* 12 entries */, 32768)    = 464
>>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>>> read(8,  <unfinished ...>)              = ?
>>> +++ killed by SIGKILL +++
>>> [1]    12078 killed     strace -- systemctl status
>>>
>>>
>>> B.
>>>
>>
>> Hi,
>>
>> Can you reproduce this without using (loading) the XFS modules?
>> They cause the kernel to be tainted.
>
> I think you mean ZFS - I cannot do that. It is my root filesystem.

Sorry, yes, I did mean ZFS.

thanks,
--
~Randy

2017-12-17 18:48:18

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

FWIW, I can "cat" the file. I get a single number, seemingly followed by an
infinite stream of 0s (I tried wc -l, but did not want to wait very long
and killed it). Here is what it looks like when limited by "head":

root@gdansk ~ # cat '/sys/fs/cgroup/unified/machine.slice/machine-qemu\x2d1\x2dkartuzy\x2dspice.scope/cgroup.procs' | head
10649
0
0
0
0
0
0
0
0
0
root@gdansk ~ #

PID 10649 is indeed the qemu process running the virtual machine in
question:

root@gdansk ~ # ps lw 10649
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
6 0 10649 1 20 0 4815836 60252 - Sl ? 2:56 /usr/bin/qemu-system-x86_64 -name guest=kartuzy-spice,process=qemu:kartuzy-spice,debug-threads=on -S -object se


Sorry about taint by ZFS, but there is nothing I can do, it is my root
filesystem. Since I am the only user of the package in question I could
cheat and replace the license for the build of the ZFS module, but I do
not see how that might help.


B.
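
The "one PID, then endless zeros" pattern in the message above can be summarized without waiting on an unbounded stream by capping the number of lines consumed, much as "head" does. The Python sketch below is only an illustration on data shaped like that output; `summarize_procs` and its `limit` parameter are hypothetical names. It should not be pointed at the real file on an affected kernel, since the read itself is what triggers the oops.

```python
from collections import Counter
from itertools import islice


def summarize_procs(lines, limit=1000):
    """Count PID values in at most `limit` lines of cgroup.procs-style text."""
    counts = Counter()
    # islice stops after `limit` lines even if the source never ends.
    for line in islice(lines, limit):
        line = line.strip()
        if line:
            counts[int(line)] += 1
    return counts


# Same shape as the output shown under "head": one real PID, then zeros.
sample = ["10649\n"] + ["0\n"] * 9
counts = summarize_procs(sample)
print(counts[10649], counts[0])  # 1 9
```

A healthy cgroup.procs lists each member PID once and then ends, so a large count for PID 0 in such a summary points at the iterator rather than at the process list.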

2017-12-17 23:17:15

by Vito Caputo

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>
> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768) = 464
> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8, <unfinished ...>) = ?
> +++ killed by SIGKILL +++
> [1] 12078 killed strace -- systemctl status
>
>

This recently came through lkml, may be related:
https://marc.info/?l=linux-kernel&m=151320108922415&w=2

CCd Tejun


> B.
>
>
> [ 1889.226051] ================================================================================
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920] ================================================================================
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: 0000 [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P W O 4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: ffff93149aaec140 task.stack: ffffa88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:ffffa88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> [ 1889.405731] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
> [ 1889.412854] RBP: ffff931499ab2d58 R08: 000000000000079a R09: 0000000000000000
> [ 1889.419979] R10: 00000000001f5954 R11: 000000000003d040 R12: 0000000056e21a48
> [ 1889.427102] R13: ffffffffa91de5c0 R14: ffff93247b0598c0 R15: ffffffffa91cd0a0
> [ 1889.434227] FS: 00007f18eee6b8c0(0000) GS:ffff931ebfa40000(0000) knlGS:0000000000000000
> [ 1889.442302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1889.448041] CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610] cgroup_free+0xaa/0x190
> [ 1889.461095] __put_task_struct+0x68/0x230
> [ 1889.465105] ? seq_printf+0x4e/0x70
> [ 1889.468591] css_task_iter_next+0x74/0x90
> [ 1889.472594] kernfs_seq_next+0x58/0x110
> [ 1889.476424] seq_read+0x36c/0x620
> [ 1889.479735] __vfs_read+0x54/0x2e0
> [ 1889.483134] vfs_read+0x9d/0x200
> [ 1889.486358] SyS_read+0x52/0xc0
> [ 1889.489494] do_syscall_64+0x69/0x1e0
> [ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:00007ffd56942618 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 1889.508897] RAX: ffffffffffffffda RBX: 0000559a9ae6d260 RCX: 00007f18ee784a11
> [ 1889.516022] RDX: 0000000000001000 RSI: 0000559a9ae80f70 RDI: 0000000000000008
> [ 1889.523145] RBP: 0000000000000d68 R08: 0000000000000003 R09: ffffffffffffffb0
> [ 1889.530270] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f18eea4b700
> [ 1889.537395] R13: 00007f18eea4c240 R14: 0000559a9ae6d260 R15: 0000000000000000
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: ffffa88c3836fcc8
> [ 1889.568846] CR2: 00000000000000b0
> [ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---

2017-12-18 19:56:24

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On 17/12/2017 23:24, [email protected] wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768) = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8, <unfinished ...>) = ?
>> +++ killed by SIGKILL +++
>> [1] 12078 killed strace -- systemctl status
>>
>>
>
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel&m=151320108922415&w=2

Thank you, it certainly seems related. Is there some debugging option I could enable, or a patch I could apply, which would make the point of data corruption easier to find? I'm OK taking untested patches, if that helps find the location of the bug.


B.

2017-12-19 01:17:59

by George Amanakis

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

I can replicate this on a Thinkpad X230i running archlinux with latest
4.14.7 kernel, without the ZFS modules.

Steps to reproduce:
1) create a virtual machine using libvirt (attached xml)
2) virsh start vm
3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

This hangs the laptop requiring a hard reset.

Regards,
George


Attachments:
vm.xml (2.90 kB)
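
The `\x2d` sequences in the scope paths quoted in this thread are systemd's escaping of "-" inside a unit name. The Python sketch below is a simplified take on that escaping, offered only to show how a path like the one in step 3 is put together. `escape_unit_component` and `scope_procs_path` are hypothetical helpers, the machine name "qemu-2-vm" is inferred from the escaped path rather than stated by the reporters, and the directory layout is copied from the paths in the thread.

```python
# Characters that this simplified escaper leaves untouched.
ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789:_."
)


def escape_unit_component(name):
    """Simplified unit-name escaping: '/' becomes '-', the rest becomes \\xNN."""
    out = []
    for i, byte in enumerate(name.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in ALLOWED and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including '-', is hex-escaped byte by byte.
            out.append("\\x%02x" % byte)
    return "".join(out)


def scope_procs_path(machine):
    # Assumption: layout as seen in the paths quoted in this thread.
    unit = "machine-" + escape_unit_component(machine) + ".scope"
    return "/sys/fs/cgroup/unified/machine.slice/" + unit + "/cgroup.procs"


print(scope_procs_path("qemu-2-vm"))
```

The result contains single backslashes, so on a shell command line the path needs single quotes, or doubled backslashes as in step 3 above.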

2017-12-19 13:42:55

by Tejun Heo

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On Sun, Dec 17, 2017 at 03:24:48PM -0800, [email protected] wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
> > I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
> >
> > open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > getdents(5, /* 12 entries */, 32768) = 464
> > openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > read(8, <unfinished ...>) = ?
> > +++ killed by SIGKILL +++
> > [1] 12078 killed strace -- systemctl status
> >
> >
>
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel&m=151320108922415&w=2

It looks like it could be the same problem. Working on the fix now.
Will let you know when I have something.

Thanks.

--
tejun

2017-12-19 22:37:25

by Tejun Heo

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

Hello,

On Mon, Dec 18, 2017 at 03:17:54PM -0500, George Amanakis wrote:
> I can replicate this on a Thinkpad X230i running archlinux with latest
> 4.14.7 kernel, without the ZFS modules.
>
> Steps to reproduce:
> 1) create a virtual machine using libvirt (attached xml)
> 2) virsh start vm
> 3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

It took some massaging but I can reproduce the problem. Will report
when I know more.

Thanks.

--
tejun

2017-12-20 15:14:40

by Tejun Heo

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On Tue, Dec 19, 2017 at 05:42:39AM -0800, Tejun Heo wrote:
> On Sun, Dec 17, 2017 at 03:24:48PM -0800, [email protected] wrote:
> > On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
> > > I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
> > >
> > > open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > getdents(5, /* 12 entries */, 32768) = 464
> > > openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> > > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > > read(8, <unfinished ...>) = ?
> > > +++ killed by SIGKILL +++
> > > [1] 12078 killed strace -- systemctl status
> > >
> > >
> >
> > This recently came through lkml, may be related:
> > https://marc.info/?l=linux-kernel&m=151320108922415&w=2
>
> It looks like it could be the same problem. Working on the fix now.
> Will let you know when I have something.

Fix posted.

http://lkml.kernel.org/r/[email protected]

Thanks.

--
tejun

2017-12-20 18:04:16

by Bronek Kozicki

Subject: Re: PROBLEM: NULL pointer dereference in kernel 4.14.6

On Wed, 20 Dec 2017, at 3:14 PM, Tejun Heo wrote:
> On Tue, Dec 19, 2017 at 05:42:39AM -0800, Tejun Heo wrote:
> > On Sun, Dec 17, 2017 at 03:24:48PM -0800, [email protected] wrote:
> > > On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
> > > > I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
> > > >
> > > > open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > > > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > > getdents(5, /* 12 entries */, 32768) = 464
> > > > openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> > > > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > > > read(8, <unfinished ...>) = ?
> > > > +++ killed by SIGKILL +++
> > > > [1] 12078 killed strace -- systemctl status
> > > >
> > > >
> > >
> > > This recently came through lkml, may be related:
> > > https://marc.info/?l=linux-kernel&m=151320108922415&w=2
> >
> > It looks like it could be the same problem. Working on the fix now.
> > Will let you know when I have something.
>
> Fix posted.
>
> http://lkml.kernel.org/r/[email protected]
>

Thank you, Tejun - I tested this fix and it works for me.


B.