2019-08-10 16:35:34

by Paul Menzel

[permalink] [raw]
Subject: [Linux 5.2.7] powertop --auto-tune: BUG: kernel NULL pointer dereference, address: 0000000000000000

Dear Linux,


I have no idea, who to report this to, so I please refer me to the
correct list.

With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing

sudo powertop --auto-tune

causes a NULL pointer dereference, and the graphical session crashes due
to an effect on the i915 driver. It worked in the past with the 4.19
series from Debian.

Here is the trace, and please find all Linux kernel logs attached.

> [ 2027.127428] calling msr_init+0x0/0x1000 [msr] @ 6294
> [ 2027.131123] initcall msr_init+0x0/0x1000 [msr] returned 0 after 3599 usecs
> [ 2027.170589] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
> [ 2027.170604] #PF: error_code(0x0010) - not-present page
> [ 2027.170609] PGD 0 P4D 0
> [ 2027.170619] Oops: 0010 [#1] SMP PTI
> [ 2027.170628] CPU: 0 PID: 6290 Comm: powertop Not tainted 5.2.0-2-amd64 #1 Debian 5.2.7-1
> [ 2027.170632] Hardware name: Dell Inc. Latitude E7250/0TVD2T, BIOS A19 01/23/2018
> [ 2027.170640] RIP: 0010:0x0
> [ 2027.170653] Code: Bad RIP value.
> [ 2027.170660] RSP: 0018:ffffb1a943543cd8 EFLAGS: 00010246
> [ 2027.170666] RAX: 0000000000000000 RBX: ffff95c66adc1b00 RCX: 0000000000000001
> [ 2027.170672] RDX: ffff95c68ad50600 RSI: ffff95c66adc1b00 RDI: ffff95c68d320de0
> [ 2027.170678] RBP: ffff95c68d320de0 R08: 0000000000000000 R09: 0000000000000000
> [ 2027.170683] R10: ffff95c68d2d6f00 R11: 0000000000000000 R12: ffff95c66adc1b10
> [ 2027.170689] R13: ffffffff8851cdd0 R14: 0000000000000000 R15: ffff95c66adc1b00
> [ 2027.170698] FS: 00007f2933119740(0000) GS:ffff95c68da00000(0000) knlGS:0000000000000000
> [ 2027.170703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2027.170710] CR2: ffffffffffffffd6 CR3: 00000003d7b44006 CR4: 00000000003606f0
> [ 2027.170714] Call Trace:
> [ 2027.170730] do_dentry_open+0x13a/0x370
> [ 2027.170745] path_openat+0x2c6/0x1480
> [ 2027.170757] ? terminate_walk+0xe6/0x100
> [ 2027.170767] ? path_lookupat.isra.48+0xa3/0x220
> [ 2027.170779] ? reuse_swap_page+0x105/0x320
> [ 2027.170791] do_filp_open+0x93/0x100
> [ 2027.170804] ? __check_object_size+0x15d/0x189
> [ 2027.170816] do_sys_open+0x184/0x220
> [ 2027.170828] do_syscall_64+0x53/0x130
> [ 2027.170837] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 2027.170844] RIP: 0033:0x7f29332771ae
> [ 2027.170850] Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 59 65 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
> [ 2027.170856] RSP: 002b:00007fffda995f10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> [ 2027.170863] RAX: ffffffffffffffda RBX: 00005608e822f870 RCX: 00007f29332771ae
> [ 2027.170868] RDX: 0000000000000000 RSI: 00007fffda9972b0 RDI: 00000000ffffff9c
> [ 2027.170872] RBP: 0000000000000008 R08: 0000000000000008 R09: 0000000000000001
> [ 2027.170878] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293368d805
> [ 2027.170883] R13: 00007f293368d805 R14: 0000000000000001 R15: 00005608e7d9afd0
> [ 2027.170891] Modules linked in: msr snd_hda_codec_hdmi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel bluetooth ecdh_generic ecc binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel arc4 mei_wdt iTCO_wdt kvm iTCO_vendor_support watchdog wmi_bmof dell_wmi sparse_keymap irqbypass ppdev iwlmvm dell_laptop crct10dif_pclmul dell_smbios crc32_pclmul dell_wmi_descriptor dcdbas snd_hda_codec_realtek snd_hda_codec_generic dell_smm_hwmon mac80211 ghash_clmulni_intel ledtrig_audio i915 intel_cstate snd_hda_intel intel_uncore intel_rapl_perf drm_kms_helper snd_hda_codec xhci_pci iwlwifi ehci_pci sdhci_pci xhci_hcd ehci_hcd snd_hda_core cqhci efi_pstore sdhci pcspkr snd_hwdep mei_me joydev e1000e mmc_core sg mei snd_pcm usbcore processor_thermal_device tpm_tis cfg80211 efivars drm tpm_tis_core intel_soc_dts_iosf snd_timer i2c_i801 snd lpc_ich soundcore mfd_core usb_common
> [ 2027.171003] i2c_algo_bit tpm rng_core battery wmi parport_pc parport int3403_thermal pcc_cpufreq video ac int3402_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel dell_rbtn rfkill button acpi_pad efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod crc32c_intel ahci libahci aesni_intel libata aes_x86_64 glue_helper crypto_simd scsi_mod psmouse cryptd evdev serio_raw fan
> [ 2027.171071] CR2: 0000000000000000
> [ 2027.171079] ---[ end trace fba28f1c98c90aac ]---
> [ 2028.530750] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
> [ 2029.198236] RIP: 0010:0x0
> [ 2029.198246] Code: Bad RIP value.
> [ 2029.198249] RSP: 0018:ffffb1a943543cd8 EFLAGS: 00010246
> [ 2029.198253] RAX: 0000000000000000 RBX: ffff95c66adc1b00 RCX: 0000000000000001
> [ 2029.198255] RDX: ffff95c68ad50600 RSI: ffff95c66adc1b00 RDI: ffff95c68d320de0
> [ 2029.198257] RBP: ffff95c68d320de0 R08: 0000000000000000 R09: 0000000000000000
> [ 2029.198259] R10: ffff95c68d2d6f00 R11: 0000000000000000 R12: ffff95c66adc1b10
> [ 2029.198261] R13: ffffffff8851cdd0 R14: 0000000000000000 R15: ffff95c66adc1b00
> [ 2029.198264] FS: 00007f2933119740(0000) GS:ffff95c68da00000(0000) knlGS:0000000000000000
> [ 2029.198266] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2029.198268] CR2: ffffffffffffffd6 CR3: 00000003d7b44006 CR4: 00000000003606f0
> [ 2029.255445] rfkill: input handler enabled
> [ 2037.093000] rfkill: input handler disabled
> [ 2042.552216] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 2042.552219] #PF: supervisor instruction fetch in kernel mode
> [ 2042.552220] #PF: error_code(0x0010) - not-present page
> [ 2042.552221] PGD 0 P4D 0
> [ 2042.552223] Oops: 0010 [#2] SMP PTI
> [ 2042.552225] CPU: 2 PID: 6883 Comm: powertop Tainted: G D 5.2.0-2-amd64 #1 Debian 5.2.7-1
> [ 2042.552226] Hardware name: Dell Inc. Latitude E7250/0TVD2T, BIOS A19 01/23/2018
> [ 2042.552228] RIP: 0010:0x0
> [ 2042.552230] Code: Bad RIP value.
> [ 2042.552231] RSP: 0018:ffffb1a94815fcd8 EFLAGS: 00010246
> [ 2042.552232] RAX: 0000000000000000 RBX: ffff95c62e77fb00 RCX: 0000000000000001
> [ 2042.552233] RDX: ffff95c68ad50600 RSI: ffff95c62e77fb00 RDI: ffff95c68d320de0
> [ 2042.552234] RBP: ffff95c68d320de0 R08: 0000000000000000 R09: 0000000000000000
> [ 2042.552235] R10: ffff95c68d2d6f00 R11: 0000000000000000 R12: ffff95c62e77fb10
> [ 2042.552235] R13: ffffffff8851cdd0 R14: 0000000000000000 R15: ffff95c62e77fb00
> [ 2042.552236] FS: 00007f438ebb2740(0000) GS:ffff95c68db00000(0000) knlGS:0000000000000000
> [ 2042.552237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2042.552238] CR2: ffffffffffffffd6 CR3: 00000003e4dfc002 CR4: 00000000003606e0
> [ 2042.552239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2042.552240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2042.552240] Call Trace:
> [ 2042.552244] do_dentry_open+0x13a/0x370
> [ 2042.552248] path_openat+0x2c6/0x1480
> [ 2042.552250] ? terminate_walk+0xe6/0x100
> [ 2042.552251] ? path_lookupat.isra.48+0xa3/0x220
> [ 2042.552253] ? reuse_swap_page+0x105/0x320
> [ 2042.552255] do_filp_open+0x93/0x100
> [ 2042.552257] ? __check_object_size+0x15d/0x189
> [ 2042.552260] do_sys_open+0x184/0x220
> [ 2042.552262] do_syscall_64+0x53/0x130
> [ 2042.552265] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 2042.552266] RIP: 0033:0x7f438ed101ae
> [ 2042.552268] Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 59 65 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
> [ 2042.552269] RSP: 002b:00007ffd324ae9d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> [ 2042.552270] RAX: ffffffffffffffda RBX: 0000557124829870 RCX: 00007f438ed101ae
> [ 2042.552271] RDX: 0000000000000000 RSI: 00007ffd324afd70 RDI: 00000000ffffff9c
> [ 2042.552271] RBP: 0000000000000008 R08: 0000000000000008 R09: 0000000000000001
> [ 2042.552272] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f438f126805
> [ 2042.552273] R13: 00007f438f126805 R14: 0000000000000001 R15: 0000557124293fd0
> [ 2042.552274] Modules linked in: msr snd_hda_codec_hdmi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel bluetooth ecdh_generic ecc binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel arc4 mei_wdt iTCO_wdt kvm iTCO_vendor_support watchdog wmi_bmof dell_wmi sparse_keymap irqbypass ppdev iwlmvm dell_laptop crct10dif_pclmul dell_smbios crc32_pclmul dell_wmi_descriptor dcdbas snd_hda_codec_realtek snd_hda_codec_generic dell_smm_hwmon mac80211 ghash_clmulni_intel ledtrig_audio i915 intel_cstate snd_hda_intel intel_uncore intel_rapl_perf drm_kms_helper snd_hda_codec xhci_pci iwlwifi ehci_pci sdhci_pci xhci_hcd ehci_hcd snd_hda_core cqhci efi_pstore sdhci pcspkr snd_hwdep mei_me joydev e1000e mmc_core sg mei snd_pcm usbcore processor_thermal_device tpm_tis cfg80211 efivars drm tpm_tis_core intel_soc_dts_iosf snd_timer i2c_i801 snd lpc_ich soundcore mfd_core usb_common
> [ 2042.552299] i2c_algo_bit tpm rng_core battery wmi parport_pc parport int3403_thermal pcc_cpufreq video ac int3402_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel dell_rbtn rfkill button acpi_pad efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod crc32c_intel ahci libahci aesni_intel libata aes_x86_64 glue_helper crypto_simd scsi_mod psmouse cryptd evdev serio_raw fan
> [ 2042.552313] CR2: 0000000000000000
> [ 2042.552315] ---[ end trace fba28f1c98c90aad ]---
> [ 2044.057566] RIP: 0010:0x0
> [ 2044.057578] Code: Bad RIP value.
> [ 2044.057580] RSP: 0018:ffffb1a943543cd8 EFLAGS: 00010246
> [ 2044.057583] RAX: 0000000000000000 RBX: ffff95c66adc1b00 RCX: 0000000000000001
> [ 2044.057584] RDX: ffff95c68ad50600 RSI: ffff95c66adc1b00 RDI: ffff95c68d320de0
> [ 2044.057586] RBP: ffff95c68d320de0 R08: 0000000000000000 R09: 0000000000000000
> [ 2044.057588] R10: ffff95c68d2d6f00 R11: 0000000000000000 R12: ffff95c66adc1b10
> [ 2044.057590] R13: ffffffff8851cdd0 R14: 0000000000000000 R15: ffff95c66adc1b00
> [ 2044.057592] FS: 00007f438ebb2740(0000) GS:ffff95c68db00000(0000) knlGS:0000000000000000
> [ 2044.057594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2044.057595] CR2: ffffffffffffffd6 CR3: 00000003e4dfc002 CR4: 00000000003606e0
> [ 2044.057597] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2044.057598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 000000000000040


Kind regards,

Paul


Attachments:
=?utf-8?Q?20190810=E2=80=93linux-5=2E2=2E7=E2=80=93messages=2Etxt?= (235.23 kB)

2019-08-10 17:32:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Linux 5.2.7] powertop --auto-tune: BUG: kernel NULL pointer dereference, address: 0000000000000000

Paul,

On Sat, 10 Aug 2019, Paul Menzel wrote:
>
> I have no idea, who to report this to, so I please refer me to the correct
> list.

I have no idea yet either :)

> With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing
>
> sudo powertop --auto-tune
>
> causes a NULL pointer dereference, and the graphical session crashes due to an
> effect on the i915 driver. It worked in the past with the 4.19 series from
> Debian.
>
> Here is the trace, and please find all Linux kernel logs attached.
>
> > [ 2027.170589] BUG: kernel NULL pointer dereference, address:
> > 0000000000000000
> > [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
> > [ 2027.170604] #PF: error_code(0x0010) - not-present page
> > [ 2027.170609] PGD 0 P4D 0 [ 2027.170619] Oops: 0010 [#1] SMP PTI
...
> > [ 2027.170730] do_dentry_open+0x13a/0x370

If you have compiled with debug info, please decode the line:

linux/scripts/faddr2line vmlinux do_dentry_open+0x13a/0x370

That gives us the fops pointer which is NULL.

> > [ 2027.170745] path_openat+0x2c6/0x1480
> > [ 2027.170757] ? terminate_walk+0xe6/0x100
> > [ 2027.170767] ? path_lookupat.isra.48+0xa3/0x220
> > [ 2027.170779] ? reuse_swap_page+0x105/0x320
> > [ 2027.170791] do_filp_open+0x93/0x100
> > [ 2027.170804] ? __check_object_size+0x15d/0x189
> > [ 2027.170816] do_sys_open+0x184/0x220
> > [ 2027.170828] do_syscall_64+0x53/0x130
> > [ 2027.170837] entry_SYSCALL_64_after_hwframe+0x44/0xa9

That's an open crashing. We just don't know which file. Is the machine
completely hosed after that or is it just the graphics stuff dying?

If it's not completely dead then instead of running it from your graphical
desktop you could switch to a VGA terminal Alt+Ctrl+F1 (or whatever
function key your distro maps to) after boot and run powertop with strace
from there:

strace -f -o xxx.log powertop

With a bit of luck xxx.log should contain the information about the file it
tries to open.

Alternatively if you have a serial console you can enable the
sys_enter_open* tracepoints:

# echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_open
# echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat

Either add 'ftrace_dump_on_oops' to the kernel command line or enable it
from the shell:

# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops

Then run powertop. After the crash it will take some time to spill out the
trace buffer over serial, but it will pinpoint the offending file.

Once we know which file it is, we also know who needs to stare at it :)

Thanks,

tglx

2019-08-10 18:29:33

by Paul Menzel

[permalink] [raw]
Subject: Re: [Linux 5.2.7] powertop --auto-tune: BUG: kernel NULL pointer dereference, address: 0000000000000000

Dear Thomas,


On 10.08.19 19:31, Thomas Gleixner wrote:

> On Sat, 10 Aug 2019, Paul Menzel wrote:
>>
>> I have no idea, who to report this to, so I please refer me to the correct
>> list.
>
> I have no idea yet either :)
>
>> With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing
>>
>> sudo powertop --auto-tune
>>
>> causes a NULL pointer dereference, and the graphical session crashes due to an
>> effect on the i915 driver. It worked in the past with the 4.19 series from
>> Debian.
>>
>> Here is the trace, and please find all Linux kernel logs attached.
>>
>>> [ 2027.170589] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000000
>>> [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
>>> [ 2027.170604] #PF: error_code(0x0010) - not-present page
>>> [ 2027.170609] PGD 0 P4D 0 [ 2027.170619] Oops: 0010 [#1] SMP PTI
> ...
>>> [ 2027.170730] do_dentry_open+0x13a/0x370
>
> If you have compiled with debug info, please decode the line:
>
> linux/scripts/faddr2line vmlinux do_dentry_open+0x13a/0x370
>
> That gives us the fops pointer which is NULL.

Hah, luckily it’s reproducible.

```
$ scripts/faddr2line /usr/lib/debug/boot/vmlinux-5.2.0-2-amd64
do_dentry_open+0x13a/0x370
do_dentry_open+0x13a/0x370:
do_dentry_open at fs/open.c:799
```

>>> [ 2027.170745] path_openat+0x2c6/0x1480
>>> [ 2027.170757] ? terminate_walk+0xe6/0x100
>>> [ 2027.170767] ? path_lookupat.isra.48+0xa3/0x220
>>> [ 2027.170779] ? reuse_swap_page+0x105/0x320
>>> [ 2027.170791] do_filp_open+0x93/0x100
>>> [ 2027.170804] ? __check_object_size+0x15d/0x189
>>> [ 2027.170816] do_sys_open+0x184/0x220
>>> [ 2027.170828] do_syscall_64+0x53/0x130
>>> [ 2027.170837] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> That's an open crashing. We just don't know which file. Is the machine
> completely hosed after that or is it just the graphics stuff dying?

No, the graphical login manager showed up, and I could log back in, and
continue using hte machine.

> If it's not completely dead then instead of running it from your graphical
> desktop you could switch to a VGA terminal Alt+Ctrl+F1 (or whatever
> function key your distro maps to) after boot and run powertop with strace
> from there:
>
> strace -f -o xxx.log powertop
>
> With a bit of luck xxx.log should contain the information about the file it
> tries to open.

```
2157 access("/sys/class/drm/card0/power/rc6_residency_ms", R_OK) = 0
2157 openat(AT_FDCWD,
"/sys/kernel/debug/tracing/events/power/cpu_idle/id", O_RDONLY) = ?
2157 +++ killed by SIGKILL +++
```

> Alternatively if you have a serial console you can enable the
> sys_enter_open* tracepoints:
>
> # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_open
> # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
>
> Either add 'ftrace_dump_on_oops' to the kernel command line or enable it
> from the shell:
>
> # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>
> Then run powertop. After the crash it will take some time to spill out the
> trace buffer over serial, but it will pinpoint the offending file.

I do not have serial console on this device.

> Once we know which file it is, we also know who needs to stare at it :)

Hopefully the information is enough.


Kind regards,

Paul


Attachments:
powertop-strace.txt (61.72 kB)

2019-08-10 21:11:58

by Paul Menzel

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

[+ INTEL IDLE DRIVER]

Dear Linux folks,


On 10.08.19 20:28, Paul Menzel wrote:

> On 10.08.19 19:31, Thomas Gleixner wrote:
>
>> On Sat, 10 Aug 2019, Paul Menzel wrote:
>>>
>>> I have no idea, who to report this to, so I please refer me to the
>>> correct
>>> list.
>>
>> I have no idea yet either :)
>>
>>> With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing
>>>
>>>      sudo powertop --auto-tune
>>>
>>> causes a NULL pointer dereference, and the graphical session crashes
>>> due to an
>>> effect on the i915 driver. It worked in the past with the 4.19 series
>>> from
>>> Debian.
>>>
>>> Here is the trace, and please find all Linux kernel logs attached.
>>>
>>>> [ 2027.170589] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000000
>>>> [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
>>>> [ 2027.170604] #PF: error_code(0x0010) - not-present page
>>>> [ 2027.170609] PGD 0 P4D 0 [ 2027.170619] Oops: 0010 [#1] SMP PTI
>> ...
>>>> [ 2027.170730]  do_dentry_open+0x13a/0x370
>>
>> If you have compiled with debug info, please decode the line:
>>
>>    linux/scripts/faddr2line vmlinux do_dentry_open+0x13a/0x370
>>
>> That gives us the fops pointer which is NULL.
>
> Hah, luckily it’s reproducible.
>
> ```
> $ scripts/faddr2line /usr/lib/debug/boot/vmlinux-5.2.0-2-amd64
> do_dentry_open+0x13a/0x370
> do_dentry_open+0x13a/0x370:
> do_dentry_open at fs/open.c:799
> ```
>
>>>> [ 2027.170745]  path_openat+0x2c6/0x1480
>>>> [ 2027.170757]  ? terminate_walk+0xe6/0x100
>>>> [ 2027.170767]  ? path_lookupat.isra.48+0xa3/0x220
>>>> [ 2027.170779]  ? reuse_swap_page+0x105/0x320
>>>> [ 2027.170791]  do_filp_open+0x93/0x100
>>>> [ 2027.170804]  ? __check_object_size+0x15d/0x189
>>>> [ 2027.170816]  do_sys_open+0x184/0x220
>>>> [ 2027.170828]  do_syscall_64+0x53/0x130
>>>> [ 2027.170837]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> That's an open crashing. We just don't know which file. Is the machine
>> completely hosed after that or is it just the graphics stuff dying?
>
> No, the graphical login manager showed up, and I could log back in, and
> continue using hte machine.
>
>> If it's not completely dead then instead of running it from your
>> graphical
>> desktop you could switch to a VGA terminal Alt+Ctrl+F1 (or whatever
>> function key your distro maps to) after boot and run powertop with strace
>> from there:
>>
>>    strace -f -o xxx.log powertop
>>
>> With a bit of luck xxx.log should contain the information about the
>> file it
>> tries to open.
>
> ```
> 2157  access("/sys/class/drm/card0/power/rc6_residency_ms", R_OK) = 0
> 2157  openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/events/power/cpu_idle/id", O_RDONLY) = ?
> 2157  +++ killed by SIGKILL +++
> ```
>
>> Alternatively if you have a serial console you can enable the
>> sys_enter_open* tracepoints:
>>
>> # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_open
>> # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
>>
>> Either add 'ftrace_dump_on_oops' to the kernel command line or enable it
>> from the shell:
>>
>> # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>
>> Then run powertop. After the crash it will take some time to spill out
>> the trace buffer over serial, but it will pinpoint the offending file.
>
> I do not have serial console on this device.

For the record. It is also reproducible with Linux 5.2.6, and trying to
print the file contents with cat already fails.

```
$ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
-r--r--r-- 1 root root 0 Aug 10 23:05
/sys/kernel/debug/tracing/events/power/cpu_idle/id
$ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
Killed
```


Kind regards,

Paul

2019-08-11 06:23:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

On Sat, 10 Aug 2019, Paul Menzel wrote:

Cc+ Steven

> [+ INTEL IDLE DRIVER]
>
> Dear Linux folks,
>
>
> On 10.08.19 20:28, Paul Menzel wrote:
>
> > On 10.08.19 19:31, Thomas Gleixner wrote:
> >
> > > On Sat, 10 Aug 2019, Paul Menzel wrote:
> > > >
> > > > I have no idea, who to report this to, so I please refer me to the
> > > > correct
> > > > list.
> > >
> > > I have no idea yet either :)
> > >
> > > > With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing
> > > >
> > > >      sudo powertop --auto-tune
> > > >
> > > > causes a NULL pointer dereference, and the graphical session crashes due
> > > > to an
> > > > effect on the i915 driver. It worked in the past with the 4.19 series
> > > > from
> > > > Debian.
> > > >
> > > > Here is the trace, and please find all Linux kernel logs attached.
> > > >
> > > > > [ 2027.170589] BUG: kernel NULL pointer dereference, address:
> > > > > 0000000000000000
> > > > > [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
> > > > > [ 2027.170604] #PF: error_code(0x0010) - not-present page
> > > > > [ 2027.170609] PGD 0 P4D 0 [ 2027.170619] Oops: 0010 [#1] SMP PTI
> > > ...
> > > > > [ 2027.170730]  do_dentry_open+0x13a/0x370
> > >
> > > If you have compiled with debug info, please decode the line:
> > >
> > >    linux/scripts/faddr2line vmlinux do_dentry_open+0x13a/0x370
> > >
> > > That gives us the fops pointer which is NULL.
> >
> > Hah, luckily it’s reproducible.
> >
> > ```
> > $ scripts/faddr2line /usr/lib/debug/boot/vmlinux-5.2.0-2-amd64
> > do_dentry_open+0x13a/0x370
> > do_dentry_open+0x13a/0x370:
> > do_dentry_open at fs/open.c:799
> > ```
> >
> > > > > [ 2027.170745]  path_openat+0x2c6/0x1480
> > > > > [ 2027.170757]  ? terminate_walk+0xe6/0x100
> > > > > [ 2027.170767]  ? path_lookupat.isra.48+0xa3/0x220
> > > > > [ 2027.170779]  ? reuse_swap_page+0x105/0x320
> > > > > [ 2027.170791]  do_filp_open+0x93/0x100
> > > > > [ 2027.170804]  ? __check_object_size+0x15d/0x189
> > > > > [ 2027.170816]  do_sys_open+0x184/0x220
> > > > > [ 2027.170828]  do_syscall_64+0x53/0x130
> > > > > [ 2027.170837]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > That's an open crashing. We just don't know which file. Is the machine
> > > completely hosed after that or is it just the graphics stuff dying?
> >
> > No, the graphical login manager showed up, and I could log back in, and
> > continue using hte machine.
> >
> > > If it's not completely dead then instead of running it from your graphical
> > > desktop you could switch to a VGA terminal Alt+Ctrl+F1 (or whatever
> > > function key your distro maps to) after boot and run powertop with strace
> > > from there:
> > >
> > >    strace -f -o xxx.log powertop
> > >
> > > With a bit of luck xxx.log should contain the information about the file
> > > it
> > > tries to open.
> >
> > ```
> > 2157  access("/sys/class/drm/card0/power/rc6_residency_ms", R_OK) = 0
> > 2157  openat(AT_FDCWD, "/sys/kernel/debug/tracing/events/power/cpu_idle/id",
> > O_RDONLY) = ?
> > 2157  +++ killed by SIGKILL +++
> > ```
> >
> > > Alternatively if you have a serial console you can enable the
> > > sys_enter_open* tracepoints:
> > >
> > > # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_open
> > > # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
> > >
> > > Either add 'ftrace_dump_on_oops' to the kernel command line or enable it
> > > from the shell:
> > >
> > > # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
> > >
> > > Then run powertop. After the crash it will take some time to spill out the
> > > trace buffer over serial, but it will pinpoint the offending file.
> >
> > I do not have serial console on this device.
>
> For the record. It is also reproducible with Linux 5.2.6, and trying to print
> the file contents with cat already fails.
>
> ```
> $ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
> -r--r--r-- 1 root root 0 Aug 10 23:05
> /sys/kernel/debug/tracing/events/power/cpu_idle/id
> $ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
> Killed
> ```
>
>
> Kind regards,
>
> Paul
>

2019-08-11 09:50:46

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

hi,

On Sun, Aug 11, 2019 at 08:22:35AM +0200, Thomas Gleixner wrote:
> On Sat, 10 Aug 2019, Paul Menzel wrote:
>
> Cc+ Steven
>
> > [+ INTEL IDLE DRIVER]
> >
> > Dear Linux folks,
> >
> >
> > On 10.08.19 20:28, Paul Menzel wrote:
> >
> > > On 10.08.19 19:31, Thomas Gleixner wrote:
> > >
> > > > On Sat, 10 Aug 2019, Paul Menzel wrote:
> > > > >
> > > > > I have no idea, who to report this to, so I please refer me to the
> > > > > correct
> > > > > list.
> > > >
> > > > I have no idea yet either :)
> > > >
> > > > > With Linux 5.2.7 from Debian Sid/unstable and PowerTOP 2.10, executing
> > > > >
> > > > >      sudo powertop --auto-tune
> > > > >
> > > > > causes a NULL pointer dereference, and the graphical session crashes due
> > > > > to an
> > > > > effect on the i915 driver. It worked in the past with the 4.19 series
> > > > > from
> > > > > Debian.
> > > > >
> > > > > Here is the trace, and please find all Linux kernel logs attached.
> > > > >
> > > > > > [ 2027.170589] BUG: kernel NULL pointer dereference, address:
> > > > > > 0000000000000000
> > > > > > [ 2027.170600] #PF: supervisor instruction fetch in kernel mode
> > > > > > [ 2027.170604] #PF: error_code(0x0010) - not-present page
> > > > > > [ 2027.170609] PGD 0 P4D 0 [ 2027.170619] Oops: 0010 [#1] SMP PTI
> > > > ...
> > > > > > [ 2027.170730]  do_dentry_open+0x13a/0x370
> > > >
> > > > If you have compiled with debug info, please decode the line:
> > > >
> > > >    linux/scripts/faddr2line vmlinux do_dentry_open+0x13a/0x370
> > > >
> > > > That gives us the fops pointer which is NULL.
> > >
> > > Hah, luckily it’s reproducible.
> > >
> > > ```
> > > $ scripts/faddr2line /usr/lib/debug/boot/vmlinux-5.2.0-2-amd64
> > > do_dentry_open+0x13a/0x370
> > > do_dentry_open+0x13a/0x370:
> > > do_dentry_open at fs/open.c:799
> > > ```
> > >
> > > > > > [ 2027.170745]  path_openat+0x2c6/0x1480
> > > > > > [ 2027.170757]  ? terminate_walk+0xe6/0x100
> > > > > > [ 2027.170767]  ? path_lookupat.isra.48+0xa3/0x220
> > > > > > [ 2027.170779]  ? reuse_swap_page+0x105/0x320
> > > > > > [ 2027.170791]  do_filp_open+0x93/0x100
> > > > > > [ 2027.170804]  ? __check_object_size+0x15d/0x189
> > > > > > [ 2027.170816]  do_sys_open+0x184/0x220
> > > > > > [ 2027.170828]  do_syscall_64+0x53/0x130
> > > > > > [ 2027.170837]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > > That's an open crashing. We just don't know which file. Is the machine
> > > > completely hosed after that or is it just the graphics stuff dying?
> > >
> > > No, the graphical login manager showed up, and I could log back in, and
> > > continue using hte machine.
> > >
> > > > If it's not completely dead then instead of running it from your graphical
> > > > desktop you could switch to a VGA terminal Alt+Ctrl+F1 (or whatever
> > > > function key your distro maps to) after boot and run powertop with strace
> > > > from there:
> > > >
> > > >    strace -f -o xxx.log powertop
> > > >
> > > > With a bit of luck xxx.log should contain the information about the file
> > > > it
> > > > tries to open.
> > >
> > > ```
> > > 2157  access("/sys/class/drm/card0/power/rc6_residency_ms", R_OK) = 0
> > > 2157  openat(AT_FDCWD, "/sys/kernel/debug/tracing/events/power/cpu_idle/id",
> > > O_RDONLY) = ?
> > > 2157  +++ killed by SIGKILL +++
> > > ```
> > >
> > > > Alternatively if you have a serial console you can enable the
> > > > sys_enter_open* tracepoints:
> > > >
> > > > # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_open
> > > > # echo 1 >/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
> > > >
> > > > Either add 'ftrace_dump_on_oops' to the kernel command line or enable it
> > > > from the shell:
> > > >
> > > > # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
> > > >
> > > > Then run powertop. After the crash it will take some time to spill out the
> > > > trace buffer over serial, but it will pinpoint the offending file.
> > >
> > > I do not have serial console on this device.
> >
> > For the record. It is also reproducible with Linux 5.2.6, and trying to print
> > the file contents with cat already fails.
> >
> > ```
> > $ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > -r--r--r-- 1 root root 0 Aug 10 23:05
> > /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > $ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > Killed
> > ```

This seems to be related to https://bugs.debian.org/934304 (in
particular https://bugs.debian.org/934304#29). The mentioned patch
features/all/lockdown/0031-tracefs-Restrict-tracefs-when-the-kernel-is-locked-d.patch
is a backport of https://patchwork.kernel.org/patch/11069661/ with
only change that it is converted back to the non-LSM lockdown API.

Regards,
Salvatore

2019-08-11 12:59:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

Salvatore,

On Sun, 11 Aug 2019, Salvatore Bonaccorso wrote:
> On Sun, Aug 11, 2019 at 08:22:35AM +0200, Thomas Gleixner wrote:
> > On Sat, 10 Aug 2019, Paul Menzel wrote:
> > > For the record. It is also reproducible with Linux 5.2.6, and trying to print
> > > the file contents with cat already fails.
> > >
> > > ```
> > > $ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > -r--r--r-- 1 root root 0 Aug 10 23:05
> > > /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > $ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > Killed
> > > ```
>
> This seems to be related to https://bugs.debian.org/934304 (in
> particular https://bugs.debian.org/934304#29). The mentioned patch
> features/all/lockdown/0031-tracefs-Restrict-tracefs-when-the-kernel-is-locked-d.patch
> is a backport of https://patchwork.kernel.org/patch/11069661/ with
> only change that it is converted back to the non-LSM lockdown API.

So that's a debian kernel specific issue?

Thanks,

tglx

2019-08-11 15:34:32

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

Thomas,

On Sun, Aug 11, 2019 at 02:58:15PM +0200, Thomas Gleixner wrote:
> Salvatore,
>
> On Sun, 11 Aug 2019, Salvatore Bonaccorso wrote:
> > On Sun, Aug 11, 2019 at 08:22:35AM +0200, Thomas Gleixner wrote:
> > > On Sat, 10 Aug 2019, Paul Menzel wrote:
> > > > For the record. It is also reproducible with Linux 5.2.6, and trying to print
> > > > the file contents with cat already fails.
> > > >
> > > > ```
> > > > $ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > -r--r--r-- 1 root root 0 Aug 10 23:05
> > > > /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > $ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > Killed
> > > > ```
> >
> > This seems to be related to https://bugs.debian.org/934304 (in
> > particular https://bugs.debian.org/934304#29). The mentioned patch
> > features/all/lockdown/0031-tracefs-Restrict-tracefs-when-the-kernel-is-locked-d.patch
> > is a backport of https://patchwork.kernel.org/patch/11069661/ with
> > only change that it is converted back to the non-LSM lockdown API.
>
> So that's a debian kernel specific issue?

"yes". A kernel build without the above patch does not exhibit the
issue. So the issue is specific to that lockdown patch ("tracefs:
Restrict tracefs when the kernel is locked down").

Steven and Matthew are both CC'ed.

Regards,
Salvatore

2019-08-12 15:31:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

On Sun, 11 Aug 2019 17:33:26 +0200
Salvatore Bonaccorso <[email protected]> wrote:

> Thomas,
>
> On Sun, Aug 11, 2019 at 02:58:15PM +0200, Thomas Gleixner wrote:
> > Salvatore,
> >
> > On Sun, 11 Aug 2019, Salvatore Bonaccorso wrote:
> > > On Sun, Aug 11, 2019 at 08:22:35AM +0200, Thomas Gleixner wrote:
> > > > On Sat, 10 Aug 2019, Paul Menzel wrote:
> > > > > For the record. It is also reproducible with Linux 5.2.6, and trying to print
> > > > > the file contents with cat already fails.
> > > > >
> > > > > ```
> > > > > $ sudo ls -l /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > > -r--r--r-- 1 root root 0 Aug 10 23:05
> > > > > /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > > $ sudo cat /sys/kernel/debug/tracing/events/power/cpu_idle/id
> > > > > Killed
> > > > > ```
> > >
> > > This seems to be related to https://bugs.debian.org/934304 (in
> > > particular https://bugs.debian.org/934304#29). The mentioned patch
> > > features/all/lockdown/0031-tracefs-Restrict-tracefs-when-the-kernel-is-locked-d.patch
> > > is a backport of https://patchwork.kernel.org/patch/11069661/ with
> > > only change that it is converted back to the non-LSM lockdown API.
> >
> > So that's a debian kernel specific issue?
>
> "yes". A kernel build without the above patch does not exhibit the
> issue. So the issue is specific to that lockdown patch ("tracefs:
> Restrict tracefs when the kernel is locked down").
>
> Steven and Matthew are both CC'ed.

From what I understand, Matthew's "lockdown" work is to prevent the
system from doing anything to see what is happening in the kernel. This
includes tracefs. This looks like it is working as designed.

Perhaps it's something in debian setup that needs to keep tracefs from
being selected as "lockdown", otherwise anything that tries to access
it, will have these issues.

-- Steve

2019-08-12 16:26:56

by Matthew Garrett

[permalink] [raw]
Subject: Re: [Linux 5.2.x] /sys/kernel/debug/tracing/events/power/cpu_idle/id: BUG: kernel NULL pointer dereference, address: 0000000000000000

On Mon, Aug 12, 2019 at 8:29 AM Steven Rostedt <[email protected]> wrote:

> From what I understand, Matthew's "lockdown" work is to prevent the
> system from doing anything to see what is happening in the kernel. This
> includes tracefs. This looks like it is working as designed.

Oopsing the kernel isn't working as designed. Ben has a patch to fix this.