2018-02-03 16:19:07

by Alexander Sergeyev

Subject: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

Hello,

I'm getting a NULL pointer dereference after upgrading to the 4.15 kernel. The
machine is a Dell Latitude E5570 laptop. The problem happens early during
boot (before netconsole can do its job), so a photo is attached along with
the kernel config (note: efistub).

Call trace:
dell_set_arguments+0xb (RIP)
dell_micmute_led_set+0x35
alc_fixup_dell_wmi+0x44
apply_fixup+0x103
snd_hda_apply_fixup+0x1d
patch_alc269+0x282
hda_codec_driver_probe+0x4a
driver_probe_device+0x221
__device_attach_driver+0x79
? __driver_attach+0x90
bus_for_each_drv+0x74
__device_attach+0xe8
device_initial_probe+0xe
bus_probe_device+0x8d
device_add+0x3b9
snd_hdac_device_register+0x11
? azx_probe_codecs+0x11f
snd_hda_codec_configure+0x36
azx_codec_configure+0x2f
azx_probe_work+0x47d
process_one_work+0x182
worker_thread+0x37
kthread+0x11a
? process_one_work+0x310
? __kthread_create_on_node+0x1a0
ret_from_fork+0x22

I bisected the bug using the repository at [1]; the log follows:
git bisect start
# good: [8d577afdee3540808302d9dc7a0a7be96c91178f] Linux 4.14.12
git bisect good 8d577afdee3540808302d9dc7a0a7be96c91178f
# bad: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
git bisect bad d8a5b80568a9cb66810e75b182018e9edb68e8ff
# good: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14
git bisect good bebc6082da0a9f5d47a1ea2edc099bf671058bd4
# good: [5d352e69c60e54b5f04d6e337a1d2bf0dbf3d94a] Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 5d352e69c60e54b5f04d6e337a1d2bf0dbf3d94a
# good: [f6705bf959efac87bca76d40050d342f1d212587] Merge tag 'drm-for-v4.15-amd-dc' of git://people.freedesktop.org/~airlied/linux
git bisect good f6705bf959efac87bca76d40050d342f1d212587
# bad: [4066aa72f9f2886105c6f747d7f9bd4f14f53c12] Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux
git bisect bad 4066aa72f9f2886105c6f747d7f9bd4f14f53c12
# bad: [3d18cbb7fd0cfdf0b2ca18139950a4b0c1a0a220] rxrpc: Fix conn expiry timers
git bisect bad 3d18cbb7fd0cfdf0b2ca18139950a4b0c1a0a220
# good: [c131187db2d3fa2f8bf32fdf4e9a4ef805168467] bpf: fix branch pruning logic
git bisect good c131187db2d3fa2f8bf32fdf4e9a4ef805168467
# bad: [9ed33805cdf81eadcc6ef54a81a8448e80e19f54] Merge branch 'ipvlan-Fix-insufficient-skb-linear-check'
git bisect bad 9ed33805cdf81eadcc6ef54a81a8448e80e19f54
# bad: [bf8973fc76e456378d3e2d6a13ed62a52281d379] Merge tag 'jfs-4.15-2' of git://github.com/kleikamp/linux-shaggy
git bisect bad bf8973fc76e456378d3e2d6a13ed62a52281d379
# bad: [e4a18052bb99e25d2c0074981120b76638285c22] platform/x86: sony-laptop: Drop variable assignment in sony_nc_setup_rfkill()
git bisect bad e4a18052bb99e25d2c0074981120b76638285c22
# good: [a5e50220edbdd1ec8912c191a0f5272d629743bf] platform/x86: intel_telemetry: cleanup redundant headers
git bisect good a5e50220edbdd1ec8912c191a0f5272d629743bf
# good: [722c856d46c6ca74a246b54a72f14751fec01aae] platform/x86: wmi: Add new method wmidev_evaluate_method
git bisect good 722c856d46c6ca74a246b54a72f14751fec01aae
# bad: [549b4930f057658dc50d8010e66219233119a4d8] platform/x86: dell-smbios: Introduce dispatcher for SMM calls
git bisect bad 549b4930f057658dc50d8010e66219233119a4d8
# good: [92b8c540bce7b1662212dff35f503f5b1266725b] platform/x86: dell-wmi-descriptor: split WMI descriptor into it's own driver
git bisect good 92b8c540bce7b1662212dff35f503f5b1266725b
# good: [980f481d63f57bb62ac171a66294de3e14d52b77] platform/x86: dell-smbios: only run if proper oem string is detected
git bisect good 980f481d63f57bb62ac171a66294de3e14d52b77
# good: [33b9ca1e53b45f7cacdba9d4fba5cb1387b26827] platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens
git bisect good 33b9ca1e53b45f7cacdba9d4fba5cb1387b26827
# first bad commit: [549b4930f057658dc50d8010e66219233119a4d8] platform/x86: dell-smbios: Introduce dispatcher for SMM calls

From the source code (at 549b4930f057) it looks like dell_set_arguments(),
which writes to `buffer`, is called before the buffer gets allocated, but I
might be wrong.
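
To make the suspicion concrete, here is a minimal user-space sketch of the
pattern, not the driver code itself. All names (sketch_init,
sketch_set_arguments, struct sketch_buffer) are hypothetical stand-ins, and
the NULL check is only an illustration of a guard, not the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared call buffer. */
struct sketch_buffer {
    unsigned int input[4];
};

/* Stays NULL until the simulated init routine has run. */
static struct sketch_buffer *buffer;

/* Simulated module init: allocates the buffer once. */
int sketch_init(void)
{
    if (buffer)
        return 0;
    buffer = calloc(1, sizeof(*buffer));
    return buffer ? 0 : -ENOMEM;
}

/*
 * Guarded writer. Without the NULL check, a caller that runs before
 * sketch_init() would write through a NULL pointer.
 */
int sketch_set_arguments(unsigned int a0, unsigned int a1,
                         unsigned int a2, unsigned int a3)
{
    if (!buffer)
        return -ENODEV;
    buffer->input[0] = a0;
    buffer->input[1] = a1;
    buffer->input[2] = a2;
    buffer->input[3] = a3;
    return 0;
}

/* Walks through the early-caller scenario. */
void sketch_demo(void)
{
    /* Caller arrives before init: reports "not ready" instead of crashing. */
    assert(sketch_set_arguments(1, 2, 3, 4) == -ENODEV);
    /* After init the same call succeeds. */
    assert(sketch_init() == 0);
    assert(sketch_set_arguments(1, 2, 3, 4) == 0);
}
```

Calling sketch_demo() from a main() exercises both cases; removing the NULL
check turns the first call into the kind of dereference seen in the trace.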

But this is not the whole story. After downgrading to the known-good 4.14.12
kernel, I ran into another problem. The system consistently failed to wake up
from suspend-to-RAM and rebooted instead. On a hunch I went into the BIOS
settings screen (which showed unusual freezes of up to ~30 seconds) and
switched the POST diagnostic mode from minimal to thorough, which somehow
resolved the problem. Suspend had worked fine before; the problem appeared only
after I tried 4.15. It would be great to hear any ideas or explanations for
this behaviour.

[1] git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git


Attachments:
kconfig (113.71 kB)
dmesg.png (222.92 kB)

2018-02-03 23:35:22

by Alexander Sergeyev

Subject: Re: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

On Sat, Feb 03, 2018 at 07:20:21PM +0300, Alexander Sergeyev wrote:
> # first bad commit: [549b4930f057658dc50d8010e66219233119a4d8] platform/x86:
> dell-smbios: Introduce dispatcher for SMM calls
>
> From source code (at 549b4930f057) it looks like dell_set_arguments() which
> writes to `buffer` is called before the buffer gets allocated

It turns out that the problem has surfaced before, but from a different
origin, namely the rfkill interface [1]. That case was subsequently fixed in
5246741a3f2e and c6f9288ee460.

This time there is an ordering problem between the initialization of the
dell-laptop module and the audio modules, which try to flash the microphone
mute LED on the keyboard (via the dell-laptop interface).

As for the suspend-to-RAM wakeup problem: is it possible that some buggy
interaction with SMBIOS led to the observed behaviour?

[1] https://lkml.org/lkml/2017/11/15/705

2018-02-04 09:21:33

by Alexander Sergeyev

Subject: Re: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

Mario,

>Would you please try https://patchwork.kernel.org/patch/10194287/
>And see if it cleans up this null pointer dereference?

Yes, it does.

Are there any estimates of when the patch will be merged into mainline? I want
to put something into my distribution's bug tracker, but it's unlikely they
will use this patch before it lands in a stable release.

>As for your suspend problem did you pick up the new firmware that contained
>the microcode update for Spectre? Or a microcode update from your
>distribution?

Yes, around a month ago (the firmware option). Still, some time passed between
the firmware update and the problem surfacing. As I said, it went away after
switching POST/fastboot to thorough mode. I have just tried to reproduce the
problem with the previous settings, but without success. So that is (in a way)
closed, but I have no idea what caused it.

Thank you.

2018-02-04 12:17:41

by Pali Rohár

Subject: Re: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

On Sunday 04 February 2018 12:23:33 Alexander Sergeyev wrote:
> Mario,
>
> > Would you please try https://patchwork.kernel.org/patch/10194287/
> > And see if it cleans up this null pointer dereference?
>
> Yes, it does.

So the problem I spotted is not only theoretical, but already affects
users... A pity that I did not look more closely, and earlier, at the patch
which introduced it :-(

So there is a race condition between initializing the dell-laptop driver and
calling an exported function from this driver. But do we not still have the
same problem at the layer between dell-laptop.ko, dell-smbios.ko and
dell-smbios-*.ko?

To make dell_micmute_led_set() work properly we need to ensure that
either the WMI or the SMM driver is already loaded and initialized.
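
A minimal user-space sketch of that requirement, assuming a dispatcher that
forwards calls to whichever backend has registered. Every name here
(sketch_register_backend, sketch_call, fake_backend_call) is hypothetical and
does not mirror the real dell-smbios interface; it only shows a call failing
cleanly until a backend is present.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend entry point, standing in for a WMI or SMM call. */
typedef int (*sketch_backend_fn)(unsigned int class_id, unsigned int select);

/* NULL until some backend has registered itself. */
static sketch_backend_fn active_backend;

/* Called by a backend once it is fully initialized. */
int sketch_register_backend(sketch_backend_fn fn)
{
    if (!fn)
        return -EINVAL;
    if (active_backend)
        return -EBUSY;
    active_backend = fn;
    return 0;
}

/* Dispatcher: refuses the call while no backend is available. */
int sketch_call(unsigned int class_id, unsigned int select)
{
    if (!active_backend)
        return -ENODEV;
    return active_backend(class_id, select);
}

/* Stand-in backend that accepts every request. */
int fake_backend_call(unsigned int class_id, unsigned int select)
{
    (void)class_id;
    (void)select;
    return 0;
}

/* Walks through "caller first, backend later". */
void sketch_demo(void)
{
    /* An early caller, e.g. an LED update, gets an error, not a crash. */
    assert(sketch_call(1, 0) == -ENODEV);
    assert(sketch_register_backend(fake_backend_call) == 0);
    assert(sketch_call(1, 0) == 0);
}
```

Returning an error only avoids the crash; the early request is still lost
unless the caller retries once a backend has registered.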

> Are there any estimates of when the patch will be merged into mainline? I
> want to put something into my distribution's bug tracker, but it's unlikely
> they will use this patch before it lands in a stable release.

It should go in with the next round of merging into Linus' tree, and after
that, with proper commit message tags, it should be included in the next
stable versions.

--
Pali Rohár



2018-02-04 14:28:05

by Andy Shevchenko

Subject: Re: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

On Sun, Feb 4, 2018 at 2:15 PM, Pali Rohár wrote:
> On Sunday 04 February 2018 12:23:33 Alexander Sergeyev wrote:

>> Are there any estimates of when the patch will be merged into mainline? I
>> want to put something into my distribution's bug tracker, but it's unlikely
>> they will use this patch before it lands in a stable release.
>
> It should go in with the next round of merging into Linus' tree, and after
> that, with proper commit message tags, it should be included in the next
> stable versions.

Next week in Linus' tree; after that, a couple more weeks until it reaches
the stable releases.

--
With Best Regards,
Andy Shevchenko

2018-02-04 16:29:38

by Alexander Sergeyev

Subject: Re: PROBLEM: NULL pointer dereference in dell_set_arguments() in 4.15

>To make dell_micmute_led_set() work properly we need to ensure that either the
>WMI or the SMM driver is already loaded and initialized.

Judging by the call trace, dell_micmute_led_set() is called from the device
phase of initialization. This means the dell-laptop interface is used while
the module is not yet initialized (since it uses late_initcall). Previously,
this led to crashes.

But dell_micmute_led_set() does not touch the module state now, so nothing is
technically broken; dell-smbios and dell-smbios-smm use subsys_initcall, so
they are ready. But the WMI driver is in the same phase as the Intel HDA
driver, so there might still be an ordering issue there.
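
The ordering argument can be sketched as a small user-space model. It
assumes built-in code, where subsys (4), device (6) and late (7) initcalls
run in that order and ordering within one level is not guaranteed by the
level alone. The labels and their levels are taken from this thread, not
re-checked against the source; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three of the built-in initcall levels, numbered in execution order. */
enum sketch_level { SKETCH_SUBSYS = 4, SKETCH_DEVICE = 6, SKETCH_LATE = 7 };

struct sketch_entry {
    const char *label;
    enum sketch_level level;
};

/* Levels as described in this thread (assumption, not verified). */
static const struct sketch_entry sketch_entries[] = {
    { "dell-smbios",     SKETCH_SUBSYS },
    { "dell-smbios-smm", SKETCH_SUBSYS },
    { "wmi-backend",     SKETCH_DEVICE },
    { "hda-codec-probe", SKETCH_DEVICE },
    { "dell-laptop",     SKETCH_LATE   },
};

/* Returns the level for a label, or -1 if the label is unknown. */
int sketch_level_of(const char *label)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_entries) / sizeof(sketch_entries[0]); i++)
        if (strcmp(sketch_entries[i].label, label) == 0)
            return (int)sketch_entries[i].level;
    return -1;
}

/*
 * 1 if `a` is guaranteed to initialize before `b` by level alone,
 * 0 if not (same level or later), -1 if either label is unknown.
 */
int sketch_ready_before(const char *a, const char *b)
{
    int la = sketch_level_of(a);
    int lb = sketch_level_of(b);

    if (la < 0 || lb < 0)
        return -1;
    return la < lb;
}

/* Encodes the three cases discussed above. */
void sketch_demo(void)
{
    /* subsys runs before device: dell-smbios is ready for the codec probe. */
    assert(sketch_ready_before("dell-smbios", "hda-codec-probe") == 1);
    /* late runs after device: dell-laptop is not yet initialized. */
    assert(sketch_ready_before("dell-laptop", "hda-codec-probe") == 0);
    /* Same level: no guarantee either way. */
    assert(sketch_ready_before("wmi-backend", "hda-codec-probe") == 0);
}
```

The last case is the open question: with both in the device phase, the level
alone says nothing about which one runs first.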