I started adding fixes to my urgent branch rebased on top of v6.4-rc3
and ran my tests. Unfortunately they crashed on unrelated code.
Here's the dump:
BUG: kernel NULL pointer dereference, address: 00000000000003e8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G N 6.3.0-rc1-test-00011-g27a2195efa8d #49
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__dev_fwnode+0x9/0x2a
Code: ff 85 c0 78 16 48 8b 3c 24 89 c6 59 e9 e0 f7 ff ff b8 ea ff ff ff c3 cc cc cc cc 5a c3 cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 87 e8 03 00 00 48 83 c0 18 c3 cc cc cc cc 48
RSP: 0000:ffffc90000013d88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810b7a8800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88810b7a8e20 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7a8800
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000fffffffe0
FS: 0000000000000000(0000) GS:ffff88817ae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003e8 CR3: 000000000221a001 CR4: 0000000000170eb0
Call Trace:
<TASK>
power_supply_get_battery_info+0x9d/0x6c7
? preempt_count_sub+0x13/0x20
? _raw_spin_unlock_irqrestore+0x3d/0x54
__power_supply_register+0x32f/0x48b
test_power_init+0x29/0xa0
? axp20x_usb_power_driver_init+0x17/0x17
do_one_initcall+0x105/0x28f
kernel_init_freeable+0x19e/0x1f2
? rest_init+0x14e/0x14e
kernel_init+0x1a/0x127
ret_from_fork+0x22/0x30
</TASK>
Modules linked in:
CR2: 00000000000003e8
---[ end trace 0000000000000000 ]---
Attached is the config. I ran a bisect and it found it to be this commit:
27a2195efa8d2 ("power: supply: core: auto-exposure of simple-battery data")
I checked out that commit and tested it, and it crashed. I then
reverted that commit, and the crash went away.
Reverting that commit on v6.4-rc3 also makes the crash go away.
-- Steve
On Wed, May 24, 2023 at 10:12 AM Steven Rostedt <[email protected]> wrote:
>
> I started adding fixes to my urgent branch rebased on top of v6.4-rc3
> and ran my tests. Unfortunately they crashed on unrelated code.
>
> Here's the dump:
>
> BUG: kernel NULL pointer dereference, address: 00000000000003e8
> RIP: 0010:__dev_fwnode+0x9/0x2a
> Code: ff 85 c0 78 16 48 8b 3c 24 89 c6 59 e9 e0 f7 ff ff b8 ea ff ff ff c3 cc cc cc cc 5a c3 cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 87 e8 03 00 00 48 83 c0 18 c3 cc cc cc cc 48
That disassembles to
endbr64
nopl 0x0(%rax,%rax,1)
mov 0x3e8(%rdi),%rax
add $0x18,%rax
ret
which looks like it must be the
return dev->fwnode;
with a NULL 'dev'. Which makes sense for __dev_fwnode with CONFIG_OF
not enabled.
Except I have no idea what that odd 'add $0x18' is all about. Strange.
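To make the address arithmetic concrete, here is a minimal userspace sketch of why CR2 reads 0x3e8. The struct and function names (mock_device, load_address) are made up for illustration, and the padding only stands in for whatever precedes the field in the real struct device; the 0x3e8 offset is taken from the displacement in the 'mov 0x3e8(%rdi),%rax' above, not from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct device. The padding is sized so that
 * 'fwnode' lands at offset 0x3e8, matching the displacement seen in the
 * disassembly. The real layout depends on the kernel config.
 */
struct mock_device {
	char pad[0x3e8];
	void *fwnode;
};

/* Check the assumed layout at compile time. */
static_assert(offsetof(struct mock_device, fwnode) == 0x3e8,
	      "mock layout must match the mov displacement");

/*
 * Address the CPU reads for dev->fwnode: base pointer plus field offset.
 * Only the arithmetic is modelled here, nothing is dereferenced.
 */
static uintptr_t load_address(uintptr_t dev_base)
{
	return dev_base + offsetof(struct mock_device, fwnode);
}
```

With a base of 0 (a NULL 'dev' in %rdi), load_address() yields 0x3e8, which is the faulting address reported in the BUG line and in CR2.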
Anyway, the caller seems to be this code in power_supply_get_battery_info():
if (psy->of_node) {
.. presumably not this ..
} else {
err = fwnode_property_get_reference_args(
dev_fwnode(psy->dev.parent),
"monitored-battery", NULL, 0, 0, &args);
...
so I suspect we have psy->dev.parent being NULL.
> I ran a bisect and it found it to be this commit:
>
> 27a2195efa8d2 ("power: supply: core: auto-exposure of simple-battery data")
>
> I checked out that commit and tested it, and it crashed. I then
> reverted that commit, and the crash went away.
At a guess, it's
(a) the new code to expose battery info at registration time:
+ /*
+ * Expose constant battery info, if it is available. While there are
+ * some chargers accessing constant battery data, we only want to
+ * expose battery data to userspace for battery devices.
+ */
+ if (desc->type == POWER_SUPPLY_TYPE_BATTERY) {
+ rc = power_supply_get_battery_info(psy, &psy->battery_info);
+ if (rc && rc != -ENODEV && rc != -ENOENT)
+ goto check_supplies_failed;
+ }
interacting with
(b) the test_power_init() code that does this:
test_power_supplies[i] = power_supply_register(NULL,
&test_power_desc[i],
&test_power_configs[i]);
which passes in NULL for the "parent" pointer.
So it looks like a dodgy test that was a bit lazy. But maybe a NULL
parent is supposed to work.
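If a NULL parent is meant to be allowed, the lookup would need a guard before it touches the parent. What follows is only a userspace sketch of that idea with mock types; every name ending in _sketch or starting with mock_ is hypothetical, and this is not a copy of any actual kernel patch. It relies on the registration code quoted above treating -ENODEV and -ENOENT as non-fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types involved. */
struct mock_fwnode { int unused; };
struct mock_device {
	struct mock_device *parent;
	struct mock_fwnode *fwnode;
};
struct mock_power_supply { struct mock_device dev; };

/* Mirrors the 'return dev->fwnode;' body: crashes if dev is NULL. */
static struct mock_fwnode *mock_dev_fwnode(const struct mock_device *dev)
{
	return dev->fwnode;
}

/* Look up the battery node, but bail out early when there is no parent. */
static int get_battery_info_sketch(const struct mock_power_supply *psy,
				   struct mock_fwnode **out)
{
	assert(psy != NULL && out != NULL);
	*out = NULL;

	if (!psy->dev.parent)
		return -ENOENT;	/* no parent, so no firmware node to query */

	*out = mock_dev_fwnode(psy->dev.parent);
	return *out ? 0 : -ENODEV;
}

/* Same error filtering as the quoted registration hunk. */
static int register_sketch(const struct mock_power_supply *psy)
{
	struct mock_fwnode *node;
	int rc = get_battery_info_sketch(psy, &node);

	if (rc && rc != -ENODEV && rc != -ENOENT)
		return rc;	/* only unexpected errors abort registration */
	return 0;
}
```

With this shape, a supply registered with a NULL parent, as the test module does, gets -ENOENT from the lookup and registration still succeeds.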
Linus
Hi,
On Wed, May 24, 2023 at 11:28:41AM -0700, Linus Torvalds wrote:
> On Wed, May 24, 2023 at 10:12 AM Steven Rostedt <[email protected]> wrote:
> >
> > I started adding fixes to my urgent branch rebased on top of v6.4-rc3
> > and ran my tests. Unfortunately they crashed on unrelated code.
> >
> > Here's the dump:
> >
> > BUG: kernel NULL pointer dereference, address: 00000000000003e8
> > RIP: 0010:__dev_fwnode+0x9/0x2a
> > Code: ff 85 c0 78 16 48 8b 3c 24 89 c6 59 e9 e0 f7 ff ff b8 ea ff ff ff c3 cc cc cc cc 5a c3 cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 87 e8 03 00 00 48 83 c0 18 c3 cc cc cc cc 48
>
> That disassembles to
>
> endbr64
> nopl 0x0(%rax,%rax,1)
> mov 0x3e8(%rdi),%rax
> add $0x18,%rax
> ret
>
> which looks like it must be the
>
> return dev->fwnode;
>
> with a NULL 'dev'. Which makes sense for __dev_fwnode with CONFIG_OF
> not enabled.
>
> Except I have no idea what that odd 'add $0x18' is all about. Strange.
>
> Anyway, the caller seems to be this code in power_supply_get_battery_info():
>
> if (psy->of_node) {
> .. presumably not this ..
> } else {
> err = fwnode_property_get_reference_args(
> dev_fwnode(psy->dev.parent),
> "monitored-battery", NULL, 0, 0, &args);
> ...
>
> so I suspect we have psy->dev.parent being NULL.
>
> > I ran a bisect and it found it to be this commit:
> >
> > 27a2195efa8d2 ("power: supply: core: auto-exposure of simple-battery data")
> >
> > I checked out that commit and tested it, and it crashed. I then
> > reverted that commit, and the crash went away.
>
> At a guess, it's
>
> (a) the new code to expose battery info at registration time:
>
> + /*
> + * Expose constant battery info, if it is available. While there are
> + * some chargers accessing constant battery data, we only want to
> + * expose battery data to userspace for battery devices.
> + */
> + if (desc->type == POWER_SUPPLY_TYPE_BATTERY) {
> + rc = power_supply_get_battery_info(psy, &psy->battery_info);
> + if (rc && rc != -ENODEV && rc != -ENOENT)
> + goto check_supplies_failed;
> + }
>
> interacting with
>
> (b) the test_power_init() code that does this:
>
> test_power_supplies[i] = power_supply_register(NULL,
> &test_power_desc[i],
> &test_power_configs[i]);
>
> which passes in NULL for the "parent" pointer.
>
> So it looks like a dodgy test that was a bit lazy. But maybe a NULL
> parent is supposed to work.
>
> Linus
I have a fix for that in my fixes branch, that I planned to send
this week:
https://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git/commit/?h=fixes&id=44c524b642996148a8e94f1a1b8751076edcf577
-- Sebastian
On Thu, 25 May 2023 18:42:48 +0200
Sebastian Reichel <[email protected]> wrote:
> I have a fix for that in my fixes branch, that I planned to send
> this week:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git/commit/?h=fixes&id=44c524b642996148a8e94f1a1b8751076edcf577
This appears to fix the bug I reported.
Tested-by: Steven Rostedt (Google) <[email protected]>
-- Steve