2023-03-16 08:39:31

by Li Zhijian

[permalink] [raw]
Subject: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

nvdimm_bus_register() could be called from other modules, such as nfit,
but it can only be called after the nvdimm_bus_type is registered.

BUG: kernel NULL pointer dereference, address: 0000000000000098
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:bus_add_device+0x58/0x150
Call Trace:
<TASK>
device_add+0x3ac/0x980
nvdimm_bus_register+0x16d/0x1d0
acpi_nfit_init+0xb72/0x1f90 [nfit]
acpi_nfit_add+0x1d5/0x200 [nfit]
acpi_device_probe+0x45/0x160
really_probe+0xce/0x390
__driver_probe_device+0x78/0x180
driver_probe_device+0x1e/0x90
__driver_attach+0xd6/0x1d0
bus_for_each_dev+0x7b/0xc0
bus_add_driver+0x1ac/0x200
driver_register+0x8f/0xf0
nfit_init+0x164/0xff0 [nfit]
do_one_initcall+0x5b/0x320
do_init_module+0x4c/0x1f0
__do_sys_finit_module+0xb4/0x130
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/nvdimm/bus.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ada61bbf49c1..ea66053072cb 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -28,6 +28,7 @@
static int nvdimm_bus_major;
struct class *nd_class;
static DEFINE_IDA(nd_ida);
+static bool nvdimm_bus_type_registered;

static int to_nd_device_type(struct device *dev)
{
@@ -337,6 +338,10 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
struct nvdimm_bus *nvdimm_bus;
int rc;

+ if (!nvdimm_bus_type_registered) {
+ pr_warn("nvdimm bus type is not registered\n");
+ return NULL;
+ }
nvdimm_bus = kzalloc(sizeof(*nvdimm_bus), GFP_KERNEL);
if (!nvdimm_bus)
return NULL;
@@ -1321,6 +1326,7 @@ int __init nvdimm_bus_init(void)
if (rc)
goto err_nd_bus;

+ nvdimm_bus_type_registered = true;
return 0;

err_nd_bus:
@@ -1343,4 +1349,5 @@ void nvdimm_bus_exit(void)
unregister_chrdev(nvdimm_major, "dimmctl");
bus_unregister(&nvdimm_bus_type);
ida_destroy(&nd_ida);
+ nvdimm_bus_type_registered = false;
}
--
1.8.3.1



2023-03-16 15:55:05

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

Li Zhijian wrote:
> nvdimm_bus_register() could be called from other modules, such as nfit,
> but it can only be called after the nvdimm_bus_type is registered.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000098
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:bus_add_device+0x58/0x150
> Call Trace:
> <TASK>
> device_add+0x3ac/0x980
> nvdimm_bus_register+0x16d/0x1d0
> acpi_nfit_init+0xb72/0x1f90 [nfit]
> acpi_nfit_add+0x1d5/0x200 [nfit]
> acpi_device_probe+0x45/0x160

Can you explain a bit more how to hit this crash? This has not been a
problem historically and the explanation above makes it sound like this
is a theoretical issue.

libnvdimm_init() *should* be done before the nfit driver can attempt
nvdimm_bus_register(). So, something else is broken if
nvdimm_bus_register() can be called before libnvdimm_init(), or after
libnvdimm_exit().

2023-03-17 01:43:21

by Li Zhijian

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus



On 16/03/2023 23:54, Dan Williams wrote:
> Li Zhijian wrote:
>> nvdimm_bus_register() could be called from other modules, such as nfit,
>> but it can only be called after the nvdimm_bus_type is registered.
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000098
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0
>> Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>> RIP: 0010:bus_add_device+0x58/0x150
>> Call Trace:
>> <TASK>
>> device_add+0x3ac/0x980
>> nvdimm_bus_register+0x16d/0x1d0
>> acpi_nfit_init+0xb72/0x1f90 [nfit]
>> acpi_nfit_add+0x1d5/0x200 [nfit]
>> acpi_device_probe+0x45/0x160
>
> Can you explain a bit more how to hit this crash? This has not been a
> problem historically and the explanation above makes it sound like this
> is a theoretical issue.
>

Dan,

Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
'initcall_blacklist=libnvdimm_init'. Then kernel panic!
Theoretically, it will also happen if nvdimm_bus_register() failed.


For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
[1]https://lore.kernel.org/linux-mm/[email protected]/T/

Thanks
Zhijian

> libnvdimm_init() *should* be done before the nfit driver can attempt
> nvdimm_bus_register(). So, something else is broken if
> nvdimm_bus_register() can be called before libnvdimm_init(), or after
> libnvdimm_exit().

2023-03-17 02:27:10

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

[email protected] wrote:
>
>
> On 16/03/2023 23:54, Dan Williams wrote:
> > Li Zhijian wrote:
> >> nvdimm_bus_register() could be called from other modules, such as nfit,
> >> but it can only be called after the nvdimm_bus_type is registered.
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000098
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 0 P4D 0
> >> Oops: 0000 [#1] PREEMPT SMP PTI
> >> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> >> RIP: 0010:bus_add_device+0x58/0x150
> >> Call Trace:
> >> <TASK>
> >> device_add+0x3ac/0x980
> >> nvdimm_bus_register+0x16d/0x1d0
> >> acpi_nfit_init+0xb72/0x1f90 [nfit]
> >> acpi_nfit_add+0x1d5/0x200 [nfit]
> >> acpi_device_probe+0x45/0x160
> >
> > Can you explain a bit more how to hit this crash? This has not been a
> > problem historically and the explanation above makes it sound like this
> > is a theoretical issue.
> >
>
> Dan,
>
> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
> Theoretically, it will also happen if nvdimm_bus_register() failed.
>
>
> For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
> [1]https://lore.kernel.org/linux-mm/[email protected]/T/

Ah, great write up! Let me give that some thought. Apologies for missing
it earlier.

This would have been a good use for:

Link: https://lore.kernel.org/linux-mm/[email protected]

...in the above changelog.

2023-03-17 05:39:02

by Li Zhijian

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus



On 17/03/2023 10:26, Dan Williams wrote:
> [email protected] wrote:
>>
>>
>> On 16/03/2023 23:54, Dan Williams wrote:
>>> Li Zhijian wrote:
>>>> nvdimm_bus_register() could be called from other modules, such as nfit,
>>>> but it can only be called after the nvdimm_bus_type is registered.
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000098
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0
>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>>>> RIP: 0010:bus_add_device+0x58/0x150
>>>> Call Trace:
>>>> <TASK>
>>>> device_add+0x3ac/0x980
>>>> nvdimm_bus_register+0x16d/0x1d0
>>>> acpi_nfit_init+0xb72/0x1f90 [nfit]
>>>> acpi_nfit_add+0x1d5/0x200 [nfit]
>>>> acpi_device_probe+0x45/0x160
>>>
>>> Can you explain a bit more how to hit this crash? This has not been a
>>> problem historically and the explanation above makes it sound like this
>>> is a theoretical issue.
>>>
>>
>> Dan,
>>
>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
>> Theoretically, it will also happen if nvdimm_bus_register() failed.
>>
>>
>> For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
>> [1]https://lore.kernel.org/linux-mm/[email protected]/T/
>
> Ah, great write up! Let me give that some thought.

That would be great.


> Apologies for missing it earlier.

Never mind :)



>
> This would have been a good use for:
>
> Link: https://lore.kernel.org/linux-mm/[email protected]
>
> ...in the above changelog.


sounds good, i will update it in next version.


Thanks
Zhijian

2023-03-17 06:00:19

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

[email protected] wrote:
>
>
> On 16/03/2023 23:54, Dan Williams wrote:
> > Li Zhijian wrote:
> >> nvdimm_bus_register() could be called from other modules, such as nfit,
> >> but it can only be called after the nvdimm_bus_type is registered.
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000098
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 0 P4D 0
> >> Oops: 0000 [#1] PREEMPT SMP PTI
> >> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> >> RIP: 0010:bus_add_device+0x58/0x150
> >> Call Trace:
> >> <TASK>
> >> device_add+0x3ac/0x980
> >> nvdimm_bus_register+0x16d/0x1d0
> >> acpi_nfit_init+0xb72/0x1f90 [nfit]
> >> acpi_nfit_add+0x1d5/0x200 [nfit]
> >> acpi_device_probe+0x45/0x160
> >
> > Can you explain a bit more how to hit this crash? This has not been a
> > problem historically and the explanation above makes it sound like this
> > is a theoretical issue.
> >
>
> Dan,
>
> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!

That's expected though, you can't block libnvdimm_init and then expect
modules that link to libnvdimm to work. You would also need to block all
modules / initcalls that depend on libnvdimm_init having run. I'll
respond to the other thread with some ideas.

2023-03-20 02:07:36

by Li Zhijian

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus



On 17/03/2023 13:59, Dan Williams wrote:
> [email protected] wrote:
>>
>>
>> On 16/03/2023 23:54, Dan Williams wrote:
>>> Li Zhijian wrote:
>>>> nvdimm_bus_register() could be called from other modules, such as nfit,
>>>> but it can only be called after the nvdimm_bus_type is registered.
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000098
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0
>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>>>> RIP: 0010:bus_add_device+0x58/0x150
>>>> Call Trace:
>>>> <TASK>
>>>> device_add+0x3ac/0x980
>>>> nvdimm_bus_register+0x16d/0x1d0
>>>> acpi_nfit_init+0xb72/0x1f90 [nfit]
>>>> acpi_nfit_add+0x1d5/0x200 [nfit]
>>>> acpi_device_probe+0x45/0x160
>>>
>>> Can you explain a bit more how to hit this crash? This has not been a
>>> problem historically and the explanation above makes it sound like this
>>> is a theoretical issue.
>>>
>>
>> Dan,
>>
>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
>
> That's expected though,

Do you mean we just keep it as it is.


> you can't block libnvdimm_init and then expect
> modules that link to libnvdimm to work.
Ah, we would rather see it *unable to work* than panic, isn't it.



Thanks
Zhijian


> You would also need to block all
> modules / initcalls that depend on libnvdimm_init having runI'll
> respond to the other thread with some ideas.

2023-03-20 17:36:51

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

[email protected] wrote:
[..]
> >> Dan,
> >>
> >> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> >> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
> >
> > That's expected though,
>
> Do you mean we just keep it as it is.

Yes.

>
>
> > you can't block libnvdimm_init and then expect
> > modules that link to libnvdimm to work.
> Ah, we would rather see it *unable to work* than panic, isn't it.

That part is true, but consider the implications of adding error
handling to all code that can no longer depend on initcall ordering, not
just libnvdimm. This would be a large paradigm shift.

Now I do think it would be a good idea to fail device_add() if the bus
is not registered, but I *think* that happens now as a result of:

5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups

...can you double check if you have that commit in your tests?

2023-03-21 02:19:08

by Li Zhijian

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus



On 21/03/2023 01:30, Dan Williams wrote:
> [email protected] wrote:
> [..]
>>>> Dan,
>>>>
>>>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>>>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
>>>
>>> That's expected though,
>>
>> Do you mean we just keep it as it is.
>
> Yes.
>
>>
>>
>>> you can't block libnvdimm_init and then expect
>>> modules that link to libnvdimm to work.
>> Ah, we would rather see it *unable to work* than panic, isn't it.
>
> That part is true, but consider the implications of adding error
> handling to all code that can no longer depend on initcall ordering, not
> just libnvdimm. This would be a large paradigm shift.
>
> Now I do think it would be a good idea to fail device_add() if the bus
> is not registered, but I *think* that happens now as a result of:
>
> 5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups
>
> ...can you double check if you have that commit in your tests?

Great, panic is gone after i upgraded to the upstream!


> Now I do think it would be a good idea to fail device_add() if the bus
> is not registered,

BTW, below line 369: device_add() didn't fail in practical.

354 mutex_init(&nvdimm_bus->reconfig_mutex);
355 badrange_init(&nvdimm_bus->badrange);
356 nvdimm_bus->nd_desc = nd_desc;
357 nvdimm_bus->dev.parent = parent;
358 nvdimm_bus->dev.type = &nvdimm_bus_dev_type;
359 nvdimm_bus->dev.groups = nd_desc->attr_groups;
360 nvdimm_bus->dev.bus = &nvdimm_bus_type;
361 nvdimm_bus->dev.of_node = nd_desc->of_node;
362 device_initialize(&nvdimm_bus->dev);
363 lockdep_set_class(&nvdimm_bus->dev.mutex, &nvdimm_bus_key);
364 device_set_pm_not_required(&nvdimm_bus->dev);
365 rc = dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
366 if (rc)
367 goto err;
368
369 rc = device_add(&nvdimm_bus->dev);
370 dev_err(&nvdimm_bus->dev, "registration failed: %d\n", rc);

Thanks
Zhijian

2023-03-21 02:38:41

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

[email protected] wrote:
[..]
> > Now I do think it would be a good idea to fail device_add() if the bus
> > is not registered,
>
> BTW, below line 369: device_add() didn't fail in practical.
>

I think that's ok because the device gets added, but never probed due to
this part of that commit I referenced:

@@ -503,20 +517,21 @@ int bus_add_device(struct device *dev)
*/
void bus_probe_device(struct device *dev)
{
- struct bus_type *bus = dev->bus;
+ struct subsys_private *sp = bus_to_subsys(dev->bus);
struct subsys_interface *sif;

- if (!bus)
+ if (!sp)
return;


...so it does what you want which is disable the libnvdimm subsystem
from binding to any devices without crashing.

2023-03-21 03:20:45

by Li Zhijian

[permalink] [raw]
Subject: Re: [PATCH] nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus



On 21/03/2023 10:38, Dan Williams wrote:
> [email protected] wrote:
> [..]
>>> Now I do think it would be a good idea to fail device_add() if the bus
>>> is not registered,
>>
>> BTW, below line 369: device_add() didn't fail in practical.
>>
>
> I think that's ok because the device gets added, but never probed due to
> this part of that commit I referenced:

Very thanks for your explanation.

>
> @@ -503,20 +517,21 @@ int bus_add_device(struct device *dev)
> */
> void bus_probe_device(struct device *dev)
> {
> - struct bus_type *bus = dev->bus;
> + struct subsys_private *sp = bus_to_subsys(dev->bus);
> struct subsys_interface *sif;
>
> - if (!bus)
> + if (!sp)
> return;
>
>
> ...so it does what you want which is disable the libnvdimm subsystem
> from binding to any devices without crashing.

Yes, it's fine enough to me.