2024-02-28 06:13:42

by Peng Fan (OSS)

[permalink] [raw]
Subject: [PATCH] of: dynamic: notify before revert

From: Peng Fan <[email protected]>

When PCI node was created using an overlay and the overlay is
reverted/destroyed, the "linux,pci-domain" property no longer
exists, so of_get_pci_domain_nr will return failure. Then
of_pci_bus_release_domain_nr will actually use the dynamic IDA,
even if the IDA was allocated in static IDA.

So move the notify before revert to fix the issue.

Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
Signed-off-by: Peng Fan <[email protected]>
---

The initial thread:
https://lore.kernel.org/all/20230913115737.GA426735@bhelgaas/

drivers/of/dynamic.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 3bf27052832f..0a5ec2bda380 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -834,13 +834,15 @@ int __of_changeset_revert_notify(struct of_changeset *ocs)

static int __of_changeset_revert(struct of_changeset *ocs)
{
- int ret, ret_reply;
+ int ret, ret_reply = 0;

- ret_reply = 0;
- ret = __of_changeset_revert_entries(ocs, &ret_reply);
+ ret = __of_changeset_revert_notify(ocs);
+ if (ret)
+ return ret;

- if (!ret)
- ret = __of_changeset_revert_notify(ocs);
+ ret = __of_changeset_revert_entries(ocs, &ret_reply);
+ if (ret && !ret_reply)
+ ret = __of_changeset_apply_notify(ocs);

return ret;
}
--
2.37.1



2024-02-28 13:34:40

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH] of: dynamic: notify before revert

On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS) <[email protected]> wrote:
>
> From: Peng Fan <[email protected]>
>
> When PCI node was created using an overlay and the overlay is
> reverted/destroyed, the "linux,pci-domain" property no longer
> exists, so of_get_pci_domain_nr will return failure. Then
> of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> even if the IDA was allocated in static IDA.

That usage is broken to begin with unless there is a guarantee that
static and dynamic domain numbers don't overlap. For example, a
dynamic number is assigned and then you load an overlay with the same
number defined in it.

> So move the notify before revert to fix the issue.

You can't just change the timing. Something might require notify to be
after the revert.

> Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")

I don't see where the notifier is even used.

Rob

2024-02-29 08:01:50

by Peng Fan

[permalink] [raw]
Subject: RE: [PATCH] of: dynamic: notify before revert

> Subject: Re: [PATCH] of: dynamic: notify before revert
>
> On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS) <[email protected]>
> wrote:
> >
> > From: Peng Fan <[email protected]>
> >
> > When PCI node was created using an overlay and the overlay is
> > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > so of_get_pci_domain_nr will return failure. Then
> > of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
> > if the IDA was allocated in static IDA.
>
> That usage is broken to begin with unless there is a guarantee that static and
> dynamic domain numbers don't overlap. For example, a dynamic number is
> assigned and then you load an overlay with the same number defined in it.

I may not describe it clear, the code is here, because overlay property
removed, of_get_pci_domain_nr will return failure, so the code path
will goest into free a dynamic ida. But actually there is no such dynamic
ida, so dump.

So the problem is overlay was removed, but the notify callback may
still use the property to do some work.

static void of_pci_bus_release_domain_nr(struct pci_bus *bus, struct device *parent)
{
if (bus->domain_nr < 0)
return;

/* Release domain from IDA where it was allocated. */
if (of_get_pci_domain_nr(parent->of_node) == bus->domain_nr)
ida_free(&pci_domain_nr_static_ida, bus->domain_nr);
else
ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);
}
>
> > So move the notify before revert to fix the issue.
>
> You can't just change the timing. Something might require notify to be after
> the revert.
>
> > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
>
> I don't see where the notifier is even used.

The stack is as below:

[ 595.150529] CPU: 1 PID: 582 Comm: jailhouse Tainted: G O 6.5.0-rc4-next-20230804-05021-g3d4cc14b42ef-dirty #355
[ 595.161998] Hardware name: NXP i.MX93 11X11 EVK board (DT)
[ 595.167475] Call trace:
[ 595.169908] dump_backtrace+0x94/0xec
[ 595.173573] show_stack+0x18/0x24
[ 595.176884] dump_stack_lvl+0x48/0x60
[ 595.180541] dump_stack+0x18/0x24
[ 595.183843] pci_bus_release_domain_nr+0x34/0x84
[ 595.188453] pci_remove_root_bus+0xa0/0xa4
[ 595.192544] pci_host_common_remove+0x28/0x40
[ 595.196895] platform_remove+0x54/0x6c
[ 595.200641] device_remove+0x4c/0x80
[ 595.204209] device_release_driver_internal+0x1d4/0x230
[ 595.209430] device_release_driver+0x18/0x24
[ 595.213691] bus_remove_device+0xcc/0x10c
[ 595.217686] device_del+0x15c/0x41c
[ 595.221170] platform_device_del.part.0+0x1c/0x88
[ 595.225861] platform_device_unregister+0x24/0x40
[ 595.230557] of_platform_device_destroy+0xfc/0x10c
[ 595.235344] of_platform_notify+0x13c/0x178
[ 595.239518] blocking_notifier_call_chain+0x6c/0xa0
[ 595.244389] __of_changeset_entry_notify+0x148/0x16c
[ 595.249346] of_changeset_revert+0xa8/0xcc
[ 595.253437] jailhouse_pci_virtual_root_devices_remove+0x50/0x74 [jailhouse]
[ 595.260484] jailhouse_cmd_disable+0x70/0x1ac [jailhouse]
[ 595.265883] jailhouse_ioctl+0x60/0xf0 [jailhouse]
[ 595.270674] __arm64_sys_ioctl+0xac/0xf0
[ 595.274595] invoke_syscall+0x48/0x114
[ 595.278336] el0_svc_common.constprop.0+0xc4/0xe4

Regards,
Peng.

>
> Rob

2024-02-29 21:53:06

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH] of: dynamic: notify before revert

On Thu, Feb 29, 2024 at 2:01 AM Peng Fan <[email protected]> wrote:
>
> > Subject: Re: [PATCH] of: dynamic: notify before revert
> >
> > On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS) <[email protected]>
> > wrote:
> > >
> > > From: Peng Fan <[email protected]>
> > >
> > > When PCI node was created using an overlay and the overlay is
> > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > so of_get_pci_domain_nr will return failure. Then
> > > of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
> > > if the IDA was allocated in static IDA.
> >
> > That usage is broken to begin with unless there is a guarantee that static and
> > dynamic domain numbers don't overlap. For example, a dynamic number is
> > assigned and then you load an overlay with the same number defined in it.
>
> I may not describe it clear, the code is here, because overlay property
> removed, of_get_pci_domain_nr will return failure, so the code path
> will goest into free a dynamic ida. But actually there is no such dynamic
> ida, so dump.

I understood the problem.

Your usage of this is broken on applying your overlay. You just got lucky.

> So the problem is overlay was removed, but the notify callback may
> still use the property to do some work.
>
> static void of_pci_bus_release_domain_nr(struct pci_bus *bus, struct device *parent)
> {
> if (bus->domain_nr < 0)
> return;
>
> /* Release domain from IDA where it was allocated. */
> if (of_get_pci_domain_nr(parent->of_node) == bus->domain_nr)
> ida_free(&pci_domain_nr_static_ida, bus->domain_nr);
> else
> ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);
> }
> >
> > > So move the notify before revert to fix the issue.
> >
> > You can't just change the timing. Something might require notify to be after
> > the revert.

Again ^^^

> >
> > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> >
> > I don't see where the notifier is even used.
>
> The stack is as below:
>
> [ 595.150529] CPU: 1 PID: 582 Comm: jailhouse Tainted: G O 6.5.0-rc4-next-20230804-05021-g3d4cc14b42ef-dirty #355
> [ 595.161998] Hardware name: NXP i.MX93 11X11 EVK board (DT)
> [ 595.167475] Call trace:
> [ 595.169908] dump_backtrace+0x94/0xec
> [ 595.173573] show_stack+0x18/0x24
> [ 595.176884] dump_stack_lvl+0x48/0x60
> [ 595.180541] dump_stack+0x18/0x24
> [ 595.183843] pci_bus_release_domain_nr+0x34/0x84
> [ 595.188453] pci_remove_root_bus+0xa0/0xa4
> [ 595.192544] pci_host_common_remove+0x28/0x40
> [ 595.196895] platform_remove+0x54/0x6c
> [ 595.200641] device_remove+0x4c/0x80
> [ 595.204209] device_release_driver_internal+0x1d4/0x230
> [ 595.209430] device_release_driver+0x18/0x24
> [ 595.213691] bus_remove_device+0xcc/0x10c
> [ 595.217686] device_del+0x15c/0x41c
> [ 595.221170] platform_device_del.part.0+0x1c/0x88
> [ 595.225861] platform_device_unregister+0x24/0x40
> [ 595.230557] of_platform_device_destroy+0xfc/0x10c
> [ 595.235344] of_platform_notify+0x13c/0x178
> [ 595.239518] blocking_notifier_call_chain+0x6c/0xa0
> [ 595.244389] __of_changeset_entry_notify+0x148/0x16c
> [ 595.249346] of_changeset_revert+0xa8/0xcc
> [ 595.253437] jailhouse_pci_virtual_root_devices_remove+0x50/0x74 [jailhouse]

$ git grep jailhouse_pci_virtual_root_devices_remove
(END)

Another out of tree overlay user. I have little interest in worrying
about them. No one wants to step up and solve the problems with
overlays, so I'm not going to worry about them either.

Rob

2024-03-01 03:04:58

by Peng Fan

[permalink] [raw]
Subject: RE: [PATCH] of: dynamic: notify before revert

> Subject: Re: [PATCH] of: dynamic: notify before revert
>
> On Thu, Feb 29, 2024 at 2:01 AM Peng Fan <[email protected]> wrote:
> >
> > > Subject: Re: [PATCH] of: dynamic: notify before revert
> > >
> > > On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS)
> > > <[email protected]>
> > > wrote:
> > > >
> > > > From: Peng Fan <[email protected]>
> > > >
> > > > When PCI node was created using an overlay and the overlay is
> > > > reverted/destroyed, the "linux,pci-domain" property no longer
> > > > exists, so of_get_pci_domain_nr will return failure. Then
> > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > > even if the IDA was allocated in static IDA.
> > >
> > > That usage is broken to begin with unless there is a guarantee that
> > > static and dynamic domain numbers don't overlap. For example, a
> > > dynamic number is assigned and then you load an overlay with the same
> number defined in it.
> >
> > I may not describe it clear, the code is here, because overlay
> > property removed, of_get_pci_domain_nr will return failure, so the
> > code path will goest into free a dynamic ida. But actually there is no
> > such dynamic ida, so dump.
>
> I understood the problem.
>
> Your usage of this is broken on applying your overlay. You just got lucky.

Would you please point me out where is broken on using overlay?

https://github.com/siemens/jailhouse/blob/master/driver/pci.c#L458


>
> > So the problem is overlay was removed, but the notify callback may
> > still use the property to do some work.
> >
> > static void of_pci_bus_release_domain_nr(struct pci_bus *bus, struct
> > device *parent) {
> > if (bus->domain_nr < 0)
> > return;
> >
> > /* Release domain from IDA where it was allocated. */
> > if (of_get_pci_domain_nr(parent->of_node) == bus->domain_nr)
> > ida_free(&pci_domain_nr_static_ida, bus->domain_nr);
> > else
> > ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);
> > }
> > >
> > > > So move the notify before revert to fix the issue.
> > >
> > > You can't just change the timing. Something might require notify to
> > > be after the revert.
>
> Again ^^^
>
> > >
> > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > >
> > > I don't see where the notifier is even used.
> >
> > The stack is as below:
> >
> > [ 595.150529] CPU: 1 PID: 582 Comm: jailhouse Tainted: G O
> 6.5.0-rc4-next-20230804-05021-g3d4cc14b42ef-dirty #355
> > [ 595.161998] Hardware name: NXP i.MX93 11X11 EVK board (DT) [
> > 595.167475] Call trace:
> > [ 595.169908] dump_backtrace+0x94/0xec [ 595.173573]
> > show_stack+0x18/0x24 [ 595.176884] dump_stack_lvl+0x48/0x60 [
> > 595.180541] dump_stack+0x18/0x24 [ 595.183843]
> > pci_bus_release_domain_nr+0x34/0x84
> > [ 595.188453] pci_remove_root_bus+0xa0/0xa4 [ 595.192544]
> > pci_host_common_remove+0x28/0x40 [ 595.196895]
> > platform_remove+0x54/0x6c [ 595.200641] device_remove+0x4c/0x80 [
> > 595.204209] device_release_driver_internal+0x1d4/0x230
> > [ 595.209430] device_release_driver+0x18/0x24 [ 595.213691]
> > bus_remove_device+0xcc/0x10c [ 595.217686] device_del+0x15c/0x41c
> > [ 595.221170] platform_device_del.part.0+0x1c/0x88
> > [ 595.225861] platform_device_unregister+0x24/0x40
> > [ 595.230557] of_platform_device_destroy+0xfc/0x10c
> > [ 595.235344] of_platform_notify+0x13c/0x178 [ 595.239518]
> > blocking_notifier_call_chain+0x6c/0xa0
> > [ 595.244389] __of_changeset_entry_notify+0x148/0x16c
> > [ 595.249346] of_changeset_revert+0xa8/0xcc [ 595.253437]
> > jailhouse_pci_virtual_root_devices_remove+0x50/0x74 [jailhouse]
>
> $ git grep jailhouse_pci_virtual_root_devices_remove
> (END)
>
> Another out of tree overlay user. I have little interest in worrying about them.
> No one wants to step up and solve the problems with overlays, so I'm not
> going to worry about them either.

Ok, but I think this is indeed an issue, if driver accessing property after
property removed with overlay revert.


Thanks,
Peng.

>
> Rob

2024-03-01 23:05:47

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH] of: dynamic: notify before revert

On Thu, Feb 29, 2024 at 9:04 PM Peng Fan <[email protected]> wrote:
>
> > Subject: Re: [PATCH] of: dynamic: notify before revert
> >
> > On Thu, Feb 29, 2024 at 2:01 AM Peng Fan <[email protected]> wrote:
> > >
> > > > Subject: Re: [PATCH] of: dynamic: notify before revert
> > > >
> > > > On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS)
> > > > <[email protected]>
> > > > wrote:
> > > > >
> > > > > From: Peng Fan <[email protected]>
> > > > >
> > > > > When PCI node was created using an overlay and the overlay is
> > > > > reverted/destroyed, the "linux,pci-domain" property no longer
> > > > > exists, so of_get_pci_domain_nr will return failure. Then
> > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > > > even if the IDA was allocated in static IDA.
> > > >
> > > > That usage is broken to begin with unless there is a guarantee that
> > > > static and dynamic domain numbers don't overlap. For example, a
> > > > dynamic number is assigned and then you load an overlay with the same
> > number defined in it.
> > >
> > > I may not describe it clear, the code is here, because overlay
> > > property removed, of_get_pci_domain_nr will return failure, so the
> > > code path will goest into free a dynamic ida. But actually there is no
> > > such dynamic ida, so dump.
> >
> > I understood the problem.
> >
> > Your usage of this is broken on applying your overlay. You just got lucky.
>
> Would you please point me out where is broken on using overlay?
>
> https://github.com/siemens/jailhouse/blob/master/driver/pci.c#L458

I'm not saying your exact use is broken. Obviously, it worked for you.
In general, it is fragile. What happens with the following
combination?:

base.dts:
{
pci@0 {
linux,pci-domain = <0>;
};
pci@1 {
/* dynamic domain allocs domain 1 */
};
};

overlay.dts:
{
pci@0 {
linux,pci-domain = <1>;
};
};

The only way this can work is if the overlay linux,pci-domain is
ignored if there's a conflict or you reserve first N ids for static
and dynamic ones start at N where N is "should be more than anyone
needs".


> > > So the problem is overlay was removed, but the notify callback may
> > > still use the property to do some work.
> > >
> > > static void of_pci_bus_release_domain_nr(struct pci_bus *bus, struct
> > > device *parent) {
> > > if (bus->domain_nr < 0)
> > > return;
> > >
> > > /* Release domain from IDA where it was allocated. */
> > > if (of_get_pci_domain_nr(parent->of_node) == bus->domain_nr)
> > > ida_free(&pci_domain_nr_static_ida, bus->domain_nr);
> > > else
> > > ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);
> > > }
> > > >
> > > > > So move the notify before revert to fix the issue.
> > > >
> > > > You can't just change the timing. Something might require notify to
> > > > be after the revert.
> >
> > Again ^^^
> >
> > > >
> > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > >
> > > > I don't see where the notifier is even used.
> > >
> > > The stack is as below:
> > >
> > > [ 595.150529] CPU: 1 PID: 582 Comm: jailhouse Tainted: G O
> > 6.5.0-rc4-next-20230804-05021-g3d4cc14b42ef-dirty #355
> > > [ 595.161998] Hardware name: NXP i.MX93 11X11 EVK board (DT) [
> > > 595.167475] Call trace:
> > > [ 595.169908] dump_backtrace+0x94/0xec [ 595.173573]
> > > show_stack+0x18/0x24 [ 595.176884] dump_stack_lvl+0x48/0x60 [
> > > 595.180541] dump_stack+0x18/0x24 [ 595.183843]
> > > pci_bus_release_domain_nr+0x34/0x84
> > > [ 595.188453] pci_remove_root_bus+0xa0/0xa4 [ 595.192544]
> > > pci_host_common_remove+0x28/0x40 [ 595.196895]
> > > platform_remove+0x54/0x6c [ 595.200641] device_remove+0x4c/0x80 [
> > > 595.204209] device_release_driver_internal+0x1d4/0x230
> > > [ 595.209430] device_release_driver+0x18/0x24 [ 595.213691]
> > > bus_remove_device+0xcc/0x10c [ 595.217686] device_del+0x15c/0x41c
> > > [ 595.221170] platform_device_del.part.0+0x1c/0x88
> > > [ 595.225861] platform_device_unregister+0x24/0x40
> > > [ 595.230557] of_platform_device_destroy+0xfc/0x10c
> > > [ 595.235344] of_platform_notify+0x13c/0x178 [ 595.239518]
> > > blocking_notifier_call_chain+0x6c/0xa0
> > > [ 595.244389] __of_changeset_entry_notify+0x148/0x16c
> > > [ 595.249346] of_changeset_revert+0xa8/0xcc [ 595.253437]
> > > jailhouse_pci_virtual_root_devices_remove+0x50/0x74 [jailhouse]
> >
> > $ git grep jailhouse_pci_virtual_root_devices_remove
> > (END)
> >
> > Another out of tree overlay user. I have little interest in worrying about them.
> > No one wants to step up and solve the problems with overlays, so I'm not
> > going to worry about them either.
>
> Ok, but I think this is indeed an issue, if driver accessing property after
> property removed with overlay revert.

A pre-remove notifier might be useful, but as I said, you can't just
remove the post-remove notifier without some justification why no one
needs it. Notifiers aren't the best construct in the kernel, so adding
to them isn't really desired either. Also, I think there are 2 other
issues with the design here:

Generally, the code should read the DT once upfront and not need the
DT later on. So needing to access the DT on removal is kind of
suspect. That's like needing to read USB device descriptors after a
USB device is unplugged.

It also seems like the driver should hold a reference to the DT
node(s) it needs to prevent removal until it no longer needs them. Not
sure if the node refcount would be enough or we need something more.

Rob