2022-08-15 15:13:08

by Colin Foster

Subject: boot stuck at starting kernel, due to __genpd_dev_pm_attach?

Hello,

You might have already gotten this report, but I tried running v6.0-rc1
on my BeagleBone Black and it gets stuck right after "Starting kernel
..." from U-Boot.

A bisect pointed me to commit 5a46079a9645 ("PM: domains: Delete usage
of driver_deferred_probe_check_state()").

I don't have much more detail than that, other than I'm using the
in-tree am335x-boneblack.dts device tree and I believe I had tested with
the multi_v7_defconfig for this verification. I'm happy to test anything
that might offer more information.


2022-08-15 18:47:20

by Pavel Machek

Subject: Re: boot stuck at starting kernel, due to __genpd_dev_pm_attach?

Hi!

> You might have already gotten this report, but I tried running v6.0-rc1
> on my BeagleBone Black and it gets stuck right after "Starting kernel
> ..." from U-Boot.
>
> A bisect pointed me to commit 5a46079a9645 ("PM: domains: Delete usage
> of driver_deferred_probe_check_state()").
>
> I don't have much more detail than that, other than I'm using the
> in-tree am335x-boneblack.dts device tree and I believe I had tested with
> the multi_v7_defconfig for this verification. I'm happy to test anything
> that might offer more information.

Well, the standard next step is reverting 5a46079a9645 on top of v6.0-rc1,
and if it starts working, either you get a fix in your inbox, or you ask
Linus to revert :-).

Best regards,
Pavel
--
People of Russia, stop Putin before his war on Ukraine escalates.


2022-08-16 04:17:34

by Saravana Kannan

Subject: Re: boot stuck at starting kernel, due to __genpd_dev_pm_attach?

On Mon, Aug 15, 2022 at 5:28 PM Colin Foster
<[email protected]> wrote:
>
> On Mon, Aug 15, 2022 at 08:23:07PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > You might have already gotten this report, but I tried running v6.0-rc1
> > > on my BeagleBone Black and it gets stuck right after "Starting kernel
> > > ..." from U-Boot.
> > >
> > > A bisect pointed me to commit 5a46079a9645 ("PM: domains: Delete usage
> > > of driver_deferred_probe_check_state()").
> > >
> > > I don't have much more detail than that, other than I'm using the
> > > in-tree am335x-boneblack.dts device tree and I believe I had tested with
> > > the multi_v7_defconfig for this verification. I'm happy to test anything
> > > that might offer more information.
> >
> > Well, the standard next step is reverting 5a46079a9645 on top of v6.0-rc1,
> > and if it starts working, either you get a fix in your inbox, or you ask
> > Linus to revert :-).
>
> I was able to revert 5a46079a9645 and 9cbffc7a5956 and successfully boot
> v6.0-rc1 on the Beaglebone Black.
>
> I still don't know whether the root cause is the patch, or perhaps an
> invalid boneblack DTS. I'll try and dig to get more info about what
> might be failing. But I do think anyone using a Beaglebone will have
> this issue, and I also think I'm not the only one using the BBB.
>

Hi Colin,

Thanks for the report. There have been other reports like this. This
commit in question is probably the cause. I have two series going.

One [1] is to revert these patches. Probably more suited for 5.19.xxx releases.

The other [2] is to actually fix the issues you are seeing without
reverting these patches (long term we do want to keep the patch that's
causing the issue for you -- not going into the details here). Can you
give this series[2] a shot and tell me if it fixes the issue? You
might need to pull in this additional diff on top of [2] (I'll roll it
into v2 of the series once I get some tests on this)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2f012e826986..866755d8ad95 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con,
 		device_links_write_unlock();
 	}

-	sup_dev = get_dev_from_fwnode(sup_handle);
+	if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
+		sup_dev = fwnode_get_next_parent_dev(sup_handle);
+	else
+		sup_dev = get_dev_from_fwnode(sup_handle);
+
 	if (sup_dev) {
 		/*
 		 * If it's one of those drivers that don't actually bind to

Thanks,
Saravana

[1] - https://lore.kernel.org/lkml/[email protected]/
[2] - https://lore.kernel.org/lkml/[email protected]/
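
The selection logic in the diff above can be sketched outside the kernel
as follows. This is only an illustration under stated assumptions: the
struct layouts, the flag value, and the mock_* helpers are hypothetical
stand-ins written for this example, not the real kernel definitions, and
reference counting is ignored entirely. The only thing taken from the
diff is the decision itself: if the supplier node is flagged
FWNODE_FLAG_NOT_DEVICE, use the device of its nearest ancestor, otherwise
use the device of the node itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define FWNODE_FLAG_NOT_DEVICE (1u << 0)

/* Simplified stand-ins for the kernel structures (assumptions). */
struct device {
	const char *name;
};

struct fwnode_handle {
	unsigned int flags;
	struct fwnode_handle *parent;
	struct device *dev;	/* NULL when no device exists for this node */
};

/* Mock of get_dev_from_fwnode(): device attached to this exact node. */
static struct device *mock_get_dev_from_fwnode(const struct fwnode_handle *fw)
{
	return fw->dev;
}

/* Mock of fwnode_get_next_parent_dev(): device of the closest ancestor. */
static struct device *mock_next_parent_dev(const struct fwnode_handle *fw)
{
	for (fw = fw->parent; fw; fw = fw->parent)
		if (fw->dev)
			return fw->dev;
	return NULL;
}

/* Mirrors the if/else added by the diff, using the mocks above. */
struct device *pick_supplier_dev(const struct fwnode_handle *sup_handle)
{
	assert(sup_handle != NULL);

	if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
		return mock_next_parent_dev(sup_handle);

	return mock_get_dev_from_fwnode(sup_handle);
}
```

With a child node that is flagged FWNODE_FLAG_NOT_DEVICE and has no
device of its own, pick_supplier_dev() in this sketch returns the
parent's device, which is the case the extra diff is meant to cover.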

2022-08-16 06:59:46

by Colin Foster

Subject: Re: boot stuck at starting kernel, due to __genpd_dev_pm_attach?

On Mon, Aug 15, 2022 at 08:23:07PM +0200, Pavel Machek wrote:
> Hi!
>
> > You might have already gotten this report, but I tried running v6.0-rc1
> > on my BeagleBone Black and it gets stuck right after "Starting kernel
> > ..." from U-Boot.
> >
> > A bisect pointed me to commit 5a46079a9645 ("PM: domains: Delete usage
> > of driver_deferred_probe_check_state()").
> >
> > I don't have much more detail than that, other than I'm using the
> > in-tree am335x-boneblack.dts device tree and I believe I had tested with
> > the multi_v7_defconfig for this verification. I'm happy to test anything
> > that might offer more information.
>
> Well, the standard next step is reverting 5a46079a9645 on top of v6.0-rc1,
> and if it starts working, either you get a fix in your inbox, or you ask
> Linus to revert :-).

I was able to revert 5a46079a9645 and 9cbffc7a5956 and successfully boot
v6.0-rc1 on the Beaglebone Black.

I still don't know whether the root cause is the patch, or perhaps an
invalid boneblack DTS. I'll try and dig to get more info about what
might be failing. But I do think anyone using a Beaglebone will have
this issue, and I also think I'm not the only one using the BBB.

>
> Best regards,
> Pavel
> --
> People of Russia, stop Putin before his war on Ukraine escalates.


2022-08-17 06:22:56

by Colin Foster

Subject: Re: boot stuck at starting kernel, due to __genpd_dev_pm_attach?

On Mon, Aug 15, 2022 at 05:43:19PM -0700, Saravana Kannan wrote:
> On Mon, Aug 15, 2022 at 5:28 PM Colin Foster
> <[email protected]> wrote:
> >
> > On Mon, Aug 15, 2022 at 08:23:07PM +0200, Pavel Machek wrote:
> > > Hi!
> > >
> > > > You might have already gotten this report, but I tried running v6.0-rc1
> > > > on my BeagleBone Black and it gets stuck right after "Starting kernel
> > > > ..." from U-Boot.
> > > >
> > > > A bisect pointed me to commit 5a46079a9645 ("PM: domains: Delete usage
> > > > of driver_deferred_probe_check_state()").
> > > >
> > > > I don't have much more detail than that, other than I'm using the
> > > > in-tree am335x-boneblack.dts device tree and I believe I had tested with
> > > > the multi_v7_defconfig for this verification. I'm happy to test anything
> > > > that might offer more information.
> > >
> > > Well, the standard next step is reverting 5a46079a9645 on top of v6.0-rc1,
> > > and if it starts working, either you get a fix in your inbox, or you ask
> > > Linus to revert :-).
> >
> > I was able to revert 5a46079a9645 and 9cbffc7a5956 and successfully boot
> > v6.0-rc1 on the Beaglebone Black.
> >
> > I still don't know whether the root cause is the patch, or perhaps an
> > invalid boneblack DTS. I'll try and dig to get more info about what
> > might be failing. But I do think anyone using a Beaglebone will have
> > this issue, and I also think I'm not the only one using the BBB.
> >
>
> Hi Colin,
>
> Thanks for the report. There have been other reports like this. This
> commit in question is probably the cause. I have two series going.
>
> One [1] is to revert these patches. Probably more suited for 5.19.xxx releases.
>
> The other [2] is to actually fix the issues you are seeing without
> reverting these patches (long term we do want to keep the patch that's
> causing the issue for you -- not going into the details here). Can you
> give this series[2] a shot and tell me if it fixes the issue? You
> might need to pull in this additional diff on top of [2] (I'll roll it
> into v2 of the series once I get some tests on this)

Hi Saravana,

I can confirm that series [2] fixes the boot issues I was having with
6.0-rc1 on the Beaglebone Black. I did not need to apply the diff you
posted below.

Thanks!

>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2f012e826986..866755d8ad95 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con,
>  		device_links_write_unlock();
>  	}
>
> -	sup_dev = get_dev_from_fwnode(sup_handle);
> +	if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
> +		sup_dev = fwnode_get_next_parent_dev(sup_handle);
> +	else
> +		sup_dev = get_dev_from_fwnode(sup_handle);
> +
>  	if (sup_dev) {
>  		/*
>  		 * If it's one of those drivers that don't actually bind to
>
> Thanks,
> Saravana
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> [2] - https://lore.kernel.org/lkml/[email protected]/