Stefan reports that rebooting his ODROID-N2+ (using a G12B SoC) results
in the board hanging. His kernel config uses:
CONFIG_MESON_EE_PM_DOMAINS=y
CONFIG_DRM_MESON=m
He reports that his kernel config results in the DRM driver's .shutdown
callback to be executed after the power domain driver's .shutdown
callback. That's problematic because meson_ee_pwrc_shutdown disables the
clock which are used by the VPU IP. This causes the board to hang.
Further he reports that changing from CONFIG_DRM_MESON=m to
CONFIG_DRM_MESON=y reverses the order in which the DRM and power domain
driver's shutdown functions are executed, making the reboot successful.
The reason why we use meson_ee_pwrc_shutdown is because of the VPU power
domain (which is causing the problem described above). It can be left
enabled by u-boot. According to the original TOFIX comment in
meson_ee_pwrc_init_domain we need to be careful because disabling the
power domain could "cause system errors". As a workaround the clocks
are manually enabled in meson_ee_pwrc_init_domain and the power domain
is marked as GENPD_FLAG_ALWAYS_ON (so it can never be turned off).
Experimenting has shown that the power domain itself can be disabled as
long as we keep the clocks enabled if u-boot enabled the power domain
but we don't have any driver enabled for the VPU (CONFIG_DRM_MESON=n).
Keeping the clocks enabled is the responsibility of the CCF drivers, not
the power domain driver. Even better: this is already covered as all
gates in the VPU and VAPB tree on GX an G12 SoCs have the
CLK_IGNORE_UNUSED flag set, meaning: if the bootloader has previously
enabled the clock we're not touching it until a driver explicitly asks
to enable (and then disable) it. In case of CONFIG_DRM_MESON=n we're
never calling meson_ee_pwrc_on, meaning that we always keep the state of
the clocks as set by u-boot.
The original TOFIX comment also mentioned that we need to make sure not
to mess up the clock's prepare/enable ref-counters. This is the only
requirement that's left for the meson-ee-pwrc driver that needs to be
managed for the VPU power domain.
Three steps can improve this situation:
- Don't prepare and enable the clocks (just to fix the ref-counting) in
meson_ee_pwrc_init_domain if u-boot left that power domain enabled.
Instead, remember if the clocks are enabled in meson_ee_pwrc_{on,off}
and only disable them if we have previously turned them on ourselves.
- Drop GENPD_FLAG_ALWAYS_ON as we can always manage the state of the VPU
power domain if both the power domain controller and DRM driver are
enabled (=m or =y). If the power domain driver is enabled but the DRM
driver is disabled we can still use meson_ee_pwrc_off because it's not
trying to disable the clocks anymore
- Drop meson_ee_pwrc_shutdown as it's the responsibility of the genpd
framework to call meson_ee_pwrc_off when needed (either when a power
domain is being disabled - regardless of whether it's was used by a
driver before or not). Now there's also no more shutdown callback
ordering dependency between the power domain driver and other drivers
anymore.
Fixes: eef3c2ba0a42a6 ("soc: amlogic: Add support for Everything-Else power domains controller")
Reported-by: Stefan Agner <[email protected]>
Signed-off-by: Martin Blumenstingl <[email protected]>
---
Neil, I would like you to specifically review (and re-test) the original
TOFIX part. I *believe* that I understand the original problem and hope
that my approach also works in that scenario, but I am not 100% sure.
Kevin, as you're my go-to genpd expert I am also asking you to review
this to point out any issues you see.
drivers/soc/amlogic/meson-ee-pwrc.c | 76 ++++++++++++-----------------
1 file changed, 31 insertions(+), 45 deletions(-)
diff --git a/drivers/soc/amlogic/meson-ee-pwrc.c b/drivers/soc/amlogic/meson-ee-pwrc.c
index 50bf5d2b828b..534c33ba31ce 100644
--- a/drivers/soc/amlogic/meson-ee-pwrc.c
+++ b/drivers/soc/amlogic/meson-ee-pwrc.c
@@ -353,10 +353,26 @@ static int meson_ee_pwrc_off(struct generic_pm_domain *domain)
if (pwrc_domain->num_clks) {
msleep(20);
- clk_bulk_disable_unprepare(pwrc_domain->num_clks,
- pwrc_domain->clks);
+
+ /*
+ * We are only allowed to turn off the clocks here if we
+ * have previously enabled them ourselves. In other words:
+ * for an "unused" power domain (which is not used by any
+ * power domain consumer) we have not enabled the clocks
+ * previously so we keep them in the state they are.
+ * This is relevant for the VPU power domain which may
+ * have been enabled by u-boot. If the display driver in
+ * Linux is disabled then we need to keep the clocks in
+ * the state as u-boot set them, otherwise the board will
+ * hang.
+ */
+ if (pwrc_domain->enabled)
+ clk_bulk_disable_unprepare(pwrc_domain->num_clks,
+ pwrc_domain->clks);
}
+ pwrc_domain->enabled = false;
+
return 0;
}
@@ -392,8 +408,14 @@ static int meson_ee_pwrc_on(struct generic_pm_domain *domain)
if (ret)
return ret;
- return clk_bulk_prepare_enable(pwrc_domain->num_clks,
- pwrc_domain->clks);
+ ret = clk_bulk_prepare_enable(pwrc_domain->num_clks,
+ pwrc_domain->clks);
+ if (ret)
+ return ret;
+
+ pwrc_domain->enabled = true;
+
+ return 0;
}
static int meson_ee_pwrc_init_domain(struct platform_device *pdev,
@@ -434,33 +456,11 @@ static int meson_ee_pwrc_init_domain(struct platform_device *pdev,
dom->base.power_on = meson_ee_pwrc_on;
dom->base.power_off = meson_ee_pwrc_off;
- /*
- * TOFIX: This is a special case for the VPU power domain, which can
- * be enabled previously by the bootloader. In this case the VPU
- * pipeline may be functional but no driver maybe never attach
- * to this power domain, and if the domain is disabled it could
- * cause system errors. This is why the pm_domain_always_on_gov
- * is used here.
- * For the same reason, the clocks should be enabled in case
- * we need to power the domain off, otherwise the internal clocks
- * prepare/enable counters won't be in sync.
- */
- if (dom->num_clks && dom->desc.get_power && !dom->desc.get_power(dom)) {
- ret = clk_bulk_prepare_enable(dom->num_clks, dom->clks);
- if (ret)
- return ret;
-
- dom->base.flags = GENPD_FLAG_ALWAYS_ON;
- ret = pm_genpd_init(&dom->base, NULL, false);
- if (ret)
- return ret;
- } else {
- ret = pm_genpd_init(&dom->base, NULL,
- (dom->desc.get_power ?
- dom->desc.get_power(dom) : true));
- if (ret)
- return ret;
- }
+ ret = pm_genpd_init(&dom->base, NULL,
+ (dom->desc.get_power ?
+ dom->desc.get_power(dom) : true));
+ if (ret)
+ return ret;
return 0;
}
@@ -528,19 +528,6 @@ static int meson_ee_pwrc_probe(struct platform_device *pdev)
return of_genpd_add_provider_onecell(pdev->dev.of_node, &pwrc->xlate);
}
-static void meson_ee_pwrc_shutdown(struct platform_device *pdev)
-{
- struct meson_ee_pwrc *pwrc = platform_get_drvdata(pdev);
- int i;
-
- for (i = 0 ; i < pwrc->xlate.num_domains ; ++i) {
- struct meson_ee_pwrc_domain *dom = &pwrc->domains[i];
-
- if (dom->desc.get_power && !dom->desc.get_power(dom))
- meson_ee_pwrc_off(&dom->base);
- }
-}
-
static struct meson_ee_pwrc_domain_data meson_ee_g12a_pwrc_data = {
.count = ARRAY_SIZE(g12a_pwrc_domains),
.domains = g12a_pwrc_domains,
@@ -606,7 +593,6 @@ MODULE_DEVICE_TABLE(of, meson_ee_pwrc_match_table);
static struct platform_driver meson_ee_pwrc_driver = {
.probe = meson_ee_pwrc_probe,
- .shutdown = meson_ee_pwrc_shutdown,
.driver = {
.name = "meson_ee_pwrc",
.of_match_table = meson_ee_pwrc_match_table,
--
2.31.1
On 2021-04-29 22:37, Martin Blumenstingl wrote:
> Stefan reports that rebooting his ODROID-N2+ (using a G12B SoC) results
> in the board hanging. His kernel config uses:
> CONFIG_MESON_EE_PM_DOMAINS=y
> CONFIG_DRM_MESON=m
>
> He reports that his kernel config results in the DRM driver's .shutdown
> callback to be executed after the power domain driver's .shutdown
> callback. That's problematic because meson_ee_pwrc_shutdown disables the
> clock which are used by the VPU IP. This causes the board to hang.
>
> Further he reports that changing from CONFIG_DRM_MESON=m to
> CONFIG_DRM_MESON=y reverses the order in which the DRM and power domain
> driver's shutdown functions are executed, making the reboot successful.
>
> The reason why we use meson_ee_pwrc_shutdown is because of the VPU power
> domain (which is causing the problem described above). It can be left
> enabled by u-boot. According to the original TOFIX comment in
> meson_ee_pwrc_init_domain we need to be careful because disabling the
> power domain could "cause system errors". As a workaround the clocks
> are manually enabled in meson_ee_pwrc_init_domain and the power domain
> is marked as GENPD_FLAG_ALWAYS_ON (so it can never be turned off).
>
> Experimenting has shown that the power domain itself can be disabled as
> long as we keep the clocks enabled if u-boot enabled the power domain
> but we don't have any driver enabled for the VPU (CONFIG_DRM_MESON=n).
>
> Keeping the clocks enabled is the responsibility of the CCF drivers, not
> the power domain driver. Even better: this is already covered as all
> gates in the VPU and VAPB tree on GX an G12 SoCs have the
> CLK_IGNORE_UNUSED flag set, meaning: if the bootloader has previously
> enabled the clock we're not touching it until a driver explicitly asks
> to enable (and then disable) it. In case of CONFIG_DRM_MESON=n we're
> never calling meson_ee_pwrc_on, meaning that we always keep the state of
> the clocks as set by u-boot.
>
> The original TOFIX comment also mentioned that we need to make sure not
> to mess up the clock's prepare/enable ref-counters. This is the only
> requirement that's left for the meson-ee-pwrc driver that needs to be
> managed for the VPU power domain.
>
> Three steps can improve this situation:
> - Don't prepare and enable the clocks (just to fix the ref-counting) in
> meson_ee_pwrc_init_domain if u-boot left that power domain enabled.
> Instead, remember if the clocks are enabled in meson_ee_pwrc_{on,off}
> and only disable them if we have previously turned them on ourselves.
> - Drop GENPD_FLAG_ALWAYS_ON as we can always manage the state of the VPU
> power domain if both the power domain controller and DRM driver are
> enabled (=m or =y). If the power domain driver is enabled but the DRM
> driver is disabled we can still use meson_ee_pwrc_off because it's not
> trying to disable the clocks anymore
> - Drop meson_ee_pwrc_shutdown as it's the responsibility of the genpd
> framework to call meson_ee_pwrc_off when needed (either when a power
> domain is being disabled - regardless of whether it's was used by a
> driver before or not). Now there's also no more shutdown callback
> ordering dependency between the power domain driver and other drivers
> anymore.
>
> Fixes: eef3c2ba0a42a6 ("soc: amlogic: Add support for Everything-Else
> power domains controller")
> Reported-by: Stefan Agner <[email protected]>
> Signed-off-by: Martin Blumenstingl <[email protected]>
FWIW, I successfully tested this change with CONFIG_DRM_MESON=m and
CONFIG_DRM_MESON=y each with a couple of reboots without any hang.
Tested-by: Stefan Agner <[email protected]>
Thanks Martin for help debugging and working on this patch!
--
Stefan
> ---
> Neil, I would like you to specifically review (and re-test) the original
> TOFIX part. I *believe* that I understand the original problem and hope
> that my approach also works in that scenario, but I am not 100% sure.
>
> Kevin, as you're my go-to genpd expert I am also asking you to review
> this to point out any issues you see.
>
>
> drivers/soc/amlogic/meson-ee-pwrc.c | 76 ++++++++++++-----------------
> 1 file changed, 31 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/soc/amlogic/meson-ee-pwrc.c
> b/drivers/soc/amlogic/meson-ee-pwrc.c
> index 50bf5d2b828b..534c33ba31ce 100644
> --- a/drivers/soc/amlogic/meson-ee-pwrc.c
> +++ b/drivers/soc/amlogic/meson-ee-pwrc.c
> @@ -353,10 +353,26 @@ static int meson_ee_pwrc_off(struct
> generic_pm_domain *domain)
>
> if (pwrc_domain->num_clks) {
> msleep(20);
> - clk_bulk_disable_unprepare(pwrc_domain->num_clks,
> - pwrc_domain->clks);
> +
> + /*
> + * We are only allowed to turn off the clocks here if we
> + * have previously enabled them ourselves. In other words:
> + * for an "unused" power domain (which is not used by any
> + * power domain consumer) we have not enabled the clocks
> + * previously so we keep them in the state they are.
> + * This is relevant for the VPU power domain which may
> + * have been enabled by u-boot. If the display driver in
> + * Linux is disabled then we need to keep the clocks in
> + * the state as u-boot set them, otherwise the board will
> + * hang.
> + */
> + if (pwrc_domain->enabled)
> + clk_bulk_disable_unprepare(pwrc_domain->num_clks,
> + pwrc_domain->clks);
> }
>
> + pwrc_domain->enabled = false;
> +
> return 0;
> }
>
> @@ -392,8 +408,14 @@ static int meson_ee_pwrc_on(struct
> generic_pm_domain *domain)
> if (ret)
> return ret;
>
> - return clk_bulk_prepare_enable(pwrc_domain->num_clks,
> - pwrc_domain->clks);
> + ret = clk_bulk_prepare_enable(pwrc_domain->num_clks,
> + pwrc_domain->clks);
> + if (ret)
> + return ret;
> +
> + pwrc_domain->enabled = true;
> +
> + return 0;
> }
>
> static int meson_ee_pwrc_init_domain(struct platform_device *pdev,
> @@ -434,33 +456,11 @@ static int meson_ee_pwrc_init_domain(struct
> platform_device *pdev,
> dom->base.power_on = meson_ee_pwrc_on;
> dom->base.power_off = meson_ee_pwrc_off;
>
> - /*
> - * TOFIX: This is a special case for the VPU power domain, which can
> - * be enabled previously by the bootloader. In this case the VPU
> - * pipeline may be functional but no driver maybe never attach
> - * to this power domain, and if the domain is disabled it could
> - * cause system errors. This is why the pm_domain_always_on_gov
> - * is used here.
> - * For the same reason, the clocks should be enabled in case
> - * we need to power the domain off, otherwise the internal clocks
> - * prepare/enable counters won't be in sync.
> - */
> - if (dom->num_clks && dom->desc.get_power && !dom->desc.get_power(dom)) {
> - ret = clk_bulk_prepare_enable(dom->num_clks, dom->clks);
> - if (ret)
> - return ret;
> -
> - dom->base.flags = GENPD_FLAG_ALWAYS_ON;
> - ret = pm_genpd_init(&dom->base, NULL, false);
> - if (ret)
> - return ret;
> - } else {
> - ret = pm_genpd_init(&dom->base, NULL,
> - (dom->desc.get_power ?
> - dom->desc.get_power(dom) : true));
> - if (ret)
> - return ret;
> - }
> + ret = pm_genpd_init(&dom->base, NULL,
> + (dom->desc.get_power ?
> + dom->desc.get_power(dom) : true));
> + if (ret)
> + return ret;
>
> return 0;
> }
> @@ -528,19 +528,6 @@ static int meson_ee_pwrc_probe(struct
> platform_device *pdev)
> return of_genpd_add_provider_onecell(pdev->dev.of_node, &pwrc->xlate);
> }
>
> -static void meson_ee_pwrc_shutdown(struct platform_device *pdev)
> -{
> - struct meson_ee_pwrc *pwrc = platform_get_drvdata(pdev);
> - int i;
> -
> - for (i = 0 ; i < pwrc->xlate.num_domains ; ++i) {
> - struct meson_ee_pwrc_domain *dom = &pwrc->domains[i];
> -
> - if (dom->desc.get_power && !dom->desc.get_power(dom))
> - meson_ee_pwrc_off(&dom->base);
> - }
> -}
> -
> static struct meson_ee_pwrc_domain_data meson_ee_g12a_pwrc_data = {
> .count = ARRAY_SIZE(g12a_pwrc_domains),
> .domains = g12a_pwrc_domains,
> @@ -606,7 +593,6 @@ MODULE_DEVICE_TABLE(of, meson_ee_pwrc_match_table);
>
> static struct platform_driver meson_ee_pwrc_driver = {
> .probe = meson_ee_pwrc_probe,
> - .shutdown = meson_ee_pwrc_shutdown,
> .driver = {
> .name = "meson_ee_pwrc",
> .of_match_table = meson_ee_pwrc_match_table,
On Thu, Apr 29, 2021 at 10:37 PM Martin Blumenstingl
<[email protected]> wrote:
>
> Stefan reports that rebooting his ODROID-N2+ (using a G12B SoC) results
> in the board hanging. His kernel config uses:
> CONFIG_MESON_EE_PM_DOMAINS=y
> CONFIG_DRM_MESON=m
>
> He reports that his kernel config results in the DRM driver's .shutdown
> callback to be executed after the power domain driver's .shutdown
> callback. That's problematic because meson_ee_pwrc_shutdown disables the
> clock which are used by the VPU IP. This causes the board to hang.
>
> Further he reports that changing from CONFIG_DRM_MESON=m to
> CONFIG_DRM_MESON=y reverses the order in which the DRM and power domain
> driver's shutdown functions are executed, making the reboot successful.
>
> The reason why we use meson_ee_pwrc_shutdown is because of the VPU power
> domain (which is causing the problem described above). It can be left
> enabled by u-boot. According to the original TOFIX comment in
> meson_ee_pwrc_init_domain we need to be careful because disabling the
> power domain could "cause system errors". As a workaround the clocks
> are manually enabled in meson_ee_pwrc_init_domain and the power domain
> is marked as GENPD_FLAG_ALWAYS_ON (so it can never be turned off).
>
> Experimenting has shown that the power domain itself can be disabled as
> long as we keep the clocks enabled if u-boot enabled the power domain
> but we don't have any driver enabled for the VPU (CONFIG_DRM_MESON=n).
>
> Keeping the clocks enabled is the responsibility of the CCF drivers, not
> the power domain driver. Even better: this is already covered as all
> gates in the VPU and VAPB tree on GX an G12 SoCs have the
> CLK_IGNORE_UNUSED flag set, meaning: if the bootloader has previously
> enabled the clock we're not touching it until a driver explicitly asks
> to enable (and then disable) it. In case of CONFIG_DRM_MESON=n we're
> never calling meson_ee_pwrc_on, meaning that we always keep the state of
> the clocks as set by u-boot.
>
> The original TOFIX comment also mentioned that we need to make sure not
> to mess up the clock's prepare/enable ref-counters. This is the only
> requirement that's left for the meson-ee-pwrc driver that needs to be
> managed for the VPU power domain.
>
> Three steps can improve this situation:
> - Don't prepare and enable the clocks (just to fix the ref-counting) in
> meson_ee_pwrc_init_domain if u-boot left that power domain enabled.
> Instead, remember if the clocks are enabled in meson_ee_pwrc_{on,off}
> and only disable them if we have previously turned them on ourselves.
> - Drop GENPD_FLAG_ALWAYS_ON as we can always manage the state of the VPU
> power domain if both the power domain controller and DRM driver are
> enabled (=m or =y). If the power domain driver is enabled but the DRM
> driver is disabled we can still use meson_ee_pwrc_off because it's not
> trying to disable the clocks anymore
> - Drop meson_ee_pwrc_shutdown as it's the responsibility of the genpd
> framework to call meson_ee_pwrc_off when needed (either when a power
> domain is being disabled - regardless of whether it's was used by a
> driver before or not). Now there's also no more shutdown callback
> ordering dependency between the power domain driver and other drivers
> anymore.
>
> Fixes: eef3c2ba0a42a6 ("soc: amlogic: Add support for Everything-Else power domains controller")
> Reported-by: Stefan Agner <[email protected]>
> Signed-off-by: Martin Blumenstingl <[email protected]>
unfortunately I need to add:
Nacked-by: myself
it turns out that the genpd framework does not call .power_on when the
power domain is already powered on during initialization
(pm_genpd_init)
I wonder if we need to extend the genpd to handle this use-case.
CCF (common clock framework) for example has CLK_IGNORE_UNUSED and a
.disable_unused callback which can be used to specifically manage the
"unused" use-case
my idea #1:
- add a GENPD_FLAG_IGNORE_UNUSED flag
- set it for the VPU power domain
- skip "unused" power domains which have the GENPD_FLAG_IGNORE_UNUSED flag set
- drop the GENPD_FLAG_ALWAYS_ON flag from the VPU power domain
my idea #2:
- a power_off_unused callback (with the same arguments as power_off)
could be introduced
- in pm_genpd_init we check if that callback is initialized, if not we
assign the power_off callback
- instead of using the power_off callback when disabling an unused
power domain the new power_off_unused callback is used
- for the meson-ee-pwrc we implement a special meson_ee_pwrc_power_off
function that is a no-op
Best regards,
Martin
Martin Blumenstingl <[email protected]> writes:
> Stefan reports that rebooting his ODROID-N2+ (using a G12B SoC) results
> in the board hanging. His kernel config uses:
> CONFIG_MESON_EE_PM_DOMAINS=y
> CONFIG_DRM_MESON=m
>
> He reports that his kernel config results in the DRM driver's .shutdown
> callback to be executed after the power domain driver's .shutdown
> callback. That's problematic because meson_ee_pwrc_shutdown disables the
> clock which are used by the VPU IP. This causes the board to hang.
I didn't dig deeply on this yet because this smells very much like an
issue Art reported[1] and fixed.
What kernel version are you using, and does it contain
commit fa0c16caf3d7 (drm: meson_drv add shutdown function)
Kevin
[1] https://lore.kernel.org/dri-devel/[email protected]/