2022-10-16 10:00:47

by Shawn Guo

[permalink] [raw]
Subject: [PATCH 0/3] Manage domain power state for hibernate freeze

The series tries to fix a hardware lockup issue caused by missing domain
power state management from hibernate .freeze[thaw]_noirq hooks. The
detailed description of the issue can be found in commit log of patch
#3.

Shawn Guo (3):
PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend()
PM: domains: Consolidate genpd_restore_noirq() and
genpd_resume_noirq()
PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq
hook

drivers/base/power/domain.c | 130 +++++++++++++++---------------------
1 file changed, 52 insertions(+), 78 deletions(-)

--
2.25.1


2022-10-16 10:02:01

by Shawn Guo

[permalink] [raw]
Subject: [PATCH 1/3] PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend()

While argument `poweroff` works fine for genpd_finish_suspend() to handle
distinction between suspend and poweroff, it won't scale if we want to
use it for freeze as well. Pass generic PM noirq hooks as arguments
instead, so that the function can possibly cover freeze case too.

Signed-off-by: Shawn Guo <[email protected]>
---
drivers/base/power/domain.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 55a10e6d4e2a..01f5644938e0 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1189,12 +1189,15 @@ static int genpd_prepare(struct device *dev)
* genpd_finish_suspend - Completion of suspend or hibernation of device in an
* I/O pm domain.
* @dev: Device to suspend.
- * @poweroff: Specifies if this is a poweroff_noirq or suspend_noirq callback.
+ * @suspend_noirq: Generic suspend_noirq callback.
+ * @resume_noirq: Generic resume_noirq callback.
*
* Stop the device and remove power from the domain if all devices in it have
* been stopped.
*/
-static int genpd_finish_suspend(struct device *dev, bool poweroff)
+static int genpd_finish_suspend(struct device *dev,
+ int (*suspend_noirq)(struct device *dev),
+ int (*resume_noirq)(struct device *dev))
{
struct generic_pm_domain *genpd;
int ret = 0;
@@ -1203,10 +1206,7 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff)
if (IS_ERR(genpd))
return -EINVAL;

- if (poweroff)
- ret = pm_generic_poweroff_noirq(dev);
- else
- ret = pm_generic_suspend_noirq(dev);
+ ret = suspend_noirq(dev);
if (ret)
return ret;

@@ -1217,10 +1217,7 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff)
!pm_runtime_status_suspended(dev)) {
ret = genpd_stop_dev(genpd, dev);
if (ret) {
- if (poweroff)
- pm_generic_restore_noirq(dev);
- else
- pm_generic_resume_noirq(dev);
+ resume_noirq(dev);
return ret;
}
}
@@ -1244,7 +1241,9 @@ static int genpd_suspend_noirq(struct device *dev)
{
dev_dbg(dev, "%s()\n", __func__);

- return genpd_finish_suspend(dev, false);
+ return genpd_finish_suspend(dev,
+ pm_generic_suspend_noirq,
+ pm_generic_resume_noirq);
}

/**
@@ -1353,7 +1352,9 @@ static int genpd_poweroff_noirq(struct device *dev)
{
dev_dbg(dev, "%s()\n", __func__);

- return genpd_finish_suspend(dev, true);
+ return genpd_finish_suspend(dev,
+ pm_generic_poweroff_noirq,
+ pm_generic_restore_noirq);
}

/**
--
2.25.1

2022-10-16 10:09:49

by Shawn Guo

[permalink] [raw]
Subject: [PATCH 3/3] PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq hook

On platforms which use SHUTDOWN as hibernation mode, the genpd noirq
hooks will be called like below.

genpd_freeze_noirq() genpd_restore_noirq()
↓ ↑
Create snapshot image Restore system
↓ ↑
genpd_thaw_noirq() Read snapshot image
↓ ↑
Write snapshot image Kernel boot
↓ ↑
power_down() Power on device

As of today suspend hooks genpd_suspend[resume]_noirq() manages domain
on/off state, but hibernate hooks genpd_freeze[thaw]_noirq() doesn't.
This results in a different behavior of domain power state between suspend
and hibernate freeze, i.e. domain is powered off for the former while on
for the later. It causes a problem on platforms like i.MX where the
domain needs to be powered on/off by calling clock and regulator interface.
When the platform restores from hibernation, the domain is off in hardware
and genpd_restore_noirq() tries to power it on, but will never succeed
because software state of domain (clock and regulator) is left on from the
last hibernate freeze, so kernel thinks that clock and regulator are
enabled while they are actually not turned on in hardware. The
consequence would be that devices in the power domain will access
registers without clock or power, and cause hardware lockup.

Power off[on] domain in hibernate .freeze[thaw]_noirq hook for reasons:

- Align the behavior between suspend and hibernate freeze.
- Have power state of domains stay in sync between hardware and software
for hibernate freeze, and thus fix the lockup issue seen on i.MX power
platform.

Signed-off-by: Shawn Guo <[email protected]>
---
drivers/base/power/domain.c | 35 ++++-------------------------------
1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index fc0ac9e6050e..2497a43737e0 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1323,24 +1323,11 @@ static int genpd_resume_noirq(struct device *dev)
*/
static int genpd_freeze_noirq(struct device *dev)
{
- const struct generic_pm_domain *genpd;
- int ret = 0;
-
dev_dbg(dev, "%s()\n", __func__);

- genpd = dev_to_genpd(dev);
- if (IS_ERR(genpd))
- return -EINVAL;
-
- ret = pm_generic_freeze_noirq(dev);
- if (ret)
- return ret;
-
- if (genpd->dev_ops.stop && genpd->dev_ops.start &&
- !pm_runtime_status_suspended(dev))
- ret = genpd_stop_dev(genpd, dev);
-
- return ret;
+ return genpd_finish_suspend(dev,
+ pm_generic_freeze_noirq,
+ pm_generic_thaw_noirq);
}

/**
@@ -1352,23 +1339,9 @@ static int genpd_freeze_noirq(struct device *dev)
*/
static int genpd_thaw_noirq(struct device *dev)
{
- const struct generic_pm_domain *genpd;
- int ret = 0;
-
dev_dbg(dev, "%s()\n", __func__);

- genpd = dev_to_genpd(dev);
- if (IS_ERR(genpd))
- return -EINVAL;
-
- if (genpd->dev_ops.stop && genpd->dev_ops.start &&
- !pm_runtime_status_suspended(dev)) {
- ret = genpd_start_dev(genpd, dev);
- if (ret)
- return ret;
- }
-
- return pm_generic_thaw_noirq(dev);
+ return genpd_finish_resume(dev, pm_generic_thaw_noirq);
}

/**
--
2.25.1

2022-10-16 10:10:51

by Shawn Guo

[permalink] [raw]
Subject: [PATCH 2/3] PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()

Most of the logic between genpd_restore_noirq() and genpd_resume_noirq()
are same except GENPD_STATE_OFF status reset for hibernation restore.
The suspended_count decrement for restore should be the right thing to do
anyway, considering there is an increment in genpd_finish_suspend() for
hibernation.

Consolidate genpd_restore_noirq() and genpd_resume_noirq() into
genpd_finish_resume() and handle GENPD_STATE_OFF status reset for
restore case specially.

Signed-off-by: Shawn Guo <[email protected]>
---
drivers/base/power/domain.c | 70 ++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 01f5644938e0..fc0ac9e6050e 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1247,12 +1247,14 @@ static int genpd_suspend_noirq(struct device *dev)
}

/**
- * genpd_resume_noirq - Start of resume of device in an I/O PM domain.
+ * genpd_resume_noirq - Completion of resume of device in an I/O PM domain.
* @dev: Device to resume.
+ * @resume_noirq: Generic resume_noirq callback.
*
* Restore power to the device's PM domain, if necessary, and start the device.
*/
-static int genpd_resume_noirq(struct device *dev)
+static int genpd_finish_resume(struct device *dev,
+ int (*resume_noirq)(struct device *dev))
{
struct generic_pm_domain *genpd;
int ret;
@@ -1264,9 +1266,25 @@ static int genpd_resume_noirq(struct device *dev)
return -EINVAL;

if (device_wakeup_path(dev) && genpd_is_active_wakeup(genpd))
- return pm_generic_resume_noirq(dev);
+ return resume_noirq(dev);

genpd_lock(genpd);
+
+ /*
+ * Special handling for hibernation restore:
+ * At this point suspended_count == 0 means we are being run for the
+ * first time for the given domain in the present cycle.
+ */
+ if (resume_noirq == pm_generic_restore_noirq &&
+ genpd->suspended_count++ == 0) {
+ /*
+ * The boot kernel might put the domain into arbitrary state,
+ * so make it appear as powered off to genpd_sync_power_on(),
+ * so that it tries to power it on in case it was really off.
+ */
+ genpd->status = GENPD_STATE_OFF;
+ }
+
genpd_sync_power_on(genpd, true, 0);
genpd->suspended_count--;
genpd_unlock(genpd);
@@ -1281,6 +1299,19 @@ static int genpd_resume_noirq(struct device *dev)
return pm_generic_resume_noirq(dev);
}

+/**
+ * genpd_resume_noirq - Start of resume of device in an I/O PM domain.
+ * @dev: Device to resume.
+ *
+ * Restore power to the device's PM domain, if necessary, and start the device.
+ */
+static int genpd_resume_noirq(struct device *dev)
+{
+ dev_dbg(dev, "%s()\n", __func__);
+
+ return genpd_finish_resume(dev, pm_generic_resume_noirq);
+}
+
/**
* genpd_freeze_noirq - Completion of freezing a device in an I/O PM domain.
* @dev: Device to freeze.
@@ -1366,40 +1397,9 @@ static int genpd_poweroff_noirq(struct device *dev)
*/
static int genpd_restore_noirq(struct device *dev)
{
- struct generic_pm_domain *genpd;
- int ret = 0;
-
dev_dbg(dev, "%s()\n", __func__);

- genpd = dev_to_genpd(dev);
- if (IS_ERR(genpd))
- return -EINVAL;
-
- /*
- * At this point suspended_count == 0 means we are being run for the
- * first time for the given domain in the present cycle.
- */
- genpd_lock(genpd);
- if (genpd->suspended_count++ == 0) {
- /*
- * The boot kernel might put the domain into arbitrary state,
- * so make it appear as powered off to genpd_sync_power_on(),
- * so that it tries to power it on in case it was really off.
- */
- genpd->status = GENPD_STATE_OFF;
- }
-
- genpd_sync_power_on(genpd, true, 0);
- genpd_unlock(genpd);
-
- if (genpd->dev_ops.stop && genpd->dev_ops.start &&
- !pm_runtime_status_suspended(dev)) {
- ret = genpd_start_dev(genpd, dev);
- if (ret)
- return ret;
- }
-
- return pm_generic_restore_noirq(dev);
+ return genpd_finish_resume(dev, pm_generic_restore_noirq);
}

/**
--
2.25.1

2022-10-17 15:30:38

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 2/3] PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()

Hi Shawn,

I love your patch! Perhaps something to improve:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linus/master v6.1-rc1 next-20221017]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Shawn-Guo/Manage-domain-power-state-for-hibernate-freeze/20221017-114450
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link: https://lore.kernel.org/r/20221016095815.2550034-3-shawn.guo%40linaro.org
patch subject: [PATCH 2/3] PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()
config: xtensa-randconfig-r032-20221017
compiler: xtensa-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/07af89f71f5cf3e02a29c7da292968404e8ae5be
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Shawn-Guo/Manage-domain-power-state-for-hibernate-freeze/20221017-114450
git checkout 07af89f71f5cf3e02a29c7da292968404e8ae5be
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=xtensa SHELL=/bin/bash drivers/base/power/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> drivers/base/power/domain.c:1258: warning: expecting prototype for genpd_resume_noirq(). Prototype was for genpd_finish_resume() instead


vim +1258 drivers/base/power/domain.c

10da65423fdbee Mikko Perttunen 2017-06-22 1248
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1249 /**
07af89f71f5cf3 Shawn Guo 2022-10-16 1250 * genpd_resume_noirq - Completion of resume of device in an I/O PM domain.
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1251 * @dev: Device to resume.
07af89f71f5cf3 Shawn Guo 2022-10-16 1252 * @resume_noirq: Generic resume_noirq callback.
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1253 *
0496c8ae366724 Rafael J. Wysocki 2012-01-29 1254 * Restore power to the device's PM domain, if necessary, and start the device.
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1255 */
07af89f71f5cf3 Shawn Guo 2022-10-16 1256 static int genpd_finish_resume(struct device *dev,
07af89f71f5cf3 Shawn Guo 2022-10-16 1257 int (*resume_noirq)(struct device *dev))
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 @1258 {
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1259 struct generic_pm_domain *genpd;
a935424bb658f9 Ulf Hansson 2018-01-10 1260 int ret;
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1261
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1262 dev_dbg(dev, "%s()\n", __func__);
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1263
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1264 genpd = dev_to_genpd(dev);
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1265 if (IS_ERR(genpd))
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1266 return -EINVAL;
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1267
4e1d9a737d00f2 Patrice Chotard 2020-11-19 1268 if (device_wakeup_path(dev) && genpd_is_active_wakeup(genpd))
07af89f71f5cf3 Shawn Guo 2022-10-16 1269 return resume_noirq(dev);
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1270
0883ac038be12c Ulf Hansson 2017-02-08 1271 genpd_lock(genpd);
07af89f71f5cf3 Shawn Guo 2022-10-16 1272
07af89f71f5cf3 Shawn Guo 2022-10-16 1273 /*
07af89f71f5cf3 Shawn Guo 2022-10-16 1274 * Special handling for hibernation restore:
07af89f71f5cf3 Shawn Guo 2022-10-16 1275 * At this point suspended_count == 0 means we are being run for the
07af89f71f5cf3 Shawn Guo 2022-10-16 1276 * first time for the given domain in the present cycle.
07af89f71f5cf3 Shawn Guo 2022-10-16 1277 */
07af89f71f5cf3 Shawn Guo 2022-10-16 1278 if (resume_noirq == pm_generic_restore_noirq &&
07af89f71f5cf3 Shawn Guo 2022-10-16 1279 genpd->suspended_count++ == 0) {
07af89f71f5cf3 Shawn Guo 2022-10-16 1280 /*
07af89f71f5cf3 Shawn Guo 2022-10-16 1281 * The boot kernel might put the domain into arbitrary state,
07af89f71f5cf3 Shawn Guo 2022-10-16 1282 * so make it appear as powered off to genpd_sync_power_on(),
07af89f71f5cf3 Shawn Guo 2022-10-16 1283 * so that it tries to power it on in case it was really off.
07af89f71f5cf3 Shawn Guo 2022-10-16 1284 */
07af89f71f5cf3 Shawn Guo 2022-10-16 1285 genpd->status = GENPD_STATE_OFF;
07af89f71f5cf3 Shawn Guo 2022-10-16 1286 }
07af89f71f5cf3 Shawn Guo 2022-10-16 1287
0883ac038be12c Ulf Hansson 2017-02-08 1288 genpd_sync_power_on(genpd, true, 0);
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1289 genpd->suspended_count--;
0883ac038be12c Ulf Hansson 2017-02-08 1290 genpd_unlock(genpd);
f721889ff65afa Rafael J. Wysocki 2011-07-01 1291
17218e0092f8c7 Rafael J. Wysocki 2018-01-12 1292 if (genpd->dev_ops.stop && genpd->dev_ops.start &&
17218e0092f8c7 Rafael J. Wysocki 2018-01-12 1293 !pm_runtime_status_suspended(dev)) {
17218e0092f8c7 Rafael J. Wysocki 2018-01-12 1294 ret = genpd_start_dev(genpd, dev);
10da65423fdbee Mikko Perttunen 2017-06-22 1295 if (ret)
10da65423fdbee Mikko Perttunen 2017-06-22 1296 return ret;
a935424bb658f9 Ulf Hansson 2018-01-10 1297 }
10da65423fdbee Mikko Perttunen 2017-06-22 1298
a935424bb658f9 Ulf Hansson 2018-01-10 1299 return pm_generic_resume_noirq(dev);
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1300 }
596ba34bcd2978 Rafael J. Wysocki 2011-07-01 1301

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (6.12 kB)
config (123.46 kB)
Download all attachments

2022-10-18 03:02:55

by Shawn Guo

[permalink] [raw]
Subject: Re: [PATCH 2/3] PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()

On Mon, Oct 17, 2022 at 10:50 PM kernel test robot <[email protected]> wrote:
>
> Hi Shawn,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on rafael-pm/linux-next]
> [also build test WARNING on linus/master v6.1-rc1 next-20221017]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Shawn-Guo/Manage-domain-power-state-for-hibernate-freeze/20221017-114450
> base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> patch link: https://lore.kernel.org/r/20221016095815.2550034-3-shawn.guo%40linaro.org
> patch subject: [PATCH 2/3] PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()
> config: xtensa-randconfig-r032-20221017
> compiler: xtensa-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/07af89f71f5cf3e02a29c7da292968404e8ae5be
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Shawn-Guo/Manage-domain-power-state-for-hibernate-freeze/20221017-114450
> git checkout 07af89f71f5cf3e02a29c7da292968404e8ae5be
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=xtensa SHELL=/bin/bash drivers/base/power/
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> >> drivers/base/power/domain.c:1258: warning: expecting prototype for genpd_resume_noirq(). Prototype was for genpd_finish_resume() instead

It will be fixed in the next version.

Shawn