LinuxLists.cc - [PATCH v11 0/4] block layer runtime pm

2013-03-05 03:02:52

Subject: [PATCH v11 0/4] block layer runtime pm

In August 2010, Jens and Alan discussed about "Runtime PM and the block
layer". http://marc.info/?t=128259108400001&r=1&w=2
And then Alan has given a detailed implementation guide:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2

To test:
# ls -l /sys/block/sda
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda

# echo 10000 > /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/autosuspend_delay_ms
# echo auto > /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/control
Then you'll see sda is suspended after 10secs idle.

[ 1767.680192] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 1767.680317] sd 2:0:0:0: [sda] Stopping disk

And if you do some IO, it will resume immediately.
[ 1791.052438] sd 2:0:0:0: [sda] Starting disk

For test, I often set the autosuspend time to 1 second. If you are using
a GUI, the 10 seconds delay may be too long that the disk can not enter
runtime suspended state.

Note that sd's runtime suspend callback will dump some kernel messages
and the syslog daemon will write kernel message to /var/log/messages,
making the disk instantly resume after suspended. So for test, the
syslog daemon should better be temporarily stopped.

A git repo for it, on top of v3.9-rc1:
https://github.com/aaronlu/linux.git blockpm

v11:
- Add Alan Stern's Acked-by tag.

v10:
- Add link of Alan Stern's ideas on block layer runtime PM to patch 2
and 3's changelog;
- Add back code to schdule device suspend if scsi driver return -EBUSY.

v9:
- No need to mark last busy and autosuspend in blk_pm_runtime_init as
suggested by Alan Stern;
- mark last busy in blk_runtime_post_suspend if driver failed to runtime
suspend the device, so that PM core can try to autosuspend it some
time later;
- Update scsi bus layer runtime callback to handle scsi devices which
use request based runtime PM and which don't.

v8:
- Set default autosuspend delay to -1 to avoid suspend till an updated
value is set as suggested by Alan Stern;
- Always check the dev field of the queue structure, as it is incorrect
and meaningless to do any operation on devices that do not use block
layer runtime PM as reminded by Alan Stern;
- Update scsi bus level runtime PM callback to take care of scsi devices
that use block layer runtime PM and that don't.

v7:
- Add kernel doc for block layer runtime PM API as suggested by
Alan Stern;

- Add back check for q->dev, as that serves as a flag if driver
is using block layer runtime PM;

- Do not auto suspend when last request is finished, as that's a hot
path and auto suspend is not a trivial function. Instead, mark last
busy in pre_suspend so that runtim PM core will retry suspend some
time later to solve the 1st problem demostrated in v6, suggested by
Alan Stern.

- Move block layer runtime PM strtegy functions to where they are
needed instead of in include/linux/blkdev.h as suggested by Alan
Stern since clients of block layer do not need to know those
functions.

v6:
Take over from Lin Ming.

- Instead of put the device into autosuspend state in
blk_post_runtime_suspend, do it when the last request is finished.
This can also solve the problem illustrated below:

thread A thread B
|suspend timer expired |
| ... ... |a new request comes in,
| ... ... |blk_pm_add_request
| ... ... |skip request_resume due to
| ... ... |q->status is still RPM_ACTIVE
| rpm_suspend | ... ...
| scsi_runtime_suspend | ... ...
| blk_pre_runtime_suspend | ... ...
| return -EBUSY due to nr_pending | ... ...
| rpm_suspend done | ... ...
| | blk_pm_put_request, mark last busy

But no more trigger point, and the device will stay at RPM_ACTIVE state.
Run pm_runtime_autosuspend after the last request is finished solved
this problem.

- Requests which have the REQ_PM flag should not involve nr_pending
counting, or we may lose the condition to resume the device:
Suppose queue is active and nr_pending is 0. Then a REQ_PM request
comes and nr_pending will be increased to 1, but since the request has
REQ_PM flag, it will not cause resume. Before it is finished, a normal
request comes in, and since nr_pending is 1 now, it will not trigger
the resume of the device either. Bug.

- Do not quiesce the device in scsi bus level runtime suspend callback.
Since the only reason the device is to be runtime suspended is due to
no more requests pending for it, quiesce it is pointless.

- Remove scsi_autopm_* from sd_check_events as we are request driven.

- Call blk_pm_runtime_init in scsi_sysfs_initialize_dev, so that we do
not need to check queue's device in blk_pm_add/put_request.

- Do not mark last busy and initiate an autosuspend for the device in
blk_pm_runtime_init function.

- Do not mark last busy and initiate an autosuspend for the device in
block_post_runtime_resume, as when the request that triggered the
resume finished, the blk_pm_put_request will mark last busy and
initiate an autosuspend.

v5:
- rename scsi_execute_req to scsi_execute_req_flags
and wrap scsi_execute_req around it.
- add detail function descriptions in patch 2 log
- define static helper functions to do runtime pm work on block layer
and put the definitions inside a #ifdef block

v4:
- add CONFIG_PM_RUNTIME check
- update queue runtime pm status after system resume
- use pm_runtime_autosuspend instead of pm_request_autosuspend in scsi_runtime_idle
- always count PM request

v3:
- remove block layer suspend/resume callbacks
- add block layer runtime pm helper functions

v2:
- remove queue idle timer, use runtime pm core's auto suspend

Lin Ming (4):
block: add a flag to identify PM request
block: add runtime pm helpers
block: implement runtime pm strategy
sd: change to auto suspend mode

block/blk-core.c | 183 +++++++++++++++++++++++++++++++++++++++++++++
block/elevator.c | 26 +++++++
drivers/scsi/scsi_lib.c | 9 +--
drivers/scsi/scsi_pm.c | 84 +++++++++++++++++----
drivers/scsi/sd.c | 22 ++----
include/linux/blk_types.h | 2 +
include/linux/blkdev.h | 27 +++++++
include/scsi/scsi_device.h | 16 +++-
8 files changed, 331 insertions(+), 38 deletions(-)

--
1.8.1.4

2013-03-05 03:03:16

by Aaron Lu

[permalink] [raw]

Subject: [PATCH v11 4/4] sd: change to auto suspend mode

From: Lin Ming <[email protected]>

Uses block layer runtime pm helper functions in
scsi_runtime_suspend/resume for devices that take advantage of it.

Remove scsi_autopm_* from sd open/release path and check_events path.

Signed-off-by: Lin Ming <[email protected]>
Signed-off-by: Aaron Lu <[email protected]>
Acked-by: Alan Stern <[email protected]>
---
drivers/scsi/scsi_pm.c | 84 ++++++++++++++++++++++++++++++++++++++++++--------
drivers/scsi/sd.c | 13 +-------
2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 8f6b12c..42539ee 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -144,33 +144,83 @@ static int scsi_bus_restore(struct device *dev)

#ifdef CONFIG_PM_RUNTIME

+static int sdev_blk_runtime_suspend(struct scsi_device *sdev,
+ int (*cb)(struct device *))
+{
+ int err;
+
+ err = blk_pre_runtime_suspend(sdev->request_queue);
+ if (err)
+ return err;
+ if (cb)
+ err = cb(&sdev->sdev_gendev);
+ blk_post_runtime_suspend(sdev->request_queue, err);
+
+ return err;
+}
+
+static int sdev_runtime_suspend(struct device *dev)
+{
+ const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+ int (*cb)(struct device *) = pm ? pm->runtime_suspend : NULL;
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int err;
+
+ if (sdev->request_queue->dev)
+ return sdev_blk_runtime_suspend(sdev, cb);
+
+ err = scsi_dev_type_suspend(dev, cb);
+ if (err == -EAGAIN)
+ pm_schedule_suspend(dev, jiffies_to_msecs(
+ round_jiffies_up_relative(HZ/10)));
+ return err;
+}
+
static int scsi_runtime_suspend(struct device *dev)
{
int err = 0;
- const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

dev_dbg(dev, "scsi_runtime_suspend\n");
- if (scsi_is_sdev_device(dev)) {
- err = scsi_dev_type_suspend(dev,
- pm ? pm->runtime_suspend : NULL);
- if (err == -EAGAIN)
- pm_schedule_suspend(dev, jiffies_to_msecs(
- round_jiffies_up_relative(HZ/10)));
- }
+ if (scsi_is_sdev_device(dev))
+ err = sdev_runtime_suspend(dev);

/* Insert hooks here for targets, hosts, and transport classes */

return err;
}

-static int scsi_runtime_resume(struct device *dev)
+static int sdev_blk_runtime_resume(struct scsi_device *sdev,
+ int (*cb)(struct device *))
{
int err = 0;
+
+ blk_pre_runtime_resume(sdev->request_queue);
+ if (cb)
+ err = cb(&sdev->sdev_gendev);
+ blk_post_runtime_resume(sdev->request_queue, err);
+
+ return err;
+}
+
+static int sdev_runtime_resume(struct device *dev)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+ int (*cb)(struct device *) = pm ? pm->runtime_resume : NULL;
+
+ if (sdev->request_queue->dev)
+ return sdev_blk_runtime_resume(sdev, cb);
+ else
+ return scsi_dev_type_resume(dev, cb);
+}
+
+static int scsi_runtime_resume(struct device *dev)
+{
+ int err = 0;

dev_dbg(dev, "scsi_runtime_resume\n");
if (scsi_is_sdev_device(dev))
- err = scsi_dev_type_resume(dev, pm ? pm->runtime_resume : NULL);
+ err = sdev_runtime_resume(dev);

/* Insert hooks here for targets, hosts, and transport classes */

@@ -185,10 +235,18 @@ static int scsi_runtime_idle(struct device *dev)

/* Insert hooks here for targets, hosts, and transport classes */

- if (scsi_is_sdev_device(dev))
- err = pm_schedule_suspend(dev, 100);
- else
+ if (scsi_is_sdev_device(dev)) {
+ struct scsi_device *sdev = to_scsi_device(dev);
+
+ if (sdev->request_queue->dev) {
+ pm_runtime_mark_last_busy(dev);
+ err = pm_runtime_autosuspend(dev);
+ } else {
+ err = pm_runtime_suspend(dev);
+ }
+ } else {
err = pm_runtime_suspend(dev);
+ }
return err;
}

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c6e2b34..5000bec 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1121,10 +1121,6 @@ static int sd_open(struct block_device *bdev, fmode_t mode)

sdev = sdkp->device;

- retval = scsi_autopm_get_device(sdev);
- if (retval)
- goto error_autopm;
-
/*
* If the device is in error recovery, wait until it is done.
* If the device is offline, then disallow any access to it.
@@ -1169,8 +1165,6 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
return 0;

error_out:
- scsi_autopm_put_device(sdev);
-error_autopm:
scsi_disk_put(sdkp);
return retval;
}
@@ -1205,7 +1199,6 @@ static int sd_release(struct gendisk *disk, fmode_t mode)
* XXX is followed by a "rmmod sd_mod"?
*/

- scsi_autopm_put_device(sdev);
scsi_disk_put(sdkp);
return 0;
}
@@ -1367,14 +1360,9 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
retval = -ENODEV;

if (scsi_block_when_processing_errors(sdp)) {
- retval = scsi_autopm_get_device(sdp);
- if (retval)
- goto out;
-
sshdr = kzalloc(sizeof(*sshdr), GFP_KERNEL);
retval = scsi_test_unit_ready(sdp, SD_TIMEOUT, SD_MAX_RETRIES,
sshdr);
- scsi_autopm_put_device(sdp);
}

/* failed to execute TUR, assume media not present */
@@ -2839,6 +2827,7 @@ static void sd_probe_async(void *data, async_cookie_t cookie)

sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
sdp->removable ? "removable " : "");
+ blk_pm_runtime_init(sdp->request_queue, dev);
scsi_autopm_put_device(sdp);
put_device(&sdkp->dev);
}
--
1.8.1.4

2013-03-05 03:03:14

by Aaron Lu

[permalink] [raw]

Subject: [PATCH v11 1/4] block: add a flag to identify PM request

From: Lin Ming <[email protected]>

Add a flag REQ_PM to identify the request is PM related, such requests
will not change the device request queue's runtime status. It is
intended to be used in driver's runtime PM callback, so that driver can
perform some IO to the device there with the queue's runtime status
unaffected. e.g. in SCSI disk's runtime suspend callback, the disk will
be put into stopped power state, and this require sending a command to
the device. Such command processing should not change the disk's runtime
status.

As an example, modify scsi code to use this flag.

Signed-off-by: Lin Ming <[email protected]>
Signed-off-by: Aaron Lu <[email protected]>
Acked-by: Alan Stern <[email protected]>
---
drivers/scsi/scsi_lib.c | 9 ++++-----
drivers/scsi/sd.c | 9 +++++----
include/linux/blk_types.h | 2 ++
include/scsi/scsi_device.h | 16 ++++++++++++----
4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 765398c..23f795f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -271,11 +271,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
}
EXPORT_SYMBOL(scsi_execute);

-
-int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
+int scsi_execute_req_flags(struct scsi_device *sdev, const unsigned char *cmd,
int data_direction, void *buffer, unsigned bufflen,
struct scsi_sense_hdr *sshdr, int timeout, int retries,
- int *resid)
+ int *resid, int flags)
{
char *sense = NULL;
int result;
@@ -286,14 +285,14 @@ int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
return DRIVER_ERROR << 24;
}
result = scsi_execute(sdev, cmd, data_direction, buffer, bufflen,
- sense, timeout, retries, 0, resid);
+ sense, timeout, retries, flags, resid);
if (sshdr)
scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, sshdr);

kfree(sense);
return result;
}
-EXPORT_SYMBOL(scsi_execute_req);
+EXPORT_SYMBOL(scsi_execute_req_flags);

/*
* Function: scsi_init_cmd_errh()
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..c6e2b34 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1424,8 +1424,9 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
* Leave the rest of the command zero to indicate
* flush everything.
*/
- res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
- SD_FLUSH_TIMEOUT, SD_MAX_RETRIES, NULL);
+ res = scsi_execute_req_flags(sdp, cmd, DMA_NONE, NULL, 0,
+ &sshdr, SD_FLUSH_TIMEOUT,
+ SD_MAX_RETRIES, NULL, REQ_PM);
if (res == 0)
break;
}
@@ -3021,8 +3022,8 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
if (!scsi_device_online(sdp))
return -ENODEV;

- res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
- SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+ res = scsi_execute_req_flags(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
+ SD_TIMEOUT, SD_MAX_RETRIES, NULL, REQ_PM);
if (res) {
sd_printk(KERN_WARNING, sdkp, "START_STOP FAILED\n");
sd_print_result(sdkp, res);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cdf1119..fcc1ce2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -175,6 +175,7 @@ enum rq_flag_bits {
__REQ_IO_STAT, /* account I/O stat */
__REQ_MIXED_MERGE, /* merge of different types, fail separately */
__REQ_KERNEL, /* direct IO to kernel pages */
+ __REQ_PM, /* runtime pm request */
__REQ_NR_BITS, /* stops here */
};

@@ -223,5 +224,6 @@ enum rq_flag_bits {
#define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE)
#define REQ_SECURE (1 << __REQ_SECURE)
#define REQ_KERNEL (1 << __REQ_KERNEL)
+#define REQ_PM (1 << __REQ_PM)

#endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index a7f9cba..cc64587 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -394,10 +394,18 @@ extern int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int data_direction, void *buffer, unsigned bufflen,
unsigned char *sense, int timeout, int retries,
int flag, int *resid);
-extern int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
- int data_direction, void *buffer, unsigned bufflen,
- struct scsi_sense_hdr *, int timeout, int retries,
- int *resid);
+extern int scsi_execute_req_flags(struct scsi_device *sdev,
+ const unsigned char *cmd, int data_direction, void *buffer,
+ unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,
+ int retries, int *resid, int flags);
+static inline int scsi_execute_req(struct scsi_device *sdev,
+ const unsigned char *cmd, int data_direction, void *buffer,
+ unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,
+ int retries, int *resid)
+{
+ return scsi_execute_req_flags(sdev, cmd, data_direction, buffer,
+ bufflen, sshdr, timeout, retries, resid, 0);
+}
extern void sdev_disable_disk_events(struct scsi_device *sdev);
extern void sdev_enable_disk_events(struct scsi_device *sdev);

--
1.8.1.4

2013-03-05 03:03:11

by Aaron Lu

[permalink] [raw]

Subject: [PATCH v11 3/4] block: implement runtime pm strategy

From: Lin Ming <[email protected]>

When a request is added:
If device is suspended or is suspending and the request is not a
PM request, resume the device.

When the last request finishes:
Call pm_runtime_mark_last_busy().

When pick a request:
If device is resuming/suspending, then only PM request is allowed
to go.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2

Signed-off-by: Lin Ming <[email protected]>
Signed-off-by: Aaron Lu <[email protected]>
Acked-by: Alan Stern <[email protected]>
---
block/blk-core.c | 39 +++++++++++++++++++++++++++++++++++++++
block/elevator.c | 26 ++++++++++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 123d240..441f348 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1264,6 +1264,16 @@ void part_round_stats(int cpu, struct hd_struct *part)
}
EXPORT_SYMBOL_GPL(part_round_stats);

+#ifdef CONFIG_PM_RUNTIME
+static void blk_pm_put_request(struct request *rq)
+{
+ if (rq->q->dev && !(rq->cmd_flags & REQ_PM) && !--rq->q->nr_pending)
+ pm_runtime_mark_last_busy(rq->q->dev);
+}
+#else
+static inline void blk_pm_put_request(struct request *rq) {}
+#endif
+
/*
* queue lock must be held
*/
@@ -1274,6 +1284,8 @@ void __blk_put_request(struct request_queue *q, struct request *req)
if (unlikely(--req->ref_count))
return;

+ blk_pm_put_request(req);
+
elv_completed_request(q, req);

/* this is a bio leak */
@@ -2053,6 +2065,28 @@ static void blk_account_io_done(struct request *req)
}
}

+#ifdef CONFIG_PM_RUNTIME
+/*
+ * Don't process normal requests when queue is suspended
+ * or in the process of suspending/resuming
+ */
+static struct request *blk_pm_peek_request(struct request_queue *q,
+ struct request *rq)
+{
+ if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
+ (q->rpm_status != RPM_ACTIVE && !(rq->cmd_flags & REQ_PM))))
+ return NULL;
+ else
+ return rq;
+}
+#else
+static inline struct request *blk_pm_peek_request(struct request_queue *q,
+ struct request *rq)
+{
+ return rq;
+}
+#endif
+
/**
* blk_peek_request - peek at the top of a request queue
* @q: request queue to peek at
@@ -2075,6 +2109,11 @@ struct request *blk_peek_request(struct request_queue *q)
int ret;

while ((rq = __elv_next_request(q)) != NULL) {
+
+ rq = blk_pm_peek_request(q, rq);
+ if (!rq)
+ break;
+
if (!(rq->cmd_flags & REQ_STARTED)) {
/*
* This is the first time the device driver
diff --git a/block/elevator.c b/block/elevator.c
index a0ffdd9..eba5b04 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -34,6 +34,7 @@
#include <linux/blktrace_api.h>
#include <linux/hash.h>
#include <linux/uaccess.h>
+#include <linux/pm_runtime.h>

#include <trace/events/block.h>

@@ -536,6 +537,27 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
e->type->ops.elevator_bio_merged_fn(q, rq, bio);
}

+#ifdef CONFIG_PM_RUNTIME
+static void blk_pm_requeue_request(struct request *rq)
+{
+ if (rq->q->dev && !(rq->cmd_flags & REQ_PM))
+ rq->q->nr_pending--;
+}
+
+static void blk_pm_add_request(struct request_queue *q, struct request *rq)
+{
+ if (q->dev && !(rq->cmd_flags & REQ_PM) && q->nr_pending++ == 0 &&
+ (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
+ pm_request_resume(q->dev);
+}
+#else
+static inline void blk_pm_requeue_request(struct request *rq) {}
+static inline void blk_pm_add_request(struct request_queue *q,
+ struct request *rq)
+{
+}
+#endif
+
void elv_requeue_request(struct request_queue *q, struct request *rq)
{
/*
@@ -550,6 +572,8 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)

rq->cmd_flags &= ~REQ_STARTED;

+ blk_pm_requeue_request(rq);
+
__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
}

@@ -572,6 +596,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
{
trace_block_rq_insert(q, rq);

+ blk_pm_add_request(q, rq);
+
rq->q = q;

if (rq->cmd_flags & REQ_SOFTBARRIER) {
--
1.8.1.4

2013-03-05 03:03:08

by Aaron Lu

[permalink] [raw]

Subject: [PATCH v11 2/4] block: add runtime pm helpers

From: Lin Ming <[email protected]>

Add runtime pm helper functions:

void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
- Initialization function for drivers to call.

int blk_pre_runtime_suspend(struct request_queue *q)
- If any requests are in the queue, mark last busy and return -EBUSY.
Otherwise set q->rpm_status to RPM_SUSPENDING and return 0.

void blk_post_runtime_suspend(struct request_queue *q, int err)
- If the suspend succeeded then set q->rpm_status to RPM_SUSPENDED.
Otherwise set it to RPM_ACTIVE and mark last busy.

void blk_pre_runtime_resume(struct request_queue *q)
- Set q->rpm_status to RPM_RESUMING.

void blk_post_runtime_resume(struct request_queue *q, int err)
- If the resume succeeded then set q->rpm_status to RPM_ACTIVE
and call __blk_run_queue, then mark last busy and autosuspend.
Otherwise set q->rpm_status to RPM_SUSPENDED.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2

Signed-off-by: Lin Ming <[email protected]>
Signed-off-by: Aaron Lu <[email protected]>
Acked-by: Alan Stern <[email protected]>
---
block/blk-core.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/blkdev.h | 27 ++++++++++
2 files changed, 171 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 074b758..123d240 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -30,6 +30,7 @@
#include <linux/list_sort.h>
#include <linux/delay.h>
#include <linux/ratelimit.h>
+#include <linux/pm_runtime.h>

#define CREATE_TRACE_POINTS
#include <trace/events/block.h>
@@ -3045,6 +3046,149 @@ void blk_finish_plug(struct blk_plug *plug)
}
EXPORT_SYMBOL(blk_finish_plug);

+#ifdef CONFIG_PM_RUNTIME
+/**
+ * blk_pm_runtime_init - Block layer runtime PM initialization routine
+ * @q: the queue of the device
+ * @dev: the device the queue belongs to
+ *
+ * Description:
+ * Initialize runtime-PM-related fields for @q and start auto suspend for
+ * @dev. Drivers that want to take advantage of request-based runtime PM
+ * should call this function after @dev has been initialized, and its
+ * request queue @q has been allocated, and runtime PM for it can not happen
+ * yet(either due to disabled/forbidden or its usage_count > 0). In most
+ * cases, driver should call this function before any I/O has taken place.
+ *
+ * This function takes care of setting up using auto suspend for the device,
+ * the autosuspend delay is set to -1 to make runtime suspend impossible
+ * until an updated value is either set by user or by driver. Drivers do
+ * not need to touch other autosuspend settings.
+ *
+ * The block layer runtime PM is request based, so only works for drivers
+ * that use request as their IO unit instead of those directly use bio's.
+ */
+void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
+{
+ q->dev = dev;
+ q->rpm_status = RPM_ACTIVE;
+ pm_runtime_set_autosuspend_delay(q->dev, -1);
+ pm_runtime_use_autosuspend(q->dev);
+}
+EXPORT_SYMBOL(blk_pm_runtime_init);
+
+/**
+ * blk_pre_runtime_suspend - Pre runtime suspend check
+ * @q: the queue of the device
+ *
+ * Description:
+ * This function will check if runtime suspend is allowed for the device
+ * by examining if there are any requests pending in the queue. If there
+ * are requests pending, the device can not be runtime suspended; otherwise,
+ * the queue's status will be updated to SUSPENDING and the driver can
+ * proceed to suspend the device.
+ *
+ * For the not allowed case, we mark last busy for the device so that
+ * runtime PM core will try to autosuspend it some time later.
+ *
+ * This function should be called near the start of the device's
+ * runtime_suspend callback.
+ *
+ * Return:
+ * 0 - OK to runtime suspend the device
+ * -EBUSY - Device should not be runtime suspended
+ */
+int blk_pre_runtime_suspend(struct request_queue *q)
+{
+ int ret = 0;
+
+ spin_lock_irq(q->queue_lock);
+ if (q->nr_pending) {
+ ret = -EBUSY;
+ pm_runtime_mark_last_busy(q->dev);
+ } else {
+ q->rpm_status = RPM_SUSPENDING;
+ }
+ spin_unlock_irq(q->queue_lock);
+ return ret;
+}
+EXPORT_SYMBOL(blk_pre_runtime_suspend);
+
+/**
+ * blk_post_runtime_suspend - Post runtime suspend processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_suspend function
+ *
+ * Description:
+ * Update the queue's runtime status according to the return value of the
+ * device's runtime suspend function and mark last busy for the device so
+ * that PM core will try to auto suspend the device at a later time.
+ *
+ * This function should be called near the end of the device's
+ * runtime_suspend callback.
+ */
+void blk_post_runtime_suspend(struct request_queue *q, int err)
+{
+ spin_lock_irq(q->queue_lock);
+ if (!err) {
+ q->rpm_status = RPM_SUSPENDED;
+ } else {
+ q->rpm_status = RPM_ACTIVE;
+ pm_runtime_mark_last_busy(q->dev);
+ }
+ spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_suspend);
+
+/**
+ * blk_pre_runtime_resume - Pre runtime resume processing
+ * @q: the queue of the device
+ *
+ * Description:
+ * Update the queue's runtime status to RESUMING in preparation for the
+ * runtime resume of the device.
+ *
+ * This function should be called near the start of the device's
+ * runtime_resume callback.
+ */
+void blk_pre_runtime_resume(struct request_queue *q)
+{
+ spin_lock_irq(q->queue_lock);
+ q->rpm_status = RPM_RESUMING;
+ spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_pre_runtime_resume);
+
+/**
+ * blk_post_runtime_resume - Post runtime resume processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_resume function
+ *
+ * Description:
+ * Update the queue's runtime status according to the return value of the
+ * device's runtime_resume function. If it is successfully resumed, process
+ * the requests that are queued into the device's queue when it is resuming
+ * and then mark last busy and initiate autosuspend for it.
+ *
+ * This function should be called near the end of the device's
+ * runtime_resume callback.
+ */
+void blk_post_runtime_resume(struct request_queue *q, int err)
+{
+ spin_lock_irq(q->queue_lock);
+ if (!err) {
+ q->rpm_status = RPM_ACTIVE;
+ __blk_run_queue(q);
+ pm_runtime_mark_last_busy(q->dev);
+ pm_runtime_autosuspend(q->dev);
+ } else {
+ q->rpm_status = RPM_SUSPENDED;
+ }
+ spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_resume);
+#endif
+
int __init blk_dev_init(void)
{
BUILD_BUG_ON(__REQ_NR_BITS > 8 *
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 78feda9..89d89c7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -361,6 +361,12 @@ struct request_queue {
*/
struct kobject kobj;

+#ifdef CONFIG_PM_RUNTIME
+ struct device *dev;
+ int rpm_status;
+ unsigned int nr_pending;
+#endif
+
/*
* queue settings
*/
@@ -961,6 +967,27 @@ struct request_queue *blk_alloc_queue_node(gfp_t, int);
extern void blk_put_queue(struct request_queue *);

/*
+ * block layer runtime pm functions
+ */
+#ifdef CONFIG_PM_RUNTIME
+extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
+extern int blk_pre_runtime_suspend(struct request_queue *q);
+extern void blk_post_runtime_suspend(struct request_queue *q, int err);
+extern void blk_pre_runtime_resume(struct request_queue *q);
+extern void blk_post_runtime_resume(struct request_queue *q, int err);
+#else
+static inline void blk_pm_runtime_init(struct request_queue *q,
+ struct device *dev) {}
+static inline int blk_pre_runtime_suspend(struct request_queue *q)
+{
+ return -ENOSYS;
+}
+static inline void blk_post_runtime_suspend(struct request_queue *q, int err) {}
+static inline void blk_pre_runtime_resume(struct request_queue *q) {}
+static inline void blk_post_runtime_resume(struct request_queue *q, int err) {}
+#endif
+
+/*
* blk_plug permits building a queue of related requests by holding the I/O
* fragments for a short period. This allows merging of sequential requests
* into single larger request. As the requests are moved from a per-task list to
--
1.8.1.4