2008-08-13 10:36:05

by Pavel Machek

Subject: Power management for SCSI


From: Alan Stern <[email protected]>

Add support for autosuspend/autoresume. A low-level driver can use it to
spin the disk down and power down its SATA link, to turn off the USB
interface, etc.
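
To make the host-level scheme easier to follow, here is a minimal userspace
sketch of the usage counting behind the autosuspend/autoresume pair. It is a
model only, not kernel code: the model_* names and the call counters are made
up for illustration, and only pm_usage_cnt and is_suspended mirror fields that
the patch adds. Locking (pm_mutex) and host-state checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a host; not the kernel's struct Scsi_Host. */
struct model_host {
	int pm_usage_cnt;	/* users that need the host at full power */
	bool is_suspended;
	int lld_suspend_calls;	/* how often the pretend LLD hooks ran */
	int lld_resume_calls;
};

/* Hypothetical LLD hooks; a real driver would touch hardware here. */
static int model_lld_autosuspend(struct model_host *h)
{
	h->lld_suspend_calls++;
	return 0;
}

static int model_lld_autoresume(struct model_host *h)
{
	h->lld_resume_calls++;
	return 0;
}

/* A new host starts resumed and holds one reference. */
void model_host_init(struct model_host *h)
{
	h->pm_usage_cnt = 1;
	h->is_suspended = false;
	h->lld_suspend_calls = 0;
	h->lld_resume_calls = 0;
}

/* Take a reference, resuming first if needed. Returns 0 or an error. */
int model_autoresume_host(struct model_host *h)
{
	int status = 0;

	++h->pm_usage_cnt;
	if (h->is_suspended) {
		status = model_lld_autoresume(h);
		if (status == 0)
			h->is_suspended = false;
		else
			--h->pm_usage_cnt;	/* failed: give the reference back */
	}
	return status;
}

/* Drop a reference; suspend once nobody needs the host any more. */
void model_autosuspend_host(struct model_host *h)
{
	--h->pm_usage_cnt;
	assert(h->pm_usage_cnt >= 0);
	if (h->pm_usage_cnt == 0 && !h->is_suspended) {
		if (model_lld_autosuspend(h) == 0)
			h->is_suspended = true;
	}
}
```

In this model, every successful model_autoresume_host() has to be balanced by
one model_autosuspend_host(). The LLD hook only runs on the transitions, so
with no errors it sees suspend and resume calls alternate.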

Spinning down the disk is useful - saves ~0.5W here. Powering down
the SATA controller is even better -- should save ~1W.

Now, I guess the patch will need to be split into small pieces for
merge... I tried to rearrange it so that the documentation and hooks
go before the stuff that needs the hooks, and before the Kconfig
enabler. If it looks reasonably good, I'll split it into smaller pieces.

Signed-off-by: Pavel Machek <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Tejun Heo <[email protected]>


diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index a6d5354..98dc005 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -782,6 +782,8 @@ In some cases more detail is given in sc
The interface functions are listed below in alphabetical order.

Summary:
+ autoresume - perform dynamic (runtime) host resume
+ autosuspend - perform dynamic (runtime) host suspend
bios_param - fetch head, sector, cylinder info for a disk
detect - detects HBAs this driver wants to control
eh_timed_out - notify the host that a command timer expired
@@ -802,6 +804,54 @@ Summary:
Details:

/**
+ * autoresume - perform dynamic (runtime) host resume
+ * @shp: host to resume
+ *
+ * Resume (return to an operational power level) the specified host.
+ * Return 0 if the resume was successful, otherwise a negative
+ * error code.
+ *
+ * Locks: struct Scsi_Host::pm_mutex held throughout the call.
+ *
+ * Calling context: process
+ *
+ * Notes: If the host is not currently suspended, this method does
+ * not need to do anything.
+ *
+ * Optionally defined in: LLD
+ **/
+ int autoresume(struct Scsi_Host *shp)
+
+
+/**
+ * autosuspend - perform dynamic (runtime) host suspend
+ * @shp: host to suspend
+ *
+ * Suspend (change to a non-operational low-power state) the
+ * specified host.
+ * Return 0 if the suspend was successful (or was successfully
+ * queued, or was successfully ignored), otherwise a negative
+ * error code.
+ *
+ * Locks: struct Scsi_Host::pm_mutex held throughout the call.
+ *
+ * Calling context: process
+ *
+ * Notes: The suspend need not be carried out immediately (or indeed
+ * at all); it may be delayed indefinitely. The real meaning of this
+ * method call is that all of the host's devices are now idle. It can
+ * happen that an autosuspend is quickly followed by an autoresume,
+ * so it is beneficial if the suspend is delayed by a few seconds.
+ * A host is assumed to be at full power (resumed) when it is
+ * first created. In the absence of errors, the LLD will receive a
+ * strictly alternating sequence of autosuspend, autoresume,... calls.
+ *
+ * Optionally defined in: LLD
+ **/
+ int autosuspend(struct Scsi_Host *shp)
+
+
+/**
* bios_param - fetch head, sector, cylinder info for a disk
* @sdev: pointer to scsi device context (defined in
* include/scsi/scsi_device.h)
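
The notes for autosuspend above say the LLD may delay the real power-down.
One way a driver could do that is sketched below as a userspace model. All of
it is hypothetical: the model_* names, the tick-based clock and the grace
period are illustration only and not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_GRACE_TICKS 3	/* hypothetical delay before powering down */

/* Userspace model of an LLD's private PM state. */
struct model_lld {
	bool idle;			/* autosuspend seen, no autoresume since */
	bool powered;			/* pretend hardware power state */
	unsigned long idle_since;	/* tick of the last autosuspend call */
};

/* autosuspend: only note that the host is idle; power down later. */
int model_autosuspend(struct model_lld *l, unsigned long now)
{
	assert(!l->idle);	/* the calls are expected to alternate */
	l->idle = true;
	l->idle_since = now;
	return 0;		/* "successfully queued" counts as success */
}

/* autoresume: cancel a pending power-down, or power back up. */
int model_autoresume(struct model_lld *l)
{
	assert(l->idle);
	l->idle = false;
	l->powered = true;
	return 0;
}

/* Called from a hypothetical periodic timer. */
void model_tick(struct model_lld *l, unsigned long now)
{
	/* Unsigned subtraction stays correct if the tick counter wraps. */
	if (l->idle && l->powered &&
	    now - l->idle_since >= MODEL_GRACE_TICKS)
		l->powered = false;
}
```

An autoresume that arrives inside the grace period then costs nothing, which
is the case the notes describe as an autosuspend quickly followed by an
autoresume.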

diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 79f0f75..70b1eca 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -4,11 +4,13 @@ #define _SCSI_PRIV_H
#include <linux/device.h>

struct request_queue;
+struct request;
struct scsi_cmnd;
struct scsi_device;
struct scsi_host_template;
struct Scsi_Host;
struct scsi_nl_hdr;
+struct workqueue_struct;


/*
@@ -67,6 +69,7 @@ int scsi_eh_get_sense(struct list_head *
extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
extern void scsi_device_unbusy(struct scsi_device *sdev);
extern int scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
+extern void scsi_run_queue(struct request_queue *q);
extern void scsi_next_command(struct scsi_cmnd *cmd);
extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
extern void scsi_run_host_queues(struct Scsi_Host *shost);
@@ -134,6 +137,41 @@ static inline void scsi_netlink_init(voi
static inline void scsi_netlink_exit(void) {}
#endif

+/* scsi_pm.c */
+extern int scsi_bus_suspend(struct device *, pm_message_t);
+extern int scsi_bus_resume(struct device *);
+extern int scsi_pm_state_check(struct scsi_device *, struct request *);
+extern int scsi_pm_device_stop(struct scsi_device *);
+extern int scsi_pm_host_stop(struct Scsi_Host *);
+#ifdef CONFIG_SCSI_DYNAMIC_PM
+extern void scsi_autosuspend_host(struct Scsi_Host *);
+extern int scsi_autoresume_host(struct Scsi_Host *);
+extern void scsi_pm_host_initialize(struct Scsi_Host *);
+extern void scsi_mark_last_busy(struct scsi_device *);
+extern void scsi_use_ULD_pm(struct scsi_device *, int);
+extern void scsi_autosuspend_device(struct scsi_device *);
+extern int scsi_autoresume_device(struct scsi_device *);
+extern int scsi_pm_create_device_files(struct scsi_device *);
+extern void scsi_pm_device_initialize(struct scsi_device *);
+extern int scsi_init_pm(void);
+extern void scsi_exit_pm(void);
+#else
+static inline void scsi_autosuspend_host(struct Scsi_Host *shost) {}
+static inline int scsi_autoresume_host(struct Scsi_Host *shost)
+ { return 0; }
+static inline void scsi_pm_host_initialize(struct Scsi_Host *shost) {}
+static inline void scsi_mark_last_busy(struct scsi_device *sdev) {}
+static inline void scsi_use_ULD_pm(struct scsi_device *sdev, int v) {}
+static inline void scsi_autosuspend_device(struct scsi_device *sdev) {}
+static inline int scsi_autoresume_device(struct scsi_device *sdev)
+ { return 0; }
+static inline int scsi_pm_create_device_files(struct scsi_device *sdev)
+ { return 0; }
+static inline void scsi_pm_device_initialize(struct scsi_device *sdev) {}
+static inline int scsi_init_pm(void) { return 0; }
+static inline void scsi_exit_pm(void) {}
+#endif /* CONFIG_SCSI_DYNAMIC_PM */
+
/*
* internal scsi timeout functions: for use by mid-layer and transport
* classes.



diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 291d56a..9aa96e9 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -36,8 +36,9 @@ enum scsi_device_state {
* Only error handler commands allowed */
SDEV_DEL, /* device deleted
* no commands allowed */
- SDEV_QUIESCE, /* Device quiescent. No block commands
- * will be accepted, only specials (which
+ SDEV_QUIESCE, /* Device quiescent or suspended.
+ * No block commands will be accepted,
+ * only specials (which
* originate in the mid-layer) */
SDEV_OFFLINE, /* Device offlined (by error handling or
* user request */
@@ -163,6 +164,24 @@ #define SCSI_DEFAULT_DEVICE_BLOCKED 3

struct execute_work ew; /* used to get process context on put */

+#ifdef CONFIG_SCSI_DYNAMIC_PM
+ struct mutex pm_mutex; /* protect PM data & operations */
+ struct work_struct autoresume_work;
+
+ unsigned long last_busy; /* time of last use */
+ int autosuspend_delay; /* delay in jiffies */
+ int pm_usage_cnt; /* usage counter for autosuspend */
+
+ unsigned is_suspended:1;
+ unsigned pm_in_progress:1; /* performing suspend or resume */
+ unsigned auto_pm:1; /* doing autosuspend or autoresume */
+ unsigned autosuspend_disabled:1; /* autosuspend & autoresume */
+ unsigned autoresume_disabled:1; /* disabled by the user */
+ unsigned skip_sys_resume:1; /* skip the next system resume */
+ unsigned use_ULD_pm:1; /* call the Upper-Level Driver's
+ * suspend/resume methods */
+#endif /* CONFIG_SCSI_DYNAMIC_PM */
+
struct scsi_dh_data *scsi_dh_data;
enum scsi_device_state sdev_state;
unsigned long sdev_data[0];
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 44a55d1..b60445f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -177,6 +177,22 @@ #endif
int (* eh_host_reset_handler)(struct scsi_cmnd *);

/*
+ * Power management routines. These are optional; you should
+ * implement them if you want your LLD to perform dynamic Power
+ * Management. The autosuspend method will be called whenever
+ * all the devices below a host have been suspended (are in an
+ * idle state), at which time the host adapter can safely be
+ * autosuspended. The autoresume method will be called whenever
+ * a suspended host must be resumed for one of its devices to
+ * carry out a command. Both routines are always called in a
+ * process context with interrupts enabled.
+ *
+ * Status: OPTIONAL
+ */
+ int (* autosuspend)(struct Scsi_Host *);
+ int (* autoresume)(struct Scsi_Host *);
+
+ /*
* Before the mid layer attempts to scan for a new device where none
* currently exists, it will call this entry in your driver. Should
* your driver need to allocate any structs or perform any other init
@@ -659,6 +675,14 @@ struct Scsi_Host {
/* ldm bits */
struct device shost_gendev, shost_dev;

+#ifdef CONFIG_SCSI_DYNAMIC_PM
+ struct mutex pm_mutex; /* protect PM data & operations */
+ struct delayed_work autosuspend_work;
+
+ int pm_usage_cnt; /* usage counter for autosuspend */
+ unsigned is_suspended:1;
+#endif /* CONFIG_SCSI_DYNAMIC_PM */
+
/*
* List of hosts per template.
*

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index fed0b02..2ec7fa1 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -154,24 +154,15 @@ EXPORT_SYMBOL(scsi_host_set_state);
**/
void scsi_remove_host(struct Scsi_Host *shost)
{
- unsigned long flags;
- mutex_lock(&shost->scan_mutex);
- spin_lock_irqsave(shost->host_lock, flags);
- if (scsi_host_set_state(shost, SHOST_CANCEL))
- if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
- spin_unlock_irqrestore(shost->host_lock, flags);
- mutex_unlock(&shost->scan_mutex);
- return;
- }
- spin_unlock_irqrestore(shost->host_lock, flags);
- mutex_unlock(&shost->scan_mutex);
+ if (scsi_pm_host_stop(shost))
+ return;
scsi_forget_host(shost);
scsi_proc_host_rm(shost);

- spin_lock_irqsave(shost->host_lock, flags);
+ spin_lock_irq(shost->host_lock);
if (scsi_host_set_state(shost, SHOST_DEL))
BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
- spin_unlock_irqrestore(shost->host_lock, flags);
+ spin_unlock_irq(shost->host_lock);

transport_unregister_device(&shost->shost_gendev);
device_unregister(&shost->shost_dev);
@@ -246,6 +237,7 @@ int scsi_add_host(struct Scsi_Host *shos
if (error)
goto out_destroy_host;

+ scsi_autosuspend_host(shost);
scsi_proc_host_add(shost);
return error;

@@ -402,6 +394,8 @@ #endif
shost->host_no);
shost->shost_dev.groups = scsi_sysfs_shost_attr_groups;

+ scsi_pm_host_initialize(shost);
+
shost->ehandler = kthread_run(scsi_error_handler, shost,
"scsi_eh_%d", shost->host_no);
if (IS_ERR(shost->ehandler)) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ee6be59..a638bc5 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1279,15 +1279,20 @@ static int __init init_scsi(void)
error = scsi_init_sysctl();
if (error)
goto cleanup_hosts;
- error = scsi_sysfs_register();
+ error = scsi_init_pm();
if (error)
goto cleanup_sysctl;
+ error = scsi_sysfs_register();
+ if (error)
+ goto cleanup_pm;

scsi_netlink_init();

printk(KERN_NOTICE "SCSI subsystem initialized\n");
return 0;

+cleanup_pm:
+ scsi_exit_pm();
cleanup_sysctl:
scsi_exit_sysctl();
cleanup_hosts:
@@ -1307,6 +1312,7 @@ static void __exit exit_scsi(void)
{
scsi_netlink_exit();
scsi_sysfs_unregister();
+ scsi_exit_pm();
scsi_exit_sysctl();
scsi_exit_hosts();
scsi_exit_devinfo();
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 880051c..0cb7448 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -482,30 +482,34 @@ static void scsi_eh_done(struct scsi_cmn

/**
* scsi_try_host_reset - ask host adapter to reset itself
- * @scmd: SCSI cmd to send hsot reset.
+ * @scmd: SCSI cmd to send host reset.
*/
static int scsi_try_host_reset(struct scsi_cmnd *scmd)
{
unsigned long flags;
int rtn;
+ struct Scsi_Host *shost = scmd->device->host;

SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Host RST\n",
__func__));

if (!scmd->device->host->hostt->eh_host_reset_handler)
return FAILED;
-
- rtn = scmd->device->host->hostt->eh_host_reset_handler(scmd);
+ if (scsi_autoresume_host(shost) != 0)
+ return FAILED;
+
+ rtn = shost->hostt->eh_host_reset_handler(scmd);

if (rtn == SUCCESS) {
- if (!scmd->device->host->hostt->skip_settle_delay)
+ if (!shost->hostt->skip_settle_delay)
ssleep(HOST_RESET_SETTLE_TIME);
- spin_lock_irqsave(scmd->device->host->host_lock, flags);
- scsi_report_bus_reset(scmd->device->host,
+ spin_lock_irqsave(shost->host_lock, flags);
+ scsi_report_bus_reset(shost,
scmd_channel(scmd));
- spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
+ spin_unlock_irqrestore(shost->host_lock, flags);
}

+ scsi_autosuspend_host(shost);
return rtn;
}

@@ -517,24 +519,28 @@ static int scsi_try_bus_reset(struct scs
{
unsigned long flags;
int rtn;
+ struct Scsi_Host *shost = scmd->device->host;

SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Bus RST\n",
__func__));

- if (!scmd->device->host->hostt->eh_bus_reset_handler)
+ if (!shost->hostt->eh_bus_reset_handler)
+ return FAILED;
+ if (scsi_autoresume_host(shost) != 0)
return FAILED;

- rtn = scmd->device->host->hostt->eh_bus_reset_handler(scmd);
+ rtn = shost->hostt->eh_bus_reset_handler(scmd);

if (rtn == SUCCESS) {
- if (!scmd->device->host->hostt->skip_settle_delay)
+ if (!shost->hostt->skip_settle_delay)
ssleep(BUS_RESET_SETTLE_TIME);
- spin_lock_irqsave(scmd->device->host->host_lock, flags);
- scsi_report_bus_reset(scmd->device->host,
+ spin_lock_irqsave(shost->host_lock, flags);
+ scsi_report_bus_reset(shost,
scmd_channel(scmd));
- spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
+ spin_unlock_irqrestore(shost->host_lock, flags);
}

+ scsi_autosuspend_host(shost);
return rtn;
}

@@ -1646,6 +1652,7 @@ static void scsi_unjam_host(struct Scsi_
int scsi_error_handler(void *data)
{
struct Scsi_Host *shost = data;
+ int autoresume_rc;

/*
* We use TASK_INTERRUPTIBLE so that the thread is not
@@ -1675,6 +1682,7 @@ int scsi_error_handler(void *data)
* what we need to do to get it up and online again (if we can).
* If we fail, we end up taking the thing offline.
*/
+ autoresume_rc = scsi_autoresume_host(shost);
if (shost->transportt->eh_strategy_handler)
shost->transportt->eh_strategy_handler(shost);
else
@@ -1688,6 +1696,8 @@ int scsi_error_handler(void *data)
* which are still online.
*/
scsi_restart_operations(shost);
+ if (autoresume_rc == 0)
+ scsi_autosuspend_host(shost);
set_current_state(TASK_INTERRUPTIBLE);
}
__set_current_state(TASK_RUNNING);


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ff5d56b..f6d8c75 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -67,8 +67,6 @@ #undef SP

struct kmem_cache *scsi_sdb_cache;

-static void scsi_run_queue(struct request_queue *q);
-
/*
* Function: scsi_unprep_request()
*
@@ -470,6 +468,7 @@ void scsi_device_unbusy(struct scsi_devi
spin_unlock(shost->host_lock);
spin_lock(sdev->request_queue->queue_lock);
sdev->device_busy--;
+ scsi_mark_last_busy(sdev);
spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
}

@@ -531,7 +530,7 @@ static void scsi_single_lun_run(struct s
* Notes: The previous command was completely finished, start
* a new one if possible.
*/
-static void scsi_run_queue(struct request_queue *q)
+void scsi_run_queue(struct request_queue *q)
{
struct scsi_device *sdev = q->queuedata;
struct Scsi_Host *shost = sdev->host;
@@ -1249,6 +1248,8 @@ int scsi_prep_state_check(struct scsi_de
ret = BLKPREP_KILL;
break;
case SDEV_QUIESCE:
+ ret = scsi_pm_state_check(sdev, req);
+ break;
case SDEV_BLOCK:
/*
* If the devices is blocked we defer normal commands.

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 72fd504..9ab631c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -143,7 +143,8 @@ obj-$(CONFIG_SCSI_WAIT_SCAN) += scsi_wai
scsi_mod-y += scsi.o hosts.o scsi_ioctl.o constants.o \
scsicam.o scsi_error.o scsi_lib.o
scsi_mod-$(CONFIG_SCSI_DMA) += scsi_lib_dma.o
-scsi_mod-y += scsi_scan.o scsi_sysfs.o scsi_devinfo.o
+scsi_mod-y += scsi_scan.o scsi_sysfs.o scsi_devinfo.o \
+ scsi_pm.o
scsi_mod-$(CONFIG_SCSI_NETLINK) += scsi_netlink.o
scsi_mod-$(CONFIG_SYSCTL) += scsi_sysctl.o
scsi_mod-$(CONFIG_SCSI_PROC_FS) += scsi_proc.o


diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
new file mode 100644
index 0000000..e76f1bd
--- /dev/null
+++ b/drivers/scsi/scsi_pm.c
@@ -0,0 +1,860 @@
+/*
+ * scsi_pm.c Copyright (C) 2008 Alan Stern
+ *
+ * SCSI dynamic Power Management
+ * Initial version: Alan Stern <[email protected]>
+ */
+
+#define DEBUG
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+
+#include <linux/delay.h>
+
+#include "scsi_priv.h"
+
+#define shost_dbg(shost, format, arg...) \
+ dev_dbg(&shost->shost_gendev , format , ## arg)
+#define sdev_dbg(sdev, format, arg...) \
+ dev_dbg(&sdev->sdev_gendev , format , ## arg)
+
+#ifdef CONFIG_SCSI_DYNAMIC_PM
+
+/* This value is completely arbitrary. Should it be a module parameter? */
+#define SCSI_DEFAULT_AUTOSUSPEND_DELAY (30*HZ)
+
+/* Workqueue for autosuspend and autoresume of devices and hosts */
+struct workqueue_struct *ksuspend_scsi_wq;
+
+static void scsi_try_autosuspend_device(struct scsi_device *);
+static int autosuspend_check(struct scsi_device *);
+
+/**
+ * scsi_autosuspend_host - autosuspend a SCSI host
+ * @shost: the Scsi_Host to autosuspend
+ *
+ * This routine should be called when a core subsystem is finished using
+ * @shost and wants to allow it to autosuspend. @shost's usage counter
+ * is decremented. If the result is non-positive and the host is in the
+ * proper state, an autosuspend request will be forwarded to the LLD.
+ *
+ * This routine can run only in process context.
+ */
+void scsi_autosuspend_host(struct Scsi_Host *shost)
+{
+ mutex_lock(&shost->pm_mutex);
+ --shost->pm_usage_cnt;
+ WARN_ON(shost->pm_usage_cnt < 0);
+ if (shost->pm_usage_cnt <= 0 && !shost->is_suspended &&
+ shost->shost_state == SHOST_RUNNING) {
+ WARN_ON(shost->host_busy);
+ if (!shost->hostt->autosuspend ||
+ shost->hostt->autosuspend(shost) == 0) {
+ shost->is_suspended = 1;
+ shost_dbg(shost, "suspended\n");
+ }
+ }
+ mutex_unlock(&shost->pm_mutex);
+}
+
+/**
+ * scsi_autoresume_host - autoresume a SCSI host
+ * @shost: the Scsi_Host to autoresume
+ *
+ * This routine should be called when a core subsystem wants to use @shost
+ * and needs to guarantee that it is not suspended. No autosuspend will
+ * occur until scsi_autosuspend_host is called. (Note that this will not
+ * prevent suspend events originating in the PM core.)
+ *
+ * @shost's usage counter is incremented to prevent subsequent autosuspends.
+ * If @shost was suspended, an autoresume request is forwarded to the LLD.
+ * If the autoresume fails, the usage counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ */
+
+int scsi_autoresume_host(struct Scsi_Host *shost)
+{
+ int status = 0;
+
+ mutex_lock(&shost->pm_mutex);
+ ++shost->pm_usage_cnt;
+ if (shost->is_suspended) {
+ if (shost->hostt->autoresume &&
+ (shost->shost_state == SHOST_RUNNING ||
+ shost->shost_state == SHOST_RECOVERY))
+ status = shost->hostt->autoresume(shost);
+ if (status == 0) {
+ shost->is_suspended = 0;
+ shost_dbg(shost, "resumed\n");
+ } else {
+ --shost->pm_usage_cnt;
+ }
+ }
+ mutex_unlock(&shost->pm_mutex);
+ return status;
+}
+
+#define SCAN_INTERVAL (10 * HZ) /* Autosuspend scan every 10 seconds */
+#define MAX_ATTEMPTS 3 /* Max autosuspend attempts */
+
+/**
+ * periodic_autosuspend_scan - try to autosuspend devices under a host
+ * @shost: host whose devices should be scanned for autosuspend
+ *
+ * Every so often (the default interval is 10 seconds but the actual
+ * interval can be shorter) all the devices under a host are checked to
+ * see if any of them can be autosuspended. This check is also made
+ * whenever a device's state changes so that it may be autosuspended.
+ * If any devices are in a suspendable state (i.e., not prohibited from
+ * autosuspending) but their idle-time delay hasn't yet expired, another
+ * scan is scheduled.
+ */
+static void periodic_autosuspend_scan(struct Scsi_Host *shost)
+{
+ struct scsi_device *sdev;
+ int any_suspendable;
+ int min_delay;
+ int num_attempts = 0;
+ int status;
+ unsigned long delay;
+
+ restart:
+ any_suspendable = 0;
+ min_delay = SCAN_INTERVAL;
+ spin_lock_irq(shost->host_lock);
+
+ /* Check each device below this host */
+ __shost_for_each_device(sdev, shost) {
+ if (sdev->is_suspended)
+ continue;
+ status = autosuspend_check(sdev);
+ if (status == -EPERM)
+ continue;
+
+ /* The device is suspendable. Should it be autosuspended? */
+ if (status == 0) {
+ if (num_attempts < MAX_ATTEMPTS) {
+ ++num_attempts;
+ spin_unlock_irq(shost->host_lock);
+ scsi_try_autosuspend_device(sdev);
+ goto restart;
+ }
+ status = HZ; /* Try again later */
+ }
+ if (status >= 0)
+ min_delay = min(min_delay, status);
+ any_suspendable = 1;
+ }
+
+ /* If any devices are still suspendable, rearm the periodic scan */
+ if (any_suspendable && (shost->shost_state == SHOST_RUNNING ||
+ shost->shost_state == SHOST_RECOVERY)) {
+
+ /* Round the delay up to the nearest second */
+ delay = round_jiffies_relative(min_delay);
+ if (delay < min_delay)
+ delay += HZ;
+ queue_delayed_work(ksuspend_scsi_wq, &shost->autosuspend_work,
+ delay);
+ }
+ spin_unlock_irq(shost->host_lock);
+}
+
+/* Periodic autosuspend workqueue routine */
+static void autosuspend_shost_work(struct work_struct *work)
+{
+ struct Scsi_Host *shost =
+ container_of(work, struct Scsi_Host, autosuspend_work.work);
+
+ periodic_autosuspend_scan(shost);
+}
+
+void scsi_pm_host_initialize(struct Scsi_Host *shost)
+{
+ mutex_init(&shost->pm_mutex);
+ INIT_DELAYED_WORK(&shost->autosuspend_work, autosuspend_shost_work);
+ shost->pm_usage_cnt = 1;
+}
+
+int scsi_pm_host_stop(struct Scsi_Host *shost)
+{
+ int rc = 0;
+
+ /* Prevent new device addition and synchronize with any ongoing
+ * PM activity.
+ */
+ mutex_lock(&shost->scan_mutex);
+ mutex_lock(&shost->pm_mutex);
+ spin_lock_irq(shost->host_lock);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ rc = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+ spin_unlock_irq(shost->host_lock);
+ mutex_unlock(&shost->pm_mutex);
+ mutex_unlock(&shost->scan_mutex);
+
+ /* Stop any autosuspend requests already pending */
+ cancel_delayed_work_sync(&shost->autosuspend_work);
+ return rc;
+}
+
+/*
+ * Internal routine to check whether we may autosuspend a device.
+ * The return value isn't fully reliable unless the caller holds
+ * the device's request-queue lock.
+ */
+static int autosuspend_check(struct scsi_device *sdev)
+{
+ unsigned long suspend_time;
+
+ if (sdev->autosuspend_delay < 0 || sdev->autosuspend_disabled
+ || sdev->pm_usage_cnt > 0)
+ return -EPERM;
+ if (sdev->device_busy > 0)
+ return -EBUSY;
+ if (!(sdev->sdev_state == SDEV_RUNNING ||
+ sdev->sdev_state == SDEV_QUIESCE))
+ return -ENODEV;
+
+ suspend_time = sdev->last_busy + sdev->autosuspend_delay;
+ if (time_before(jiffies, suspend_time))
+ return suspend_time - jiffies;
+ return 0;
+}
+
+/* Record the time the device was most recently busy */
+void scsi_mark_last_busy(struct scsi_device *sdev)
+{
+ sdev->last_busy = jiffies;
+}
+
+/* Allow/disallow calls to the Upper-Level Driver's suspend/resume methods */
+void scsi_use_ULD_pm(struct scsi_device *sdev, int v)
+{
+ mutex_lock(&sdev->pm_mutex);
+ sdev->use_ULD_pm = v;
+ mutex_unlock(&sdev->pm_mutex);
+}
+EXPORT_SYMBOL_GPL(scsi_use_ULD_pm);
+
+static void device_may_be_suspendable(struct scsi_device *sdev)
+{
+ int status;
+
+ /* sdev's state has changed and as a result it may now be
+ * suspendable.
+ */
+ if (sdev->is_suspended)
+ return;
+ status = autosuspend_check(sdev);
+ if (status == -EPERM)
+ return;
+
+ /* It is suspendable, so schedule a periodic host scan
+ * unless one is already pending.
+ */
+ if (!timer_pending(&sdev->host->autosuspend_work.timer))
+ periodic_autosuspend_scan(sdev->host);
+}
+
+/**
+ * scsi_suspend_sdev - suspend a SCSI device
+ * @sdev: the scsi_device to suspend
+ * @msg: Power Management message describing this state transition
+ *
+ * SCSI devices can't actually be suspended in a literal sense,
+ * because SCSI doesn't have any notion of power management. Instead
+ * this routine drains the request queue and calls the ULD's suspend
+ * method to flush caches, spin-down drives, and so on.
+ *
+ * If the suspend succeeds, we call scsi_autosuspend_host to decrement
+ * the host's count of unsuspended devices and invoke the LLD's suspend
+ * method.
+ *
+ * The caller must hold @sdev->pm_mutex.
+ *
+ * This routine can run only in process context.
+ */
+static int scsi_suspend_sdev(struct scsi_device *sdev, pm_message_t msg)
+{
+ struct device_driver *drv = sdev->sdev_gendev.driver;
+ int status = 0;
+ enum scsi_device_state oldstate;
+
+ /*
+ * If the device is already suspended, offline or going away
+ * then succeed immediately. Otherwise the device must be
+ * either running or quiescent.
+ */
+ if (sdev->is_suspended)
+ goto done;
+
+ spin_lock_irq(sdev->request_queue->queue_lock);
+ oldstate = sdev->sdev_state;
+ if (sdev->auto_pm)
+ status = autosuspend_check(sdev);
+ if (status == 0) {
+ sdev->pm_in_progress = 1;
+ status = scsi_device_set_state(sdev, SDEV_QUIESCE);
+ if (status)
+ sdev->pm_in_progress = 0;
+ }
+ spin_unlock_irq(sdev->request_queue->queue_lock);
+ if (status)
+ goto done;
+
+ /* Unfortunate duplication of code in scsi_device_quiesce()... */
+ scsi_run_queue(sdev->request_queue);
+ while (sdev->device_busy) {
+ msleep_interruptible(200);
+ scsi_run_queue(sdev->request_queue);
+ }
+ if (sdev->auto_pm) /* sdev->last_busy may have changed */
+ status = autosuspend_check(sdev);
+
+ if (status == 0 && drv && drv->suspend && sdev->use_ULD_pm)
+ status = drv->suspend(&sdev->sdev_gendev, msg);
+
+ spin_lock_irq(sdev->request_queue->queue_lock);
+ sdev->pm_in_progress = 0;
+ if (status == 0)
+ sdev->is_suspended = 1;
+ else
+ scsi_device_set_state(sdev, oldstate);
+ spin_unlock_irq(sdev->request_queue->queue_lock);
+
+ /* If the suspend succeeded, inform the transport and
+ * propagate it up to the host.
+ */
+ if (status == 0) {
+ sdev_dbg(sdev, "suspended\n");
+ /* FIXME: Inform the transport */
+ scsi_autosuspend_host(sdev->host);
+ }
+
+ done:
+// if (status == 0)
+// sdev->sdev_gendev.power.power_state.event = msg.event;
+ return status;
+}
+
+/**
+ * scsi_resume_sdev - resume a SCSI device
+ * @sdev: the scsi_device to resume
+ *
+ * SCSI devices can't actually be resumed in a literal sense,
+ * because SCSI doesn't have any notion of power management. Instead
+ * this routine calls the ULD's resume method to spin-up drives, etc.,
+ * and starts executing commands from the request queue.
+ *
+ * Before doing the resume, we call scsi_autoresume_host to increment
+ * the host's count of unsuspended devices and invoke the LLD's resume
+ * method.
+ *
+ * The caller must hold @sdev->pm_mutex.
+ *
+ * This routine can run only in process context.
+ */
+static int scsi_resume_sdev(struct scsi_device *sdev)
+{
+ struct device_driver *drv = sdev->sdev_gendev.driver;
+ int status = 0;
+
+ if (!sdev->is_suspended)
+ goto done;
+ if (sdev->sdev_state != SDEV_QUIESCE) {
+ status = -ENODEV;
+ goto done;
+ }
+ if (sdev->auto_pm && sdev->autoresume_disabled) {
+ status = -EPERM;
+ goto done;
+ }
+
+ /* Propagate the resume up to the host and inform the transport */
+ status = scsi_autoresume_host(sdev->host);
+ if (status)
+ goto done;
+ /* FIXME: Inform the transport */
+
+ spin_lock_irq(sdev->request_queue->queue_lock);
+ if (sdev->sdev_state != SDEV_QUIESCE) {
+ status = -ENODEV;
+ } else {
+ sdev->is_suspended = 0;
+ sdev->pm_in_progress = 1;
+ }
+ spin_unlock_irq(sdev->request_queue->queue_lock);
+
+ if (status == 0 && drv && drv->resume && sdev->use_ULD_pm)
+ status = drv->resume(&sdev->sdev_gendev);
+
+ /* Unfortunate duplication of code in scsi_device_resume()... */
+ spin_lock_irq(sdev->request_queue->queue_lock);
+ sdev->pm_in_progress = 0;
+ if (status == 0)
+ status = scsi_device_set_state(sdev, SDEV_RUNNING);
+ spin_unlock_irq(sdev->request_queue->queue_lock);
+
+ if (status == 0) {
+ sdev_dbg(sdev, "resumed\n");
+ scsi_run_queue(sdev->request_queue);
+ } else {
+ /* Propagate resume failure to the host and the transport */
+ /* FIXME: inform the transport */
+ scsi_autosuspend_host(sdev->host);
+ }
+ done:
+// if (status == 0)
+// sdev->sdev_gendev.power.power_state.event = PM_EVENT_ON;
+ return status;
+}
+
+/* callback routine to autoresume a SCSI device */
+static void autoresume_sdev_work(struct work_struct *work)
+{
+ struct scsi_device *sdev =
+ container_of(work, struct scsi_device, autoresume_work);
+
+ printk("autoresume_sdev_work\n");
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 1;
+ printk("autoresume_sdev_work: have mutex\n");
+ scsi_resume_sdev(sdev);
+ printk("autoresume_sdev_work: resume_sdev done\n");
+ mutex_unlock(&sdev->pm_mutex);
+ printk("autoresume_sdev_work: may be suspendable\n");
+ device_may_be_suspendable(sdev);
+ printk("autoresume_sdev_work: done\n");
+}
+
+/**
+ * scsi_autosuspend_device - autosuspend a SCSI device
+ * @sdev: the scsi_device to autosuspend
+ *
+ * This routine should be called when a core subsystem is finished using
+ * @sdev and wants to allow it to autosuspend. @sdev's usage counter
+ * is decremented. If the result is non-positive and the device is in the
+ * proper state, it will be suspended.
+ *
+ * This routine can run only in process context.
+ */
+void scsi_autosuspend_device(struct scsi_device *sdev)
+{
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 1;
+ --sdev->pm_usage_cnt;
+ WARN_ON(sdev->pm_usage_cnt < 0);
+ if (sdev->pm_usage_cnt <= 0) {
+ if (scsi_suspend_sdev(sdev, PMSG_SUSPEND) != 0)
+ device_may_be_suspendable(sdev);
+ }
+ mutex_unlock(&sdev->pm_mutex);
+}
+EXPORT_SYMBOL_GPL(scsi_autosuspend_device);
+
+/**
+ * scsi_autoresume_device - autoresume a SCSI device
+ * @sdev: the scsi_device to autoresume
+ *
+ * This routine should be called when a core subsystem wants to use @sdev
+ * and needs to guarantee that it is not suspended. No autosuspend will
+ * occur until scsi_autosuspend_device is called. (Note that this will not
+ * prevent suspend events originating in the PM core.)
+ *
+ * @sdev's usage counter is incremented to prevent subsequent autosuspends.
+ * If @sdev was suspended, an autoresume is attempted. If the autoresume
+ * fails, the usage counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ */
+int scsi_autoresume_device(struct scsi_device *sdev)
+{
+ int status = 0;
+
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 1;
+ ++sdev->pm_usage_cnt;
+ sdev->last_busy = jiffies;
+ if (sdev->is_suspended) {
+ status = scsi_resume_sdev(sdev);
+ if (status != 0)
+ --sdev->pm_usage_cnt;
+ }
+ mutex_unlock(&sdev->pm_mutex);
+ return status;
+}
+EXPORT_SYMBOL_GPL(scsi_autoresume_device);
+
+/**
+ * scsi_try_autosuspend_device - attempt an autosuspend of a SCSI device
+ * @sdev: the scsi_device to autosuspend
+ *
+ * This routine should be called when a core subsystem thinks @sdev may
+ * be ready to autosuspend. @sdev's usage counter is left unchanged.
+ * If it is greater than 0 or autosuspend is not allowed for any other
+ * reason, nothing will happen. Otherwise @sdev will be suspended.
+ *
+ * This routine can run only in process context.
+ */
+static void scsi_try_autosuspend_device(struct scsi_device *sdev)
+{
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 1;
+ if (sdev->pm_usage_cnt <= 0)
+ scsi_suspend_sdev(sdev, PMSG_SUSPEND);
+ mutex_unlock(&sdev->pm_mutex);
+}
+
+/**
+ * scsi_external_suspend_device - external suspend of a SCSI device
+ * @sdev: the scsi_device to suspend
+ * @msg: Power Management message describing this state transition
+ *
+ * This routine handles external suspend requests: ones not generated
+ * internally by a SCSI driver (autosuspend) but rather coming from the user
+ * (via sysfs) or the PM core (system sleep). The suspend will be carried
+ * out regardless of @sdev's usage counter. Of course, the Upper-Level
+ * Driver still has the option of failing the suspend.
+ *
+ * The caller must hold @sdev's device lock.
+ */
+static int scsi_external_suspend_device(struct scsi_device *sdev,
+ pm_message_t msg)
+{
+ int status;
+
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 0;
+ status = scsi_suspend_sdev(sdev, msg);
+ mutex_unlock(&sdev->pm_mutex);
+ return status;
+}
+
+/**
+ * scsi_external_resume_device - external resume of a SCSI device
+ * @sdev: the scsi_device to resume
+ *
+ * This routine handles external resume requests: ones not generated
+ * internally by a SCSI driver (autoresume) but rather coming from the user
+ * (via sysfs) or the PM core (system resume). @sdev's usage counter is
+ * unaffected.
+ *
+ * The caller must hold @sdev's device lock.
+ */
+static int scsi_external_resume_device(struct scsi_device *sdev)
+{
+ int status;
+
+ mutex_lock(&sdev->pm_mutex);
+ sdev->auto_pm = 0;
+ status = scsi_resume_sdev(sdev);
+ sdev->last_busy = jiffies;
+ mutex_unlock(&sdev->pm_mutex);
+
+ /* Now that the device is awake, we can start trying to autosuspend
+ * it again. */
+ device_may_be_suspendable(sdev);
+ return status;
+}
+
+/* Bus method invoked by the PM core for system sleep */
+int scsi_bus_suspend(struct device *dev, pm_message_t message)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int ret;
+
+ printk("sleepy: scsi_bus_suspend start\n");
+
+ /* If sdev is already suspended, we can skip this suspend and
+ * we may also want to skip the upcoming system resume. */
+ if (sdev->is_suspended) {
+ sdev->skip_sys_resume = sdev->autoresume_disabled;
+ return 0;
+ }
+
+ sdev->skip_sys_resume = 0;
+ ret = scsi_external_suspend_device(sdev, message);
+ printk("sleepy: scsi_bus_suspend done\n");
+ return ret;
+}
+
+/* Bus method invoked by the PM core for system awakening */
+int scsi_bus_resume(struct device *dev)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int ret;
+
+ printk("sleepy: scsi_bus_resume start\n");
+ if (sdev->skip_sys_resume)
+ return -EHOSTUNREACH;
+ ret = scsi_external_resume_device(sdev);
+ printk("sleepy: scsi_bus_resume done\n");
+ return ret;
+}
+
+/* Subroutine for scsi_prep_state_check(). This handles state checking
+ * when the device is in the SDEV_QUIESCE state.
+ */
+int scsi_pm_state_check(struct scsi_device *sdev, struct request *req)
+{
+ /*
+ * Special commands are allowed through if the device is merely
+ * quiescent. Some are allowed if it is in the process of
+ * suspending or resuming.
+ */
+ if ((req->cmd_flags & REQ_PREEMPT) && !sdev->is_suspended) {
+ if (!sdev->pm_in_progress)
+ return BLKPREP_OK;
+
+ /* Only certain commands are allowed during a transition */
+ if (req->cmd[0] == TEST_UNIT_READY ||
+ req->cmd[0] == START_STOP ||
+ req->cmd[0] == SYNCHRONIZE_CACHE)
+ return BLKPREP_OK;
+ }
+
+ /*
+ * If the device is suspending or suspended and autoresume is
+ * enabled, queue a wakeup request. But if autoresume isn't
+ * enabled then the command fails immediately.
+ */
+ if (sdev->is_suspended || sdev->pm_in_progress) {
+ if (sdev->autoresume_disabled)
+ return BLKPREP_KILL;
+ queue_work(ksuspend_scsi_wq, &sdev->autoresume_work);
+ }
+ return BLKPREP_DEFER;
+}
+
+/* Power-Management-related sysfs device attributes */
+
+static const char power_group[] = "power";
+
+static ssize_t
+show_autosuspend(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+
+ return sprintf(buf, "%d\n", sdev->autosuspend_delay / HZ);
+}
+
+static ssize_t
+set_autosuspend(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int value;
+
+ if (sscanf(buf, "%d", &value) != 1 || value >= INT_MAX/HZ ||
+ value <= - INT_MAX/HZ)
+ return -EINVAL;
+ value *= HZ;
+
+ sdev->autosuspend_delay = value;
+ if (value >= 0) {
+ scsi_try_autosuspend_device(sdev);
+ device_may_be_suspendable(sdev);
+ } else {
+ if (scsi_autoresume_device(sdev) == 0)
+ scsi_autosuspend_device(sdev);
+ }
+ return count;
+}
+
+static DEVICE_ATTR(autosuspend, S_IRUGO | S_IWUSR,
+ show_autosuspend, set_autosuspend);
+
+static const char on_string[] = "on";
+static const char auto_string[] = "auto";
+static const char suspend_string[] = "suspend";
+
+static ssize_t
+show_level(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+ const char *p = auto_string;
+
+ if (sdev->is_suspended) {
+ if (sdev->autoresume_disabled)
+ p = suspend_string;
+ } else {
+ if (sdev->autosuspend_disabled)
+ p = on_string;
+ }
+ return sprintf(buf, "%s\n", p);
+}
+
+static ssize_t
+set_level(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int len = count;
+ char *cp;
+ int rc = 0;
+ int old_autosuspend_disabled, old_autoresume_disabled;
+
+ cp = memchr(buf, '\n', count);
+ if (cp)
+ len = cp - buf;
+
+ down(&sdev->sdev_gendev.sem);
+ old_autosuspend_disabled = sdev->autosuspend_disabled;
+ old_autoresume_disabled = sdev->autoresume_disabled;
+
+ /* Setting the flags without locking sdev->pm_mutex is subject to
+ * races, but who cares...
+ */
+ if (len == sizeof on_string - 1 &&
+ strncmp(buf, on_string, len) == 0) {
+ sdev->autosuspend_disabled = 1;
+ sdev->autoresume_disabled = 0;
+ rc = scsi_external_resume_device(sdev);
+
+ } else if (len == sizeof auto_string - 1 &&
+ strncmp(buf, auto_string, len) == 0) {
+ sdev->autosuspend_disabled = 0;
+ sdev->autoresume_disabled = 0;
+ rc = scsi_external_resume_device(sdev);
+
+ } else if (len == sizeof suspend_string - 1 &&
+ strncmp(buf, suspend_string, len) == 0) {
+ sdev->autosuspend_disabled = 0;
+ sdev->autoresume_disabled = 1;
+ rc = scsi_external_suspend_device(sdev, PMSG_SUSPEND);
+
+ } else
+ rc = -EINVAL;
+
+ if (rc) {
+ sdev->autosuspend_disabled = old_autosuspend_disabled;
+ sdev->autoresume_disabled = old_autoresume_disabled;
+ }
+ up(&sdev->sdev_gendev.sem);
+ return (rc < 0 ? rc : count);
+}
+
+static DEVICE_ATTR(level, S_IRUGO | S_IWUSR, show_level, set_level);
+
+int scsi_pm_create_device_files(struct scsi_device *sdev)
+{
+ int rc;
+
+ rc = sysfs_add_file_to_group(&sdev->sdev_gendev.kobj,
+ &dev_attr_autosuspend.attr, power_group);
+ if (rc == 0)
+ rc = sysfs_add_file_to_group(&sdev->sdev_gendev.kobj,
+ &dev_attr_level.attr, power_group);
+ return rc;
+}
+
+void scsi_pm_device_initialize(struct scsi_device *sdev)
+{
+ mutex_init(&sdev->pm_mutex);
+ INIT_WORK(&sdev->autoresume_work, autoresume_sdev_work);
+ sdev->autosuspend_delay = SCSI_DEFAULT_AUTOSUSPEND_DELAY;
+ sdev->autosuspend_disabled = 1;
+ sdev->pm_usage_cnt = 1;
+}
+
+int scsi_pm_device_stop(struct scsi_device *sdev)
+{
+ int rc;
+
+ /* Synchronize with any ongoing PM activity */
+ mutex_lock(&sdev->pm_mutex);
+ rc = scsi_device_set_state(sdev, SDEV_CANCEL);
+
+ /* Decrement the host's count of unsuspended children */
+ if (rc == 0 && !sdev->is_suspended)
+ scsi_autosuspend_host(sdev->host);
+ mutex_unlock(&sdev->pm_mutex);
+
+ /* Stop any autoresume requests already submitted */
+ cancel_work_sync(&sdev->autoresume_work);
+ return rc;
+}
+
+/* Create the ksuspend_scsid workqueue thread */
+int __init scsi_init_pm(void)
+{
+ /* This workqueue is supposed to be both freezable and
+ * singlethreaded. Its job doesn't justify running on more
+ * than one CPU.
+ */
+ ksuspend_scsi_wq = create_singlethread_workqueue("ksuspend_scsid");
+ if (!ksuspend_scsi_wq)
+ return -ENOMEM;
+ return 0;
+}
+
+void __exit scsi_exit_pm(void)
+{
+ destroy_workqueue(ksuspend_scsi_wq);
+}
+
+#else /* CONFIG_SCSI_DYNAMIC_PM */
+
+/* Legacy bus suspend method */
+int scsi_bus_suspend(struct device *dev, pm_message_t message)
+{
+ struct device_driver *drv = dev->driver;
+ int err;
+
+ BUG();
+
+ err = scsi_device_quiesce(to_scsi_device(dev));
+ if (!err && drv && drv->suspend)
+ err = drv->suspend(dev, message);
+ return err;
+}
+
+/* Legacy bus resume method */
+int scsi_bus_resume(struct device *dev)
+{
+ struct device_driver *drv = dev->driver;
+ int err = 0;
+
+ BUG();
+
+ if (drv && drv->resume)
+ err = drv->resume(dev);
+ scsi_device_resume(to_scsi_device(dev));
+ return err;
+}
+
+/* Legacy subroutine for scsi_prep_state_check(). This handles state checking
+ * when the device is in the SDEV_QUIESCE state.
+ */
+int scsi_pm_state_check(struct scsi_device *sdev, struct request *req)
+{
+ /*
+ * If the device is blocked we defer normal commands.
+ */
+ if (!(req->cmd_flags & REQ_PREEMPT))
+ return BLKPREP_DEFER;
+ return BLKPREP_OK;
+}
+
+int scsi_pm_device_stop(struct scsi_device *sdev)
+{
+ return scsi_device_set_state(sdev, SDEV_DEL);
+}
+
+int scsi_pm_host_stop(struct Scsi_Host *shost)
+{
+ int rc = 0;
+
+ mutex_lock(&shost->scan_mutex);
+ spin_lock_irq(shost->host_lock);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ rc = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+ spin_unlock_irq(shost->host_lock);
+ mutex_unlock(&shost->scan_mutex);
+ return rc;
+}
+
+#endif /* CONFIG_SCSI_DYNAMIC_PM */
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 84b4879..99de5d1 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -297,6 +297,9 @@ static struct scsi_device *scsi_alloc_sd
scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);

scsi_sysfs_device_initialize(sdev);
+ scsi_pm_device_initialize(sdev);
+ if (scsi_autoresume_host(shost) != 0)
+ goto out_device_destroy;

if (shost->hostt->slave_alloc) {
ret = shost->hostt->slave_alloc(sdev);
@@ -307,6 +310,7 @@ static struct scsi_device *scsi_alloc_sd
*/
if (ret == -ENXIO)
display_failure_msg = 0;
+ scsi_autosuspend_host(shost);
goto out_device_destroy;
}
}
@@ -927,6 +931,7 @@ static int scsi_add_lun(struct scsi_devi
static inline void scsi_destroy_sdev(struct scsi_device *sdev)
{
scsi_device_set_state(sdev, SDEV_DEL);
+ scsi_autosuspend_host(sdev->host);
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(&sdev->sdev_gendev);
@@ -1090,6 +1095,7 @@ static int scsi_probe_and_add_lun(struct

res = scsi_add_lun(sdev, result, &bflags, shost->async_scan);
if (res == SCSI_SCAN_LUN_PRESENT) {
+ scsi_autosuspend_device(sdev);
if (bflags & BLIST_KEY) {
sdev->lockable = 0;
scsi_unlock_floptical(sdev, result);
@@ -1789,6 +1795,8 @@ static void scsi_finish_async_scan(struc

static void do_scsi_scan_host(struct Scsi_Host *shost)
{
+ if (scsi_autoresume_host(shost) != 0)
+ return;
if (shost->hostt->scan_finished) {
unsigned long start = jiffies;
if (shost->hostt->scan_start)
@@ -1800,6 +1808,7 @@ static void do_scsi_scan_host(struct Scs
scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
SCAN_WILD_CARD, 0);
}
+ scsi_autosuspend_host(shost);
}

static int do_scan_async(void *_data)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ab3c718..ac16a39 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -374,74 +374,12 @@ static int scsi_bus_uevent(struct device
return 0;
}

-static int scsi_bus_suspend(struct device * dev, pm_message_t state)
-{
- struct device_driver *drv;
- struct scsi_device *sdev;
- int err;
-
- if (dev->type != &scsi_dev_type)
- return 0;
-
- drv = dev->driver;
- sdev = to_scsi_device(dev);
-
- err = scsi_device_quiesce(sdev);
- if (err)
- return err;
-
- if (drv && drv->suspend) {
- err = drv->suspend(dev, state);
- if (err)
- return err;
- }
-
- return 0;
-}
-
-static int scsi_bus_resume(struct device * dev)
-{
- struct device_driver *drv;
- struct scsi_device *sdev;
- int err = 0;
-
- if (dev->type != &scsi_dev_type)
- return 0;
-
- drv = dev->driver;
- sdev = to_scsi_device(dev);
-
- if (drv && drv->resume)
- err = drv->resume(dev);
-
- scsi_device_resume(sdev);
-
- return err;
-}
-
-static int scsi_bus_remove(struct device *dev)
-{
- struct device_driver *drv = dev->driver;
- struct scsi_device *sdev = to_scsi_device(dev);
- int err = 0;
-
- /* reset the prep_fn back to the default since the
- * driver may have altered it and it's being removed */
- blk_queue_prep_rq(sdev->request_queue, scsi_prep_fn);
-
- if (drv && drv->remove)
- err = drv->remove(dev);
-
- return 0;
-}
-
struct bus_type scsi_bus_type = {
.name = "scsi",
.match = scsi_bus_match,
.uevent = scsi_bus_uevent,
.suspend = scsi_bus_suspend,
.resume = scsi_bus_resume,
- .remove = scsi_bus_remove,
};
EXPORT_SYMBOL_GPL(scsi_bus_type);

@@ -899,6 +837,12 @@ int scsi_sysfs_add_sdev(struct scsi_devi
goto out;
}

+ error = scsi_pm_create_device_files(sdev);
+ if (error) {
+ __scsi_remove_device(sdev);
+ goto out;
+ }
+
error = bsg_register_queue(rq, &sdev->sdev_gendev, NULL, NULL);

if (error)
@@ -939,7 +883,7 @@ void __scsi_remove_device(struct scsi_de
{
struct device *dev = &sdev->sdev_gendev;

- if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+ if (scsi_pm_device_stop(sdev) != 0)
return;

bsg_unregister_queue(sdev->request_queue);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e5e7d78..65dd73c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -61,6 +61,7 @@ #include <scsi/scsicam.h>

#include "sd.h"
#include "scsi_logging.h"
+#include "scsi_priv.h"

MODULE_AUTHOR("Eric Youngdale");
MODULE_DESCRIPTION("SCSI disk (sd) driver");
@@ -1815,6 +1816,10 @@ static int sd_probe(struct device *dev)
if (error)
goto out_put;

+ error = scsi_autoresume_device(sdp);
+ if (error)
+ goto out_put;
+
error = -EBUSY;
if (index >= SD_MAX_DISKS)
goto out_free_index;
@@ -1880,8 +1885,12 @@ static int sd_probe(struct device *dev)
sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
sdp->removable ? "removable " : "");

+ scsi_use_ULD_pm(sdp, 1);
+ scsi_autosuspend_device(sdp);
return 0;

+ out_suspend:
+ scsi_autosuspend_device(sdp);
out_free_index:
ida_remove(&sd_index_ida, index);
out_put:
@@ -1909,6 +1918,7 @@ static int sd_remove(struct device *dev)

device_del(&sdkp->dev);
del_gendisk(sdkp->disk);
+ scsi_use_ULD_pm(sdkp->device, 0);
sd_shutdown(dev);

mutex_lock(&sd_ref_mutex);
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3d36270..bd53ae3 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -58,6 +58,7 @@ #include <scsi/scsi_driver.h>
#include <scsi/scsi_ioctl.h>
#include <scsi/sg.h>

+#include "scsi_priv.h"
#include "scsi_logging.h"

#ifdef CONFIG_SCSI_PROC_FS
@@ -227,6 +228,7 @@ sg_open(struct inode *inode, struct file
Sg_fd *sfp;
int res;
int retval;
+ int autoresume_rc = 1;

lock_kernel();
nonseekable_open(inode, filp);
@@ -249,6 +251,10 @@ sg_open(struct inode *inode, struct file
return retval;
}

+ retval = autoresume_rc = scsi_autoresume_device(sdp->device);
+ if (retval)
+ goto error_out;
+
if (!((flags & O_NONBLOCK) ||
scsi_block_when_processing_errors(sdp->device))) {
retval = -ENXIO;
@@ -307,6 +313,8 @@ sg_open(struct inode *inode, struct file
return 0;

error_out:
+ if (autoresume_rc == 0)
+ scsi_autosuspend_device(sdp->device);
scsi_device_put(sdp->device);
unlock_kernel();
return retval;
@@ -323,6 +331,8 @@ sg_release(struct inode *inode, struct f
return -ENXIO;
SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
sg_fasync(-1, filp, 0); /* remove filp from async notification list */
+ scsi_autosuspend_device(sdp->device);
+
if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns 1 when sdp gone */
if (!sdp->detached) {
scsi_device_put(sdp->device);
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 09779f6..ef64fd8 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -304,10 +304,15 @@ static int device_reset(struct scsi_cmnd

US_DEBUGP("%s called\n", __func__);

- /* lock the device pointers and do the reset */
- mutex_lock(&(us->dev_mutex));
- result = us->transport_reset(us);
- mutex_unlock(&us->dev_mutex);
+ result = usb_autopm_get_interface(us->pusb_intf);
+ if (result == 0) {
+
+ /* lock the device pointers and do the reset */
+ mutex_lock(&(us->dev_mutex));
+ result = us->transport_reset(us);
+ mutex_unlock(&us->dev_mutex);
+ usb_autopm_put_interface(us->pusb_intf);
+ }

return result < 0 ? FAILED : SUCCESS;
}
@@ -350,6 +355,24 @@ void usb_stor_report_bus_reset(struct us
scsi_unlock(host);
}

+/* The host and its devices are all idle so we can autosuspend */
+static int autosuspend(struct Scsi_Host *host)
+{
+ struct us_data *us = host_to_us(host);
+
+ usb_autopm_put_interface(us->pusb_intf);
+ return 0;
+}
+
+/* The host needs to be autoresumed */
+static int autoresume(struct Scsi_Host *host)
+{
+ struct us_data *us = host_to_us(host);
+
+ return usb_autopm_get_interface(us->pusb_intf);
+}
+
+
/***********************************************************************
* /proc/scsi/ functions
***********************************************************************/
@@ -476,6 +499,10 @@ struct scsi_host_template usb_stor_host_
.eh_device_reset_handler = device_reset,
.eh_bus_reset_handler = bus_reset,

+ /* dynamic power management */
+ .autosuspend = autosuspend,
+ .autoresume = autoresume,
+
/* queue commands only, only one command per LUN */
.can_queue = 1,
.cmd_per_lun = 1,
diff --git a/drivers/usb/storage/usb.c b/drivers/usb/storage/usb.c
index bfea851..65c86ac 100644
--- a/drivers/usb/storage/usb.c
+++ b/drivers/usb/storage/usb.c
@@ -185,6 +185,8 @@ static int storage_suspend(struct usb_in
{
struct us_data *us = usb_get_intfdata(iface);

+ US_DEBUGP("%s\n", __FUNCTION__);
+
/* Wait until no command is running */
mutex_lock(&us->dev_mutex);

@@ -192,9 +194,6 @@ static int storage_suspend(struct usb_in
if (us->suspend_resume_hook)
(us->suspend_resume_hook)(us, US_SUSPEND);

- /* When runtime PM is working, we'll set a flag to indicate
- * whether we should autoresume when a SCSI request arrives. */
-
mutex_unlock(&us->dev_mutex);
return 0;
}
@@ -209,7 +208,6 @@ static int storage_resume(struct usb_int
if (us->suspend_resume_hook)
(us->suspend_resume_hook)(us, US_RESUME);

- mutex_unlock(&us->dev_mutex);
return 0;
}

@@ -930,6 +928,7 @@ static int usb_stor_scan_thread(void * _
/* Should we unbind if no devices were detected? */
}

+ usb_autopm_put_interface(us->pusb_intf);
complete_and_exit(&us->scanning_done, 0);
}

@@ -959,6 +958,9 @@ static int storage_probe(struct usb_inte
return -ENOMEM;
}

+ /* Don't autosuspend until the SCSI core tells us */
+ usb_autopm_get_interface(intf);
+
/*
* Allow 16-byte CDBs and thus > 2TB
*/
@@ -1020,6 +1022,7 @@ static int storage_probe(struct usb_inte
goto BadDevice;
}

+ usb_autopm_get_interface(intf); /* dropped in the scanning thread */
wake_up_process(th);

return 0;
@@ -1057,6 +1060,7 @@ #endif
.pre_reset = storage_pre_reset,
.post_reset = storage_post_reset,
.id_table = storage_usb_ids,
+ .supports_autosuspend = 1,
.soft_unbind = 1,
};

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index c7f0629..eba766d 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -57,6 +57,18 @@ config SCSI_PROC_FS

If unsure say Y.

+config SCSI_DYNAMIC_PM
+ bool "SCSI dynamic Power Management support (EXPERIMENTAL)"
+ depends on SCSI && PM && EXPERIMENTAL
+ ---help---
+ This option enables support for dynamic (or runtime)
+ power management of SCSI devices and host adapters.
+ If you say Y here, you can use the sysfs "power/level"
+ and "power/autosuspend" files to control manual or
+ automatic suspend/resume of individual SCSI devices.
+
+ If unsure say N.
+
comment "SCSI support type (disk, tape, CD-ROM)"
depends on SCSI


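For readers following the usage-counter rules in the kernel-doc comments of the patch, here is a minimal user-space sketch of them. It is not part of the patch. The struct and the sketch_* names are hypothetical stand-ins, the resume_fails field exists only to exercise the error path, and the body of sketch_autosuspend() is an assumption about what scsi_autosuspend_device() does (drop one reference, then try to suspend). Locking and the autosuspend delay are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PM fields the patch adds to struct scsi_device. */
struct sketch_sdev {
	int pm_usage_cnt;	/* a value > 0 blocks autosuspend */
	bool is_suspended;
	bool resume_fails;	/* illustration only: force the resume to fail */
};

/* Follows scsi_autoresume_device(): take a reference, resume if needed. */
static int sketch_autoresume(struct sketch_sdev *sdev)
{
	++sdev->pm_usage_cnt;
	if (sdev->is_suspended) {
		if (sdev->resume_fails) {
			--sdev->pm_usage_cnt;	/* undo the increment on failure */
			return -1;
		}
		sdev->is_suspended = false;
	}
	return 0;
}

/* Follows scsi_try_autosuspend_device(): counter unchanged, suspend if unused. */
static void sketch_try_autosuspend(struct sketch_sdev *sdev)
{
	if (sdev->pm_usage_cnt <= 0)
		sdev->is_suspended = true;
}

/* Assumed behaviour of scsi_autosuspend_device(): drop a reference, then try. */
static void sketch_autosuspend(struct sketch_sdev *sdev)
{
	assert(sdev->pm_usage_cnt > 0);
	--sdev->pm_usage_cnt;
	sketch_try_autosuspend(sdev);
}
```

Every successful sketch_autoresume() has to be paired with one sketch_autosuspend(), which is the pattern the sd and sg hunks follow around probe and open/release.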

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2008-08-13 14:31:19

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Pavel Machek wrote:

> From: Alan Stern <[email protected]>
>
> Add support for autosuspend/autoresume. Lowlevel driver can use it to
> spin the disk down and power down its SATA link, to turn off the USB
> interface, etc.
>
> Spinning down the disk is useful - saves ~0.5W here. Powering down
> SATA controller is even better -- should save ~1W.
>
> Now, I guess the patch will need to be split to small pieces for
> merge... I tried to rearrange it so that the documentation and hooks
> go before stuff that needs the hooks, and before Kconfig enabler. If
> it looks reasonably good, I'll split it into smaller pieces.

James had a number of objections to my original patch; you can read
them here:

https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html

I haven't had time yet to work on an improved version.

Alan Stern

2008-08-13 14:46:51

by Oliver Neukum

Subject: Re: Power management for SCSI

On Wednesday, 13 August 2008 16:31:03, Alan Stern wrote:
> On Wed, 13 Aug 2008, Pavel Machek wrote:
>
> > From: Alan Stern <[email protected]>
> >
> > Add support for autosuspend/autoresume. Lowlevel driver can use it to
> > spin the disk down and power down its SATA link, to turn off the USB
> > interface, etc.
> >
> > Spinning down the disk is useful - saves ~0.5W here. Powering down
> > SATA controller is even better -- should save ~1W.
> >
> > Now, I guess the patch will need to be split to small pieces for
> > merge... I tried to rearrange it so that the documentation and hooks
> > go before stuff that needs the hooks, and before Kconfig enabler. If
> > it looks reasonably good, I'll split it into smaller pieces.
>
> James had a number of objections to my original patch; you can read
> them here:
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html

Very well. I see a basic problem here. For USB it is necessary that child
devices be suspended before anything higher up in the tree is suspended.
SATA seems to be able to power down a link while the device is not suspended.

In fact, true SCSI busses can be shared. So are we using the correct
approach?

Regards
Oliver

2008-08-13 14:59:34

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Oliver Neukum wrote:

> Very well. I see a basic problem here. For USB it is necessary that child
> devices be suspended before anything higher up in the tree is suspended.
> SATA seems to be able to power down a link while the device is not suspended.

Is the USB transport unique in its requirement that all the child
devices must be suspended before the link can be powered down? Maybe
that requirement should be made an explicit property of the transport
or the transport class.

> In fact in true SCSI busses can be shared. So are we using the correct
> approach?

This is a good question. Most USB mass-storage devices do not act as a
true SCSI bus, but I believe there are a few non-standard ones that do
-- the USB device really contains a SCSI host and arbitrary SCSI
targets can be attached to it. For the moment, we should be safe
enough using a model in which there are no other initiators on a
USB-type SCSI transport, but it's something to keep in mind.

Alan Stern

2008-08-13 15:20:49

by Oliver Neukum

Subject: Re: Power management for SCSI

On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
> On Wed, 13 Aug 2008, Oliver Neukum wrote:
>
> > Very well. I see a basic problem here. For USB it is necessary that child
> > devices be suspended before anything higher up in the tree is suspended.
> > SATA seems to be able to power down a link while the device is not suspended.
>
> Is the USB transport unique in its requirement that all the child
> devices must be suspended before the link can be powered down? Maybe

All children that are USB must be powered down. We know in fact that most
drives don't care that the device is suspended. The problem was drive
enclosures that cut power upon suspension losing cached data.

> that requirement should be made an explicit property of the transport
> or the transport class.
>
> > In fact in true SCSI busses can be shared. So are we using the correct
> > approach?
>
> This is a good question. Most USB mass-storage devices do not act as a
> true SCSI bus, but I believe there are a few non-standard ones that do
> -- the USB device really contains a SCSI host and arbitrary SCSI
> targets can be attached to it. For the moment, we should be safe
> enough using a model in which there are no other initiators on a
> USB-type SCSI transport, but it's something to keep in mind.

So do we really want to do autosuspend on the device level? Or do we work
on hosts and just use the suspend()/resume() support of the sd, sr, ... etc?

Regards
Oliver

2008-08-13 15:23:58

by Oliver Neukum

Subject: Re: Power management for SCSI

On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
> This is a good question. Most USB mass-storage devices do not act as a
> true SCSI bus, but I believe there are a few non-standard ones that do
> -- the USB device really contains a SCSI host and arbitrary SCSI

OK, but does it make sense to have SCSI autosuspend? Or should autosuspend
operate on the bus the _host_ is connected to (usb, pci, ...)?

Regards
Oliver

2008-08-13 15:44:55

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Oliver Neukum wrote:

> On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
> > On Wed, 13 Aug 2008, Oliver Neukum wrote:
> >
> > > Very well. I see a basic problem here. For USB it is necessary that child
> > > devices be suspended before anything higher up in the tree is suspended.
> > > SATA seems to be able to power down a link while the device is not suspended.
> >
> > Is the USB transport unique in its requirement that all the child
> > devices must be suspended before the link can be powered down? Maybe
>
> All children that are USB must be powered down. We know in fact that most
> drives don't care that the device is suspended. The problem was drive
> enclosures that cut power upon suspension losing cached data.

You misunderstood my question. Are there SCSI transports other than
USB sharing the requirement that all child devices must be suspended
before the link can be powered down?

> > > In fact in true SCSI busses can be shared. So are we using the correct
> > > approach?
> >
> > This is a good question. Most USB mass-storage devices do not act as a
> > true SCSI bus, but I believe there are a few non-standard ones that do
> > -- the USB device really contains a SCSI host and arbitrary SCSI
> > targets can be attached to it. For the moment, we should be safe
> > enough using a model in which there are no other initiators on a
> > USB-type SCSI transport, but it's something to keep in mind.
>
> So do we really want to do autosuspend on the device level? Or do we work
> on hosts and just use the suspend()/resume() support of the sd, sr, ... etc?

For transports which are like USB, we should do autosuspend at the
target (not device) level. This means invoking the suspend/resume
routines of the ULDs like sd and sr. The transport gets notified when
all of the targets are suspended. (Or maybe the host driver gets
notified instead; there probably isn't any advantage to using the
transport class here.)

For other transports, we should only do idle-timeout detection. The
transport gets notified when any target has been idle for sufficiently
long, so that it can power down the link. The ULDs are not involved.
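To make the two cases concrete, here is a rough user-space sketch of the decision. It is only an illustration: the sketch_* names, the policy enum and the idle_for field are hypothetical, and it assumes the link may go down only once every target qualifies, which is one reading of the idle-timeout case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_policy {
	SKETCH_SUSPEND_CHILDREN_FIRST,	/* USB-like transports */
	SKETCH_IDLE_TIMEOUT_ONLY,	/* other transports: ULDs not involved */
};

struct sketch_target {
	bool suspended;		/* ULD suspend has run for each of its LUNs */
	unsigned long idle_for;	/* time since the last command, arbitrary units */
};

/* Returns true when the host driver may be told to power down the link. */
static bool sketch_link_may_power_down(enum sketch_policy policy,
				       const struct sketch_target *targets,
				       size_t n, unsigned long idle_timeout)
{
	assert(targets != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (policy == SKETCH_SUSPEND_CHILDREN_FIRST) {
			/* Every target must already be suspended. */
			if (!targets[i].suspended)
				return false;
		} else if (targets[i].idle_for < idle_timeout) {
			/* Idle detection only: nothing gets suspended. */
			return false;
		}
	}
	return true;
}
```

In the first case the host is notified as a consequence of the children suspending; in the second the targets stay in their current state and only the link changes power level.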

Does that sound okay?

Alan Stern

2008-08-13 15:45:28

by Stefan Richter

Subject: Re: Power management for SCSI

Oliver Neukum wrote:
> On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
[Quoting Oliver: true SCSI busses can be shared. So are we using the
correct approach?]
>> This is a good question. Most USB mass-storage devices do not act as a
>> true SCSI bus, but I believe there are a few non-standard ones that do
>> -- the USB device really contains a SCSI host and arbitrary SCSI
>
> OK, but does it make sense to have SCSI autosuspend? Or should autosuspend
> operate on the bus the _host_ is connected to (usb, pci, ...)?

In Alan's patch, SCSI calls scsi_host_template methods (if the LLD
provides them) to suspend and resume a Scsi_Host. The LLD can use them
to work with the underlying infrastructure to determine what can be done
at that time. I.e. are there other protocols or other initiator-like
nodes sharing the link? If yes or if "maybe yes", the infrastructure
keeps the link up. If not, it can move it into a low-power state.
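As a rough illustration of that split, here is a user-space sketch. The sketch_* types and the link_shared flag are hypothetical and only loosely modelled on the optional autosuspend/autoresume template methods; how a real LLD would detect other initiators is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_host;

/* Hypothetical analogue of the optional scsi_host_template methods. */
struct sketch_host_template {
	int (*autosuspend)(struct sketch_host *h);
	int (*autoresume)(struct sketch_host *h);
};

struct sketch_host {
	const struct sketch_host_template *hostt;
	bool link_shared;	/* other initiators or protocols may use the link */
	bool link_low_power;
};

/* LLD side: keep the link up if it is, or might be, shared. */
static int sketch_lld_autosuspend(struct sketch_host *h)
{
	if (!h->link_shared)
		h->link_low_power = true;
	return 0;
}

static int sketch_lld_autoresume(struct sketch_host *h)
{
	h->link_low_power = false;
	return 0;
}

/* Midlayer side: call a method only if the LLD provides it. */
static int sketch_host_autosuspend(struct sketch_host *h)
{
	assert(h != NULL && h->hostt != NULL);
	return h->hostt->autosuspend ? h->hostt->autosuspend(h) : 0;
}

static int sketch_host_autoresume(struct sketch_host *h)
{
	assert(h != NULL && h->hostt != NULL);
	return h->hostt->autoresume ? h->hostt->autoresume(h) : 0;
}
```

The midlayer never needs to know why the link stayed up; it only sees a return code from the template method.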
--
Stefan Richter
-=====-==--- =--- -==-=
http://arcgraph.de/sr/

2008-08-13 15:46:43

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Oliver Neukum wrote:

> On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
> > This is a good question. Most USB mass-storage devices do not act as a
> > true SCSI bus, but I believe there are a few non-standard ones that do
> > -- the USB device really contains a SCSI host and arbitrary SCSI
>
> OK, but does it make sense to have SCSI autosuspend? Or should autosuspend
> operate on the bus the _host_ is connected to (usb, pci, ...)?

That's the situation we're in now. Autosuspend operates on the USB
bus, but it can't do anything with usb-storage because the child SCSI
devices don't do a SCSI autosuspend.

Alan Stern

2008-08-13 16:15:52

by Stefan Richter

Subject: Re: Power management for SCSI

Alan Stern wrote:
> On Wed, 13 Aug 2008, Oliver Neukum wrote:
>> On Wednesday, 13 August 2008 16:59:23, Alan Stern wrote:
>> > Is the USB transport unique in its requirement that all the child
>> > devices must be suspended before the link can be powered down? Maybe
>>
>> All children that are USB must be powered down. We know in fact that most
>> drives don't care that the device is suspended. The problem was drive
>> enclosures that cut power upon suspension losing cached data.
>
> You misunderstood my question. Are there SCSI transports other than
> USB sharing the requirement that all child devices must be suspended
> before the link can be powered down?

Yes in case of FireWire; it's necessary there too (but not sufficient).

(It's a bad example though since I have no good idea whether power
management beyond (a) system suspend and (b) disk spindown is feasible
in reality at all.)

[...]
>> So do we really want to do autosuspend on the device level? Or do we work
>> on hosts and just use the suspend()/resume() support of the sd, sr, ... etc?
>
> For transports which are like USB, we should do autosuspend at the
> target (not device) level. This means invoking the suspend/resume
> routines of the ULDs like sd and sr. The transport gets notified when
> all of the targets are suspended. (Or maybe the host driver gets
> notified instead; there probably isn't any advantage to using the
> transport class here.)
>
> For other transports, we should only do idle-timeout detection. The
> transport gets notified when any target has been idle for sufficiently
> long, so that it can power down the link. The ULDs are not involved.
>
> Does that sound okay?

Minor correction: The ULD suspend/resume methods necessarily work on
logical units, not targets.
--
Stefan Richter
-=====-==--- =--- -==-=
http://arcgraph.de/sr/

2008-08-13 16:20:39

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

On Wednesday, 13 August 2008 17:44:46, Alan Stern wrote:
> > All children that are USB must be powered down. We know in fact that most
> > drives don't care that the device is suspended. The problem was drive
> > enclosures that cut power upon suspension losing cached data.
>
> You misunderstood my question. Are there SCSI transports other than
> USB sharing the requirement that all child devices must be suspended
> before the link can be powered down?

I dispute that USB in general has this property. Some storage devices
need their caches flushed. USB itself is perfectly happy with autosuspending
the storage device (host) without telling the disks (devices).

You could even argue that these storage devices violate the USB spec.

Regards
Oliver

2008-08-13 16:23:57

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Stefan Richter wrote:

> > For transports which are like USB, we should do autosuspend at the
> > target (not device) level. This means invoking the suspend/resume
> > routines of the ULDs like sd and sr. The transport gets notified when
> > all of the targets are suspended. (Or maybe the host driver gets
> > notified instead; there probably isn't any advantage to using the
> > transport class here.)
> >
> > For other transports, we should only do idle-timeout detection. The
> > transport gets notified when any target has been idle for sufficiently
> > long, so that it can power down the link. The ULDs are not involved.
> >
> > Does that sound okay?
>
> Minor correction: The ULD suspend/resume methods necessarily work on
> logical units, not targets.

Yes; I should have said that the suspend/resume methods of the ULD for
each of the target's LUNs get invoked.

Alan Stern
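
A minimal user-space sketch of the per-LUN behaviour described above. All names here (model_lun, model_target, model_target_suspend and the two stub ULD methods) are hypothetical and do not exist in the kernel; the only assumption is that a target counts as suspended once the ULD suspend method has succeeded for each of its logical units.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LUNS 4

/* Hypothetical model of a logical unit; not the kernel's struct scsi_device. */
struct model_lun {
	int suspended;                              /* set once the ULD suspend succeeded */
	int (*uld_suspend)(struct model_lun *lun);  /* e.g. sd: flush cache, spin down */
};

/* Hypothetical model of a target holding several logical units. */
struct model_target {
	struct model_lun luns[MODEL_MAX_LUNS];
	size_t nr_luns;
};

/*
 * Invoke the ULD suspend method for each LUN of the target.
 * Returns 0 only if every LUN ended up suspended. LUNs that were
 * suspended before a failure stay suspended (a simplification).
 */
int model_target_suspend(struct model_target *t)
{
	size_t i;

	assert(t->nr_luns <= MODEL_MAX_LUNS);
	for (i = 0; i < t->nr_luns; i++) {
		struct model_lun *lun = &t->luns[i];

		if (lun->suspended)
			continue;
		if (lun->uld_suspend && lun->uld_suspend(lun) != 0)
			return -1;
		lun->suspended = 1;
	}
	return 0;
}

/* Stub ULD methods used only to exercise the sketch. */
int model_uld_ok(struct model_lun *lun)   { (void)lun; return 0; }
int model_uld_busy(struct model_lun *lun) { (void)lun; return -1; }
```

Only after model_target_suspend() has returned 0 for every target would the transport (or host driver) be notified that the link may go down.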

2008-08-13 16:24:34

by Oliver Neukum

Subject: Re: Power management for SCSI

Am Mittwoch 13 August 2008 17:44:00 schrieb Stefan Richter:
> Oliver Neukum wrote:
> > Am Mittwoch 13 August 2008 16:59:23 schrieb Alan Stern:
> [Quoting Oliver: true SCSI busses can be shared. So are we using the
> correct approach?]
> >> This is a good question. Most USB mass-storage devices do not act as a
> >> true SCSI bus, but I believe there are a few non-standard ones that do
> >> -- the USB device really contains a SCSI host and arbitrary SCSI
> >
> > OK, but does it make sense to have SCSI autosuspend? Or should autosuspend
> > operate on the bus the _host_ is connected to (usb, pci, ...)?
>
> In Alan's patch, SCSI calls scsi_host_template methods (if the LLD
> provides ones) to suspend and resume a Scsi_Host. The LLD can use them
> to work with the underlying infrastructure to determine what can be done
> at that time. I.e. are there other protocols or other initiator-like
> nodes sharing the link? If yes or if "maybe yes", the infrastructure
> keeps the link up. If not, it can move it into a low-power state.

That is a peculiar way of viewing it. Alan's patch introduces runtime
pm attributes to the devices. Quoting:


+/**
+ * scsi_suspend_sdev - suspend a SCSI device
+ * @sdev: the scsi_device to suspend
+ * @msg: Power Management message describing this state transition
+ *
+ * SCSI devices can't actually be suspended in a literal sense,
+ * because SCSI doesn't have any notion of power management. Instead
+ * this routine drains the request queue and calls the ULD's suspend
+ * method to flush caches, spin-down drives, and so on.
+ *
+ * If the suspend succeeds, we call scsi_autosuspend_host to decrement
+ * the host's count of unsuspended devices and invoke the LLD's suspend
+ * method.

So you cannot operate on the link independent from the devices.

Regards
Oliver
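
The coupling in the quoted comment can be modelled in a few lines of user-space C. This is a sketch only: model_host and both functions are hypothetical names, the counter loosely mirrors the "count of unsuspended devices" from the comment, and undoing the count on a failed resume is an assumption rather than something the patch is known to do.

```c
#include <assert.h>

/* Hypothetical stand-in for the host-side bookkeeping. */
struct model_host {
	int pm_usage_cnt;                          /* devices that are still awake */
	int is_suspended;
	int (*autosuspend)(struct model_host *h);  /* optional LLD hook */
	int (*autoresume)(struct model_host *h);   /* optional LLD hook */
};

/* Called after one device below the host has been suspended. */
void model_device_suspended(struct model_host *h)
{
	assert(h->pm_usage_cnt > 0);
	if (--h->pm_usage_cnt > 0 || h->is_suspended)
		return;
	/* A missing hook is treated as "nothing to do", i.e. success. */
	if (!h->autosuspend || h->autosuspend(h) == 0)
		h->is_suspended = 1;
}

/* Called before a device below the host is resumed. */
int model_device_resuming(struct model_host *h)
{
	int status = 0;

	++h->pm_usage_cnt;
	if (h->is_suspended) {
		if (h->autoresume)
			status = h->autoresume(h);
		if (status == 0)
			h->is_suspended = 0;
		else
			--h->pm_usage_cnt;  /* assumption: undo the count on failure */
	}
	return status;
}
```

In this model the host can only go down as a side effect of the last device suspending, which is the point being made: the link is not operated on independently of the devices.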

2008-08-13 19:34:40

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Wed, 13 Aug 2008, Oliver Neukum wrote:

> Am Mittwoch 13 August 2008 17:44:46 schrieb Alan Stern:
> > > All children that are USB must be powered down. We know in fact that most
> > > drives don't care that the device is suspended. The problem was drive
> > > enclosures that cut power upon suspension losing cached data.
> >
> > You misunderstood my question. Are there SCSI transports other than
> > USB sharing the requirement that all child devices must be suspended
> > before the link can be powered down?
>
> I dispute that USB in general has this property.

How can you dispute that? You said it yourself, in the top quote
above: "All children that are USB must be powered down."

> Some storage devices
> need their caches flushed. USB itself is perfectly happy with autosuspending
> the storage device (host) without telling the disks (devices)
>
> You could even argue that these storage devices violate the USB spec.

Oliver, you can't have it both ways. Either we do spin down disks and
drain device caches before autosuspending usb-storage or we don't.
For safety's sake, obviously we should. The overhead is minimal since
this happens only after the idle timeout has expired. And for devices
that don't support it (like flash storage), sd skips the spin-down
command anyway.

At any rate, Stefan Richter has answered my original question.
Firewire has essentially the same restrictions as USB.

Alan Stern

2008-08-13 19:37:41

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, Oliver Neukum wrote:

> > In Alan's patch, SCSI calls scsi_host_template methods (if the LLD
> > provides ones) to suspend and resume a Scsi_Host. The LLD can use them
> > to work with the underlying infrastructure to determine what can be done
> > at that time. I.e. are there other protocols or other initiator-like
> > nodes sharing the link? If yes or if "maybe yes", the infrastructure
> > keeps the link up. If not, it can move it into a low-power state.
>
> That is a peculiar way of viewing it. Alan's patch introduces runtime
> pm attributes to the devices. Quoting:
>
>
> +/**
> + * scsi_suspend_sdev - suspend a SCSI device
> + * @sdev: the scsi_device to suspend
> + * @msg: Power Management message describing this state transition
> + *
> + * SCSI devices can't actually be suspended in a literal sense,
> + * because SCSI doesn't have any notion of power management. Instead
> + * this routine drains the request queue and calls the ULD's suspend
> + * method to flush caches, spin-down drives, and so on.
> + *
> + * If the suspend succeeds, we call scsi_autosuspend_host to decrement
> + * the host's count of unsuspended devices and invoke the LLD's suspend
> + * method.
>
> So you cannot operate on the link independent from the devices.

With the original patch, you can't operate on the link independent from
the devices. But with the revised patch (whenever I manage to find
time to write it!), you _will_ be able to.

Alan Stern

2008-08-13 19:42:20

by James Bottomley

Subject: Re: Power management for SCSI

On Wed, 2008-08-13 at 15:37 -0400, Alan Stern wrote:
> On Wed, 13 Aug 2008, Oliver Neukum wrote:
>
> > > In Alan's patch, SCSI calls scsi_host_template methods (if the LLD
> > > provides ones) to suspend and resume a Scsi_Host. The LLD can use them
> > > to work with the underlying infrastructure to determine what can be done
> > > at that time. I.e. are there other protocols or other initiator-like
> > > nodes sharing the link? If yes or if "maybe yes", the infrastructure
> > > keeps the link up. If not, it can move it into a low-power state.
> >
> > > That is a peculiar way of viewing it. Alan's patch introduces runtime
> > pm attributes to the devices. Quoting:
> >
> >
> > +/**
> > + * scsi_suspend_sdev - suspend a SCSI device
> > + * @sdev: the scsi_device to suspend
> > + * @msg: Power Management message describing this state transition
> > + *
> > + * SCSI devices can't actually be suspended in a literal sense,
> > + * because SCSI doesn't have any notion of power management. Instead
> > + * this routine drains the request queue and calls the ULD's suspend
> > + * method to flush caches, spin-down drives, and so on.
> > + *
> > + * If the suspend succeeds, we call scsi_autosuspend_host to decrement
> > + * the host's count of unsuspended devices and invoke the LLD's suspend
> > + * method.
> >
> > So you cannot operate on the link independent from the devices.
>
> With the original patch, you can't operate on the link independent from
> the devices. But with the revised patch (whenever I manage to find
> time to write it!), you _will_ be able to.

That sounds great .. if you link it through the transport class, that
can implement the policy you want (as in power all devices down before
the link for USB, but just power down the link for SAS/SATA).

James
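
A sketch of what such a per-transport policy could look like, in user-space C. Everything here is hypothetical (model_policy, model_transport, model_try_link_down); in particular, requiring every device to be idle before touching the link under either policy is an assumption made for the sketch, not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy selector a transport class might carry. */
enum model_policy {
	MODEL_DEVICES_FIRST,  /* USB-like: suspend every device, then the link */
	MODEL_LINK_ONLY       /* SAS/SATA-like: power down the link only */
};

struct model_dev {
	int idle;       /* idle timeout has expired for this device */
	int suspended;  /* the ULD suspend method has been run */
};

struct model_transport {
	enum model_policy policy;
	int link_down;
};

/* Returns 1 if the link was powered down, 0 if some device is still busy. */
int model_try_link_down(struct model_transport *t,
			struct model_dev *devs, size_t n)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < n; i++)
		if (!devs[i].idle)
			return 0;

	if (t->policy == MODEL_DEVICES_FIRST)
		for (i = 0; i < n; i++)
			devs[i].suspended = 1;  /* stands in for the ULD suspend call */

	t->link_down = 1;
	return 1;
}
```

The device-level and link-level steps are separate branches here, so a transport that only wants the cheap link power-down never pays for a spin-down.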

2008-08-13 20:16:20

by Alan Stern

Subject: Re: Power management for SCSI

On Wed, 13 Aug 2008, James Bottomley wrote:

> > With the original patch, you can't operate on the link independent from
> > the devices. But with the revised patch (whenever I manage to find
> > time to write it!), you _will_ be able to.
>
> That sounds great .. if you link it through the transport class, that
> can implement the policy you want (as in power all devices down before
> the link for USB, but just power down the link for SAS/SATA).

Assuming we have a transport class for USB/Firewire! That's the reason
I proposed adding such a thing.

Alan Stern

2008-08-13 20:24:22

by Leisner, Martin

Subject: RE: [linux-pm] Power management for SCSI

Regarding these scsi suspend patches, there's a general
problem with dropping power to disk devices on a running system.
I discussed it in:

http://www.gossamer-threads.com/lists/linux/kernel/811598

We have a sequence:
a) stop further block requests
b) sync the disk (and sync the cache -- there was talk on and off
for several years about sync not really syncing)
c) drop power on the disk

We tweak the ext3 mount timeout and the /proc/sys/vm settings to put
the computer into laptop mode.

When a read comes along, we reverse the process...(or a forced write
which generally won't happen).

But we need to add patches to the device driver and the block layer to
enable this...it seems useful if there was a more generic way to
handle it...maybe registering a callback to reenable power and a
mechanism to start the poweroff sequence...

We've done this in 2.6.20, I wonder if there's any work along these
lines in recent kernels (I'm going to look at 2.6.2[67]...)

Marty
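
The a)/b)/c) sequence and its reversal on a read can be sketched as a small user-space model. The names (model_disk, model_start_poweroff, model_read, model_power_on_ok) are hypothetical and only illustrate the "registered callback" idea; this is not the 2.6.20 code mentioned above.

```c
#include <assert.h>

enum model_state { MODEL_RUNNING, MODEL_POWERED_OFF };

/* Hypothetical disk with a registered power-on callback. */
struct model_disk {
	enum model_state state;
	int accepting_requests;                 /* 0 while block requests are held off */
	int dirty;                              /* unsynced data (page cache or drive cache) */
	int (*power_on)(struct model_disk *d);  /* registered callback; may be NULL */
};

/* Steps a) to c). Refuses to run unless the disk is currently running. */
int model_start_poweroff(struct model_disk *d)
{
	assert(d);
	if (d->state != MODEL_RUNNING)
		return -1;
	d->accepting_requests = 0;      /* a) stop further block requests */
	d->dirty = 0;                   /* b) sync the disk and its cache */
	d->state = MODEL_POWERED_OFF;   /* c) drop power on the disk */
	return 0;
}

/* A read arrives: reverse the process first if the disk is powered off. */
int model_read(struct model_disk *d)
{
	assert(d);
	if (d->state == MODEL_POWERED_OFF) {
		if (d->power_on && d->power_on(d) != 0)
			return -1;
		d->state = MODEL_RUNNING;
		d->accepting_requests = 1;
	}
	return 0;
}

/* Stub callback used only to exercise the sketch. */
int model_power_on_ok(struct model_disk *d) { (void)d; return 0; }
```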

> -----Original Message-----
> From: [email protected] [mailto:linux-pm-
> [email protected]] On Behalf Of Alan Stern
> Sent: Wednesday, August 13, 2008 3:37 PM
> To: Oliver Neukum
> Cc: Linux-pm mailing list; kernel list; [email protected];
> [email protected]; Pavel Machek; Stefan Richter
> Subject: Re: [linux-pm] Power management for SCSI
>
> On Wed, 13 Aug 2008, Oliver Neukum wrote:
>
> > > In Alan's patch, SCSI calls scsi_host_template methods (if the LLD
> > > provides ones) to suspend and resume a Scsi_Host. The LLD can use them
> > > to work with the underlying infrastructure to determine what can be done
> > > at that time. I.e. are there other protocols or other initiator-like
> > > nodes sharing the link? If yes or if "maybe yes", the infrastructure
> > > keeps the link up. If not, it can move it into a low-power state.
> >
> > That is a peculiar way of viewing it. Alan's patch introduces runtime
> > pm attributes to the devices. Quoting:
> >
>
> With the original patch, you can't operate on the link independent from
> the devices. But with the revised patch (whenever I manage to find
> time to write it!), you _will_ be able to.
>
> Alan Stern

2008-08-13 20:38:49

by Alan Stern

Subject: RE: [linux-pm] Power management for SCSI

On Wed, 13 Aug 2008, Leisner, Martin wrote:

> Regarding these scsi suspend patches, there's a general
> problem to drop power on disk devices on a running system.
> I discussed it in:
>
> http://www.gossamer-threads.com/lists/linux/kernel/811598

There's a much worse problem which that thread completely ignored:

When you turn off power to a disk device, to the system it looks like a
hot-unplug event. Any mounted filesystems or memory mappings on that
disk will be lost.

Alan Stern

2008-08-14 06:07:40

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Mittwoch 13 August 2008 21:34:30 schrieb Alan Stern:
> On Wed, 13 Aug 2008, Oliver Neukum wrote:
>
> > Am Mittwoch 13 August 2008 17:44:46 schrieb Alan Stern:
> > > > All children that are USB must be powered down. We know in fact that most
> > > > drives don't care that the device is suspended. The problem was drive
> > > > enclosures that cut power upon suspension losing cached data.
> > >
> > > You misunderstood my question. Are there SCSI transports other than
> > > USB sharing the requirement that all child devices must be suspended
> > > before the link can be powered down?
> >
> > I dispute that USB in general has this property.
>
> How can you dispute that? You said it yourself, in the top quote
> above: "All children that are USB must be powered down."

But the children are SCSI, not USB.

> > Some storage devices
> > need their caches flushed. USB itself is perfectly happy with autosuspending
> > the storage device (host) without telling the disks (devices)
> >
> > You could even argue that these storage devices violate the USB spec.
>
> Oliver, you can't have it both ways. Either we do spin down disks and
> drain device caches before autosuspending usb-storage or we don't.

That is true.

> For safety's sake, obviously we should. The overhead is minimal since
> this happens only after the idle timeout has expired. And for devices
> that don't support it (like flash storage), sd skips the spin-down
> command anyway.

But you cannot make the conclusion that the ultimate children should have
any autosuspend attributes. We can implement autosuspend in usb storage
and propagate the suspend calls down the tree without SCSI knowing about
autosuspend.

Such a system would have its drawbacks, but it'd be a lot simpler.

Regards
Oliver

2008-08-14 13:59:09

by Pavel Machek

Subject: Re: [linux-pm] Power management for SCSI

On Wed 2008-08-13 18:21:29, Oliver Neukum wrote:
> Am Mittwoch 13 August 2008 17:44:46 schrieb Alan Stern:
> > > All children that are USB must be powered down. We know in fact that most
> > > drives don't care that the device is suspended. The problem was drive
> > > enclosures that cut power upon suspension losing cached data.
> >
> > You misunderstood my question. Are there SCSI transports other than
> > USB sharing the requirement that all child devices must be suspended
> > before the link can be powered down?
>
> I dispute that USB in general has this property. Some storage devices
> need their caches flushed. USB itself is perfectly happy with autosuspending
> the storage device (host) without telling the disks (devices)
>
> You could even argue that these storage devices violate the USB spec.

Hmm... but suspended devices have very little power budget, right?

So unless you have an external power supply (2.5" frames generally
don't), you can't really suspend and stay spun up...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-14 14:00:00

by Pavel Machek

Subject: Re: Power management for SCSI

Hi!

> > Add support for autosuspend/autoresume. Lowlevel driver can use it to
> > spin the disk down and power down its SATA link, to turn off the USB
> > interface, etc.
> >
> > Spinning down the disk is useful - saves ~0.5W here. Powering down
> > SATA controller is even better -- should save ~1W.
> >
> > Now, I guess the patch will need to be split to small pieces for
> > merge... I tried to rearrange it so that the documentation and hooks
> > go before stuff that needs the hooks, and before Kconfig enabler. If
> > it looks reasonably good, I'll split it into smaller pieces.
>
> James had a number of objections to my original patch; you can read
> them here:
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html
>
> I haven't had time yet to work on an improved version.

Ok, I see, "it's done at the wrong level" sounds pretty serious.

First the general comments/questions:

#
#1. It's done at the wrong level: suspend "device" is actually a target
#function. There's no way on a multi-lun device we want to keep the
#flags and last_busy anywhere but in the target

So... if there's one device with Lun0==cdrom1 and Lun1==cdrom2, it is a
single target, and we want to keep flags/last busy common to all that?

What is a good data structure to add? I see scsi_tgt*.h, but it is very
short, and there does not seem to be a good structure to hook into.

#2. As you say in the comment, the thing we're trying to power down is
#the link. In most SCSI implementations, the link has a rather complex
#relationship to the target, what we want to do in
#periodic_autosuspend_scan() is run over the devices on each link, and
#if they're not busy suspend the link? What's probably needed is a set of
#adjunct helpers for the transport classes to do this.

So the host suspend/resume stuff should go into struct
scsi_transport_template?

#3. The link power down is much faster than device spin down ... in your
#patch these two things seem to be coupled ... we really need to keep
#them separate.
#

ACK.

#4. The entanglement with error handling is incredibly problematic (since
#eh is a nastily complex state machine in its own right). What do
#transports that use eh_strategy_handler do about all of this?

/me scared...
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-14 14:07:28

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Donnerstag 14 August 2008 15:50:21 schrieb Pavel Machek:
> On Wed 2008-08-13 18:21:29, Oliver Neukum wrote:
> > Am Mittwoch 13 August 2008 17:44:46 schrieb Alan Stern:
> > > > All children that are USB must be powered down. We know in fact that most
> > > > drives don't care that the device is suspended. The problem was drive
> > > > enclosures that cut power upon suspension losing cached data.
> > >
> > > You misunderstood my question. Are there SCSI transports other than
> > > USB sharing the requirement that all child devices must be suspended
> > > before the link can be powered down?
> >
> > I dispute that USB in general has this property. Some storage devices
> > need their caches flushed. USB itself is perfectly happy with autosuspending
> > the storage device (host) without telling the disks (devices)
> >
> > You could even argue that these storage devices violate the USB spec.
>
> Hmm... but suspended devices have very little power budget, right?
>
> So unless you have an external power supply (2.5" frames generally
> don't), you can't really suspend and stay spun up...
>

True, but the spec says that no state shall be lost.

I don't really argue against flushing the caches. But I cannot see that this would
demand that we should implement autosuspend for SCSI. It seems like
overengineering to me.

Regards
Oliver

2008-08-14 15:41:00

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Thu, 14 Aug 2008, Oliver Neukum wrote:

> > > I dispute that USB in general has this property.
> >
> > How can you dispute that? You said it yourself, in the top quote
> > above: "All children that are USB must be powered down."
>
> But the children are SCSI, not USB.

Oh, I see. All right, yes. However USB in general _does_ have the
property that child devices might not be able to accomplish much while
the USB link is suspended, particularly if they are bus-powered. This
includes draining caches.

> > Oliver, you can't have it both ways. Either we do spin down disks and
> > drain device caches before autosuspending usb-storage or we don't.
>
> That is true.
>
> > For safety's sake, obviously we should. The overhead is minimal since
> > this happens only after the idle timeout has expired. And for devices
> > that don't support it (like flash storage), sd skips the spin-down
> > command anyway.
>
> But you cannot make the conclusion that the ultimate children should have
> any autosuspend attributes. We can implement autosuspend in usb storage
> and propagate the suspend calls down the tree without SCSI knowing about
> autosuspend.

The way I designed the autosuspend framework, you _can't_ do that. In
my framework autosuspend and autoresume events propagate _up_ the
device tree, not _down_. This means an autosuspend has to be initiated
by the child SCSI layer, not by the USB layer. Which is as it should
be, since the USB layer doesn't know when it is appropriate for a SCSI
device to autosuspend.

> Such a system would have its drawbacks, but it'd be a lot simpler.

It would be a layering violation.

Alan Stern
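
The upward propagation described above can be sketched in user-space C. The node type and function names are hypothetical; the only assumption is that each node tracks how many of its children are still active, so that a suspend started at a leaf walks up the tree and a resume wakes ancestors before descendants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device-tree node, e.g. USB interface -> SCSI host -> disk. */
struct model_node {
	struct model_node *parent;
	int active_children;  /* children that are not suspended */
	int suspended;
};

/* Initiated by the child; climbs while each ancestor has nothing left active. */
void model_autosuspend(struct model_node *n)
{
	while (n && !n->suspended && n->active_children == 0) {
		n->suspended = 1;
		if (n->parent) {
			assert(n->parent->active_children > 0);
			n->parent->active_children--;
		}
		n = n->parent;
	}
}

/* A parent must be awake before its child, so resume ancestors first. */
void model_autoresume(struct model_node *n)
{
	if (!n || !n->suspended)
		return;
	if (n->parent) {
		model_autoresume(n->parent);
		n->parent->active_children++;
	}
	n->suspended = 0;
}
```

Nothing in this model lets a parent push a suspend down to its children; the decision always starts at the leaf, which is the layer that knows when it is idle.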

2008-08-14 15:47:17

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Thu, 14 Aug 2008, Oliver Neukum wrote:

> > > You could even argue that these storage devices violate the USB spec.
> >
> > Hmm... but suspended devices have very little power budget, right?
> >
> > So unless you have an external power supply (2.5" frames generally
> > don't), you can't really suspend and stay spun up...
> >
>
> True, but the spec says that no state shall be lost.

What can we do?... Real world devices don't always obey the spec.

You could argue that the suspend current should be sufficient to
maintain the contents of the cache, which would then be written out
after resume. But even if that is true, it's a very fragile guarantee
to rely on.

> I don't really argue against flushing the caches. But I cannot see that this would
> demand that we should implement autosuspend for SCSI. It seems like
> overengineering to me.

Think of it in two parts: idle-timeout detection and autosuspend.
Presumably you don't object to the idle-timeout detection (which is
needed for powering down links in general), and you don't argue against
the cache-flushing part of autosuspend. Taken together, that's about
90% of my proposal. So what is the objectionable 10%?

Alan Stern
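
The idle-timeout half of that split is small enough to sketch on its own. The struct and function below are hypothetical; last_busy and delay borrow their names from the fields discussed elsewhere in this thread, and timestamps are assumed to be free-running unsigned tick counters that may wrap.

```c
#include <assert.h>

/* Hypothetical per-object idle bookkeeping. */
struct model_idle {
	unsigned long last_busy;  /* tick of the last completed command */
	unsigned long delay;      /* idle timeout in ticks */
	int busy;                 /* commands currently outstanding */
};

/*
 * Returns 1 once the object has been idle for at least 'delay' ticks.
 * The unsigned subtraction keeps the test correct across counter wrap.
 */
int model_idle_expired(const struct model_idle *s, unsigned long now)
{
	assert(s);
	if (s->busy)
		return 0;
	return now - s->last_busy >= s->delay;
}
```

Whatever acts on the result (link power-down alone, or cache flush and spin-down first) is a separate decision layered on top of this check.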

2008-08-14 15:55:40

by Pavel Machek

Subject: Re: Power management for SCSI

Hi!

> > > Add support for autosuspend/autoresume. Lowlevel driver can use it to
> > > spin the disk down and power down its SATA link, to turn off the USB
> > > interface, etc.
> > >
> > > Spinning down the disk is useful - saves ~0.5W here. Powering down
> > > SATA controller is even better -- should save ~1W.
> > >
> > > Now, I guess the patch will need to be split to small pieces for
> > > merge... I tried to rearrange it so that the documentation and hooks
> > > go before stuff that needs the hooks, and before Kconfig enabler. If
> > > it looks reasonably good, I'll split it into smaller pieces.
> >
> > James had a number of objections to my original patch; you can read
> > them here:
> >
> > https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html
> >
> > I haven't had time yet to work on an improved version.
>
> Ok, I see, "it's done at the wrong level" sounds pretty serious.
>
> First the general comments/questions:
...
> #2. As you say in the comment, the thing we're trying to power down is
> #the link. In most SCSI implementations, the link has a rather complex
> #relationship to the target, what we want to do in
> #periodic_autosuspend_scan() is run over the devices on each link, and
> #if they're not busy suspend the link? What's probably needed is a set of
> #adjunct helpers for the transport classes to do this.
>
> So the host suspend/resume stuff should go into struct
> scsi_transport_template?

Is this step in the right direction? Moved autosuspend from
scsi_host_template to scsi_transport_template...

Pavel

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index e4864d9..2b8cf09 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -320,14 +320,14 @@ static struct device_attribute *ahci_sde
};

struct pci_dev *my_pdev;
-int autosuspend_enabled = 0; /* HERE */
+int autosuspend_enabled = 1; /* HERE */

struct sleep_disabled_reason ahci_active = {
"ahci"
};

/* The host and its devices are all idle so we can autosuspend */
-static int autosuspend(struct Scsi_Host *host)
+int ahci_autosuspend(struct Scsi_Host *host)
{
if (my_pdev && autosuspend_enabled) {
printk("ahci: should autosuspend\n");
@@ -340,7 +340,7 @@ static int autosuspend(struct Scsi_Host
}

/* The host needs to be autoresumed */
-static int autoresume(struct Scsi_Host *host)
+int ahci_autoresume(struct Scsi_Host *host)
{
if (my_pdev && autosuspend_enabled) {
printk("ahci: should autoresume\n");
@@ -360,8 +360,8 @@ static struct scsi_host_template ahci_sh
.sg_tablesize = AHCI_MAX_SG,
.dma_boundary = AHCI_DMA_BOUNDARY,
.shost_attrs = ahci_shost_attrs,
- .autosuspend = autosuspend,
- .autoresume = autoresume,
+// .autosuspend = autosuspend,
+// .autoresume = autoresume,
.sdev_attrs = ahci_sdev_attrs,
};

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9d3ba4..d3526a0 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -103,6 +103,9 @@ static const u8 def_control_mpage[CONTRO
0, 30 /* extended self test time, see 05-359r1 */
};

+int ahci_autosuspend(struct Scsi_Host *host);
+int ahci_autoresume(struct Scsi_Host *host);
+
/*
* libata transport template. libata doesn't do real transport stuff.
* It just needs the eh_timed_out hook.
@@ -111,6 +114,8 @@ static struct scsi_transport_template at
.eh_strategy_handler = ata_scsi_error,
.eh_timed_out = ata_scsi_timed_out,
.user_scan = ata_scsi_user_scan,
+ .autosuspend = ahci_autosuspend,
+ .autoresume = ahci_autoresume,
};


diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 5ef69c4..d2de371 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -10,6 +10,7 @@ #define DEBUG
#include <scsi/scsi.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>

#include <linux/delay.h>

@@ -50,8 +51,8 @@ void scsi_autosuspend_host(struct Scsi_H
if (shost->pm_usage_cnt <= 0 && !shost->is_suspended &&
shost->shost_state == SHOST_RUNNING) {
WARN_ON(shost->host_busy);
- if (!shost->hostt->autosuspend ||
- shost->hostt->autosuspend(shost) == 0) {
+ if (!shost->transportt->autosuspend ||
+ shost->transportt->autosuspend(shost) == 0) {
shost->is_suspended = 1;
shost_dbg(shost, "suspended\n");
}
@@ -82,10 +83,10 @@ int scsi_autoresume_host(struct Scsi_Hos
mutex_lock(&shost->pm_mutex);
++shost->pm_usage_cnt;
if (shost->is_suspended) {
- if (shost->hostt->autoresume &&
+ if (shost->transportt->autoresume &&
(shost->shost_state == SHOST_RUNNING ||
shost->shost_state == SHOST_RECOVERY))
- status = shost->hostt->autoresume(shost);
+ status = shost->transportt->autoresume(shost);
if (status == 0) {
shost->is_suspended = 0;
shost_dbg(shost, "resumed\n");
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index ef64fd8..c96f11f 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -499,10 +499,6 @@ struct scsi_host_template usb_stor_host_
.eh_device_reset_handler = device_reset,
.eh_bus_reset_handler = bus_reset,

- /* dynamic power management */
- .autosuspend = autosuspend,
- .autoresume = autoresume,
-
/* queue commands only, only one command per LUN */
.can_queue = 1,
.cmd_per_lun = 1,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index b60445f..0f30451 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -176,21 +176,6 @@ #endif
int (* eh_bus_reset_handler)(struct scsi_cmnd *);
int (* eh_host_reset_handler)(struct scsi_cmnd *);

- /*
- * Power management routines. These are optional; you should
- * implement them if you want your LLD to perform dynamic Power
- * Management. The autosuspend method will be called whenever
- * all the devices below a host have been suspended (are in an
- * idle state), at which time the host adapter can safely be
- * autosuspended. The autoresume method will be called whenever
- * a suspended host must be resumed for one of its devices to
- * carry out a command. Both routines are always called in a
- * process context with interrupts enabled.
- *
- * Status: OPTIONAL
- */
- int (* autosuspend)(struct Scsi_Host *);
- int (* autoresume)(struct Scsi_Host *);

/*
* Before the mid layer attempts to scan for a new device where none
diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
index 490bd13..15c7886 100644
--- a/include/scsi/scsi_transport.h
+++ b/include/scsi/scsi_transport.h
@@ -77,6 +77,22 @@ struct scsi_transport_template {
* request for target drivers.
*/
int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
+
+ /*
+ * Power management routines. These are optional; you should
+ * implement them if you want your LLD to perform dynamic Power
+ * Management. The autosuspend method will be called whenever
+ * all the devices below a host have been suspended (are in an
+ * idle state), at which time the host adapter can safely be
+ * autosuspended. The autoresume method will be called whenever
+ * a suspended host must be resumed for one of its devices to
+ * carry out a command. Both routines are always called in a
+ * process context with interrupts enabled.
+ *
+ * Status: OPTIONAL
+ */
+ int (* autosuspend)(struct Scsi_Host *);
+ int (* autoresume)(struct Scsi_Host *);
};

#define transport_class_to_shost(tc) \


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-14 21:43:07

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Donnerstag 14 August 2008 17:47:02 schrieb Alan Stern:
> > I don't really argue against flushing the caches. But I cannot see that this would
> > demand that we should implement autosuspend for SCSI. It seems like
> > overengineering to me.
>
> Think of it in two parts: idle-timeout detection and autosuspend.
> Presumably you don't object to the idle-timeout detection (which is
> needed for powering down links in general), and you don't argue against
> the cache-flushing part of autosuspend. Taken together, that's about
> 90% of my proposal. So what is the objectionable 10%?

The core problem is that you insist on a rigid bottom-to-top flow of
autosuspensions. That's good for systems like USB and PCI which
are trees for PM purposes. It makes no sense for true busses with
equal members on the bus.

Regards
Oliver

2008-08-14 22:11:55

by Stefan Richter

Subject: Re: Power management for SCSI

Pavel Machek wrote:
>> https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html
...
> First the general comments/questions:
>
> #
> #1. It's done at the wrong level: suspend "device" is actually a target
> #function. There's no way on a multi-lun device we want to keep the
> #flags and last_busy anywhere but in the target
>
> So... if there's one device with Lun0==cdrom1 and Lun1==cdrom2, it is a
> single target, and we want to keep flags/last busy common to all that?

Actually a command set driver like sd surely wants last_busy (time of
last use) separate for each LU for auto-spindown, doesn't it?

I'm not sure about the rest, i.e. delay, counter, flags.

> What is a good data structure to add? I see scsi_tgt*.h, but it is very
> short, and there does not seem to be a good structure to hook into.

include/scsi/scsi_tgt*.h are for local target implementations. The
representation of "remote" targets, as seen by local initiators, is
include/scsi/scsi_device.h's struct scsi_target.
--
Stefan Richter
-=====-==--- =--- -===-
http://arcgraph.de/sr/
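
One possible split, sketched in user-space C: last_busy is kept per logical unit (so a command set driver can spin down one LU on its own), while the target is considered idle only when all of its LUs are. The types and functions are hypothetical and are not the kernel's struct scsi_target or struct scsi_device; keeping the delay at target level is an assumption, since the thread leaves the placement of delay, counter and flags open.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LUS 8

/* Hypothetical per-LU state: time of last use. */
struct model_lu {
	unsigned long last_busy;
};

/* Hypothetical per-target state shared by all LUs. */
struct model_tgt {
	struct model_lu lus[MODEL_MAX_LUS];
	size_t nr_lus;
	unsigned long delay;  /* idle timeout in ticks (assumed shared) */
};

/* An LU may be spun down once it alone has been idle long enough. */
int model_lu_idle(const struct model_tgt *t, size_t lu, unsigned long now)
{
	assert(lu < t->nr_lus);
	return now - t->lus[lu].last_busy >= t->delay;
}

/* The target as a whole is idle only when every LU is. */
int model_tgt_idle(const struct model_tgt *t, unsigned long now)
{
	size_t i;

	for (i = 0; i < t->nr_lus; i++)
		if (!model_lu_idle(t, i, now))
			return 0;
	return 1;
}
```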

2008-08-14 22:25:44

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Thu, 14 Aug 2008, Oliver Neukum wrote:

> The core problem is that you insist on a rigid bottom-to-top flow of
> autosuspensions. That's good for systems like USB and PCI which
> are trees for PM purposes. It makes no sense for true busses with
> equal members on the bus.

My framework is tree-oriented because it's based on the driver model,
which uses a tree of devices.

Even on a true bus, the members can't be entirely equal -- one of them
has to be closer to the CPU than the others are. If that one member is
in a low-power state then the CPU can't communicate with anything on
the bus, unlike when one of the other members is in a low-power state.

(I suppose in theory there could be a situation in which the CPU has
direct communication with a bunch of devices, which can also
communicate among themselves over some other bus. In such a situation
we would represent the devices as members of separate branches in the
device tree, so that suspending one would have no impact on suspending
the others. The presence of the interconnecting bus would be ignored.)

Alan Stern

2008-08-15 07:15:31

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Freitag 15 August 2008 00:25:28 schrieb Alan Stern:
> On Thu, 14 Aug 2008, Oliver Neukum wrote:
>
> > The core problem is that you insist on a rigid bottom-to-top flow of
> > autosuspensions. That's good for systems like USB and PCI which
> > are trees for PM purposes. It makes no sense for true busses with
> > equal members on the bus.
>
> My framework is tree-oriented because it's based on the driver model,
> which uses a tree of devices.

Which uses a tree because PCI and USB are.

> Even on a true bus, the members can't be entirely equal -- one of them
> has to be closer to the CPU than the others are. If that one member is
> in a low-power state then the CPU can't communicate with anything on
> the bus, unlike when one of the other members is in a low-power state.

Yes, that means under some circumstances you cannot suspend the
member closest to the CPU, but under others you can. In a tree this question
is very simply answered, on a bus you will actually need to compute whether
you need the connection to the bus.

It is true that you won't need the bus if all other members on the bus have
been suspended, but that's not very good because physically spinning
down and up a disk is a very expensive operation, while suspending a host
adapter can be trivial.

Regards
Oliver

2008-08-15 15:25:27

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Fri, 15 Aug 2008, Oliver Neukum wrote:

> Am Freitag 15 August 2008 00:25:28 schrieb Alan Stern:
> > On Thu, 14 Aug 2008, Oliver Neukum wrote:
> >
> > > The core problem is that you insist on a rigid bottom-to-top flow of
> > > autosuspensions. That's good for systems like USB and PCI which
> > > are trees for PM purposes. It makes no sense for true busses with
> > > equal members on the bus.
> >
> > My framework is tree-oriented because it's based on the driver model,
> > which uses a tree of devices.
>
> Which uses a tree because PCI and USB are.

How do you know? Is that just a guess based on some of Greg KH's and
Pat Mochel's previous activities? Did you ask them?

> > Even on a true bus, the members can't be entirely equal -- one of them
> > has to be closer to the CPU than the others are. If that one member is
> > in a low-power state then the CPU can't communicate with anything on
> > the bus, unlike when one of the other members is in a low-power state.
>
> Yes, that means under some circumstances you cannot suspend the
> member closest to the CPU, but under others you can. In a tree this question
> is very simply answered, on a bus you will actually need to compute whether
> you need the connection to the bus.

I don't see why any computation is needed. If the CPU will need to
communicate with any devices on the bus (i.e., if any of these devices
are not idle) then you need the connection to the bus, otherwise you
don't. It's exactly the same with a tree. The fact that the
interconnections form a bus rather than a tree is irrelevant.

(Viewed in logical terms, even a true bus can be described as a tree.
The nodes are partially ordered by their communication paths to the
CPU.)
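
As a rough illustration of the rule above (a sketch only, not code from any
posted patch; struct member and bus_link_needed() are hypothetical names),
the test for whether the host's connection to the bus is needed reduces to
"is any member not idle?", which is the same check a tree would use:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device reachable through the host adapter. */
struct member {
	bool idle;	/* true if the CPU has no pending work for this device */
};

/*
 * The connection to the bus is needed exactly when at least one member
 * is not idle.  Nothing here depends on whether the members are wired
 * as a bus or as a tree.
 */
static bool bus_link_needed(const struct member *m, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!m[i].idle)
			return true;
	return false;
}
```

This deliberately ignores the multiple-initiator question raised next.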

More to the point is whether you should ever suspend any of these
devices if there can be multiple initiators. But that's a separate
question.

> It is true that you won't need the bus if all other members on the bus have
> been suspended, but that's not very good because physically spinning
> down and up a disk is a very expensive operation, while suspending a host
> adapter can be trivial.

What is your point? You seem to be saying that it would be nice to
suspend a host adapter at times when some of the SCSI targets beneath
it are not suspended. I agree, but how would you determine whether
such a thing was safe?

Alan Stern

2008-08-19 07:38:49

by Pavel Machek

Subject: Re: Power management for SCSI

Hi!

> > Add support for autosuspend/autoresume. Lowlevel driver can use it to
> > spin the disk down and power down its SATA link, to turn off the USB
> > interface, etc.
> >
> > Spinning down the disk is useful - saves ~0.5W here. Powering down
> > SATA controller is even better -- should save ~1W.
> >
> > Now, I guess the patch will need to be split to small pieces for
> > merge... I tried to rearrange it so that the documentation and hooks
> > go before stuff that needs the hooks, and before Kconfig enabler. If
> > it looks reasonably good, I'll split it into smaller pieces.
>
> James had a number of objections to my original patch; you can read
> them here:
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html
>
> I haven't had time yet to work on an improved version.

Would it make sense to split the patches into "autosuspend for SCSI
devices" and "autosuspend for SCSI controllers" for easier
review/merge? I guess I'll start with the devices, they seem easier...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-19 07:49:47

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Dienstag 19 August 2008 09:38:39 schrieb Pavel Machek:

> Would it make sense to split the patches into "autosuspend for SCSI
> devices" and "autosuspend for SCSI controllers" for easier
> review/merge? I guess I'll start with the devices, they seem easier...

Runtime PM will have to leave devices alone while error handling is active.
Unfortunately error handling is done at controller level. So I am afraid
this would be very difficult.

Regards
Oliver

2008-08-19 13:33:28

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Freitag 15 August 2008 17:25:13 schrieb Alan Stern:
> On Fri, 15 Aug 2008, Oliver Neukum wrote:
>
> > Am Freitag 15 August 2008 00:25:28 schrieb Alan Stern:

Hi,

> > Yes, that means under some circumstances you cannot suspend the
> > member closest to the CPU, but under others you can. In a tree this question
> > is very simply answered, on a bus you will actually need to compute whether
> > you need the connection to the bus.

> More to the point is whether you should ever suspend any of these
> devices if there can be multiple initiators. But that's a separate
> question.

But one that needs to be addressed.

> > It is true that you won't need the bus if all other members on the bus have
> > been suspended, but that's not very good because physically spinning
> > down and up a disk is a very expensive operation, while suspending a host
> > adapter can be trivial.
>
> What is your point? You seem to be saying that it would be nice to
> suspend a host adapter at times when some of the SCSI targets beneath
> it are not suspended. I agree, but how would you determine whether
> such a thing was safe?

I suggest by talking to the HLDs.

It seems to me that, abstractly speaking, there are three criteria for suspension:

- the cpu needs to talk to the device now
- the device may need to talk to the CPU at unpredictable times
- suspending has side effects

Suspension in USB always has side effects. That's not true for other
subsystems. It seems to me that for the general case we need to divorce
the notion of a child being suspended itself from a child agreeing to its
parent being suspended.
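
One way to picture that separation is the sketch below. It is only an
illustration under assumptions: struct child, its two flags, and
parent_can_suspend() are hypothetical names, and treating an already
suspended child as implicitly agreeing is a guess at the intended meaning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child state, keeping the two notions apart. */
struct child {
	bool suspended;			/* the child itself is in a low-power state */
	bool agrees_to_parent_suspend;	/* the child tolerates its parent going down */
};

/*
 * The parent may suspend if every child is either suspended already
 * or has agreed to the parent being suspended while it stays active.
 */
static bool parent_can_suspend(const struct child *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!c[i].suspended && !c[i].agrees_to_parent_suspend)
			return false;
	return true;
}
```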

Regards
Oliver

2008-08-19 14:32:29

by Alan Stern

Subject: Re: Power management for SCSI

On Tue, 19 Aug 2008, Pavel Machek wrote:

> Would it make sense to split the patches into "autosuspend for SCSI
> devices" and "autosuspend for SCSI controllers" for easier
> review/merge? I guess I'll start with the devices, they seem easier...

It really should be split up differently. The topics of interest are:

idle detection for SCSI devices,

autosuspend for SCSI targets and its relation to suspend
for SCSI devices,

passing suspend & resume notifications to the transport class,

adding a USB transport class so that usb-storage can respond
to those notifications,

modifying other transport classes as needed so that the LLDs
can power-down links or host adapters.

I think that covers it. Oliver may have some additional suggestions.

Alan Stern

2008-08-19 15:28:39

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Tue, 19 Aug 2008, Oliver Neukum wrote:

> > More to the point is whether you should ever suspend any of these
> > devices if there can be multiple initiators. But that's a separate
> > question.
>
> But one that needs to be addressed.

One possibility is to have an attribute flag for SCSI transport
classes, indicating whether the transport supports multiple initiators.

Besides, isn't this already an issue? What happens when someone does a
system suspend or hibernate? Don't the attached disk drives get spun
down, even if there are other initiators on the same SCSI bus?

(And is this really a problem? If an error occurs because a drive is
spun down when some other device tries to access it, that other device
should simply spin the drive back up again.)

> > What is your point? You seem to be saying that it would be nice to
> > suspend a host adapter at times when some of the SCSI targets beneath
> > it are not suspended. I agree, but how would you determine whether
> > such a thing was safe?
>
> I suggest by talking to the HLDs.

Why would the HLD (= ULD?) know?

For example, consider a USB disk drive. How is sd.c (the HLD) supposed
to know that it's not safe to suspend the USB link without spinning
down the drive? Or consider a traditional SCSI parallel interface
drive. How is sd.c supposed to know that it is safe to suspend the
SPI host adapter without first spinning down the drive?

> It seems to me that, abstractly speaking, there are three criteria for suspension:
>
> - the cpu needs to talk to the device now

I.e., whether the idle timeout has expired, right?

> - the device may need to talk to the CPU at unpredictable times

I.e., whether remote wakeup needs to be enabled, right?

> - suspending has side effects

I'm not sure what you mean by that. Suspension always has side effects
of one kind or another.

> Suspension in USB always has side effects. That's not true for other
> subsystems.

Name one. At the very least, suspending a device means you can't use
it again without first calling the driver's resume method. That's a
side effect.

Hopefully, in most subsystems suspending a device would reduce its
power usage. Unfortunately this isn't true for SCSI at the moment...

> It seems to me that for the general case we need to divorce
> the notion of a child being suspended itself from a child agreeing to its
> parent being suspended.

This is already possible. For example, you may remember a couple of
years ago I posted a patch for usb-storage which would autosuspend it
without regard for the state of its child devices. The patch didn't
work out, because some devices really did need to have their caches
drained or disks spun down.

There's nothing about my suspend framework to prevent a driver from
autosuspending its device while the children are still active.
Rather, the framework insists on notifications going the other way:
The driver has to be told whenever one of its device's children is
suspended or resumed.
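
A minimal sketch of that notification direction, under stated assumptions:
struct parent_drv, the notify_*() helpers and the requires_idle_children
policy flag are hypothetical and do not come from the posted patch. The
point is only that the parent's driver is told about child state changes
and then applies its own policy.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping kept by a parent device's driver. */
struct parent_drv {
	int active_children;		/* updated by the notifications below */
	bool requires_idle_children;	/* policy chosen by the driver itself */
};

/* Called when one of the parent's children has been resumed. */
static void notify_child_resumed(struct parent_drv *p)
{
	p->active_children++;
}

/* Called when one of the parent's children has been suspended. */
static void notify_child_suspended(struct parent_drv *p)
{
	if (p->active_children > 0)
		p->active_children--;
}

/*
 * The notifications do not forbid suspending the parent while
 * children are active; only the driver's own policy decides.
 */
static bool may_autosuspend(const struct parent_drv *p)
{
	return !p->requires_idle_children || p->active_children == 0;
}
```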

Alan Stern

2008-08-19 21:10:36

by Leisner, Martin

Subject: RE: [linux-pm] Power management for SCSI

Being able to drop power on the disk on demand is a useful concept.

We do it, but need a number of custom patches in a number of places.

When the system WANTS to access the disk, it does what is necessary to
get the disk to spin up...

I'm not sure dropping disk power in a controlled way should trigger a
hot-plug event -- if everyone EXPECTS it.

I'm just looking for a more generic way to enable this... I'm going to be
looking at 2.6.26/27 for this soon...

marty

> -----Original Message-----
> From: Alan Stern [mailto:[email protected]]
> Sent: Wednesday, August 13, 2008 4:39 PM
> To: Leisner, Martin
> Cc: Oliver Neukum; Linux-pm mailing list; kernel list;
> [email protected]; [email protected]; Pavel Machek;
> Stefan Richter
> Subject: RE: [linux-pm] Power management for SCSI
>
> On Wed, 13 Aug 2008, Leisner, Martin wrote:
>
> > Regarding these scsi suspend patches, there's a general
> > problem to drop power on disk devices on a running system.
> > I discussed it in:
> >
> > http://www.gossamer-threads.com/lists/linux/kernel/811598
>
> There's a much worse problem which that thread completely ignored:
>
> When you turn off power to a disk device, to the system it looks like a
> hot-unplug event. Any mounted filesystems or memory mappings on that
> disk will be lost.
>
> Alan Stern

2008-08-19 23:25:30

by Stefan Richter

Subject: Re: [linux-pm] Power management for SCSI

Alan Stern wrote:
> On Tue, 19 Aug 2008, Oliver Neukum wrote:
>
>>> More to the point is whether you should ever suspend any of these
>>> devices if there can be multiple initiators. But that's a separate
>>> question.
>> But one that needs to be addressed.
>
> One possibility is to have an attribute flag for SCSI transport
> classes, indicating whether the transport supports multiple initiators.
>
> Besides, isn't this already an issue? What happens when someone does a
> system suspend or hibernate? Don't the attached disk drives get spun
> down, even if there are other initiators on the same SCSI bus?

In (fw-)sbp2, we have for example this simple code:

	static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
	{
		...
		if (sbp2_param_exclusive_login)
			sdev->manage_start_stop = 1;
		...

By setting the exclusive_login module parameter from Y (default) to N,
multiple initiators per logical unit become possible. We are too lazy
to check whether there are actually other initiators at a given moment;
after all they can come and go all the time. So the simplest strategy
is to suppress managed START STOP when concurrent initiators are _possible_.

I suppose though that all multiple initiator capable transports have
ways to query the presence of other initiators at any given time; but I
don't think the respective effort is justified.
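
The same conservative policy could be expressed with the per-transport
flag Alan suggested. The sketch below is hypothetical: struct
transport_caps and may_manage_start_stop() are made-up names, not part of
any existing transport class. It only captures "suppress managed START
STOP whenever other initiators are possible".

```c
#include <stdbool.h>

/* Hypothetical capability flags a transport class might expose. */
struct transport_caps {
	bool multi_initiator_possible;	/* other initiators may come and go */
};

/*
 * Only spin drives down on our own initiative when no other initiator
 * can be using them; no attempt is made to check who is present now.
 */
static bool may_manage_start_stop(const struct transport_caps *t)
{
	return !t->multi_initiator_possible;
}
```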

> (And is this really a problem? If an error occurs because a drive is
> spun down when some other device tries to access it, that other device
> should simply spin the drive back up again.)

The high latency may be a problem.
--
Stefan Richter
-=====-==--- =--- =-=--
http://arcgraph.de/sr/

2008-08-22 20:25:35

by Pavel Machek

Subject: Re: [linux-pm] Power management for SCSI

Hi!

> > - suspending has side effects
>
> I'm not sure what you mean by that. Suspension always has side effects
> of one kind or another.
>
> > Suspension in USB always has side effects. That's not true for other
> > subsystems.
>
> Name one. At the very least, suspending a device means you can't use
> it again without first calling the driver's resume method. That's a
> side effect.

IDE, actually. I don't think it is relevant, but you can do hdparm -y,
and it will automatically spin up when you try to talk to it next
time.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-22 22:15:03

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Fri, 22 Aug 2008, Pavel Machek wrote:

> Hi!
>
> > > - suspending has side effects
> >
> > I'm not sure what you mean by that. Suspension always has side effects
> > of one kind or another.
> >
> > > Suspension in USB always has side effects. That's not true for other
> > > subsystems.
> >
> > Name one. At the very least, suspending a device means you can't use
> > it again without first calling the driver's resume method. That's a
> > side effect.
>
> IDE, actually. I don't think it is relevant, but you can do hdparm -y,
> and it will automatically spin up when you try to talk to it next
> time.

It's a matter of definitions... "hdparm -y" doesn't call the driver's
suspend method, so in some sense it isn't truly a suspend.

But it's true that some systems can power down more or less
transparently (with restart latency as the only visible side effect).

Alan Stern

2008-08-25 12:49:16

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Dienstag 19 August 2008 17:28:28 schrieb Alan Stern:
> On Tue, 19 Aug 2008, Oliver Neukum wrote:

> > I suggest by talking to the HLDs.
>
> Why would the HLD (= ULD?) know?
>
> For example, consider a USB disk drive. How is sd.c (the HLD) supposed
> to know that it's not safe to suspend the USB link without spinning
> down the drive? Or consider a traditional SCSI parallel interface

The HLD is responsible for suspending the disk in case the system is
suspended. The HLD must know how to safely suspend a device. It may be
overcautious, but it'll work.

> > It seems to me that, abstractly speaking, there are three criteria for suspension:
> >
> > - the cpu needs to talk to the device now
>
> I.e., whether the idle timeout has expired, right?
>
> > - the device may need to talk to the CPU at unpredictable times
>
> I.e., whether remote wakeup needs to be enabled, right?

I am talking about correctness for controllers. So remote wakeup may or may not
be available. Likewise the bus may be able to predict how long it'll be idle.

> > - suspending has side effects
>
> I'm not sure what you mean by that. Suspension always has side effects
> of one kind or another.

But not outside the controller. If you suspend the root hub of a usb bus,
you suspend everything on the bus. It's a feature of the hardware. Other
busses are different.


> There's nothing about my suspend framework to prevent a driver from
> autosuspending its device while the children are still active.
> Rather, the framework insists on notifications going the other way:
> The driver has to be told whenever one of its device's children is
> suspended or resumed.

That's the problem. You don't tell the children when the parent might want
to suspend.

Regards
Oliver

2008-08-25 14:45:29

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Mon, 25 Aug 2008, Oliver Neukum wrote:

> Am Dienstag 19 August 2008 17:28:28 schrieb Alan Stern:
> > On Tue, 19 Aug 2008, Oliver Neukum wrote:
>
> > > I suggest by talking to the HLDs.
> >
> > Why would the HLD (= ULD?) know?
> >
> > For example, consider a USB disk drive. How is sd.c (the HLD) supposed
> > to know that it's not safe to suspend the USB link without spinning
> > down the drive? Or consider a traditional SCSI parallel interface
>
> The HLD is responsible for suspending the disk in case the system is
> suspended. The HLD must know how to safely suspend a device. It may be
> overcautious, but it'll work.

You didn't answer my question: How does the HLD know whether it's okay
to suspend the link without suspending the device? I should think that
it _doesn't_ know.

The transport class code might know, or the link's driver -- but not
the HLD. The HLD probably doesn't even know what type of transport is
being used!

> > > It seems to me that, abstractly speaking, there are three criteria for suspension:
> > >
> > > - the cpu needs to talk to the device now
> >
> > I.e., whether the idle timeout has expired, right?
> >
> > > - the device may need to talk to the CPU at unpredictable times
> >
> > I.e., whether remote wakeup needs to be enabled, right?
>
> I am talking about correctness for controllers. So remote wakeup may or may not
> be available. Likewise the bus may be able to predict how long it'll be idle.

I don't understand. Are you saying that whether or not it's correct to
suspend a link depends on whether the device may need to talk to the
CPU at unpredictable times? And if so, isn't that the same as saying
that remote wakeup for the link can be enabled?

As for predicting how long the link will be idle... I doubt it is
possible to do that with any reliability.

> > I'm not sure what you mean by that. Suspension always has side effects
> > of one kind or another.
>
> But not outside the controller. If you suspend the root hub of a usb bus,
> you suspend everything on the bus. It's a feature of the hardware. Other
> busses are different.

All right, granted.

> > There's nothing about my suspend framework to prevent a driver from
> > autosuspending its device while the children are still active.
> > Rather, the framework insists on notifications going the other way:
> > The driver has to be told whenever one of its device's children is
> > suspended or resumed.
>
> That's the problem. You don't tell the children when the parent might want
> to suspend.

Why should the children need to know?

If the children are already suspended then we certainly don't
need to tell them the link is going down.

If the children are active, then the link's driver or the
transport class must already have given the okay for
suspending the link while leaving the children active.
So again, why consult the children's drivers?

Alan Stern

2008-08-25 15:04:27

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Montag 25 August 2008 16:45:20 schrieb Alan Stern:

> You didn't answer my question: How does the HLD know whether it's okay
> to suspend the link without suspending the device? I should think that
> it _doesn't_ know.
>
> The transport class code might know, or the link's driver -- but not
> the HLD. The HLD probably doesn't even know what type of transport is
> being used!

There's some truth to that. Unfortunately the transport does not know
whether a device or link may be suspended. Take the case of a CD playing
sound. The transport may know what the consequences of suspending
a link will be to the devices, but only the devices know whether the
consequences are acceptable.

> > I am talking about correctness for controllers. So remote wakeup may or may not
> > be available. Likewise the bus may be able to predict how long it'll be idle.
>
> I don't understand. Are you saying that whether or not it's correct to
> suspend a link depends on whether the device may need to talk to the
> CPU at unpredictable times? And if so, isn't that the same as saying

Yes.

> that remote wakeup for the link can be enabled?

Remote wakeup is a concept specific to USB. If you are writing for
a generic system the question is indeed whether devices may want
to talk to the host and whether they can.
It seems to me that the ULD will know whether its devices will need
to talk to the CPU.

> > That's the problem. You don't tell the children when the parent might want
> > to suspend.
>
> Why should the children need to know?

Because they'll want to do things like flushing caches.

> If the children are already suspended then we certainly don't
> need to tell them the link is going down.

Yes.

> If the children are active, then the link's driver or the
> transport class must already have given the okay for
> suspending the link while leaving the children active.

Because the transport class may not know either.

Regards
Oliver

2008-08-25 16:18:29

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Mon, 25 Aug 2008, Oliver Neukum wrote:

> Am Montag 25 August 2008 16:45:20 schrieb Alan Stern:
>
> > You didn't answer my question: How does the HLD know whether it's okay
> > to suspend the link without suspending the device? I should think that
> > it _doesn't_ know.
> >
> > The transport class code might know, or the link's driver -- but not
> > the HLD. The HLD probably doesn't even know what type of transport is
> > being used!
>
> There's some truth to that. Unfortunately the transport does not know
> whether a device or link may be suspended. Take the case of a CD playing
> sound. The transport may know what the consequences of suspending
> a link will be to the devices, but only the devices know whether the
> consequences are acceptable.

Even the device (or more properly, the driver) might not know! In your
example the driver might realize that playing had been started, but it
probably wouldn't know when the playing had ended.

> > I don't understand. Are you saying that whether or not it's correct to
> > suspend a link depends on whether the device may need to talk to the
> > CPU at unpredictable times? And if so, isn't that the same as saying
>
> Yes.
>
> > that remote wakeup for the link can be enabled?
>
> Remote wakeup is a concept specific to USB.

That's not true at all. Maybe the name is specific to USB, but the
concept isn't. Notice how we have power/wakeup files in the sysfs
directory for every device, even non-USB devices? Requesting a
low-power to high-power transition is a generic operation.

> If you are writing for
> a generic system the question is indeed whether devices may want
> to talk to the host and whether they can.
> It seems to me that the ULD will know whether its devices will need
> to talk to the CPU.

In general, the link or transport class will know whether it is
possible for a device to initiate communication with the CPU. If it is
possible then the link would probably want to have remote wakeup
enabled before autosuspending, even if none of the devices currently
attached actually wants to use it.

> > > That's the problem. You don't tell the children when the parent might want
> > > to suspend.
> >
> > Why should the children need to know?
>
> Because they'll want to do things like flushing caches.
>
> > If the children are already suspended then we certainly don't
> > need to tell them the link is going down.
>
> Yes.
>
> > If the children are active, then the link's driver or the
> > transport class must already have given the okay for
> > suspending the link while leaving the children active.
>
> Because the transport class may not know either.

So sd.c might, in theory, want to respond in two different ways to an
autosuspend request:

(A) Drain the cache,

(B) Drain the cache and spin down the drive.

How does it know which to do? Ask the transport class for help
choosing?

(A) would leave us in an awkward "half-suspended" state. Is the device
suspended or not? It is, in the sense that now the link can safely be
suspended. But it isn't, in the sense that a system sleep would still
require the drive to be spun down.

It's kind of like the state we have following a PMSG_FREEZE --
quiescent but not suspended. Somehow this extra state needs to be
incorporated into the autosuspend framework.

Alan Stern

2008-08-25 17:33:08

by Oliver Neukum

Subject: Re: [linux-pm] Power management for SCSI

Am Montag 25 August 2008 18:18:19 schrieb Alan Stern:
> On Mon, 25 Aug 2008, Oliver Neukum wrote:

> > There's some truth to that. Unfortunately the transport does not know
> > whether a device or link may be suspended. Take the case of a CD playing
> > sound. The transport may know what the consequences of suspending
> > a link will be to the devices, but only the devices know whether the
> > consequences are acceptable.
>
> Even the device (or more properly, the driver) might not know! In your
> example the driver might realize that playing had been started, but it
> probably wouldn't know when the playing had ended.

There is that possibility.

> That's not true at all. Maybe the name is specific to USB, but the
> concept isn't. Notice how we have power/wakeup files in the sysfs
> directory for every device, even non-USB devices? Requesting a
> low-power to high-power transition is a generic operation.

True. Let's say that we have to deal with busses incapable of supporting it.

> > If you are writing for
> > a generic system the question is indeed whether devices may want
> > to talk to the host and whether they can.
> > It seems to me that the ULD will know whether its devices will need
> > to talk to the CPU.
>
> In general, the link or transport class will know whether it is
> possible for a device to initiate communication with the CPU. If it is

Yes.

> possible then the link would probably want to have remote wakeup
> enabled before autosuspending, even if none of the devices currently
> attached actually wants to use it.

That supposes it doesn't matter in terms of power use. Is that true?

> So sd.c might, in theory, want to respond in two different ways to an
> autosuspend request:
>
> (A) Drain the cache,
>
> (B) Drain the cache and spin down the drive.

(C) Do nothing

(D) Refuse (e.g. the user has opened a block device and used a vendor
specific command)

> How does it know which to do? Ask the transport class for help
> choosing?

I see no other way.
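
To make the four possible responses concrete, here is a sketch of how a
driver might pick one with help from the transport. Everything in it is an
assumption for illustration: the enum, struct sd_state, struct
transport_hint and choose_response() are hypothetical, and the ordering of
the checks is just one plausible policy.

```c
#include <stdbool.h>

/* The four responses (A)-(D) discussed above. */
enum sd_response {
	SD_DRAIN_CACHE,		/* (A) */
	SD_DRAIN_AND_SPIN_DOWN,	/* (B) */
	SD_DO_NOTHING,		/* (C) */
	SD_REFUSE,		/* (D) */
};

/* Hypothetical state known to the upper-level driver. */
struct sd_state {
	bool in_exclusive_use;	/* something the driver must not disturb */
	bool write_cache_dirty;	/* cached data not yet on the medium */
};

/* Hypothetical hint supplied by the transport class. */
struct transport_hint {
	bool link_down_needs_spindown;	/* unsafe to drop the link otherwise */
};

static enum sd_response choose_response(const struct sd_state *s,
					const struct transport_hint *t)
{
	if (s->in_exclusive_use)
		return SD_REFUSE;
	if (t->link_down_needs_spindown)
		return SD_DRAIN_AND_SPIN_DOWN;
	if (s->write_cache_dirty)
		return SD_DRAIN_CACHE;
	return SD_DO_NOTHING;
}
```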

> (A) would leave us in an awkward "half-suspended" state. Is the device
> suspended or not? It is, in the sense that now the link can safely be
> suspended. But it isn't, in the sense that a system sleep would still
> require the drive to be spun down.
>
> It's kind of like the state we have following a PMSG_FREEZE --
> quiescent but not suspended. Somehow this extra state needs to be
> incorporated into the autosuspend framework.

Why? Unless the device can be skipped for purposes of autosuspend and
system sleep, isn't it active?

Regards
Oliver

2008-08-25 18:39:29

by Alan Stern

Subject: Re: [linux-pm] Power management for SCSI

On Mon, 25 Aug 2008, Oliver Neukum wrote:

> > possible then the link would probably want to have remote wakeup
> > enabled before autosuspending, even if none of the devices currently
> > attached actually wants to use it.
>
> That supposes it doesn't matter in terms of power use. Is that true?

I don't know -- it would depend on the particular transport. In any
case, it's a decision the transport class can make.

> > So sd.c might, in theory, want to respond in two different ways to an
> > autosuspend request:
> >
> > (A) Drain the cache,
> >
> > (B) Drain the cache and spin down the drive.
>
> (C) Do nothing

Possibly.

> (D) Refuse (e.g. the user has opened a block device and used a vendor
> specific command)

Also possible, although I don't think your example is a good one since
sd.c wouldn't be aware of vendor-specific commands.

> > How does it know which to do? Ask the transport class for help
> > choosing?
>
> I see no other way.
>
> > (A) would leave us in an awkward "half-suspended" state. Is the device
> > suspended or not? It is, in the sense that now the link can safely be
> > suspended. But it isn't, in the sense that a system sleep would still
> > require the drive to be spun down.
> >
> > It's kind of like the state we have following a PMSG_FREEZE --
> > quiescent but not suspended. Somehow this extra state needs to be
> > incorporated into the autosuspend framework.
>
> Why? Unless the device can be skipped for purposes of autosuspend and
> system sleep, isn't it active?

To my mind, if the driver has to do something special to prepare for
the link going down (such as draining the cache), then afterward the
device is in a special state -- not the same as the active state. The
difference between the two states is that in one the link may be
autosuspended and in the other it mustn't.

I see the driver making the transition between these states in response
to autosuspend and autoresume calls.

This means a driver such as sd.c has to respond in different sorts of
ways to various autosuspend scenarios, either doing a real power-down
or merely preparing for the link to go down. The implication is that
we might want to send the driver two different autosuspend calls: One
to prepare for the link to go down (after, say, a couple of seconds of
idleness) and another to power-down the device (after, say, 15 minutes
of idleness).

Thus, there would be two "autosuspended" states: a shallow autosuspend
(cache is drained) and a deep autosuspend (disk is spun down). Such an
approach could be made to work, even though it seems slightly
artificial.
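
A sketch of the two-level idea, with the example delays from above (a
couple of seconds, 15 minutes) hard-coded as assumptions. The enum and
function names are hypothetical and not part of any posted patch:

```c
#include <stdbool.h>

/* Hypothetical power levels for a disk under two-stage autosuspend. */
enum pm_level {
	PM_ACTIVE,	/* link must stay up */
	PM_SHALLOW,	/* cache drained, link may be autosuspended */
	PM_DEEP,	/* disk spun down */
};

#define SHALLOW_DELAY_SECS	2		/* "a couple of seconds" */
#define DEEP_DELAY_SECS		(15 * 60)	/* "15 minutes" */

/* Map the time since last use to the level the device should be in. */
static enum pm_level level_for_idle(unsigned int idle_secs)
{
	if (idle_secs >= DEEP_DELAY_SECS)
		return PM_DEEP;
	if (idle_secs >= SHALLOW_DELAY_SECS)
		return PM_SHALLOW;
	return PM_ACTIVE;
}

/* The link may go down in either autosuspended state, never when active. */
static bool link_may_autosuspend(enum pm_level level)
{
	return level != PM_ACTIVE;
}
```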

Alan Stern