2003-06-05 13:33:38

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: [PATCH] IDE Power Management, try 2

Ok, Here's the new one.

However, I'm not completely happy with it yet. Bart, as you suggested, I
moved the state value to drive in order to keep a normal taskfile struct
in rq->special. But that has a drawback: The request beeing re-fetched
from the queue on each step, it goes through start_request for each of
them. So I needed a way to know when is the "first pass" so I could
initialize drive->pm_step properly. I don't want to initialize it
prior to queuing the request as it would be racy. In fact, that pm_step
is really a property of the request itself.

I had to add yet another rq->flags bit for that, and I think that sucks

Also, currently, I'm not passing the "state" argument of the PM callback
to the PM request. That argument would be needed if we ever wanted to
distinguish between S1,S2,S3,S4.... For example, that could be used to
skip the STANDBY pass on suspend-to-disk to avoid the disk spinning down
and back up (suspend-to-disk is slow enough already ;)

So I'm considering going back to putting some custom pm state structure
in rq->special, but having this structure hold a taskfile structure as
it's first entry so it is transparent to the taskfile interrupt
handlers. It's not the prettiest thing, but unless we add more fields
to struct request (which would be an option too since those PM requests
could be used by _any_ block driver to implement proper power
management).

Jens, Bart, what do you think ? Should I add pm_step & pm_state to
struct request ? Do the "extended taskfile structure" thing ? Or just
keep things like they are in this new patch and forget about carrying
the PM state value ?

I also added another rq->flags bit for requests forced at the head of
the queue with ide_preempt. This is typically for sense requests done
by ide-cd (though I also spotted a user in the tcq stuff). I need that
to make sure that if such a request ever happens to be pushed in front
of the current PM request (with the drive->blocked flag already set),
we don't enter an endless loop, fetching that new request and dropping
it right away because we only accept PM requests from the queue once
the drive is suspended.

Regards,
Ben.

===== drivers/ide/ide-cd.c 1.49 vs edited =====
--- 1.49/drivers/ide/ide-cd.c Sun Jun 1 21:55:45 2003
+++ edited/drivers/ide/ide-cd.c Thu Jun 5 15:31:25 2003
@@ -3253,6 +3253,44 @@

static int ide_cdrom_attach (ide_drive_t *drive);

+/* Power Management state machine.
+ *
+ * We don't do much for CDs right now
+ */
+
+
+static void ide_cdrom_complete_power_step (ide_drive_t *drive, struct request *rq, u8 stat, u8 error)
+{
+}
+
+static ide_startstop_t ide_cdrom_start_power_step (ide_drive_t *drive, struct request *rq)
+{
+ ide_task_t *args = rq->special;
+
+ memset(args, 0, sizeof(ide_task_t));
+
+ switch(drive->pm_step) {
+ case ide_pm_state_start_suspend:
+ break;
+
+ case ide_pm_state_start_resume: /* Resume step 1 (restore DMA) */
+ /* Right now, all we do is call ide_dma_check for the HWIF,
+ * we could be smarter and check for current xfer_speed
+ * in struct drive etc...
+ * Also, this step could be implemented as a generic helper
+ * as most subdrivers will use it
+ */
+ if (!drive->id || !(drive->id->capability & 1))
+ break;
+ if (HWIF(drive)->ide_dma_check == NULL)
+ break;
+ HWIF(drive)->ide_dma_check(drive);
+ break;
+ }
+ drive->pm_step = ide_pm_state_completed;
+ return ide_stopped;
+}
+
static ide_driver_t ide_cdrom_driver = {
.owner = THIS_MODULE,
.name = "ide-cdrom",
@@ -3269,6 +3307,12 @@
.capacity = ide_cdrom_capacity,
.attach = ide_cdrom_attach,
.drives = LIST_HEAD_INIT(ide_cdrom_driver.drives),
+ .start_power_step = ide_cdrom_start_power_step,
+ .complete_power_step = ide_cdrom_complete_power_step,
+ .gen_driver = {
+ .suspend = generic_ide_suspend,
+ .resume = generic_ide_resume,
+ }
};

static int idecd_open(struct inode * inode, struct file * file)
===== drivers/ide/ide-disk.c 1.45 vs edited =====
--- 1.45/drivers/ide/ide-disk.c Sun Jun 1 21:55:48 2003
+++ edited/drivers/ide/ide-disk.c Thu Jun 5 15:31:59 2003
@@ -138,8 +138,6 @@

#ifndef CONFIG_IDE_TASKFILE_IO

-static int driver_blocked;
-
/*
* read_intr() is the handler for disk read/multread interrupts
*/
@@ -364,9 +362,6 @@

nsectors.all = (u16) rq->nr_sectors;

- if (driver_blocked)
- panic("Request while ide driver is blocked?");
-
if (drive->using_tcq && idedisk_start_tag(drive, rq)) {
if (!ata_pending_commands(drive))
BUG();
@@ -1392,21 +1387,6 @@
return 0;
}

-static int call_idedisk_standby (ide_drive_t *drive, int arg)
-{
- ide_task_t args;
- u8 standby = (arg) ? WIN_STANDBYNOW2 : WIN_STANDBYNOW1;
- memset(&args, 0, sizeof(ide_task_t));
- args.tfRegister[IDE_COMMAND_OFFSET] = standby;
- args.command_type = ide_cmd_type_parser(&args);
- return ide_raw_taskfile(drive, &args, NULL);
-}
-
-static int do_idedisk_standby (ide_drive_t *drive)
-{
- return call_idedisk_standby(drive, 0);
-}
-
static int do_idedisk_flushcache (ide_drive_t *drive)
{
ide_task_t args;
@@ -1505,37 +1485,71 @@
#endif
}

-static int idedisk_suspend(struct device *dev, u32 state, u32 level)
-{
- ide_drive_t *drive = dev->driver_data;
-
- printk("Suspending device %p\n", dev->driver_data);

- /* I hope that every freeze operation from the upper levels have
- * already been done...
- */
+/* Power Management state machine. This one is rather trivial for now,
+ * we should probably add more, like switching back to PIO on suspend
+ * to help some BIOSes, re-do the door locking on resume, etc...
+ */

- if (level != SUSPEND_SAVE_STATE)
- return 0;
+enum {
+ idedisk_pm_flush_cache = ide_pm_state_start_suspend,
+ idedisk_pm_standby,

- /* set the drive to standby */
- printk(KERN_INFO "suspending: %s ", drive->name);
- do_idedisk_standby(drive);
- drive->blocked = 1;
+ idedisk_pm_restore_dma = ide_pm_state_start_resume,
+};

- BUG_ON (HWGROUP(drive)->handler);
- return 0;
+static void idedisk_complete_power_step (ide_drive_t *drive, struct request *rq, u8 stat, u8 error)
+{
+ switch(drive->pm_step) {
+ case idedisk_pm_flush_cache: /* Suspend step 1 (flush cache) complete */
+ drive->pm_step = idedisk_pm_standby;
+ break;
+ case idedisk_pm_standby: /* Suspend step 2 (standby) complete */
+ drive->pm_step = ide_pm_state_completed;
+ break;
+ }
}

-static int idedisk_resume(struct device *dev, u32 level)
+static ide_startstop_t idedisk_start_power_step (ide_drive_t *drive, struct request *rq)
{
- ide_drive_t *drive = dev->driver_data;
+ ide_task_t *args = rq->special;

- if (level != RESUME_RESTORE_STATE)
- return 0;
- BUG_ON(!drive->blocked);
- drive->blocked = 0;
- return 0;
+ memset(args, 0, sizeof(ide_task_t));
+
+ switch(drive->pm_step) {
+ case idedisk_pm_flush_cache: /* Suspend step 1 (flush cache) */
+ /* Not supported ? switch to next step now */
+ if (!drive->wcache) {
+ idedisk_complete_power_step(drive, rq, 0, 0);
+ return ide_stopped;
+ }
+ if (drive->id->cfs_enable_2 & 0x2400)
+ args->tfRegister[IDE_COMMAND_OFFSET] = WIN_FLUSH_CACHE_EXT;
+ else
+ args->tfRegister[IDE_COMMAND_OFFSET] = WIN_FLUSH_CACHE;
+ args->command_type = ide_cmd_type_parser(args);
+ return do_rw_taskfile(drive, args);
+ case idedisk_pm_standby: /* Suspend step 2 (standby) */
+ args->tfRegister[IDE_COMMAND_OFFSET] = WIN_STANDBYNOW1;
+ args->command_type = ide_cmd_type_parser(args);
+ return do_rw_taskfile(drive, args);
+
+ case idedisk_pm_restore_dma: /* Resume step 1 (restore DMA) */
+ /* Right now, all we do is call ide_dma_check for the HWIF,
+ * we could be smarter and check for current xfer_speed
+ * in struct drive etc...
+ * Also, this step could be implemented as a generic helper
+ * as most subdrivers will use it
+ */
+ if (!drive->id || !(drive->id->capability & 1))
+ break;
+ if (HWIF(drive)->ide_dma_check == NULL)
+ break;
+ HWIF(drive)->ide_dma_check(drive);
+ break;
+ }
+ drive->pm_step = ide_pm_state_completed;
+ return ide_stopped;
}

static void idedisk_setup (ide_drive_t *drive)
@@ -1681,9 +1695,11 @@
.proc = idedisk_proc,
.attach = idedisk_attach,
.drives = LIST_HEAD_INIT(idedisk_driver.drives),
+ .start_power_step = idedisk_start_power_step,
+ .complete_power_step = idedisk_complete_power_step,
.gen_driver = {
- .suspend = idedisk_suspend,
- .resume = idedisk_resume,
+ .suspend = generic_ide_suspend,
+ .resume = generic_ide_resume,
}
};

===== drivers/ide/ide-io.c 1.11 vs edited =====
--- 1.11/drivers/ide/ide-io.c Mon May 12 02:09:46 2003
+++ edited/drivers/ide/ide-io.c Thu Jun 5 15:32:44 2003
@@ -139,6 +139,38 @@
EXPORT_SYMBOL(ide_end_request);

/**
+ * ide_complete_pm_request - end the current Power Management request
+ * @drive: target drive
+ * @rq: request
+ *
+ * This function cleans up the current PM request and stops the queue
+ * if necessary.
+ */
+
+static void ide_complete_pm_request (ide_drive_t *drive, struct request *rq)
+{
+ unsigned long flags;
+ int suspend = ((rq->flags & REQ_PM_SUSPEND) != 0);
+
+#ifdef DEBUG_PM
+ printk("%s: completing PM request, suspend: %d\n", drive->name, suspend);
+#endif
+ spin_lock_irqsave(&ide_lock, flags);
+ if (suspend)
+ __blk_stop_queue(&drive->queue);
+ else
+ drive->blocked = 0;
+ blkdev_dequeue_request(rq);
+ HWGROUP(drive)->rq = NULL;
+ end_that_request_last(rq);
+ spin_unlock_irqrestore(&ide_lock, flags);
+ /* Hrm... there is no __blk_start_queue... */
+ if (!suspend)
+ blk_start_queue(&drive->queue);
+}
+
+
+/**
* ide_end_drive_cmd - end an explicit drive command
* @drive: command
* @stat: status bits
@@ -214,6 +246,15 @@
args->hobRegister[IDE_HCYL_OFFSET_HOB] = hwif->INB(IDE_HCYL_REG);
}
}
+ } else if (blk_request_is_pm(rq)) {
+#ifdef DEBUG_PM
+ printk("%s: complete_power_step(step: %d, stat: %x, err: %x)\n",
+ drive->name, drive->pm_step, stat, err);
+#endif
+ DRIVER(drive)->complete_power_step(drive, rq, stat, err);
+ if (drive->pm_step == ide_pm_state_completed)
+ ide_complete_pm_request(drive, rq);
+ return;
}

spin_lock_irqsave(&ide_lock, flags);
@@ -615,7 +656,37 @@
while ((read_timer() - HWIF(drive)->last_time) < DISK_RECOVERY_TIME);
#endif

+ if ((rq->flags & REQ_PM_SUSPEND) && !(rq->flags & REQ_PM_STARTED)) {
+ /* Mark drive blocked when starting the suspend sequence */
+ drive->blocked = 1;
+ drive->pm_step = ide_pm_state_start_suspend;
+ rq->flags |= REQ_PM_STARTED;
+ } else if ((rq->flags & REQ_PM_RESUME) && !(rq->flags & REQ_PM_STARTED)) {
+ int rc;
+
+ drive->pm_step = ide_pm_state_start_resume;
+ rq->flags |= REQ_PM_STARTED;
+
+ /* Te first thing we do on wakeup is to wait for BSY bit to go
+ * away (with a looong timeout) as a drive on this hwif may just
+ * be POSTing itself.
+ * We do that before even selecting as the "other" device on the
+ * bus may be broken enough to walk on our toes at this point.
+ */
+#ifdef DEBUG_PM
+ printk("%s: Wakeup request inited, waiting for !BSY...\n", drive->name);
+#endif
+ rc = ide_wait_not_busy(HWIF(drive), 35000);
+ if (rc)
+ printk(KERN_WARNING "%s: bus not ready on wakeup\n", drive->name);
+ SELECT_DRIVE(drive);
+ HWIF(drive)->OUTB(8, HWIF(drive)->io_ports[IDE_CONTROL_OFFSET]);
+ rc = ide_wait_not_busy(HWIF(drive), 10000);
+ if (rc)
+ printk(KERN_WARNING "%s: drive not ready on wakeup\n", drive->name);
+ }
SELECT_DRIVE(drive);
+
if (ide_wait_stat(&startstop, drive, drive->ready_stat, BUSY_STAT|DRQ_STAT, WAIT_READY)) {
printk(KERN_ERR "%s: drive not ready for command\n", drive->name);
return startstop;
@@ -625,6 +696,17 @@
return execute_drive_cmd(drive, rq);
else if (rq->flags & REQ_DRIVE_TASKFILE)
return execute_drive_cmd(drive, rq);
+ else if (blk_request_is_pm(rq)) {
+#ifdef DEBUG_PM
+ printk("%s: start_power_step(step: %d)\n", drive->name, drive->pm_step);
+#endif
+ startstop = DRIVER(drive)->start_power_step(drive, rq);
+ if (startstop == ide_stopped && drive->pm_step == ide_pm_state_completed) {
+ ide_complete_pm_request(drive, rq);
+ return ide_stopped;
+ } else
+ return startstop;
+ }
return (DRIVER(drive)->do_request(drive, rq, block));
}
return do_special(drive);
@@ -837,6 +919,26 @@
break;
}

+ /* Sanity: don't accept a request that isn't a PM request
+ * if we are currently power managed. This is very important as
+ * blk_stop_queue() doesn't prevent the elv_next_request() above
+ * to return us whatever is in the queue. Since we call ide_do_request()
+ * ourselves, we end up taking requests while the queue is blocked...
+ *
+ * We let requests forced at head of queue with ide-preempt though,
+ * I hope that doesn't happen too much, hopefully not unless the
+ * subdriver triggers such a thing in it's own PM state machine
+ */
+ if (drive->blocked && !blk_request_is_pm(rq) && !(rq->flags & REQ_PREEMPT)) {
+#ifdef DEBUG_PM
+ printk("%s: a request made it's way while we are power managing...\n", drive->name);
+#endif
+ /* We clear busy, there should be no pending ATA command at this point
+ */
+ hwgroup->busy = 0;
+ break;
+ }
+
if (!rq->bio && ata_pending_commands(drive))
break;

@@ -1282,12 +1384,16 @@
ide_hwgroup_t *hwgroup = HWGROUP(drive);
DECLARE_COMPLETION(wait);
int insert_end = 1, err;
-
+ int must_wait = (action == ide_wait || action == ide_head_wait);
+
#ifdef CONFIG_BLK_DEV_PDC4030
/*
* FIXME: there should be a drive or hwif->special
* handler that points here by default, not hacks
* in the ide-io.c code
+ *
+ * FIXME2: That code breaks power management if used with
+ * this chipset, that really doesn't belong here !
*/
if (HWIF(drive)->chipset == ide_pdc4030 && rq->buffer != NULL)
return -ENOSYS; /* special drive cmds not supported */
@@ -1301,22 +1407,23 @@
* we need to hold an extra reference to request for safe inspection
* after completion
*/
- if (action == ide_wait) {
+ if (must_wait) {
rq->ref_count++;
rq->waiting = &wait;
}

spin_lock_irqsave(&ide_lock, flags);
- if (action == ide_preempt) {
+ if (action == ide_preempt || action == ide_head_wait) {
hwgroup->rq = NULL;
insert_end = 0;
+ rq->flags |= REQ_PREEMPT;
}
__elv_add_request(&drive->queue, rq, insert_end, 0);
ide_do_request(hwgroup, IDE_NO_IRQ);
spin_unlock_irqrestore(&ide_lock, flags);

err = 0;
- if (action == ide_wait) {
+ if (must_wait) {
wait_for_completion(&wait);
if (rq->errors)
err = -EIO;
===== drivers/ide/ide-iops.c 1.17 vs edited =====
--- 1.17/drivers/ide/ide-iops.c Tue Mar 25 12:56:05 2003
+++ edited/drivers/ide/ide-iops.c Wed Jun 4 14:55:13 2003
@@ -1318,5 +1318,32 @@
return do_reset1(drive, 0);
}

+/*
+ * This function waits for the hwif to report a non-busy status
+ * see comments in probe_hwif()
+ */
+int ide_wait_not_busy(ide_hwif_t *hwif, unsigned long timeout)
+{
+ u8 stat = 0;
+
+ while(timeout--) {
+ /* Turn this into a schedule() sleep once I'm sure
+ * about locking issues (2.5 work ?)
+ */
+ mdelay(1);
+ stat = hwif->INB(hwif->io_ports[IDE_STATUS_OFFSET]);
+ if ((stat & BUSY_STAT) == 0)
+ break;
+ /* Assume a value of 0xff means nothing is connected to
+ * the interface and it doesn't implement the pull-down
+ * resistor on D7
+ */
+ if (stat == 0xff)
+ break;
+ }
+ return ((stat & BUSY_STAT) == 0) ? 0 : -EBUSY;
+}
+
+
EXPORT_SYMBOL(ide_do_reset);

===== drivers/ide/ide-probe.c 1.49 vs edited =====
--- 1.49/drivers/ide/ide-probe.c Tue May 27 02:48:43 2003
+++ edited/drivers/ide/ide-probe.c Wed Jun 4 14:56:08 2003
@@ -723,35 +723,6 @@

//EXPORT_SYMBOL(hwif_register);

-/* Enable code below on all archs later, for now, I want it on PPC
- */
-#ifdef CONFIG_PPC
-/*
- * This function waits for the hwif to report a non-busy status
- * see comments in probe_hwif()
- */
-static int wait_not_busy(ide_hwif_t *hwif, unsigned long timeout)
-{
- u8 stat = 0;
-
- while(timeout--) {
- /* Turn this into a schedule() sleep once I'm sure
- * about locking issues (2.5 work ?)
- */
- mdelay(1);
- stat = hwif->INB(hwif->io_ports[IDE_STATUS_OFFSET]);
- if ((stat & BUSY_STAT) == 0)
- break;
- /* Assume a value of 0xff means nothing is connected to
- * the interface and it doesn't implement the pull-down
- * resistor on D7
- */
- if (stat == 0xff)
- break;
- }
- return ((stat & BUSY_STAT) == 0) ? 0 : -EBUSY;
-}
-
static int wait_hwif_ready(ide_hwif_t *hwif)
{
int rc;
@@ -766,7 +737,7 @@
* I know of at least one disk who takes 31 seconds, I use 35
* here to be safe
*/
- rc = wait_not_busy(hwif, 35000);
+ rc = ide_wait_not_busy(hwif, 35000);
if (rc)
return rc;

@@ -774,20 +745,19 @@
SELECT_DRIVE(&hwif->drives[0]);
hwif->OUTB(8, hwif->io_ports[IDE_CONTROL_OFFSET]);
mdelay(2);
- rc = wait_not_busy(hwif, 10000);
+ rc = ide_wait_not_busy(hwif, 10000);
if (rc)
return rc;
SELECT_DRIVE(&hwif->drives[1]);
hwif->OUTB(8, hwif->io_ports[IDE_CONTROL_OFFSET]);
mdelay(2);
- rc = wait_not_busy(hwif, 10000);
+ rc = ide_wait_not_busy(hwif, 10000);

/* Exit function with master reselected (let's be sane) */
SELECT_DRIVE(&hwif->drives[0]);

return rc;
}
-#endif /* CONFIG_PPC */

/*
* This routine only knows how to look for drive units 0 and 1
===== drivers/ide/ide.c 1.70 vs edited =====
--- 1.70/drivers/ide/ide.c Sun Jun 1 21:50:38 2003
+++ edited/drivers/ide/ide.c Thu Jun 5 15:33:19 2003
@@ -1441,6 +1441,44 @@

EXPORT_SYMBOL(ata_attach);

+int generic_ide_suspend(struct device *dev, u32 state, u32 level)
+{
+ ide_drive_t *drive = dev->driver_data;
+ struct request rq;
+ ide_task_t args;
+
+ if (level == dev->power_state || level != SUSPEND_SAVE_STATE)
+ return 0;
+
+ memset(&rq, 0, sizeof(rq));
+ memset(&args, 0, sizeof(args));
+ rq.flags = REQ_PM_SUSPEND;
+ rq.special = &args;
+
+ return ide_do_drive_cmd(drive, &rq, ide_wait);
+}
+
+EXPORT_SYMBOL(generic_ide_suspend);
+
+int generic_ide_resume(struct device *dev, u32 level)
+{
+ ide_drive_t *drive = dev->driver_data;
+ struct request rq;
+ ide_task_t args;
+
+ if (level == dev->power_state || level != RESUME_RESTORE_STATE)
+ return 0;
+
+ memset(&rq, 0, sizeof(rq));
+ memset(&args, 0, sizeof(args));
+ rq.flags = REQ_PM_RESUME;
+ rq.special = &args;
+
+ return ide_do_drive_cmd(drive, &rq, ide_head_wait);
+}
+
+EXPORT_SYMBOL(generic_ide_resume);
+
int generic_ide_ioctl(struct block_device *bdev, unsigned int cmd,
unsigned long arg)
{
===== include/linux/blkdev.h 1.106 vs edited =====
--- 1.106/include/linux/blkdev.h Tue May 27 22:15:31 2003
+++ edited/include/linux/blkdev.h Thu Jun 5 15:33:52 2003
@@ -130,6 +130,10 @@
__REQ_DRIVE_CMD,
__REQ_DRIVE_TASK,
__REQ_DRIVE_TASKFILE,
+ __REQ_PM_SUSPEND, /* Suspend request */
+ __REQ_PM_RESUME, /* Resume request */
+ __REQ_PM_STARTED, /* Flag set when PM state machine has started */
+ __REQ_PREEMPT, /* Flag set for "ide_preempt" requests */
__REQ_NR_BITS, /* stops here */
};

@@ -151,6 +155,13 @@
#define REQ_DRIVE_CMD (1 << __REQ_DRIVE_CMD)
#define REQ_DRIVE_TASK (1 << __REQ_DRIVE_TASK)
#define REQ_DRIVE_TASKFILE (1 << __REQ_DRIVE_TASKFILE)
+#define REQ_PM_SUSPEND (1 << __REQ_PM_SUSPEND)
+#define REQ_PM_RESUME (1 << __REQ_PM_RESUME)
+#define REQ_PM_STARTED (1 << __REQ_PM_STARTED)
+#define REQ_PREEMPT (1 << __REQ_PREEMPT)
+
+#define blk_request_is_pm(rq) \
+ ((rq->flags & (REQ_PM_SUSPEND | REQ_PM_RESUME)) != 0)

#include <linux/elevator.h>

===== include/linux/ide.h 1.54 vs edited =====
--- 1.54/include/linux/ide.h Thu May 29 21:04:47 2003
+++ edited/include/linux/ide.h Thu Jun 5 15:34:36 2003
@@ -23,6 +23,8 @@
#include <asm/hdreg.h>
#include <asm/io.h>

+#define DEBUG_PM
+
/*
* This is the multiple IDE interface driver, as evolved from hd.c.
* It supports up to four IDE interfaces, on one or more IRQs (usually 14 & 15).
@@ -771,6 +773,7 @@
int forced_lun; /* if hdxlun was given at boot */
int lun; /* logical unit */
int crc_count; /* crc counter to reduce drive speed */
+ int pm_step; /* Power Management state machine step */
struct list_head list;
struct device gendev;
struct gendisk *disk;
@@ -1143,6 +1146,40 @@
#define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) return 0;
#endif

+
+/* Power Management step value (drive->pm_step).
+ *
+ * The step value starts at 0 (ide_pm_state_start_suspend) for a
+ * suspend operation or 1000 (ide_pm_state_start_resume) for a
+ * resume operation
+ *
+ * For each step, the core calls the subdriver start_power_step()
+ * first. This can return:
+ * - ide_stopped : In this case, the core calls us back again unless
+ * step have been set to ide_power_state_completed
+ * - ide_started : In this case, the channel is left busy until an
+ * async event (interrupt) occurs.
+ * Typically, start_power_step() will issue a taskfile request with
+ * do_rw_taskfile().
+ *
+ * Upon reception of the interrupt, the core will call complete_power_step()
+ * with the error code if any. This routine should update the step value
+ * and return. It should not start a new request. The core will call
+ * start_power_step for the new step value, unless step have been set to
+ * ide_power_state_completed.
+ *
+ * Subdrivers are epxected to define their own additional power
+ * steps from 1..999 for suspend and from 1000..1999 for resume,
+ * other values are reserved for future use
+ */
+
+enum {
+ ide_pm_state_completed = -1,
+ ide_pm_state_start_suspend = 0,
+ ide_pm_state_start_resume = 1000,
+};
+
+
/*
* Subdrivers support.
*/
@@ -1172,6 +1209,8 @@
int (*attach)(ide_drive_t *);
void (*ata_prebuilder)(ide_drive_t *);
void (*atapi_prebuilder)(ide_drive_t *);
+ ide_startstop_t (*start_power_step)(ide_drive_t *, struct request *);
+ void (*complete_power_step)(ide_drive_t *, struct request *, u8, u8);
struct device_driver gen_driver;
struct list_head drives;
struct list_head drivers;
@@ -1180,6 +1219,8 @@
#define DRIVER(drive) ((drive)->driver)

extern int generic_ide_ioctl(struct block_device *, unsigned, unsigned long);
+extern int generic_ide_suspend(struct device *dev, u32 state, u32 level);
+extern int generic_ide_resume(struct device *dev, u32 level);

/*
* IDE modules.
@@ -1321,6 +1362,7 @@
ide_wait, /* insert rq at end of list, and wait for it */
ide_next, /* insert rq immediately after current request */
ide_preempt, /* insert rq in front of current request */
+ ide_head_wait, /* insert rq in front of current request and wait for it */
ide_end /* insert rq at end of list, but don't wait for it */
} ide_action_t;

@@ -1531,6 +1573,7 @@
extern int set_transfer(ide_drive_t *, ide_task_t *);
extern int taskfile_lib_get_identify(ide_drive_t *drive, u8 *);

+extern int ide_wait_not_busy(ide_hwif_t *hwif, unsigned long timeout);
ide_startstop_t __ide_do_rw_disk(ide_drive_t *drive, struct request *rq, sector_t block);

/*



Subject: Re: [PATCH] IDE Power Management, try 2



On 5 Jun 2003, Benjamin Herrenschmidt wrote:

> Ok, Here's the new one.
>
> However, I'm not completely happy with it yet. Bart, as you suggested, I

Yeah, I am not happy too.

> moved the state value to drive in order to keep a normal taskfile struct
> in rq->special. But that has a drawback: The request beeing re-fetched
> from the queue on each step, it goes through start_request for each of
> them. So I needed a way to know when is the "first pass" so I could
> initialize drive->pm_step properly. I don't want to initialize it
> prior to queuing the request as it would be racy. In fact, that pm_step
> is really a property of the request itself.
>
> I had to add yet another rq->flags bit for that, and I think that sucks

You don't have if you use additional, default pm_state (on == 0).
This sucks too, but a bit less.

> Also, currently, I'm not passing the "state" argument of the PM callback
> to the PM request. That argument would be needed if we ever wanted to
> distinguish between S1,S2,S3,S4.... For example, that could be used to
> skip the STANDBY pass on suspend-to-disk to avoid the disk spinning down
> and back up (suspend-to-disk is slow enough already ;)
>
> So I'm considering going back to putting some custom pm state structure
> in rq->special, but having this structure hold a taskfile structure as
> it's first entry so it is transparent to the taskfile interrupt
> handlers. It's not the prettiest thing, but unless we add more fields
> to struct request (which would be an option too since those PM requests
> could be used by _any_ block driver to implement proper power
> management).
>
> Jens, Bart, what do you think ? Should I add pm_step & pm_state to
> struct request ? Do the "extended taskfile structure" thing ? Or just
> keep things like they are in this new patch and forget about carrying
> the PM state value ?

I think extending struct request is the way to go,
pm_step & pm_state or even pointer to rq_pm_struct.

> I also added another rq->flags bit for requests forced at the head of
> the queue with ide_preempt. This is typically for sense requests done
> by ide-cd (though I also spotted a user in the tcq stuff). I need that
> to make sure that if such a request ever happens to be pushed in front
> of the current PM request (with the drive->blocked flag already set),
> we don't enter an endless loop, fetching that new request and dropping
> it right away because we only accept PM requests from the queue once
> the drive is suspended.

Jens, I think generic version of ide_do_drive_cmd() would be useful for
other block devices, what do you think?

Regards,
--
Bartlomiej

2003-06-05 14:14:34

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2


> > I had to add yet another rq->flags bit for that, and I think that sucks
>
> You don't have if you use additional, default pm_state (on == 0).
> This sucks too, but a bit less.

Can you elaborate ? I'm not sure I understand what you meant

> I think extending struct request is the way to go,
> pm_step & pm_state or even pointer to rq_pm_struct.

Ok. I'll wait for Jens ack and go that way if he agrees.

Ben.

2003-06-05 14:17:19

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Thu, 2003-06-05 at 16:27, Benjamin Herrenschmidt wrote:
> > > I had to add yet another rq->flags bit for that, and I think that sucks
> >
> > You don't have if you use additional, default pm_state (on == 0).
> > This sucks too, but a bit less.
>
> Can you elaborate ? I'm not sure I understand what you meant

Forget it, my brain finally got a clue ;) Though I don't like the
solution. Adding pm_step & pm_state to struct request or a ptr to
rq_pm_struct seem the way to go to me, though I'm not sure which
of these 2 solution is the best, struct request is already a can
of worms imho... ;) If we ever need more PM fields in there, then
the pointer may be the best solution, but right know, I can't think
of any reason to add more stuffs

> > I think extending struct request is the way to go,
> > pm_step & pm_state or even pointer to rq_pm_struct.
>
> Ok. I'll wait for Jens ack and go that way if he agrees.
>
> Ben.
--
Benjamin Herrenschmidt <[email protected]>

2003-06-06 04:21:57

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Thu, Jun 05 2003, Bartlomiej Zolnierkiewicz wrote:
> > Jens, Bart, what do you think ? Should I add pm_step & pm_state to
> > struct request ? Do the "extended taskfile structure" thing ? Or just
> > keep things like they are in this new patch and forget about carrying
> > the PM state value ?
>
> I think extending struct request is the way to go,
> pm_step & pm_state or even pointer to rq_pm_struct.

Agree

> > I also added another rq->flags bit for requests forced at the head of
> > the queue with ide_preempt. This is typically for sense requests done
> > by ide-cd (though I also spotted a user in the tcq stuff). I need that
> > to make sure that if such a request ever happens to be pushed in front
> > of the current PM request (with the drive->blocked flag already set),
> > we don't enter an endless loop, fetching that new request and dropping
> > it right away because we only accept PM requests from the queue once
> > the drive is suspended.
>
> Jens, I think generic version of ide_do_drive_cmd() would be useful for
> other block devices, what do you think?

Yes very much so, scsi_ioctl also basically implements part of this
functionality.

--
Jens Axboe

2003-06-07 01:53:35

by Andre Hedrick

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2



You mean more like a "task management call".

On Thu, 5 Jun 2003, Bartlomiej Zolnierkiewicz wrote:

>
>
> On 5 Jun 2003, Benjamin Herrenschmidt wrote:
>
> > Ok, Here's the new one.
> >
> > However, I'm not completely happy with it yet. Bart, as you suggested, I
>
> Yeah, I am not happy too.
>
> > moved the state value to drive in order to keep a normal taskfile struct
> > in rq->special. But that has a drawback: The request beeing re-fetched
> > from the queue on each step, it goes through start_request for each of
> > them. So I needed a way to know when is the "first pass" so I could
> > initialize drive->pm_step properly. I don't want to initialize it
> > prior to queuing the request as it would be racy. In fact, that pm_step
> > is really a property of the request itself.
> >
> > I had to add yet another rq->flags bit for that, and I think that sucks
>
> You don't have if you use additional, default pm_state (on == 0).
> This sucks too, but a bit less.
>
> > Also, currently, I'm not passing the "state" argument of the PM callback
> > to the PM request. That argument would be needed if we ever wanted to
> > distinguish between S1,S2,S3,S4.... For example, that could be used to
> > skip the STANDBY pass on suspend-to-disk to avoid the disk spinning down
> > and back up (suspend-to-disk is slow enough already ;)
> >
> > So I'm considering going back to putting some custom pm state structure
> > in rq->special, but having this structure hold a taskfile structure as
> > it's first entry so it is transparent to the taskfile interrupt
> > handlers. It's not the prettiest thing, but unless we add more fields
> > to struct request (which would be an option too since those PM requests
> > could be used by _any_ block driver to implement proper power
> > management).
> >
> > Jens, Bart, what do you think ? Should I add pm_step & pm_state to
> > struct request ? Do the "extended taskfile structure" thing ? Or just
> > keep things like they are in this new patch and forget about carrying
> > the PM state value ?
>
> I think extending struct request is the way to go,
> pm_step & pm_state or even pointer to rq_pm_struct.
>
> > I also added another rq->flags bit for requests forced at the head of
> > the queue with ide_preempt. This is typically for sense requests done
> > by ide-cd (though I also spotted a user in the tcq stuff). I need that
> > to make sure that if such a request ever happens to be pushed in front
> > of the current PM request (with the drive->blocked flag already set),
> > we don't enter an endless loop, fetching that new request and dropping
> > it right away because we only accept PM requests from the queue once
> > the drive is suspended.
>
> Jens, I think generic version of ide_do_drive_cmd() would be useful for
> other block devices, what do you think?
>
> Regards,
> --
> Bartlomiej
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Andre Hedrick
LAD Storage Consulting Group

2003-06-10 16:01:31

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Thu, Jun 05 2003, Bartlomiej Zolnierkiewicz wrote:
> Jens, I think generic version of ide_do_drive_cmd() would be useful for
> other block devices, what do you think?

Something ala this? Completely untested, and I only did scsi_ioctl.c and
ide-io.c. iirc, scsi uses somthing similar that could be adapted too.

===== drivers/block/ll_rw_blk.c 1.173 vs edited =====
--- 1.173/drivers/block/ll_rw_blk.c Thu Jun 5 15:22:28 2003
+++ edited/drivers/block/ll_rw_blk.c Tue Jun 10 18:11:02 2003
@@ -1405,6 +1405,55 @@
spin_unlock_irqrestore(q->queue_lock, flags);
}

+/**
+ * blk_do_request - insert and issue request into block queue
+ * @bdev: target block device
+ * @rq: request to be inserted
+ * @action: where to insert request
+ *
+ * Description:
+ * Insert a request into the block queue at the end or head of the list,
+ * possibly waiting for its completion.
+ */
+int blk_do_request(struct gendisk *disk, struct request *rq, int action)
+{
+ request_queue_t *q = disk->queue;
+ DECLARE_COMPLETION(wait);
+ int err = 0;
+
+ if (!q)
+ return -ENODEV;
+
+ rq->errors = 0;
+ rq->rq_disk = disk;
+ rq->flags |= REQ_NOMERGE;
+
+ if (action & BLK_RQ_WAIT) {
+ /*
+ * we need an extra reference to the request if we want to
+ * look at it after io completion
+ */
+ rq->ref_count++;
+ rq->waiting = &wait;
+ }
+
+ elv_add_request(q, rq, action & BLK_RQ_END, 1);
+
+ generic_unplug_device(q);
+
+ if (action & BLK_RQ_WAIT) {
+ wait_for_completion(&wait);
+
+ if (rq->errors)
+ err = -EIO;
+
+ blk_put_request(rq);
+ }
+
+ return err;
+}
+
+
void drive_stat_acct(struct request *rq, int nr_sectors, int new_io)
{
int rw = rq_data_dir(rq);
===== drivers/block/scsi_ioctl.c 1.29 vs edited =====
--- 1.29/drivers/block/scsi_ioctl.c Fri May 30 20:01:51 2003
+++ edited/drivers/block/scsi_ioctl.c Tue Jun 10 18:07:20 2003
@@ -45,20 +45,11 @@
#define SCSI_SENSE_BUFFERSIZE 64
#endif

-static int blk_do_rq(request_queue_t *q, struct block_device *bdev,
- struct request *rq)
+#include <scsi/sg.h>
+
+static int sg_issue_request(struct block_device *bdev, struct request *rq)
{
char sense[SCSI_SENSE_BUFFERSIZE];
- DECLARE_COMPLETION(wait);
- int err = 0;
-
- rq->rq_disk = bdev->bd_disk;
-
- /*
- * we need an extra reference to the request, so we can look at
- * it after io completion
- */
- rq->ref_count++;

if (!rq->sense) {
memset(sense, 0, sizeof(sense));
@@ -66,20 +57,9 @@
rq->sense_len = 0;
}

- rq->flags |= REQ_NOMERGE;
- rq->waiting = &wait;
- elv_add_request(q, rq, 1, 1);
- generic_unplug_device(q);
- wait_for_completion(&wait);
-
- if (rq->errors)
- err = -EIO;
-
- return err;
+ return blk_do_request(bdev->bd_disk, rq, BLK_RQ_WAIT | BLK_RQ_END);
}

-#include <scsi/sg.h>
-
static int sg_get_version(int *p)
{
static int sg_version_num = 30527;
@@ -248,7 +228,7 @@
* (if he doesn't check that is his problem).
* N.B. a non-zero SCSI status is _not_ necessarily an error.
*/
- blk_do_rq(q, bdev, rq);
+ sg_issue_request(bdev, rq);

if (bio)
bio_unmap_user(bio, reading);
@@ -376,7 +356,7 @@
rq->data_len = bytes;
rq->flags |= REQ_BLOCK_PC;

- blk_do_rq(q, bdev, rq);
+ sg_issue_request(bdev, rq);
err = rq->errors & 0xff; /* only 8 bit SCSI status */
if (err) {
if (rq->sense_len && rq->sense) {
@@ -458,7 +438,7 @@
rq->cmd[0] = GPCMD_START_STOP_UNIT;
rq->cmd[4] = 0x02 + (close != 0);
rq->cmd_len = 6;
- err = blk_do_rq(q, bdev, rq);
+ err = sg_issue_request(bdev, rq);
blk_put_request(rq);
break;
default:
===== drivers/ide/ide-io.c 1.11 vs edited =====
--- 1.11/drivers/ide/ide-io.c Mon May 12 02:09:46 2003
+++ edited/drivers/ide/ide-io.c Tue Jun 10 18:12:45 2003
@@ -1278,10 +1278,9 @@

int ide_do_drive_cmd (ide_drive_t *drive, struct request *rq, ide_action_t action)
{
- unsigned long flags;
ide_hwgroup_t *hwgroup = HWGROUP(drive);
DECLARE_COMPLETION(wait);
- int insert_end = 1, err;
+ int blk_flags = 0;

#ifdef CONFIG_BLK_DEV_PDC4030
/*
@@ -1292,39 +1291,16 @@
if (HWIF(drive)->chipset == ide_pdc4030 && rq->buffer != NULL)
return -ENOSYS; /* special drive cmds not supported */
#endif
- rq->errors = 0;
- rq->rq_status = RQ_ACTIVE;

- rq->rq_disk = drive->disk;
+ if (action == ide_wait || action == ide_end)
+ blk_flags |= BLK_RQ_END;
+ if (action == ide_wait)
+ blk_flags |= BLK_RQ_WAIT;

- /*
- * we need to hold an extra reference to request for safe inspection
- * after completion
- */
- if (action == ide_wait) {
- rq->ref_count++;
- rq->waiting = &wait;
- }
-
- spin_lock_irqsave(&ide_lock, flags);
- if (action == ide_preempt) {
+ if (action == ide_preempt)
hwgroup->rq = NULL;
- insert_end = 0;
- }
- __elv_add_request(&drive->queue, rq, insert_end, 0);
- ide_do_request(hwgroup, IDE_NO_IRQ);
- spin_unlock_irqrestore(&ide_lock, flags);
-
- err = 0;
- if (action == ide_wait) {
- wait_for_completion(&wait);
- if (rq->errors)
- err = -EIO;
-
- blk_put_request(rq);
- }
-
- return err;
+
+ return blk_do_request(drive->disk, rq, blk_flags);
}

EXPORT_SYMBOL(ide_do_drive_cmd);
===== include/linux/blkdev.h 1.108 vs edited =====
--- 1.108/include/linux/blkdev.h Sun Jun 8 00:33:38 2003
+++ edited/include/linux/blkdev.h Tue Jun 10 18:05:49 2003
@@ -329,6 +329,12 @@
#define BLKPREP_KILL 1 /* fatal error, kill */
#define BLKPREP_DEFER 2 /* leave on queue */

+/*
+ * blk_do_request actions
+ */
+#define BLK_RQ_WAIT 1
+#define BLK_RQ_END 2
+
extern unsigned long blk_max_low_pfn, blk_max_pfn;

/*
@@ -372,6 +378,7 @@
extern struct request *blk_get_request(request_queue_t *, int, int);
extern void blk_put_request(struct request *);
extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern int blk_do_request(struct gendisk *, struct request *, int);
extern void blk_plug_device(request_queue_t *);
extern int blk_remove_plug(request_queue_t *);
extern void blk_recount_segments(request_queue_t *, struct bio *);

--
Jens Axboe

2003-06-10 16:14:17

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Tue, 2003-06-10 at 18:15, Jens Axboe wrote:
> On Thu, Jun 05 2003, Bartlomiej Zolnierkiewicz wrote:
> > Jens, I think generic version of ide_do_drive_cmd() would be useful for
> > other block devices, what do you think?
>
> Something ala this? Completely untested, and I only did scsi_ioctl.c and
> ide-io.c. iirc, scsi uses somthing similar that could be adapted too.

Hrm... you didn't keep some functionalities I added with the PM
patch here... like marking of preempt requests so I can fetch them
even when drive is blocked, and ide_head_wait...

Ben

Subject: Re: [PATCH] IDE Power Management, try 2


On 10 Jun 2003, Benjamin Herrenschmidt wrote:

> On Tue, 2003-06-10 at 18:15, Jens Axboe wrote:
> > On Thu, Jun 05 2003, Bartlomiej Zolnierkiewicz wrote:
> > > Jens, I think generic version of ide_do_drive_cmd() would be useful for
> > > other block devices, what do you think?
> >
> > Something ala this? Completely untested, and I only did scsi_ioctl.c and
> > ide-io.c. iirc, scsi uses somthing similar that could be adapted too.
>
> Hrm... you didn't keep some functionalities I added with the PM
> patch here... like marking of preempt requests so I can fetch them
> even when drive is blocked, and ide_head_wait...
>
> Ben

I've looked for users of ide_preempt in drivers/ide/ and I think
that REQ_PREEMPT can later die if we fix drivers to correctly mark
sense requests with REQ_SENSE...

--
Bartlomiej

2003-06-10 16:45:26

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Tue, 2003-06-10 at 18:36, Bartlomiej Zolnierkiewicz wrote:

> I've looked for users of ide_preempt in drivers/ide/ and I think
> that REQ_PREEMPT can later die if we fix drivers to correctly mark
> sense requests with REQ_SENSE...

Ok, I was not sure about REQ_SENSE exact meaning (I suppose the same
as REQ_PREEMPT for sense requests sent by the SCSI error handling ? :)

There's also a user in the TCQ stuff

Ben.

2003-06-10 17:14:32

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] IDE Power Management, try 2

On Tue, Jun 10 2003, Bartlomiej Zolnierkiewicz wrote:
>
> On 10 Jun 2003, Benjamin Herrenschmidt wrote:
>
> > On Tue, 2003-06-10 at 18:15, Jens Axboe wrote:
> > > On Thu, Jun 05 2003, Bartlomiej Zolnierkiewicz wrote:
> > > > Jens, I think generic version of ide_do_drive_cmd() would be useful for
> > > > other block devices, what do you think?
> > >
> > > Something ala this? Completely untested, and I only did scsi_ioctl.c and
> > > ide-io.c. iirc, scsi uses somthing similar that could be adapted too.
> >
> > Hrm... you didn't keep some functionalities I added with the PM
> > patch here... like marking of preempt requests so I can fetch them
> > even when drive is blocked, and ide_head_wait...
> >
> > Ben
>
> I've looked for users of ide_preempt in drivers/ide/ and I think
> that REQ_PREEMPT can later die if we fix drivers to correctly mark
> sense requests with REQ_SENSE...

Indeed, besides preempt is not really something you can do completely
in the block layer safely. So I'd rather kill that logic.

--
Jens Axboe