2011-05-25 20:51:21

by Mandeep Singh Baines

Subject: [PATCH] dm: pass up rotational flag

Allow the NONROT flag to be passed through to linear mappings if all
underlying devices are non-rotational. Tools like ureadahead will
schedule IOs differently based on the rotational flag.

With this patch, I see boot time go from 7.75 s to 7.46 s on my device.

Suggested-by: J. Richard Barnette <[email protected]>
Signed-off-by: Mandeep Singh Baines <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Alasdair G Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm-table.c | 28 ++++++++++++++++++++++++++++
1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cb8380c..40c6071 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1177,6 +1177,30 @@ static void dm_table_set_integrity(struct dm_table *t)
blk_get_integrity(template_disk));
}

+static int device_nonrot(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct request_queue *q = bdev_get_queue(dev->bdev);
+
+ return q && blk_queue_nonrot(q);
+}
+
+static bool dm_table_all_nonrot(struct dm_table *t)
+{
+ unsigned i = 0;
+
+ /* Ensure that all underlying devices are non-rotational. */
+ while (i < dm_table_get_num_targets(t)) {
+ struct dm_target *ti = dm_table_get_target(t, i++);
+
+ if (!ti->type->iterate_devices ||
+ !ti->type->iterate_devices(ti, device_nonrot, NULL))
+ return false;
+ }
+
+ return true;
+}
+
void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
@@ -1189,6 +1213,10 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
else
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+ if (!dm_table_all_nonrot(t))
+ queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, q);
+ else
+ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);

dm_table_set_integrity(t);

--
1.7.3.1
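
For context, the rotational hint the patch propagates is what tools such
as ureadahead consult through sysfs. A minimal userspace sketch (not part
of the original thread; the dm-0 device name is only an example):

#include <stdio.h>

/* Read /sys/block/<dev>/queue/rotational: "0" means the kernel
 * considers the device non-rotational, "1" rotational. */
int main(void)
{
	FILE *f = fopen("/sys/block/dm-0/queue/rotational", "r");
	int c = f ? fgetc(f) : -1;

	if (f)
		fclose(f);

	printf("dm-0 is %s\n", c == '0' ? "non-rotational" : "rotational");
	return 0;
}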


2011-05-26 18:23:34

by Mike Snitzer

Subject: Re: dm: pass up rotational flag

On Wed, May 25 2011 at 4:50pm -0400,
Mandeep Singh Baines <[email protected]> wrote:

> Allow the NONROT flag to be passed through to linear mappings if all
> underlying devices are non-rotational. Tools like ureadahead will
> schedule IOs differently based on the rotational flag.
>
> With this patch, I see boot time go from 7.75 s to 7.46 s on my device.
>
> Suggested-by: J. Richard Barnette <[email protected]>
> Signed-off-by: Mandeep Singh Baines <[email protected]>
> Cc: Neil Brown <[email protected]>
> Cc: Mike Snitzer <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Martin K. Petersen <[email protected]>
> Cc: Alasdair G Kergon <[email protected]>
> Cc: [email protected]

Thanks for doing this, looks good in general.

Just a few small nits:
- rename device_nonrot to device_is_nonrot
- rename dm_table_all_nonrot to dm_table_is_nonrot
- add missing whitespace in dm_table_set_restrictions
- move ti declaration outside of dm_table_is_nonrot's while loop
- dm doesn't use 'true' or 'false' bool returns even though we have
functions that return bool -- odd but best to keep consistent for now

I'll respond with a v2 that folds these changes in and adds my Signed-off-by.
---
drivers/md/dm-table.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 40c6071..a173828 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1177,28 +1177,29 @@ static void dm_table_set_integrity(struct dm_table *t)
blk_get_integrity(template_disk));
}

-static int device_nonrot(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
+static int device_is_nonrot(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
{
struct request_queue *q = bdev_get_queue(dev->bdev);

return q && blk_queue_nonrot(q);
}

-static bool dm_table_all_nonrot(struct dm_table *t)
+static bool dm_table_is_nonrot(struct dm_table *t)
{
+ struct dm_target *ti;
unsigned i = 0;

/* Ensure that all underlying devices are non-rotational. */
while (i < dm_table_get_num_targets(t)) {
- struct dm_target *ti = dm_table_get_target(t, i++);
+ ti = dm_table_get_target(t, i++);

if (!ti->type->iterate_devices ||
- !ti->type->iterate_devices(ti, device_nonrot, NULL))
- return false;
+ !ti->type->iterate_devices(ti, device_is_nonrot, NULL))
+ return 0;
}

- return true;
+ return 1;
}

void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
@@ -1213,7 +1214,8 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
else
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
- if (!dm_table_all_nonrot(t))
+
+ if (!dm_table_is_nonrot(t))
queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, q);
else
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);

2011-05-26 18:29:58

by Martin K. Petersen

Subject: Re: dm: pass up rotational flag

>>>>> "Mike" == Mike Snitzer <[email protected]> writes:

>> Allow the NONROT flag to be passed through to linear mappings if all
>> underlying devices are non-rotational. Tools like ureadahead will
>> schedule IOs differently based on the rotational flag.

Mike> Thanks for doing this, looks good in general.

I'd rather we made the rotational property part of the topology instead
of having to special-case it in DM. I seem to recall posting a patch
that did that at some point but I can't seem to find it...

--
Martin K. Petersen Oracle Linux Engineering

2011-05-26 18:35:47

by Mike Snitzer

Subject: [PATCH v2] dm: pass up non-rotational flag

Allow QUEUE_FLAG_NONROT to propagate up the device stack if all
underlying devices are non-rotational. Tools like ureadahead will
schedule IOs differently based on the rotational flag.

With this patch, I see boot time go from 7.75 s to 7.46 s on my device.

Suggested-by: J. Richard Barnette <[email protected]>
Signed-off-by: Mandeep Singh Baines <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Alasdair G Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm-table.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

v2: a last-minute edit to avoid a double negative in
dm_table_set_restrictions's dm_table_is_nonrot conditional

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cb8380c..a173828 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1177,6 +1177,31 @@ static void dm_table_set_integrity(struct dm_table *t)
blk_get_integrity(template_disk));
}

+static int device_is_nonrot(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct request_queue *q = bdev_get_queue(dev->bdev);
+
+ return q && blk_queue_nonrot(q);
+}
+
+static bool dm_table_is_nonrot(struct dm_table *t)
+{
+ struct dm_target *ti;
+ unsigned i = 0;
+
+ /* Ensure that all underlying devices are non-rotational. */
+ while (i < dm_table_get_num_targets(t)) {
+ ti = dm_table_get_target(t, i++);
+
+ if (!ti->type->iterate_devices ||
+ !ti->type->iterate_devices(ti, device_is_nonrot, NULL))
+ return 0;
+ }
+
+ return 1;
+}
+
void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
@@ -1190,6 +1215,11 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
else
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);

+ if (dm_table_is_nonrot(t))
+ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+ else
+ queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, q);
+
dm_table_set_integrity(t);

/*

2011-05-26 18:43:37

by Mike Snitzer

Subject: Re: dm: pass up rotational flag

On Thu, May 26 2011 at 2:29pm -0400,
Martin K. Petersen <[email protected]> wrote:

> >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>
> >> Allow the NONROT flag to be passed through to linear mappings if all
> >> underlying devices are non-rotational. Tools like ureadahead will
> >> schedule IOs differently based on the rotational flag.
>
> Mike> Thanks for doing this, looks good in general.
>
> I'd rather we made the rotational property part of the topology instead
> of having to special-case it in DM. I seem to recall posting a patch
> that did that at some point but I can't seem to find it...

Sure, that'd work too.

Similarly, couldn't the discard flag be made part of the topology?

Mike

2011-05-26 18:48:45

by Martin K. Petersen

Subject: Re: dm: pass up rotational flag

>>>>> "Mike" == Mike Snitzer <[email protected]> writes:

Mike> Similarly, couldn't the discard flag be made part of topology?

Actually, now that I think about it, I'm fairly sure my patch nuked the
discard queue flag because I was tired of keeping the flag and the
topology in sync. I don't think I mucked with the rotational flag.

Let me grep my patch queue and see. It must be in there somewhere...
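
The sync problem is easy to picture with a sketch (illustrative names,
not the posted code): discard support is recorded both as a queue flag
and as topology state, and nothing forces the two to agree.

/* Sketch of the duplicated discard state described above. */
struct queue_state {
	int discard_flag;			/* QUEUE_FLAG_DISCARD analogue */
	unsigned int max_discard_sectors;	/* topology analogue */
};

/* A driver that updates one field but not the other leaves these two
 * views of "does this device support discard?" disagreeing. */
static int discard_by_flag(const struct queue_state *q)
{
	return q->discard_flag;
}

static int discard_by_topology(const struct queue_state *q)
{
	return q->max_discard_sectors != 0;
}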

--
Martin K. Petersen Oracle Linux Engineering

2011-05-26 19:14:24

by Jens Axboe

Subject: Re: dm: pass up rotational flag

On 2011-05-26 20:29, Martin K. Petersen wrote:
>>>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>
>>> Allow the NONROT flag to be passed through to linear mappings if all
>>> underlying devices are non-rotational. Tools like ureadahead will
>>> schedule IOs differently based on the rotational flag.
>
> Mike> Thanks for doing this, looks good in general.
>
> I'd rather we made the rotational property part of the topology instead
> of having to special-case it in DM. I seem to recall posting a patch
> that did that at some point but I can't seem to find it...

Agree, would be a better idea.

--
Jens Axboe

2011-05-27 02:43:14

by Martin K. Petersen

Subject: Re: dm: pass up rotational flag

Here's a tentative patch set for transitioning the non_rotational,
discard and secure discard queue flags to the topology information.

The first patch introduces a new function for setting the default limits
for stacking drivers, as we discussed a few weeks ago. Wrt. naming,
I felt it was best to have a function that said "stacking". That's why
I introduced a new call instead of using blk_set_default_limits() even
though that function was originally motivated by device mapper.

The last two patches deal with non-rotational and the discard flags
respectively.

Let me know what you guys think...

--
Martin K. Petersen Oracle Linux Engineering

2011-05-27 02:43:25

by Martin K. Petersen

Subject: [PATCH 1/3] block: Introduce blk_set_stacking_limits function

Stacking driver queue limits are typically bounded exclusively by the
capabilities of the low-level devices, not by the stacking driver
itself.

Stacking drivers are typically permissive. A feature is supported unless
an incompatible bottom device causes it to be disabled. Low-level
drivers on the other hand are restrictive and want features disabled by
default. Low-level drivers explicitly enable features as part of their
device discovery process.

This patch introduces blk_set_stacking_limits() which has more liberal
metrics than the default queue limits function. This allows us to
inherit topology parameters from bottom devices without manually
tweaking the default limits in each driver prior to calling the stacking
function.

Since there is now a clear distinction between stacking and low-level
devices, blk_set_default_limits() has been modified to carry the more
conservative values that we used to manually set in
blk_queue_make_request().

Signed-off-by: Martin K. Petersen <[email protected]>
---
block/blk-settings.c | 30 ++++++++++++++++++++++--------
drivers/md/dm-table.c | 6 +++---
drivers/md/md.c | 1 +
include/linux/blkdev.h | 1 +
4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index fa1eb04..b373721 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -104,9 +104,7 @@ EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
* @lim: the queue_limits structure to reset
*
* Description:
- * Returns a queue_limit struct to its default state. Can be used by
- * stacking drivers like DM that stage table swaps and reuse an
- * existing device queue.
+ * Returns a queue_limit struct to its default state.
*/
void blk_set_default_limits(struct queue_limits *lim)
{
@@ -114,13 +112,12 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->max_integrity_segments = 0;
lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
- lim->max_sectors = BLK_DEF_MAX_SECTORS;
- lim->max_hw_sectors = INT_MAX;
+ lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
lim->max_discard_sectors = 0;
lim->discard_granularity = 0;
lim->discard_alignment = 0;
lim->discard_misaligned = 0;
- lim->discard_zeroes_data = 1;
+ lim->discard_zeroes_data = 0;
lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
lim->alignment_offset = 0;
@@ -131,6 +128,25 @@ void blk_set_default_limits(struct queue_limits *lim)
EXPORT_SYMBOL(blk_set_default_limits);

/**
+ * blk_set_stacking_limits - set default limits for stacking devices
+ * @lim: the queue_limits structure to reset
+ *
+ * Description:
+ * Returns a queue_limit struct to its default state. Should be used
+ * by stacking drivers like DM that stage table swaps and reuse an
+ * existing device queue.
+ */
+void blk_set_stacking_limits(struct queue_limits *lim)
+{
+ blk_set_default_limits(lim);
+
+ lim->max_hw_sectors = INT_MAX;
+ lim->max_sectors = BLK_DEF_MAX_SECTORS;
+ lim->discard_zeroes_data = 1;
+}
+EXPORT_SYMBOL(blk_set_stacking_limits);
+
+/**
* blk_queue_make_request - define an alternate make_request function for a device
* @q: the request queue for the device to be affected
* @mfn: the alternate make_request function
@@ -165,8 +181,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
q->nr_batching = BLK_BATCH_REQ;

blk_set_default_limits(&q->limits);
- blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
- q->limits.discard_zeroes_data = 0;

/*
* by default assume old behaviour and bounce for any highmem page
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cb8380c..35792bf 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -686,7 +686,7 @@ static int validate_hardware_logical_block_alignment(struct dm_table *table,
while (i < dm_table_get_num_targets(table)) {
ti = dm_table_get_target(table, i++);

- blk_set_default_limits(&ti_limits);
+ blk_set_stacking_limits(&ti_limits);

/* combine all target devices' limits */
if (ti->type->iterate_devices)
@@ -1107,10 +1107,10 @@ int dm_calculate_queue_limits(struct dm_table *table,
struct queue_limits ti_limits;
unsigned i = 0;

- blk_set_default_limits(limits);
+ blk_set_stacking_limits(limits);

while (i < dm_table_get_num_targets(table)) {
- blk_set_default_limits(&ti_limits);
+ blk_set_stacking_limits(&ti_limits);

ti = dm_table_get_target(table, i++);

diff --git a/drivers/md/md.c b/drivers/md/md.c
index aa640a8..1bd4c21 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4329,6 +4329,7 @@ static int md_alloc(dev_t dev, char *name)
mddev->queue->queuedata = mddev;

blk_queue_make_request(mddev->queue, md_make_request);
+ blk_set_stacking_limits(&mddev->queue->limits);

disk = alloc_disk(1 << shift);
if (!disk) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ae9091a..517247d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -822,6 +822,7 @@ extern void blk_queue_io_min(struct request_queue *q, unsigned int min);
extern void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt);
extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt);
extern void blk_set_default_limits(struct queue_limits *lim);
+extern void blk_set_stacking_limits(struct queue_limits *lim);
extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
sector_t offset);
extern int bdev_stack_limits(struct queue_limits *t, struct block_device *bdev,
--
1.7.4.4

2011-05-27 02:43:18

by Martin K. Petersen

Subject: [PATCH 2/3] block: Move non-rotational flag to queue limits

To avoid special-casing the non-rotational flag when stacking, it is
moved from the queue flags to be part of the queue limits. This allows
us to handle it like the remaining I/O topology information.

Signed-off-by: Martin K. Petersen <[email protected]>
---
block/blk-settings.c | 19 +++++++++++++++++++
block/blk-sysfs.c | 21 ++++++++++++++++++---
drivers/block/nbd.c | 2 +-
drivers/ide/ide-disk.c | 2 +-
drivers/mmc/card/queue.c | 2 +-
drivers/scsi/sd.c | 2 +-
drivers/staging/zram/zram_drv.c | 2 +-
include/linux/blkdev.h | 21 +++++++++++++--------
8 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b373721..f95760d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -124,6 +124,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->cluster = 1;
+ lim->non_rotational = 0;
}
EXPORT_SYMBOL(blk_set_default_limits);

@@ -143,6 +144,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_hw_sectors = INT_MAX;
lim->max_sectors = BLK_DEF_MAX_SECTORS;
lim->discard_zeroes_data = 1;
+ lim->non_rotational = 1;
}
EXPORT_SYMBOL(blk_set_stacking_limits);

@@ -471,6 +473,22 @@ void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
EXPORT_SYMBOL(blk_queue_io_opt);

/**
+ * blk_queue_non_rotational - set this queue as non-rotational
+ * @q: the request queue for the device
+ *
+ * Description:
+ * This setting may be used by drivers to indicate that the physical
+ * device is non-rotational (solid state device, array with
+ * non-volatile cache). Setting this may affect I/O scheduler
+ * decisions and readahead behavior.
+ */
+void blk_queue_non_rotational(struct request_queue *q)
+{
+ q->limits.non_rotational = 1;
+}
+EXPORT_SYMBOL(blk_queue_non_rotational);
+
+/**
* blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
* @t: the stacking driver (top)
* @b: the underlying device (bottom)
@@ -552,6 +570,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,

t->cluster &= b->cluster;
t->discard_zeroes_data &= b->discard_zeroes_data;
+ t->non_rotational &= b->non_rotational;

/* Physical block size a multiple of the logical block size? */
if (t->physical_block_size & (t->logical_block_size - 1)) {
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d935bd8..598d00a 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -186,6 +186,22 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
return queue_var_show(max_hw_sectors_kb, (page));
}

+static ssize_t queue_rotational_show(struct request_queue *q, char *page)
+{
+ return queue_var_show(!q->limits.non_rotational, page);
+}
+
+static ssize_t queue_rotational_store(struct request_queue *q,
+ const char *page, size_t count)
+{
+ unsigned long rotational;
+ ssize_t ret = queue_var_store(&rotational, page, count);
+
+ q->limits.non_rotational = !rotational;
+
+ return ret;
+}
+
#define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \
static ssize_t \
queue_show_##name(struct request_queue *q, char *page) \
@@ -212,7 +228,6 @@ queue_store_##name(struct request_queue *q, const char *page, size_t count) \
return ret; \
}

-QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
#undef QUEUE_SYSFS_BIT_FNS
@@ -352,8 +367,8 @@ static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {

static struct queue_sysfs_entry queue_nonrot_entry = {
.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
- .show = queue_show_nonrot,
- .store = queue_store_nonrot,
+ .show = queue_rotational_show,
+ .store = queue_rotational_store,
};

static struct queue_sysfs_entry queue_nomerges_entry = {
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index e6fc716..fd96b44 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -774,7 +774,7 @@ static int __init nbd_init(void)
/*
* Tell the block layer that we are not a rotational device
*/
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
+ blk_queue_non_rotational(disk->queue);
}

if (register_blkdev(NBD_MAJOR, "nbd")) {
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 2747980..422c558 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -682,7 +682,7 @@ static void ide_disk_setup(ide_drive_t *drive)
queue_max_sectors(q) / 2);

if (ata_id_is_ssd(id))
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+ blk_queue_non_rotational(q);

/* calculate drive capacity, and select LBA if possible */
ide_disk_get_capacity(drive);
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index c07322c..9adce86 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -127,7 +127,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
mq->req = NULL;

blk_queue_prep_rq(mq->queue, mmc_prep_request);
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue);
+ blk_queue_non_rotational(mq->queue);
if (mmc_can_erase(card)) {
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mq->queue);
mq->queue->limits.max_discard_sectors = UINT_MAX;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd0806e..7a5cf28 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2257,7 +2257,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
rot = get_unaligned_be16(&buffer[4]);

if (rot == 1)
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
+ blk_queue_non_rotational(sdkp->disk->queue);

out:
kfree(buffer);
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index aab4ec4..9bd0874 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -538,7 +538,7 @@ int zram_init_device(struct zram *zram)
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);

/* zram devices sort of resembles non-rotational disks */
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
+ blk_queue_non_rotational(zram->disk->queue);

zram->mem_pool = xv_create_pool();
if (!zram->mem_pool) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 517247d..52a3f4c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -258,6 +258,8 @@ struct queue_limits {
unsigned char discard_misaligned;
unsigned char cluster;
unsigned char discard_zeroes_data;
+
+ unsigned char non_rotational;
};

struct request_queue
@@ -396,13 +398,11 @@ struct request_queue
#define QUEUE_FLAG_SAME_COMP 9 /* force complete on same CPU */
#define QUEUE_FLAG_FAIL_IO 10 /* fake timeout */
#define QUEUE_FLAG_STACKABLE 11 /* supports request stacking */
-#define QUEUE_FLAG_NONROT 12 /* non-rotational device (SSD) */
-#define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */
-#define QUEUE_FLAG_IO_STAT 13 /* do IO stats */
-#define QUEUE_FLAG_DISCARD 14 /* supports DISCARD */
-#define QUEUE_FLAG_NOXMERGES 15 /* No extended merges */
-#define QUEUE_FLAG_ADD_RANDOM 16 /* Contributes to random pool */
-#define QUEUE_FLAG_SECDISCARD 17 /* supports SECDISCARD */
+#define QUEUE_FLAG_IO_STAT 12 /* do IO stats */
+#define QUEUE_FLAG_DISCARD 13 /* supports DISCARD */
+#define QUEUE_FLAG_NOXMERGES 14 /* No extended merges */
+#define QUEUE_FLAG_ADD_RANDOM 15 /* Contributes to random pool */
+#define QUEUE_FLAG_SECDISCARD 16 /* supports SECDISCARD */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -479,7 +479,6 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
#define blk_queue_noxmerges(q) \
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
-#define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_stackable(q) \
@@ -821,6 +820,7 @@ extern void blk_limits_io_min(struct queue_limits *limits, unsigned int min);
extern void blk_queue_io_min(struct request_queue *q, unsigned int min);
extern void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt);
extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt);
+extern void blk_queue_non_rotational(struct request_queue *q);
extern void blk_set_default_limits(struct queue_limits *lim);
extern void blk_set_stacking_limits(struct queue_limits *lim);
extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
@@ -1028,6 +1028,11 @@ static inline int bdev_io_opt(struct block_device *bdev)
return queue_io_opt(bdev_get_queue(bdev));
}

+static inline unsigned int blk_queue_nonrot(struct request_queue *q)
+{
+ return q->limits.non_rotational;
+}
+
static inline int queue_alignment_offset(struct request_queue *q)
{
if (q->limits.misaligned)
--
1.7.4.4

2011-05-27 02:43:30

by Martin K. Petersen

Subject: [PATCH 3/3] block: Move discard and secure discard flags to queue limits

Whether a device supports discard is currently stored two places:
max_discard_sectors in the queue limits and the discard request_queue
flag. Deprecate the queue flag and always use the topology.

Also move the secure discard flag to the queue limits so it can be
stacked as well.

Signed-off-by: Martin K. Petersen <[email protected]>
---
block/blk-settings.c | 3 +++
drivers/block/brd.c | 1 -
drivers/md/dm-table.c | 5 -----
drivers/mmc/card/queue.c | 4 +---
drivers/mtd/mtd_blkdevs.c | 4 +---
drivers/scsi/sd.c | 3 +--
include/linux/blkdev.h | 21 +++++++++++++--------
7 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index f95760d..feb3e40 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -118,6 +118,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->discard_alignment = 0;
lim->discard_misaligned = 0;
lim->discard_zeroes_data = 0;
+ lim->discard_secure = 0;
lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
lim->alignment_offset = 0;
@@ -144,6 +145,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_hw_sectors = INT_MAX;
lim->max_sectors = BLK_DEF_MAX_SECTORS;
lim->discard_zeroes_data = 1;
+ lim->discard_secure = 1;
lim->non_rotational = 1;
}
EXPORT_SYMBOL(blk_set_stacking_limits);
@@ -570,6 +572,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,

t->cluster &= b->cluster;
t->discard_zeroes_data &= b->discard_zeroes_data;
+ t->discard_secure &= b->discard_secure;
t->non_rotational &= b->non_rotational;

/* Physical block size a multiple of the logical block size? */
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b7f51e4..3ade4e1 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -489,7 +489,6 @@ static struct brd_device *brd_alloc(int i)
brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
brd->brd_queue->limits.max_discard_sectors = UINT_MAX;
brd->brd_queue->limits.discard_zeroes_data = 1;
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);

disk = brd->brd_disk = alloc_disk(1 << part_shift);
if (!disk)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 35792bf..b5c6a1b 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1185,11 +1185,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
*/
q->limits = *limits;

- if (!dm_table_supports_discards(t))
- queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
- else
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
-
dm_table_set_integrity(t);

/*
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 9adce86..b5c11a0 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -129,7 +129,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
blk_queue_prep_rq(mq->queue, mmc_prep_request);
blk_queue_non_rotational(mq->queue);
if (mmc_can_erase(card)) {
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mq->queue);
mq->queue->limits.max_discard_sectors = UINT_MAX;
if (card->erased_byte == 0)
mq->queue->limits.discard_zeroes_data = 1;
@@ -140,8 +139,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
card->erase_size << 9;
}
if (mmc_can_secure_erase_trim(card))
- queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD,
- mq->queue);
+ mq->queue->limits.discard_secure = 1;
}

#ifdef CONFIG_MMC_BLOCK_BOUNCE
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index a534e1f..5315163 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -408,10 +408,8 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
new->rq->queuedata = new;
blk_queue_logical_block_size(new->rq, tr->blksize);

- if (tr->discard) {
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, new->rq);
+ if (tr->discard)
new->rq->limits.max_discard_sectors = UINT_MAX;
- }

gd->queue = new->rq;

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7a5cf28..c958ac5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -499,7 +499,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)

case SD_LBP_DISABLE:
q->limits.max_discard_sectors = 0;
- queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
+ max_blocks = 0;
return;

case SD_LBP_UNMAP:
@@ -521,7 +521,6 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
}

q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9);
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);

sdkp->provisioning_mode = mode;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 52a3f4c..42a374f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -258,7 +258,7 @@ struct queue_limits {
unsigned char discard_misaligned;
unsigned char cluster;
unsigned char discard_zeroes_data;
-
+ unsigned char discard_secure;
unsigned char non_rotational;
};

@@ -399,10 +399,8 @@ struct request_queue
#define QUEUE_FLAG_FAIL_IO 10 /* fake timeout */
#define QUEUE_FLAG_STACKABLE 11 /* supports request stacking */
#define QUEUE_FLAG_IO_STAT 12 /* do IO stats */
-#define QUEUE_FLAG_DISCARD 13 /* supports DISCARD */
-#define QUEUE_FLAG_NOXMERGES 14 /* No extended merges */
-#define QUEUE_FLAG_ADD_RANDOM 15 /* Contributes to random pool */
-#define QUEUE_FLAG_SECDISCARD 16 /* supports SECDISCARD */
+#define QUEUE_FLAG_NOXMERGES 13 /* No extended merges */
+#define QUEUE_FLAG_ADD_RANDOM 14 /* Contributes to random pool */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -483,9 +481,6 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_stackable(q) \
test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
-#define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
-#define blk_queue_secdiscard(q) (blk_queue_discard(q) && \
- test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))

#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
@@ -1033,6 +1028,16 @@ static inline unsigned int blk_queue_nonrot(struct request_queue *q)
return q->limits.non_rotational;
}

+static inline unsigned int blk_queue_discard(struct request_queue *q)
+{
+ return !!q->limits.max_discard_sectors;
+}
+
+static inline unsigned int blk_queue_secdiscard(struct request_queue *q)
+{
+ return q->limits.discard_secure;
+}
+
static inline int queue_alignment_offset(struct request_queue *q)
{
if (q->limits.misaligned)
--
1.7.4.4

2011-05-27 13:03:01

by Mike Snitzer

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

On Thu, May 26 2011 at 10:42pm -0400,
Martin K. Petersen <[email protected]> wrote:

> To avoid special-casing the non-rotational flag when stacking, it is
> moved from the queue flags to be part of the queue limits. This allows
> us to handle it like the remaining I/O topology information.

blk_queue_nonrot vs blk_queue_non_rotational lends itself to a small
amount of confusion.

What about:
s/blk_queue_nonrot/blk_queue_non_rotational/
s/blk_queue_non_rotational/blk_queue_set_non_rotational/
?

Otherwise, looks great.

Acked-by: Mike Snitzer <[email protected]>

2011-05-27 13:03:58

by Mike Snitzer

Subject: Re: [PATCH 1/3] block: Introduce blk_set_stacking_limits function

On Thu, May 26 2011 at 10:42pm -0400,
Martin K. Petersen <[email protected]> wrote:

> Stacking driver queue limits are typically bounded exclusively by the
> capabilities of the low level devices, not by the stacking driver
> itself.
>
> Stacking drivers are typically permissive. A feature is supported unless
> an incompatible bottom device causes it to be disabled. Low-level
> drivers on the other hand are restrictive and want features disabled by
> default. Low-level drivers explicitly enable features as part of their
> device discovery process.
>
> This patch introduces blk_set_stacking_limits() which has more liberal
> metrics than the default queue limits function. This allows us to
> inherit topology parameters from bottom devices without manually
> tweaking the default limits in each driver prior to calling the stacking
> function.
>
> Since there is now a clear distinction between stacking and low-level
> devices, blk_set_default_limits() has been modified to carry the more
> conservative values that we used to manually set in
> blk_queue_make_request().
>
> Signed-off-by: Martin K. Petersen <[email protected]>

Acked-by: Mike Snitzer <[email protected]>

2011-05-27 13:40:07

by Mike Snitzer

Subject: Re: [PATCH 3/3] block: Move discard and secure discard flags to queue limits

On Thu, May 26 2011 at 10:42pm -0400,
Martin K. Petersen <[email protected]> wrote:

> Whether a device supports discard is currently stored two places:
> max_discard_sectors in the queue limits and the discard request_queue
> flag. Deprecate the queue flag and always use the topology.
>
> Also move the secure discard flag to the queue limits so it can be
> stacked as well.
>
> Signed-off-by: Martin K. Petersen <[email protected]>

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 35792bf..b5c6a1b 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1185,11 +1185,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> */
> q->limits = *limits;
>
> - if (!dm_table_supports_discards(t))
> - queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
> - else
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> -
> dm_table_set_integrity(t);
>
> /*

This doesn't go far enough; dm-table.c pays attention to the targets in
a table and their support for discards.

Most targets do support discards (tgt->num_discard_requests > 0). But
if any target doesn't support discards, then the entire table doesn't
support them.
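
A rough sketch of that policy (assumed shape, not the actual dm-table.c
code), built on the num_discard_requests test mentioned above:

static bool dm_table_supports_discards_sketch(struct dm_table *t)
{
	unsigned i = 0;

	while (i < dm_table_get_num_targets(t)) {
		struct dm_target *ti = dm_table_get_target(t, i++);

		/* A single target without discard support disables
		 * discards for the entire table. */
		if (!ti->num_discard_requests)
			return false;
	}

	return true;
}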

In addition, there is a patch that Alasdair just merged that allows a
target to short-circuit the normal dm_table_supports_discards() and
always claim support for discards. This is needed for the upcoming DM
thinp target (target, and table, supports discards even if none of the
target's underlying devices do).

This is queued for Alasdair's 2.6.40 pull request to Linus (which should
be happening very soon):
ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/editing/patches/dm-table-allow-targets-to-support-discards-internally.patch

Can we give this patch another cycle? (IMHO it is quite late to crank out
DM changes without risking discard support regressions.) I'd have time to
sort out the DM side of things so that it plays nice with what you've
provided.

The previous 2 patches look good to me. But if Jens is to merge those
for the current window, that means Alasdair should drop this stop-gap
patch that he queued up:
ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/editing/patches/dm-table-propagate-non-rotational-flag.patch

Thoughts?

Mike

2011-05-27 16:20:27

by Mandeep Singh Baines

Subject: Re: [PATCH 3/3] block: Move discard and secure discard flags to queue limits

Martin K. Petersen ([email protected]) wrote:
> Whether a device supports discard is currently stored two places:
> max_discard_sectors in the queue limits and the discard request_queue
> flag. Deprecate the queue flag and always use the topology.
>
> Also move the secure discard flag to the queue limits so it can be
> stacked as well.
>
> Signed-off-by: Martin K. Petersen <[email protected]>
> ---
> block/blk-settings.c | 3 +++
> drivers/block/brd.c | 1 -
> drivers/md/dm-table.c | 5 -----
> drivers/mmc/card/queue.c | 4 +---
> drivers/mtd/mtd_blkdevs.c | 4 +---
> drivers/scsi/sd.c | 3 +--
> include/linux/blkdev.h | 21 +++++++++++++--------
> 7 files changed, 19 insertions(+), 22 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f95760d..feb3e40 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -118,6 +118,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->discard_alignment = 0;
> lim->discard_misaligned = 0;
> lim->discard_zeroes_data = 0;
> + lim->discard_secure = 0;
> lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
> lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
> lim->alignment_offset = 0;
> @@ -144,6 +145,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> lim->max_hw_sectors = INT_MAX;
> lim->max_sectors = BLK_DEF_MAX_SECTORS;
> lim->discard_zeroes_data = 1;
> + lim->discard_secure = 1;
> lim->non_rotational = 1;
> }
> EXPORT_SYMBOL(blk_set_stacking_limits);
> @@ -570,6 +572,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>
> t->cluster &= b->cluster;
> t->discard_zeroes_data &= b->discard_zeroes_data;
> + t->discard_secure &= b->discard_secure;
> t->non_rotational &= b->non_rotational;
>
> /* Physical block size a multiple of the logical block size? */
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index b7f51e4..3ade4e1 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -489,7 +489,6 @@ static struct brd_device *brd_alloc(int i)
> brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
> brd->brd_queue->limits.max_discard_sectors = UINT_MAX;
> brd->brd_queue->limits.discard_zeroes_data = 1;
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
>
> disk = brd->brd_disk = alloc_disk(1 << part_shift);
> if (!disk)
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 35792bf..b5c6a1b 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1185,11 +1185,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> */
> q->limits = *limits;
>
> - if (!dm_table_supports_discards(t))

You've removed the only caller of dm_table_supports_discards here.
So you should be able to remove it also.

> - queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
> - else
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> -
> dm_table_set_integrity(t);
>
> /*
> diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
> index 9adce86..b5c11a0 100644
> --- a/drivers/mmc/card/queue.c
> +++ b/drivers/mmc/card/queue.c
> @@ -129,7 +129,6 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
> blk_queue_prep_rq(mq->queue, mmc_prep_request);
> blk_queue_non_rotational(mq->queue);
> if (mmc_can_erase(card)) {
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mq->queue);
> mq->queue->limits.max_discard_sectors = UINT_MAX;
> if (card->erased_byte == 0)
> mq->queue->limits.discard_zeroes_data = 1;
> @@ -140,8 +139,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
> card->erase_size << 9;
> }
> if (mmc_can_secure_erase_trim(card))
> - queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD,
> - mq->queue);
> + mq->queue->limits.discard_secure = 1;
> }
>
> #ifdef CONFIG_MMC_BLOCK_BOUNCE
> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
> index a534e1f..5315163 100644
> --- a/drivers/mtd/mtd_blkdevs.c
> +++ b/drivers/mtd/mtd_blkdevs.c
> @@ -408,10 +408,8 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
> new->rq->queuedata = new;
> blk_queue_logical_block_size(new->rq, tr->blksize);
>
> - if (tr->discard) {
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, new->rq);
> + if (tr->discard)
> new->rq->limits.max_discard_sectors = UINT_MAX;
> - }
>
> gd->queue = new->rq;
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 7a5cf28..c958ac5 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -499,7 +499,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
>
> case SD_LBP_DISABLE:
> q->limits.max_discard_sectors = 0;
> - queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
> + max_blocks = 0;
> return;
>
> case SD_LBP_UNMAP:
> @@ -521,7 +521,6 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
> }
>
> q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9);
> - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
>
> sdkp->provisioning_mode = mode;
> }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 52a3f4c..42a374f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -258,7 +258,7 @@ struct queue_limits {
> unsigned char discard_misaligned;
> unsigned char cluster;
> unsigned char discard_zeroes_data;
> -
> + unsigned char discard_secure;
> unsigned char non_rotational;
> };
>
> @@ -399,10 +399,8 @@ struct request_queue
> #define QUEUE_FLAG_FAIL_IO 10 /* fake timeout */
> #define QUEUE_FLAG_STACKABLE 11 /* supports request stacking */
> #define QUEUE_FLAG_IO_STAT 12 /* do IO stats */
> -#define QUEUE_FLAG_DISCARD 13 /* supports DISCARD */
> -#define QUEUE_FLAG_NOXMERGES 14 /* No extended merges */
> -#define QUEUE_FLAG_ADD_RANDOM 15 /* Contributes to random pool */
> -#define QUEUE_FLAG_SECDISCARD 16 /* supports SECDISCARD */
> +#define QUEUE_FLAG_NOXMERGES 13 /* No extended merges */
> +#define QUEUE_FLAG_ADD_RANDOM 14 /* Contributes to random pool */
>
> #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
> (1 << QUEUE_FLAG_STACKABLE) | \
> @@ -483,9 +481,6 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
> #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
> #define blk_queue_stackable(q) \
> test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
> -#define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
> -#define blk_queue_secdiscard(q) (blk_queue_discard(q) && \
> - test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))
>
> #define blk_noretry_request(rq) \
> ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
> @@ -1033,6 +1028,16 @@ static inline unsigned int blk_queue_nonrot(struct request_queue *q)
> return q->limits.non_rotational;
> }
>
> +static inline unsigned int blk_queue_discard(struct request_queue *q)
> +{
> + return !!q->limits.max_discard_sectors;
> +}
> +
> +static inline unsigned int blk_queue_secdiscard(struct request_queue *q)
> +{
> + return q->limits.discard_secure;
> +}
> +
> static inline int queue_alignment_offset(struct request_queue *q)
> {
> if (q->limits.misaligned)
> --
> 1.7.4.4
>

2011-05-31 02:20:06

by Martin K. Petersen

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

>>>>> "Mike" == Mike Snitzer <[email protected]> writes:

Mike> blk_queue_nonrot vs blk_queue_non_rotational lends itself to a
Mike> small amount of confusion.

Yeah, I just didn't feel like mucking with the existing call. But it
looks like there are only a handful of users.


Mike> What about:
Mike> s/blk_queue_nonrot/blk_queue_non_rotational/
Mike> s/blk_queue_non_rotational/blk_queue_set_non_rotational/
Mike> ?

Most of our other block layer calls take the form blk_queue_max_foo()
for setting foo and {bdev,queue}_max_foo() for querying.

So I guess the most appropriate thing would be to do something
like this?


block: Move non-rotational flag to queue limits

To avoid special-casing the non-rotational flag when stacking, it is
moved from the queue flags to be part of the queue limits. This allows
us to handle it like the remaining I/O topology information.

Also rename blk_queue_nonrot() to be consistent with block layer calling
conventions.

Signed-off-by: Martin K. Petersen <[email protected]>

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b373721..f95760d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -124,6 +124,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->cluster = 1;
+ lim->non_rotational = 0;
}
EXPORT_SYMBOL(blk_set_default_limits);

@@ -143,6 +144,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_hw_sectors = INT_MAX;
lim->max_sectors = BLK_DEF_MAX_SECTORS;
lim->discard_zeroes_data = 1;
+ lim->non_rotational = 1;
}
EXPORT_SYMBOL(blk_set_stacking_limits);

@@ -471,6 +473,22 @@ void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
EXPORT_SYMBOL(blk_queue_io_opt);

/**
+ * blk_queue_non_rotational - set this queue as non-rotational
+ * @q: the request queue for the device
+ *
+ * Description:
+ * This setting may be used by drivers to indicate that the physical
+ * device is non-rotational (solid state device, array with
+ * non-volatile cache). Setting this may affect I/O scheduler
+ * decisions and readahead behavior.
+ */
+void blk_queue_non_rotational(struct request_queue *q)
+{
+ q->limits.non_rotational = 1;
+}
+EXPORT_SYMBOL(blk_queue_non_rotational);
+
+/**
* blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
* @t: the stacking driver (top)
* @b: the underlying device (bottom)
@@ -552,6 +570,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,

t->cluster &= b->cluster;
t->discard_zeroes_data &= b->discard_zeroes_data;
+ t->non_rotational &= b->non_rotational;

/* Physical block size a multiple of the logical block size? */
if (t->physical_block_size & (t->logical_block_size - 1)) {
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d935bd8..2669b15 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -186,6 +186,22 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
return queue_var_show(max_hw_sectors_kb, (page));
}

+static ssize_t queue_rotational_show(struct request_queue *q, char *page)
+{
+ return queue_var_show(!queue_non_rotational(q), page);
+}
+
+static ssize_t queue_rotational_store(struct request_queue *q,
+ const char *page, size_t count)
+{
+ unsigned long rotational;
+ ssize_t ret = queue_var_store(&rotational, page, count);
+
+ q->limits.non_rotational = !rotational;
+
+ return ret;
+}
+
#define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \
static ssize_t \
queue_show_##name(struct request_queue *q, char *page) \
@@ -212,7 +228,6 @@ queue_store_##name(struct request_queue *q, const char *page, size_t count) \
return ret; \
}

-QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
#undef QUEUE_SYSFS_BIT_FNS
@@ -352,8 +367,8 @@ static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {

static struct queue_sysfs_entry queue_nonrot_entry = {
.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
- .show = queue_show_nonrot,
- .store = queue_store_nonrot,
+ .show = queue_rotational_show,
+ .store = queue_rotational_store,
};

static struct queue_sysfs_entry queue_nomerges_entry = {
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7c52d68..c84b539 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1961,7 +1961,7 @@ static bool cfq_should_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq)

/* We do for queues that were marked with idle window flag. */
if (cfq_cfqq_idle_window(cfqq) &&
- !(blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag))
+ !(queue_non_rotational(cfqd->queue) && cfqd->hw_tag))
return true;

/*
@@ -1986,7 +1986,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
* for devices that support queuing, otherwise we still have a problem
* with sync vs async workloads.
*/
- if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
+ if (queue_non_rotational(cfqd->queue) && cfqd->hw_tag)
return;

WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
@@ -3237,7 +3237,7 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_queue *cfqq,
}

cfqq->seek_history <<= 1;
- if (blk_queue_nonrot(cfqd->queue))
+ if (queue_non_rotational(cfqd->queue))
cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
else
cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index e6fc716..fd96b44 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -774,7 +774,7 @@ static int __init nbd_init(void)
/*
* Tell the block layer that we are not a rotational device
*/
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
+ blk_queue_non_rotational(disk->queue);
}

if (register_blkdev(NBD_MAJOR, "nbd")) {
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 2747980..422c558 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -682,7 +682,7 @@ static void ide_disk_setup(ide_drive_t *drive)
queue_max_sectors(q) / 2);

if (ata_id_is_ssd(id))
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+ blk_queue_non_rotational(q);

/* calculate drive capacity, and select LBA if possible */
ide_disk_get_capacity(drive);
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index c07322c..9adce86 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -127,7 +127,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
mq->req = NULL;

blk_queue_prep_rq(mq->queue, mmc_prep_request);
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue);
+ blk_queue_non_rotational(mq->queue);
if (mmc_can_erase(card)) {
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mq->queue);
mq->queue->limits.max_discard_sectors = UINT_MAX;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd0806e..7a5cf28 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2257,7 +2257,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
rot = get_unaligned_be16(&buffer[4]);

if (rot == 1)
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
+ blk_queue_non_rotational(sdkp->disk->queue);

out:
kfree(buffer);
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index aab4ec4..9bd0874 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -538,7 +538,7 @@ int zram_init_device(struct zram *zram)
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);

/* zram devices sort of resembles non-rotational disks */
- queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
+ blk_queue_non_rotational(zram->disk->queue);

zram->mem_pool = xv_create_pool();
if (!zram->mem_pool) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c7367ae..b32650c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -588,7 +588,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
device->in_fs_metadata = 0;
device->mode = flags;

- if (!blk_queue_nonrot(bdev_get_queue(bdev)))
+ if (!bdev_non_rotational(bdev))
fs_devices->rotating = 1;

fs_devices->open_devices++;
@@ -1619,7 +1619,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
root->fs_info->fs_devices->rw_devices++;
root->fs_info->fs_devices->total_rw_bytes += device->total_bytes;

- if (!blk_queue_nonrot(bdev_get_queue(bdev)))
+ if (!bdev_non_rotational(bdev))
root->fs_info->fs_devices->rotating = 1;

total_bytes = btrfs_super_total_bytes(&root->fs_info->super_copy);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 517247d..fb8da90 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -258,6 +258,8 @@ struct queue_limits {
unsigned char discard_misaligned;
unsigned char cluster;
unsigned char discard_zeroes_data;
+
+ unsigned char non_rotational;
};

struct request_queue
@@ -396,13 +398,11 @@ struct request_queue
#define QUEUE_FLAG_SAME_COMP 9 /* force complete on same CPU */
#define QUEUE_FLAG_FAIL_IO 10 /* fake timeout */
#define QUEUE_FLAG_STACKABLE 11 /* supports request stacking */
-#define QUEUE_FLAG_NONROT 12 /* non-rotational device (SSD) */
-#define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */
-#define QUEUE_FLAG_IO_STAT 13 /* do IO stats */
-#define QUEUE_FLAG_DISCARD 14 /* supports DISCARD */
-#define QUEUE_FLAG_NOXMERGES 15 /* No extended merges */
-#define QUEUE_FLAG_ADD_RANDOM 16 /* Contributes to random pool */
-#define QUEUE_FLAG_SECDISCARD 17 /* supports SECDISCARD */
+#define QUEUE_FLAG_IO_STAT 12 /* do IO stats */
+#define QUEUE_FLAG_DISCARD 13 /* supports DISCARD */
+#define QUEUE_FLAG_NOXMERGES 14 /* No extended merges */
+#define QUEUE_FLAG_ADD_RANDOM 15 /* Contributes to random pool */
+#define QUEUE_FLAG_SECDISCARD 16 /* supports SECDISCARD */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -479,7 +479,6 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
#define blk_queue_noxmerges(q) \
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
-#define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_stackable(q) \
@@ -821,6 +820,7 @@ extern void blk_limits_io_min(struct queue_limits *limits, unsigned int min);
extern void blk_queue_io_min(struct request_queue *q, unsigned int min);
extern void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt);
extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt);
+extern void blk_queue_non_rotational(struct request_queue *q);
extern void blk_set_default_limits(struct queue_limits *lim);
extern void blk_set_stacking_limits(struct queue_limits *lim);
extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
@@ -1028,6 +1028,16 @@ static inline int bdev_io_opt(struct block_device *bdev)
return queue_io_opt(bdev_get_queue(bdev));
}

+static inline unsigned int queue_non_rotational(struct request_queue *q)
+{
+ return q->limits.non_rotational;
+}
+
+static inline unsigned int bdev_non_rotational(struct block_device *bdev)
+{
+ return queue_non_rotational(bdev_get_queue(bdev));
+}
+
static inline int queue_alignment_offset(struct request_queue *q)
{
if (q->limits.misaligned)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d537d29..3f35457 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2110,7 +2110,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
}

if (p->bdev) {
- if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
+ if (bdev_non_rotational(p->bdev)) {
p->flags |= SWP_SOLIDSTATE;
p->cluster_next = 1 + (random32() % p->highest_bit);
}

2011-05-31 02:22:52

by Martin K. Petersen

Subject: Re: [PATCH 3/3] block: Move discard and secure discard flags to queue limits

>>>>> "Mike" == Mike Snitzer <[email protected]> writes:

Mike> Most targets do support discards (tgt->num_discard_requests > 0).
Mike> But if any target doesn't support discards then the entire table
Mike> doesn't support them.

Would you rather have the stacking policy for discard be | instead of &?
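
In blk_stack_limits() terms the two policies differ only in the
operator; a sketch, with t and b mirroring the top/bottom limits:

/* AND: the stacked device has the feature only if every bottom
 * device does -- the policy the posted patches use. */
static void stack_and(unsigned char *t, unsigned char b)
{
	*t &= b;
}

/* OR: one capable bottom device is enough -- what a DM table with a
 * short-circuiting target would want. */
static void stack_or(unsigned char *t, unsigned char b)
{
	*t |= b;
}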

--
Martin K. Petersen Oracle Linux Engineering

2011-05-31 12:49:24

by Mike Snitzer

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

On Mon, May 30 2011 at 10:19pm -0400,
Martin K. Petersen <[email protected]> wrote:

> >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>
> Mike> blk_queue_nonrot vs blk_queue_non_rotational lends itself to a
> Mike> small amount of confusion.
>
> Yeah, I just didn't feel like mucking with the existing call. But it
> looks like there are only a handful of users.
>
>
> Mike> What about:
> Mike> s/blk_queue_nonrot/blk_queue_non_rotational/
> Mike> s/blk_queue_non_rotational/blk_queue_set_non_rotational/
> Mike> ?
>
> Most of our other block layer calls take the form blk_queue_max_foo()
> for setting foo and {bdev,queue}_max_foo() for querying.
>
> So I guess the most appropriate thing to do would be to do something
> like this?
>
>
> block: Move non-rotational flag to queue limits
>
> To avoid special-casing the non-rotational flag when stacking, it is
> moved from the queue flags to be part of the queue limits. This allows
> us to handle it like the remaining I/O topology information.
>
> Also rename blk_queue_nonrot() to be consistent with block layer calling
> conventions.
>
> Signed-off-by: Martin K. Petersen <[email protected]>

Acked-by: Mike Snitzer <[email protected]>

2011-05-31 13:14:19

by Jens Axboe

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

On 2011-05-31 04:19, Martin K. Petersen wrote:
> - queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
> + blk_queue_non_rotational(disk->queue);

I don't like this part of the change. Before, it was immediately
apparent that we were setting this flag; now you have no idea what it
does. Please make that blk_queue_set_non_rotational().

Otherwise looks fine.

--
Jens Axboe

2011-05-31 14:28:28

by Martin K. Petersen

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

>>>>> "Jens" == Jens Axboe <[email protected]> writes:

Jens> On 2011-05-31 04:19, Martin K. Petersen wrote:
>> - queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
>> + blk_queue_non_rotational(disk->queue);

Jens> I don't like this part of the change. Before, it was immediately
Jens> apparent that we were setting this flag; now you have no idea
Jens> what it does. Please make that blk_queue_set_non_rotational().

I was just trying to mimic the rest of the topology calls.

How about:

blk_queue_rotational(q, BLK_QUEUE_ROTATIONAL|BLK_QUEUE_NON_ROTATIONAL)

Doing it that way would make the code clearer a few places too, I
think...
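
A sketch of what that signature could look like (these identifiers are
only Martin's proposal above, not merged code):

enum blk_rotational {
	BLK_QUEUE_ROTATIONAL,
	BLK_QUEUE_NON_ROTATIONAL,
};

static inline void blk_queue_rotational(struct request_queue *q,
					enum blk_rotational rot)
{
	q->limits.non_rotational = (rot == BLK_QUEUE_NON_ROTATIONAL);
}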

--
Martin K. Petersen Oracle Linux Engineering

2011-05-31 14:43:13

by Jens Axboe

Subject: Re: [PATCH 2/3] block: Move non-rotational flag to queue limits

On 2011-05-31 16:28, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <[email protected]> writes:
>
> Jens> On 2011-05-31 04:19, Martin K. Petersen wrote:
>>> - queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
>>> + blk_queue_non_rotational(disk->queue);
>
> Jens> I don't like this part of the change. Before it was immediately
> Jens> apparently that we were setting this flag, know you have no idea
> Jens> what it does. Please make that blk_queue_set_non_rotational().
>
> I was just trying to mimic the rest of the topology calls.
>
> How about:
>
> blk_queue_rotational(q, BLK_QUEUE_ROTATIONAL|BLK_QUEUE_NON_ROTATIONAL)
>
> Doing it that way would make the code clearer a few places too, I
> think...

I prefer having the function names be descriptive instead, but
consistency is good as well. The problem with the above approach is that
it usually requires matching defines; otherwise you don't know what the
arguments do. So let's rather fix up the other names for the next kernel.

--
Jens Axboe