Commit 34b48db66e08 (block: remove artificial max_hw_sectors cap)
caused performance regressions for streaming I/O workloads across
a range of storage from SATA disks to enterprise storage arrays.
I was unable to actually show a performance gain for any storage
I have access to. However, the patch was introduced to boost
performance on software RAID over SATA disks, and I don't have
access to such a configuration.
The compromise is to reinstate BLK_DEF_MAX_SECTORS, but to bump it
up to 2560 from 1024. This number should allow full stripe writes
to 10 data disks with a chunk size of 128KB, and at the same time
does not regress the storage I have access to.
[PATCH 1/2] Revert "block: remove artifical max_hw_sectors cap"
[PATCH 2/2] block: bump BLK_DEF_MAX_SECTORS to 2560
block/blk-settings.c | 4 +++-
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays. It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.
The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k. In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512. This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.
Signed-off-by: Jeff Moyer <[email protected]>
---
block/blk-settings.c | 4 +++-
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..b160f89 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
__func__, max_hw_sectors);
}
- limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
+ limits->max_hw_sectors = max_hw_sectors;
+ limits->max_sectors = min_t(unsigned int, max_hw_sectors,
+ BLK_DEF_MAX_SECTORS);
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 46c282f..dd73e1f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->flags & DEVFL_TKILL);
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
- blk_queue_max_hw_sectors(q, 1024);
+ blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d->bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c1..1fd459e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
+ BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
--
1.8.3.1
A value of 2560 (1280k) will accommodate a 10-data-disk stripe
write with chunk size 128k. In the testing I've done using
iozone, fio, and aio-stress across a number of different storage
devices, a value of 1280 does not show a big performance
difference from 512, but will hopefully help software RAID
setups using SATA disks, as reported by Christoph.
NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to
BLK_DEF_MAX_SECTORS. So, this patch essentially changes aeoblk to
Use a larger maximum sector size, and I did not test this.
Signed-off-by: Jeff Moyer <[email protected]>
---
include/linux/blkdev.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1fd459e1..dbc845f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1138,7 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
- BLK_DEF_MAX_SECTORS = 1024,
+ BLK_DEF_MAX_SECTORS = 2560,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
--
1.8.3.1
ACK.
On 08/13/2015 02:57 PM, Jeff Moyer wrote:
> This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
> That commit caused performance regressions for streaming I/O
> workloads on a number of different storage devices, from
> SATA disks to external RAID arrays. It also managed to
> trip up some buggy firmware in at least one drive, causing
> data corruption.
>
> The next patch will bump the default max_sectors_kb value to
> 1280, which will accommodate a 10-data-disk stripe write
> with chunk size 128k. In the testing I've done using iozone,
> fio, and aio-stress, a value of 1280 does not show a big
> performance difference from 512. This will hopefully still
> help the software RAID setup that Christoph saw the original
> performance gains with while still not regressing other
> storage configurations.
>
> Signed-off-by: Jeff Moyer <[email protected]>
> ---
> block/blk-settings.c | 4 +++-
> drivers/block/aoe/aoeblk.c | 2 +-
> include/linux/blkdev.h | 1 +
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 12600bf..b160f89 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
> __func__, max_hw_sectors);
> }
>
> - limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
> + limits->max_hw_sectors = max_hw_sectors;
> + limits->max_sectors = min_t(unsigned int, max_hw_sectors,
> + BLK_DEF_MAX_SECTORS);
> }
> EXPORT_SYMBOL(blk_limits_max_hw_sectors);
>
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index 46c282f..dd73e1f 100644
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
> WARN_ON(d->flags & DEVFL_TKILL);
> WARN_ON(d->gd);
> WARN_ON(d->flags & DEVFL_UP);
> - blk_queue_max_hw_sectors(q, 1024);
> + blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
> q->backing_dev_info.name = "aoe";
> q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
> d->bufpool = mp;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d4068c1..1fd459e1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> enum blk_default_limits {
> BLK_MAX_SEGMENTS = 128,
> BLK_SAFE_MAX_SECTORS = 255,
> + BLK_DEF_MAX_SECTORS = 1024,
> BLK_MAX_SEGMENT_SIZE = 65536,
> BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
> };
On 08/13/2015 11:57 AM, Jeff Moyer wrote:
> Commit 34b48db66e08 (block: remove artificial max_hw_sectors cap)
> caused performance regressions for streaming I/O workloads across
> a range of storage from SATA disks to enterprise storage arrays.
> I was unable to actually show a performance gain for any storage
> I have access to. However, the patch was introduced to boost
> performance on software RAID over SATA disks, and I don't have
> access to such a configuration.
>
> The compromise is to reinstate BLK_DEF_MAX_SECTORS, but to bump it
> up to 2560 from 1024. This number should allow full stripe writes
> to 10 data disks with a chunk size of 128KB, and at the same time
> does not regress the storage I have access to.
>
> [PATCH 1/2] Revert "block: remove artifical max_hw_sectors cap"
> [PATCH 2/2] block: bump BLK_DEF_MAX_SECTORS to 2560
Thanks, I added both. A bit annoying though, but better to fix up the
perf regression now.
--
Jens Axboe