The math in both blk_stack_limits() and queue_limit_alignment_offset()
assumes that a block device's io_min (aka minimum_io_size) is always a
power-of-2. Fix the math such that it works for non-power-of-2 io_min.

This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.

Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
---
block/blk-settings.c | 4 ++--
include/linux/blkdev.h | 5 ++---
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f1a1795..aa02247 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -574,7 +574,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		bottom = max(b->physical_block_size, b->io_min) + alignment;
 
 		/* Verify that top and bottom intervals line up */
-		if (max(top, bottom) & (min(top, bottom) - 1)) {
+		if (max(top, bottom) % min(top, bottom)) {
 			t->misaligned = 1;
 			ret = -1;
 		}
@@ -619,7 +619,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	/* Find lowest common alignment_offset */
 	t->alignment_offset = lcm(t->alignment_offset, alignment)
-		& (max(t->physical_block_size, t->io_min) - 1);
+		% max(t->physical_block_size, t->io_min);
 
 	/* Verify that new alignment_offset is on a logical block boundary */
 	if (t->alignment_offset & (t->logical_block_size - 1)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 038b40f..e077b92 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1279,10 +1279,9 @@ static inline int queue_alignment_offset(struct request_queue *q)
 static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
 {
 	unsigned int granularity = max(lim->physical_block_size, lim->io_min);
-	unsigned int alignment = (sector << 9) & (granularity - 1);
+	unsigned int alignment = sector_div(sector, granularity >> 9);
 
-	return (granularity + lim->alignment_offset - alignment)
-		& (granularity - 1);
+	return (granularity + lim->alignment_offset - alignment) % granularity;
 }
 
 static inline int bdev_alignment_offset(struct block_device *bdev)
--
1.8.3.1
On 10/08/2014 04:05 PM, Mike Snitzer wrote:
> The math in both blk_stack_limits() and queue_limit_alignment_offset()
> assumes that a block device's io_min (aka minimum_io_size) is always a
> power-of-2. Fix the math such that it works for non-power-of-2 io_min.
>
> This issue (of alignment_offset != 0) became apparent when testing
> dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
> 1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
> block size") unlocked the potential for alignment_offset != 0 due to
> the dm-thin-pool's io_min possibly being a non-power-of-2.
Well that sucks, AND with a mask is considerably cheaper than a MOD...
--
Jens Axboe
On Wed, Oct 08 2014 at 6:12pm -0400,
Jens Axboe <[email protected]> wrote:
> Well that sucks, AND with a mask is considerably cheaper than a MOD...
Yeah, certainly does suck (please note v2 that I just sent). The MODs
shouldn't kill us, these functions aren't called in any real hot path.
A storm at boot maybe.. or SCSI rescan but...
On 10/08/2014 04:28 PM, Mike Snitzer wrote:
> On Wed, Oct 08 2014 at 6:12pm -0400,
> Jens Axboe <[email protected]> wrote:
>
>> Well that sucks, AND with a mask is considerably cheaper than a MOD...
>
> Yeah, certainly does suck (please note v2 that I just sent). The MODs
> shouldn't kill us, these functions aren't called in any real hot path.
> A storm at boot maybe.. or SCSI rescan but...
I had it mixed up with the recent blk_max_size_offset() - you are right,
this is not in a hot path. For that case, I don't really care, it's fine.
Is v2 runtime tested?
--
Jens Axboe
On Wed, Oct 08 2014 at 6:38pm -0400,
Jens Axboe <[email protected]> wrote:
> On 10/08/2014 04:28 PM, Mike Snitzer wrote:
> > On Wed, Oct 08 2014 at 6:12pm -0400,
> > Jens Axboe <[email protected]> wrote:
> >
> >> Well that sucks, AND with a mask is considerably cheaper than a MOD...
> >
> > Yeah, certainly does suck (please note v2 that I just sent). The MODs
> > shouldn't kill us, these functions aren't called in any real hot path.
> > A storm at boot maybe.. or SCSI rescan but...
>
> I had it mixed up with the recent blk_max_size_offset() - you are right,
> this is not in a hot path. For that case, I don't really care, it's fine.
>
> Is v2 runtime tested?
Yes.
Here is the DM stack for an LVM-created dm-thin-pool (dm-5).
# lsblk /dev/skd0
NAME                       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
skd0                       252:0    0 745.3G  0 disk
├─bricks-mypool_tmeta      253:2    0  15.8G  0 lvm
│ └─bricks-mypool-tpool    253:4    0   512G  0 lvm
│   └─bricks-mypool        253:5    0   512G  0 lvm
└─bricks-mypool_tdata      253:3    0   512G  0 lvm
  └─bricks-mypool-tpool    253:4    0   512G  0 lvm
    └─bricks-mypool        253:5    0   512G  0 lvm
Before patch:
# cat /sys/block/dm-5/alignment_offset
1048576
After patch:
# cat /sys/block/dm-5/alignment_offset
0