Currently an IO size is limited to the request_queue limit max_sectors.
Limit the size of an atomic write to the queue limit atomic_write_max_sectors
value.
Signed-off-by: John Garry <[email protected]>
---
block/blk-merge.c | 12 +++++++++++-
block/blk.h | 3 +++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 0ccc251e22ff..8d4de9253fe9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
{
unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
- unsigned max_sectors = lim->max_sectors, start, end;
+ unsigned max_sectors, start, end;
+
+ /*
+ * We ignore lim->max_sectors for atomic writes simply because
+ * it may be less than bio->write_atomic_unit, which we cannot
+ * tolerate.
+ */
+ if (bio->bi_opf & REQ_ATOMIC)
+ max_sectors = lim->atomic_write_max_sectors;
+ else
+ max_sectors = lim->max_sectors;
if (lim->chunk_sectors) {
max_sectors = min(max_sectors,
diff --git a/block/blk.h b/block/blk.h
index 94e330e9c853..6f6cd5b1830a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -178,6 +178,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request *rq)
if (unlikely(op == REQ_OP_WRITE_ZEROES))
return q->limits.max_write_zeroes_sectors;
+ if (rq->cmd_flags & REQ_ATOMIC)
+ return q->limits.atomic_write_max_sectors;
+
return q->limits.max_sectors;
}
--
2.35.3
On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
> Currently an IO size is limited to the request_queue limit max_sectors.
> Limit the size of an atomic write to the queue limit atomic_write_max_sectors
> value.
>
> Signed-off-by: John Garry <[email protected]>
> ---
> block/blk-merge.c | 12 +++++++++++-
> block/blk.h | 3 +++
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 0ccc251e22ff..8d4de9253fe9 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
> {
> unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
> unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
> - unsigned max_sectors = lim->max_sectors, start, end;
> + unsigned max_sectors, start, end;
> +
> + /*
> + * We ignore lim->max_sectors for atomic writes simply because
> + * it may be less than bio->write_atomic_unit, which we cannot
> + * tolerate.
> + */
> + if (bio->bi_opf & REQ_ATOMIC)
> + max_sectors = lim->atomic_write_max_sectors;
> + else
> + max_sectors = lim->max_sectors;
I can understand the trouble that bio splitting causes for atomic writes,
since a bio may simply be split at the max_sectors boundary; however, this
change is still too fragile:
1) ->max_sectors may be set from userspace
- so this change simply overrides the userspace setting
2) otherwise ->max_sectors is the same as ->max_hw_sectors:
- then something must be wrong on the device side or driver side, because
->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
to be figured out before the device is set up
3) a too-big max_sectors may break the driver or device; for example, nvme-pci
aligns max_hw_sectors with the DMA optimized mapping size
And there might be more (better) choices:
1) make sure the atomic write limit is respected when userspace updates
->max_sectors
2) when the driver finds that the atomic write limits conflict with other
existing hardware limits, fail or resolve the conflict (such as by reducing
the atomic write unit) before the queue is started; with a single atomic
write limits update API, the conflict can be caught earlier by the block
layer too.
thanks,
Ming
On 15/12/2023 02:27, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
>> Currently an IO size is limited to the request_queue limit max_sectors.
>> Limit the size of an atomic write to the queue limit atomic_write_max_sectors
>> value.
>>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>> block/blk-merge.c | 12 +++++++++++-
>> block/blk.h | 3 +++
>> 2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 0ccc251e22ff..8d4de9253fe9 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>> {
>> unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>> unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
>> - unsigned max_sectors = lim->max_sectors, start, end;
>> + unsigned max_sectors, start, end;
>> +
>> + /*
>> + * We ignore lim->max_sectors for atomic writes simply because
>> + * it may be less than bio->write_atomic_unit, which we cannot
>> + * tolerate.
>> + */
>> + if (bio->bi_opf & REQ_ATOMIC)
>> + max_sectors = lim->atomic_write_max_sectors;
>> + else
>> + max_sectors = lim->max_sectors;
Please note that I mentioned this issue in the cover letter, so you can
see some discussion there (in case you missed it).
>
> I can understand the trouble that bio splitting causes for atomic writes,
> since a bio may simply be split at the max_sectors boundary; however, this
> change is still too fragile:
>
> 1) ->max_sectors may be set from userspace
> - so this change simply overrides the userspace setting
yes
>
> 2) otherwise ->max_sectors is the same as ->max_hw_sectors:
>
> - then something must be wrong on the device side or driver side, because
> ->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
> to be figured out before the device is set up
>
Right, so I think that it is proper to limit atomic_write_unit_max et al.
to max_hw_sectors or whatever the DMA engine/device limits are. I can make
that change.
> 3) a too-big max_sectors may break the driver or device; for example, nvme-pci
> aligns max_hw_sectors with the DMA optimized mapping size
>
> And there might be more (better) choices:
>
> 1) make sure the atomic write limit is respected when userspace updates
> ->max_sectors
I have changed my mind: I would say no, and we can treat
atomic_write_unit_max as special. Indeed, max_sectors and
atomic_write_unit_max are complementary. If someone sets max_sectors to
4KB and then tries an atomic write of 16KB, then they don't know what
they are doing.
My original idea was to dynamically limit atomic_write_unit_max et al. to
max_sectors (so that max_sectors is always respected for atomic writes).
As an alternative, how about we keep the value of atomic_write_unit_max
static, but reject an atomic write if it exceeds max_sectors? It's not
too different from dynamically limiting atomic_write_unit_max. But as
mentioned, it may be asking for trouble...
>
> 2) when the driver finds that the atomic write limits conflict with other
> existing hardware limits, fail or resolve the conflict (such as by reducing
> the atomic write unit) before the queue is started; with a single atomic
> write limits update API, the conflict can be caught earlier by the block
> layer too.
I think that we can do this, but I am not sure what other queue limits
may conflict (apart from max_sectors / max_hw_sectors). There is the max
segment size, but we just rely on a single PAGE per iovec to evaluate
atomic_write_unit_max, so that should not be an issue.
Thanks,
John