2014-10-01 13:08:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap

As we still haven't made any progress on this let me explain why
the limit does not make sense: It only applies to _FS request,
which basically have three use cases:

- metadata I/O. Generally small enough that the limit does not
matter at all.
- buffered reads/writes. We already have a self-tuning algorithm
that limits writeback size, and a readahead size tunable that
caps read sizes. Imposing another confusing limit that does
not interact with the visible tunables here is not helpful
- direct I/O. Users should get something resembling their request
as closely as possible on the write, and this is where our
stupid limitation causes the most problems.

On Sat, Sep 06, 2014 at 04:08:05PM -0700, Christoph Hellwig wrote:
> Set max_sectors to the value the drivers provides as hardware limit by
> default. Linux had proper I/O throttling for a long time and doesn't
> rely on a artifically small maximum I/O size anymore. By not limiting
> the I/O size by default we remove an annoying tuning step required for
> most Linux installation.
>
> Note that both the user, and if absolutely required the driver can still
> impose a limit for FS requests below max_hw_sectors_kb.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> block/blk-settings.c | 4 +---
> drivers/block/aoe/aoeblk.c | 2 +-
> include/linux/blkdev.h | 1 -
> 3 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f1a1795..f52c223 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
> __func__, max_hw_sectors);
> }
>
> - limits->max_hw_sectors = max_hw_sectors;
> - limits->max_sectors = min_t(unsigned int, max_hw_sectors,
> - BLK_DEF_MAX_SECTORS);
> + limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
> }
> EXPORT_SYMBOL(blk_limits_max_hw_sectors);
>
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index dd73e1f..46c282f 100644
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
> WARN_ON(d->flags & DEVFL_TKILL);
> WARN_ON(d->gd);
> WARN_ON(d->flags & DEVFL_UP);
> - blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
> + blk_queue_max_hw_sectors(q, 1024);
> q->backing_dev_info.name = "aoe";
> q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
> d->bufpool = mp;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 518b465..7e3c172 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1192,7 +1192,6 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> enum blk_default_limits {
> BLK_MAX_SEGMENTS = 128,
> BLK_SAFE_MAX_SECTORS = 255,
> - BLK_DEF_MAX_SECTORS = 1024,
> BLK_MAX_SEGMENT_SIZE = 65536,
> BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
> };
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---


Subject: RE: [PATCH] block: remove artifical max_hw_sectors cap



> -----Original Message-----
> From: [email protected] [mailto:linux-scsi-
> [email protected]] On Behalf Of Christoph Hellwig
> Sent: Wednesday, 01 October, 2014 8:08 AM
> To: Jens Axboe; [email protected]; [email protected]; Wu
> Fengguang
> Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap
>
> As we still haven't made any progress on this let me explain why
> the limit does not make sense: It only applies to _FS request,
> which basically have three use cases:
>
> - metadata I/O. Generally small enough that the limit does not
> matter at all.
> - buffered reads/writes. We already have a self-tuning algorithm
> that limits writeback size, and a readahead size tunable that
> caps read sizes. Imposing another confusing limit that does
> not interact with the visible tunables here is not helpful
> - direct I/O. Users should get something resembling their request
> as closely as possible on the write, and this is where our
> stupid limitation causes the most problems.

One supporting example: A low limit interferes with creation of
full stripe writes for RAID controllers.



> On Sat, Sep 06, 2014 at 04:08:05PM -0700, Christoph Hellwig wrote:
> > Set max_sectors to the value the drivers provides as hardware limit by
> > default. Linux had proper I/O throttling for a long time and doesn't
> > rely on a artifically small maximum I/O size anymore. By not limiting
> > the I/O size by default we remove an annoying tuning step required for
> > most Linux installation.
> >
> > Note that both the user, and if absolutely required the driver can still
> > impose a limit for FS requests below max_hw_sectors_kb.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > ---
> > block/blk-settings.c | 4 +---
> > drivers/block/aoe/aoeblk.c | 2 +-
> > include/linux/blkdev.h | 1 -
> > 3 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index f1a1795..f52c223 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits
> *limits, unsigned int max_hw_
> > __func__, max_hw_sectors);
> > }
> >
> > - limits->max_hw_sectors = max_hw_sectors;
> > - limits->max_sectors = min_t(unsigned int, max_hw_sectors,
> > - BLK_DEF_MAX_SECTORS);
> > + limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
> > }
> > EXPORT_SYMBOL(blk_limits_max_hw_sectors);

1. Documentation/block/biodoc.txt needs some updates:

blk_queue_max_sectors(q, max_sectors)
Sets two variables that limit the size of the request.

- The request queue's max_sectors, which is a soft size in
units of 512 byte sectors, and could be dynamically varied
by the core kernel.

- The request queue's max_hw_sectors, which is a hard limit
and reflects the maximum size request a driver can handle
in units of 512 byte sectors.

The default for both max_sectors and max_hw_sectors is
255. The upper limit of max_sectors is 1024.

There is no function with that name (it is now called
blk_queue_max_hw_sectors), the upper limit of max_sectors
is max_hw_sectors, and the default is misleading (255
is the default if the LLD doesn't provide max_hw_sectors).

2. Testing with hpsa and mpt3sas, this patch works as expected
for this setting. I/O sizes are still limited by max_segments,
which is expected. Something else is still limiting I/O sizes
to 1 MiB, though; probably bio_get_nr_vecs enforcing a maximum
size per bio of BIO_MAX_PAGES 256 (which is 1 MiB with 4 KiB
pages).


Otherwise,
Reviewed-by: Robert Elliott <[email protected]>
Tested-by: Robert Elliott <[email protected]>

---
Rob Elliott HP Server Storage


2014-10-01 21:13:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap

On Wed, Oct 01, 2014 at 06:59:34PM +0000, Elliott, Robert (Server Storage) wrote:
> One supporting example: A low limit interferes with creation of
> full stripe writes for RAID controllers.

Yes, both Linux mdraid and parity raids are often crippled by the
default. The mdadm default of 512kb for each chunk combined with
the default limit very much guarantees a horrible out of the box
experience.

2014-10-17 13:15:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap

On Wed, Oct 01, 2014 at 02:12:58PM -0700, Christoph Hellwig wrote:
> On Wed, Oct 01, 2014 at 06:59:34PM +0000, Elliott, Robert (Server Storage) wrote:
> > One supporting example: A low limit interferes with creation of
> > full stripe writes for RAID controllers.
>
> Yes, both Linux mdraid and parity raids are often crippled by the
> default. The mdadm default of 512kb for each chunk combined with
> the default limit very much guarantees a horrible out of the box
> experience.

Jens, can you apply this patch once the 3.19 queue opens?

2014-10-21 20:02:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap

On 10/17/2014 07:12 AM, Christoph Hellwig wrote:
> On Wed, Oct 01, 2014 at 02:12:58PM -0700, Christoph Hellwig wrote:
>> On Wed, Oct 01, 2014 at 06:59:34PM +0000, Elliott, Robert (Server Storage) wrote:
>>> One supporting example: A low limit interferes with creation of
>>> full stripe writes for RAID controllers.
>>
>> Yes, both Linux mdraid and parity raids are often crippled by the
>> default. The mdadm default of 512kb for each chunk combined with
>> the default limit very much guarantees a horrible out of the box
>> experience.
>
> Jens, can you apply this patch once the 3.19 queue opens?

Yup, lets get it applied.

--
Jens Axboe