LinuxLists.cc - [PATCH 01/11] blk-mq: Add blk_mq_init_queue

2022-03-22 12:17:06

Subject: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

Add an API to allocate a request queue which accepts a custom set of
blk_mq_ops for that request queue.

The reason which we may want custom ops is for queuing requests which we
don't want to go through the normal queuing path.

Signed-off-by: John Garry <[email protected]>
---
block/blk-mq.c | 23 +++++++++++++++++------
drivers/md/dm-rq.c | 2 +-
include/linux/blk-mq.h | 5 ++++-
3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3bf3358a3bb..8ea3447339ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3858,7 +3858,7 @@ void blk_mq_release(struct request_queue *q)
}

static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
- void *queuedata)
+ void *queuedata, const struct blk_mq_ops *ops)
{
struct request_queue *q;
int ret;
@@ -3867,27 +3867,35 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
if (!q)
return ERR_PTR(-ENOMEM);
q->queuedata = queuedata;
- ret = blk_mq_init_allocated_queue(set, q);
+ ret = blk_mq_init_allocated_queue(set, q, ops);
if (ret) {
blk_cleanup_queue(q);
return ERR_PTR(ret);
}
+
return q;
}

struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
{
- return blk_mq_init_queue_data(set, NULL);
+ return blk_mq_init_queue_data(set, NULL, NULL);
}
EXPORT_SYMBOL(blk_mq_init_queue);

+struct request_queue *blk_mq_init_queue_ops(struct blk_mq_tag_set *set,
+ const struct blk_mq_ops *custom_ops)
+{
+ return blk_mq_init_queue_data(set, NULL, custom_ops);
+}
+EXPORT_SYMBOL(blk_mq_init_queue_ops);
+
struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
struct lock_class_key *lkclass)
{
struct request_queue *q;
struct gendisk *disk;

- q = blk_mq_init_queue_data(set, queuedata);
+ q = blk_mq_init_queue_data(set, queuedata, NULL);
if (IS_ERR(q))
return ERR_CAST(q);

@@ -4010,13 +4018,16 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
}

int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
- struct request_queue *q)
+ struct request_queue *q, const struct blk_mq_ops *custom_ops)
{
WARN_ON_ONCE(blk_queue_has_srcu(q) !=
!!(set->flags & BLK_MQ_F_BLOCKING));

/* mark the queue as mq asap */
- q->mq_ops = set->ops;
+ if (custom_ops)
+ q->mq_ops = custom_ops;
+ else
+ q->mq_ops = set->ops;

q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
blk_mq_poll_stats_bkt,
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3907950a0ddc..9d93f72a3eec 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -560,7 +560,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
if (err)
goto out_kfree_tag_set;

- err = blk_mq_init_allocated_queue(md->tag_set, md->queue);
+ err = blk_mq_init_allocated_queue(md->tag_set, md->queue, NULL);
if (err)
goto out_tag_set;
return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d319ffa59354..e12d17c86c52 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -688,8 +688,11 @@ struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
__blk_mq_alloc_disk(set, queuedata, &__key); \
})
struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_init_queue_ops(struct blk_mq_tag_set *,
+ const struct blk_mq_ops *custom_ops);
+
int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
- struct request_queue *q);
+ struct request_queue *q, const struct blk_mq_ops *custom_ops);
void blk_mq_unregister_dev(struct device *, struct request_queue *);

int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set);
--
2.26.2

2022-03-22 14:38:17

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:
> Add an API to allocate a request queue which accepts a custom set of
> blk_mq_ops for that request queue.
>
> The reason which we may want custom ops is for queuing requests which we
> don't want to go through the normal queuing path.

Eww. I really do not think we should do separate ops per queue, as that
is going to get us into a deep mess eventually.

2022-03-22 15:19:57

by Hannes Reinecke

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 3/22/22 13:30, John Garry wrote:
> On 22/03/2022 12:16, Hannes Reinecke wrote:
>> On 3/22/22 12:33, John Garry wrote:
>>> On 22/03/2022 11:18, Christoph Hellwig wrote:
>>>> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:
>>>>> Add an API to allocate a request queue which accepts a custom set of
>>>>> blk_mq_ops for that request queue.
>>>>>
>>>>> The reason which we may want custom ops is for queuing requests
>>>>> which we
>>>>> don't want to go through the normal queuing path.
>>>>
>>>> Eww. I really do not think we should do separate ops per queue, as
>>>> that
>>>> is going to get us into a deep mess eventually.
>>>>
>>>
>>> Yeah... so far (here) it works out quite nicely, as we don't need to
>>> change the SCSI blk mq ops nor allocate a scsi_device - everything is
>>> just separate.
>>>
>>> The other method mentioned previously was to add the request
>>> "reserved" flag and add new paths in scsi_queue_rq() et al to handle
>>> this, but that gets messy.
>>>
>>> Any other ideas ...?
>>>
>>
>> As outlined in the other mail, I think might be useful is to have a
>> _third_ type of requests (in addition to the normal and the reserved
>> ones).
>> That one would be allocated from the normal I/O pool (and hence could
>> fail if the pool is exhausted), but would be able to carry a different
>> payload (type) than the normal requests.
>
> As mentioned in the cover letter response, it just seems best to keep
> the normal scsi_cmnd payload but have other means to add on the internal
> command data, like using host_scribble or scsi_cmnd priv data.
>
Well; I found that most drivers I had been looking at the scsi command
payload isn't used at all; the drivers primarily cared about the
(driver-provided) payload, and were completely ignoring the scsi command
payload.

Similar for ATA/libsas: you basically never issue real scsi commands,
but either 'raw' ATA requests or SCSI TMFs. None of which are scsi
commands, so providing them is a bit of a waste.

(And causes irritations, too, as a scsi command requires associated
pointers like ->device etc to be set up. Which makes it tricky to use
for the initial device setup.)

>> And we could have a separate queue_rq for these requests, as we can
>> differentiate them in the block layer.
>
> I don't know, let me think about it. Maybe we could add an "internal"
> blk flag, which uses a separate "internal" queue_rq callback.
>
Yeah, that's what I had in mind.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2022-03-22 16:37:34

by Hannes Reinecke

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 3/22/22 12:33, John Garry wrote:
> On 22/03/2022 11:18, Christoph Hellwig wrote:
>> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:
>>> Add an API to allocate a request queue which accepts a custom set of
>>> blk_mq_ops for that request queue.
>>>
>>> The reason which we may want custom ops is for queuing requests which we
>>> don't want to go through the normal queuing path.
>>
>> Eww. I really do not think we should do separate ops per queue, as that
>> is going to get us into a deep mess eventually.
>>
>
> Yeah... so far (here) it works out quite nicely, as we don't need to
> change the SCSI blk mq ops nor allocate a scsi_device - everything is
> just separate.
>
> The other method mentioned previously was to add the request "reserved"
> flag and add new paths in scsi_queue_rq() et al to handle this, but that
> gets messy.
>
> Any other ideas ...?
>

As outlined in the other mail, I think might be useful is to have a
_third_ type of requests (in addition to the normal and the reserved ones).
That one would be allocated from the normal I/O pool (and hence could
fail if the pool is exhausted), but would be able to carry a different
payload (type) than the normal requests.
And we could have a separate queue_rq for these requests, as we can
differentiate them in the block layer.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2022-03-22 17:26:01

by John Garry

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 22/03/2022 12:16, Hannes Reinecke wrote:
> On 3/22/22 12:33, John Garry wrote:
>> On 22/03/2022 11:18, Christoph Hellwig wrote:
>>> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:
>>>> Add an API to allocate a request queue which accepts a custom set of
>>>> blk_mq_ops for that request queue.
>>>>
>>>> The reason which we may want custom ops is for queuing requests
>>>> which we
>>>> don't want to go through the normal queuing path.
>>>
>>> Eww. I really do not think we should do separate ops per queue, as that
>>> is going to get us into a deep mess eventually.
>>>
>>
>> Yeah... so far (here) it works out quite nicely, as we don't need to
>> change the SCSI blk mq ops nor allocate a scsi_device - everything is
>> just separate.
>>
>> The other method mentioned previously was to add the request
>> "reserved" flag and add new paths in scsi_queue_rq() et al to handle
>> this, but that gets messy.
>>
>> Any other ideas ...?
>>
>
> As outlined in the other mail, I think might be useful is to have a
> _third_ type of requests (in addition to the normal and the reserved ones).
> That one would be allocated from the normal I/O pool (and hence could
> fail if the pool is exhausted), but would be able to carry a different
> payload (type) than the normal requests.

As mentioned in the cover letter response, it just seems best to keep
the normal scsi_cmnd payload but have other means to add on the internal
command data, like using host_scribble or scsi_cmnd priv data.

> And we could have a separate queue_rq for these requests, as we can
> differentiate them in the block layer.

I don't know, let me think about it. Maybe we could add an "internal"
blk flag, which uses a separate "internal" queue_rq callback.

Thanks,
John

2022-03-22 20:02:40

by Hannes Reinecke

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 3/22/22 16:17, John Garry wrote:
> On 22/03/2022 14:03, Hannes Reinecke wrote:
>>>
>>> As mentioned in the cover letter response, it just seems best to keep
>>> the normal scsi_cmnd payload but have other means to add on the
>>> internal command data, like using host_scribble or scsi_cmnd priv data.
>>>
>> Well; I found that most drivers I had been looking at the scsi command
>> payload isn't used at all; the drivers primarily cared about the
>> (driver-provided) payload, and were completely ignoring the scsi
>> command payload.
>>
>> Similar for ATA/libsas: you basically never issue real scsi commands,
>> but either 'raw' ATA requests or SCSI TMFs. None of which are scsi
>> commands, so providing them is a bit of a waste.
>>
>> (And causes irritations, too, as a scsi command requires associated
>> pointers like ->device etc to be set up. Which makes it tricky to use
>> for the initial device setup.)
>
> A problem I see is that in scsi_mq_init_request() we allocate memories
> like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd
> payload. If we then reuse a scsi_cmnd payload as an "internal" command
> payload then this data may be lost.
>
> It might be possible to reuse the scsi cmnd payload for the "internal",
> but I would rather not get hung up on it now if possible.
>
Or, keep the payload as is, and use the 'internal' marker to indicate
that the scsi payload is not valid.
That would save us quite some checks during endio processing.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2022-03-22 23:28:45

by John Garry

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 22/03/2022 11:18, Christoph Hellwig wrote:
> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote:
>> Add an API to allocate a request queue which accepts a custom set of
>> blk_mq_ops for that request queue.
>>
>> The reason which we may want custom ops is for queuing requests which we
>> don't want to go through the normal queuing path.
>
> Eww. I really do not think we should do separate ops per queue, as that
> is going to get us into a deep mess eventually.
>

Yeah... so far (here) it works out quite nicely, as we don't need to
change the SCSI blk mq ops nor allocate a scsi_device - everything is
just separate.

The other method mentioned previously was to add the request "reserved"
flag and add new paths in scsi_queue_rq() et al to handle this, but that
gets messy.

Any other ideas ...?

Cheers,
John

2022-03-23 09:46:05

by Bart Van Assche

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 3/22/22 03:39, John Garry wrote:
> Add an API to allocate a request queue which accepts a custom set of
> blk_mq_ops for that request queue.
>
> The reason which we may want custom ops is for queuing requests which we
> don't want to go through the normal queuing path.

Custom ops shouldn't be required for this. See e.g. how tmf_queue
is used in the UFS driver for an example of a queue implementation
with custom operations and that does not require changes of the block
layer core.

Thanks,

Bart.

2022-03-23 18:14:11

by John Garry

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 23/03/2022 02:57, Bart Van Assche wrote:
> On 3/22/22 03:39, John Garry wrote:
>> Add an API to allocate a request queue which accepts a custom set of
>> blk_mq_ops for that request queue.
>>
>> The reason which we may want custom ops is for queuing requests which we
>> don't want to go through the normal queuing path.
>

Hi Bart,

> Custom ops shouldn't be required for this. See e.g. how tmf_queue
> is used in the UFS driver for an example of a queue implementation
> with custom operations and that does not require changes of the block
> layer core.

The UFS code uses a private tagset (in ufs_hba.tmf_tag_set) for only
management of TMF tags/memories. This tagset does not really have any
custom operations. All it has is a stub of .queue_rq CB in
ufshcd_queue_tmf() and that is because this CB is compulsory.

As for the idea of having multiple tagsets per shost with real custom
operations, this idea was mentioned before, but I think managing
multiple tagsets could be trouble. For a start, it would mean that we
need a distinct allocation of reserved and regular tags, and sometimes
we don't want this - as Hannes mentioned earlier, many HBAs have low
queue depth and cannot afford to permanently carve out a bunch of
reserved tags.

Thanks,
John

2022-03-25 13:28:15

by John Garry

[permalink] [raw]

Subject: Re: [PATCH 01/11] blk-mq: Add blk_mq_init_queue_ops()

On 22/03/2022 14:03, Hannes Reinecke wrote:
>>
>> As mentioned in the cover letter response, it just seems best to keep
>> the normal scsi_cmnd payload but have other means to add on the
>> internal command data, like using host_scribble or scsi_cmnd priv data.
>>
> Well; I found that most drivers I had been looking at the scsi command
> payload isn't used at all; the drivers primarily cared about the
> (driver-provided) payload, and were completely ignoring the scsi command
> payload.
>
> Similar for ATA/libsas: you basically never issue real scsi commands,
> but either 'raw' ATA requests or SCSI TMFs. None of which are scsi
> commands, so providing them is a bit of a waste.
>
> (And causes irritations, too, as a scsi command requires associated
> pointers like ->device etc to be set up. Which makes it tricky to use
> for the initial device setup.)

A problem I see is that in scsi_mq_init_request() we allocate memories
like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd
payload. If we then reuse a scsi_cmnd payload as an "internal" command
payload then this data may be lost.

It might be possible to reuse the scsi cmnd payload for the "internal",
but I would rather not get hung up on it now if possible.

Thanks,
John