2022-03-22 14:16:33

by John Garry

Subject: [PATCH RFC 00/11] blk-mq/libata/scsi: SCSI driver tagging improvements

Currently, SCSI low-level drivers are required to manage tags for commands
which do not come via the block layer; libata internal commands are one
example.

There was some work to provide "reserved commands" support in series such
as https://lore.kernel.org/linux-scsi/[email protected]/

This was based on allocating a request for the lifetime of the "internal"
command.

This series tries to solve that problem by not just allocating the request
but also sending it through the block layer, that being the normal flow
for a request. We need to do this because completions can only be polled
for requests that go through the block layer, so it is required for poll
queue support.

There is still scope to allocate commands just to obtain a tag as a token,
which may suit some other scenarios, but that is not what we do here.

This series extends blk-mq to support a request queue having a custom set
of ops. In addition, SCSI core code adds support for this type of request.

This series does not include SCSI core handling for enabling reserved
tags per tagset, but that would be easy to add.
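
To make the custom-ops idea concrete: two request queues can draw tags from the same tag set while dispatching through different queue_rq handlers, so an internal command gets a real tag without the LLDD managing one. The following is a user-space sketch under that assumption; every model_* name is hypothetical and only loosely analogous to struct blk_mq_ops and struct blk_mq_tag_set. It is not the blk_mq_init_queue_ops() API added by this series.

```c
/* Hypothetical user-space model of a shared tag set with per-queue ops. */
#include <stdbool.h>

#define MODEL_DEPTH 8

enum model_path { MODEL_PATH_SCSI, MODEL_PATH_INTERNAL };

struct model_tag_set {
	bool busy[MODEL_DEPTH];   /* one tag space, shared by all queues */
};

struct model_request {
	int tag;
};

/* Per-queue dispatch table, loosely analogous to struct blk_mq_ops. */
struct model_ops {
	enum model_path (*queue_rq)(struct model_request *rq);
};

struct model_queue {
	struct model_tag_set *set;    /* shared between queues */
	const struct model_ops *ops;  /* may differ per queue */
};

static enum model_path model_scsi_queue_rq(struct model_request *rq)
{
	(void)rq;  /* a real handler would build and send a SCSI command */
	return MODEL_PATH_SCSI;
}

static enum model_path model_internal_queue_rq(struct model_request *rq)
{
	(void)rq;  /* a real handler would issue e.g. an ATA internal command */
	return MODEL_PATH_INTERNAL;
}

static const struct model_ops model_scsi_ops = {
	.queue_rq = model_scsi_queue_rq,
};

static const struct model_ops model_internal_ops = {
	.queue_rq = model_internal_queue_rq,
};

/* Take a tag from the shared set and dispatch via the queue's own ops.
 * Returns the tag, or -1 if the shared set is exhausted. */
static int model_execute(struct model_queue *q, enum model_path *path)
{
	for (int tag = 0; tag < MODEL_DEPTH; tag++) {
		if (!q->set->busy[tag]) {
			struct model_request rq = { .tag = tag };

			q->set->busy[tag] = true;
			*path = q->ops->queue_rq(&rq);
			return tag;
		}
	}
	return -1;
}
```

A regular command and an internal command submitted back to back end up with tags 0 and 1 from the same pool, while each goes through its own handler.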

Based on mkp-scsi 5.18/scsi-staging @ 66daf3e6b993

Please consider as an RFC for now. I think that the libata change has the
largest scope for improvement...

John Garry (11):
blk-mq: Add blk_mq_init_queue_ops()
scsi: core: Add SUBMITTED_BY_SCSI_CUSTOM_OPS
libata: Send internal commands through the block layer
scsi: libsas: Send SMP commands through the block layer
scsi: libsas: Send TMF commands through the block layer
scsi: core: Add scsi_alloc_request_hwq()
scsi: libsas: Send internal abort commands through the block layer
scsi: libsas: Change ATA support to deal with each qc having a SCSI command
scsi: libsas: Add sas_task_to_unique_tag()
scsi: libsas: Add sas_task_to_hwq()
scsi: hisi_sas: Remove private tag management

block/blk-mq.c | 23 +++-
drivers/ata/libata-core.c | 121 +++++++++++++------
drivers/md/dm-rq.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 66 +----------
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 3 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 +-
drivers/scsi/libsas/sas_ata.c | 11 +-
drivers/scsi/libsas/sas_expander.c | 38 ++++--
drivers/scsi/libsas/sas_internal.h | 1 +
drivers/scsi/libsas/sas_scsi_host.c | 153 ++++++++++++++++++++-----
drivers/scsi/scsi_lib.c | 14 +++
include/linux/blk-mq.h | 5 +-
include/scsi/libsas.h | 4 +-
include/scsi/scsi_cmnd.h | 4 +
14 files changed, 298 insertions(+), 150 deletions(-)

--
2.26.2


2022-03-22 14:42:50

by John Garry

Subject: [PATCH 03/11] libata: Send internal commands through the block layer

When SCSI HBA device drivers are required to process an ATA internal
command, they still need a tag for the IO. This often requires the driver
to set aside a set of tags for these sorts of IOs and manage the tags
itself.

If we associate a SCSI command (and request) with an ATA internal command
then the tag is already provided, so introduce the change to send ATA
internal commands through the block layer with a set of custom blk-mq ops.

note: I think that the timeout handling needs to be fixed up.

Signed-off-by: John Garry <[email protected]>
---
drivers/ata/libata-core.c | 121 ++++++++++++++++++++++++++++----------
1 file changed, 89 insertions(+), 32 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 67f88027680a..9db0428d0511 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1438,6 +1438,59 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
complete(waiting);
}

+struct ata_internal_sg_data {
+ struct completion wait;
+
+ unsigned int preempted_tag;
+ u32 preempted_sactive;
+ u64 preempted_qc_active;
+ int preempted_nr_active_links;
+};
+
+static blk_status_t ata_exec_internal_sg_queue_rq(struct blk_mq_hw_ctx *hctx,
+ const struct blk_mq_queue_data *bd)
+{
+ struct request *rq = bd->rq;
+ struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
+ struct ata_queued_cmd *qc = (struct ata_queued_cmd *)scmd->host_scribble;
+ struct ata_internal_sg_data *data;
+ struct ata_device *dev = qc->dev;
+ struct ata_port *ap = qc->ap;
+ struct ata_link *link = dev->link;
+ unsigned long flags;
+
+ data = container_of(qc->private_data, struct ata_internal_sg_data, wait);
+
+ blk_mq_start_request(bd->rq);
+
+ spin_lock_irqsave(ap->lock, flags);
+
+ /* no internal command while frozen */
+ if (ap->pflags & ATA_PFLAG_FROZEN) {
+ spin_unlock_irqrestore(ap->lock, flags);
+ return BLK_STS_TARGET;
+ }
+
+ data->preempted_tag = link->active_tag;
+ data->preempted_sactive = link->sactive;
+ data->preempted_qc_active = ap->qc_active;
+ data->preempted_nr_active_links = ap->nr_active_links;
+ link->active_tag = ATA_TAG_POISON;
+ link->sactive = 0;
+ ap->qc_active = 0;
+ ap->nr_active_links = 0;
+
+ ata_qc_issue(qc);
+
+ spin_unlock_irqrestore(ap->lock, flags);
+
+ return BLK_STS_OK;
+}
+
+static const struct blk_mq_ops ata_exec_internal_sg_mq_ops = {
+ .queue_rq = ata_exec_internal_sg_queue_rq,
+};
+
/**
* ata_exec_internal_sg - execute libata internal command
* @dev: Device to which the command is sent
@@ -1467,45 +1520,46 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
{
struct ata_link *link = dev->link;
struct ata_port *ap = link->ap;
+ struct Scsi_Host *scsi_host = ap->scsi_host;
+ struct request_queue *request_queue;
u8 command = tf->command;
- int auto_timeout = 0;
struct ata_queued_cmd *qc;
- unsigned int preempted_tag;
- u32 preempted_sactive;
- u64 preempted_qc_active;
- int preempted_nr_active_links;
- DECLARE_COMPLETION_ONSTACK(wait);
- unsigned long flags;
+ struct scsi_cmnd *scmd;
unsigned int err_mask;
- int rc;
+ unsigned long flags;
+ struct request *rq;
+ int rc, auto_timeout = 0;
+ struct ata_internal_sg_data data = {
+ .wait = COMPLETION_INITIALIZER_ONSTACK(data.wait),
+ };
+ unsigned int op;

- spin_lock_irqsave(ap->lock, flags);
+ op = (dma_dir == DMA_TO_DEVICE) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;

- /* no internal command while frozen */
- if (ap->pflags & ATA_PFLAG_FROZEN) {
- spin_unlock_irqrestore(ap->lock, flags);
- return AC_ERR_SYSTEM;
+ request_queue = blk_mq_init_queue_ops(&scsi_host->tag_set,
+ &ata_exec_internal_sg_mq_ops);
+ if (!request_queue)
+ return AC_ERR_OTHER;
+
+ rq = scsi_alloc_request(request_queue, op, 0);
+ if (IS_ERR(rq)) {
+ err_mask = AC_ERR_OTHER;
+ goto out;
}

+ scmd = blk_mq_rq_to_pdu(rq);
+ scmd->submitter = SUBMITTED_BY_SCSI_CUSTOM_OPS;
+
/* initialize internal qc */
qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);

qc->tag = ATA_TAG_INTERNAL;
qc->hw_tag = 0;
- qc->scsicmd = NULL;
+ qc->scsicmd = scmd;
qc->ap = ap;
qc->dev = dev;
ata_qc_reinit(qc);

- preempted_tag = link->active_tag;
- preempted_sactive = link->sactive;
- preempted_qc_active = ap->qc_active;
- preempted_nr_active_links = ap->nr_active_links;
- link->active_tag = ATA_TAG_POISON;
- link->sactive = 0;
- ap->qc_active = 0;
- ap->nr_active_links = 0;
-
/* prepare & issue qc */
qc->tf = *tf;
if (cdb)
@@ -1529,12 +1583,11 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
qc->nbytes = buflen;
}

- qc->private_data = &wait;
+ qc->private_data = &data.wait;
qc->complete_fn = ata_qc_complete_internal;

- ata_qc_issue(qc);
-
- spin_unlock_irqrestore(ap->lock, flags);
+ scmd->host_scribble = (unsigned char *)qc;
+ blk_execute_rq_nowait(rq, true, NULL);

if (!timeout) {
if (ata_probe_timeout)
@@ -1548,7 +1601,7 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
if (ap->ops->error_handler)
ata_eh_release(ap);

- rc = wait_for_completion_timeout(&wait, msecs_to_jiffies(timeout));
+ rc = wait_for_completion_timeout(&data.wait, msecs_to_jiffies(timeout));

if (ap->ops->error_handler)
ata_eh_acquire(ap);
@@ -1603,16 +1656,20 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
err_mask = qc->err_mask;

ata_qc_free(qc);
- link->active_tag = preempted_tag;
- link->sactive = preempted_sactive;
- ap->qc_active = preempted_qc_active;
- ap->nr_active_links = preempted_nr_active_links;
+ link->active_tag = data.preempted_tag;
+ link->sactive = data.preempted_sactive;
+ ap->qc_active = data.preempted_qc_active;
+ ap->nr_active_links = data.preempted_nr_active_links;

spin_unlock_irqrestore(ap->lock, flags);

if ((err_mask & AC_ERR_TIMEOUT) && auto_timeout)
ata_internal_cmd_timed_out(dev, command);

+ __blk_mq_end_request(rq, BLK_STS_OK);
+
+out:
+ blk_cleanup_queue(request_queue);
return err_mask;
}

--
2.26.2
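
In patch 03, ata_exec_internal_sg_queue_rq() reaches its state in two hops: scmd->host_scribble yields the qc, and container_of() on qc->private_data (which points at the embedded wait completion) yields the enclosing struct ata_internal_sg_data. The preempted link/port state is stashed there until ata_exec_internal_sg() restores it. Below is a user-space sketch of that save/restore pattern; the model_* types are hypothetical stand-ins for the libata structures, only two of the four saved fields are modelled, and locking is omitted.

```c
/* Hypothetical user-space model; not libata code. */
#include <stddef.h>
#include <stdint.h>

/* Simplified form of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MODEL_TAG_POISON 0xfafbfcfdu  /* any sentinel works for the model */

struct model_completion { int done; };

/* Stand-in for the ata_link/ata_port fields saved by the patch. */
struct model_port {
	unsigned int active_tag;
	uint64_t qc_active;
};

struct model_qc {
	void *private_data;  /* points at data.wait, as in the patch */
};

struct model_internal_data {
	struct model_completion wait;
	unsigned int preempted_tag;
	uint64_t preempted_qc_active;
};

/* queue_rq side: recover the wrapper, then stash and clear port state. */
static void model_queue_rq(struct model_qc *qc, struct model_port *ap)
{
	struct model_internal_data *data =
		container_of(qc->private_data, struct model_internal_data, wait);

	data->preempted_tag = ap->active_tag;
	data->preempted_qc_active = ap->qc_active;
	ap->active_tag = MODEL_TAG_POISON;
	ap->qc_active = 0;
}

/* Completion side: restore what model_queue_rq() saved. */
static void model_finish(const struct model_internal_data *data,
			 struct model_port *ap)
{
	ap->active_tag = data->preempted_tag;
	ap->qc_active = data->preempted_qc_active;
}
```

Keeping the saved state in a caller-owned struct that embeds the completion is what lets queue_rq, which only sees the request, find it without adding a new field to the qc.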

2022-03-22 14:48:51

by John Garry

Subject: [PATCH 11/11] scsi: hisi_sas: Remove private tag management

Now every sas_task which the driver sees has an associated SCSI command
and request, so drop the internal tag management.

No reserved tags have been set aside in the tagset yet, but this is simple
to do.

For v2 HW we need to continue to allocate the tag internally, as the HW is
so broken that we need special rules for tag allocation; see
slot_index_alloc_quirk_v2_hw().

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 37 +--------------------------
1 file changed, 1 insertion(+), 36 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index b3e03c229cb5..19c9ed169c91 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -169,41 +169,6 @@ static void hisi_sas_slot_index_free(struct hisi_hba *hisi_hba, int slot_idx)
}
}

-static void hisi_sas_slot_index_set(struct hisi_hba *hisi_hba, int slot_idx)
-{
- void *bitmap = hisi_hba->slot_index_tags;
-
- __set_bit(slot_idx, bitmap);
-}
-
-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
- struct sas_task *task)
-{
- int index;
- void *bitmap = hisi_hba->slot_index_tags;
-
- if (task)
- return sas_task_to_unique_tag(task);
-
- spin_lock(&hisi_hba->lock);
- index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
- hisi_hba->last_slot_index + 1);
- if (index >= hisi_hba->slot_index_count) {
- index = find_next_zero_bit(bitmap,
- hisi_hba->slot_index_count,
- HISI_SAS_UNRESERVED_IPTT);
- if (index >= hisi_hba->slot_index_count) {
- spin_unlock(&hisi_hba->lock);
- return -SAS_QUEUE_FULL;
- }
- }
- hisi_sas_slot_index_set(hisi_hba, index);
- hisi_hba->last_slot_index = index;
- spin_unlock(&hisi_hba->lock);
-
- return index;
-}
-
void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
struct hisi_sas_slot *slot)
{
@@ -556,7 +521,7 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
if (!internal_abort && hisi_hba->hw->slot_index_alloc)
rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
else
- rc = hisi_sas_slot_index_alloc(hisi_hba, task);
+ rc = sas_task_to_unique_tag(task);

if (rc < 0)
goto err_out_dif_dma_unmap;
--
2.26.2
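
The allocator removed above is a next-fit bitmap search: it resumes after last_slot_index, wraps around to HISI_SAS_UNRESERVED_IPTT, and returns -SAS_QUEUE_FULL when nothing is free. Our reading is that indices below HISI_SAS_UNRESERVED_IPTT belong to block-layer tags, so the wrap keeps internal allocations out of that range. A single-threaded user-space sketch of the logic follows; the sizes, the initial last_slot_index and the model_* names are assumptions, and the locking done under hisi_hba->lock is omitted.

```c
/* Hypothetical user-space model of the removed hisi_sas allocator. */
#include <stdbool.h>

#define MODEL_SLOT_COUNT 16
#define MODEL_UNRESERVED_IPTT 12  /* assumed: lower slots are block-layer tags */

struct model_slot_alloc {
	bool used[MODEL_SLOT_COUNT];
	int last_slot_index;
};

/* Stand-in for find_next_zero_bit(): first clear index >= start, or size. */
static int model_find_next_zero(const bool *used, int size, int start)
{
	for (int i = start; i < size; i++)
		if (!used[i])
			return i;
	return size;
}

/* Next-fit search: resume after the last allocation, wrap to the first
 * unreserved slot, and report "queue full" as -1. */
static int model_slot_index_alloc(struct model_slot_alloc *a)
{
	int index = model_find_next_zero(a->used, MODEL_SLOT_COUNT,
					 a->last_slot_index + 1);

	if (index >= MODEL_SLOT_COUNT) {
		index = model_find_next_zero(a->used, MODEL_SLOT_COUNT,
					     MODEL_UNRESERVED_IPTT);
		if (index >= MODEL_SLOT_COUNT)
			return -1;
	}
	a->used[index] = true;
	a->last_slot_index = index;
	return index;
}
```

Once every sas_task carries a request, sas_task_to_unique_tag() provides the index directly, so this bookkeeping (and its lock) is no longer needed outside the v2 HW quirk.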

2022-03-22 14:58:03

by Hannes Reinecke

Subject: Re: [PATCH RFC 00/11] blk-mq/libata/scsi: SCSI driver tagging improvements

On 3/22/22 11:39, John Garry wrote:
> [...]

Grand seeing that someone is taking it up. Thanks for doing this!

But:
Allocating a queue for every request (e.g. in patch 3) is overkill. If we
want to go that route, we should allocate the queue upfront (e.g. when
creating the device itself) and then just reference it.

However, I can't say I like this approach. I've been playing around with
supporting internal commands and found two constraints really
annoying:

- The tagset supports only _one_ payload type via
blk_mq_rq_(to,from)_pdu().
This requires each request to be of the same type, which makes it
really hard to re-purpose a request for internal usage. In the
end I settled for just keeping it and skipping the SCSI command field.
If we could have a distinct PDU type for internal commands, I guess
things would be easier.

- The number of reserved commands is static.
With that, it gets really hard to use reserved commands with
low queue depth devices like ATA; we only have 31 commands to work with,
and setting one or two aside for TMF really makes a difference
performance-wise. It would be _awesome_ if we could allocate reserved
commands dynamically (i.e. just mark a command as 'reserved' once
allocated).
Sure, it won't have the same guarantees as 'real' reserved commands, but
in most cases we don't actually need that.

Maybe these are some lines we could investigate?
Hmm?

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
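
The second point can be put in numbers. With 31 usable tags, statically reserving two for TMFs leaves 29 for regular I/O (about 6.5% fewer), whereas marking any free tag as reserved at allocation time keeps all 31 available but, as noted above, gives internal commands no guarantee. A user-space sketch contrasting the two schemes; the model_* names and pool layout are assumptions, not an existing blk-mq interface.

```c
/* Hypothetical user-space model: static vs. dynamic tag reservation. */
#include <stdbool.h>

#define MODEL_ATA_DEPTH 31  /* usable tags, as in the discussion */

struct model_pool {
	bool used[MODEL_ATA_DEPTH];
	bool reserved_flag[MODEL_ATA_DEPTH];  /* dynamic scheme only */
	int nr_static_reserved;               /* static scheme only */
};

/* Static scheme: tags [0, nr_static_reserved) are for internal use only. */
static int model_alloc_static(struct model_pool *p, bool internal)
{
	int start = internal ? 0 : p->nr_static_reserved;
	int end = internal ? p->nr_static_reserved : MODEL_ATA_DEPTH;

	for (int i = start; i < end; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Dynamic scheme: take any free tag and mark it once allocated. */
static int model_alloc_dynamic(struct model_pool *p, bool internal)
{
	for (int i = 0; i < MODEL_ATA_DEPTH; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			p->reserved_flag[i] = internal;
			return i;
		}
	}
	return -1;  /* no guarantee an internal command gets a tag */
}

/* Count how many regular I/Os fit before allocation fails. */
static int model_max_regular(struct model_pool *p, bool dynamic)
{
	int n = 0;

	while ((dynamic ? model_alloc_dynamic(p, false)
			: model_alloc_static(p, false)) >= 0)
		n++;
	return n;
}
```

With the pool saturated by regular I/O, the static scheme can still hand out an internal tag while the dynamic one cannot; that is the trade-off behind "it won't have the same guarantees".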

2022-03-22 15:58:01

by John Garry

Subject: [PATCH 06/11] scsi: core: Add scsi_alloc_request_hwq()

Add a variant of scsi_alloc_request() which allocates a request for a
specific hw queue.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/scsi_lib.c | 12 ++++++++++++
include/scsi/scsi_cmnd.h | 3 +++
2 files changed, 15 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d230392f2b4a..543cc01b2816 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1137,6 +1137,18 @@ struct request *scsi_alloc_request(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(scsi_alloc_request);

+struct request *scsi_alloc_request_hwq(struct request_queue *q,
+ unsigned int op, blk_mq_req_flags_t flags, unsigned int hwq)
+{
+ struct request *rq;
+
+ rq = blk_mq_alloc_request_hctx(q, op, flags, hwq);
+ if (!IS_ERR(rq))
+ scsi_initialize_rq(rq);
+ return rq;
+}
+EXPORT_SYMBOL_GPL(scsi_alloc_request_hwq);
+
/*
* Only called when the request isn't completed by SCSI, and not freed by
* SCSI
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index ad4bcace1390..94e65f4c81b5 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -399,4 +399,7 @@ extern void scsi_build_sense(struct scsi_cmnd *scmd, int desc,
struct request *scsi_alloc_request(struct request_queue *q,
unsigned int op, blk_mq_req_flags_t flags);

+struct request *scsi_alloc_request_hwq(struct request_queue *q,
+ unsigned int op, blk_mq_req_flags_t flags, unsigned int hwq);
+
#endif /* _SCSI_SCSI_CMND_H */
--
2.26.2

2022-03-22 16:47:50

by John Garry

Subject: [PATCH 05/11] scsi: libsas: Send TMF commands through the block layer

As we did for SMP commands, send TMF commands through the block
layer.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/libsas/sas_scsi_host.c | 44 +++++++++++++++++++++--------
1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 91133ad37ae8..04c6298d1340 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1070,15 +1070,39 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
struct sas_internal *i =
to_sas_internal(device->port->ha->core.shost->transportt);
int res, retry;
+ struct request_queue *request_queue;
+ struct request *rq;
+ struct sas_ha_struct *ha = device->port->ha;
+ struct Scsi_Host *shost = ha->core.shost;
+
+ request_queue = sas_alloc_request_queue(shost);
+ if (IS_ERR(request_queue))
+ return PTR_ERR(request_queue);

for (retry = 0; retry < TASK_RETRY; retry++) {
+ struct scsi_cmnd *scmd;
+
task = sas_alloc_slow_task(GFP_KERNEL);
- if (!task)
- return -ENOMEM;
+ if (!task) {
+ res = -ENOMEM;
+ break;
+ }

task->dev = device;
task->task_proto = device->tproto;

+ rq = scsi_alloc_request(request_queue, REQ_OP_DRV_IN, 0);
+ if (IS_ERR(rq)) {
+ res = PTR_ERR(rq);
+ break;
+ }
+
+ scmd = blk_mq_rq_to_pdu(rq);
+ scmd->submitter = SUBMITTED_BY_SCSI_CUSTOM_OPS;
+ ASSIGN_SAS_TASK(scmd, task);
+
+ task->uldd_task = scmd;
+
if (dev_is_sata(device)) {
task->ata_task.device_control_reg_update = 1;
if (force_phy_id >= 0) {
@@ -1093,20 +1117,14 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
task->task_done = sas_task_internal_done;
task->tmf = tmf;

- task->slow_task->timer.function = sas_task_internal_timedout;
- task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
- add_timer(&task->slow_task->timer);
+ rq->timeout = TASK_TIMEOUT;

- res = i->dft->lldd_execute_task(task, GFP_KERNEL);
- if (res) {
- del_timer_sync(&task->slow_task->timer);
- pr_err("executing TMF task failed %016llx (%d)\n",
- SAS_ADDR(device->sas_addr), res);
- break;
- }
+ blk_execute_rq_nowait(rq, true, NULL);

wait_for_completion(&task->slow_task->completion);

+ __blk_mq_end_request(rq, BLK_STS_OK);
+
if (i->dft->lldd_tmf_exec_complete)
i->dft->lldd_tmf_exec_complete(device);

@@ -1177,6 +1195,8 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
SAS_ADDR(device->sas_addr), TASK_RETRY);
sas_free_task(task);

+ blk_cleanup_queue(request_queue);
+
return res;
}

--
2.26.2
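
Patch 05 drops the private slow_task timer and instead sets rq->timeout, leaving expiry to the block layer. Roughly, blk-mq's timeout work scans in-flight requests and treats any whose deadline (dispatch time plus timeout) has passed as timed out, invoking the queue's ->timeout handler if one is provided. The sketch below models only that deadline check in user space; the model_* names are hypothetical, "jiffies" are plain counters, and whether the custom ops in this series supply a timeout handler is not shown here.

```c
/* Hypothetical user-space model of deadline-based request expiry. */
#include <stdbool.h>

struct model_rq {
	unsigned long start;    /* "jiffies" at dispatch */
	unsigned long timeout;  /* analogous to rq->timeout */
	bool completed;
	bool timed_out;
};

/* Wrap-safe "a is at or after b", in the style of time_after_eq(). */
static bool model_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* One pass of the timeout scan; returns how many requests expired. */
static int model_timeout_scan(struct model_rq *rqs, int n, unsigned long now)
{
	int expired = 0;

	for (int i = 0; i < n; i++) {
		struct model_rq *rq = &rqs[i];

		if (rq->completed || rq->timed_out)
			continue;
		if (model_time_after_eq(now, rq->start + rq->timeout)) {
			rq->timed_out = true;  /* a ->timeout() hook would run here */
			expired++;
		}
	}
	return expired;
}
```

One scan over all in-flight requests replaces arming and deleting a timer per task, which is why the add_timer()/del_timer_sync() pair disappears from sas_execute_tmf().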

2022-03-22 20:39:19

by John Garry

Subject: [PATCH 10/11] scsi: libsas: Add sas_task_to_hwq()

Add a function to get the hw queue index from a sas_task.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 20 ++------------------
drivers/scsi/libsas/sas_scsi_host.c | 9 ++++++++-
include/scsi/libsas.h | 2 +-
3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index f6b64c789335..b3e03c229cb5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -461,7 +461,6 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_device *sas_dev = device->lldd_dev;
bool internal_abort = sas_is_internal_abort(task);
- struct scsi_cmnd *scmd = NULL;
struct hisi_sas_dq *dq = NULL;
struct hisi_sas_port *port;
struct hisi_hba *hisi_hba;
@@ -486,6 +485,8 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
hisi_hba = dev_to_hisi_hba(device);
dev = hisi_hba->dev;

+ dq = &hisi_hba->dq[sas_task_to_hwq(task)];
+
switch (task->task_proto) {
case SAS_PROTOCOL_SSP:
case SAS_PROTOCOL_SMP:
@@ -520,22 +521,6 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
return -ECOMM;
}

- scmd = task->uldd_task;
-
- if (scmd) {
- unsigned int dq_index;
- u32 blk_tag;
-
- blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
- dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
- dq = &hisi_hba->dq[dq_index];
- } else {
- struct Scsi_Host *shost = hisi_hba->shost;
- struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
- int queue = qmap->mq_map[raw_smp_processor_id()];
-
- dq = &hisi_hba->dq[queue];
- }
break;
case SAS_PROTOCOL_INTERNAL_ABORT:
if (!hisi_hba->hw->prep_abort)
@@ -550,7 +535,6 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
return -EINVAL;

port = to_hisi_sas_port(sas_port);
- dq = &hisi_hba->dq[task->abort_task.qid];
break;
default:
dev_err(hisi_hba->dev, "task prep: unknown/unsupported proto (0x%x)\n",
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 425904fa4cc7..b808b23e265e 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -986,7 +986,6 @@ static int sas_execute_internal_abort(struct domain_device *device,

task->abort_task.tag = tag;
task->abort_task.type = type;
- task->abort_task.qid = qid;

rq = scsi_alloc_request_hwq(request_queue, REQ_OP_DRV_IN, BLK_MQ_REQ_NOWAIT, qid);
if (!rq) {
@@ -1079,6 +1078,14 @@ unsigned int sas_task_to_unique_tag(struct sas_task *task)
}
EXPORT_SYMBOL_GPL(sas_task_to_unique_tag);

+unsigned int sas_task_to_hwq(struct sas_task *task)
+{
+ u32 unique = sas_task_to_rq_unique_tag(task);
+
+ return blk_mq_unique_tag_to_hwq(unique);
+}
+EXPORT_SYMBOL_GPL(sas_task_to_hwq);
+
int sas_execute_tmf(struct domain_device *device, void *parameter,
int para_len, int force_phy_id,
struct sas_tmf_task *tmf)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 3d9ef4c2c889..368c45016af0 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -565,7 +565,6 @@ enum sas_internal_abort {

struct sas_internal_abort_task {
enum sas_internal_abort type;
- unsigned int qid;
u16 tag;
};

@@ -626,6 +625,7 @@ struct sas_task {
};

unsigned int sas_task_to_unique_tag(struct sas_task *task);
+unsigned int sas_task_to_hwq(struct sas_task *task);

struct sas_task_slow {
/* standard/extra infrastructure for slow path commands (SMP and
--
2.26.2
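
sas_task_to_hwq() relies on blk-mq's unique tag, which packs the hardware queue index into the upper 16 bits and the per-queue tag into the lower 16 bits (BLK_MQ_UNIQUE_TAG_BITS). That is how hisi_sas can pick its delivery queue from the task alone, replacing both the scmd lookup and the abort_task.qid field. A user-space re-implementation of the encoding, using model_* names so it is not mistaken for the kernel helpers blk_mq_unique_tag() and blk_mq_unique_tag_to_hwq().

```c
/* Hypothetical user-space re-implementation of the unique-tag encoding. */
#include <stdint.h>

#define MODEL_UNIQUE_TAG_BITS 16
#define MODEL_UNIQUE_TAG_MASK ((1u << MODEL_UNIQUE_TAG_BITS) - 1)

/* hw queue index in the upper 16 bits, per-queue tag in the lower 16. */
static uint32_t model_unique_tag(uint16_t hwq, uint16_t tag)
{
	return ((uint32_t)hwq << MODEL_UNIQUE_TAG_BITS) |
	       (tag & MODEL_UNIQUE_TAG_MASK);
}

static uint16_t model_unique_tag_to_hwq(uint32_t unique)
{
	return unique >> MODEL_UNIQUE_TAG_BITS;
}

static uint16_t model_unique_tag_to_tag(uint32_t unique)
{
	return unique & MODEL_UNIQUE_TAG_MASK;
}
```

For example, tag 42 on hw queue 3 encodes to 0x3002a, and decoding recovers both parts.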

2022-03-23 02:26:51

by John Garry

Subject: Re: [PATCH RFC 00/11] blk-mq/libata/scsi: SCSI driver tagging improvements

On 22/03/2022 11:30, Hannes Reinecke wrote:
> On 3/22/22 11:39, John Garry wrote:
>> [...]
>
> Grand seeing that someone is taking it up. Thanks for doing this!
>
> But:
> Allocating a queue for every request (e.g. in patch 3) is overkill. If we
> want to go that route, we should allocate the queue upfront (e.g. when
> creating the device itself) and then just reference it.

For patch #3, I needed to allocate a separate request queue because the
SCSI device has not been created at that stage.

For the other scenarios in which we allocate a separate request queue,
the scheme is to use a request queue with different ops, so we can't
use the scsi_device request queue.

note: As for allocating request queues for the lifetime of the host,
remember that blk-mq fair tag sharing gives each request queue a share of
the tag budget, and it would be a waste to keep a budget just for these
internal commands. That is why I only keep the request queues temporarily.
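
The cost of a long-lived extra queue can be sketched numerically. When a tag set is shared between request queues, blk-mq (in hctx_may_queue(), as of roughly 5.x kernels) lets each active queue use about ceil(depth / users) tags, with a floor of 4. The helper below models that arithmetic in user space under those assumptions; model_fair_share() is a hypothetical name, and real blk-mq only counts queues that are currently active.

```c
/* Hypothetical model of blk-mq fair tag sharing; an approximation only. */
static unsigned int model_fair_share(unsigned int depth, unsigned int users)
{
	unsigned int share;

	if (users == 0)
		return depth;  /* nobody is sharing, so no limit applies */
	share = (depth + users - 1) / users;  /* round up */
	return share > 4 ? share : 4;
}
```

With a host depth of 64 and two active scsi_device queues, each may use 32 tags; if an internal-command queue also counted as active, every queue's share would drop to 22.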

>
> However, I can't say I like this approach. I've been playing around with
> supporting internal commands and found two constraints really
> annoying:
>
> - The tagset supports only _one_ payload type via
> blk_mq_rq_(to,from)_pdu().
> This requires each request to be of the same type, which makes it
> really hard to re-purpose a request for internal usage. In the
> end I settled for just keeping it and skipping the SCSI command field.

That sounds reasonable.

For this series I am just fixing up libsas, and libsas already has a
sas_task per command, so we can allocate the sas_task separately and
use host_scribble to point to it.

> If we could have a distinct PDU type for internal commands, I guess
> things would be easier.

Other drivers can do something similar to above or use the scsi priv data.

>
> - The number of reserved commands is static.
> With that, it gets really hard to use reserved commands with
> low queue depth devices like ATA; we only have 31 commands to work with,
> and setting one or two aside for TMF really makes a difference
> performance-wise. It would be _awesome_ if we could allocate reserved
> commands dynamically (i.e. just mark a command as 'reserved' once
> allocated).

I see. So you want to allocate a request, mark it as "internal", and then
have a flag which can be used to decide which path to send it on. Maybe
scsi_cmnd.submitter could be used for that.
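
One way to picture this: routing keys off the submitter field alone, so an ordinary request can be marked internal after allocation. A user-space sketch; MODEL_BY_INTERNAL_MARK and the two-way routing are hypothetical, and only the custom-ops value mirrors something in this series (SUBMITTED_BY_SCSI_CUSTOM_OPS, patch 02).

```c
/* Hypothetical user-space model of submitter-based routing. */
enum model_submitter {
	MODEL_BY_BLOCK_LAYER,    /* regular I/O from a scsi_device queue */
	MODEL_BY_CUSTOM_OPS,     /* cf. SUBMITTED_BY_SCSI_CUSTOM_OPS, patch 02 */
	MODEL_BY_INTERNAL_MARK,  /* hypothetical: regular request marked internal */
};

enum model_route { MODEL_ROUTE_QUEUECOMMAND, MODEL_ROUTE_INTERNAL };

struct model_cmd {
	enum model_submitter submitter;
};

/* Choose the dispatch path from the submitter field alone. */
static enum model_route model_route_cmd(const struct model_cmd *cmd)
{
	switch (cmd->submitter) {
	case MODEL_BY_CUSTOM_OPS:
	case MODEL_BY_INTERNAL_MARK:
		return MODEL_ROUTE_INTERNAL;
	case MODEL_BY_BLOCK_LAYER:
	default:
		return MODEL_ROUTE_QUEUECOMMAND;
	}
}

/* Allocate an ordinary request first, then mark it as internal. */
static void model_mark_internal(struct model_cmd *cmd)
{
	cmd->submitter = MODEL_BY_INTERNAL_MARK;
}
```

Because the mark is applied after allocation, the request comes from the normal tag pool; nothing in this model guarantees a tag is free when an internal command is needed.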

> Sure, it won't have the same guarantees as 'real' reserved commands, but
> in most cases we don't actually need that.
>
> Maybe these are some lines we could investigate?
> Hmm?
>

Thanks,
John