2022-06-09 10:38:45

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 00/18] blk-mq/libata/scsi: SCSI driver tagging improvements

Currently SCSI low-level drivers are required to manage tags for commands
which do not come via the block layer - libata internal commands would be
an example of one of these. We want to make blk-mq manage these tags also.

There was some work to provide "reserved commands" support in such series
as https://lore.kernel.org/linux-scsi/[email protected]/

This was based on allocating a request for the lifetime of the "internal"
command.

This series tries to solve that problem by not just allocating the request
but also sending it as a request through the block layer. Reasons to do
this:
- Normal flow of a request and also commonality for regular scsi command
lifetime
- We don't leave request and scsi_cmnd fields dangling as when we just
allocate and free the request for the lifetime of the "internal" command
- For poll mode support we can only poll in block layer, so could not send
internal commands on poll mode queues if we did not do this, which is a
problem
- Can get rid of duplicated code like libsas internal command timeout
handling

Based on v5.19-rc1

Please consider as an PoC for now. I have broken builds, bisectablility,
and functionality for a lot of libsas drivers, like pm8001. Indeed,
10/18 onwards need a lot of work...

Testing:
QEMU with AHCI with disk and cdrom attached and hisi_sas v2 both boot.

v1 was sent here:
https://lore.kernel.org/linux-scsi/[email protected]/

Hannes Reinecke (1):
scsi: core: Implement reserved command handling

John Garry (17):
blk-mq: Add a flag for reserved requests
scsi: core: Resurrect scsi_{get,free}_host_dev()
scsi: core: Add support to send reserved commands
scsi: core: Allocate SCSI host sdev when required
libata-scsi: Add ata_scsi_queue_internal()
libata-scsi: Add ata_internal_queuecommand()
libata: Queue ATA internal commands as requests
scsi: ipr: Support reserved commands
libata/scsi: libsas: Add sas_queuecommand_internal()
scsi: libsas: Don't attempt to find scsi host rphy in slave alloc
scsi: libsas drivers: Prepare for reserved commands
scsi: libsas: Allocate SCSI commands for tasks
scsi: libsas: Queue SMP commands as requests
scsi: libsas: Queue TMF commands as requests
scsi: core: Add scsi_alloc_request_hwq()
scsi: libsas: Queue internal abort commands as requests
scsi: libsas drivers: Remove private tag management

block/blk-mq.c | 6 +
drivers/ata/libata-core.c | 145 +++++++++++++++----------
drivers/ata/libata-sata.c | 5 +-
drivers/ata/libata-scsi.c | 61 ++++++++++-
drivers/ata/libata.h | 4 -
drivers/scsi/hisi_sas/hisi_sas_main.c | 93 ++++------------
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 6 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 +-
drivers/scsi/hosts.c | 14 +++
drivers/scsi/ipr.c | 17 ++-
drivers/scsi/libsas/sas_ata.c | 22 ++--
drivers/scsi/libsas/sas_expander.c | 24 ++--
drivers/scsi/libsas/sas_init.c | 74 ++++++++-----
drivers/scsi/libsas/sas_internal.h | 6 +
drivers/scsi/libsas/sas_scsi_host.c | 121 +++++++++++++++------
drivers/scsi/scsi_lib.c | 50 ++++++++-
drivers/scsi/scsi_scan.c | 57 ++++++++++
include/linux/blk-mq.h | 6 +
include/linux/libata.h | 13 ++-
include/scsi/libsas.h | 57 +++++++++-
include/scsi/scsi_cmnd.h | 8 ++
include/scsi/scsi_host.h | 34 +++++-
22 files changed, 597 insertions(+), 229 deletions(-)

--
2.26.2


2022-06-09 10:38:50

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 01/18] blk-mq: Add a flag for reserved requests

Add a flag for reserved requests so that drivers may know this for any
special handling.

The 'reserved' argument in blk_mq_ops.timeout callback could now be
replaced by using this flag.

Signed-off-by: John Garry <[email protected]>
---
block/blk-mq.c | 6 ++++++
include/linux/blk-mq.h | 6 ++++++
2 files changed, 12 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e9bf950983c7..23f2eafb09ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -474,6 +474,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
if (!(data->rq_flags & RQF_ELV))
blk_mq_tag_busy(data->hctx);

+ if (data->flags & BLK_MQ_REQ_RESERVED)
+ data->rq_flags |= RQF_RESV;
+
/*
* Try batched alloc if we want more than 1 tag.
*/
@@ -586,6 +589,9 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
else
data.rq_flags |= RQF_ELV;

+ if (flags & BLK_MQ_REQ_RESERVED)
+ data.rq_flags |= RQF_RESV;
+
ret = -EWOULDBLOCK;
tag = blk_mq_get_tag(&data);
if (tag == BLK_MQ_NO_TAG)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e2d9daf7e8dd..6d81fe10e850 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -57,6 +57,7 @@ typedef __u32 __bitwise req_flags_t;
#define RQF_TIMED_OUT ((__force req_flags_t)(1 << 21))
/* queue has elevator attached */
#define RQF_ELV ((__force req_flags_t)(1 << 22))
+#define RQF_RESV ((__force req_flags_t)(1 << 23))

/* flags that prevent us from merging requests: */
#define RQF_NOMERGE_FLAGS \
@@ -823,6 +824,11 @@ static inline bool blk_mq_need_time_stamp(struct request *rq)
return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_ELV));
}

+static inline bool blk_mq_is_reserved_rq(struct request *rq)
+{
+ return rq->rq_flags & RQF_RESV;
+}
+
/*
* Batched completions only work when there is no I/O error and no special
* ->end_io handler.
--
2.26.2

2022-06-09 10:38:50

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

From: Hannes Reinecke <[email protected]>

Quite some drivers are using management commands internally, which
typically use the same hardware tag pool (ie they are being allocated
from the same hardware resources) as the 'normal' I/O commands.
These commands are set aside before allocating the block-mq tag bitmap,
so they'll never show up as busy in the tag map.
The block-layer, OTOH, already has 'reserved_tags' to handle precisely
this situation.
So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
template to instruct the block layer to set aside a tag space for these
management commands by using reserved tags.

Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hosts.c | 3 +++
drivers/scsi/scsi_lib.c | 6 +++++-
include/scsi/scsi_host.h | 22 +++++++++++++++++++++-
3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 8352f90d997d..27296addaf63 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
if (sht->virt_boundary_mask)
shost->virt_boundary_mask = sht->virt_boundary_mask;

+ if (sht->nr_reserved_cmds)
+ shost->nr_reserved_cmds = sht->nr_reserved_cmds;
+
device_initialize(&shost->shost_gendev);
dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
shost->shost_gendev.bus = &scsi_bus_type;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6ffc9e4258a8..f6e53c6d913c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
else
tag_set->ops = &scsi_mq_ops_no_commit;
tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
+
tag_set->nr_maps = shost->nr_maps ? : 1;
- tag_set->queue_depth = shost->can_queue;
+ tag_set->queue_depth =
+ shost->can_queue + shost->nr_reserved_cmds;
+ tag_set->reserved_tags = shost->nr_reserved_cmds;
+
tag_set->cmd_size = cmd_size;
tag_set->numa_node = dev_to_node(shost->dma_dev);
tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 59aef1f178f5..149dcbd4125e 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -366,10 +366,19 @@ struct scsi_host_template {
/*
* This determines if we will use a non-interrupt driven
* or an interrupt driven scheme. It is set to the maximum number
- * of simultaneous commands a single hw queue in HBA will accept.
+ * of simultaneous commands a single hw queue in HBA will accept
+ * excluding internal commands.
*/
int can_queue;

+ /*
+ * This determines how many commands the HBA will set aside
+ * for internal commands. This number will be added to
+ * @can_queue to calcumate the maximum number of simultaneous
+ * commands sent to the host.
+ */
+ int nr_reserved_cmds;
+
/*
* In many instances, especially where disconnect / reconnect are
* supported, our host also has an ID on the SCSI bus. If this is
@@ -602,6 +611,11 @@ struct Scsi_Host {
unsigned short max_cmd_len;

int this_id;
+
+ /*
+ * Number of commands this host can handle at the same time.
+ * This excludes reserved commands as specified by nr_reserved_cmds.
+ */
int can_queue;
short cmd_per_lun;
short unsigned int sg_tablesize;
@@ -620,6 +634,12 @@ struct Scsi_Host {
*/
unsigned nr_hw_queues;
unsigned nr_maps;
+
+ /*
+ * Number of reserved commands to allocate, if any.
+ */
+ unsigned nr_reserved_cmds;
+
unsigned active_mode:2;

/*
--
2.26.2

2022-06-09 10:40:43

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 11/18] scsi: libsas: Don't attempt to find scsi host rphy in slave alloc

It doesn't have one.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 12 ++++++++++--
drivers/scsi/libsas/sas_scsi_host.c | 17 +++++++++++++++--
2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 764e859d0106..0219ccac9062 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -756,14 +756,22 @@ static int hisi_sas_init_device(struct domain_device *device)

int hisi_sas_slave_alloc(struct scsi_device *sdev)
{
- struct domain_device *ddev = sdev_to_domain_dev(sdev);
- struct hisi_sas_device *sas_dev = ddev->lldd_dev;
+ struct scsi_target *starget = sdev->sdev_target;
+ struct device *parent = starget->dev.parent;
+ struct hisi_sas_device *sas_dev;
+ struct domain_device *ddev;
int rc;

+ if (scsi_is_host_device(parent))
+ return 0;
+
rc = sas_slave_alloc(sdev);
if (rc)
return rc;

+ ddev = sdev_to_domain_dev(sdev);
+ sas_dev = ddev->lldd_dev;
+
rc = hisi_sas_init_device(ddev);
if (rc)
return rc;
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 42279a6d6b06..532e734c1fb6 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -847,8 +847,15 @@ struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy)

int sas_target_alloc(struct scsi_target *starget)
{
- struct sas_rphy *rphy = dev_to_rphy(starget->dev.parent);
- struct domain_device *found_dev = sas_find_dev_by_rphy(rphy);
+ struct device *parent = starget->dev.parent;
+ struct sas_rphy *rphy;
+ struct domain_device *found_dev;
+
+ if (scsi_is_host_device(parent))
+ return 0;
+
+ rphy = dev_to_rphy(parent);
+ found_dev = sas_find_dev_by_rphy(rphy);

if (!found_dev)
return -ENODEV;
@@ -1252,6 +1259,12 @@ EXPORT_SYMBOL_GPL(sas_task_abort);

int sas_slave_alloc(struct scsi_device *sdev)
{
+ struct scsi_target *starget = sdev->sdev_target;
+ struct device *parent = starget->dev.parent;
+
+ if (scsi_is_host_device(parent))
+ return 0;
+
if (dev_is_sata(sdev_to_domain_dev(sdev)) && sdev->lun)
return -ENXIO;

--
2.26.2

2022-06-09 10:41:50

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 16/18] scsi: core: Add scsi_alloc_request_hwq()

Add a variant of scsi_alloc_request() which allocates a request for a
specific hw queue.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/scsi_lib.c | 12 ++++++++++++
include/scsi/scsi_cmnd.h | 3 +++
2 files changed, 15 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8c8b4c6767d9..443afaf52c14 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1137,6 +1137,18 @@ struct request *scsi_alloc_request(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(scsi_alloc_request);

+struct request *scsi_alloc_request_hwq(struct request_queue *q,
+ unsigned int op, blk_mq_req_flags_t flags, unsigned int hwq)
+{
+ struct request *rq;
+
+ rq = blk_mq_alloc_request_hctx(q, op, flags, hwq);
+ if (!IS_ERR(rq))
+ scsi_initialize_rq(rq);
+ return rq;
+}
+EXPORT_SYMBOL_GPL(scsi_alloc_request_hwq);
+
/*
* Only called when the request isn't completed by SCSI, and not freed by
* SCSI
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index e47df5836070..e69bb6baad47 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -394,4 +394,7 @@ extern void scsi_build_sense(struct scsi_cmnd *scmd, int desc,
struct request *scsi_alloc_request(struct request_queue *q,
unsigned int op, blk_mq_req_flags_t flags);

+struct request *scsi_alloc_request_hwq(struct request_queue *q,
+ unsigned int op, blk_mq_req_flags_t flags, unsigned int hwq);
+
#endif /* _SCSI_SCSI_CMND_H */
--
2.26.2

2022-06-09 10:55:55

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 05/18] scsi: core: Allocate SCSI host sdev when required

When the shost supports reserved commands then allocate the SCSI host sdev.

As noted later, we need to use this for libata internal commands.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hosts.c | 5 +++++
include/scsi/scsi_host.h | 1 +
2 files changed, 6 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 5c9b05a8fec8..67930a61b222 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -298,6 +298,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

scsi_proc_host_add(shost);
scsi_autopm_put_host(shost);
+ if (shost->nr_reserved_cmds)
+ shost->sdev = scsi_get_host_dev(shost); // TODO: Add error handling
return error;

/*
@@ -360,6 +362,9 @@ static void scsi_host_dev_release(struct device *dev)

if (shost->shost_state != SHOST_CREATED)
put_device(parent);
+ if (shost->sdev)
+ scsi_free_host_dev(shost->sdev);
+
kfree(shost);
}

diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 88c8504395c8..2e14d65b7444 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -566,6 +566,7 @@ struct Scsi_Host {
wait_queue_head_t host_wait;
struct scsi_host_template *hostt;
struct scsi_transport_template *transportt;
+ struct scsi_device *sdev;

/* Area to keep a shared tag map */
struct blk_mq_tag_set tag_set;
--
2.26.2

2022-06-09 11:05:34

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 02/18] scsi: core: Resurrect scsi_{get,free}_host_dev()

This reverts commit 6bd49b1a8d43ec118c55f3aaa7577729b52bde15.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/scsi_scan.c | 57 ++++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_host.h | 10 +++++++
2 files changed, 67 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 91ac901a6682..925fe63fa370 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1978,3 +1978,60 @@ void scsi_forget_host(struct Scsi_Host *shost)
spin_unlock_irqrestore(shost->host_lock, flags);
}

+/**
+ * scsi_get_host_dev - Create a scsi_device that points to the host adapter itself
+ * @shost: Host that needs a scsi_device
+ *
+ * Lock status: None assumed.
+ *
+ * Returns: The scsi_device or NULL
+ *
+ * Notes:
+ * Attach a single scsi_device to the Scsi_Host - this should
+ * be made to look like a "pseudo-device" that points to the
+ * HA itself.
+ *
+ * Note - this device is not accessible from any high-level
+ * drivers (including generics), which is probably not
+ * optimal. We can add hooks later to attach.
+ */
+struct scsi_device *scsi_get_host_dev(struct Scsi_Host *shost)
+{
+ struct scsi_device *sdev = NULL;
+ struct scsi_target *starget;
+
+ mutex_lock(&shost->scan_mutex);
+ if (!scsi_host_scan_allowed(shost))
+ goto out;
+ starget = scsi_alloc_target(&shost->shost_gendev, 0, shost->this_id);
+ if (!starget)
+ goto out;
+
+ sdev = scsi_alloc_sdev(starget, 0, NULL);
+ if (sdev)
+ sdev->borken = 0;
+ else
+ scsi_target_reap(starget);
+ put_device(&starget->dev);
+ out:
+ mutex_unlock(&shost->scan_mutex);
+ return sdev;
+}
+EXPORT_SYMBOL(scsi_get_host_dev);
+
+/**
+ * scsi_free_host_dev - Free a scsi_device that points to the host adapter itself
+ * @sdev: Host device to be freed
+ *
+ * Lock status: None assumed.
+ *
+ * Returns: Nothing
+ */
+void scsi_free_host_dev(struct scsi_device *sdev)
+{
+ BUG_ON(sdev->id != sdev->host->this_id);
+
+ __scsi_remove_device(sdev);
+}
+EXPORT_SYMBOL(scsi_free_host_dev);
+
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 667d889b92b5..59aef1f178f5 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -790,6 +790,16 @@ void scsi_host_busy_iter(struct Scsi_Host *,

struct class_container;

+/*
+ * These two functions are used to allocate and free a pseudo device
+ * which will connect to the host adapter itself rather than any
+ * physical device. You must deallocate when you are done with the
+ * thing. This physical pseudo-device isn't real and won't be available
+ * from any high-level drivers.
+ */
+extern void scsi_free_host_dev(struct scsi_device *);
+extern struct scsi_device *scsi_get_host_dev(struct Scsi_Host *);
+
/*
* DIF defines the exchange of protection information between
* initiator and SBC block device.
--
2.26.2

2022-06-09 11:06:00

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 07/18] libata-scsi: Add ata_internal_queuecommand()

Add callback to queue reserved commands - call it "internal" as this is
what libata uses.

Also add it to the base ATA SHT, and set nr_reserved_cmds = 1, which
matches tag ATA_TAG_INTERNAL.

Signed-off-by: John Garry <[email protected]>
---
drivers/ata/libata-scsi.c | 14 ++++++++++++++
include/linux/libata.h | 6 +++++-
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index baac35dd17ca..b2702ab0183b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1114,6 +1114,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
return 0;
}

+int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct ata_port *ap;
+ int res;
+
+ ap = ata_shost_to_port(shost);
+ spin_lock_irq(ap->lock);
+ res = ata_sas_queuecmd(scmd, ap);
+ spin_unlock_irq(ap->lock);
+
+ return res;
+}
+EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
+
/**
* ata_scsi_slave_config - Set SCSI device attributes
* @sdev: SCSI device to examine
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 43f4bcfe9a5f..5fa6f56bba81 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
sector_t capacity, int geom[]);
extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
extern int ata_scsi_slave_config(struct scsi_device *sdev);
+extern int ata_internal_queuecommand(struct Scsi_Host *shost,
+ struct scsi_cmnd *scmd);
extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
int queue_depth);
@@ -1390,7 +1392,9 @@ extern const struct attribute_group *ata_common_sdev_groups[];
.proc_name = drv_name, \
.slave_destroy = ata_scsi_slave_destroy, \
.bios_param = ata_std_bios_param, \
- .unlock_native_capacity = ata_scsi_unlock_native_capacity
+ .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
+ .nr_reserved_cmds = 1,\
+ .reserved_queuecommand = ata_internal_queuecommand

#define ATA_SUBBASE_SHT(drv_name) \
__ATA_BASE_SHT(drv_name), \
--
2.26.2

2022-06-09 11:06:05

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 18/18] scsi: libsas drivers: Remove private tag management

Now every sas_task which the driver sees has a SCSI command and also
request associated, so drop the internal tag management.

For now we are only fixing up hisi_sas v2 HW, but all others need this.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 74 +++-----------------------
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 3 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 +-
drivers/scsi/libsas/sas_ata.c | 5 +-
include/scsi/libsas.h | 8 ++-
5 files changed, 19 insertions(+), 74 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 50ca7d63ab58..e487a6e6fe7e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -169,41 +169,6 @@ static void hisi_sas_slot_index_free(struct hisi_hba *hisi_hba, int slot_idx)
}
}

-static void hisi_sas_slot_index_set(struct hisi_hba *hisi_hba, int slot_idx)
-{
- void *bitmap = hisi_hba->slot_index_tags;
-
- __set_bit(slot_idx, bitmap);
-}
-
-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
- struct scsi_cmnd *scsi_cmnd)
-{
- int index;
- void *bitmap = hisi_hba->slot_index_tags;
-
- if (scsi_cmnd)
- return scsi_cmd_to_rq(scsi_cmnd)->tag;
-
- spin_lock(&hisi_hba->lock);
- index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
- hisi_hba->last_slot_index + 1);
- if (index >= hisi_hba->slot_index_count) {
- index = find_next_zero_bit(bitmap,
- hisi_hba->slot_index_count,
- HISI_SAS_UNRESERVED_IPTT);
- if (index >= hisi_hba->slot_index_count) {
- spin_unlock(&hisi_hba->lock);
- return -SAS_QUEUE_FULL;
- }
- }
- hisi_sas_slot_index_set(hisi_hba, index);
- hisi_hba->last_slot_index = index;
- spin_unlock(&hisi_hba->lock);
-
- return index;
-}
-
void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
struct hisi_sas_slot *slot)
{
@@ -458,18 +423,17 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
{
int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+ struct scsi_cmnd *scmd = sas_scmd_from_task(task);
struct domain_device *device = task->dev;
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_device *sas_dev = device->lldd_dev;
bool internal_abort = sas_is_internal_abort(task);
- struct scsi_cmnd *scmd = NULL;
struct hisi_sas_dq *dq = NULL;
struct hisi_sas_port *port;
struct hisi_hba *hisi_hba;
struct hisi_sas_slot *slot;
struct device *dev;
int rc;
-
if (!sas_port) {
struct task_status_struct *ts = &task->task_status;

@@ -487,6 +451,8 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
hisi_hba = dev_to_hisi_hba(device);
dev = hisi_hba->dev;

+ dq = &hisi_hba->dq[sas_task_to_hwq(task)];
+
switch (task->task_proto) {
case SAS_PROTOCOL_SSP:
case SAS_PROTOCOL_SMP:
@@ -521,31 +487,6 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
return -ECOMM;
}

- if (task->uldd_task) {
- struct ata_queued_cmd *qc;
-
- if (dev_is_sata(device)) {
- qc = task->uldd_task;
- scmd = qc->scsicmd;
- } else {
- scmd = task->uldd_task;
- }
- }
-
- if (scmd) {
- unsigned int dq_index;
- u32 blk_tag;
-
- blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
- dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
- dq = &hisi_hba->dq[dq_index];
- } else {
- struct Scsi_Host *shost = hisi_hba->shost;
- struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
- int queue = qmap->mq_map[raw_smp_processor_id()];
-
- dq = &hisi_hba->dq[queue];
- }
break;
case SAS_PROTOCOL_INTERNAL_ABORT:
if (!hisi_hba->hw->prep_abort)
@@ -556,9 +497,6 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)

if (unlikely(test_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags)))
return -EINVAL;
-
- port = to_hisi_sas_port(sas_port);
- dq = &hisi_hba->dq[sas_task_to_hwq(task)];
break;
default:
dev_err(hisi_hba->dev, "task prep: unknown/unsupported proto (0x%x)\n",
@@ -577,10 +515,12 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
goto err_out_dma_unmap;
}

- if (!internal_abort && hisi_hba->hw->slot_index_alloc)
+ if (scsi_is_reserved_cmd(scmd))
+ rc = sas_task_to_unique_tag(task);
+ else if (hisi_hba->hw->slot_index_alloc)
rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
else
- rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
+ rc = sas_task_to_unique_tag(task);

if (rc < 0)
goto err_out_dif_dma_unmap;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index a1e60d2ef070..c2af2adf75c3 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -2493,6 +2493,7 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_port *port = to_hisi_sas_port(sas_port);
struct sas_ata_task *ata_task = &task->ata_task;
struct sas_tmf_task *tmf = slot->tmf;
+ struct scsi_cmnd *scmd = sas_scmd_from_task(task);
u8 *buf_cmd;
int has_data = 0, hdr_tag = 0;
u32 dw0, dw1 = 0, dw2 = 0;
@@ -2538,7 +2539,7 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,

/* dw2 */
if (task->ata_task.use_ncq) {
- struct ata_queued_cmd *qc = task->uldd_task;
+ struct ata_queued_cmd *qc = (struct ata_queued_cmd *)scmd->host_scribble;

hdr_tag = qc->tag;
task->ata_task.fis.sector_count |= (u8) (hdr_tag << 3);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 7d819fc0395e..6a76cc2e34af 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -1382,6 +1382,7 @@ static void prep_ata_v3_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_cmd_hdr *hdr = slot->cmd_hdr;
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_port *port = to_hisi_sas_port(sas_port);
+ struct scsi_cmnd *scmd = sas_scmd_from_task(task);
u8 *buf_cmd;
int has_data = 0, hdr_tag = 0;
u32 dw1 = 0, dw2 = 0;
@@ -1421,7 +1422,7 @@ static void prep_ata_v3_hw(struct hisi_hba *hisi_hba,

/* dw2 */
if (task->ata_task.use_ncq) {
- struct ata_queued_cmd *qc = task->uldd_task;
+ struct ata_queued_cmd *qc = (struct ata_queued_cmd *)scmd->host_scribble;

hdr_tag = qc->tag;
task->ata_task.fis.sector_count |= (u8) (hdr_tag << 3);
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 66534332e9ac..6ebcabdf0c01 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -188,7 +188,6 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
qc->tf.nsect = 0;

ata_tf_to_fis(&qc->tf, qc->dev->link->pmp, 1, (u8 *)&task->ata_task.fis);
- task->uldd_task = qc;
if (ata_is_atapi(qc->tf.protocol)) {
memcpy(task->ata_task.atapi_packet, qc->cdb, qc->dev->cdb_len);
task->total_xfer_len = qc->nbytes;
@@ -211,8 +210,7 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
task->ata_task.use_ncq = ata_is_ncq(qc->tf.protocol);
task->ata_task.dma_xfer = ata_is_dma(qc->tf.protocol);

- if (qc->scsicmd)
- ASSIGN_SAS_TASK(qc->scsicmd, task);
+ ASSIGN_SAS_TASK(qc->scsicmd, qc);

ret = i->dft->lldd_execute_task(task, GFP_ATOMIC);
if (ret) {
@@ -473,7 +471,6 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
qc->lldd_task = NULL;
if (!task)
return;
- task->uldd_task = NULL;
sas_ata_internal_abort(task);
}
}
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 62acbc8a46fd..3dd4d9c47b2b 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -619,7 +619,6 @@ struct sas_task {
void (*task_done)(struct sas_task *);

void *lldd_task; /* for use by LLDDs */
- void *uldd_task;
struct sas_task_slow *slow_task;
struct sas_tmf_task *tmf;
};
@@ -798,5 +797,12 @@ static inline unsigned int sas_task_to_hwq(struct sas_task *task)
return blk_mq_unique_tag_to_hwq(unique);
}

+static inline unsigned int sas_task_to_unique_tag(struct sas_task *task)
+{
+ u32 unique = sas_task_to_rq_unique_tag(task);
+
+ return blk_mq_unique_tag_to_tag(unique);
+
+}

#endif /* _SASLIB_H_ */
--
2.26.2

2022-06-09 11:06:22

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 13/18] scsi: libsas: Allocate SCSI commands for tasks

Allocate a SCSI command for a SAS task.

The next step will be to send tasks same as we do for other requests -
through the block layer.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/libsas/sas_ata.c | 17 +++++---
drivers/scsi/libsas/sas_expander.c | 11 +++++-
drivers/scsi/libsas/sas_init.c | 60 +++++++++++++++--------------
drivers/scsi/libsas/sas_scsi_host.c | 17 ++++----
include/scsi/libsas.h | 17 +++++++-
5 files changed, 77 insertions(+), 45 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index d35c9296f738..66534332e9ac 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -72,7 +72,8 @@ static enum ata_completion_errors sas_to_ata_err(struct task_status_struct *ts)

static void sas_ata_task_done(struct sas_task *task)
{
- struct ata_queued_cmd *qc = task->uldd_task;
+ struct scsi_cmnd *scmd = sas_scmd_from_task(task);
+ struct ata_queued_cmd *qc = TO_SAS_TASK(scmd);
struct domain_device *dev = task->dev;
struct task_status_struct *stat = &task->task_status;
struct ata_task_resp *resp = (struct ata_task_resp *)stat->buf;
@@ -149,7 +150,9 @@ static void sas_ata_task_done(struct sas_task *task)
spin_unlock_irqrestore(ap->lock, flags);

qc_already_gone:
- sas_free_task(task);
+ /* We rely on libata internal command to do this for us */
+ if (!ata_is_scmd_ata_internal(scmd))
+ sas_free_task(task);
}

static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
@@ -164,6 +167,7 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
struct sas_ha_struct *sas_ha = dev->port->ha;
struct Scsi_Host *host = sas_ha->core.shost;
struct sas_internal *i = to_sas_internal(host->transportt);
+ struct scsi_cmnd *scmd;

/* TODO: we should try to remove that unlock */
spin_unlock(ap->lock);
@@ -172,9 +176,9 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
if (test_bit(SAS_DEV_GONE, &dev->state))
goto out;

- task = sas_alloc_task(GFP_ATOMIC);
- if (!task)
- goto out;
+ scmd = qc->scsicmd;
+ task = sas_scmd_to_task(scmd);
+
task->dev = dev;
task->task_proto = SAS_PROTOCOL_STP;
task->task_done = sas_ata_task_done;
@@ -596,7 +600,8 @@ int sas_ata_init(struct domain_device *found_dev)

void sas_ata_task_abort(struct sas_task *task)
{
- struct ata_queued_cmd *qc = task->uldd_task;
+ struct scsi_cmnd *scmd = sas_scmd_from_task(task);
+ struct ata_queued_cmd *qc = TO_SAS_TASK(scmd);
struct completion *waiting;

/* Bounce SCSI-initiated commands to the SCSI EH */
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 260e735d06fa..b833de062f88 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -39,20 +39,29 @@ static int smp_execute_task_sg(struct domain_device *dev,
struct sas_internal *i =
to_sas_internal(dev->port->ha->core.shost->transportt);
struct sas_ha_struct *ha = dev->port->ha;
+ struct request *rq;

pm_runtime_get_sync(ha->dev);
mutex_lock(&dev->ex_dev.cmd_mutex);
for (retry = 0; retry < 3; retry++) {
+ struct scsi_cmnd *scmd;
+
if (test_bit(SAS_DEV_GONE, &dev->state)) {
res = -ECOMM;
break;
}

- task = sas_alloc_slow_task(GFP_KERNEL);
+ task = sas_alloc_slow_task(dev->port->ha, GFP_KERNEL);
if (!task) {
res = -ENOMEM;
break;
}
+
+ rq = sas_rq_from_task(task);
+
+ scmd = blk_mq_rq_to_pdu(rq);
+ ASSIGN_SAS_TASK(scmd, task);
+
task->dev = dev;
task->task_proto = dev->tproto;
task->smp_task.smp_req = *req;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index b0921fbc35b1..8d03b8abcaa3 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -21,30 +21,31 @@

#include "scsi_sas_internal.h"

-static struct kmem_cache *sas_task_cache;
static struct kmem_cache *sas_event_cache;

-struct sas_task *sas_alloc_task(gfp_t flags)
+struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *sas_ha, gfp_t flags)
{
- struct sas_task *task = kmem_cache_zalloc(sas_task_cache, flags);
+ struct request *rq;
+ struct sas_task *task;
+ struct sas_task_slow *slow;
+ struct Scsi_Host *shost = sas_ha->core.shost;
+ struct scsi_cmnd *scmd;
+ struct scsi_device *sdev;

- if (task) {
- spin_lock_init(&task->task_state_lock);
- task->task_state_flags = SAS_TASK_STATE_PENDING;
- }
+ sdev = shost->sdev;

- return task;
-}
-EXPORT_SYMBOL_GPL(sas_alloc_task);
+ rq = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
+ BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);

-struct sas_task *sas_alloc_slow_task(gfp_t flags)
-{
- struct sas_task *task = sas_alloc_task(flags);
- struct sas_task_slow *slow = kmalloc(sizeof(*slow), flags);
+ if (IS_ERR(rq))
+ return NULL;
+
+ scmd = blk_mq_rq_to_pdu(rq);
+ task = sas_rq_to_task(rq);
+
+ slow = kmalloc(sizeof(*slow), flags);

if (!task || !slow) {
- if (task)
- kmem_cache_free(sas_task_cache, task);
kfree(slow);
return NULL;
}
@@ -53,17 +54,19 @@ struct sas_task *sas_alloc_slow_task(gfp_t flags)
slow->task = task;
timer_setup(&slow->timer, NULL, 0);
init_completion(&slow->completion);
-
+ scmd->host_scribble = NULL;
return task;
}
EXPORT_SYMBOL_GPL(sas_alloc_slow_task);

void sas_free_task(struct sas_task *task)
{
- if (task) {
- kfree(task->slow_task);
- kmem_cache_free(sas_task_cache, task);
- }
+ struct request *rq = sas_rq_from_task(task);
+
+ kfree(task->slow_task);
+
+ if (blk_mq_is_reserved_rq(rq))
+ blk_mq_free_request(rq);
}
EXPORT_SYMBOL_GPL(sas_free_task);

@@ -95,6 +98,13 @@ void sas_hash_addr(u8 *hashed, const u8 *sas_addr)

int sas_init_priv_cmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
+ struct sas_task *task = sas_scmd_to_task(cmd);
+
+ memset(task, 0, sizeof(*task));
+ spin_lock_init(&task->task_state_lock);
+ task->task_state_flags = SAS_TASK_STATE_PENDING;
+ cmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
+
return 0;
}

@@ -685,24 +695,18 @@ void sas_free_event(struct asd_sas_event *event)

static int __init sas_class_init(void)
{
- sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
- if (!sas_task_cache)
- goto out;
-
sas_event_cache = KMEM_CACHE(asd_sas_event, SLAB_HWCACHE_ALIGN);
if (!sas_event_cache)
goto free_task_kmem;

return 0;
free_task_kmem:
- kmem_cache_destroy(sas_task_cache);
-out:
+
return -ENOMEM;
}

static void __exit sas_class_exit(void)
{
- kmem_cache_destroy(sas_task_cache);
kmem_cache_destroy(sas_event_cache);
}

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 532e734c1fb6..3abb1d622a32 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -98,7 +98,7 @@ static void sas_end_task(struct scsi_cmnd *sc, struct sas_task *task)

static void sas_scsi_task_done(struct sas_task *task)
{
- struct scsi_cmnd *sc = task->uldd_task;
+ struct scsi_cmnd *sc = sas_scmd_from_task(task);
struct domain_device *dev = task->dev;
struct sas_ha_struct *ha = dev->port->ha;
unsigned long flags;
@@ -130,13 +130,12 @@ static struct sas_task *sas_create_task(struct scsi_cmnd *cmd,
struct domain_device *dev,
gfp_t gfp_flags)
{
- struct sas_task *task = sas_alloc_task(gfp_flags);
+ struct sas_task *task = sas_scmd_to_task(cmd);
struct scsi_lun lun;

if (!task)
return NULL;

- task->uldd_task = cmd;
ASSIGN_SAS_TASK(cmd, task);

task->dev = dev;
@@ -958,7 +957,7 @@ static int sas_execute_internal_abort(struct domain_device *device,
int res, retry;

for (retry = 0; retry < TASK_RETRY; retry++) {
- task = sas_alloc_slow_task(GFP_KERNEL);
+ task = sas_alloc_slow_task(ha, GFP_KERNEL);
if (!task)
return -ENOMEM;

@@ -1044,10 +1043,12 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
struct sas_task *task;
struct sas_internal *i =
to_sas_internal(device->port->ha->core.shost->transportt);
+ struct sas_ha_struct *ha = device->port->ha;
int res, retry;

+
for (retry = 0; retry < TASK_RETRY; retry++) {
- task = sas_alloc_slow_task(GFP_KERNEL);
+ task = sas_alloc_slow_task(ha, GFP_KERNEL);
if (!task)
return -ENOMEM;

@@ -1204,7 +1205,7 @@ int sas_query_task(struct sas_task *task, u16 tag)
.tmf = TMF_QUERY_TASK,
.tag_of_task_to_be_managed = tag,
};
- struct scsi_cmnd *cmnd = task->uldd_task;
+ struct scsi_cmnd *cmnd = sas_scmd_from_task(task);
struct domain_device *dev = task->dev;
struct scsi_lun lun;

@@ -1220,7 +1221,7 @@ int sas_abort_task(struct sas_task *task, u16 tag)
.tmf = TMF_ABORT_TASK,
.tag_of_task_to_be_managed = tag,
};
- struct scsi_cmnd *cmnd = task->uldd_task;
+ struct scsi_cmnd *cmnd = sas_scmd_from_task(task);
struct domain_device *dev = task->dev;
struct scsi_lun lun;

@@ -1236,7 +1237,7 @@ EXPORT_SYMBOL_GPL(sas_abort_task);
*/
void sas_task_abort(struct sas_task *task)
{
- struct scsi_cmnd *sc = task->uldd_task;
+ struct scsi_cmnd *sc = sas_scmd_from_task(task);

/* Escape for libsas internal commands */
if (!sc) {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 92fc3e5ef297..024b4c4eec3b 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -639,8 +639,7 @@ struct sas_task_slow {
#define SAS_TASK_STATE_ABORTED 4
#define SAS_TASK_NEED_DEV_RESET 8

-extern struct sas_task *sas_alloc_task(gfp_t flags);
-extern struct sas_task *sas_alloc_slow_task(gfp_t flags);
+extern struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *, gfp_t flags);
extern void sas_free_task(struct sas_task *task);

static inline bool sas_is_internal_abort(struct sas_task *task)
@@ -766,10 +765,24 @@ static inline struct sas_task *sas_rq_to_task(struct request *rq)
return (struct sas_task *)(scmd + 1);
}

+static inline struct request *sas_rq_from_task(void *task)
+{
+ struct scsi_cmnd *scmd = task - sizeof(*scmd);
+
+ return blk_mq_rq_from_pdu(scmd);
+}
+
static inline struct scsi_cmnd *sas_scmd_from_task(void *task)
{
struct scsi_cmnd *scmd = task - sizeof(*scmd);

return scmd;
}
+
+static inline struct sas_task *sas_scmd_to_task(struct scsi_cmnd *scmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(scmd);
+ return sas_rq_to_task(rq);
+}
+
#endif /* _SASLIB_H_ */
--
2.26.2

2022-06-09 11:06:45

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 15/18] scsi: libsas: Queue TMF commands as requests

Like what we did with SMP commands, send TMF commands through the block
layer.

In future we can now also take advantage of the block layer request
timeout handling.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/libsas/sas_scsi_host.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0566f4512b2..520c301e4319 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1059,18 +1059,27 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
struct sas_task *task;
struct sas_internal *i =
to_sas_internal(device->port->ha->core.shost->transportt);
- struct sas_ha_struct *ha = device->port->ha;
int res, retry;
-
+ struct request *rq;
+ struct sas_ha_struct *ha = device->port->ha;

for (retry = 0; retry < TASK_RETRY; retry++) {
+ struct scsi_cmnd *scmd;
+
task = sas_alloc_slow_task(ha, GFP_KERNEL);
- if (!task)
- return -ENOMEM;
+ if (!task) {
+ res = -ENOMEM;
+ break;
+ }

task->dev = device;
task->task_proto = device->tproto;

+ rq = sas_rq_from_task(task);
+
+ scmd = blk_mq_rq_to_pdu(rq);
+ ASSIGN_SAS_TASK(scmd, task);
+
if (dev_is_sata(device)) {
task->ata_task.device_control_reg_update = 1;
if (force_phy_id >= 0) {
@@ -1082,20 +1091,15 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
memcpy(&task->ssp_task, parameter, para_len);
}

- task->task_done = sas_task_internal_done;
+ task->task_done = sas_task_complete_internal;
task->tmf = tmf;

task->slow_task->timer.function = sas_task_internal_timedout;
task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
add_timer(&task->slow_task->timer);

- res = i->dft->lldd_execute_task(task, GFP_KERNEL);
- if (res) {
- del_timer_sync(&task->slow_task->timer);
- pr_err("executing TMF task failed %016llx (%d)\n",
- SAS_ADDR(device->sas_addr), res);
- break;
- }
+ rq->end_io = sas_blk_end_sync_rq;
+ blk_execute_rq_nowait(rq, true);

wait_for_completion(&task->slow_task->completion);

--
2.26.2

2022-06-09 11:08:01

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 06/18] libata-scsi: Add ata_scsi_queue_internal()

Add a function to handle queued ATA internal SCSI cmnds - does much the
same as ata_exec_internal_sg() does (which will be fixed up later to
actually queue internal cmnds through this function).

Signed-off-by: John Garry <[email protected]>
---
drivers/ata/libata-scsi.c | 47 ++++++++++++++++++++++++++++++++++++++-
include/linux/libata.h | 6 +++++
2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 42cecf95a4e5..baac35dd17ca 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3963,6 +3963,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
return NULL;
}

+static unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
+ struct ata_device *dev)
+{
+ struct ata_link *link = dev->link;
+ struct ata_port *ap = link->ap;
+ struct ata_queued_cmd *qc;
+
+ /* no internal command while frozen */
+ if (ap->pflags & ATA_PFLAG_FROZEN)
+ goto did_err;
+
+ /* initialize internal qc */
+ qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
+ link->preempted_tag = link->active_tag;
+ link->preempted_sactive = link->sactive;
+ ap->preempted_qc_active = ap->qc_active;
+ ap->preempted_nr_active_links = ap->nr_active_links;
+ link->active_tag = ATA_TAG_POISON;
+ link->sactive = 0;
+ ap->qc_active = 0;
+ ap->nr_active_links = 0;
+
+ if (qc->dma_dir != DMA_NONE) {
+ int n_elem;
+
+ n_elem = 1;
+ qc->n_elem = n_elem;
+ qc->sg = scsi_sglist(scmd);
+ qc->nbytes = qc->sg->length;
+ ata_sg_init(qc, qc->sg, n_elem);
+ }
+
+ scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
+
+ ata_qc_issue(qc);
+
+ return 0;
+did_err:
+ scmd->result = (DID_ERROR << 16);
+ scsi_done(scmd);
+ return 0;
+}
+
int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
{
u8 scsi_op = scmd->cmnd[0];
@@ -3971,7 +4014,9 @@ int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
if (unlikely(!scmd->cmd_len))
goto bad_cdb_len;

- if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
+ if (scsi_is_reserved_cmd(scmd)) {
+ return ata_scsi_queue_internal(scmd, dev);
+ } else if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
if (unlikely(scmd->cmd_len > dev->cdb_len))
goto bad_cdb_len;

diff --git a/include/linux/libata.h b/include/linux/libata.h
index 732de9014626..43f4bcfe9a5f 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -767,7 +767,9 @@ struct ata_link {

struct device tdev;
unsigned int active_tag; /* active tag on this link */
+ unsigned int preempted_tag;
u32 sactive; /* active NCQ commands */
+ u32 preempted_sactive;

unsigned int flags; /* ATA_LFLAG_xxx */

@@ -861,6 +863,10 @@ struct ata_port {
#ifdef CONFIG_ATA_ACPI
struct ata_acpi_gtm __acpi_init_gtm; /* use ata_acpi_init_gtm() */
#endif
+
+ u64 preempted_qc_active;
+ int preempted_nr_active_links;
+
/* owned by EH */
u8 sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
};
--
2.26.2

2022-06-09 11:08:44

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 12/18] scsi: libsas drivers: Prepare for reserved commands

Set the various LLDD fields to be able to use reserved commands. I have
only done this for hisi sas v2 HW, but all others need this treatment.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 5 +++--
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 3 +++
drivers/scsi/libsas/sas_init.c | 5 +++++
include/scsi/libsas.h | 8 ++++++++
4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 0219ccac9062..2c5c6301f224 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -2427,8 +2427,9 @@ int hisi_sas_probe(struct platform_device *pdev,
shost->max_channel = 1;
shost->max_cmd_len = 16;
if (hisi_hba->hw->slot_index_alloc) {
- shost->can_queue = HISI_SAS_MAX_COMMANDS;
- shost->cmd_per_lun = HISI_SAS_MAX_COMMANDS;
+ shost->can_queue = HISI_SAS_UNRESERVED_IPTT;
+ shost->nr_reserved_cmds = HISI_SAS_RESERVED_IPTT;
+ shost->cmd_per_lun = HISI_SAS_UNRESERVED_IPTT;
} else {
shost->can_queue = HISI_SAS_UNRESERVED_IPTT;
shost->cmd_per_lun = HISI_SAS_UNRESERVED_IPTT;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 455d49299ddf..a1e60d2ef070 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3564,6 +3564,7 @@ static struct scsi_host_template sht_v2_hw = {
.proc_name = DRV_NAME,
.module = THIS_MODULE,
.queuecommand = sas_queuecommand,
+ .reserved_queuecommand = sas_queuecommand_internal,
.dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc = sas_target_alloc,
.slave_configure = hisi_sas_slave_configure,
@@ -3586,6 +3587,8 @@ static struct scsi_host_template sht_v2_hw = {
.host_reset = hisi_sas_host_reset,
.map_queues = map_queues_v2_hw,
.host_tagset = 1,
+ .cmd_size = sizeof(struct sas_task),
+ .init_cmd_priv = sas_init_priv_cmd,
};

static const struct hisi_sas_hw hisi_sas_v2_hw = {
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index dc35f0f8eae3..b0921fbc35b1 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -93,6 +93,11 @@ void sas_hash_addr(u8 *hashed, const u8 *sas_addr)
hashed[2] = r & 0xFF;
}

+int sas_init_priv_cmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
+{
+ return 0;
+}
+
int sas_register_ha(struct sas_ha_struct *sas_ha)
{
char name[64];
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 261169ffdca6..92fc3e5ef297 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -693,6 +693,8 @@ extern void sas_resume_ha(struct sas_ha_struct *sas_ha);
extern void sas_resume_ha_no_sync(struct sas_ha_struct *sas_ha);
extern void sas_suspend_ha(struct sas_ha_struct *sas_ha);

+extern int sas_init_priv_cmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd);
+
int sas_set_phy_speed(struct sas_phy *phy, struct sas_phy_linkrates *rates);
int sas_phy_reset(struct sas_phy *phy, int hard_reset);
int sas_phy_enable(struct sas_phy *phy, int enable);
@@ -764,4 +766,10 @@ static inline struct sas_task *sas_rq_to_task(struct request *rq)
return (struct sas_task *)(scmd + 1);
}

+static inline struct scsi_cmnd *sas_scmd_from_task(void *task)
+{
+ struct scsi_cmnd *scmd = task - sizeof(*scmd);
+
+ return scmd;
+}
#endif /* _SASLIB_H_ */
--
2.26.2

2022-06-09 11:09:05

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 04/18] scsi: core: Add support to send reserved commands

Add a method to queue reserved commands.

TODO:
- fix timeout handler to call into reserved commands
- We don't clear host_scribble for libata qc, but we should be able to drop
this if we store libata qc in the scsi cmnd priv_data

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hosts.c | 6 ++++++
drivers/scsi/scsi_lib.c | 32 ++++++++++++++++++++++++++++++++
include/scsi/scsi_cmnd.h | 5 +++++
include/scsi/scsi_host.h | 1 +
4 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 27296addaf63..5c9b05a8fec8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -221,6 +221,12 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
goto fail;
}

+ if (shost->nr_reserved_cmds && !sht->reserved_queuecommand) {
+ shost_printk(KERN_ERR, shost,
+ "nr_reserved_cmds set but no method to queue\n");
+ goto fail;
+ }
+
/* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
shost->can_queue);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6e53c6d913c..8c8b4c6767d9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1422,6 +1422,16 @@ static void scsi_complete(struct request *rq)
struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
enum scsi_disposition disposition;

+ if (scsi_is_reserved_cmd(cmd)) {
+ struct scsi_device *sdev = cmd->device;
+
+ scsi_mq_uninit_cmd(cmd);
+ scsi_device_unbusy(sdev, cmd);
+ __blk_mq_end_request(rq, 0);
+
+ return;
+ }
+
INIT_LIST_HEAD(&cmd->eh_entry);

atomic_inc(&cmd->device->iodone_cnt);
@@ -1706,6 +1716,28 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,

WARN_ON_ONCE(cmd->budget_token < 0);

+ if (scsi_is_reserved_cmd(cmd)) {
+ unsigned char *host_scribble = cmd->host_scribble;
+
+ if (!(req->rq_flags & RQF_DONTPREP)) {
+ ret = scsi_prepare_cmd(req);
+ if (ret != BLK_STS_OK) {
+
+ goto out_dec_host_busy;
+ }
+
+ req->rq_flags |= RQF_DONTPREP;
+ } else {
+ clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
+ }
+
+ cmd->host_scribble = host_scribble;
+
+ blk_mq_start_request(req);
+
+ return shost->hostt->reserved_queuecommand(shost, cmd);
+ }
+
/*
* If the device is not in running state we will reject some or all
* commands.
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 1e80e70dfa92..e47df5836070 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -152,6 +152,11 @@ static inline void *scsi_cmd_priv(struct scsi_cmnd *cmd)
return cmd + 1;
}

+static inline bool scsi_is_reserved_cmd(struct scsi_cmnd *cmd)
+{
+ return blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd));
+}
+
void scsi_done(struct scsi_cmnd *cmd);
void scsi_done_direct(struct scsi_cmnd *cmd);

diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 149dcbd4125e..88c8504395c8 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -73,6 +73,7 @@ struct scsi_host_template {
* STATUS: REQUIRED
*/
int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
+ int (* reserved_queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);

/*
* The commit_rqs function is used to trigger a hardware
--
2.26.2

2022-06-09 11:10:51

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 10/18] libata/scsi: libsas: Add sas_queuecommand_internal()

Add a callback for sht reserved_queuecommand callback. We need to add
libata helper ata_is_scmd_ata_internal() as it is difficult to know
whether the request should be queued as an ATA internal command.

We will store the sas_task in the scsi_cmnd payload also in future.

Signed-off-by: John Garry <[email protected]>
---
drivers/ata/libata-core.c | 8 ++++++++
drivers/scsi/libsas/sas_scsi_host.c | 21 +++++++++++++++++++++
include/linux/libata.h | 1 +
include/scsi/libsas.h | 8 ++++++++
4 files changed, 38 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6b1aaeccb253..a9645951a0f0 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1454,6 +1454,14 @@ static void ata_internal_end_rq(struct request *rq, blk_status_t error)
complete(waiting);
}

+bool ata_is_scmd_ata_internal(struct scsi_cmnd *scmd)
+{
+ struct request *rq = scsi_cmd_to_rq(scmd);
+
+ return rq->end_io == ata_internal_end_rq;
+}
+EXPORT_SYMBOL_GPL(ata_is_scmd_ata_internal);
+
/**
* ata_exec_internal_sg - execute libata internal command
* @dev: Device to which the command is sent
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9c82e5dc4fcc..42279a6d6b06 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -158,6 +158,27 @@ static struct sas_task *sas_create_task(struct scsi_cmnd *cmd,
return task;
}

+int sas_queuecommand_internal(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
+{
+ struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+ struct sas_internal *i = to_sas_internal(ha->core.shost->transportt);
+ struct request *rq = scsi_cmd_to_rq(cmnd);
+
+ if (ata_is_scmd_ata_internal(cmnd)) {
+ struct ata_queued_cmd *qc = (struct ata_queued_cmd *)cmnd->host_scribble;
+ struct ata_port *ap = qc->ap;
+ int res;
+
+ spin_lock_irq(ap->lock);
+ res = ata_sas_queuecmd(cmnd, ap);
+ spin_unlock_irq(ap->lock);
+ return res;
+ }
+
+ return i->dft->lldd_execute_task(sas_rq_to_task(rq), GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(sas_queuecommand_internal);
+
int sas_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
{
struct sas_internal *i = to_sas_internal(host->transportt);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5fa6f56bba81..6cb7a5bcb308 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1238,6 +1238,7 @@ extern int ata_sas_slave_configure(struct scsi_device *, struct ata_port *);
extern int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap);
extern void ata_tf_to_fis(const struct ata_taskfile *tf,
u8 pmp, int is_cmd, u8 *fis);
+extern bool ata_is_scmd_ata_internal(struct scsi_cmnd *scmd);
extern void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf);
extern int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active);
extern bool sata_lpm_ignore_phy_events(struct ata_link *link);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index ff04eb6d250b..261169ffdca6 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -725,6 +725,7 @@ int sas_discover_sata(struct domain_device *);
int sas_discover_end_dev(struct domain_device *);

void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *);
+int sas_queuecommand_internal(struct Scsi_Host *shost, struct scsi_cmnd *cmnd);

void sas_init_dev(struct domain_device *);

@@ -756,4 +757,11 @@ void sas_notify_port_event(struct asd_sas_phy *phy, enum port_event event,
void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event,
gfp_t gfp_flags);

+static inline struct sas_task *sas_rq_to_task(struct request *rq)
+{
+ struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
+
+ return (struct sas_task *)(scmd + 1);
+}
+
#endif /* _SASLIB_H_ */
--
2.26.2

2022-06-09 11:32:08

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 09/18] scsi: ipr: Support reserved commands

Support a single reserved command for ATA internal commands. We also
add SHT reserved_queuecommand callback to support queueing those internal
commands.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/ipr.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 256ec6d08c16..4f4cf39cd2bc 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6740,6 +6740,19 @@ static const char *ipr_ioa_info(struct Scsi_Host *host)
return buffer;
}

+static int ipr_reserved_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct ata_queued_cmd *qc = (struct ata_queued_cmd *)scmd->host_scribble;
+ struct ata_port *ap = qc->ap;
+ int ret;
+
+ /* We're only going to be seeing ATA internal commands */
+ spin_lock_irq(ap->lock);
+ ret = ata_sas_queuecmd(scmd, ap);
+ spin_unlock_irq(ap->lock);
+ return ret;
+}
+
static struct scsi_host_template driver_template = {
.module = THIS_MODULE,
.name = "IPR",
@@ -6749,6 +6762,7 @@ static struct scsi_host_template driver_template = {
.compat_ioctl = ipr_ioctl,
#endif
.queuecommand = ipr_queuecommand,
+ .reserved_queuecommand = ipr_reserved_queuecommand,
.dma_need_drain = ata_scsi_dma_need_drain,
.eh_abort_handler = ipr_eh_abort,
.eh_device_reset_handler = ipr_eh_dev_reset,
@@ -9991,7 +10005,8 @@ static void ipr_init_ioa_cfg(struct ipr_ioa_cfg *ioa_cfg,

host->unique_id = host->host_no;
host->max_cmd_len = IPR_MAX_CDB_LEN;
- host->can_queue = ioa_cfg->max_cmds;
+ host->can_queue = ioa_cfg->max_cmds - 1;
+ host->nr_reserved_cmds = 1;
pci_set_drvdata(pdev, ioa_cfg);

for (i = 0; i < ARRAY_SIZE(ioa_cfg->hrrq); i++) {
--
2.26.2

2022-06-09 11:33:31

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 17/18] scsi: libsas: Queue internal abort commands as requests

Like what we did for SMP commands, send internal abort commands through
the block layer.

In future we can now also take advantage of the block layer request
timeout handling.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 4 +---
drivers/scsi/libsas/sas_init.c | 19 ++++++++++++++----
drivers/scsi/libsas/sas_internal.h | 3 +++
drivers/scsi/libsas/sas_scsi_host.c | 28 +++++++++++++++------------
include/scsi/libsas.h | 16 ++++++++++++++-
5 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2c5c6301f224..50ca7d63ab58 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -554,13 +554,11 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
if (test_bit(HISI_SAS_HW_FAULT_BIT, &hisi_hba->flags))
return -EIO;

- hisi_hba = dev_to_hisi_hba(device);
-
if (unlikely(test_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags)))
return -EINVAL;

port = to_hisi_sas_port(sas_port);
- dq = &hisi_hba->dq[task->abort_task.qid];
+ dq = &hisi_hba->dq[sas_task_to_hwq(task)];
break;
default:
dev_err(hisi_hba->dev, "task prep: unknown/unsupported proto (0x%x)\n",
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 8d03b8abcaa3..1224ad2b44ce 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -23,7 +23,8 @@

static struct kmem_cache *sas_event_cache;

-struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *sas_ha, gfp_t flags)
+struct sas_task *sas_alloc_slow_task_qid(struct sas_ha_struct *sas_ha,
+ gfp_t flags, unsigned int qid)
{
struct request *rq;
struct sas_task *task;
@@ -34,9 +35,14 @@ struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *sas_ha, gfp_t flags)

sdev = shost->sdev;

- rq = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
- BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
-
+ if (qid == -1U) {
+ rq = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
+ BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+ } else {
+ rq = scsi_alloc_request_hwq(sdev->request_queue, REQ_OP_DRV_IN,
+ BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT,
+ qid);
+ }
if (IS_ERR(rq))
return NULL;

@@ -57,6 +63,11 @@ struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *sas_ha, gfp_t flags)
scmd->host_scribble = NULL;
return task;
}
+
+struct sas_task *sas_alloc_slow_task(struct sas_ha_struct *sas_ha, gfp_t flags)
+{
+ return sas_alloc_slow_task_qid(sas_ha, flags, -1U);
+}
EXPORT_SYMBOL_GPL(sas_alloc_slow_task);

void sas_free_task(struct sas_task *task)
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 52cfa75d432b..eb70cc0d04fe 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -102,6 +102,9 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
void sas_task_complete_internal(struct sas_task *task);
void sas_blk_end_sync_rq(struct request *rq, blk_status_t error);

+struct sas_task *sas_alloc_slow_task_qid(struct sas_ha_struct *sas_ha,
+ gfp_t flags, unsigned int qid);
+
#ifdef CONFIG_SCSI_SAS_HOST_SMP
extern void sas_smp_host_handler(struct bsg_job *job, struct Scsi_Host *shost);
#else
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 520c301e4319..3c62a104f049 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -973,28 +973,32 @@ static int sas_execute_internal_abort(struct domain_device *device,
int res, retry;

for (retry = 0; retry < TASK_RETRY; retry++) {
- task = sas_alloc_slow_task(ha, GFP_KERNEL);
- if (!task)
- return -ENOMEM;
+ struct scsi_cmnd *scmd;
+ struct request *rq;
+
+ task = sas_alloc_slow_task_qid(ha, GFP_KERNEL, qid);
+ if (!task) {
+ res = -ENOMEM;
+ break;
+ }

task->dev = device;
task->task_proto = SAS_PROTOCOL_INTERNAL_ABORT;
- task->task_done = sas_task_internal_done;
+ task->task_done = sas_task_complete_internal;
task->slow_task->timer.function = sas_task_internal_timedout;
task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
add_timer(&task->slow_task->timer);

task->abort_task.tag = tag;
task->abort_task.type = type;
- task->abort_task.qid = qid;

- res = i->dft->lldd_execute_task(task, GFP_KERNEL);
- if (res) {
- del_timer_sync(&task->slow_task->timer);
- pr_err("Executing internal abort failed %016llx (%d)\n",
- SAS_ADDR(device->sas_addr), res);
- break;
- }
+ rq = sas_rq_from_task(task);
+
+ scmd = blk_mq_rq_to_pdu(rq);
+ ASSIGN_SAS_TASK(scmd, task);
+
+ rq->end_io = sas_blk_end_sync_rq;
+ blk_execute_rq_nowait(rq, true);

wait_for_completion(&task->slow_task->completion);
res = TMF_RESP_FUNC_FAILED;
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 024b4c4eec3b..62acbc8a46fd 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -565,7 +565,6 @@ enum sas_internal_abort {

struct sas_internal_abort_task {
enum sas_internal_abort type;
- unsigned int qid;
u16 tag;
};

@@ -785,4 +784,19 @@ static inline struct sas_task *sas_scmd_to_task(struct scsi_cmnd *scmd)
return sas_rq_to_task(rq);
}

+static inline u32 sas_task_to_rq_unique_tag(struct sas_task *task)
+{
+ struct request *rq = sas_rq_from_task(task);
+
+ return blk_mq_unique_tag(rq);
+}
+
+static inline unsigned int sas_task_to_hwq(struct sas_task *task)
+{
+ u32 unique = sas_task_to_rq_unique_tag(task);
+
+ return blk_mq_unique_tag_to_hwq(unique);
+}
+
+
#endif /* _SASLIB_H_ */
--
2.26.2

2022-06-09 11:35:03

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 08/18] libata: Queue ATA internal commands as requests

Follow the normal path for requests and queue through the block layer.

We hold the qc pointer in the scmd host scribble, which is less than
ideal. In future we can hold the ata_queued_cmd in the scmd priv_data,
allowing for easy lookup.

We need to use the shost sdev to queue these requests. That is because we
initially do not have the target disk scsi_device allocated yet when
queueing internal commands. This is less than ideal, and makes it hard for
libsas to know when queueing an internal command whether it is a ATA
internal command or not.

Also make ata_exec_internal_sg() static - there are no users outside
libata-core.c

Signed-off-by: John Garry <[email protected]>
---
drivers/ata/libata-core.c | 137 +++++++++++++++++++++-----------------
drivers/ata/libata-sata.c | 5 +-
drivers/ata/libata.h | 4 --
3 files changed, 80 insertions(+), 66 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 40e816419f48..6b1aaeccb253 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1438,9 +1438,18 @@ unsigned long ata_id_xfermask(const u16 *id)
}
EXPORT_SYMBOL_GPL(ata_id_xfermask);

-static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
+void ata_qc_complete_internal(struct ata_queued_cmd *qc)
{
- struct completion *waiting = qc->private_data;
+ struct scsi_cmnd *scmd = qc->scsicmd;
+
+ scsi_done(scmd);
+}
+
+static void ata_internal_end_rq(struct request *rq, blk_status_t error)
+{
+ struct completion *waiting = rq->end_io_data;
+
+ rq->end_io_data = (void *)(uintptr_t)error;

complete(waiting);
}
@@ -1467,52 +1476,74 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
* RETURNS:
* Zero on success, AC_ERR_* mask on failure
*/
-unsigned ata_exec_internal_sg(struct ata_device *dev,
+static unsigned ata_exec_internal_sg(struct ata_device *dev,
struct ata_taskfile *tf, const u8 *cdb,
- int dma_dir, struct scatterlist *sgl,
- unsigned int n_elem, unsigned long timeout)
+ int dma_dir, void *buf, unsigned int buflen,
+ unsigned long timeout)
{
struct ata_link *link = dev->link;
struct ata_port *ap = link->ap;
+ struct Scsi_Host *scsi_host = ap->scsi_host;
+ struct scsi_device *sdev = scsi_host->sdev;
u8 command = tf->command;
int auto_timeout = 0;
struct ata_queued_cmd *qc;
- unsigned int preempted_tag;
- u32 preempted_sactive;
- u64 preempted_qc_active;
- int preempted_nr_active_links;
DECLARE_COMPLETION_ONSTACK(wait);
unsigned long flags;
unsigned int err_mask;
+ struct scsi_cmnd *scmd;
+ struct request *req;
int rc;

- spin_lock_irqsave(ap->lock, flags);
+ /*
+ * We only support a single reserved command, so this guarantees
+ * serialization. However the code already assumed that (we are
+ * serialized here per-port).
+ */
+ req = scsi_alloc_request(sdev->request_queue,
+ dma_dir == DMA_TO_DEVICE ?
+ REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+ BLK_MQ_REQ_RESERVED);
+ if (IS_ERR(req))
+ return AC_ERR_OTHER;

- /* no internal command while frozen */
- if (ap->pflags & ATA_PFLAG_FROZEN) {
- spin_unlock_irqrestore(ap->lock, flags);
- return AC_ERR_SYSTEM;
+
+ if (!timeout) {
+ if (ata_probe_timeout)
+ timeout = ata_probe_timeout * 1000;
+ else {
+ timeout = ata_internal_cmd_timeout(dev, command);
+ auto_timeout = 1;
+ }
}

- /* initialize internal qc */
+ scmd = blk_mq_rq_to_pdu(req);
+ scmd->allowed = 0;
+ req->timeout = timeout;
+ //TODO: Hook up timeout handler
+ req->rq_flags |= RQF_QUIET;
+ scmd->device = sdev;
qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);

+ /* Do this until we can hold ata_queued_cmd in the SCMD priv data */
+ scmd->host_scribble = (unsigned char *)qc;
+
+ if (buflen) {
+ int ret = blk_rq_map_kern(sdev->request_queue, req,
+ buf, buflen, GFP_NOIO);
+ if (ret) {
+ blk_mq_free_request(req);
+ return AC_ERR_OTHER;
+ }
+ }
+
qc->tag = ATA_TAG_INTERNAL;
qc->hw_tag = 0;
- qc->scsicmd = NULL;
+ qc->scsicmd = scmd;
qc->ap = ap;
qc->dev = dev;
ata_qc_reinit(qc);

- preempted_tag = link->active_tag;
- preempted_sactive = link->sactive;
- preempted_qc_active = ap->qc_active;
- preempted_nr_active_links = ap->nr_active_links;
- link->active_tag = ATA_TAG_POISON;
- link->sactive = 0;
- ap->qc_active = 0;
- ap->nr_active_links = 0;
-
/* prepare & issue qc */
qc->tf = *tf;
if (cdb)
@@ -1525,32 +1556,14 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,

qc->flags |= ATA_QCFLAG_RESULT_TF;
qc->dma_dir = dma_dir;
- if (dma_dir != DMA_NONE) {
- unsigned int i, buflen = 0;
- struct scatterlist *sg;
-
- for_each_sg(sgl, sg, n_elem, i)
- buflen += sg->length;
-
- ata_sg_init(qc, sgl, n_elem);
- qc->nbytes = buflen;
- }

- qc->private_data = &wait;
+ qc->private_data = ap;
qc->complete_fn = ata_qc_complete_internal;

- ata_qc_issue(qc);
-
- spin_unlock_irqrestore(ap->lock, flags);
+ req->end_io_data = &wait;
+ req->end_io = ata_internal_end_rq;

- if (!timeout) {
- if (ata_probe_timeout)
- timeout = ata_probe_timeout * 1000;
- else {
- timeout = ata_internal_cmd_timeout(dev, command);
- auto_timeout = 1;
- }
- }
+ blk_execute_rq_nowait(req, true);

if (ap->ops->error_handler)
ata_eh_release(ap);
@@ -1610,13 +1623,15 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
err_mask = qc->err_mask;

ata_qc_free(qc);
- link->active_tag = preempted_tag;
- link->sactive = preempted_sactive;
- ap->qc_active = preempted_qc_active;
- ap->nr_active_links = preempted_nr_active_links;
+ link->active_tag = link->preempted_tag;
+ link->sactive = link->preempted_sactive;
+ ap->qc_active = ap->preempted_qc_active;
+ ap->nr_active_links = ap->preempted_nr_active_links;

spin_unlock_irqrestore(ap->lock, flags);

+ blk_mq_free_request(req);
+
if ((err_mask & AC_ERR_TIMEOUT) && auto_timeout)
ata_internal_cmd_timed_out(dev, command);

@@ -1647,18 +1662,20 @@ unsigned ata_exec_internal(struct ata_device *dev,
int dma_dir, void *buf, unsigned int buflen,
unsigned long timeout)
{
- struct scatterlist *psg = NULL, sg;
- unsigned int n_elem = 0;
+ /* buf may not be aligned, so copy to/from an aligned buffer */
+ void *tmpbuf = kmemdup(buf, buflen, GFP_KERNEL);
+ unsigned res;

- if (dma_dir != DMA_NONE) {
- WARN_ON(!buf);
- sg_init_one(&sg, buf, buflen);
- psg = &sg;
- n_elem++;
- }
+ if (!tmpbuf)
+ return AC_ERR_OTHER;

- return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, n_elem,
+ res = ata_exec_internal_sg(dev, tf, cdb, dma_dir, tmpbuf, buflen,
timeout);
+
+ memcpy(buf, tmpbuf, buflen);
+ kfree(tmpbuf);
+
+ return res;
}

/**
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 7a5fe41aa5ae..3cecc45d54ab 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1258,9 +1258,10 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
{
int rc = 0;

- if (likely(ata_dev_enabled(ap->link.device)))
+ if (likely(ata_dev_enabled(ap->link.device)) ||
+ scsi_is_reserved_cmd(cmd)) {
rc = __ata_scsi_queuecmd(cmd, ap->link.device);
- else {
+ } else {
cmd->result = (DID_BAD_TARGET << 16);
scsi_done(cmd);
}
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 926a7f41303d..1446a482835d 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -53,10 +53,6 @@ extern unsigned ata_exec_internal(struct ata_device *dev,
struct ata_taskfile *tf, const u8 *cdb,
int dma_dir, void *buf, unsigned int buflen,
unsigned long timeout);
-extern unsigned ata_exec_internal_sg(struct ata_device *dev,
- struct ata_taskfile *tf, const u8 *cdb,
- int dma_dir, struct scatterlist *sg,
- unsigned int n_elem, unsigned long timeout);
extern int ata_wait_ready(struct ata_link *link, unsigned long deadline,
int (*check_ready)(struct ata_link *link));
extern int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,
--
2.26.2

2022-06-09 11:36:08

by John Garry

[permalink] [raw]
Subject: [PATCH RFC v2 14/18] scsi: libsas: Queue SMP commands as requests

Send SMP commands through the block layer so that each command gets a
unique tag associated.

In future we can now also take advantage of the block layer request
timeout handling.

Function sas_task_complete_internal() is what the LLDD calls to signal
that the CQ is complete and this calls into the SCSI midlayer. And then
sas_blk_end_sync_rq() is called when the request completes.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/libsas/sas_expander.c | 13 +++----------
drivers/scsi/libsas/sas_internal.h | 3 +++
drivers/scsi/libsas/sas_scsi_host.c | 16 ++++++++++++++++
3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index b833de062f88..d0bf63286b7d 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -66,20 +66,13 @@ static int smp_execute_task_sg(struct domain_device *dev,
task->task_proto = dev->tproto;
task->smp_task.smp_req = *req;
task->smp_task.smp_resp = *resp;
-
- task->task_done = sas_task_internal_done;
+ task->task_done = sas_task_complete_internal;

task->slow_task->timer.function = sas_task_internal_timedout;
task->slow_task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
add_timer(&task->slow_task->timer);
-
- res = i->dft->lldd_execute_task(task, GFP_KERNEL);
-
- if (res) {
- del_timer(&task->slow_task->timer);
- pr_notice("executing SMP task failed:%d\n", res);
- break;
- }
+ rq->end_io = sas_blk_end_sync_rq;
+ blk_execute_rq_nowait(rq, true);

wait_for_completion(&task->slow_task->completion);
res = -ECOMM;
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 13d0ffaada93..52cfa75d432b 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -99,6 +99,9 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
int para_len, int force_phy_id,
struct sas_tmf_task *tmf);

+void sas_task_complete_internal(struct sas_task *task);
+void sas_blk_end_sync_rq(struct request *rq, blk_status_t error);
+
#ifdef CONFIG_SCSI_SAS_HOST_SMP
extern void sas_smp_host_handler(struct bsg_job *job, struct Scsi_Host *shost);
#else
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 3abb1d622a32..f0566f4512b2 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -926,6 +926,16 @@ void sas_task_internal_done(struct sas_task *task)
complete(&task->slow_task->completion);
}

+void sas_task_complete_internal(struct sas_task *task)
+{
+ struct request *rq = sas_rq_from_task(task);
+ struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
+
+ del_timer(&task->slow_task->timer);
+
+ scsi_done(scmd);
+}
+
void sas_task_internal_timedout(struct timer_list *t)
{
struct sas_task_slow *slow = from_timer(slow, t, timer);
@@ -947,6 +957,12 @@ void sas_task_internal_timedout(struct timer_list *t)
#define TASK_TIMEOUT (20 * HZ)
#define TASK_RETRY 3

+void sas_blk_end_sync_rq(struct request *rq, blk_status_t error)
+{
+ struct sas_task *task = sas_rq_to_task(rq);
+ complete(&task->slow_task->completion);
+}
+
static int sas_execute_internal_abort(struct domain_device *device,
enum sas_internal_abort type, u16 tag,
unsigned int qid, void *data)
--
2.26.2

2022-06-13 07:37:52

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 06/18] libata-scsi: Add ata_scsi_queue_internal()

On 6/9/22 19:29, John Garry wrote:
> Add a function to handle queued ATA internal SCSI cmnds - does much the
> same as ata_exec_internal_sg() does (which will be fixed up later to
> actually queue internal cmnds through this function).
>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 47 ++++++++++++++++++++++++++++++++++++++-
> include/linux/libata.h | 6 +++++
> 2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 42cecf95a4e5..baac35dd17ca 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3963,6 +3963,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
> return NULL;
> }
>
> +static unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
> + struct ata_device *dev)
> +{
> + struct ata_link *link = dev->link;
> + struct ata_port *ap = link->ap;
> + struct ata_queued_cmd *qc;
> +
> + /* no internal command while frozen */
> + if (ap->pflags & ATA_PFLAG_FROZEN)
> + goto did_err;
> +
> + /* initialize internal qc */
> + qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
> + link->preempted_tag = link->active_tag;
> + link->preempted_sactive = link->sactive;
> + ap->preempted_qc_active = ap->qc_active;
> + ap->preempted_nr_active_links = ap->nr_active_links;
> + link->active_tag = ATA_TAG_POISON;
> + link->sactive = 0;
> + ap->qc_active = 0;
> + ap->nr_active_links = 0;> +
> + if (qc->dma_dir != DMA_NONE) {
> + int n_elem;
> +
> + n_elem = 1;
> + qc->n_elem = n_elem;
> + qc->sg = scsi_sglist(scmd);
> + qc->nbytes = qc->sg->length;
> + ata_sg_init(qc, qc->sg, n_elem);
> + }
> +
> + scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
> +
> + ata_qc_issue(qc);
> +
> + return 0;
> +did_err:
> + scmd->result = (DID_ERROR << 16);
> + scsi_done(scmd);
> + return 0;
> +}
> +
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
> {
> u8 scsi_op = scmd->cmnd[0];
> @@ -3971,7 +4014,9 @@ int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
> if (unlikely(!scmd->cmd_len))
> goto bad_cdb_len;
>
> - if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
> + if (scsi_is_reserved_cmd(scmd)) {
> + return ata_scsi_queue_internal(scmd, dev);
> + } else if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {

No need for the else here.

> if (unlikely(scmd->cmd_len > dev->cdb_len))
> goto bad_cdb_len;
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 732de9014626..43f4bcfe9a5f 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -767,7 +767,9 @@ struct ata_link {
>
> struct device tdev;
> unsigned int active_tag; /* active tag on this link */
> + unsigned int preempted_tag;
> u32 sactive; /* active NCQ commands */
> + u32 preempted_sactive;
>
> unsigned int flags; /* ATA_LFLAG_xxx */
>
> @@ -861,6 +863,10 @@ struct ata_port {
> #ifdef CONFIG_ATA_ACPI
> struct ata_acpi_gtm __acpi_init_gtm; /* use ata_acpi_init_gtm() */
> #endif
> +
> + u64 preempted_qc_active;
> + int preempted_nr_active_links;
> +
> /* owned by EH */
> u8 sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
> };


--
Damien Le Moal
Western Digital Research

2022-06-13 07:38:44

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/18] libata: Queue ATA internal commands as requests

On 6/9/22 19:29, John Garry wrote:
> Follow the normal path for requests and queue through the block layer.
>
> We hold the qc pointer in the scmd host scribble, which is less than
> ideal. In future we can hold the ata_queued_cmd in the scmd priv_data,
> allowing for easy lookup.
>
> We need to use the shost sdev to queue these requests. That is because we
> initially do not have the target disk scsi_device allocated yet when
> queueing internal commands. This is less than ideal, and makes it hard for
> libsas to know when queueing an internal command whether it is a ATA
> internal command or not.
>
> Also make ata_exec_internal_sg() static - there are no users outside
> libata-core.c
>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/ata/libata-core.c | 137 +++++++++++++++++++++-----------------
> drivers/ata/libata-sata.c | 5 +-
> drivers/ata/libata.h | 4 --
> 3 files changed, 80 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 40e816419f48..6b1aaeccb253 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1438,9 +1438,18 @@ unsigned long ata_id_xfermask(const u16 *id)
> }
> EXPORT_SYMBOL_GPL(ata_id_xfermask);
>
> -static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
> +void ata_qc_complete_internal(struct ata_queued_cmd *qc)
> {
> - struct completion *waiting = qc->private_data;
> + struct scsi_cmnd *scmd = qc->scsicmd;
> +
> + scsi_done(scmd);
> +}
> +
> +static void ata_internal_end_rq(struct request *rq, blk_status_t error)
> +{
> + struct completion *waiting = rq->end_io_data;
> +
> + rq->end_io_data = (void *)(uintptr_t)error;
>
> complete(waiting);
> }
> @@ -1467,52 +1476,74 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
> * RETURNS:
> * Zero on success, AC_ERR_* mask on failure
> */
> -unsigned ata_exec_internal_sg(struct ata_device *dev,
> +static unsigned ata_exec_internal_sg(struct ata_device *dev,
> struct ata_taskfile *tf, const u8 *cdb,
> - int dma_dir, struct scatterlist *sgl,
> - unsigned int n_elem, unsigned long timeout)
> + int dma_dir, void *buf, unsigned int buflen,
> + unsigned long timeout)
> {
> struct ata_link *link = dev->link;
> struct ata_port *ap = link->ap;
> + struct Scsi_Host *scsi_host = ap->scsi_host;
> + struct scsi_device *sdev = scsi_host->sdev;
> u8 command = tf->command;
> int auto_timeout = 0;
> struct ata_queued_cmd *qc;
> - unsigned int preempted_tag;
> - u32 preempted_sactive;
> - u64 preempted_qc_active;
> - int preempted_nr_active_links;
> DECLARE_COMPLETION_ONSTACK(wait);
> unsigned long flags;
> unsigned int err_mask;
> + struct scsi_cmnd *scmd;
> + struct request *req;
> int rc;
>
> - spin_lock_irqsave(ap->lock, flags);
> + /*
> + * We only support a single reserved command, so this guarantees
> + * serialization. However the code already assumed that (we are
> + * serialized here per-port).
> + */
> + req = scsi_alloc_request(sdev->request_queue,
> + dma_dir == DMA_TO_DEVICE ?
> + REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
> + BLK_MQ_REQ_RESERVED);
> + if (IS_ERR(req))
> + return AC_ERR_OTHER;
>
> - /* no internal command while frozen */
> - if (ap->pflags & ATA_PFLAG_FROZEN) {
> - spin_unlock_irqrestore(ap->lock, flags);
> - return AC_ERR_SYSTEM;
> +
> + if (!timeout) {
> + if (ata_probe_timeout)
> + timeout = ata_probe_timeout * 1000;
> + else {
> + timeout = ata_internal_cmd_timeout(dev, command);
> + auto_timeout = 1;
> + }
> }
>
> - /* initialize internal qc */
> + scmd = blk_mq_rq_to_pdu(req);
> + scmd->allowed = 0;
> + req->timeout = timeout;
> + //TODO: Hook up timeout handler
> + req->rq_flags |= RQF_QUIET;
> + scmd->device = sdev;
> qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
>
> + /* Do this until we can hold ata_queued_cmd in the SCMD priv data */
> + scmd->host_scribble = (unsigned char *)qc;
> +
> + if (buflen) {
> + int ret = blk_rq_map_kern(sdev->request_queue, req,
> + buf, buflen, GFP_NOIO);
> + if (ret) {
> + blk_mq_free_request(req);
> + return AC_ERR_OTHER;
> + }
> + }
> +
> qc->tag = ATA_TAG_INTERNAL;
> qc->hw_tag = 0;
> - qc->scsicmd = NULL;
> + qc->scsicmd = scmd;
> qc->ap = ap;
> qc->dev = dev;
> ata_qc_reinit(qc);
>
> - preempted_tag = link->active_tag;
> - preempted_sactive = link->sactive;
> - preempted_qc_active = ap->qc_active;
> - preempted_nr_active_links = ap->nr_active_links;
> - link->active_tag = ATA_TAG_POISON;
> - link->sactive = 0;
> - ap->qc_active = 0;
> - ap->nr_active_links = 0;
> -
> /* prepare & issue qc */
> qc->tf = *tf;
> if (cdb)
> @@ -1525,32 +1556,14 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
>
> qc->flags |= ATA_QCFLAG_RESULT_TF;
> qc->dma_dir = dma_dir;
> - if (dma_dir != DMA_NONE) {
> - unsigned int i, buflen = 0;
> - struct scatterlist *sg;
> -
> - for_each_sg(sgl, sg, n_elem, i)
> - buflen += sg->length;
> -
> - ata_sg_init(qc, sgl, n_elem);
> - qc->nbytes = buflen;
> - }
>
> - qc->private_data = &wait;
> + qc->private_data = ap;
> qc->complete_fn = ata_qc_complete_internal;
>
> - ata_qc_issue(qc);
> -
> - spin_unlock_irqrestore(ap->lock, flags);
> + req->end_io_data = &wait;
> + req->end_io = ata_internal_end_rq;
>
> - if (!timeout) {
> - if (ata_probe_timeout)
> - timeout = ata_probe_timeout * 1000;
> - else {
> - timeout = ata_internal_cmd_timeout(dev, command);
> - auto_timeout = 1;
> - }
> - }
> + blk_execute_rq_nowait(req, true);

How do you get guarantee that this req ends up being issued with
ATA_TAG_INTERNAL as the tag ? Because you have the reserved commands *in
addition to can queue* ? I can see how that works if can_queue is indeed
32, but what if the user changes the max qd ? That breaks, no ?

>
> if (ap->ops->error_handler)
> ata_eh_release(ap);
> @@ -1610,13 +1623,15 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
> err_mask = qc->err_mask;
>
> ata_qc_free(qc);
> - link->active_tag = preempted_tag;
> - link->sactive = preempted_sactive;
> - ap->qc_active = preempted_qc_active;
> - ap->nr_active_links = preempted_nr_active_links;
> + link->active_tag = link->preempted_tag;
> + link->sactive = link->preempted_sactive;
> + ap->qc_active = ap->preempted_qc_active;
> + ap->nr_active_links = ap->preempted_nr_active_links;
>
> spin_unlock_irqrestore(ap->lock, flags);
>
> + blk_mq_free_request(req);
> +
> if ((err_mask & AC_ERR_TIMEOUT) && auto_timeout)
> ata_internal_cmd_timed_out(dev, command);
>
> @@ -1647,18 +1662,20 @@ unsigned ata_exec_internal(struct ata_device *dev,
> int dma_dir, void *buf, unsigned int buflen,
> unsigned long timeout)
> {
> - struct scatterlist *psg = NULL, sg;
> - unsigned int n_elem = 0;
> + /* buf may not be aligned, so copy to/from an aligned buffer */
> + void *tmpbuf = kmemdup(buf, buflen, GFP_KERNEL);
> + unsigned res;
>
> - if (dma_dir != DMA_NONE) {
> - WARN_ON(!buf);
> - sg_init_one(&sg, buf, buflen);
> - psg = &sg;
> - n_elem++;
> - }
> + if (!tmpbuf)
> + return AC_ERR_OTHER;
>
> - return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, n_elem,
> + res = ata_exec_internal_sg(dev, tf, cdb, dma_dir, tmpbuf, buflen,
> timeout);
> +
> + memcpy(buf, tmpbuf, buflen);
> + kfree(tmpbuf);
> +
> + return res;
> }
>
> /**
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index 7a5fe41aa5ae..3cecc45d54ab 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -1258,9 +1258,10 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
> {
> int rc = 0;
>
> - if (likely(ata_dev_enabled(ap->link.device)))
> + if (likely(ata_dev_enabled(ap->link.device)) ||
> + scsi_is_reserved_cmd(cmd)) {
> rc = __ata_scsi_queuecmd(cmd, ap->link.device);
> - else {
> + } else {
> cmd->result = (DID_BAD_TARGET << 16);
> scsi_done(cmd);
> }
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 926a7f41303d..1446a482835d 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -53,10 +53,6 @@ extern unsigned ata_exec_internal(struct ata_device *dev,
> struct ata_taskfile *tf, const u8 *cdb,
> int dma_dir, void *buf, unsigned int buflen,
> unsigned long timeout);
> -extern unsigned ata_exec_internal_sg(struct ata_device *dev,
> - struct ata_taskfile *tf, const u8 *cdb,
> - int dma_dir, struct scatterlist *sg,
> - unsigned int n_elem, unsigned long timeout);
> extern int ata_wait_ready(struct ata_link *link, unsigned long deadline,
> int (*check_ready)(struct ata_link *link));
> extern int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,


--
Damien Le Moal
Western Digital Research

2022-06-13 07:45:20

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 07/18] libata-scsi: Add ata_internal_queuecommand()

On 6/9/22 19:29, John Garry wrote:
> Add callback to queue reserved commands - call it "internal" as this is
> what libata uses.
>
> Also add it to the base ATA SHT, and set nr_reserved_cmds = 1, which
> matches tag ATA_TAG_INTERNAL.
>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 14 ++++++++++++++
> include/linux/libata.h | 6 +++++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index baac35dd17ca..b2702ab0183b 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1114,6 +1114,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
> return 0;
> }
>
> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)

ata_scsi_internal_queuecommand()

But given that this is used for the .reserved_queuecommand() method, I
would call it ata_scsi_reserved_queuecommand().

> +{
> + struct ata_port *ap;
> + int res;
> +
> + ap = ata_shost_to_port(shost);

You can move this to ap declaration.

struct ata_port *ap = ata_shost_to_port(shost);

> + spin_lock_irq(ap->lock);

spin_lock_irqsave() ?

> + res = ata_sas_queuecmd(scmd, ap);
> + spin_unlock_irq(ap->lock);
> +
> + return res;
> +}
> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
> +
> /**
> * ata_scsi_slave_config - Set SCSI device attributes
> * @sdev: SCSI device to examine
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 43f4bcfe9a5f..5fa6f56bba81 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
> sector_t capacity, int geom[]);
> extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
> extern int ata_scsi_slave_config(struct scsi_device *sdev);
> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
> + struct scsi_cmnd *scmd);
> extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
> int queue_depth);
> @@ -1390,7 +1392,9 @@ extern const struct attribute_group *ata_common_sdev_groups[];
> .proc_name = drv_name, \
> .slave_destroy = ata_scsi_slave_destroy, \
> .bios_param = ata_std_bios_param, \
> - .unlock_native_capacity = ata_scsi_unlock_native_capacity
> + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
> + .nr_reserved_cmds = 1,\
> + .reserved_queuecommand = ata_internal_queuecommand
>
> #define ATA_SUBBASE_SHT(drv_name) \
> __ATA_BASE_SHT(drv_name), \


--
Damien Le Moal
Western Digital Research

2022-06-13 07:54:35

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 04/18] scsi: core: Add support to send reserved commands

On 6/9/22 19:29, John Garry wrote:
> Add a method to queue reserved commands.
>
> TODO:
> - fix timeout handler to call into reserved commands
> - We don't clear host_scribble for libata qc, but we should be able to drop
> this if we store libata qc in the scsi cmnd priv_data
>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/scsi/hosts.c | 6 ++++++
> drivers/scsi/scsi_lib.c | 32 ++++++++++++++++++++++++++++++++
> include/scsi/scsi_cmnd.h | 5 +++++
> include/scsi/scsi_host.h | 1 +
> 4 files changed, 44 insertions(+)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 27296addaf63..5c9b05a8fec8 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -221,6 +221,12 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
> goto fail;
> }
>
> + if (shost->nr_reserved_cmds && !sht->reserved_queuecommand) {
> + shost_printk(KERN_ERR, shost,
> + "nr_reserved_cmds set but no method to queue\n");
> + goto fail;

This would be a driver implementation bug. So what about a WARN() here ?

> + }
> +
> /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
> shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
> shost->can_queue);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6e53c6d913c..8c8b4c6767d9 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1422,6 +1422,16 @@ static void scsi_complete(struct request *rq)
> struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> enum scsi_disposition disposition;
>
> + if (scsi_is_reserved_cmd(cmd)) {
> + struct scsi_device *sdev = cmd->device;
> +
> + scsi_mq_uninit_cmd(cmd);
> + scsi_device_unbusy(sdev, cmd);
> + __blk_mq_end_request(rq, 0);
> +
> + return;
> + }
> +
> INIT_LIST_HEAD(&cmd->eh_entry);
>
> atomic_inc(&cmd->device->iodone_cnt);
> @@ -1706,6 +1716,28 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>
> WARN_ON_ONCE(cmd->budget_token < 0);
>
> + if (scsi_is_reserved_cmd(cmd)) {
> + unsigned char *host_scribble = cmd->host_scribble;
> +
> + if (!(req->rq_flags & RQF_DONTPREP)) {
> + ret = scsi_prepare_cmd(req);
> + if (ret != BLK_STS_OK) {
> +

Stray blank line.

> + goto out_dec_host_busy;
> + }

No need for the curly brackets here.

> +
> + req->rq_flags |= RQF_DONTPREP;
> + } else {
> + clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
> + }
> +
> + cmd->host_scribble = host_scribble;
> +
> + blk_mq_start_request(req);
> +
> + return shost->hostt->reserved_queuecommand(shost, cmd);
> + }
> +
> /*
> * If the device is not in running state we will reject some or all
> * commands.
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index 1e80e70dfa92..e47df5836070 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -152,6 +152,11 @@ static inline void *scsi_cmd_priv(struct scsi_cmnd *cmd)
> return cmd + 1;
> }
>
> +static inline bool scsi_is_reserved_cmd(struct scsi_cmnd *cmd)
> +{
> + return blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd));
> +}
> +
> void scsi_done(struct scsi_cmnd *cmd);
> void scsi_done_direct(struct scsi_cmnd *cmd);
>
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 149dcbd4125e..88c8504395c8 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -73,6 +73,7 @@ struct scsi_host_template {
> * STATUS: REQUIRED
> */
> int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
> + int (* reserved_queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
>
> /*
> * The commit_rqs function is used to trigger a hardware


--
Damien Le Moal
Western Digital Research

2022-06-13 07:55:32

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/9/22 19:29, John Garry wrote:
> From: Hannes Reinecke <[email protected]>
>
> Quite some drivers are using management commands internally, which
> typically use the same hardware tag pool (ie they are being allocated
> from the same hardware resources) as the 'normal' I/O commands.
> These commands are set aside before allocating the block-mq tag bitmap,
> so they'll never show up as busy in the tag map.
> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
> this situation.
> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
> template to instruct the block layer to set aside a tag space for these
> management commands by using reserved tags.
>
> Signed-off-by: Hannes Reinecke <[email protected]>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/scsi/hosts.c | 3 +++
> drivers/scsi/scsi_lib.c | 6 +++++-
> include/scsi/scsi_host.h | 22 +++++++++++++++++++++-
> 3 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 8352f90d997d..27296addaf63 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
> if (sht->virt_boundary_mask)
> shost->virt_boundary_mask = sht->virt_boundary_mask;
>
> + if (sht->nr_reserved_cmds)
> + shost->nr_reserved_cmds = sht->nr_reserved_cmds;
> +
> device_initialize(&shost->shost_gendev);
> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
> shost->shost_gendev.bus = &scsi_bus_type;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6ffc9e4258a8..f6e53c6d913c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
> else
> tag_set->ops = &scsi_mq_ops_no_commit;
> tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
> +
> tag_set->nr_maps = shost->nr_maps ? : 1;
> - tag_set->queue_depth = shost->can_queue;
> + tag_set->queue_depth =
> + shost->can_queue + shost->nr_reserved_cmds;
> + tag_set->reserved_tags = shost->nr_reserved_cmds;
> +
> tag_set->cmd_size = cmd_size;
> tag_set->numa_node = dev_to_node(shost->dma_dev);
> tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 59aef1f178f5..149dcbd4125e 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -366,10 +366,19 @@ struct scsi_host_template {
> /*
> * This determines if we will use a non-interrupt driven
> * or an interrupt driven scheme. It is set to the maximum number
> - * of simultaneous commands a single hw queue in HBA will accept.
> + * of simultaneous commands a single hw queue in HBA will accept
> + * excluding internal commands.
> */
> int can_queue;
>
> + /*
> + * This determines how many commands the HBA will set aside
> + * for internal commands. This number will be added to
> + * @can_queue to calcumate the maximum number of simultaneous

s/calcumate/calculate

But this is weird. For SATA, can_queue is 32. Having reserved commands,
that number needs to stay the same. We cannot have more than 32 tags.
I think keeping can_queue as the max queue depth with at most
nr_reserved_cmds tags reserved is better.

> + * commands sent to the host.
> + */
> + int nr_reserved_cmds;
> +
> /*
> * In many instances, especially where disconnect / reconnect are
> * supported, our host also has an ID on the SCSI bus. If this is
> @@ -602,6 +611,11 @@ struct Scsi_Host {
> unsigned short max_cmd_len;
>
> int this_id;
> +
> + /*
> + * Number of commands this host can handle at the same time.
> + * This excludes reserved commands as specified by nr_reserved_cmds.
> + */
> int can_queue;
> short cmd_per_lun;
> short unsigned int sg_tablesize;
> @@ -620,6 +634,12 @@ struct Scsi_Host {
> */
> unsigned nr_hw_queues;
> unsigned nr_maps;
> +
> + /*
> + * Number of reserved commands to allocate, if any.
> + */
> + unsigned nr_reserved_cmds;
> +
> unsigned active_mode:2;
>
> /*


--
Damien Le Moal
Western Digital Research

2022-06-13 08:13:54

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/18] libata: Queue ATA internal commands as requests

On 13/06/2022 08:22, Damien Le Moal wrote:
> On 6/9/22 19:29, John Garry wrote:
>> Follow the normal path for requests and queue through the block layer.
>>
>> We hold the qc pointer in the scmd host scribble, which is less than
>> ideal. In future we can hold the ata_queued_cmd in the scmd priv_data,
>> allowing for easy lookup.
>>
>> We need to use the shost sdev to queue these requests. That is because we
>> initially do not have the target disk scsi_device allocated yet when
>> queueing internal commands. This is less than ideal, and makes it hard for
>> libsas to know when queueing an internal command whether it is a ATA
>> internal command or not.
>>
>> Also make ata_exec_internal_sg() static - there are no users outside
>> libata-core.c
>>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>> drivers/ata/libata-core.c | 137 +++++++++++++++++++++-----------------
>> drivers/ata/libata-sata.c | 5 +-
>> drivers/ata/libata.h | 4 --
>> 3 files changed, 80 insertions(+), 66 deletions(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index 40e816419f48..6b1aaeccb253 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -1438,9 +1438,18 @@ unsigned long ata_id_xfermask(const u16 *id)
>> }
>> EXPORT_SYMBOL_GPL(ata_id_xfermask);
>>
>> -static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
>> +void ata_qc_complete_internal(struct ata_queued_cmd *qc)
>> {
>> - struct completion *waiting = qc->private_data;
>> + struct scsi_cmnd *scmd = qc->scsicmd;
>> +
>> + scsi_done(scmd);
>> +}
>> +
>> +static void ata_internal_end_rq(struct request *rq, blk_status_t error)
>> +{
>> + struct completion *waiting = rq->end_io_data;
>> +
>> + rq->end_io_data = (void *)(uintptr_t)error;
>>
>> complete(waiting);
>> }
>> @@ -1467,52 +1476,74 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
>> * RETURNS:
>> * Zero on success, AC_ERR_* mask on failure
>> */
>> -unsigned ata_exec_internal_sg(struct ata_device *dev,
>> +static unsigned ata_exec_internal_sg(struct ata_device *dev,
>> struct ata_taskfile *tf, const u8 *cdb,
>> - int dma_dir, struct scatterlist *sgl,
>> - unsigned int n_elem, unsigned long timeout)
>> + int dma_dir, void *buf, unsigned int buflen,
>> + unsigned long timeout)
>> {
>> struct ata_link *link = dev->link;
>> struct ata_port *ap = link->ap;
>> + struct Scsi_Host *scsi_host = ap->scsi_host;
>> + struct scsi_device *sdev = scsi_host->sdev;
>> u8 command = tf->command;
>> int auto_timeout = 0;
>> struct ata_queued_cmd *qc;
>> - unsigned int preempted_tag;
>> - u32 preempted_sactive;
>> - u64 preempted_qc_active;
>> - int preempted_nr_active_links;
>> DECLARE_COMPLETION_ONSTACK(wait);
>> unsigned long flags;
>> unsigned int err_mask;
>> + struct scsi_cmnd *scmd;
>> + struct request *req;
>> int rc;
>>
>> - spin_lock_irqsave(ap->lock, flags);
>> + /*
>> + * We only support a single reserved command, so this guarantees
>> + * serialization. However the code already assumed that (we are
>> + * serialized here per-port).
>> + */
>> + req = scsi_alloc_request(sdev->request_queue,
>> + dma_dir == DMA_TO_DEVICE ?
>> + REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
>> + BLK_MQ_REQ_RESERVED);
>> + if (IS_ERR(req))
>> + return AC_ERR_OTHER;
>>
>> - /* no internal command while frozen */
>> - if (ap->pflags & ATA_PFLAG_FROZEN) {
>> - spin_unlock_irqrestore(ap->lock, flags);
>> - return AC_ERR_SYSTEM;
>> +
>> + if (!timeout) {
>> + if (ata_probe_timeout)
>> + timeout = ata_probe_timeout * 1000;
>> + else {
>> + timeout = ata_internal_cmd_timeout(dev, command);
>> + auto_timeout = 1;
>> + }
>> }
>>
>> - /* initialize internal qc */
>> + scmd = blk_mq_rq_to_pdu(req);
>> + scmd->allowed = 0;
>> + req->timeout = timeout;
>> + //TODO: Hook up timeout handler
>> + req->rq_flags |= RQF_QUIET;
>> + scmd->device = sdev;
>> qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
>>
>> + /* Do this until we can hold ata_queued_cmd in the SCMD priv data */
>> + scmd->host_scribble = (unsigned char *)qc;
>> +
>> + if (buflen) {
>> + int ret = blk_rq_map_kern(sdev->request_queue, req,
>> + buf, buflen, GFP_NOIO);
>> + if (ret) {
>> + blk_mq_free_request(req);
>> + return AC_ERR_OTHER;
>> + }
>> + }
>> +
>> qc->tag = ATA_TAG_INTERNAL;
>> qc->hw_tag = 0;
>> - qc->scsicmd = NULL;
>> + qc->scsicmd = scmd;
>> qc->ap = ap;
>> qc->dev = dev;
>> ata_qc_reinit(qc);
>>
>> - preempted_tag = link->active_tag;
>> - preempted_sactive = link->sactive;
>> - preempted_qc_active = ap->qc_active;
>> - preempted_nr_active_links = ap->nr_active_links;
>> - link->active_tag = ATA_TAG_POISON;
>> - link->sactive = 0;
>> - ap->qc_active = 0;
>> - ap->nr_active_links = 0;
>> -
>> /* prepare & issue qc */
>> qc->tf = *tf;
>> if (cdb)
>> @@ -1525,32 +1556,14 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
>>
>> qc->flags |= ATA_QCFLAG_RESULT_TF;
>> qc->dma_dir = dma_dir;
>> - if (dma_dir != DMA_NONE) {
>> - unsigned int i, buflen = 0;
>> - struct scatterlist *sg;
>> -
>> - for_each_sg(sgl, sg, n_elem, i)
>> - buflen += sg->length;
>> -
>> - ata_sg_init(qc, sgl, n_elem);
>> - qc->nbytes = buflen;
>> - }
>>
>> - qc->private_data = &wait;
>> + qc->private_data = ap;
>> qc->complete_fn = ata_qc_complete_internal;
>>
>> - ata_qc_issue(qc);
>> -
>> - spin_unlock_irqrestore(ap->lock, flags);
>> + req->end_io_data = &wait;
>> + req->end_io = ata_internal_end_rq;
>>
>> - if (!timeout) {
>> - if (ata_probe_timeout)
>> - timeout = ata_probe_timeout * 1000;
>> - else {
>> - timeout = ata_internal_cmd_timeout(dev, command);
>> - auto_timeout = 1;
>> - }
>> - }

Hi Damien,

>> + blk_execute_rq_nowait(req, true);
>
> How do you get guarantee that this req ends up being issued with
> ATA_TAG_INTERNAL as the tag ?

We don't, but it's not important, as we don't use the request tag index
in libata to infer that a request is an ATA_TAG_INTERNAL request. Nobody
should assume that the tag index for the ATA_TAG_INTERNAL req is 32. We
have other methods to check that, like using scsi_is_reserved_cmd() and
continuing to use __ata_qc_from_tag(ap, ATA_TAG_INTERNAL) to get the ATA
internal qc.

But it is a good question, as if we start to put the ata_queued_cmd
structure in the SCSI cmnd priv data - as suggested by Hannes - then
__ata_qc_from_tag(ap, ATA_TAG_INTERNAL) may need to be reworked to
handle that.

> Because you have the reserved commands *in
> addition to can queue* ? I can see how that works if can_queue is indeed
> 32, but what if the user changes the max qd ? That breaks, no ?

Thanks,
john

>
>>
>> if (ap->ops->error_handler)
>> ata_eh_release(ap);
>> @@ -1610,13 +1623,15 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
>> err_mask = qc->err_mask;
>>
>> ata_qc_free(qc);
>> - link->active_tag = preempted_tag;
>> - link->sactive = preempted_sactive;
>> - ap->qc_active = preempted_qc_active;
>> - ap->nr_active_links = preempted_nr_active_links;
>> + link->active_tag = link->preempted_tag;
>> + link->sactive = link->preempted_sactive;
>> + ap->qc_active = ap->preempted_qc_active;
>> + ap->nr_active_links = ap->preempted_nr_active_links;
>>
>> spin_unlock_irqrestore(ap->lock, flags);
>>
>> + blk_mq_free_request(req);
>> +
>> if ((err_mask & AC_ERR_TIMEOUT) && auto_timeout)
>> ata_internal_cmd_timed_out(dev, command);
>>
>> @@ -1647,18 +1662,20 @@ unsigned ata_exec_internal(struct ata_device *dev,
>> int dma_dir, void *buf, unsigned int buflen,
>> unsigned long timeout)
>> {
>> - struct scatterlist *psg = NULL, sg;
>> - unsigned int n_elem = 0;
>> + /* buf may not be aligned, so copy to/from an aligned buffer */
>> + void *tmpbuf = kmemdup(buf, buflen, GFP_KERNEL);
>> + unsigned res;
>>
>> - if (dma_dir != DMA_NONE) {
>> - WARN_ON(!buf);
>> - sg_init_one(&sg, buf, buflen);
>> - psg = &sg;
>> - n_elem++;
>> - }
>> + if (!tmpbuf)
>> + return AC_ERR_OTHER;
>>
>> - return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, n_elem,
>> + res = ata_exec_internal_sg(dev, tf, cdb, dma_dir, tmpbuf, buflen,
>> timeout);
>> +
>> + memcpy(buf, tmpbuf, buflen);
>> + kfree(tmpbuf);
>> +
>> + return res;
>> }
>>
>> /**
>> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
>> index 7a5fe41aa5ae..3cecc45d54ab 100644
>> --- a/drivers/ata/libata-sata.c
>> +++ b/drivers/ata/libata-sata.c
>> @@ -1258,9 +1258,10 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
>> {
>> int rc = 0;
>>
>> - if (likely(ata_dev_enabled(ap->link.device)))
>> + if (likely(ata_dev_enabled(ap->link.device)) ||
>> + scsi_is_reserved_cmd(cmd)) {
>> rc = __ata_scsi_queuecmd(cmd, ap->link.device);
>> - else {
>> + } else {
>> cmd->result = (DID_BAD_TARGET << 16);
>> scsi_done(cmd);
>> }
>> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
>> index 926a7f41303d..1446a482835d 100644
>> --- a/drivers/ata/libata.h
>> +++ b/drivers/ata/libata.h
>> @@ -53,10 +53,6 @@ extern unsigned ata_exec_internal(struct ata_device *dev,
>> struct ata_taskfile *tf, const u8 *cdb,
>> int dma_dir, void *buf, unsigned int buflen,
>> unsigned long timeout);
>> -extern unsigned ata_exec_internal_sg(struct ata_device *dev,
>> - struct ata_taskfile *tf, const u8 *cdb,
>> - int dma_dir, struct scatterlist *sg,
>> - unsigned int n_elem, unsigned long timeout);
>> extern int ata_wait_ready(struct ata_link *link, unsigned long deadline,
>> int (*check_ready)(struct ata_link *link));
>> extern int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,
>
>

2022-06-13 08:25:13

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 13/06/2022 08:01, Damien Le Moal wrote:
> On 6/9/22 19:29, John Garry wrote:
>> From: Hannes Reinecke <[email protected]>
>>
>> Quite some drivers are using management commands internally, which
>> typically use the same hardware tag pool (ie they are being allocated
>> from the same hardware resources) as the 'normal' I/O commands.
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
>> this situation.
>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
>> template to instruct the block layer to set aside a tag space for these
>> management commands by using reserved tags.
>>
>> Signed-off-by: Hannes Reinecke <[email protected]>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>> drivers/scsi/hosts.c | 3 +++
>> drivers/scsi/scsi_lib.c | 6 +++++-
>> include/scsi/scsi_host.h | 22 +++++++++++++++++++++-
>> 3 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 8352f90d997d..27296addaf63 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>> if (sht->virt_boundary_mask)
>> shost->virt_boundary_mask = sht->virt_boundary_mask;
>>
>> + if (sht->nr_reserved_cmds)
>> + shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>> +
>> device_initialize(&shost->shost_gendev);
>> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>> shost->shost_gendev.bus = &scsi_bus_type;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 6ffc9e4258a8..f6e53c6d913c 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>> else
>> tag_set->ops = &scsi_mq_ops_no_commit;
>> tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>> +
>> tag_set->nr_maps = shost->nr_maps ? : 1;
>> - tag_set->queue_depth = shost->can_queue;
>> + tag_set->queue_depth =
>> + shost->can_queue + shost->nr_reserved_cmds;
>> + tag_set->reserved_tags = shost->nr_reserved_cmds;
>> +
>> tag_set->cmd_size = cmd_size;
>> tag_set->numa_node = dev_to_node(shost->dma_dev);
>> tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 59aef1f178f5..149dcbd4125e 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -366,10 +366,19 @@ struct scsi_host_template {
>> /*
>> * This determines if we will use a non-interrupt driven
>> * or an interrupt driven scheme. It is set to the maximum number
>> - * of simultaneous commands a single hw queue in HBA will accept.
>> + * of simultaneous commands a single hw queue in HBA will accept
>> + * excluding internal commands.
>> */
>> int can_queue;
>>
>> + /*
>> + * This determines how many commands the HBA will set aside
>> + * for internal commands. This number will be added to
>> + * @can_queue to calcumate the maximum number of simultaneous
>

Hi Damien,

> s/calcumate/calculate
>
> But this is weird. For SATA, can_queue is 32. Having reserved commands,
> that number needs to stay the same.

It does.

> We cannot have more than 32 tags.

We may have 32 regular tags and 1 reserved tag for SATA.

> I think keeping can_queue as the max queue depth with at most
> nr_reserved_cmds tags reserved is better.

Maybe the wording in the comment can be improved as it originally
focused on SAS HBAs where there are no special rules for tagset depth or
how the tagset should be carved up to handle regular and reserved commands.

Thanks,
John

>
>> + * commands sent to the host.
>> + */
>> + int nr_reserved_cmds;
>> +
>> /*
>> * In many instances, especially where disconnect / reconnect are
>> * supported, our host also has an ID on the SCSI bus. If this is
>> @@ -602,6 +611,11 @@ struct Scsi_Host {
>> unsigned short max_cmd_len;
>>
>> int this_id;
>> +
>> + /*
>> + * Number of commands this host can handle at the same time.
>> + * This excludes reserved commands as specified by nr_reserved_cmds.
>> + */
>> int can_queue;
>> short cmd_per_lun;
>> short unsigned int sg_tablesize;
>> @@ -620,6 +634,12 @@ struct Scsi_Host {
>> */
>> unsigned nr_hw_queues;
>> unsigned nr_maps;
>> +
>> + /*
>> + * Number of reserved commands to allocate, if any.
>> + */
>> + unsigned nr_reserved_cmds;
>> +
>> unsigned active_mode:2;
>>
>> /*
>
>

2022-06-13 09:44:32

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 13/06/2022 10:06, Damien Le Moal wrote:
>>> We cannot have more than 32 tags.
>> We may have 32 regular tags and 1 reserved tag for SATA.
> Right. But that is the messy part though. That extra 1 tag is actually not
> a tag since all internal commands are non-NCQ commands that do not need a
> tag...

But apart from SATA, libsas LLDDs do need a real tag for the libata
internal command.

>
> I am working on command duration limits support currently. This feature
> set has a new horrendous "improvement": a command can be aborted by the
> device if it fails its duration limit, but the abort is done with a good
> status + sense data available bit set so that the device queue is not
> aborted entirely like with a regular NCQ command error.
>
> For such aborted commands, the command sense data is set to
> "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
> new "successful NCQ sense data log" to check that the command sense is
> indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
> stalling the device queue, we would need an internal NCQ (queuable) command.
>
> Currently, that is not possible to do cleanly as there are no guarantees
> we can get a free tag (there is a race between block layer tag allocation
> and libata internal tag counting). So a reserved tag for that would be
> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
> rather useless. But that also means that we kind-of go back to the days
> when Linux showed ATA drives max QD of 31...

So must the ATA_TAG_INTERNAL qc always be available for non-NCQ action
like EH, and that is why you cannot reuse for this internal NCQ
(queuable) command?

>
> I am still struggling with this particular use case and trying to make it
> fit with your series. Trying out different things right now.
>

ok

>
>>> I think keeping can_queue as the max queue depth with at most
>>> nr_reserved_cmds tags reserved is better.
>> Maybe the wording in the comment can be improved as it originally
>> focused on SAS HBAs where there are no special rules for tagset depth or
>> how the tagset should be carved up to handle regular and reserved commands.
> Indeed. And that would be for HBAs that do*not* use libsas/libata.
> Otherwise, the NCQ vs non-NCQ reserved tag mess is there.
>

Thanks,
John

2022-06-13 09:46:51

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/13/22 17:25, John Garry wrote:
> On 13/06/2022 08:01, Damien Le Moal wrote:
>> On 6/9/22 19:29, John Garry wrote:
>>> From: Hannes Reinecke <[email protected]>
>>>
>>> Quite some drivers are using management commands internally, which
>>> typically use the same hardware tag pool (ie they are being allocated
>>> from the same hardware resources) as the 'normal' I/O commands.
>>> These commands are set aside before allocating the block-mq tag bitmap,
>>> so they'll never show up as busy in the tag map.
>>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
>>> this situation.
>>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
>>> template to instruct the block layer to set aside a tag space for these
>>> management commands by using reserved tags.
>>>
>>> Signed-off-by: Hannes Reinecke <[email protected]>
>>> Signed-off-by: John Garry <[email protected]>
>>> ---
>>> drivers/scsi/hosts.c | 3 +++
>>> drivers/scsi/scsi_lib.c | 6 +++++-
>>> include/scsi/scsi_host.h | 22 +++++++++++++++++++++-
>>> 3 files changed, 29 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>>> index 8352f90d997d..27296addaf63 100644
>>> --- a/drivers/scsi/hosts.c
>>> +++ b/drivers/scsi/hosts.c
>>> @@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>> if (sht->virt_boundary_mask)
>>> shost->virt_boundary_mask = sht->virt_boundary_mask;
>>>
>>> + if (sht->nr_reserved_cmds)
>>> + shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>>> +
>>> device_initialize(&shost->shost_gendev);
>>> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>>> shost->shost_gendev.bus = &scsi_bus_type;
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 6ffc9e4258a8..f6e53c6d913c 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>> else
>>> tag_set->ops = &scsi_mq_ops_no_commit;
>>> tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>>> +
>>> tag_set->nr_maps = shost->nr_maps ? : 1;
>>> - tag_set->queue_depth = shost->can_queue;
>>> + tag_set->queue_depth =
>>> + shost->can_queue + shost->nr_reserved_cmds;
>>> + tag_set->reserved_tags = shost->nr_reserved_cmds;
>>> +
>>> tag_set->cmd_size = cmd_size;
>>> tag_set->numa_node = dev_to_node(shost->dma_dev);
>>> tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
>>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>>> index 59aef1f178f5..149dcbd4125e 100644
>>> --- a/include/scsi/scsi_host.h
>>> +++ b/include/scsi/scsi_host.h
>>> @@ -366,10 +366,19 @@ struct scsi_host_template {
>>> /*
>>> * This determines if we will use a non-interrupt driven
>>> * or an interrupt driven scheme. It is set to the maximum number
>>> - * of simultaneous commands a single hw queue in HBA will accept.
>>> + * of simultaneous commands a single hw queue in HBA will accept
>>> + * excluding internal commands.
>>> */
>>> int can_queue;
>>>
>>> + /*
>>> + * This determines how many commands the HBA will set aside
>>> + * for internal commands. This number will be added to
>>> + * @can_queue to calcumate the maximum number of simultaneous
>>
>
> Hi Damien,
>
>> s/calcumate/calculate
>>
>> But this is weird. For SATA, can_queue is 32. Having reserved commands,
>> that number needs to stay the same.
>
> It does.
>
>> We cannot have more than 32 tags.
>
> We may have 32 regular tags and 1 reserved tag for SATA.

Right. But that is the messy part though. That extra 1 tag is actually not
a tag since all internal commands are non-NCQ commands that do not need a
tag...

I am working on command duration limits support currently. This feature
set has a new horrendous "improvement": a command can be aborted by the
device if it fails its duration limit, but the abort is done with a good
status + sense data available bit set so that the device queue is not
aborted entirely like with a regular NCQ command error.

For such aborted commands, the command sense data is set to
"COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
new "successful NCQ sense data log" to check that the command sense is
indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
stalling the device queue, we would need an internal NCQ (queuable) command.

Currently, that is not possible to do cleanly as there are no guarantees
we can get a free tag (there is a race between block layer tag allocation
and libata internal tag counting). So a reserved tag for that would be
nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
rather useless. But that also means that we kind-of go back to the days
when Linux showed ATA drives max QD of 31...

I am still struggling with this particular use case and trying to make it
fit with your series. Trying out different things right now.


>
>> I think keeping can_queue as the max queue depth with at most
>> nr_reserved_cmds tags reserved is better.
>
> Maybe the wording in the comment can be improved as it originally
> focused on SAS HBAs where there are no special rules for tagset depth or
> how the tagset should be carved up to handle regular and reserved commands.

Indeed. And that would be for HBAs that do *not* use libsas/libata.
Otherwise, the NCQ vs non-NCQ reserved tag mess is there.

>
> Thanks,
> John
>
>>
>>> + * commands sent to the host.
>>> + */
>>> + int nr_reserved_cmds;
>>> +
>>> /*
>>> * In many instances, especially where disconnect / reconnect are
>>> * supported, our host also has an ID on the SCSI bus. If this is
>>> @@ -602,6 +611,11 @@ struct Scsi_Host {
>>> unsigned short max_cmd_len;
>>>
>>> int this_id;
>>> +
>>> + /*
>>> + * Number of commands this host can handle at the same time.
>>> + * This excludes reserved commands as specified by nr_reserved_cmds.
>>> + */
>>> int can_queue;
>>> short cmd_per_lun;
>>> short unsigned int sg_tablesize;
>>> @@ -620,6 +634,12 @@ struct Scsi_Host {
>>> */
>>> unsigned nr_hw_queues;
>>> unsigned nr_maps;
>>> +
>>> + /*
>>> + * Number of reserved commands to allocate, if any.
>>> + */
>>> + unsigned nr_reserved_cmds;
>>> +
>>> unsigned active_mode:2;
>>>
>>> /*
>>
>>
>


--
Damien Le Moal
Western Digital Research

2022-06-13 09:47:21

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 06/18] libata-scsi: Add ata_scsi_queue_internal()

On 13/06/2022 08:12, Damien Le Moal wrote:
>> - if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
>> + if (scsi_is_reserved_cmd(scmd)) {
>> + return ata_scsi_queue_internal(scmd, dev);
>> + } else if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
> No need for the else here.
>

Ah, yes

Thanks,
John

2022-06-13 10:01:32

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 04/18] scsi: core: Add support to send reserved commands

On 13/06/2022 08:03, Damien Le Moal wrote:
>> + if (shost->nr_reserved_cmds && !sht->reserved_queuecommand) {
>> + shost_printk(KERN_ERR, shost,
>> + "nr_reserved_cmds set but no method to queue\n");
>> + goto fail;
> This would be a driver implementation bug.

It would be a driver bug, but it probably makes the driver utterly
useless and there is no point in continuing (to try to add). If the
driver supports reserved commands then they are prob essential to make
the driver function.

> So what about a WARN() here ?

Maybe but I really do not see a point in continuing

>
>> + }
>> +
>> /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
>> shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
>> shost->can_queue);
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index f6e53c6d913c..8c8b4c6767d9 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1422,6 +1422,16 @@ static void scsi_complete(struct request *rq)
>> struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>> enum scsi_disposition disposition;
>>
>> + if (scsi_is_reserved_cmd(cmd)) {
>> + struct scsi_device *sdev = cmd->device;
>> +
>> + scsi_mq_uninit_cmd(cmd);
>> + scsi_device_unbusy(sdev, cmd);
>> + __blk_mq_end_request(rq, 0);
>> +
>> + return;
>> + }
>> +
>> INIT_LIST_HEAD(&cmd->eh_entry);
>>
>> atomic_inc(&cmd->device->iodone_cnt);
>> @@ -1706,6 +1716,28 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>>
>> WARN_ON_ONCE(cmd->budget_token < 0);
>>
>> + if (scsi_is_reserved_cmd(cmd)) {
>> + unsigned char *host_scribble = cmd->host_scribble;
>> +
>> + if (!(req->rq_flags & RQF_DONTPREP)) {
>> + ret = scsi_prepare_cmd(req);
>> + if (ret != BLK_STS_OK) {
>> +
> Stray blank line.

ok

>
>> + goto out_dec_host_busy;
>> + }
> No need for the curly brackets here.

ok

>
>> +
>> + req->rq_flags |= RQF_DONTPREP;
>> + } else {
>> + clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
>> + }

Thanks,
John

2022-06-13 10:03:54

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/13/22 18:34, John Garry wrote:
> On 13/06/2022 10:06, Damien Le Moal wrote:
>>>> We cannot have more than 32 tags.
>>> We may have 32 regular tags and 1 reserved tag for SATA.
>> Right. But that is the messy part though. That extra 1 tag is actually not
>> a tag since all internal commands are non-NCQ commands that do not need a
>> tag...
>
> But apart from SATA, libsas LLDDs do need a real tag for the libata
> internal command.

Yes, but that is really to manage the LLD command descriptor and not for
the device to use in the case of non-NCQ commands. There is not
necessarily a 1:1 equivalence between the HBA command descriptor tag and
ATA command tag.

>
>>
>> I am working on command duration limits support currently. This feature
>> set has a new horrendous "improvement": a command can be aborted by the
>> device if it fails its duration limit, but the abort is done with a good
>> status + sense data available bit set so that the device queue is not
>> aborted entirely like with a regular NCQ command error.
>>
>> For such aborted commands, the command sense data is set to
>> "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
>> new "successful NCQ sense data log" to check that the command sense is
>> indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
>> stalling the device queue, we would need an internal NCQ (queuable) command.
>>
>> Currently, that is not possible to do cleanly as there are no guarantees
>> we can get a free tag (there is a race between block layer tag allocation
>> and libata internal tag counting). So a reserved tag for that would be
>> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
>> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
>> rather useless. But that also means that we kind-of go back to the days
>> when Linux showed ATA drives max QD of 31...
>
> So must the ATA_TAG_INTERNAL qc always be available for non-NCQ action
> like EH, and that is why you cannot reuse for this internal NCQ
> (queuable) command?

Currently, ATA_TAG_INTERNAL is always used for non-NCQ commands. Seeing a
qc with that tag means it is *not* NCQ.

I am trying to see if I can reuse the tag from one of the commands that
completed with that weird good status/sense data available. The problem
though is that this all needs to be done *before* calling
qc->complete_fn() which will free the tag. So we endup with 2 qcs that
have the same tag, the second one (for the read log command) temporarily
using the tag but still going through the same completion path without the
original command fully completed yet. It is a real mess.

>
>>
>> I am still struggling with this particular use case and trying to make it
>> fit with your series. Trying out different things right now.
>>
>
> ok
>
>>
>>>> I think keeping can_queue as the max queue depth with at most
>>>> nr_reserved_cmds tags reserved is better.
>>> Maybe the wording in the comment can be improved as it originally
>>> focused on SAS HBAs where there are no special rules for tagset depth or
>>> how the tagset should be carved up to handle regular and reserved commands.
>> Indeed. And that would be for HBAs that do*not* use libsas/libata.
>> Otherwise, the NCQ vs non-NCQ reserved tag mess is there.
>>
>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-06-13 10:04:09

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 04/18] scsi: core: Add support to send reserved commands

On 6/13/22 18:40, John Garry wrote:
> On 13/06/2022 08:03, Damien Le Moal wrote:
>>> + if (shost->nr_reserved_cmds && !sht->reserved_queuecommand) {
>>> + shost_printk(KERN_ERR, shost,
>>> + "nr_reserved_cmds set but no method to queue\n");
>>> + goto fail;
>> This would be a driver implementation bug.
>
> It would be a driver bug, but it probably makes the driver utterly
> useless and there is no point in continuing (to try to add). If the
> driver supports reserved commands then they are prob essential to make
> the driver function.
>
>> So what about a WARN() here ?
>
> Maybe but I really do not see a point in continuing

Completely agree. My suggestion is to replace the regular shost_printk()
error message with something "heavier" like WARN(). Since that is not an
expected error. Even a BUG_ON() may be acceptable :)

>
>>
>>> + }
>>> +
>>> /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
>>> shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
>>> shost->can_queue);
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index f6e53c6d913c..8c8b4c6767d9 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1422,6 +1422,16 @@ static void scsi_complete(struct request *rq)
>>> struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>>> enum scsi_disposition disposition;
>>>
>>> + if (scsi_is_reserved_cmd(cmd)) {
>>> + struct scsi_device *sdev = cmd->device;
>>> +
>>> + scsi_mq_uninit_cmd(cmd);
>>> + scsi_device_unbusy(sdev, cmd);
>>> + __blk_mq_end_request(rq, 0);
>>> +
>>> + return;
>>> + }
>>> +
>>> INIT_LIST_HEAD(&cmd->eh_entry);
>>>
>>> atomic_inc(&cmd->device->iodone_cnt);
>>> @@ -1706,6 +1716,28 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>
>>> WARN_ON_ONCE(cmd->budget_token < 0);
>>>
>>> + if (scsi_is_reserved_cmd(cmd)) {
>>> + unsigned char *host_scribble = cmd->host_scribble;
>>> +
>>> + if (!(req->rq_flags & RQF_DONTPREP)) {
>>> + ret = scsi_prepare_cmd(req);
>>> + if (ret != BLK_STS_OK) {
>>> +
>> Stray blank line.
>
> ok
>
>>
>>> + goto out_dec_host_busy;
>>> + }
>> No need for the curly brackets here.
>
> ok
>
>>
>>> +
>>> + req->rq_flags |= RQF_DONTPREP;
>>> + } else {
>>> + clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
>>> + }
>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-06-13 10:04:37

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 07/18] libata-scsi: Add ata_internal_queuecommand()

On 13/06/2022 08:16, Damien Le Moal wrote:
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index baac35dd17ca..b2702ab0183b 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1114,6 +1114,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>> return 0;
>> }
>>
>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
> ata_scsi_internal_queuecommand()

ok

>
> But given that this is used for the .reserved_queuecommand() method, I
> would call it ata_scsi_reserved_queuecommand().

I did mention somewhere that I continued to use the term "internal" as
that is what libata already uses, but I can change it.

>
>> +{
>> + struct ata_port *ap;
>> + int res;
>> +
>> + ap = ata_shost_to_port(shost);
> You can move this to ap declaration.
>
> struct ata_port *ap = ata_shost_to_port(shost);

ok

>
>> + spin_lock_irq(ap->lock);
> spin_lock_irqsave() ?

Right

>
>> + res = ata_sas_queuecmd(scmd, ap);
>> + spin_unlock_irq(ap->lock);
>> +
>> + return res;

Thanks,
John

2022-06-13 10:07:11

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 13/06/2022 10:43, Damien Le Moal wrote:
>>> Currently, that is not possible to do cleanly as there are no guarantees
>>> we can get a free tag (there is a race between block layer tag allocation
>>> and libata internal tag counting). So a reserved tag for that would be
>>> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
>>> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
>>> rather useless. But that also means that we kind-of go back to the days
>>> when Linux showed ATA drives max QD of 31...
>> So must the ATA_TAG_INTERNAL qc always be available for non-NCQ action
>> like EH, and that is why you cannot reuse for this internal NCQ
>> (queuable) command?
> Currently, ATA_TAG_INTERNAL is always used for non-NCQ commands. Seeing a
> qc with that tag means it is*not* NCQ.
>
> I am trying to see if I can reuse the tag from one of the commands that
> completed with that weird good status/sense data available. The problem
> though is that this all needs to be done*before* calling
> qc->complete_fn() which will free the tag. So we endup with 2 qcs that
> have the same tag, the second one (for the read log command) temporarily
> using the tag but still going through the same completion path without the
> original command fully completed yet. It is a real mess.
>

Reusing tags seems really messy, but then reserving an NCQ command seems
wasteful.

Thanks,
John

2022-06-14 06:48:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v2 01/18] blk-mq: Add a flag for reserved requests

On Thu, Jun 09, 2022 at 06:29:02PM +0800, John Garry wrote:
> Add a flag for reserved requests so that drivers may know this for any
> special handling.
>
> The 'reserved' argument in blk_mq_ops.timeout callback could now be
> replaced by using this flag.

And we should probably do that ASAP, independent of the rest of this
series. Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-06-14 07:20:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/18] scsi: core: Resurrect scsi_{get,free}_host_dev()

On Thu, Jun 09, 2022 at 06:29:03PM +0800, John Garry wrote:
> This reverts commit 6bd49b1a8d43ec118c55f3aaa7577729b52bde15.

Please add an actual text describing why you are doing this and how
insteasd of this completely pointless revert line.

2022-06-14 09:32:58

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 01/18] blk-mq: Add a flag for reserved requests

On 14/06/2022 07:43, Christoph Hellwig wrote:
> On Thu, Jun 09, 2022 at 06:29:02PM +0800, John Garry wrote:
>> Add a flag for reserved requests so that drivers may know this for any
>> special handling.
>>
>> The 'reserved' argument in blk_mq_ops.timeout callback could now be
>> replaced by using this flag.
> And we should probably do that ASAP, independent of the rest of this
> series.

We only have 2x users of the 'reserved' arg for 11x implementations of
blk_mq_ops.timeout:
- mtip32xx.c
- scsi_lib.c

scsi_lib.c is dubious as currently scsi does use reserved commands. So
we really only have 1x.

I'd be happy to take any spin-off series to make this one more
manageable, but is just removing an arg a strong enough reason for that?
That reserved arg is passed around a lot in the blk-mq iter functions,
so probably yes.

> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig<[email protected]>

Thanks

2022-06-14 09:50:59

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/18] scsi: core: Resurrect scsi_{get,free}_host_dev()

On 14/06/2022 07:44, Christoph Hellwig wrote:
> On Thu, Jun 09, 2022 at 06:29:03PM +0800, John Garry wrote:
>> This reverts commit 6bd49b1a8d43ec118c55f3aaa7577729b52bde15.
>
> Please add an actual text describing why you are doing this and how
> insteasd of this completely pointless revert line.
>
>
> .

OK. And in hindsight it would have been a good opportunity to mention
something which I am undecided on - that is which scsi_device to use for
these reserved commands?

In this series I use the scsi shost sdev for all reserved commands, but
maybe we should use the target sdev.

Pros of using scsi host sdev:
- don't need to worry about request queue freezing
- don't need to worry about running out of request queue budget
- available when scsi host is added - libata adds target sdev after some
internal commands are sent, and this would be a bit painful to change,
however adding the sdev earlier would seem to be a good change to make

Cons:
- generally better to use same scsi device as target (or is it?). For
example, it seems better to have reserved scsi_cmnd.device actually set
to the target sdev.
- don't need to add stuff like ata_is_scmd_ata_internal() later in this
series

Prob other reasons which I have forgot about. Please let me know if you
have any thoughts on this.

Cheers,
John

2022-06-14 19:02:33

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH RFC v2 01/18] blk-mq: Add a flag for reserved requests

On 6/9/22 03:29, John Garry wrote:
> Add a flag for reserved requests so that drivers may know this for any
> special handling.
>
> The 'reserved' argument in blk_mq_ops.timeout callback could now be
> replaced by using this flag.

Why not to combine that change into this patch?

Anyway:

Reviewed-by: Bart Van Assche <[email protected]>

2022-06-14 19:14:15

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 01/18] blk-mq: Add a flag for reserved requests

On 14/06/2022 19:00, Bart Van Assche wrote:
> On 6/9/22 03:29, John Garry wrote:
>> Add a flag for reserved requests so that drivers may know this for any
>> special handling.
>>
>> The 'reserved' argument in blk_mq_ops.timeout callback could now be
>> replaced by using this flag.
>
> Why not to combine that change into this patch?
>

If we remove the 'reserved' argument in blk_mq_ops.timeout callback then
we can also remove the 'reserved' member of busy_tag_iter_fn. I gave
that all a try and the diffstat looks like this:

block/blk-mq-debugfs.c | 2 +-
block/blk-mq-tag.c | 13 +++++--------
block/blk-mq.c | 22 +++++++++++++---------
block/bsg-lib.c | 2 +-
drivers/block/mtip32xx/mtip32xx.c | 11 +++++------
drivers/block/nbd.c | 5 ++---
drivers/block/null_blk/main.c | 2 +-
drivers/infiniband/ulp/srp/ib_srp.c | 3 +--
drivers/mmc/core/queue.c | 3 +--
drivers/nvme/host/apple.c | 3 +--
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/fc.c | 6 ++----
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 3 +--
drivers/nvme/host/tcp.c | 3 +--
drivers/s390/block/dasd.c | 2 +-
drivers/scsi/aacraid/comminit.c | 2 +-
drivers/scsi/aacraid/linit.c | 2 +-
drivers/scsi/hosts.c | 14 ++++++--------
drivers/scsi/mpi3mr/mpi3mr_os.c | 15 ++++-----------
drivers/scsi/scsi_lib.c | 12 ++----------
include/linux/blk-mq.h | 10 ++++++++--
include/scsi/scsi_host.h | 2 +-
24 files changed, 62 insertions(+), 81 deletions(-)

It would seem sensible to send that all separately and break it down a
bit - this series is already almost unmanageable.

> Anyway:
>
> Reviewed-by: Bart Van Assche <[email protected]>
> .

Thanks

2022-06-14 19:27:56

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/13/22 00:01, Damien Le Moal wrote:
> On 6/9/22 19:29, John Garry wrote:
>> + /*
>> + * This determines how many commands the HBA will set aside
>> + * for internal commands. This number will be added to
>> + * @can_queue to calcumate the maximum number of simultaneous
>
> s/calcumate/calculate
>
> But this is weird. For SATA, can_queue is 32. Having reserved commands,
> that number needs to stay the same. We cannot have more than 32 tags.
> I think keeping can_queue as the max queue depth with at most
> nr_reserved_cmds tags reserved is better.
>
>> + * commands sent to the host.
>> + */
>> + int nr_reserved_cmds;

+1 for Damien's request. I also prefer to keep can_queue as the maximum
queue depth, whether or not nr_reserved_cmds has been set.

Thanks,

Bart.

2022-06-14 20:27:11

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/18] scsi: core: Resurrect scsi_{get,free}_host_dev()

On 6/9/22 03:29, John Garry wrote:
> +/**
> + * scsi_get_host_dev - Create a scsi_device that points to the host adapter itself
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does this mean? That part of the function description is not
clear to me.

> + * @shost: Host that needs a scsi_device
^^^^^^^^^^^^^
This is not detailed enough. Consider changing "a scsi_device" into
"a scsi device for allocating reserved commands from".

> + *
> + * Lock status: None assumed.
> + *
> + * Returns: The scsi_device or NULL
> + *
> + * Notes:
> + * Attach a single scsi_device to the Scsi_Host - this should
> + * be made to look like a "pseudo-device" that points to the
> + * HA itself.
> + *
> + * Note - this device is not accessible from any high-level
> + * drivers (including generics), which is probably not
> + * optimal. We can add hooks later to attach.

The "which is probably not optimal. We can add hooks later to attach."
part probably should be moved to the patch description.

> + */
> +struct scsi_device *scsi_get_host_dev(struct Scsi_Host *shost)
> +{
> + struct scsi_device *sdev = NULL;
> + struct scsi_target *starget;
> +
> + mutex_lock(&shost->scan_mutex);
> + if (!scsi_host_scan_allowed(shost))
> + goto out;
> + starget = scsi_alloc_target(&shost->shost_gendev, 0, shost->this_id);
^^^^^^^^^^^^^^^^^^
Is it guaranteed that this channel / id combination will not be used for
any other SCSI device?

What if shost->this_id == -1?

> + if (!starget)
> + goto out;
> +
> + sdev = scsi_alloc_sdev(starget, 0, NULL);
> + if (sdev)
> + sdev->borken = 0;
> + else
> + scsi_target_reap(starget);
> + put_device(&starget->dev);
> + out:
> + mutex_unlock(&shost->scan_mutex);
> + return sdev;
> +}
> +EXPORT_SYMBOL(scsi_get_host_dev);

Elsewhere in the SCSI core "get..dev" means increment the reference count of
a SCSI device. Maybe scsi_alloc_host_dev() is a better name?

> +/*
> + * These two functions are used to allocate and free a pseudo device
> + * which will connect to the host adapter itself rather than any
> + * physical device. You must deallocate when you are done with the
> + * thing. This physical pseudo-device isn't real and won't be available
> + * from any high-level drivers.
> + */

Please keep function comments in .c files because that makes it more likely
that the comment and the implementation will remain in sync.

Thanks,

Bart.

2022-06-15 00:04:57

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/15/22 03:20, Bart Van Assche wrote:
> On 6/13/22 00:01, Damien Le Moal wrote:
>> On 6/9/22 19:29, John Garry wrote:
>>> + /*
>>> + * This determines how many commands the HBA will set aside
>>> + * for internal commands. This number will be added to
>>> + * @can_queue to calcumate the maximum number of simultaneous
>>
>> s/calcumate/calculate
>>
>> But this is weird. For SATA, can_queue is 32. Having reserved commands,
>> that number needs to stay the same. We cannot have more than 32 tags.
>> I think keeping can_queue as the max queue depth with at most
>> nr_reserved_cmds tags reserved is better.
>>
>>> + * commands sent to the host.
>>> + */
>>> + int nr_reserved_cmds;
>
> +1 for Damien's request. I also prefer to keep can_queue as the maximum
> queue depth, whether or not nr_reserved_cmds has been set.

For non SATA drives, I still think that is a good idea. However, for SATA,
we always have the internal tag command that is special. With John's
change, it would have to be reserved but that means we are down to 31 max
QD, so going backward several years... That internal tag for ATA does not
need to be reserved since this command is always used when the drive is
idle and no other NCQ commands are on-going.

So the solution to all this is a likely a little more complicated if we
want to keep ATA max QD to 32.

>
> Thanks,
>
> Bart.


--
Damien Le Moal
Western Digital Research

2022-06-15 08:13:03

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 15/06/2022 00:43, Damien Le Moal wrote:
> On 6/15/22 03:20, Bart Van Assche wrote:
>> On 6/13/22 00:01, Damien Le Moal wrote:
>>> On 6/9/22 19:29, John Garry wrote:
>>>> + /*
>>>> + * This determines how many commands the HBA will set aside
>>>> + * for internal commands. This number will be added to
>>>> + * @can_queue to calcumate the maximum number of simultaneous
>>>
>>> s/calcumate/calculate
>>>
>>> But this is weird. For SATA, can_queue is 32. Having reserved commands,
>>> that number needs to stay the same. We cannot have more than 32 tags.
>>> I think keeping can_queue as the max queue depth with at most
>>> nr_reserved_cmds tags reserved is better.
>>>
>>>> + * commands sent to the host.
>>>> + */
>>>> + int nr_reserved_cmds;
>>
>> +1 for Damien's request. I also prefer to keep can_queue as the maximum
>> queue depth, whether or not nr_reserved_cmds has been set.
>
> For non SATA drives, I still think that is a good idea. However, for SATA,
> we always have the internal tag command that is special. With John's
> change, it would have to be reserved but that means we are down to 31 max
> QD,

My intention is to keep regular tag depth at 32 for SATA. We add an
extra tag as a reserved tag. Indeed, this is called a 'tag', but it's
just really the placeholder for what will be the ATA_TAG_INTERNAL request.

About how we set scsi_host.can_queue, in this series we set .can_queue
as max regular tags, and the handling is as follows:

scsi_mq_setup_tags():
tag_set->queue_depth = shost->can_queue + shost->nr_reserved_cmds
tag_set->reserved_tags = shost->nr_reserved_cmds

So we honour the rule that blk_mq_tag_set.queue_depth is the total tag
depth, including reserved.

Incidentally I think Christoph prefers to keep .can_queue at total max
tags including reserved:
https://lore.kernel.org/linux-scsi/[email protected]/

> so going backward several years... That internal tag for ATA does not
> need to be reserved since this command is always used when the drive is
> idle and no other NCQ commands are on-going.

So do you mean that ATA_TAG_INTERNAL qc is used for other commands apart
from internal commands?

>
> So the solution to all this is a likely a little more complicated if we
> want to keep ATA max QD to 32.
>

thanks,
John

2022-06-15 13:55:21

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/18] scsi: core: Resurrect scsi_{get,free}_host_dev()

On 14/06/2022 20:33, Bart Van Assche wrote:

Hi Bart,

> On 6/9/22 03:29, John Garry wrote:
>> +/**
>> + * scsi_get_host_dev - Create a scsi_device that points to the host
>> adapter itself
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> What does this mean? That part of the function description is not
> clear to me.
>

Agreed, this text is just as it was before (it was originally deleted)
but I can fix it up to make sense.

>> + * @shost: Host that needs a scsi_device
>                               ^^^^^^^^^^^^^
> This is not detailed enough. Consider changing "a scsi_device" into
> "a scsi device for allocating reserved commands from".
>
>> + *
>> + * Lock status: None assumed.
>> + *
>> + * Returns:     The scsi_device or NULL
>> + *
>> + * Notes:
>> + *    Attach a single scsi_device to the Scsi_Host - this should
>> + *    be made to look like a "pseudo-device" that points to the
>> + *    HA itself.
>> + *
>> + *    Note - this device is not accessible from any high-level
>> + *    drivers (including generics), which is probably not
>> + *    optimal.  We can add hooks later to attach.
>
> The "which is probably not optimal. We can add hooks later to attach."
> part probably should be moved to the patch description.

ok

>
>> + */
>> +struct scsi_device *scsi_get_host_dev(struct Scsi_Host *shost)
>> +{
>> +    struct scsi_device *sdev = NULL;
>> +    struct scsi_target *starget;
>> +
>> +    mutex_lock(&shost->scan_mutex);
>> +    if (!scsi_host_scan_allowed(shost))
>> +        goto out;
>> +    starget = scsi_alloc_target(&shost->shost_gendev, 0,
>> shost->this_id);
>
> ^^^^^^^^^^^^^^^^^^
> Is it guaranteed that this channel / id combination will not be used for
> any other SCSI device?

Does it matter if the parent device is different?

>
> What if shost->this_id == -1?
>
>> +    if (!starget)
>> +        goto out;
>> +
>> +    sdev = scsi_alloc_sdev(starget, 0, NULL);
>> +    if (sdev)
>> +        sdev->borken = 0;
>> +    else
>> +        scsi_target_reap(starget);
>> +    put_device(&starget->dev);
>> + out:
>> +    mutex_unlock(&shost->scan_mutex);
>> +    return sdev;
>> +}
>> +EXPORT_SYMBOL(scsi_get_host_dev);
>
> Elsewhere in the SCSI core "get..dev" means increment the reference
> count of
> a SCSI device. Maybe scsi_alloc_host_dev() is a better name?

I think that the intention is to only use this once for a shost, i.e.
get or allocate that scsi_device once and use it for the lifetime of the
shost. But I can rename if you think it's better.

>
>> +/*
>> + * These two functions are used to allocate and free a pseudo device
>> + * which will connect to the host adapter itself rather than any
>> + * physical device.  You must deallocate when you are done with the
>> + * thing.  This physical pseudo-device isn't real and won't be available
>> + * from any high-level drivers.
>> + */
>
> Please keep function comments in .c files because that makes it more likely
> that the comment and the implementation will remain in sync.
>

fine, I can relocate this.

Thanks,
John

2022-06-16 02:57:34

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/15/22 16:35, John Garry wrote:
> On 15/06/2022 00:43, Damien Le Moal wrote:
>> On 6/15/22 03:20, Bart Van Assche wrote:
>>> On 6/13/22 00:01, Damien Le Moal wrote:
>>>> On 6/9/22 19:29, John Garry wrote:
>>>>> +    /*
>>>>> +     * This determines how many commands the HBA will set aside
>>>>> +     * for internal commands. This number will be added to
>>>>> +     * @can_queue to calcumate the maximum number of simultaneous
>>>>
>>>> s/calcumate/calculate
>>>>
>>>> But this is weird. For SATA, can_queue is 32. Having reserved commands,
>>>> that number needs to stay the same. We cannot have more than 32 tags.
>>>> I think keeping can_queue as the max queue depth with at most
>>>> nr_reserved_cmds tags reserved is better.
>>>>
>>>>> +     * commands sent to the host.
>>>>> +     */
>>>>> +    int nr_reserved_cmds;
>>>
>>> +1 for Damien's request. I also prefer to keep can_queue as the maximum
>>> queue depth, whether or not nr_reserved_cmds has been set.
>>
>> For non SATA drives, I still think that is a good idea. However, for
>> SATA,
>> we always have the internal tag command that is special. With John's
>> change, it would have to be reserved but that means we are down to 31 max
>> QD,
>
> My intention is to keep regular tag depth at 32 for SATA. We add an
> extra tag as a reserved tag. Indeed, this is called a 'tag', but it's
> just really the placeholder for what will be the ATA_TAG_INTERNAL request.
>
> About how we set scsi_host.can_queue, in this series we set .can_queue
> as max regular tags, and the handling is as follows:
>
> scsi_mq_setup_tags():
> tag_set->queue_depth = shost->can_queue + shost->nr_reserved_cmds
> tag_set->reserved_tags = shost->nr_reserved_cmds
>
> So we honour the rule that blk_mq_tag_set.queue_depth is the total tag
> depth, including reserved.
>
> Incidentally I think Christoph prefers to keep .can_queue at total max
> tags including reserved:
> https://lore.kernel.org/linux-scsi/[email protected]/
>
>
>> so going backward several years... That internal tag for ATA does not
>> need to be reserved since this command is always used when the drive is
>> idle and no other NCQ commands are on-going.
>
> So do you mean that ATA_TAG_INTERNAL qc is used for other commands apart
> from internal commands?

No. It is used only for internal commands. What I meant to say is that
currently, internal commands are issued only on device scan, device
revalidate and error handling. All of these phases are done with the
device under EH with the issuing path stopped and all commands
completed, so no regular commands can be issued. Only internal ones, non
NCQ, using the ATA_TAG_INTERNAL. So strictly speaking, we should not
need to reserve that internal tag at all.

>
>>
>> So the solution to all this is a likely a little more complicated if we
>> want to keep ATA max QD to 32.
>>
>
> thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-06-16 08:49:29

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 2022/06/16 17:24, John Garry wrote:
> On 16/06/2022 03:47, Damien Le Moal wrote:
>>>> so going backward several years... That internal tag for ATA does not
>>>> need to be reserved since this command is always used when the drive is
>>>> idle and no other NCQ commands are on-going.
>>>
>>> So do you mean that ATA_TAG_INTERNAL qc is used for other commands
>>> apart from internal commands?
>>
>> No. It is used only for internal commands. What I meant to say is that
>> currently, internal commands are issued only on device scan, device
>> revalidate and error handling. All of these phases are done with the
>> device under EH with the issuing path stopped and all commands
>> completed,
>
> If I want to allocate a request for an ATA internal command then could I
> use 1x from the regular tags? I didn't think that this was possible as I
> thought that all tags may be outstanding when EH kicks in. I need to
> double check it.

When EH kicks in, the drive is in error mode and all commands are back to the
host. From there, you need to get the drive out of error mode with read log 10h
and then internal commands can be issued if needed. Then the aborted commands
that are not in error are restarted.

For the non-error case (revalidate), ap->ops->qc_defer() will make sure that NCQ
and non-NCQ commands are never mixed. Since all internal commands are non-ncq,
when an internal command is issued, there are necessarily no other commands
ongoing, but 32 NCQ commands may be waiting, without any free tag. The internal
command being non-NCQ can still proceed since it does not need a real device tag.

The joy of ATA...

> Even if it were true, not using a reserved tag for ATA internal command
> makes things more tricky as this command requires special handling for
> scsi blk_mq_ops and there is no easy way to identify the command as
> reserved (to know special handling is required).

Yes. Having the ATA_TAG_INTERNAL tag as a reserved tag is fine. But from the
above, you can see that this is not really needed at all to make things work.
The management of ATA_TAG_INTERNAL as a reserve tag is really about getting your
API to simplify the code.

What I am thinking is that with your patches as is, it seems that we can never
actually reserve a real tag for ATA to do internal NCQ commands... We do not
really need that for now though, apart maybe for speeding up device revalidate.
Everytime that one runs, one can see a big spike in read/write IO latencies
because of the queue drain it causes.

And for CDL 0xD policy error handling, I may need a reserved NCQ tag... Still
trying to work out qc/tag reuse for now though.

>
>> so no regular commands can be issued. Only internal ones, non
>> NCQ, using the ATA_TAG_INTERNAL. So strictly speaking, we should not
>> need to reserve that internal tag at all.
>>
>
> Thanks,
> John
>


--
Damien Le Moal
Western Digital Research

2022-06-16 08:54:20

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 16/06/2022 03:47, Damien Le Moal wrote:
>>> so going backward several years... That internal tag for ATA does not
>>> need to be reserved since this command is always used when the drive is
>>> idle and no other NCQ commands are on-going.
>>
>> So do you mean that ATA_TAG_INTERNAL qc is used for other commands
>> apart from internal commands?
>
> No. It is used only for internal commands. What I meant to say is that
> currently, internal commands are issued only on device scan, device
> revalidate and error handling. All of these phases are done with the
> device under EH with the issuing path stopped and all commands
> completed,

If I want to allocate a request for an ATA internal command then could I
use 1x from the regular tags? I didn't think that this was possible as I
thought that all tags may be outstanding when EH kicks in. I need to
double check it.

Even if it were true, not using a reserved tag for ATA internal command
makes things more tricky as this command requires special handling for
scsi blk_mq_ops and there is no easy way to identify the command as
reserved (to know special handling is required).

>so no regular commands can be issued. Only internal ones, non
> NCQ, using the ATA_TAG_INTERNAL. So strictly speaking, we should not
> need to reserve that internal tag at all.
>

Thanks,
John

2022-06-20 06:31:06

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/13/22 09:01, Damien Le Moal wrote:
> On 6/9/22 19:29, John Garry wrote:
>> From: Hannes Reinecke <[email protected]>
>>
>> Quite some drivers are using management commands internally, which
>> typically use the same hardware tag pool (ie they are being allocated
>> from the same hardware resources) as the 'normal' I/O commands.
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
>> this situation.
>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
>> template to instruct the block layer to set aside a tag space for these
>> management commands by using reserved tags.
>>
>> Signed-off-by: Hannes Reinecke <[email protected]>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>> drivers/scsi/hosts.c | 3 +++
>> drivers/scsi/scsi_lib.c | 6 +++++-
>> include/scsi/scsi_host.h | 22 +++++++++++++++++++++-
>> 3 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 8352f90d997d..27296addaf63 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>> if (sht->virt_boundary_mask)
>> shost->virt_boundary_mask = sht->virt_boundary_mask;
>>
>> + if (sht->nr_reserved_cmds)
>> + shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>> +
>> device_initialize(&shost->shost_gendev);
>> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>> shost->shost_gendev.bus = &scsi_bus_type;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 6ffc9e4258a8..f6e53c6d913c 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>> else
>> tag_set->ops = &scsi_mq_ops_no_commit;
>> tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>> +
>> tag_set->nr_maps = shost->nr_maps ? : 1;
>> - tag_set->queue_depth = shost->can_queue;
>> + tag_set->queue_depth =
>> + shost->can_queue + shost->nr_reserved_cmds;
>> + tag_set->reserved_tags = shost->nr_reserved_cmds;
>> +
>> tag_set->cmd_size = cmd_size;
>> tag_set->numa_node = dev_to_node(shost->dma_dev);
>> tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 59aef1f178f5..149dcbd4125e 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -366,10 +366,19 @@ struct scsi_host_template {
>> /*
>> * This determines if we will use a non-interrupt driven
>> * or an interrupt driven scheme. It is set to the maximum number
>> - * of simultaneous commands a single hw queue in HBA will accept.
>> + * of simultaneous commands a single hw queue in HBA will accept
>> + * excluding internal commands.
>> */
>> int can_queue;
>>
>> + /*
>> + * This determines how many commands the HBA will set aside
>> + * for internal commands. This number will be added to
>> + * @can_queue to calcumate the maximum number of simultaneous
>
> s/calcumate/calculate
>
> But this is weird. For SATA, can_queue is 32. Having reserved commands,
> that number needs to stay the same. We cannot have more than 32 tags.
> I think keeping can_queue as the max queue depth with at most
> nr_reserved_cmds tags reserved is better.
>
I had been thinking about this for quite a while, and figured that the
'reserved' commands model from blk-mq doesn't fit nicely with the SATA
protocol.

So my idea for SATA is simply _not_ to use reserved tags.
Any TMF functions (or the equivalent thereof) should always be sent as
non-NCQ commands. And when doing so we're back to QD=1 on SATA anyway,
so there _must_ be tags available. Consequently the main reason for
having reserved tags (namely to guarantee that tags are available for
TMF) doesn't apply here.

Which is why in my initial patchset I've always been referrring to
'internal' commands, and drivers could select if the 'internal' commands
are mappend on reserved tags or not.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-06-20 06:41:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On Mon, Jun 20, 2022 at 08:24:24AM +0200, Hannes Reinecke wrote:
> So my idea for SATA is simply _not_ to use reserved tags.
> Any TMF functions (or the equivalent thereof) should always be sent as
> non-NCQ commands. And when doing so we're back to QD=1 on SATA anyway, so
> there _must_ be tags available. Consequently the main reason for having
> reserved tags (namely to guarantee that tags are available for TMF) doesn't
> apply here.

At least in the non-elevator case (which includes all passthrough I/O)
request have tags assigned as soon as they are allocated. So, we
absolutely can have all tags allocated and then need to do a TMF.

2022-06-20 07:15:53

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 15:45, Hannes Reinecke wrote:
> On 6/13/22 11:06, Damien Le Moal wrote:
>> On 6/13/22 17:25, John Garry wrote:
> [ .. ]
>>>
>>> We may have 32 regular tags and 1 reserved tag for SATA.
>>
>> Right. But that is the messy part though. That extra 1 tag is actually not
>> a tag since all internal commands are non-NCQ commands that do not need a
>> tag...
>>
>> I am working on command duration limits support currently. This feature
>> set has a new horrendous "improvement": a command can be aborted by the
>> device if it fails its duration limit, but the abort is done with a good
>> status + sense data available bit set so that the device queue is not
>> aborted entirely like with a regular NCQ command error.
>>
>> For such aborted commands, the command sense data is set to
>> "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
>> new "successful NCQ sense data log" to check that the command sense is
>> indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
>> stalling the device queue, we would need an internal NCQ (queuable) command.
>>
>> Currently, that is not possible to do cleanly as there are no guarantees
>> we can get a free tag (there is a race between block layer tag allocation
>> and libata internal tag counting). So a reserved tag for that would be
>> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
>> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
>> rather useless. But that also means that we kind-of go back to the days
>> when Linux showed ATA drives max QD of 31...
>>
>> I am still struggling with this particular use case and trying to make it
>> fit with your series. Trying out different things right now.
>>
> Hmm. Struggling on how that is supposed to work in general.

The standard monks defined it as conceptually easy: if a command completes
with success and sense data available bit set, then just read that log
page that has the sense data to check what happened. Very trivial in
principle.
But of course, this is ATA, so a mess in practice because we want to do
that read log with an NCQ command to less impact on the drive performance
than a regular error. Otherwise, if we simply do a regular eh, we end up
with the same impact as a hard command failure. And then we end up with
all these problems with tag reusing and nothing in libata allowing to do
internal ncq commands.

> As we're reading from a log to get the sense information I guess that
> log is organized by tag index. Meaning we have to keep hold of the tag
> which generated that error.

Yep. This is a 1024B log which has all the sense information of for all
completed NCQ commands, organized per tag.

> Q1: Can we (re-) use that tag to read the log information?

I thought of that. BUT: if a revalidate or regular eh is ongoing, we need
to delay issuing of the NCQ read log command since eh will prevent issuing
anything (there will be non-ncq commands on-going). Problem here is that
delaying ncq commands means essentially doing a requeue so we need a real
req/scsi req for that. Reusing the tag for a new temporary internal qc is
not enough.

> Q2: What do you do if all 32 command generate such an error?

For that case, I can simply use the internal tag and do a non-ncq read
log. That is actually the easy case !

> But really, this sounds no different from the 'classical' request sense
> handling in SCSI ML. Have you considered just run with that an map
> 'REQUEST SENSE' on your new NCQ Get Log page command?

I am exploring the reuse of the scsi EH now. But very messy on libata
side. Still no good solution.

While doing that, I did discover that libata eh is very messy because of
one driver only: scsi ipr. That is the only one that does not have a
->error_handler port operation. And because of that, we are stuck with
lots of "old EH" ata code. So there are always 2 different eh path.
Complete mess. I am trying to see if I can't convert scsi ipr to have a
error_handler port operation, but I cannot test anything as I do not have
the hardware.

>
> Cheers,
>
> Hannes


--
Damien Le Moal
Western Digital Research

2022-06-20 07:37:50

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/13/22 11:06, Damien Le Moal wrote:
> On 6/13/22 17:25, John Garry wrote:
[ .. ]
>>
>> We may have 32 regular tags and 1 reserved tag for SATA.
>
> Right. But that is the messy part though. That extra 1 tag is actually not
> a tag since all internal commands are non-NCQ commands that do not need a
> tag...
>
> I am working on command duration limits support currently. This feature
> set has a new horrendous "improvement": a command can be aborted by the
> device if it fails its duration limit, but the abort is done with a good
> status + sense data available bit set so that the device queue is not
> aborted entirely like with a regular NCQ command error.
>
> For such aborted commands, the command sense data is set to
> "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
> new "successful NCQ sense data log" to check that the command sense is
> indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
> stalling the device queue, we would need an internal NCQ (queuable) command.
>
> Currently, that is not possible to do cleanly as there are no guarantees
> we can get a free tag (there is a race between block layer tag allocation
> and libata internal tag counting). So a reserved tag for that would be
> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
> rather useless. But that also means that we kind-of go back to the days
> when Linux showed ATA drives max QD of 31...
>
> I am still struggling with this particular use case and trying to make it
> fit with your series. Trying out different things right now.
>
Hmm. Struggling on how that is supposed to work in general.
As we're reading from a log to get the sense information I guess that
log is organized by tag index. Meaning we have to keep hold of the tag
which generated that error.
Q1: Can we (re-) use that tag to read the log information?
Q2: What do you do if all 32 command generate such an error?

But really, this sounds no different from the 'classical' request sense
handling in SCSI ML. Have you considered just run with that an map
'REQUEST SENSE' on your new NCQ Get Log page command?

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-06-20 07:40:44

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 08:28, Christoph Hellwig wrote:
> On Mon, Jun 20, 2022 at 08:24:24AM +0200, Hannes Reinecke wrote:
>> So my idea for SATA is simply _not_ to use reserved tags.
>> Any TMF functions (or the equivalent thereof) should always be sent as
>> non-NCQ commands. And when doing so we're back to QD=1 on SATA anyway, so
>> there _must_ be tags available. Consequently the main reason for having
>> reserved tags (namely to guarantee that tags are available for TMF) doesn't
>> apply here.
>
> At least in the non-elevator case (which includes all passthrough I/O)
> request have tags assigned as soon as they are allocated. So, we
> absolutely can have all tags allocated and then need to do a TMF.

SATA internals may come to the rescue here; if there's an error all NCQ
commands are aborted. So we'll get at least one command tag back.
As for the command duration limits I'm still waiting for clarification
from Damien if we can reuse tags there.

But I do agree it's ugly.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-06-20 08:01:26

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 16:03, Hannes Reinecke wrote:
> On 6/20/22 08:28, Christoph Hellwig wrote:
>> On Mon, Jun 20, 2022 at 08:24:24AM +0200, Hannes Reinecke wrote:
>>> So my idea for SATA is simply _not_ to use reserved tags.
>>> Any TMF functions (or the equivalent thereof) should always be sent as
>>> non-NCQ commands. And when doing so we're back to QD=1 on SATA anyway, so
>>> there _must_ be tags available. Consequently the main reason for having
>>> reserved tags (namely to guarantee that tags are available for TMF) doesn't
>>> apply here.
>>
>> At least in the non-elevator case (which includes all passthrough I/O)
>> request have tags assigned as soon as they are allocated. So, we
>> absolutely can have all tags allocated and then need to do a TMF.
>
> SATA internals may come to the rescue here; if there's an error all NCQ
> commands are aborted. So we'll get at least one command tag back.
> As for the command duration limits I'm still waiting for clarification
> from Damien if we can reuse tags there.

We cannot easily reuse tags as the CDL case is *not* an error. So no queue
abort come with it. If we had the queue abort, things would be easy as
that would essentially be a regular eh.

Simply using scsi_execute_req() does not work since we have no guarantees
we can get a tag: all 32 commands being executed may complete with needing
sense data, and so scsi_execute_req() would hang waiting for a tag. I
could try hacking scsi_execute_req() to reuse a tag instead of allocating
a new one... Calling scsi_execute_req() from libata is really ugly though.

>
> But I do agree it's ugly.
>
> Cheers,
>
> Hannes


--
Damien Le Moal
Western Digital Research

2022-06-20 08:43:07

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/15/22 09:35, John Garry wrote:
> On 15/06/2022 00:43, Damien Le Moal wrote:
>> On 6/15/22 03:20, Bart Van Assche wrote:
>>> On 6/13/22 00:01, Damien Le Moal wrote:
>>>> On 6/9/22 19:29, John Garry wrote:
>>>>> +    /*
>>>>> +     * This determines how many commands the HBA will set aside
>>>>> +     * for internal commands. This number will be added to
>>>>> +     * @can_queue to calcumate the maximum number of simultaneous
>>>>
>>>> s/calcumate/calculate
>>>>
>>>> But this is weird. For SATA, can_queue is 32. Having reserved commands,
>>>> that number needs to stay the same. We cannot have more than 32 tags.
>>>> I think keeping can_queue as the max queue depth with at most
>>>> nr_reserved_cmds tags reserved is better.
>>>>
>>>>> +     * commands sent to the host.
>>>>> +     */
>>>>> +    int nr_reserved_cmds;
>>>
>>> +1 for Damien's request. I also prefer to keep can_queue as the maximum
>>> queue depth, whether or not nr_reserved_cmds has been set.
>>
>> For non SATA drives, I still think that is a good idea. However, for
>> SATA,
>> we always have the internal tag command that is special. With John's
>> change, it would have to be reserved but that means we are down to 31 max
>> QD,
>
> My intention is to keep regular tag depth at 32 for SATA. We add an
> extra tag as a reserved tag. Indeed, this is called a 'tag', but it's
> just really the placeholder for what will be the ATA_TAG_INTERNAL request.
>
> About how we set scsi_host.can_queue, in this series we set .can_queue
> as max regular tags, and the handling is as follows:
>
> scsi_mq_setup_tags():
> tag_set->queue_depth = shost->can_queue + shost->nr_reserved_cmds
> tag_set->reserved_tags = shost->nr_reserved_cmds
>
> So we honour the rule that blk_mq_tag_set.queue_depth is the total tag
> depth, including reserved.
>
> Incidentally I think Christoph prefers to keep .can_queue at total max
> tags including reserved:
> https://lore.kernel.org/linux-scsi/[email protected]/
>
>
>> so going backward several years... That internal tag for ATA does not
>> need to be reserved since this command is always used when the drive is
>> idle and no other NCQ commands are on-going.
>
> So do you mean that ATA_TAG_INTERNAL qc is used for other commands apart
> from internal commands?
>
Well.

The problem is that 'ATA_TAG_INTERNAL' currently is overloaded to
a) signal internal commands
b) 'magic' tag when looking up commands

My proposal would be to separate these use-cases, and use a flag (eg
ATA_QCFLAG_INTERNAL) to determine internal commands.

The we'll be needing an internal tag-lookup map
(NCQ tag -> blk-mq tag) for ata_qc_from_tag() to retrieve the command
corresponding to a driver tag.

I guess we'll need that anyway with libsas, as there we're working with
a budget tag which has no relationship with the NCQ tag whatsoever ...

But with that we can kill 'ATA_TAG_INTERNAL', and let the driver figure
out how to allocate internal tags.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2022-06-20 08:50:30

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/16/22 10:41, Damien Le Moal wrote:
> On 2022/06/16 17:24, John Garry wrote:
>> On 16/06/2022 03:47, Damien Le Moal wrote:
>>>>> so going backward several years... That internal tag for ATA does not
>>>>> need to be reserved since this command is always used when the drive is
>>>>> idle and no other NCQ commands are on-going.
>>>>
>>>> So do you mean that ATA_TAG_INTERNAL qc is used for other commands
>>>> apart from internal commands?
>>>
>>> No. It is used only for internal commands. What I meant to say is that
>>> currently, internal commands are issued only on device scan, device
>>> revalidate and error handling. All of these phases are done with the
>>> device under EH with the issuing path stopped and all commands
>>> completed,
>>
>> If I want to allocate a request for an ATA internal command then could I
>> use 1x from the regular tags? I didn't think that this was possible as I
>> thought that all tags may be outstanding when EH kicks in. I need to
>> double check it.
>
> When EH kicks in, the drive is in error mode and all commands are back to the
> host. From there, you need to get the drive out of error mode with read log 10h
> and then internal commands can be issued if needed. Then the aborted commands
> that are not in error are restarted.
>
> For the non-error case (revalidate), ap->ops->qc_defer() will make sure that NCQ
> and non-NCQ commands are never mixed. Since all internal commands are non-ncq,
> when an internal command is issued, there are necessarily no other commands
> ongoing, but 32 NCQ commands may be waiting, without any free tag. The internal
> command being non-NCQ can still proceed since it does not need a real device tag.
>
> The joy of ATA...
>
>> Even if it were true, not using a reserved tag for ATA internal command
>> makes things more tricky as this command requires special handling for
>> scsi blk_mq_ops and there is no easy way to identify the command as
>> reserved (to know special handling is required).
>
> Yes. Having the ATA_TAG_INTERNAL tag as a reserved tag is fine. But from the
> above, you can see that this is not really needed at all to make things work.
> The management of ATA_TAG_INTERNAL as a reserve tag is really about getting your
> API to simplify the code.
>
> What I am thinking is that with your patches as is, it seems that we can never
> actually reserve a real tag for ATA to do internal NCQ commands... We do not
> really need that for now though, apart maybe for speeding up device revalidate.
> Everytime that one runs, one can see a big spike in read/write IO latencies
> because of the queue drain it causes.
>
Hmm. But doesn't that mean the we can reserve one tag, _and_ set the
queue depth to '32'?
We'll need to fiddle with the tag map on completion (cf my previous
mail), but then we might need to do that anyway if we separate out
ATA_QCFLAG_INTERNAL ...

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2022-06-20 09:04:50

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 17:27, Hannes Reinecke wrote:
> On 6/16/22 10:41, Damien Le Moal wrote:
>> On 2022/06/16 17:24, John Garry wrote:
>>> On 16/06/2022 03:47, Damien Le Moal wrote:
>>>>>> so going backward several years... That internal tag for ATA does not
>>>>>> need to be reserved since this command is always used when the drive is
>>>>>> idle and no other NCQ commands are on-going.
>>>>>
>>>>> So do you mean that ATA_TAG_INTERNAL qc is used for other commands
>>>>> apart from internal commands?
>>>>
>>>> No. It is used only for internal commands. What I meant to say is that
>>>> currently, internal commands are issued only on device scan, device
>>>> revalidate and error handling. All of these phases are done with the
>>>> device under EH with the issuing path stopped and all commands
>>>> completed,
>>>
>>> If I want to allocate a request for an ATA internal command then could I
>>> use 1x from the regular tags? I didn't think that this was possible as I
>>> thought that all tags may be outstanding when EH kicks in. I need to
>>> double check it.
>>
>> When EH kicks in, the drive is in error mode and all commands are back to the
>> host. From there, you need to get the drive out of error mode with read log 10h
>> and then internal commands can be issued if needed. Then the aborted commands
>> that are not in error are restarted.
>>
>> For the non-error case (revalidate), ap->ops->qc_defer() will make sure that NCQ
>> and non-NCQ commands are never mixed. Since all internal commands are non-ncq,
>> when an internal command is issued, there are necessarily no other commands
>> ongoing, but 32 NCQ commands may be waiting, without any free tag. The internal
>> command being non-NCQ can still proceed since it does not need a real device tag.
>>
>> The joy of ATA...
>>
>>> Even if it were true, not using a reserved tag for ATA internal command
>>> makes things more tricky as this command requires special handling for
>>> scsi blk_mq_ops and there is no easy way to identify the command as
>>> reserved (to know special handling is required).
>>
>> Yes. Having the ATA_TAG_INTERNAL tag as a reserved tag is fine. But from the
>> above, you can see that this is not really needed at all to make things work.
>> The management of ATA_TAG_INTERNAL as a reserve tag is really about getting your
>> API to simplify the code.
>>
>> What I am thinking is that with your patches as is, it seems that we can never
>> actually reserve a real tag for ATA to do internal NCQ commands... We do not
>> really need that for now though, apart maybe for speeding up device revalidate.
>> Everytime that one runs, one can see a big spike in read/write IO latencies
>> because of the queue drain it causes.
>>
> Hmm. But doesn't that mean the we can reserve one tag, _and_ set the
> queue depth to '32'?
> We'll need to fiddle with the tag map on completion (cf my previous
> mail), but then we might need to do that anyway if we separate out
> ATA_QCFLAG_INTERNAL ...

Reserving a tag is not enough. As explained, even if I can get a tag for a
qc, I need a proper req to safely issue an ncq command (because of the
potential need to delay and requeue even if we have a free tag !).

So reserving a tag/req to be able to do NCQ at the cost of max qd being 31
works for that. We could keep max qd at 32 by creating one more "fake" tag
and having a request for it, that is, having the fake tag visible to the
block layer as a reserved tag, as John's series is doing, but for the
reserved tags, we actually need to use an effective tag (qc->hw_tag) when
issuing the commands. And for that, we can reuse the tag of one of the
failed commands.

>
> Cheers,
>
> Hannes


--
Damien Le Moal
Western Digital Research

2022-06-20 09:42:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On Mon, Jun 20, 2022 at 06:02:30PM +0900, Damien Le Moal wrote:
> So reserving a tag/req to be able to do NCQ at the cost of max qd being 31
> works for that. We could keep max qd at 32 by creating one more "fake" tag
> and having a request for it, that is, having the fake tag visible to the
> block layer as a reserved tag, as John's series is doing, but for the
> reserved tags, we actually need to use an effective tag (qc->hw_tag) when
> issuing the commands. And for that, we can reuse the tag of one of the
> failed commands.

Take a look at the magic flush request in blk-flush.c, which is
preallocated but borrows a tag from the request that wants a pre- or
post-flush. The logic is rather ugly, but maybe it might actually
become cleaner by generalizing it a bit.

2022-06-20 11:43:36

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 18:05, Christoph Hellwig wrote:
> On Mon, Jun 20, 2022 at 06:02:30PM +0900, Damien Le Moal wrote:
>> So reserving a tag/req to be able to do NCQ at the cost of max qd being 31
>> works for that. We could keep max qd at 32 by creating one more "fake" tag
>> and having a request for it, that is, having the fake tag visible to the
>> block layer as a reserved tag, as John's series is doing, but for the
>> reserved tags, we actually need to use an effective tag (qc->hw_tag) when
>> issuing the commands. And for that, we can reuse the tag of one of the
>> failed commands.
>
> Take a look at the magic flush request in blk-flush.c, which is
> preallocated but borrows a tag from the request that wants a pre- or
> post-flush. The logic is rather ugly, but maybe it might actually
> become cleaner by generalizing it a bit.

Thanks. Will check.
I am also looking at scsi_unjam_host() and scsi_eh_get_sense(). These
reuse a scsi command to do eh operations. So I could use that too, modulo
making it work outside of eh context to keep the command flow intact.

--
Damien Le Moal
Western Digital Research

2022-06-20 12:04:33

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/18] scsi: core: Implement reserved command handling

On 6/20/22 13:24, Damien Le Moal wrote:
> On 6/20/22 18:05, Christoph Hellwig wrote:
>> On Mon, Jun 20, 2022 at 06:02:30PM +0900, Damien Le Moal wrote:
>>> So reserving a tag/req to be able to do NCQ at the cost of max qd being 31
>>> works for that. We could keep max qd at 32 by creating one more "fake" tag
>>> and having a request for it, that is, having the fake tag visible to the
>>> block layer as a reserved tag, as John's series is doing, but for the
>>> reserved tags, we actually need to use an effective tag (qc->hw_tag) when
>>> issuing the commands. And for that, we can reuse the tag of one of the
>>> failed commands.
>>
>> Take a look at the magic flush request in blk-flush.c, which is
>> preallocated but borrows a tag from the request that wants a pre- or
>> post-flush. The logic is rather ugly, but maybe it might actually
>> become cleaner by generalizing it a bit.
>
> Thanks. Will check.
> I am also looking at scsi_unjam_host() and scsi_eh_get_sense(). These
> reuse a scsi command to do eh operations. So I could use that too, modulo
> making it work outside of eh context to keep the command flow intact.
>

Tsk. I was hoping to be able to remove it (especially
scsi_eh_get_sense()), but looks as if we actually do need it.
But it might be not a bad idea to have scsi_eh_get_sense() to run
independent on the SCSI EH stuff; returning with a sense code is not
necessary an error, so there are reasons for not always invoking SCSI EH.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer