2021-01-12 22:10:01

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 00/21] ibmvfc: initial MQ development

Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels.

This latest series is completely rebased and reimplemented on top of the recent
("ibmvfc: MQ prepartory locking work") series. [1]

[1] https://lore.kernel.org/linux-scsi/[email protected]/

changes in v4:
* Series rebased and reworked on top of previous ibmvfc locking series
* Dropped all previous Reviewed-by tags

changes in v3:
* Patch 4: changed firmware support logging to dev_warn_once
* Patch 6: adjusted locking
* Patch 15: dropped logging verbosity, moved cancel event tracking into subqueue
* Patch 17: removed write permission for migration module parameters
drive hard reset after update to num of scsi channels

changes in v2:
* Patch 4: NULL'd scsi_scrq reference after deallocation
* Patch 6: Added switch case to handle XPORT event
* Patch 9: fixed ibmvfc_event leak and double free
* added support for cancel command with MQ
* added parameter toggles for MQ settings

Tyrel Datwyler (21):
ibmvfc: add vhost fields and defaults for MQ enablement
ibmvfc: move event pool init/free routines
ibmvfc: init/free event pool during queue allocation/free
ibmvfc: add size parameter to ibmvfc_init_event_pool
ibmvfc: define hcall wrapper for registering a Sub-CRQ
ibmvfc: add Subordinate CRQ definitions
ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
ibmvfc: add Sub-CRQ IRQ enable/disable routine
ibmvfc: add handlers to drain and complete Sub-CRQ responses
ibmvfc: define Sub-CRQ interrupt handler routine
ibmvfc: map/request irq and register Sub-CRQ interrupt handler
ibmvfc: implement channel enquiry and setup commands
ibmvfc: advertise client support for using hardware channels
ibmvfc: set and track hw queue in ibmvfc_event struct
ibmvfc: send commands down HW Sub-CRQ when channelized
ibmvfc: register Sub-CRQ handles with VIOS during channel setup
ibmvfc: add cancel mad initialization helper
ibmvfc: send Cancel MAD down each hw scsi channel
ibmvfc: purge scsi channels after transport loss/reset
ibmvfc: enable MQ and set reasonable defaults
ibmvfc: provide modules parameters for MQ settings

drivers/scsi/ibmvscsi/ibmvfc.c | 914 ++++++++++++++++++++++++++++-----
drivers/scsi/ibmvscsi/ibmvfc.h | 38 ++
2 files changed, 824 insertions(+), 128 deletions(-)

--
2.27.0


2021-01-12 22:11:29

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
2 files changed, 17 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ba95438a8912..9200fe49c57e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
.max_sectors = IBMVFC_MAX_SECTORS,
.shost_attrs = ibmvfc_attrs,
.track_queue_depth = 1,
+ .host_tagset = 1,
};

/**
@@ -5290,6 +5291,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
shost->max_sectors = IBMVFC_MAX_SECTORS;
shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
shost->unique_id = shost->host_no;
+ shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1;

vhost = shost_priv(shost);
INIT_LIST_HEAD(&vhost->targets);
@@ -5300,6 +5302,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
vhost->partition_number = -1;
vhost->log_level = log_level;
vhost->task_set = 1;
+
+ vhost->mq_enabled = IBMVFC_MQ;
+ vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
+ vhost->using_channels = 0;
+ vhost->do_enquiry = 1;
+
strcpy(vhost->partition_name, "UNKNOWN");
init_waitqueue_head(&vhost->work_wait_q);
init_waitqueue_head(&vhost->init_wait_q);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 632e977449c5..dd6d89292867 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -41,6 +41,11 @@
#define IBMVFC_DEFAULT_LOG_LEVEL 2
#define IBMVFC_MAX_CDB_LEN 16
#define IBMVFC_CLS3_ERROR 0
+#define IBMVFC_MQ 0
+#define IBMVFC_SCSI_CHANNELS 0
+#define IBMVFC_SCSI_HW_QUEUES 1
+#define IBMVFC_MIG_NO_SUB_TO_CRQ 0
+#define IBMVFC_MIG_NO_N_TO_M 0

/*
* Ensure we have resources for ERP and initialization:
@@ -840,6 +845,10 @@ struct ibmvfc_host {
int delay_init;
int scan_complete;
int logged_in;
+ int mq_enabled;
+ int using_channels;
+ int do_enquiry;
+ int client_scsi_channels;
int aborting_passthru;
int events_to_log;
#define IBMVFC_AE_LINKUP 0x0001
--
2.27.0

2021-01-12 22:12:14

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 10/21] ibmvfc: define Sub-CRQ interrupt handler routine

Simple handler that calls Sub-CRQ drain routine directly.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f3cd092478ee..51bcafad9490 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3571,6 +3571,16 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_queue *scrq)
}
}

+static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance)
+{
+ struct ibmvfc_queue *scrq = (struct ibmvfc_queue *)scrq_instance;
+
+ ibmvfc_toggle_scrq_irq(scrq, 0);
+ ibmvfc_drain_sub_crq(scrq);
+
+ return IRQ_HANDLED;
+}
+
/**
* ibmvfc_init_tgt - Set the next init job step for the target
* @tgt: ibmvfc target struct
--
2.27.0

2021-01-12 22:14:09

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 03/21] ibmvfc: init/free event pool during queue allocation/free

The event pool and CRQ used to be separate entities of the adapter host
structure and as such were allocated and freed independently of each
other. Recent work as defined a generic queue structure with an event
pool specific to each queue. As such the event pool for each queue
shouldn't be allocated/freed independently, but instead performed as
part of the queue allocation/free routines.

Move the calls to ibmvfc_event_pool_{init|free} into
ibmvfc_{alloc|free}_queue respectively. The only functional change here
is that the CRQ cannot be released in ibmvfc_remove until after the
event pool has been successfully purged since releasing the queue will
also free the event pool.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index cd9273a5fadb..9330f5a65a7e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -807,6 +807,8 @@ static void ibmvfc_free_queue(struct ibmvfc_host *vhost,
dma_unmap_single(dev, queue->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
free_page((unsigned long)queue->msgs.handle);
queue->msgs.handle = NULL;
+
+ ibmvfc_free_event_pool(vhost, queue);
}

/**
@@ -5019,6 +5021,10 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
switch (fmt) {
case IBMVFC_CRQ_FMT:
fmt_size = sizeof(*queue->msgs.crq);
+ if (ibmvfc_init_event_pool(vhost, queue)) {
+ dev_err(dev, "Couldn't initialize event pool.\n");
+ return -ENOMEM;
+ }
break;
case IBMVFC_ASYNC_FMT:
fmt_size = sizeof(*queue->msgs.async);
@@ -5333,13 +5339,8 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
goto kill_kthread;
}

- if ((rc = ibmvfc_init_event_pool(vhost, &vhost->crq))) {
- dev_err(dev, "Couldn't initialize event pool. rc=%d\n", rc);
- goto release_crq;
- }
-
if ((rc = scsi_add_host(shost, dev)))
- goto release_event_pool;
+ goto release_crq;

fc_host_dev_loss_tmo(shost) = IBMVFC_DEV_LOSS_TMO;

@@ -5362,8 +5363,6 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)

remove_shost:
scsi_remove_host(shost);
-release_event_pool:
- ibmvfc_free_event_pool(vhost, &vhost->crq);
release_crq:
ibmvfc_release_crq_queue(vhost);
kill_kthread:
@@ -5398,7 +5397,6 @@ static int ibmvfc_remove(struct vio_dev *vdev)
spin_unlock_irqrestore(vhost->host->host_lock, flags);

ibmvfc_wait_while_resetting(vhost);
- ibmvfc_release_crq_queue(vhost);
kthread_stop(vhost->work_thread);
fc_remove_host(vhost->host);
scsi_remove_host(vhost->host);
@@ -5408,7 +5406,7 @@ static int ibmvfc_remove(struct vio_dev *vdev)
list_splice_init(&vhost->purge, &purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_complete_purge(&purge);
- ibmvfc_free_event_pool(vhost, &vhost->crq);
+ ibmvfc_release_crq_queue(vhost);

ibmvfc_free_mem(vhost);
spin_lock(&ibmvfc_driver_lock);
--
2.27.0

2021-01-12 22:16:20

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 13/21] ibmvfc: advertise client support for using hardware channels

Previous patches have plumbed the necessary Sub-CRQ interface and
channel negotiation MADs to fully channelize via hardware backed queues.

Advertise client support via NPIV Login capability
IBMVFC_CAN_USE_CHANNELS when the client bits have MQ enabled via
vhost->mq_enabled, or when channels were already in use during a
subsequent NPIV Login. The later is required because channel support is
only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins
require the client to continue to advertise the channel capability until
the CRQ pair between the client is broken.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index a00f38558613..0653d52d4ea0 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1410,6 +1410,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)

login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ);
login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN);
+
+ if (vhost->mq_enabled || vhost->using_channels)
+ login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS);
+
login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token);
login_info->async.len = cpu_to_be32(async_crq->size *
sizeof(*async_crq->msgs.async));
--
2.27.0

2021-01-12 22:18:15

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 19/21] ibmvfc: purge scsi channels after transport loss/reset

Grab the queue and list lock for each Sub-CRQ and add any uncompleted
events to the host purge list.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 24e1278acfeb..b413f5da71ce 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1056,7 +1056,13 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code)
{
struct ibmvfc_event *evt, *pos;
+ struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs;
unsigned long flags;
+ int hwqs = 0;
+ int i;
+
+ if (vhost->using_channels)
+ hwqs = vhost->scsi_scrqs.active_queues;

ibmvfc_dbg(vhost, "Purging all requests\n");
spin_lock_irqsave(&vhost->crq.l_lock, flags);
@@ -1064,6 +1070,16 @@ static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code)
ibmvfc_fail_request(evt, error_code);
list_splice_init(&vhost->crq.sent, &vhost->purge);
spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
+
+ for (i = 0; i < hwqs; i++) {
+ spin_lock_irqsave(queues[i].q_lock, flags);
+ spin_lock(&queues[i].l_lock);
+ list_for_each_entry_safe(evt, pos, &queues[i].sent, queue_list)
+ ibmvfc_fail_request(evt, error_code);
+ list_splice_init(&queues[i].sent, &vhost->purge);
+ spin_unlock(&queues[i].l_lock);
+ spin_unlock_irqrestore(queues[i].q_lock, flags);
+ }
}

/**
--
2.27.0

2021-01-12 22:18:20

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 08/21] ibmvfc: add Sub-CRQ IRQ enable/disable routine

Each Sub-CRQ has its own interrupt. A hypercall is required to toggle
the IRQ state. Provide the necessary mechanism via a helper function.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index a198e118887d..5d7ada0ed0d6 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3465,6 +3465,26 @@ static void ibmvfc_tasklet(void *data)
}
}

+static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable)
+{
+ struct device *dev = scrq->vhost->dev;
+ struct vio_dev *vdev = to_vio_dev(dev);
+ unsigned long rc;
+ int irq_action = H_ENABLE_VIO_INTERRUPT;
+
+ if (!enable)
+ irq_action = H_DISABLE_VIO_INTERRUPT;
+
+ rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action,
+ scrq->hw_irq, 0, 0);
+
+ if (rc)
+ dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n",
+ enable ? "enable" : "disable", scrq->hwq_id, rc);
+
+ return rc;
+}
+
/**
* ibmvfc_init_tgt - Set the next init job step for the target
* @tgt: ibmvfc target struct
--
2.27.0

2021-01-12 22:18:25

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 11/21] ibmvfc: map/request irq and register Sub-CRQ interrupt handler

Create an irq mapping for the hw_irq number provided from phyp firmware.
Request an irq assigned our Sub-CRQ interrupt handler. Unmap these irqs
at Sub-CRQ teardown.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 51bcafad9490..d3d7c6b53d4f 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5292,12 +5292,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
goto reg_failed;
}

+ scrq->irq = irq_create_mapping(NULL, scrq->hw_irq);
+
+ if (!scrq->irq) {
+ rc = -EINVAL;
+ dev_err(dev, "Error mapping sub-crq[%d] irq\n", index);
+ goto irq_failed;
+ }
+
+ snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d",
+ vdev->unit_address, index);
+ rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq);
+
+ if (rc) {
+ dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index);
+ irq_dispose_mapping(scrq->irq);
+ goto irq_failed;
+ }
+
scrq->hwq_id = index;
scrq->vhost = vhost;

LEAVE;
return 0;

+irq_failed:
+ do {
+ plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
+ } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
reg_failed:
ibmvfc_free_queue(vhost, scrq);
LEAVE;
@@ -5313,6 +5335,9 @@ static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)

ENTER;

+ free_irq(scrq->irq, scrq);
+ irq_dispose_mapping(scrq->irq);
+
do {
rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
scrq->cookie);
--
2.27.0

2021-01-12 22:18:30

by Tyrel Datwyler

[permalink] [raw]
Subject: [PATCH v4 07/21] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

Allocate a set of Sub-CRQs in advance. During channel setup the client
and VIOS negotiate the number of queues the VIOS supports and the number
that the client desires to request. Its possible that the final channel
resources allocated is less than requested, but the client is still
responsible for sending handles for every queue it is hoping for.

Also, provide deallocation cleanup routines.

Signed-off-by: Tyrel Datwyler <[email protected]>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 125 +++++++++++++++++++++++++++++++++
drivers/scsi/ibmvscsi/ibmvfc.h | 1 +
2 files changed, 126 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 612c7f3d7bd3..a198e118887d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -895,6 +895,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_queue *crq = &vhost->crq;
+ struct ibmvfc_queue *scrq;
+ int i;

/* Close the CRQ */
do {
@@ -912,6 +914,16 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs.crq, 0, PAGE_SIZE);
crq->cur = 0;

+ if (vhost->scsi_scrqs.scrqs) {
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+ scrq = &vhost->scsi_scrqs.scrqs[i];
+ spin_lock(scrq->q_lock);
+ memset(scrq->msgs.scrq, 0, PAGE_SIZE);
+ scrq->cur = 0;
+ spin_unlock(scrq->q_lock);
+ }
+ }
+
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -5045,6 +5057,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
case IBMVFC_ASYNC_FMT:
fmt_size = sizeof(*queue->msgs.async);
break;
+ case IBMVFC_SUB_CRQ_FMT:
+ fmt_size = sizeof(*queue->msgs.scrq);
+ /* We need one extra event for Cancel Commands */
+ pool_size = max_requests + 1;
+ break;
default:
dev_warn(dev, "Unknown command/response queue message format: %d\n", fmt);
return -EINVAL;
@@ -5136,6 +5153,107 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
return retrc;
}

+static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
+ int index)
+{
+ struct device *dev = vhost->dev;
+ struct vio_dev *vdev = to_vio_dev(dev);
+ struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+ int rc = -ENOMEM;
+
+ ENTER;
+
+ if (ibmvfc_alloc_queue(vhost, scrq, IBMVFC_SUB_CRQ_FMT))
+ return -ENOMEM;
+
+ rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
+ &scrq->cookie, &scrq->hw_irq);
+
+ if (rc) {
+ dev_warn(dev, "Error registering sub-crq: %d\n", rc);
+ if (rc == H_PARAMETER)
+ dev_warn_once(dev, "Firmware may not support MQ\n");
+ goto reg_failed;
+ }
+
+ scrq->hwq_id = index;
+ scrq->vhost = vhost;
+
+ LEAVE;
+ return 0;
+
+reg_failed:
+ ibmvfc_free_queue(vhost, scrq);
+ LEAVE;
+ return rc;
+}
+
+static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)
+{
+ struct device *dev = vhost->dev;
+ struct vio_dev *vdev = to_vio_dev(dev);
+ struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+ long rc;
+
+ ENTER;
+
+ do {
+ rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
+ scrq->cookie);
+ } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+
+ if (rc)
+ dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
+
+ ibmvfc_free_queue(vhost, scrq);
+ LEAVE;
+}
+
+static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+{
+ int i, j;
+
+ ENTER;
+
+ vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
+ sizeof(*vhost->scsi_scrqs.scrqs),
+ GFP_KERNEL);
+ if (!vhost->scsi_scrqs.scrqs)
+ return -1;
+
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+ if (ibmvfc_register_scsi_channel(vhost, i)) {
+ for (j = i; j > 0; j--)
+ ibmvfc_deregister_scsi_channel(vhost, j - 1);
+ kfree(vhost->scsi_scrqs.scrqs);
+ vhost->scsi_scrqs.scrqs = NULL;
+ vhost->scsi_scrqs.active_queues = 0;
+ LEAVE;
+ return -1;
+ }
+ }
+
+ LEAVE;
+ return 0;
+}
+
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
+{
+ int i;
+
+ ENTER;
+ if (!vhost->scsi_scrqs.scrqs)
+ return;
+
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
+ ibmvfc_deregister_scsi_channel(vhost, i);
+
+ kfree(vhost->scsi_scrqs.scrqs);
+ vhost->scsi_scrqs.scrqs = NULL;
+ vhost->scsi_scrqs.active_queues = 0;
+ LEAVE;
+}
+
/**
* ibmvfc_free_mem - Free memory for vhost
* @vhost: ibmvfc host struct
@@ -5371,6 +5489,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
goto remove_shost;
}

+ if (vhost->mq_enabled) {
+ rc = ibmvfc_init_sub_crqs(vhost);
+ if (rc)
+ dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc);
+ }
+
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
dev_set_drvdata(dev, vhost);
@@ -5427,6 +5551,7 @@ static int ibmvfc_remove(struct vio_dev *vdev)
list_splice_init(&vhost->purge, &purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_complete_purge(&purge);
+ ibmvfc_release_sub_crqs(vhost);
ibmvfc_release_crq_queue(vhost);

ibmvfc_free_mem(vhost);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index b9eed05c165f..bdafe9956649 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -850,6 +850,7 @@ struct ibmvfc_host {
mempool_t *tgt_pool;
struct ibmvfc_queue crq;
struct ibmvfc_queue async_crq;
+ struct ibmvfc_scsi_channels scsi_scrqs;
struct ibmvfc_npiv_login login_info;
union ibmvfc_npiv_login_data *login_buf;
dma_addr_t login_buf_dma;
--
2.27.0

2021-01-13 03:23:07

by Brian King

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
>
> Signed-off-by: Tyrel Datwyler <[email protected]>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ba95438a8912..9200fe49c57e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> .max_sectors = IBMVFC_MAX_SECTORS,
> .shost_attrs = ibmvfc_attrs,
> .track_queue_depth = 1,
> + .host_tagset = 1,

This doesn't seem right. You are setting host_tagset, which means you want a
shared, host wide, tag set for commands. It also means that the total
queue depth for the host is can_queue. However, it looks like you are allocating
max_requests events for each sub crq, which means you are over allocating memory.

Looking at this closer, we might have bigger problems. There is a host wide
max number of commands that the VFC host supports, which gets returned on
NPIV Login. This value can change across a live migration event.

The ibmvfc driver, which does the same thing the lpfc driver does, modifies
can_queue on the scsi_host *after* the tag set has been allocated. This looks
to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
we look at can_queue once the tag set is setup, and I'm not seeing a good way
to dynamically change the host queue depth once the tag set is setup.

Unless I'm missing something, our best options appear to either be to implement
our own host wide busy reference counting, which doesn't sound very good, or
we need to add some API to block / scsi that allows us to dynamically change
can_queue.

Thanks,

Brian


--
Brian King
Power Linux I/O
IBM Linux Technology Center

2021-01-13 03:34:12

by Tyrel Datwyler

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On 1/12/21 2:54 PM, Brian King wrote:
> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>> Introduce several new vhost fields for managing MQ state of the adapter
>> as well as initial defaults for MQ enablement.
>>
>> Signed-off-by: Tyrel Datwyler <[email protected]>
>> ---
>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>> 2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index ba95438a8912..9200fe49c57e 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>> .max_sectors = IBMVFC_MAX_SECTORS,
>> .shost_attrs = ibmvfc_attrs,
>> .track_queue_depth = 1,
>> + .host_tagset = 1,
>
> This doesn't seem right. You are setting host_tagset, which means you want a
> shared, host wide, tag set for commands. It also means that the total
> queue depth for the host is can_queue. However, it looks like you are allocating
> max_requests events for each sub crq, which means you are over allocating memory.

With the shared tagset yes the queue depth for the host is can_queue, but this
also implies that the max queue depth for each hw queue is also can_queue. So,
in the worst case that all commands are queued down the same hw queue we need an
event pool with can_queue commands.

>
> Looking at this closer, we might have bigger problems. There is a host wide
> max number of commands that the VFC host supports, which gets returned on
> NPIV Login. This value can change across a live migration event.

From what I understand the max commands can only become less.

>
> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> to dynamically change the host queue depth once the tag set is setup.
>
> Unless I'm missing something, our best options appear to either be to implement
> our own host wide busy reference counting, which doesn't sound very good, or
> we need to add some API to block / scsi that allows us to dynamically change
> can_queue.

Changing can_queue won't do use any good with the shared tagset becasue each
queue still needs to be able to queue can_queue number of commands in the worst
case.

The complaint before was not using shared tags increases the tag memory
allocation because they are statically allocated for each queue. The question is
what claims more memory our event pool allocation or the tagset per queue
allocation.

If we chose to not use the shared tagset then the queue depth for each hw queue
becomes (can_queue / nr_hw_queues).

-Tyrel

>
> Thanks,
>
> Brian
>
>

2021-01-13 17:15:40

by Brian King

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> On 1/12/21 2:54 PM, Brian King wrote:
>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler <[email protected]>
>>> ---
>>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>> 2 files changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index ba95438a8912..9200fe49c57e 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>> .max_sectors = IBMVFC_MAX_SECTORS,
>>> .shost_attrs = ibmvfc_attrs,
>>> .track_queue_depth = 1,
>>> + .host_tagset = 1,
>>
>> This doesn't seem right. You are setting host_tagset, which means you want a
>> shared, host wide, tag set for commands. It also means that the total
>> queue depth for the host is can_queue. However, it looks like you are allocating
>> max_requests events for each sub crq, which means you are over allocating memory.
>
> With the shared tagset yes the queue depth for the host is can_queue, but this
> also implies that the max queue depth for each hw queue is also can_queue. So,
> in the worst case that all commands are queued down the same hw queue we need an
> event pool with can_queue commands.
>
>>
>> Looking at this closer, we might have bigger problems. There is a host wide
>> max number of commands that the VFC host supports, which gets returned on
>> NPIV Login. This value can change across a live migration event.
>
> From what I understand the max commands can only become less.
>
>>
>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>> to dynamically change the host queue depth once the tag set is setup.
>>
>> Unless I'm missing something, our best options appear to either be to implement
>> our own host wide busy reference counting, which doesn't sound very good, or
>> we need to add some API to block / scsi that allows us to dynamically change
>> can_queue.
>
> Changing can_queue won't do use any good with the shared tagset becasue each
> queue still needs to be able to queue can_queue number of commands in the worst
> case.

The issue I'm trying to highlight here is the following scenario:

1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.

2. On our NPIV login response from the VIOS, we might get a lower value than we
initially set in shost->can_queue, so we update it, but nobody ever looks at it
again, and we don't have any protection against sending too many commands to the host.


Basically, we no longer have any code that ensures we don't send more
commands to the VIOS than we are told it supports. According to the architecture,
if we actually do this, the VIOS will do an h_free_crq, which would be a bit
of a bug on our part.

I don't think it was ever clearly defined in the API that a driver can
change shost->can_queue after calling scsi_add_host, but up until
commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
it doesn't.

I started looking through drivers that do this, and so far, it looks like the
following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...

We probably need an API that lets us change shost->can_queue dynamically.

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center

2021-01-13 19:00:47

by Brian King

[permalink] [raw]
Subject: Re: [PATCH v4 00/21] ibmvfc: initial MQ development

With the exception of the few comments I've shared, the rest of this looks
good to me and you can add my:

Reviewed-by: Brian King <[email protected]>

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center

2021-01-14 01:42:17

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> > On 1/12/21 2:54 PM, Brian King wrote:
> >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>> Introduce several new vhost fields for managing MQ state of the adapter
> >>> as well as initial defaults for MQ enablement.
> >>>
> >>> Signed-off-by: Tyrel Datwyler <[email protected]>
> >>> ---
> >>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
> >>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
> >>> 2 files changed, 17 insertions(+)
> >>>
> >>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> index ba95438a8912..9200fe49c57e 100644
> >>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> >>> .max_sectors = IBMVFC_MAX_SECTORS,
> >>> .shost_attrs = ibmvfc_attrs,
> >>> .track_queue_depth = 1,
> >>> + .host_tagset = 1,
> >>
> >> This doesn't seem right. You are setting host_tagset, which means you want a
> >> shared, host wide, tag set for commands. It also means that the total
> >> queue depth for the host is can_queue. However, it looks like you are allocating
> >> max_requests events for each sub crq, which means you are over allocating memory.
> >
> > With the shared tagset yes the queue depth for the host is can_queue, but this
> > also implies that the max queue depth for each hw queue is also can_queue. So,
> > in the worst case that all commands are queued down the same hw queue we need an
> > event pool with can_queue commands.
> >
> >>
> >> Looking at this closer, we might have bigger problems. There is a host wide
> >> max number of commands that the VFC host supports, which gets returned on
> >> NPIV Login. This value can change across a live migration event.
> >
> > From what I understand the max commands can only become less.
> >
> >>
> >> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> >> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> >> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> >> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> >> to dynamically change the host queue depth once the tag set is setup.
> >>
> >> Unless I'm missing something, our best options appear to either be to implement
> >> our own host wide busy reference counting, which doesn't sound very good, or
> >> we need to add some API to block / scsi that allows us to dynamically change
> >> can_queue.
> >
> > Changing can_queue won't do use any good with the shared tagset becasue each
> > queue still needs to be able to queue can_queue number of commands in the worst
> > case.
>
> The issue I'm trying to highlight here is the following scenario:
>
> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
>
> 2. On our NPIV login response from the VIOS, we might get a lower value than we
> initially set in shost->can_queue, so we update it, but nobody ever looks at it
> again, and we don't have any protection against sending too many commands to the host.
>
>
> Basically, we no longer have any code that ensures we don't send more
> commands to the VIOS than we are told it supports. According to the architecture,
> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
> of a bug on our part.
>
> I don't think it was ever clearly defined in the API that a driver can
> change shost->can_queue after calling scsi_add_host, but up until
> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> it doesn't.

Actually it isn't related with commit 6eb045e092ef, because blk_mq_alloc_tag_set()
uses .can_queue to create driver tag sbitmap and request pool.

So even thought without 6eb045e092ef, the updated .can_queue can't work
as expected because the max driver tag depth has been fixed by blk-mq already.

What 6eb045e092ef does is just to remove the double check on max
host-wide allowed commands because that has been respected by blk-mq
driver tag allocation already.

>
> I started looking through drivers that do this, and so far, it looks like the
> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...
>
> We probably need an API that lets us change shost->can_queue dynamically.

I'd suggest to confirm changing .can_queue is one real usecase.


Thanks,
Ming

2021-01-14 17:29:15

by Brian King

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On 1/13/21 7:27 PM, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
>> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
>>> On 1/12/21 2:54 PM, Brian King wrote:
>>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>> as well as initial defaults for MQ enablement.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler <[email protected]>
>>>>> ---
>>>>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>>>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>>>> 2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> index ba95438a8912..9200fe49c57e 100644
>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>>> .max_sectors = IBMVFC_MAX_SECTORS,
>>>>> .shost_attrs = ibmvfc_attrs,
>>>>> .track_queue_depth = 1,
>>>>> + .host_tagset = 1,
>>>>
>>>> This doesn't seem right. You are setting host_tagset, which means you want a
>>>> shared, host wide, tag set for commands. It also means that the total
>>>> queue depth for the host is can_queue. However, it looks like you are allocating
>>>> max_requests events for each sub crq, which means you are over allocating memory.
>>>
>>> With the shared tagset yes the queue depth for the host is can_queue, but this
>>> also implies that the max queue depth for each hw queue is also can_queue. So,
>>> in the worst case that all commands are queued down the same hw queue we need an
>>> event pool with can_queue commands.
>>>
>>>>
>>>> Looking at this closer, we might have bigger problems. There is a host wide
>>>> max number of commands that the VFC host supports, which gets returned on
>>>> NPIV Login. This value can change across a live migration event.
>>>
>>> From what I understand the max commands can only become less.
>>>
>>>>
>>>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>>>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>>>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>>>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>>>> to dynamically change the host queue depth once the tag set is setup.
>>>>
>>>> Unless I'm missing something, our best options appear to either be to implement
>>>> our own host wide busy reference counting, which doesn't sound very good, or
>>>> we need to add some API to block / scsi that allows us to dynamically change
>>>> can_queue.
>>>
>>> Changing can_queue won't do use any good with the shared tagset becasue each
>>> queue still needs to be able to queue can_queue number of commands in the worst
>>> case.
>>
>> The issue I'm trying to highlight here is the following scenario:
>>
>> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
>>
>> 2. On our NPIV login response from the VIOS, we might get a lower value than we
>> initially set in shost->can_queue, so we update it, but nobody ever looks at it
>> again, and we don't have any protection against sending too many commands to the host.
>>
>>
>> Basically, we no longer have any code that ensures we don't send more
>> commands to the VIOS than we are told it supports. According to the architecture,
>> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
>> of a bug on our part.
>>
>> I don't think it was ever clearly defined in the API that a driver can
>> change shost->can_queue after calling scsi_add_host, but up until
>> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
>> it doesn't.
>
> Actually it isn't related with commit 6eb045e092ef, because blk_mq_alloc_tag_set()
> uses .can_queue to create driver tag sbitmap and request pool.
>
> So even thought without 6eb045e092ef, the updated .can_queue can't work
> as expected because the max driver tag depth has been fixed by blk-mq already.

There are two scenarios here. In the scenario of someone increasing can_queue
after the tag set is allocated, I agree, blk-mq will never take advantage
of this. However, in the scenario of someone *decreasing* can_queue after the
tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
this protection.

>
> What 6eb045e092ef does is just to remove the double check on max
> host-wide allowed commands because that has been respected by blk-mq
> driver tag allocation already.
>
>>
>> I started looking through drivers that do this, and so far, it looks like the
>> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...
>>
>> We probably need an API that lets us change shost->can_queue dynamically.
>
> I'd suggest to confirm changing .can_queue is one real usecase.

For ibmvfc, the total number of commands that the scsi host supports is very
much a dynamic value. It can increase and it can decrease. Live migrating
a logical partition from one system to another is the usual cause of
such a capability change. For ibmvfc, at least, this only ever happens
when we've self blocked the host and have sent back all outstanding I/O.

However, looking at other drivers that modify can_queue dynamically, this
doesn't always hold true. Looking at libfc, it looks to dynamically ramp
up and ramp down can_queue based on its ability to handle requests.

There are certainly a number of other drivers that change can_queue
after the tag set has been allocated. Some of these drivers could
likely be changed to avoid doing this, but changing them all will likely
be difficult.

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center

2021-01-15 07:04:07

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> >>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>>>> Introduce several new vhost fields for managing MQ state of the adapter
> >>>>> as well as initial defaults for MQ enablement.
> >>>>>
> >>>>> Signed-off-by: Tyrel Datwyler <[email protected]>
> >>>>> ---
> >>>>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
> >>>>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
> >>>>> 2 files changed, 17 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> index ba95438a8912..9200fe49c57e 100644
> >>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> >>>>> .max_sectors = IBMVFC_MAX_SECTORS,
> >>>>> .shost_attrs = ibmvfc_attrs,
> >>>>> .track_queue_depth = 1,
> >>>>> + .host_tagset = 1,
> >>>>
> >>>> This doesn't seem right. You are setting host_tagset, which means you want a
> >>>> shared, host wide, tag set for commands. It also means that the total
> >>>> queue depth for the host is can_queue. However, it looks like you are allocating
> >>>> max_requests events for each sub crq, which means you are over allocating memory.
> >>>
> >>> With the shared tagset yes the queue depth for the host is can_queue, but this
> >>> also implies that the max queue depth for each hw queue is also can_queue. So,
> >>> in the worst case that all commands are queued down the same hw queue we need an
> >>> event pool with can_queue commands.
> >>>
> >>>>
> >>>> Looking at this closer, we might have bigger problems. There is a host wide
> >>>> max number of commands that the VFC host supports, which gets returned on
> >>>> NPIV Login. This value can change across a live migration event.
> >>>
> >>> From what I understand the max commands can only become less.
> >>>
> >>>>
> >>>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> >>>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> >>>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> >>>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> >>>> to dynamically change the host queue depth once the tag set is setup.
> >>>>
> >>>> Unless I'm missing something, our best options appear to either be to implement
> >>>> our own host wide busy reference counting, which doesn't sound very good, or
> >>>> we need to add some API to block / scsi that allows us to dynamically change
> >>>> can_queue.
> >>>
> >>> Changing can_queue won't do use any good with the shared tagset becasue each
> >>> queue still needs to be able to queue can_queue number of commands in the worst
> >>> case.
> >>
> >> The issue I'm trying to highlight here is the following scenario:
> >>
> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
> >>
> >> 2. On our NPIV login response from the VIOS, we might get a lower value than we
> >> initially set in shost->can_queue, so we update it, but nobody ever looks at it
> >> again, and we don't have any protection against sending too many commands to the host.
> >>
> >>
> >> Basically, we no longer have any code that ensures we don't send more
> >> commands to the VIOS than we are told it supports. According to the architecture,
> >> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
> >> of a bug on our part.
> >>
> >> I don't think it was ever clearly defined in the API that a driver can
> >> change shost->can_queue after calling scsi_add_host, but up until
> >> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> >> it doesn't.
> >
> > Actually it isn't related with commit 6eb045e092ef, because blk_mq_alloc_tag_set()
> > uses .can_queue to create driver tag sbitmap and request pool.
> >
> > So even thought without 6eb045e092ef, the updated .can_queue can't work
> > as expected because the max driver tag depth has been fixed by blk-mq already.
>
> There are two scenarios here. In the scenario of someone increasing can_queue
> after the tag set is allocated, I agree, blk-mq will never take advantage
> of this. However, in the scenario of someone *decreasing* can_queue after the
> tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
> this protection.

When .can_queue is decreased, blk-mq still may allocate driver tag which is >
.can_queue, this way might break driver/device too, but it depends on how driver
uses req->tag.

>
> >
> > What 6eb045e092ef does is just to remove the double check on max
> > host-wide allowed commands because that has been respected by blk-mq
> > driver tag allocation already.
> >
> >>
> >> I started looking through drivers that do this, and so far, it looks like the
> >> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...
> >>
> >> We probably need an API that lets us change shost->can_queue dynamically.
> >
> > I'd suggest to confirm changing .can_queue is one real usecase.
>
> For ibmvfc, the total number of commands that the scsi host supports is very
> much a dynamic value. It can increase and it can decrease. Live migrating
> a logical partition from one system to another is the usual cause of
> such a capability change. For ibmvfc, at least, this only ever happens
> when we've self blocked the host and have sent back all outstanding I/O.

This one looks a good use case, and the new API may have to freeze request
queues of all LUNs, and the operation is very expensive and slow.

>
> However, looking at other drivers that modify can_queue dynamically, this
> doesn't always hold true. Looking at libfc, it looks to dynamically ramp
> up and ramp down can_queue based on its ability to handle requests.

This one looks hard to use the new API which isn't supposed to be called
in fast path. And changing host wide resource is really not good in fast
path, IMO.

>
> There are certainly a number of other drivers that change can_queue
> after the tag set has been allocated. Some of these drivers could
> likely be changed to avoid doing this, but changing them all will likely
> be difficult.

It is still better to understand why these drivers have to update
.can_queue dynamically.


Thanks,
Ming