2019-02-06 10:53:07

by John Garry

[permalink] [raw]
Subject: [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches

This patchset introduces the following:
- DIX support for v3 hw

- Optimisation based on mapping CPUs to queues, much the same as other
SCSI drivers do today, without exposing > 1 queue to upper layer.

It is disabled by default and marked as experimental, due to ongoing
discussion here:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

In addition, hopefully we can eventually expose > 1 queue to upper
layer and remove the reply_map, if performance issue discussed here has
a positive outcome:
https://marc.info/?l=linux-scsi&m=154874776701343&w=2

- And some other minor tidying and a driver debugfs feature

This series is based on Martin's 5.1 queue +
"scsi: hisi_sas: Set protection parameters prior to adding SCSI host"

John Garry (2):
scsi: hisi_sas: Issue internal abort on all relevant queues
scsi: hisi_sas: Do some more tidy-up

Luo Jiaxing (1):
scsi: hisi_sas: Add manual trigger for debugfs dump

Xiang Chen (3):
scsi: hisi_sas: Add support for DIX feature for v3 hw
scsi: hisi_sas: change queue depth from 512 to 4096
scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental

drivers/scsi/hisi_sas/hisi_sas.h | 48 +++--
drivers/scsi/hisi_sas/hisi_sas_main.c | 237 +++++++++++++++++++++----
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 3 +
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 157 +++++++++++++---
5 files changed, 377 insertions(+), 70 deletions(-)

--
2.17.1



2019-02-06 10:53:14

by John Garry

[permalink] [raw]
Subject: [PATCH 6/6] scsi: hisi_sas: Do some more tidy-up

Do some very minor tidy-up, for things like needlessly initing
variable and not leaving whitespace before quote endings.

Originally-from: Xiang Chen <[email protected]>
Originally-from: Luo Jiaxing <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 10 +++++-----
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index eff31472b96e..923296653ed7 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -144,7 +144,7 @@ EXPORT_SYMBOL_GPL(hisi_sas_get_ncq_tag);
*/
u8 hisi_sas_get_prog_phy_linkrate_mask(enum sas_linkrate max)
{
- u16 rate = 0;
+ u8 rate = 0;
int i;

max -= SAS_LINK_RATE_1_5_GBPS;
@@ -1180,7 +1180,7 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
task->task_done = hisi_sas_task_done;

task->slow_task->timer.function = hisi_sas_tmf_timedout;
- task->slow_task->timer.expires = jiffies + TASK_TIMEOUT*HZ;
+ task->slow_task->timer.expires = jiffies + TASK_TIMEOUT * HZ;
add_timer(&task->slow_task->timer);

res = hisi_sas_task_exec(task, GFP_KERNEL, 1, tmf);
@@ -1701,8 +1701,8 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)

static int hisi_sas_clear_aca(struct domain_device *device, u8 *lun)
{
- int rc = TMF_RESP_FUNC_FAILED;
struct hisi_sas_tmf_task tmf_task;
+ int rc;

tmf_task.tmf = TMF_CLEAR_ACA;
rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
@@ -1752,7 +1752,7 @@ static int hisi_sas_I_T_nexus_reset(struct domain_device *device)
{
struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
struct device *dev = hisi_hba->dev;
- int rc = TMF_RESP_FUNC_FAILED;
+ int rc;

rc = hisi_sas_internal_task_abort(hisi_hba, device,
HISI_SAS_INT_ABT_DEV, 0);
@@ -2005,7 +2005,7 @@ _hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
task->task_proto = device->tproto;
task->task_done = hisi_sas_task_done;
task->slow_task->timer.function = hisi_sas_tmf_timedout;
- task->slow_task->timer.expires = jiffies + INTERNAL_ABORT_TIMEOUT*HZ;
+ task->slow_task->timer.expires = jiffies + INTERNAL_ABORT_TIMEOUT * HZ;
add_timer(&task->slow_task->timer);

res = hisi_sas_internal_abort_task_exec(hisi_hba, sas_dev->device_id,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index d734aef263d1..0f05fd409f6b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -11,7 +11,7 @@
#include "hisi_sas.h"
#define DRV_NAME "hisi_sas_v3_hw"

-/* global registers need init*/
+/* global registers need init */
#define DLVRY_QUEUE_ENABLE 0x0
#define IOST_BASE_ADDR_LO 0x8
#define IOST_BASE_ADDR_HI 0xc
@@ -728,7 +728,7 @@ static void clear_itct_v3_hw(struct hisi_hba *hisi_hba,
hisi_sas_write32(hisi_hba, ENT_INT_SRC3,
ENT_INT_SRC3_ITC_INT_MSK);

- /* clear the itct table*/
+ /* clear the itct table */
reg_val = ITCT_CLR_EN_MSK | (dev_id & ITCT_DEV_MSK);
hisi_sas_write32(hisi_hba, ITCT_CLR, reg_val);

--
2.17.1


2019-02-06 10:53:22

by John Garry

[permalink] [raw]
Subject: [PATCH 4/6] scsi: hisi_sas: Issue internal abort on all relevant queues

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort
is processed for a single IO or a device after all the relevant
command(s) which it is attempting to abort have been processed by
the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue
will be processed in order, and we will not have a scenario where the
internal abort is racing against a command(s) which it is trying to abort.

To enqueue commands on queue mapped to a CPU, choosing a queue for an
command is based on the associated queue for the current CPU, so
this is not safe for internal abort since it would definitely not be
guaranteed that commands for the command devices are issued on the
same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have not side affect.

Signed-off-by: John Garry <[email protected]>
Signed-off-by: Xiang Chen <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas.h | 2 +
drivers/scsi/hisi_sas/hisi_sas_main.c | 64 ++++++++++++++++++++------
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 2 +
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 8 +++-
5 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index ff844d4ba0d3..5accad05a86f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -365,6 +365,8 @@ struct hisi_hba {
u32 intr_coal_ticks; /* Time of interrupt coalesce in us */
u32 intr_coal_count; /* Interrupt count to coalesce */

+ int cq_nvecs;
+
/* debugfs memories */
u32 *debugfs_global_reg;
u32 *debugfs_port_reg[HISI_SAS_MAX_PHYS];
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 35619bdad1d5..40a402f09afb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1023,7 +1023,7 @@ static void hisi_sas_dev_gone(struct domain_device *device)

if (!test_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) {
hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_DEV, 0);
+ HISI_SAS_INT_ABT_DEV, 0);

hisi_sas_dereg_device(hisi_hba, device);

@@ -1630,7 +1630,8 @@ static int hisi_sas_abort_task(struct sas_task *task)
task->task_proto & SAS_PROTOCOL_STP) {
if (task->dev->dev_type == SAS_SATA_DEV) {
rc = hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_DEV, 0);
+ HISI_SAS_INT_ABT_DEV,
+ 0);
if (rc < 0) {
dev_err(dev, "abort task: internal abort failed\n");
goto out;
@@ -1645,7 +1646,7 @@ static int hisi_sas_abort_task(struct sas_task *task)
struct hisi_sas_cq *cq = &hisi_hba->cq[slot->dlvry_queue];

rc = hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_CMD, tag);
+ HISI_SAS_INT_ABT_CMD, tag);
if (((rc < 0) || (rc == TMF_RESP_FUNC_FAILED)) &&
task->lldd_task) {
/*
@@ -1671,7 +1672,7 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
int rc = TMF_RESP_FUNC_FAILED;

rc = hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_DEV, 0);
+ HISI_SAS_INT_ABT_DEV, 0);
if (rc < 0) {
dev_err(dev, "abort task set: internal abort rc=%d\n", rc);
return TMF_RESP_FUNC_FAILED;
@@ -1743,7 +1744,7 @@ static int hisi_sas_I_T_nexus_reset(struct domain_device *device)
int rc = TMF_RESP_FUNC_FAILED;

rc = hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_DEV, 0);
+ HISI_SAS_INT_ABT_DEV, 0);
if (rc < 0) {
dev_err(dev, "I_T nexus reset: internal abort (%d)\n", rc);
return TMF_RESP_FUNC_FAILED;
@@ -1788,7 +1789,7 @@ static int hisi_sas_lu_reset(struct domain_device *device, u8 *lun)
struct hisi_sas_tmf_task tmf_task = { .tmf = TMF_LU_RESET };

rc = hisi_sas_internal_task_abort(hisi_hba, device,
- HISI_SAS_INT_ABT_DEV, 0);
+ HISI_SAS_INT_ABT_DEV, 0);
if (rc < 0) {
dev_err(dev, "lu_reset: internal abort failed\n");
goto out;
@@ -1874,7 +1875,7 @@ static int hisi_sas_query_task(struct sas_task *task)
static int
hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
struct sas_task *task, int abort_flag,
- int task_tag)
+ int task_tag, struct hisi_sas_dq *dq)
{
struct domain_device *device = task->dev;
struct hisi_sas_device *sas_dev = device->lldd_dev;
@@ -1883,7 +1884,6 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
struct hisi_sas_slot *slot;
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_cmd_hdr *cmd_hdr_base;
- struct hisi_sas_dq *dq = sas_dev->dq;
int dlvry_queue_slot, dlvry_queue, n_elem = 0, rc, slot_idx;
unsigned long flags, flags_dq = 0;
int wr_q_index;
@@ -1955,18 +1955,19 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
}

/**
- * hisi_sas_internal_task_abort -- execute an internal
+ * _hisi_sas_internal_task_abort -- execute an internal
* abort command for single IO command or a device
* @hisi_hba: host controller struct
* @device: domain device
* @abort_flag: mode of operation, device or single IO
* @tag: tag of IO to be aborted (only relevant to single
* IO mode)
+ * @dq: delivery queue for this internal abort command
*/
static int
-hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
- struct domain_device *device,
- int abort_flag, int tag)
+_hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
+ struct domain_device *device, int abort_flag,
+ int tag, struct hisi_sas_dq *dq)
{
struct sas_task *task;
struct hisi_sas_device *sas_dev = device->lldd_dev;
@@ -1994,7 +1995,7 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
add_timer(&task->slow_task->timer);

res = hisi_sas_internal_abort_task_exec(hisi_hba, sas_dev->device_id,
- task, abort_flag, tag);
+ task, abort_flag, tag, dq);
if (res) {
del_timer(&task->slow_task->timer);
dev_err(dev, "internal task abort: executing internal task failed: %d\n",
@@ -2051,6 +2052,41 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
return res;
}

+static int
+hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
+ struct domain_device *device,
+ int abort_flag, int tag)
+{
+ struct hisi_sas_slot *slot;
+ struct device *dev = hisi_hba->dev;
+ struct hisi_sas_dq *dq;
+ int i, rc;
+
+ switch (abort_flag) {
+ case HISI_SAS_INT_ABT_CMD:
+ slot = &hisi_hba->slot_info[tag];
+ dq = &hisi_hba->dq[slot->dlvry_queue];
+ return _hisi_sas_internal_task_abort(hisi_hba, device,
+ abort_flag, tag, dq);
+ case HISI_SAS_INT_ABT_DEV:
+ for (i = 0; i < hisi_hba->cq_nvecs; i++) {
+ dq = &hisi_hba->dq[i];
+ rc = _hisi_sas_internal_task_abort(hisi_hba, device,
+ abort_flag, tag,
+ dq);
+ if (rc)
+ return rc;
+ }
+ break;
+ default:
+ dev_err(dev, "Unrecognised internal abort flag (%d)\n",
+ abort_flag);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static void hisi_sas_port_formed(struct asd_sas_phy *sas_phy)
{
hisi_sas_port_notify_formed(sas_phy);
@@ -2117,7 +2153,7 @@ void hisi_sas_kill_tasklets(struct hisi_hba *hisi_hba)
{
int i;

- for (i = 0; i < hisi_hba->queue_count; i++) {
+ for (i = 0; i < hisi_hba->cq_nvecs; i++) {
struct hisi_sas_cq *cq = &hisi_hba->cq[i];

tasklet_kill(&cq->tasklet);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index c8f808faed95..293807443480 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1749,6 +1749,8 @@ static int interrupt_init_v1_hw(struct hisi_hba *hisi_hba)
}
}

+ hisi_hba->cq_nvecs = hisi_hba->queue_count;
+
return 0;
}

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index aa90f17a1cac..39233a9cd1f5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3400,6 +3400,8 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
tasklet_init(t, cq_tasklet_v2_hw, (unsigned long)cq);
}

+ hisi_hba->cq_nvecs = hisi_hba->queue_count;
+
return 0;

free_cq_int_irqs:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 0455fba93daf..20cec1d2439c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -402,6 +402,8 @@ struct hisi_sas_err_record_v3 {
#define T10_CHK_REF_TAG_MSK (0xf0 << T10_CHK_MSK_OFF)
#define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)

+#define BASE_VECTORS_V3_HW 16
+
static bool hisi_sas_intr_conv;
MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");

@@ -2050,6 +2052,8 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
return -ENOENT;
}

+ hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+
rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
int_phy_up_down_bcast_v3_hw, 0,
DRV_NAME " phy", hisi_hba);
@@ -2078,7 +2082,7 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
}

/* Init tasklets for cq only */
- for (i = 0; i < hisi_hba->queue_count; i++) {
+ for (i = 0; i < hisi_hba->cq_nvecs; i++) {
struct hisi_sas_cq *cq = &hisi_hba->cq[i];
struct tasklet_struct *t = &cq->tasklet;
int nr = hisi_sas_intr_conv ? 16 : 16 + i;
@@ -2795,7 +2799,7 @@ hisi_sas_v3_destroy_irqs(struct pci_dev *pdev, struct hisi_hba *hisi_hba)
free_irq(pci_irq_vector(pdev, 1), hisi_hba);
free_irq(pci_irq_vector(pdev, 2), hisi_hba);
free_irq(pci_irq_vector(pdev, 11), hisi_hba);
- for (i = 0; i < hisi_hba->queue_count; i++) {
+ for (i = 0; i < hisi_hba->cq_nvecs; i++) {
struct hisi_sas_cq *cq = &hisi_hba->cq[i];
int nr = hisi_sas_intr_conv ? 16 : 16 + i;

--
2.17.1


2019-02-06 10:53:29

by John Garry

[permalink] [raw]
Subject: [PATCH 3/6] scsi: hisi_sas: change queue depth from 512 to 4096

From: Xiang Chen <[email protected]>

If sending IOs to many disks from single queue, it is possible that the
queue may be full. To avoid the situation, change queue depth from 512
to 4096 which is the max number of IOs for v3 hw.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index df2a92a0e7a1..ff844d4ba0d3 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -30,7 +30,7 @@

#define HISI_SAS_MAX_PHYS 9
#define HISI_SAS_MAX_QUEUES 32
-#define HISI_SAS_QUEUE_SLOTS 512
+#define HISI_SAS_QUEUE_SLOTS 4096
#define HISI_SAS_MAX_ITCT_ENTRIES 1024
#define HISI_SAS_MAX_DEVICES HISI_SAS_MAX_ITCT_ENTRIES
#define HISI_SAS_RESET_BIT 0
--
2.17.1


2019-02-06 10:54:56

by John Garry

[permalink] [raw]
Subject: [PATCH 5/6] scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental

From: Xiang Chen <[email protected]>

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

Then it decreases the performance regression that fio and CQ interrupts
are processed on different node.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to
the upper layer for SCSI MQ, but we need to sort out the per-HBA
tags performance issue.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas.h | 4 ++
drivers/scsi/hisi_sas/hisi_sas_main.c | 33 ++++++++++---
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 +
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 65 +++++++++++++++++++++++---
4 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 5accad05a86f..6c87bd34509a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -173,6 +173,7 @@ struct hisi_sas_port {

struct hisi_sas_cq {
struct hisi_hba *hisi_hba;
+ const struct cpumask *pci_irq_mask;
struct tasklet_struct tasklet;
int rd_point;
int id;
@@ -195,6 +196,7 @@ struct hisi_sas_device {
enum sas_device_type dev_type;
int device_id;
int sata_idx;
+ spinlock_t lock; /* For protecting slots */
};

struct hisi_sas_tmf_task {
@@ -217,6 +219,7 @@ struct hisi_sas_slot {
int cmplt_queue_slot;
int abort;
int ready;
+ int device_id;
void *cmd_hdr;
dma_addr_t cmd_hdr_dma;
struct timer_list internal_abort_timer;
@@ -366,6 +369,7 @@ struct hisi_hba {
u32 intr_coal_count; /* Interrupt count to coalesce */

int cq_nvecs;
+ unsigned int *reply_map;

/* debugfs memories */
u32 *debugfs_global_reg;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 40a402f09afb..eff31472b96e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -241,8 +241,9 @@ static void hisi_sas_slot_index_init(struct hisi_hba *hisi_hba)
void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
struct hisi_sas_slot *slot)
{
- struct hisi_sas_dq *dq = &hisi_hba->dq[slot->dlvry_queue];
unsigned long flags;
+ int device_id = slot->device_id;
+ struct hisi_sas_device *sas_dev = &hisi_hba->devices[device_id];

if (task) {
struct device *dev = hisi_hba->dev;
@@ -267,10 +268,9 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
}
}

-
- spin_lock_irqsave(&dq->lock, flags);
+ spin_lock_irqsave(&sas_dev->lock, flags);
list_del_init(&slot->entry);
- spin_unlock_irqrestore(&dq->lock, flags);
+ spin_unlock_irqrestore(&sas_dev->lock, flags);

memset(slot, 0, offsetof(struct hisi_sas_slot, buf));

@@ -471,7 +471,14 @@ static int hisi_sas_task_prep(struct sas_task *task,
return -ECOMM;
}

- *dq_pointer = dq = sas_dev->dq;
+ if (hisi_hba->reply_map) {
+ int cpu = raw_smp_processor_id();
+ unsigned int dq_index = hisi_hba->reply_map[cpu];
+
+ *dq_pointer = dq = &hisi_hba->dq[dq_index];
+ } else {
+ *dq_pointer = dq = sas_dev->dq;
+ }

port = to_hisi_sas_port(sas_port);
if (port && !port->port_attached) {
@@ -526,12 +533,15 @@ static int hisi_sas_task_prep(struct sas_task *task,
}

list_add_tail(&slot->delivery, &dq->list);
- list_add_tail(&slot->entry, &sas_dev->list);
spin_unlock_irqrestore(&dq->lock, flags);
+ spin_lock_irqsave(&sas_dev->lock, flags);
+ list_add_tail(&slot->entry, &sas_dev->list);
+ spin_unlock_irqrestore(&sas_dev->lock, flags);

dlvry_queue = dq->id;
dlvry_queue_slot = wr_q_index;

+ slot->device_id = sas_dev->device_id;
slot->n_elem = n_elem;
slot->n_elem_dif = n_elem_dif;
slot->dlvry_queue = dlvry_queue;
@@ -701,6 +711,7 @@ static struct hisi_sas_device *hisi_sas_alloc_dev(struct domain_device *device)
sas_dev->hisi_hba = hisi_hba;
sas_dev->sas_device = device;
sas_dev->dq = dq;
+ spin_lock_init(&sas_dev->lock);
INIT_LIST_HEAD(&hisi_hba->devices[i].list);
break;
}
@@ -1913,10 +1924,14 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
}
list_add_tail(&slot->delivery, &dq->list);
spin_unlock_irqrestore(&dq->lock, flags_dq);
+ spin_lock_irqsave(&sas_dev->lock, flags);
+ list_add_tail(&slot->entry, &sas_dev->list);
+ spin_unlock_irqrestore(&sas_dev->lock, flags);

dlvry_queue = dq->id;
dlvry_queue_slot = wr_q_index;

+ slot->device_id = sas_dev->device_id;
slot->n_elem = n_elem;
slot->dlvry_queue = dlvry_queue;
slot->dlvry_queue_slot = dlvry_queue_slot;
@@ -1940,7 +1955,6 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
WRITE_ONCE(slot->ready, 1);
/* send abort command to the chip */
spin_lock_irqsave(&dq->lock, flags);
- list_add_tail(&slot->entry, &sas_dev->list);
hisi_hba->hw->start_delivery(dq);
spin_unlock_irqrestore(&dq->lock, flags);

@@ -2070,6 +2084,11 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
abort_flag, tag, dq);
case HISI_SAS_INT_ABT_DEV:
for (i = 0; i < hisi_hba->cq_nvecs; i++) {
+ struct hisi_sas_cq *cq = &hisi_hba->cq[i];
+ const struct cpumask *mask = cq->pci_irq_mask;
+
+ if (mask && !cpumask_intersects(cpu_online_mask, mask))
+ continue;
dq = &hisi_hba->dq[i];
rc = _hisi_sas_internal_task_abort(hisi_hba, device,
abort_flag, tag,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 39233a9cd1f5..e40cc6b3b67b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -873,6 +873,7 @@ hisi_sas_device *alloc_dev_quirk_v2_hw(struct domain_device *device)
sas_dev->sas_device = device;
sas_dev->sata_idx = sata_idx;
sas_dev->dq = dq;
+ spin_lock_init(&sas_dev->lock);
INIT_LIST_HEAD(&hisi_hba->devices[i].list);
break;
}
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 20cec1d2439c..d734aef263d1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -403,6 +403,7 @@ struct hisi_sas_err_record_v3 {
#define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)

#define BASE_VECTORS_V3_HW 16
+#define MIN_AFFINE_VECTORS_V3_HW (BASE_VECTORS_V3_HW + 1)

static bool hisi_sas_intr_conv;
MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");
@@ -412,6 +413,11 @@ static int prot_mask;
module_param(prot_mask, int, 0);
MODULE_PARM_DESC(prot_mask, " host protection capabilities mask, def=0x0 ");

+static bool auto_affine_msi_experimental;
+module_param(auto_affine_msi_experimental, bool, 0444);
+MODULE_PARM_DESC(auto_affine_msi_experimental, "Enable auto-affinity of MSI IRQs as experimental:\n"
+ "default is off");
+
static u32 hisi_sas_read32(struct hisi_hba *hisi_hba, u32 off)
{
void __iomem *regs = hisi_hba->regs + off;
@@ -2037,19 +2043,64 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
return IRQ_HANDLED;
}

+static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+{
+ const struct cpumask *mask;
+ int queue, cpu;
+
+ for (queue = 0; queue < nvecs; queue++) {
+ struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+
+ mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
+ BASE_VECTORS_V3_HW);
+ if (!mask)
+ goto fallback;
+ cq->pci_irq_mask = mask;
+ for_each_cpu(cpu, mask)
+ hisi_hba->reply_map[cpu] = queue;
+ }
+ return;
+
+fallback:
+ for_each_possible_cpu(cpu)
+ hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
+ /* Don't clean all CQ masks */
+}
+
static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
{
struct device *dev = hisi_hba->dev;
struct pci_dev *pdev = hisi_hba->pci_dev;
int vectors, rc;
int i, k;
- int max_msi = HISI_SAS_MSI_COUNT_V3_HW;
-
- vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, 1,
- max_msi, PCI_IRQ_MSI);
- if (vectors < max_msi) {
- dev_err(dev, "could not allocate all msi (%d)\n", vectors);
- return -ENOENT;
+ int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+
+ if (auto_affine_msi_experimental) {
+ struct irq_affinity desc = {
+ .pre_vectors = BASE_VECTORS_V3_HW,
+ };
+
+ min_msi = MIN_AFFINE_VECTORS_V3_HW;
+
+ hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
+ sizeof(unsigned int),
+ GFP_KERNEL);
+ if (!hisi_hba->reply_map)
+ return -ENOMEM;
+ vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+ min_msi, max_msi,
+ PCI_IRQ_MSI |
+ PCI_IRQ_AFFINITY,
+ &desc);
+ if (vectors < 0)
+ return -ENOENT;
+ setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
+ } else {
+ min_msi = max_msi;
+ vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
+ max_msi, PCI_IRQ_MSI);
+ if (vectors < 0)
+ return vectors;
}

hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
--
2.17.1


2019-02-06 10:55:36

by John Garry

[permalink] [raw]
Subject: [PATCH 2/6] scsi: hisi_sas: Add manual trigger for debugfs dump

From: Luo Jiaxing <[email protected]>

Add an interface to manually trigger a debugfs dump.

Signed-off-by: Luo Jiaxing <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 34 +++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 1a3d31d4258c..35619bdad1d5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -2977,6 +2977,36 @@ static void hisi_sas_debugfs_snapshot_regs(struct hisi_hba *hisi_hba)
hisi_hba->hw->snapshot_restore(hisi_hba);
}

+static ssize_t hisi_sas_debugfs_trigger_dump_write(struct file *file,
+ const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct hisi_hba *hisi_hba = file->f_inode->i_private;
+ char buf[8];
+
+ /* A bit racy, but don't care too much since it's only debugfs */
+ if (hisi_hba->debugfs_snapshot)
+ return -EFAULT;
+
+ if (count > 8)
+ return -EFAULT;
+
+ if (copy_from_user(buf, user_buf, count))
+ return -EFAULT;
+
+ if (buf[0] != '1')
+ return -EFAULT;
+
+ queue_work(hisi_hba->wq, &hisi_hba->debugfs_work);
+
+ return count;
+}
+
+static const struct file_operations hisi_sas_debugfs_trigger_dump_fops = {
+ .write = &hisi_sas_debugfs_trigger_dump_write,
+ .owner = THIS_MODULE,
+};
+
void hisi_sas_debugfs_work_handler(struct work_struct *work)
{
struct hisi_hba *hisi_hba =
@@ -2999,6 +3029,10 @@ void hisi_sas_debugfs_init(struct hisi_hba *hisi_hba)

hisi_hba->debugfs_dir = debugfs_create_dir(dev_name(dev),
hisi_sas_debugfs_dir);
+ debugfs_create_file("trigger_dump", 0600,
+ hisi_hba->debugfs_dir,
+ hisi_hba,
+ &hisi_sas_debugfs_trigger_dump_fops);

/* Alloc buffer for global */
sz = hisi_hba->hw->debugfs_reg_global->count * 4;
--
2.17.1


2019-02-06 10:55:41

by John Garry

[permalink] [raw]
Subject: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw

From: Xiang Chen <[email protected]>

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly
is adding new DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They
were detected by checkpatch --strict.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas.h | 40 ++++++++---
drivers/scsi/hisi_sas/hisi_sas_main.c | 96 +++++++++++++++++++++++---
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 80 +++++++++++++++++----
3 files changed, 184 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 9b35f84b5029..df2a92a0e7a1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -41,20 +41,25 @@
#define HISI_SAS_COMMAND_TABLE_SZ (sizeof(union hisi_sas_command_table))

#define hisi_sas_status_buf_addr(buf) \
- (buf + offsetof(struct hisi_sas_slot_buf_table, status_buffer))
-#define hisi_sas_status_buf_addr_mem(slot) hisi_sas_status_buf_addr(slot->buf)
+ ((buf) + offsetof(struct hisi_sas_slot_buf_table, status_buffer))
+#define hisi_sas_status_buf_addr_mem(slot) hisi_sas_status_buf_addr((slot)->buf)
#define hisi_sas_status_buf_addr_dma(slot) \
- hisi_sas_status_buf_addr(slot->buf_dma)
+ hisi_sas_status_buf_addr((slot)->buf_dma)

#define hisi_sas_cmd_hdr_addr(buf) \
- (buf + offsetof(struct hisi_sas_slot_buf_table, command_header))
-#define hisi_sas_cmd_hdr_addr_mem(slot) hisi_sas_cmd_hdr_addr(slot->buf)
-#define hisi_sas_cmd_hdr_addr_dma(slot) hisi_sas_cmd_hdr_addr(slot->buf_dma)
+ ((buf) + offsetof(struct hisi_sas_slot_buf_table, command_header))
+#define hisi_sas_cmd_hdr_addr_mem(slot) hisi_sas_cmd_hdr_addr((slot)->buf)
+#define hisi_sas_cmd_hdr_addr_dma(slot) hisi_sas_cmd_hdr_addr((slot)->buf_dma)

#define hisi_sas_sge_addr(buf) \
- (buf + offsetof(struct hisi_sas_slot_buf_table, sge_page))
-#define hisi_sas_sge_addr_mem(slot) hisi_sas_sge_addr(slot->buf)
-#define hisi_sas_sge_addr_dma(slot) hisi_sas_sge_addr(slot->buf_dma)
+ ((buf) + offsetof(struct hisi_sas_slot_buf_table, sge_page))
+#define hisi_sas_sge_addr_mem(slot) hisi_sas_sge_addr((slot)->buf)
+#define hisi_sas_sge_addr_dma(slot) hisi_sas_sge_addr((slot)->buf_dma)
+
+#define hisi_sas_sge_dif_addr(buf) \
+ ((buf) + offsetof(struct hisi_sas_slot_dif_buf_table, sge_dif_page))
+#define hisi_sas_sge_dif_addr_mem(slot) hisi_sas_sge_dif_addr((slot)->buf)
+#define hisi_sas_sge_dif_addr_dma(slot) hisi_sas_sge_dif_addr((slot)->buf_dma)

#define HISI_SAS_MAX_SSP_RESP_SZ (sizeof(struct ssp_frame_hdr) + 1024)
#define HISI_SAS_MAX_SMP_RESP_SZ 1028
@@ -74,7 +79,11 @@
SHOST_DIF_TYPE2_PROTECTION | \
SHOST_DIF_TYPE3_PROTECTION)

-#define HISI_SAS_PROT_MASK (HISI_SAS_DIF_PROT_MASK)
+#define HISI_SAS_DIX_PROT_MASK (SHOST_DIX_TYPE1_PROTECTION | \
+ SHOST_DIX_TYPE2_PROTECTION | \
+ SHOST_DIX_TYPE3_PROTECTION)
+
+#define HISI_SAS_PROT_MASK (HISI_SAS_DIF_PROT_MASK | HISI_SAS_DIX_PROT_MASK)

#define HISI_SAS_WAIT_PHYUP_TIMEOUT 20

@@ -201,6 +210,7 @@ struct hisi_sas_slot {
struct sas_task *task;
struct hisi_sas_port *port;
u64 n_elem;
+ u64 n_elem_dif;
int dlvry_queue;
int dlvry_queue_slot;
int cmplt_queue;
@@ -464,6 +474,11 @@ struct hisi_sas_sge_page {
struct hisi_sas_sge sge[HISI_SAS_SGE_PAGE_CNT];
} __aligned(16);

+#define HISI_SAS_SGE_DIF_PAGE_CNT SG_CHUNK_SIZE
+struct hisi_sas_sge_dif_page {
+ struct hisi_sas_sge sge[HISI_SAS_SGE_DIF_PAGE_CNT];
+} __aligned(16);
+
struct hisi_sas_command_table_ssp {
struct ssp_frame_hdr hdr;
union {
@@ -494,6 +509,11 @@ struct hisi_sas_slot_buf_table {
struct hisi_sas_sge_page sge_page;
};

+struct hisi_sas_slot_dif_buf_table {
+ struct hisi_sas_slot_buf_table slot_buf;
+ struct hisi_sas_sge_dif_page sge_dif_page;
+};
+
extern struct scsi_transport_template *hisi_sas_stt;

extern bool hisi_sas_debugfs_enable;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2d6b5fe90a9e..1a3d31d4258c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -252,11 +252,19 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,

task->lldd_task = NULL;

- if (!sas_protocol_ata(task->task_proto))
+ if (!sas_protocol_ata(task->task_proto)) {
+ struct sas_ssp_task *ssp_task = &task->ssp_task;
+ struct scsi_cmnd *scsi_cmnd = ssp_task->cmd;
+
if (slot->n_elem)
dma_unmap_sg(dev, task->scatter,
task->num_scatter,
task->data_dir);
+ if (slot->n_elem_dif)
+ dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+ scsi_prot_sg_count(scsi_cmnd),
+ task->data_dir);
+ }
}


@@ -380,6 +388,59 @@ static int hisi_sas_dma_map(struct hisi_hba *hisi_hba,
return rc;
}

+static void hisi_sas_dif_dma_unmap(struct hisi_hba *hisi_hba,
+ struct sas_task *task, int n_elem_dif)
+{
+ struct device *dev = hisi_hba->dev;
+
+ if (n_elem_dif) {
+ struct sas_ssp_task *ssp_task = &task->ssp_task;
+ struct scsi_cmnd *scsi_cmnd = ssp_task->cmd;
+
+ dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+ scsi_prot_sg_count(scsi_cmnd),
+ task->data_dir);
+ }
+}
+
+static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
+ int *n_elem_dif, struct sas_task *task)
+{
+ struct device *dev = hisi_hba->dev;
+ struct sas_ssp_task *ssp_task;
+ struct scsi_cmnd *scsi_cmnd;
+ int rc;
+
+ if (task->num_scatter) {
+ ssp_task = &task->ssp_task;
+ scsi_cmnd = ssp_task->cmd;
+
+ if (scsi_prot_sg_count(scsi_cmnd)) {
+ *n_elem_dif = dma_map_sg(dev,
+ scsi_prot_sglist(scsi_cmnd),
+ scsi_prot_sg_count(scsi_cmnd),
+ task->data_dir);
+
+ if (!*n_elem_dif)
+ return -ENOMEM;
+
+ if (*n_elem_dif > HISI_SAS_SGE_DIF_PAGE_CNT) {
+ dev_err(dev, "task prep: n_elem_dif(%d) too large\n",
+ *n_elem_dif);
+ rc = -EINVAL;
+ goto err_out_dif_dma_unmap;
+ }
+ }
+ }
+
+ return 0;
+
+err_out_dif_dma_unmap:
+ dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+ scsi_prot_sg_count(scsi_cmnd), task->data_dir);
+ return rc;
+}
+
static int hisi_sas_task_prep(struct sas_task *task,
struct hisi_sas_dq **dq_pointer,
bool is_tmf, struct hisi_sas_tmf_task *tmf,
@@ -394,7 +455,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
struct asd_sas_port *sas_port = device->port;
struct device *dev = hisi_hba->dev;
int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
- int n_elem = 0, n_elem_req = 0, n_elem_resp = 0;
+ int n_elem = 0, n_elem_dif = 0, n_elem_req = 0, n_elem_resp = 0;
struct hisi_sas_dq *dq;
unsigned long flags;
int wr_q_index;
@@ -427,6 +488,12 @@ static int hisi_sas_task_prep(struct sas_task *task,
if (rc < 0)
goto prep_out;

+ if (!sas_protocol_ata(task->task_proto)) {
+ rc = hisi_sas_dif_dma_map(hisi_hba, &n_elem_dif, task);
+ if (rc < 0)
+ goto err_out_dma_unmap;
+ }
+
if (hisi_hba->hw->slot_index_alloc)
rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
else {
@@ -445,7 +512,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
rc = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
}
if (rc < 0)
- goto err_out_dma_unmap;
+ goto err_out_dif_dma_unmap;

slot_idx = rc;
slot = &hisi_hba->slot_info[slot_idx];
@@ -466,6 +533,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
dlvry_queue_slot = wr_q_index;

slot->n_elem = n_elem;
+ slot->n_elem_dif = n_elem_dif;
slot->dlvry_queue = dlvry_queue;
slot->dlvry_queue_slot = dlvry_queue_slot;
cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
@@ -509,6 +577,9 @@ static int hisi_sas_task_prep(struct sas_task *task,

err_out_tag:
hisi_sas_slot_index_free(hisi_hba, slot_idx);
+err_out_dif_dma_unmap:
+ if (!sas_protocol_ata(task->task_proto))
+ hisi_sas_dif_dma_unmap(hisi_hba, task, n_elem_dif);
err_out_dma_unmap:
hisi_sas_dma_unmap(hisi_hba, task, n_elem,
n_elem_req, n_elem_resp);
@@ -2174,19 +2245,24 @@ int hisi_sas_alloc(struct hisi_hba *hisi_hba)

/* roundup to avoid overly large block size */
max_command_entries_ru = roundup(max_command_entries, 64);
- sz_slot_buf_ru = roundup(sizeof(struct hisi_sas_slot_buf_table), 64);
+ if (hisi_hba->prot_mask & HISI_SAS_DIX_PROT_MASK)
+ sz_slot_buf_ru = sizeof(struct hisi_sas_slot_dif_buf_table);
+ else
+ sz_slot_buf_ru = sizeof(struct hisi_sas_slot_buf_table);
+ sz_slot_buf_ru = roundup(sz_slot_buf_ru, 64);
s = lcm(max_command_entries_ru, sz_slot_buf_ru);
blk_cnt = (max_command_entries_ru * sz_slot_buf_ru) / s;
slots_per_blk = s / sz_slot_buf_ru;
+
for (i = 0; i < blk_cnt; i++) {
- struct hisi_sas_slot_buf_table *buf;
- dma_addr_t buf_dma;
int slot_index = i * slots_per_blk;
+ dma_addr_t buf_dma;
+ void *buf;

- buf = dmam_alloc_coherent(dev, s, &buf_dma, GFP_KERNEL);
+ buf = dmam_alloc_coherent(dev, s, &buf_dma,
+ GFP_KERNEL | __GFP_ZERO);
if (!buf)
goto err_out;
- memset(buf, 0, s);

for (j = 0; j < slots_per_blk; j++, slot_index++) {
struct hisi_sas_slot *slot;
@@ -2196,8 +2272,8 @@ int hisi_sas_alloc(struct hisi_hba *hisi_hba)
slot->buf_dma = buf_dma;
slot->idx = slot_index;

- buf++;
- buf_dma += sizeof(*buf);
+ buf += sz_slot_buf_ru;
+ buf_dma += sz_slot_buf_ru;
}
}

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index eafcab088c0b..0455fba93daf 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -399,6 +399,8 @@ struct hisi_sas_err_record_v3 {
#define USR_DATA_BLOCK_SZ_OFF 20
#define USR_DATA_BLOCK_SZ_MSK (0x3 << USR_DATA_BLOCK_SZ_OFF)
#define T10_CHK_MSK_OFF 16
+#define T10_CHK_REF_TAG_MSK (0xf0 << T10_CHK_MSK_OFF)
+#define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)

static bool hisi_sas_intr_conv;
MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");
@@ -969,19 +971,44 @@ static void prep_prd_sge_v3_hw(struct hisi_hba *hisi_hba,

hdr->prd_table_addr = cpu_to_le64(hisi_sas_sge_addr_dma(slot));

- hdr->sg_len = cpu_to_le32(n_elem << CMD_HDR_DATA_SGL_LEN_OFF);
+ hdr->sg_len |= cpu_to_le32(n_elem << CMD_HDR_DATA_SGL_LEN_OFF);
+}
+
+static void prep_prd_sge_dif_v3_hw(struct hisi_hba *hisi_hba,
+ struct hisi_sas_slot *slot,
+ struct hisi_sas_cmd_hdr *hdr,
+ struct scatterlist *scatter,
+ int n_elem)
+{
+ struct hisi_sas_sge_dif_page *sge_dif_page;
+ struct scatterlist *sg;
+ int i;
+
+ sge_dif_page = hisi_sas_sge_dif_addr_mem(slot);
+
+ for_each_sg(scatter, sg, n_elem, i) {
+ struct hisi_sas_sge *entry = &sge_dif_page->sge[i];
+
+ entry->addr = cpu_to_le64(sg_dma_address(sg));
+ entry->page_ctrl_0 = 0;
+ entry->page_ctrl_1 = 0;
+ entry->data_len = cpu_to_le32(sg_dma_len(sg));
+ entry->data_off = 0;
+ }
+
+ hdr->dif_prd_table_addr =
+ cpu_to_le64(hisi_sas_sge_dif_addr_dma(slot));
+
+ hdr->sg_len |= cpu_to_le32(n_elem << CMD_HDR_DIF_SGL_LEN_OFF);
}

static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
{
unsigned char prot_flags = scsi_cmnd->prot_flags;

- if (prot_flags & SCSI_PROT_TRANSFER_PI) {
- if (prot_flags & SCSI_PROT_REF_CHECK)
- return 0xc << 16;
- return 0xfc << 16;
- }
- return 0;
+ if (prot_flags & SCSI_PROT_REF_CHECK)
+ return T10_CHK_APP_TAG_MSK;
+ return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
}

static void fill_prot_v3_hw(struct scsi_cmnd *scsi_cmnd,
@@ -992,15 +1019,33 @@ static void fill_prot_v3_hw(struct scsi_cmnd *scsi_cmnd,
u32 lbrt_chk_val = t10_pi_ref_tag(scsi_cmnd->request);

switch (prot_op) {
+ case SCSI_PROT_READ_INSERT:
+ prot->dw0 |= T10_INSRT_EN_MSK;
+ prot->lbrtgv = lbrt_chk_val;
+ break;
case SCSI_PROT_READ_STRIP:
prot->dw0 |= (T10_RMV_EN_MSK | T10_CHK_EN_MSK);
prot->lbrtcv = lbrt_chk_val;
prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
break;
+ case SCSI_PROT_READ_PASS:
+ prot->dw0 |= T10_CHK_EN_MSK;
+ prot->lbrtcv = lbrt_chk_val;
+ prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
+ break;
case SCSI_PROT_WRITE_INSERT:
prot->dw0 |= T10_INSRT_EN_MSK;
prot->lbrtgv = lbrt_chk_val;
break;
+ case SCSI_PROT_WRITE_STRIP:
+ prot->dw0 |= (T10_RMV_EN_MSK | T10_CHK_EN_MSK);
+ prot->lbrtcv = lbrt_chk_val;
+ break;
+ case SCSI_PROT_WRITE_PASS:
+ prot->dw0 |= T10_CHK_EN_MSK;
+ prot->lbrtcv = lbrt_chk_val;
+ prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
+ break;
default:
WARN(1, "prot_op(0x%x) is not valid\n", prot_op);
break;
@@ -1077,9 +1122,15 @@ static void prep_ssp_v3_hw(struct hisi_hba *hisi_hba,
hdr->dw2 = cpu_to_le32(dw2);
hdr->transfer_tags = cpu_to_le32(slot->idx);

- if (has_data)
+ if (has_data) {
prep_prd_sge_v3_hw(hisi_hba, slot, hdr, task->scatter,
- slot->n_elem);
+ slot->n_elem);
+
+ if (scsi_prot_sg_count(scsi_cmnd))
+ prep_prd_sge_dif_v3_hw(hisi_hba, slot, hdr,
+ scsi_prot_sglist(scsi_cmnd),
+ slot->n_elem_dif);
+ }

hdr->cmd_table_addr = cpu_to_le64(hisi_sas_cmd_hdr_addr_dma(slot));
hdr->sts_buffer_addr = cpu_to_le64(hisi_sas_status_buf_addr_dma(slot));
@@ -1120,18 +1171,19 @@ static void prep_ssp_v3_hw(struct hisi_hba *hisi_hba,
fill_prot_v3_hw(scsi_cmnd, &prot);
memcpy(buf_cmd_prot, &prot,
sizeof(struct hisi_sas_protect_iu_v3_hw));
-
/*
* For READ, we need length of info read to memory, while for
* WRITE we need length of data written to the disk.
*/
- if (prot_op == SCSI_PROT_WRITE_INSERT) {
+ if (prot_op == SCSI_PROT_WRITE_INSERT ||
+ prot_op == SCSI_PROT_READ_INSERT ||
+ prot_op == SCSI_PROT_WRITE_PASS ||
+ prot_op == SCSI_PROT_READ_PASS) {
unsigned int interval = scsi_prot_interval(scsi_cmnd);
unsigned int ilog2_interval = ilog2(interval);

len = (task->total_xfer_len >> ilog2_interval) * 8;
}
-
}

hdr->dw1 = cpu_to_le32(dw1);
@@ -2526,6 +2578,7 @@ static struct scsi_host_template sht_v3_hw = {
.bios_param = sas_bios_param,
.this_id = -1,
.sg_tablesize = HISI_SAS_SGE_PAGE_CNT,
+ .sg_prot_tablesize = HISI_SAS_SGE_PAGE_CNT,
.max_sectors = SCSI_DEFAULT_MAX_SECTORS,
.eh_device_reset_handler = sas_eh_device_reset_handler,
.eh_target_reset_handler = sas_eh_target_reset_handler,
@@ -2701,6 +2754,9 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
dev_info(dev, "Registering for DIF/DIX prot_mask=0x%x\n",
prot_mask);
scsi_host_set_prot(hisi_hba->shost, prot_mask);
+ if (hisi_hba->prot_mask & HISI_SAS_DIX_PROT_MASK)
+ scsi_host_set_guard(hisi_hba->shost,
+ SHOST_DIX_GUARD_CRC);
}

rc = scsi_add_host(shost, dev);
--
2.17.1


2019-02-08 23:09:23

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches


John,

> This patchset introduces the following:
> - DIX support for v3 hw

Applied to 5.1/scsi-queue, thanks!

--
Martin K. Petersen Oracle Linux Engineering

2019-02-08 23:14:35

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw


John,

Just noticed this while inspecting the resulting complete diff:

> static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
> {
> unsigned char prot_flags = scsi_cmnd->prot_flags;
>
> - if (prot_flags & SCSI_PROT_TRANSFER_PI) {
> - if (prot_flags & SCSI_PROT_REF_CHECK)
> - return 0xc << 16;
> - return 0xfc << 16;
> - }
> - return 0;
> + if (prot_flags & SCSI_PROT_REF_CHECK)
> + return T10_CHK_APP_TAG_MSK;

Polarity is a bit unclear here. Is this statement disabling checking of
the app tag?

> + return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
> }

--
Martin K. Petersen Oracle Linux Engineering

2019-02-11 11:38:38

by chenxiang (M)

[permalink] [raw]
Subject: Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw

Hi Martin,

?? 2019/2/9 7:13, Martin K. Petersen ะด??:
> John,
>
> Just noticed this while inspecting the resulting complete diff:
>
>> static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
>> {
>> unsigned char prot_flags = scsi_cmnd->prot_flags;
>>
>> - if (prot_flags & SCSI_PROT_TRANSFER_PI) {
>> - if (prot_flags & SCSI_PROT_REF_CHECK)
>> - return 0xc << 16;
>> - return 0xfc << 16;
>> - }
>> - return 0;
>> + if (prot_flags & SCSI_PROT_REF_CHECK)
>> + return T10_CHK_APP_TAG_MSK;
> Polarity is a bit unclear here. Is this statement disabling checking of
> the app tag?

Yes, disabling checking of app tag.
For here, 1'b0 presents the related byte in DIF will be checked, and
1'b1 presents the related byte in DIF will not be checked.


>
>> + return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
>> }



2019-02-13 03:03:13

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw


chenxiang,

>>> + if (prot_flags & SCSI_PROT_REF_CHECK)
>>> + return T10_CHK_APP_TAG_MSK;

>> Polarity is a bit unclear here. Is this statement disabling checking of
>> the app tag?
>
> Yes, disabling checking of app tag.
> For here, 1'b0 presents the related byte in DIF will be checked, and
> 1'b1 presents the related byte in DIF will not be checked.

OK, thanks for the clarification!

--
Martin K. Petersen Oracle Linux Engineering