2022-01-25 16:44:26

by John Garry

[permalink] [raw]
Subject: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

The LLDD TMF code is almost identical between hisi_sas, pm8001, and mvsas
drivers.

This series factors out that code into libsas, thus reducing much
duplication and giving a net reduction of ~250 LoC.

There are some subtle differences between the core TMF handler and each
of the LLDDs old implementation, so any review and testing is appreciated.

Some other minor patches are thrown in:
- Delete unused macro in hisi_sas driver
- Delete unused libsas callback
- Add enum for response frame datapres field

I have another follow-up series to factor out the internal abort code,
which is common to hisi_sas and pm8001 drivers.

Based on v5.17-rc1

John Garry (16):
scsi: libsas: Use enum for response frame DATAPRES field
scsi: libsas: Delete lldd_clear_aca callback
scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
scsi: libsas: Move SMP task handlers to core
scsi: libsas: Add struct sas_tmf_task
scsi: libsas: Add sas_task.tmf
scsi: libsas: Add sas_execute_tmf()
scsi: libsas: Add sas_execute_ssp_tmf()
scsi: libsas: Add TMF handler exec complete callback
scsi: libsas: Add TMF handler aborted callback
scsi: libsas: Add sas_abort_task_set()
scsi: libsas: Add sas_clear_task_set()
scsi: libsas: Add sas_lu_reset()
scsi: libsas: Add sas_query_task()
scsi: libsas: Add sas_abort_task()
scsi: libsas: Add sas_execute_ata_cmd()

Documentation/scsi/libsas.rst | 2 -
drivers/scsi/aic94xx/aic94xx.h | 1 -
drivers/scsi/aic94xx/aic94xx_init.c | 1 -
drivers/scsi/aic94xx/aic94xx_tmf.c | 9 -
drivers/scsi/hisi_sas/hisi_sas.h | 9 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 235 ++++---------------------
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 9 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 +-
drivers/scsi/isci/init.c | 1 -
drivers/scsi/isci/task.c | 18 --
drivers/scsi/isci/task.h | 4 -
drivers/scsi/libsas/sas_ata.c | 8 +
drivers/scsi/libsas/sas_expander.c | 24 +--
drivers/scsi/libsas/sas_internal.h | 6 +
drivers/scsi/libsas/sas_scsi_host.c | 220 +++++++++++++++++++++++
drivers/scsi/libsas/sas_task.c | 12 +-
drivers/scsi/mvsas/mv_defs.h | 5 -
drivers/scsi/mvsas/mv_init.c | 5 +-
drivers/scsi/mvsas/mv_sas.c | 177 +------------------
drivers/scsi/mvsas/mv_sas.h | 3 -
drivers/scsi/pm8001/pm8001_hwi.c | 4 +-
drivers/scsi/pm8001/pm8001_init.c | 4 +-
drivers/scsi/pm8001/pm8001_sas.c | 180 +++----------------
drivers/scsi/pm8001/pm8001_sas.h | 13 +-
include/scsi/libsas.h | 23 ++-
26 files changed, 353 insertions(+), 624 deletions(-)

--
2.26.2


2022-01-25 16:44:26

by John Garry

[permalink] [raw]
Subject: [PATCH 11/16] scsi: libsas: Add sas_abort_task_set()

Add a generic implementation of abort task set TMF handler, and use in
LLDDs.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 5 +----
drivers/scsi/libsas/sas_scsi_host.c | 16 ++++++++++++----
drivers/scsi/mvsas/mv_init.c | 2 +-
drivers/scsi/mvsas/mv_sas.c | 11 -----------
drivers/scsi/mvsas/mv_sas.h | 1 -
drivers/scsi/pm8001/pm8001_init.c | 2 +-
drivers/scsi/pm8001/pm8001_sas.c | 8 --------
drivers/scsi/pm8001/pm8001_sas.h | 1 -
include/scsi/libsas.h | 2 ++
9 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index a2c03cb6414f..7bf0d400d11f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1793,7 +1793,6 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
{
struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
struct device *dev = hisi_hba->dev;
- struct sas_tmf_task tmf_task;
int rc;

rc = hisi_sas_internal_task_abort(hisi_hba, device,
@@ -1804,9 +1803,7 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
}
hisi_sas_dereg_device(hisi_hba, device);

- tmf_task.tmf = TMF_ABORT_TASK_SET;
- rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
-
+ rc = sas_abort_task_set(device, lun);
if (rc == TMF_RESP_FUNC_COMPLETE)
hisi_sas_release_task(hisi_hba, device);

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 956b4387c1b8..b18cbb617710 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1033,10 +1033,8 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
return res;
}

-int sas_execute_ssp_tmf(struct domain_device *device, u8 *lun,
- struct sas_tmf_task *tmf);
-int sas_execute_ssp_tmf(struct domain_device *device, u8 *lun,
- struct sas_tmf_task *tmf)
+static int sas_execute_ssp_tmf(struct domain_device *device, u8 *lun,
+ struct sas_tmf_task *tmf)
{
struct sas_ssp_task ssp_task;

@@ -1048,6 +1046,16 @@ int sas_execute_ssp_tmf(struct domain_device *device, u8 *lun,
return sas_execute_tmf(device, &ssp_task, sizeof(ssp_task), -1, tmf);
}

+int sas_abort_task_set(struct domain_device *dev, u8 *lun)
+{
+ struct sas_tmf_task tmf_task = {
+ .tmf = TMF_ABORT_TASK_SET,
+ };
+
+ return sas_execute_ssp_tmf(dev, lun, &tmf_task);
+}
+EXPORT_SYMBOL_GPL(sas_abort_task_set);
+
/*
* Tell an upper layer that it needs to initiate an abort for a given task.
* This should only ever be called by an LLDD.
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 21429e3a3632..484ad2f41f48 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -63,7 +63,7 @@ static struct sas_domain_function_template mvs_transport_ops = {
.lldd_control_phy = mvs_phy_control,

.lldd_abort_task = mvs_abort_task,
- .lldd_abort_task_set = mvs_abort_task_set,
+ .lldd_abort_task_set = sas_abort_task_set,
.lldd_clear_task_set = mvs_clear_task_set,
.lldd_I_T_nexus_reset = mvs_I_T_nexus_reset,
.lldd_lu_reset = mvs_lu_reset,
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 5105d55eb00c..3b2aa7117eb4 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1539,17 +1539,6 @@ int mvs_abort_task(struct sas_task *task)
return rc;
}

-int mvs_abort_task_set(struct domain_device *dev, u8 *lun)
-{
- int rc;
- struct sas_tmf_task tmf_task;
-
- tmf_task.tmf = TMF_ABORT_TASK_SET;
- rc = mvs_debug_issue_ssp_tmf(dev, lun, &tmf_task);
-
- return rc;
-}
-
int mvs_clear_task_set(struct domain_device *dev, u8 *lun)
{
int rc = TMF_RESP_FUNC_FAILED;
diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
index fa654c73beee..0bee63596208 100644
--- a/drivers/scsi/mvsas/mv_sas.h
+++ b/drivers/scsi/mvsas/mv_sas.h
@@ -440,7 +440,6 @@ void mvs_scan_start(struct Scsi_Host *shost);
int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time);
int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags);
int mvs_abort_task(struct sas_task *task);
-int mvs_abort_task_set(struct domain_device *dev, u8 *lun);
int mvs_clear_task_set(struct domain_device *dev, u8 * lun);
void mvs_port_formed(struct asd_sas_phy *sas_phy);
void mvs_port_deformed(struct asd_sas_phy *sas_phy);
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index 8eef8f4de42f..65489c52c50a 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -122,7 +122,7 @@ static struct sas_domain_function_template pm8001_transport_ops = {
.lldd_control_phy = pm8001_phy_control,

.lldd_abort_task = pm8001_abort_task,
- .lldd_abort_task_set = pm8001_abort_task_set,
+ .lldd_abort_task_set = sas_abort_task_set,
.lldd_clear_task_set = pm8001_clear_task_set,
.lldd_I_T_nexus_reset = pm8001_I_T_nexus_reset,
.lldd_lu_reset = pm8001_lu_reset,
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 06e5d5741ae2..c142bd0ce634 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -1341,14 +1341,6 @@ int pm8001_abort_task(struct sas_task *task)
return rc;
}

-int pm8001_abort_task_set(struct domain_device *dev, u8 *lun)
-{
- struct sas_tmf_task tmf_task;
-
- tmf_task.tmf = TMF_ABORT_TASK_SET;
- return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
-}
-
int pm8001_clear_task_set(struct domain_device *dev, u8 *lun)
{
struct sas_tmf_task tmf_task;
diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
index c19c9c80206c..817a37608133 100644
--- a/drivers/scsi/pm8001/pm8001_sas.h
+++ b/drivers/scsi/pm8001/pm8001_sas.h
@@ -644,7 +644,6 @@ void pm8001_scan_start(struct Scsi_Host *shost);
int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time);
int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags);
int pm8001_abort_task(struct sas_task *task);
-int pm8001_abort_task_set(struct domain_device *dev, u8 *lun);
int pm8001_clear_task_set(struct domain_device *dev, u8 *lun);
int pm8001_dev_found(struct domain_device *dev);
void pm8001_dev_gone(struct domain_device *dev);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 1e5ecfb2f36e..434437b0e89c 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -724,6 +724,8 @@ struct sas_phy *sas_get_local_phy(struct domain_device *dev);

int sas_request_addr(struct Scsi_Host *shost, u8 *addr);

+int sas_abort_task_set(struct domain_device *dev, u8 *lun);
+
int sas_notify_port_event(struct asd_sas_phy *phy, enum port_event event,
gfp_t gfp_flags);
int sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event,
--
2.26.2

2022-01-25 16:44:26

by John Garry

[permalink] [raw]
Subject: [PATCH 02/16] scsi: libsas: Delete lldd_clear_aca callback

This callback is never called, so remove support.

Signed-off-by: John Garry <[email protected]>
---
Documentation/scsi/libsas.rst | 2 --
drivers/scsi/aic94xx/aic94xx.h | 1 -
drivers/scsi/aic94xx/aic94xx_init.c | 1 -
drivers/scsi/aic94xx/aic94xx_tmf.c | 9 ---------
drivers/scsi/hisi_sas/hisi_sas_main.c | 12 ------------
drivers/scsi/isci/init.c | 1 -
drivers/scsi/isci/task.c | 18 ------------------
drivers/scsi/isci/task.h | 4 ----
drivers/scsi/mvsas/mv_init.c | 1 -
drivers/scsi/mvsas/mv_sas.c | 11 -----------
drivers/scsi/mvsas/mv_sas.h | 1 -
drivers/scsi/pm8001/pm8001_init.c | 1 -
drivers/scsi/pm8001/pm8001_sas.c | 8 --------
drivers/scsi/pm8001/pm8001_sas.h | 1 -
include/scsi/libsas.h | 1 -
15 files changed, 72 deletions(-)

diff --git a/Documentation/scsi/libsas.rst b/Documentation/scsi/libsas.rst
index 6589dfefbc02..305a253d5c3b 100644
--- a/Documentation/scsi/libsas.rst
+++ b/Documentation/scsi/libsas.rst
@@ -207,7 +207,6 @@ Management Functions (TMFs) described in SAM::
/* Task Management Functions. Must be called from process context. */
int (*lldd_abort_task)(struct sas_task *);
int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
- int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
int (*lldd_I_T_nexus_reset)(struct domain_device *);
int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
@@ -262,7 +261,6 @@ can look like this (called last thing from probe())

my_ha->sas_ha.lldd_abort_task = my_abort_task;
my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set;
- my_ha->sas_ha.lldd_clear_aca = my_clear_aca;
my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set;
my_ha->sas_ha.lldd_I_T_nexus_reset= NULL; (2)
my_ha->sas_ha.lldd_lu_reset = my_lu_reset;
diff --git a/drivers/scsi/aic94xx/aic94xx.h b/drivers/scsi/aic94xx/aic94xx.h
index 8f24180646c2..f595bc2ee45e 100644
--- a/drivers/scsi/aic94xx/aic94xx.h
+++ b/drivers/scsi/aic94xx/aic94xx.h
@@ -60,7 +60,6 @@ void asd_set_dmamode(struct domain_device *dev);
/* ---------- TMFs ---------- */
int asd_abort_task(struct sas_task *);
int asd_abort_task_set(struct domain_device *, u8 *lun);
-int asd_clear_aca(struct domain_device *, u8 *lun);
int asd_clear_task_set(struct domain_device *, u8 *lun);
int asd_lu_reset(struct domain_device *, u8 *lun);
int asd_I_T_nexus_reset(struct domain_device *dev);
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 7a78606598c4..954d0c5ae2e2 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -960,7 +960,6 @@ static struct sas_domain_function_template aic94xx_transport_functions = {

.lldd_abort_task = asd_abort_task,
.lldd_abort_task_set = asd_abort_task_set,
- .lldd_clear_aca = asd_clear_aca,
.lldd_clear_task_set = asd_clear_task_set,
.lldd_I_T_nexus_reset = asd_I_T_nexus_reset,
.lldd_lu_reset = asd_lu_reset,
diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
index 0eb6e206a2b4..0ff0d6506ccf 100644
--- a/drivers/scsi/aic94xx/aic94xx_tmf.c
+++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
@@ -644,15 +644,6 @@ int asd_abort_task_set(struct domain_device *dev, u8 *lun)
return res;
}

-int asd_clear_aca(struct domain_device *dev, u8 *lun)
-{
- int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_ACA, 0);
-
- if (res == TMF_RESP_FUNC_COMPLETE)
- asd_clear_nexus_I_T_L(dev, lun);
- return res;
-}
-
int asd_clear_task_set(struct domain_device *dev, u8 *lun)
{
int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_TASK_SET, 0);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index a05ec7aece5a..f014e458edbb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1801,17 +1801,6 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
return rc;
}

-static int hisi_sas_clear_aca(struct domain_device *device, u8 *lun)
-{
- struct hisi_sas_tmf_task tmf_task;
- int rc;
-
- tmf_task.tmf = TMF_CLEAR_ACA;
- rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
-
- return rc;
-}
-
#define I_T_NEXUS_RESET_PHYUP_TIMEOUT (2 * HZ)

static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device)
@@ -2341,7 +2330,6 @@ static struct sas_domain_function_template hisi_sas_transport_ops = {
.lldd_control_phy = hisi_sas_control_phy,
.lldd_abort_task = hisi_sas_abort_task,
.lldd_abort_task_set = hisi_sas_abort_task_set,
- .lldd_clear_aca = hisi_sas_clear_aca,
.lldd_I_T_nexus_reset = hisi_sas_I_T_nexus_reset,
.lldd_lu_reset = hisi_sas_lu_reset,
.lldd_query_task = hisi_sas_query_task,
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index aade707c5553..e294d5d961eb 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -193,7 +193,6 @@ static struct sas_domain_function_template isci_transport_ops = {
/* Task Management Functions. Must be called from process context. */
.lldd_abort_task = isci_task_abort_task,
.lldd_abort_task_set = isci_task_abort_task_set,
- .lldd_clear_aca = isci_task_clear_aca,
.lldd_clear_task_set = isci_task_clear_task_set,
.lldd_I_T_nexus_reset = isci_task_I_T_nexus_reset,
.lldd_lu_reset = isci_task_lu_reset,
diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
index 3fd88d72a0c0..403bfee34d84 100644
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -625,24 +625,6 @@ int isci_task_abort_task_set(
}


-/**
- * isci_task_clear_aca() - This function is one of the SAS Domain Template
- * functions. This is one of the Task Management functoins called by libsas.
- * @d_device: This parameter specifies the domain device associated with this
- * request.
- * @lun: This parameter specifies the lun associated with this request.
- *
- * status, zero indicates success.
- */
-int isci_task_clear_aca(
- struct domain_device *d_device,
- u8 *lun)
-{
- return TMF_RESP_FUNC_FAILED;
-}
-
-
-
/**
* isci_task_clear_task_set() - This function is one of the SAS Domain Template
* functions. This is one of the Task Management functoins called by libsas.
diff --git a/drivers/scsi/isci/task.h b/drivers/scsi/isci/task.h
index cae168b8916f..f96633fa6939 100644
--- a/drivers/scsi/isci/task.h
+++ b/drivers/scsi/isci/task.h
@@ -140,10 +140,6 @@ int isci_task_abort_task_set(
struct domain_device *d_device,
u8 *lun);

-int isci_task_clear_aca(
- struct domain_device *d_device,
- u8 *lun);
-
int isci_task_clear_task_set(
struct domain_device *d_device,
u8 *lun);
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index dcae2d4464f9..21429e3a3632 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -64,7 +64,6 @@ static struct sas_domain_function_template mvs_transport_ops = {

.lldd_abort_task = mvs_abort_task,
.lldd_abort_task_set = mvs_abort_task_set,
- .lldd_clear_aca = mvs_clear_aca,
.lldd_clear_task_set = mvs_clear_task_set,
.lldd_I_T_nexus_reset = mvs_I_T_nexus_reset,
.lldd_lu_reset = mvs_lu_reset,
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 1e52bc7febfa..fd273c3e069e 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1553,17 +1553,6 @@ int mvs_abort_task_set(struct domain_device *dev, u8 *lun)
return rc;
}

-int mvs_clear_aca(struct domain_device *dev, u8 *lun)
-{
- int rc = TMF_RESP_FUNC_FAILED;
- struct mvs_tmf_task tmf_task;
-
- tmf_task.tmf = TMF_CLEAR_ACA;
- rc = mvs_debug_issue_ssp_tmf(dev, lun, &tmf_task);
-
- return rc;
-}
-
int mvs_clear_task_set(struct domain_device *dev, u8 *lun)
{
int rc = TMF_RESP_FUNC_FAILED;
diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
index 8ff976c9967e..fa654c73beee 100644
--- a/drivers/scsi/mvsas/mv_sas.h
+++ b/drivers/scsi/mvsas/mv_sas.h
@@ -441,7 +441,6 @@ int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time);
int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags);
int mvs_abort_task(struct sas_task *task);
int mvs_abort_task_set(struct domain_device *dev, u8 *lun);
-int mvs_clear_aca(struct domain_device *dev, u8 *lun);
int mvs_clear_task_set(struct domain_device *dev, u8 * lun);
void mvs_port_formed(struct asd_sas_phy *sas_phy);
void mvs_port_deformed(struct asd_sas_phy *sas_phy);
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index d8a2121cb8d9..b8cf1bae4040 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -123,7 +123,6 @@ static struct sas_domain_function_template pm8001_transport_ops = {

.lldd_abort_task = pm8001_abort_task,
.lldd_abort_task_set = pm8001_abort_task_set,
- .lldd_clear_aca = pm8001_clear_aca,
.lldd_clear_task_set = pm8001_clear_task_set,
.lldd_I_T_nexus_reset = pm8001_I_T_nexus_reset,
.lldd_lu_reset = pm8001_lu_reset,
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 160ee8b228c9..4368122ab475 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -1357,14 +1357,6 @@ int pm8001_abort_task_set(struct domain_device *dev, u8 *lun)
return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
}

-int pm8001_clear_aca(struct domain_device *dev, u8 *lun)
-{
- struct pm8001_tmf_task tmf_task;
-
- tmf_task.tmf = TMF_CLEAR_ACA;
- return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
-}
-
int pm8001_clear_task_set(struct domain_device *dev, u8 *lun)
{
struct pm8001_tmf_task tmf_task;
diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
index a17da1cebce1..3ea53a0d0cc1 100644
--- a/drivers/scsi/pm8001/pm8001_sas.h
+++ b/drivers/scsi/pm8001/pm8001_sas.h
@@ -649,7 +649,6 @@ int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time);
int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags);
int pm8001_abort_task(struct sas_task *task);
int pm8001_abort_task_set(struct domain_device *dev, u8 *lun);
-int pm8001_clear_aca(struct domain_device *dev, u8 *lun);
int pm8001_clear_task_set(struct domain_device *dev, u8 *lun);
int pm8001_dev_found(struct domain_device *dev);
void pm8001_dev_gone(struct domain_device *dev);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 698f2032807b..d59618898619 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -637,7 +637,6 @@ struct sas_domain_function_template {
/* Task Management Functions. Must be called from process context. */
int (*lldd_abort_task)(struct sas_task *);
int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
- int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
int (*lldd_I_T_nexus_reset)(struct domain_device *);
int (*lldd_ata_check_ready)(struct domain_device *);
--
2.26.2

2022-01-25 16:44:26

by John Garry

[permalink] [raw]
Subject: [PATCH 06/16] scsi: libsas: Add sas_task.tmf

Add a pointer to a sas_tmf_task to the sas_task struct, as this will be
used when the common LLDD TMF code is factored out.

Also set it for the LLDDs to store per-sas_task TMF info.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 25 +++++++++---------------
drivers/scsi/mvsas/mv_sas.c | 15 ++++++--------
drivers/scsi/pm8001/pm8001_sas.c | 28 ++++++++++-----------------
include/scsi/libsas.h | 1 +
4 files changed, 26 insertions(+), 43 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index db6f8ea7864d..4f146aa50423 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -400,8 +400,7 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
struct hisi_sas_slot *slot,
struct hisi_sas_dq *dq,
struct hisi_sas_device *sas_dev,
- struct hisi_sas_internal_abort *abort,
- struct sas_tmf_task *tmf)
+ struct hisi_sas_internal_abort *abort)
{
struct hisi_sas_cmd_hdr *cmd_hdr_base;
int dlvry_queue_slot, dlvry_queue;
@@ -427,8 +426,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
slot->cmd_hdr = &cmd_hdr_base[dlvry_queue_slot];

- slot->tmf = tmf;
- slot->is_internal = tmf;
task->lldd_task = slot;

memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
@@ -471,8 +468,7 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
spin_unlock(&dq->lock);
}

-static int hisi_sas_task_exec(struct sas_task *task, gfp_t gfp_flags,
- struct sas_tmf_task *tmf)
+static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
{
int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
struct domain_device *device = task->dev;
@@ -583,11 +579,11 @@ static int hisi_sas_task_exec(struct sas_task *task, gfp_t gfp_flags,
slot->task = task;
slot->port = port;

- slot->tmf = tmf;
- slot->is_internal = tmf;
+ slot->tmf = task->tmf;
+ slot->is_internal = task->tmf;

/* protect task_prep and start_delivery sequence */
- hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, NULL, tmf);
+ hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, NULL);

return 0;

@@ -1115,11 +1111,6 @@ static void hisi_sas_dev_gone(struct domain_device *device)
up(&hisi_hba->sem);
}

-static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
-{
- return hisi_sas_task_exec(task, gfp_flags, NULL);
-}
-
static int hisi_sas_phy_set_linkrate(struct hisi_hba *hisi_hba, int phy_no,
struct sas_phy_linkrates *r)
{
@@ -1273,7 +1264,9 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
add_timer(&task->slow_task->timer);

- res = hisi_sas_task_exec(task, GFP_KERNEL, tmf);
+ task->tmf = tmf;
+
+ res = hisi_sas_queue_command(task, GFP_KERNEL);
if (res) {
del_timer_sync(&task->slow_task->timer);
dev_err(dev, "abort tmf: executing internal task failed: %d\n",
@@ -2054,7 +2047,7 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
slot->port = port;
slot->is_internal = true;

- hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, abort, NULL);
+ hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, abort);

return 0;

diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 22bffaaf2eb0..5105d55eb00c 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -840,14 +840,14 @@ static int mvs_task_prep(struct sas_task *task, struct mvs_info *mvi, int is_tmf
return rc;
}

-static int mvs_task_exec(struct sas_task *task, gfp_t gfp_flags,
- struct completion *completion, int is_tmf,
- struct sas_tmf_task *tmf)
+int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags)
{
struct mvs_info *mvi = NULL;
u32 rc = 0;
u32 pass = 0;
unsigned long flags = 0;
+ struct sas_tmf_task *tmf = task->tmf;
+ int is_tmf = !!task->tmf;

mvi = ((struct mvs_device *)task->dev->lldd_dev)->mvi_info;

@@ -864,11 +864,6 @@ static int mvs_task_exec(struct sas_task *task, gfp_t gfp_flags,
return rc;
}

-int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags)
-{
- return mvs_task_exec(task, gfp_flags, NULL, 0, NULL);
-}
-
static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
{
u32 slot_idx = rx_desc & RXQ_SLOT_MASK;
@@ -1300,7 +1295,9 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
add_timer(&task->slow_task->timer);

- res = mvs_task_exec(task, GFP_KERNEL, NULL, 1, tmf);
+ task->tmf = tmf;
+
+ res = mvs_queue_command(task, GFP_KERNEL);

if (res) {
del_timer(&task->slow_task->timer);
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 0a3701d97549..323b172243b8 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -371,15 +371,14 @@ static int sas_find_local_port_id(struct domain_device *dev)

#define DEV_IS_GONE(pm8001_dev) \
((!pm8001_dev || (pm8001_dev->dev_type == SAS_PHY_UNUSED)))
+
/**
- * pm8001_task_exec - queue the task(ssp, smp && ata) to the hardware.
+ * pm8001_queue_command - register for upper layer used, all IO commands sent
+ * to HBA are from this interface.
* @task: the task to be execute.
- * @gfp_flags: gfp_flags.
- * @is_tmf: if it is task management task.
- * @tmf: the task management IU
+ * @gfp_flags: gfp_flags
*/
-static int pm8001_task_exec(struct sas_task *task,
- gfp_t gfp_flags, int is_tmf, struct sas_tmf_task *tmf)
+int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
{
struct domain_device *dev = task->dev;
struct pm8001_hba_info *pm8001_ha;
@@ -390,6 +389,8 @@ static int pm8001_task_exec(struct sas_task *task,
u32 tag = 0xdeadbeef, rc = 0, n_elem = 0;
unsigned long flags = 0;
enum sas_protocol task_proto = t->task_proto;
+ struct sas_tmf_task *tmf = task->tmf;
+ int is_tmf = !!task->tmf;

if (!dev->port) {
struct task_status_struct *tsm = &t->task_status;
@@ -507,17 +508,6 @@ static int pm8001_task_exec(struct sas_task *task,
return rc;
}

-/**
- * pm8001_queue_command - register for upper layer used, all IO commands sent
- * to HBA are from this interface.
- * @task: the task to be execute.
- * @gfp_flags: gfp_flags
- */
-int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
-{
- return pm8001_task_exec(task, gfp_flags, 0, NULL);
-}
-
/**
* pm8001_ccb_task_free - free the sg for ssp and smp command, free the ccb.
* @pm8001_ha: our hba card information
@@ -752,7 +742,9 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
add_timer(&task->slow_task->timer);

- res = pm8001_task_exec(task, GFP_KERNEL, 1, tmf);
+ task->tmf = tmf;
+
+ res = pm8001_queue_command(task, GFP_KERNEL);

if (res) {
del_timer(&task->slow_task->timer);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 30b9afec1e1d..802b4e04e3a1 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -611,6 +611,7 @@ struct sas_task {
void *lldd_task; /* for use by LLDDs */
void *uldd_task;
struct sas_task_slow *slow_task;
+ struct sas_tmf_task *tmf;
};

struct sas_task_slow {
--
2.26.2

2022-01-25 16:44:26

by John Garry

[permalink] [raw]
Subject: [PATCH 15/16] scsi: libsas: Add sas_abort_task()

Add a generic implementation of abort task TMF handler, and use in LLDDs.

With that, some LLDDs custom TMF functions can now be deleted.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 27 +-----
drivers/scsi/libsas/sas_scsi_host.c | 16 ++++
drivers/scsi/mvsas/mv_sas.c | 118 +-------------------------
drivers/scsi/pm8001/pm8001_sas.c | 111 +-----------------------
include/scsi/libsas.h | 1 +
5 files changed, 20 insertions(+), 253 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index c2a6d143b6ba..359a5718fc59 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -10,8 +10,6 @@
#define DEV_IS_GONE(dev) \
((!dev) || (dev->dev_type == SAS_PHY_UNUSED))

-static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
- u8 *lun, struct sas_tmf_task *tmf);
static int
hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
struct domain_device *device,
@@ -1421,20 +1419,6 @@ static int hisi_sas_softreset_ata_disk(struct domain_device *device)
return rc;
}

-static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
- u8 *lun, struct sas_tmf_task *tmf)
-{
- struct sas_ssp_task ssp_task;
-
- if (!(device->tproto & SAS_PROTOCOL_SSP))
- return TMF_RESP_FUNC_ESUPP;
-
- memcpy(ssp_task.LUN, lun, 8);
-
- return hisi_sas_exec_internal_tmf_task(device, &ssp_task,
- sizeof(ssp_task), tmf);
-}
-
static void hisi_sas_refresh_port_id(struct hisi_hba *hisi_hba)
{
u32 state = hisi_hba->hw->get_phys_state(hisi_hba);
@@ -1680,8 +1664,6 @@ static int hisi_sas_controller_reset(struct hisi_hba *hisi_hba)

static int hisi_sas_abort_task(struct sas_task *task)
{
- struct scsi_lun lun;
- struct sas_tmf_task tmf_task;
struct domain_device *device = task->dev;
struct hisi_sas_device *sas_dev = device->lldd_dev;
struct hisi_hba *hisi_hba;
@@ -1716,18 +1698,11 @@ static int hisi_sas_abort_task(struct sas_task *task)
spin_unlock_irqrestore(&task->task_state_lock, flags);

if (task->lldd_task && task->task_proto & SAS_PROTOCOL_SSP) {
- struct scsi_cmnd *cmnd = task->uldd_task;
struct hisi_sas_slot *slot = task->lldd_task;
u16 tag = slot->idx;
int rc2;

- int_to_scsilun(cmnd->device->lun, &lun);
- tmf_task.tmf = TMF_ABORT_TASK;
- tmf_task.tag_of_task_to_be_managed = tag;
-
- rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun,
- &tmf_task);
-
+ rc = sas_abort_task(task, tag);
rc2 = hisi_sas_internal_task_abort(hisi_hba, device,
HISI_SAS_INT_ABT_CMD, tag,
false);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index b2051a1411f3..3e2e3bcbbf67 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1092,6 +1092,22 @@ int sas_query_task(struct sas_task *task, u16 tag)
}
EXPORT_SYMBOL_GPL(sas_query_task);

+int sas_abort_task(struct sas_task *task, u16 tag)
+{
+ struct sas_tmf_task tmf_task = {
+ .tmf = TMF_ABORT_TASK,
+ .tag_of_task_to_be_managed = tag,
+ };
+ struct scsi_cmnd *cmnd = task->uldd_task;
+ struct domain_device *dev = task->dev;
+ struct scsi_lun lun;
+
+ int_to_scsilun(cmnd->device->lun, &lun);
+
+ return sas_execute_ssp_tmf(dev, lun.scsi_lun, &tmf_task);
+}
+EXPORT_SYMBOL_GPL(sas_abort_task);
+
/*
* Tell an upper layer that it needs to initiate an abort for a given task.
* This should only ever be called by an LLDD.
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index bf97de75da89..627186a52f7b 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1257,114 +1257,6 @@ void mvs_dev_gone(struct domain_device *dev)
mvs_dev_gone_notify(dev);
}

-static void mvs_task_done(struct sas_task *task)
-{
- if (!del_timer(&task->slow_task->timer))
- return;
- complete(&task->slow_task->completion);
-}
-
-static void mvs_tmf_timedout(struct timer_list *t)
-{
- struct sas_task_slow *slow = from_timer(slow, t, timer);
- struct sas_task *task = slow->task;
-
- task->task_state_flags |= SAS_TASK_STATE_ABORTED;
- complete(&task->slow_task->completion);
-}
-
-#define MVS_TASK_TIMEOUT 20
-static int mvs_exec_internal_tmf_task(struct domain_device *dev,
- void *parameter, u32 para_len, struct sas_tmf_task *tmf)
-{
- int res, retry;
- struct sas_task *task = NULL;
-
- for (retry = 0; retry < 3; retry++) {
- task = sas_alloc_slow_task(GFP_KERNEL);
- if (!task)
- return -ENOMEM;
-
- task->dev = dev;
- task->task_proto = dev->tproto;
-
- memcpy(&task->ssp_task, parameter, para_len);
- task->task_done = mvs_task_done;
-
- task->slow_task->timer.function = mvs_tmf_timedout;
- task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
- add_timer(&task->slow_task->timer);
-
- task->tmf = tmf;
-
- res = mvs_queue_command(task, GFP_KERNEL);
-
- if (res) {
- del_timer(&task->slow_task->timer);
- mv_printk("executing internal task failed:%d\n", res);
- goto ex_err;
- }
-
- wait_for_completion(&task->slow_task->completion);
- res = TMF_RESP_FUNC_FAILED;
- /* Even TMF timed out, return direct. */
- if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
- if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
- mv_printk("TMF task[%x] timeout.\n", tmf->tmf);
- goto ex_err;
- }
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_SAM_STAT_GOOD) {
- res = TMF_RESP_FUNC_COMPLETE;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_UNDERRUN) {
- /* no error, but return the number of bytes of
- * underrun */
- res = task->task_status.residual;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_OVERRUN) {
- mv_dprintk("blocked task error.\n");
- res = -EMSGSIZE;
- break;
- } else {
- mv_dprintk(" task to dev %016llx response: 0x%x "
- "status 0x%x\n",
- SAS_ADDR(dev->sas_addr),
- task->task_status.resp,
- task->task_status.stat);
- sas_free_task(task);
- task = NULL;
-
- }
- }
-ex_err:
- BUG_ON(retry == 3 && task != NULL);
- sas_free_task(task);
- return res;
-}
-
-static int mvs_debug_issue_ssp_tmf(struct domain_device *dev,
- u8 *lun, struct sas_tmf_task *tmf)
-{
- struct sas_ssp_task ssp_task;
- if (!(dev->tproto & SAS_PROTOCOL_SSP))
- return TMF_RESP_FUNC_ESUPP;
-
- memcpy(ssp_task.LUN, lun, 8);
-
- return mvs_exec_internal_tmf_task(dev, &ssp_task,
- sizeof(ssp_task), tmf);
-}
-
-
/* Standard mandates link reset for ATA (type 0)
and hard reset for SSP (type 1) , only for RECOVERY */
static int mvs_debug_I_T_nexus_reset(struct domain_device *dev)
@@ -1455,8 +1347,6 @@ int mvs_query_task(struct sas_task *task)
/* mandatory SAM-3, still need free task/slot info */
int mvs_abort_task(struct sas_task *task)
{
- struct scsi_lun lun;
- struct sas_tmf_task tmf_task;
struct domain_device *dev = task->dev;
struct mvs_device *mvi_dev = (struct mvs_device *)dev->lldd_dev;
struct mvs_info *mvi;
@@ -1480,9 +1370,6 @@ int mvs_abort_task(struct sas_task *task)
spin_unlock_irqrestore(&task->task_state_lock, flags);
mvi_dev->dev_status = MVS_DEV_EH;
if (task->lldd_task && task->task_proto & SAS_PROTOCOL_SSP) {
- struct scsi_cmnd * cmnd = (struct scsi_cmnd *)task->uldd_task;
-
- int_to_scsilun(cmnd->device->lun, &lun);
rc = mvs_find_tag(mvi, task, &tag);
if (rc == 0) {
mv_printk("No such tag in %s\n", __func__);
@@ -1490,10 +1377,7 @@ int mvs_abort_task(struct sas_task *task)
return rc;
}

- tmf_task.tmf = TMF_ABORT_TASK;
- tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
-
- rc = mvs_debug_issue_ssp_tmf(dev, lun.scsi_lun, &tmf_task);
+ rc = sas_abort_task(task, tag);

/* if successful, clear the task and callback forwards.*/
if (rc == TMF_RESP_FUNC_COMPLETE) {
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index baa919951db6..6607a8919d1d 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -709,97 +709,6 @@ static void pm8001_tmf_timedout(struct timer_list *t)
}

#define PM8001_TASK_TIMEOUT 20
-/**
- * pm8001_exec_internal_tmf_task - execute some task management commands.
- * @dev: the wanted device.
- * @tmf: which task management wanted to be take.
- * @para_len: para_len.
- * @parameter: ssp task parameter.
- *
- * when errors or exception happened, we may want to do something, for example
- * abort the issued task which result in this exception, it is done by calling
- * this function, note it is also with the task execute interface.
- */
-static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
- void *parameter, u32 para_len, struct sas_tmf_task *tmf)
-{
- int res, retry;
- struct sas_task *task = NULL;
- struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
- struct pm8001_device *pm8001_dev = dev->lldd_dev;
- DECLARE_COMPLETION_ONSTACK(completion_setstate);
-
- for (retry = 0; retry < 3; retry++) {
- task = sas_alloc_slow_task(GFP_KERNEL);
- if (!task)
- return -ENOMEM;
-
- task->dev = dev;
- task->task_proto = dev->tproto;
- memcpy(&task->ssp_task, parameter, para_len);
- task->task_done = pm8001_task_done;
- task->slow_task->timer.function = pm8001_tmf_timedout;
- task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
- add_timer(&task->slow_task->timer);
-
- task->tmf = tmf;
-
- res = pm8001_queue_command(task, GFP_KERNEL);
-
- if (res) {
- del_timer(&task->slow_task->timer);
- pm8001_dbg(pm8001_ha, FAIL, "Executing internal task failed\n");
- goto ex_err;
- }
- wait_for_completion(&task->slow_task->completion);
- if (pm8001_ha->chip_id != chip_8001) {
- pm8001_dev->setds_completion = &completion_setstate;
- PM8001_CHIP_DISP->set_dev_state_req(pm8001_ha,
- pm8001_dev, DS_OPERATIONAL);
- wait_for_completion(&completion_setstate);
- }
- res = -TMF_RESP_FUNC_FAILED;
- /* Even TMF timed out, return direct. */
- if (task->task_state_flags & SAS_TASK_STATE_ABORTED) {
- pm8001_dbg(pm8001_ha, FAIL, "TMF task[%x]timeout.\n",
- tmf->tmf);
- goto ex_err;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_SAM_STAT_GOOD) {
- res = TMF_RESP_FUNC_COMPLETE;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_UNDERRUN) {
- /* no error, but return the number of bytes of
- * underrun */
- res = task->task_status.residual;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_OVERRUN) {
- pm8001_dbg(pm8001_ha, FAIL, "Blocked task error.\n");
- res = -EMSGSIZE;
- break;
- } else {
- pm8001_dbg(pm8001_ha, EH,
- " Task to dev %016llx response:0x%x status 0x%x\n",
- SAS_ADDR(dev->sas_addr),
- task->task_status.resp,
- task->task_status.stat);
- sas_free_task(task);
- task = NULL;
- }
- }
-ex_err:
- BUG_ON(retry == 3 && task != NULL);
- sas_free_task(task);
- return res;
-}

static int
pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
@@ -908,18 +817,6 @@ void pm8001_dev_gone(struct domain_device *dev)
pm8001_dev_gone_notify(dev);
}

-static int pm8001_issue_ssp_tmf(struct domain_device *dev,
- u8 *lun, struct sas_tmf_task *tmf)
-{
- struct sas_ssp_task ssp_task;
- if (!(dev->tproto & SAS_PROTOCOL_SSP))
- return TMF_RESP_FUNC_ESUPP;
-
- memcpy((u8 *)&ssp_task.LUN, lun, 8);
- return pm8001_exec_internal_tmf_task(dev, &ssp_task, sizeof(ssp_task),
- tmf);
-}
-
/* retry commands by ha, by task and/or by device */
void pm8001_open_reject_retry(
struct pm8001_hba_info *pm8001_ha,
@@ -1180,9 +1077,7 @@ int pm8001_abort_task(struct sas_task *task)
u32 tag;
struct domain_device *dev ;
struct pm8001_hba_info *pm8001_ha;
- struct scsi_lun lun;
struct pm8001_device *pm8001_dev;
- struct sas_tmf_task tmf_task;
int rc = TMF_RESP_FUNC_FAILED, ret;
u32 phy_id, port_id;
struct sas_task_slow slow_task;
@@ -1218,11 +1113,7 @@ int pm8001_abort_task(struct sas_task *task)
}
spin_unlock_irqrestore(&task->task_state_lock, flags);
if (task->task_proto & SAS_PROTOCOL_SSP) {
- struct scsi_cmnd *cmnd = task->uldd_task;
- int_to_scsilun(cmnd->device->lun, &lun);
- tmf_task.tmf = TMF_ABORT_TASK;
- tmf_task.tag_of_task_to_be_managed = tag;
- rc = pm8001_issue_ssp_tmf(dev, lun.scsi_lun, &tmf_task);
+ rc = sas_abort_task(task, tag);
pm8001_exec_internal_task_abort(pm8001_ha, pm8001_dev,
pm8001_dev->sas_device, 0, tag);
} else if (task->task_proto & SAS_PROTOCOL_SATA ||
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 7d524907d546..8a9ffe689d1f 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -728,6 +728,7 @@ int sas_abort_task_set(struct domain_device *dev, u8 *lun);
int sas_clear_task_set(struct domain_device *dev, u8 *lun);
int sas_lu_reset(struct domain_device *dev, u8 *lun);
int sas_query_task(struct sas_task *task, u16 tag);
+int sas_abort_task(struct sas_task *task, u16 tag);

int sas_notify_port_event(struct asd_sas_phy *phy, enum port_event event,
gfp_t gfp_flags);
--
2.26.2

2022-01-25 16:45:00

by John Garry

[permalink] [raw]
Subject: [PATCH 16/16] scsi: libsas: Add sas_execute_ata_cmd()

Add a function to execute an ATA command using the TMF code, and use in
the hisi_sas driver. That driver needs to be able to issue the command on
a specific phy, so add an interface for that.

With that, hisi_sas_exec_internal_tmf_task() may be deleted.

Signed-off-by: John Garry <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 129 +------------------------
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 5 +-
drivers/scsi/libsas/sas_ata.c | 8 ++
drivers/scsi/libsas/sas_scsi_host.c | 6 ++
include/scsi/libsas.h | 10 +-
5 files changed, 26 insertions(+), 132 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 359a5718fc59..c32af187e452 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1245,124 +1245,7 @@ static void hisi_sas_tmf_timedout(struct timer_list *t)
complete(&task->slow_task->completion);
}

-#define TASK_TIMEOUT (20 * HZ)
-#define TASK_RETRY 3
#define INTERNAL_ABORT_TIMEOUT (6 * HZ)
-static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
- void *parameter, u32 para_len,
- struct sas_tmf_task *tmf)
-{
- struct hisi_sas_device *sas_dev = device->lldd_dev;
- struct hisi_hba *hisi_hba = sas_dev->hisi_hba;
- struct device *dev = hisi_hba->dev;
- struct sas_task *task;
- int res, retry;
-
- for (retry = 0; retry < TASK_RETRY; retry++) {
- task = sas_alloc_slow_task(GFP_KERNEL);
- if (!task)
- return -ENOMEM;
-
- task->dev = device;
- task->task_proto = device->tproto;
-
- if (dev_is_sata(device)) {
- task->ata_task.device_control_reg_update = 1;
- memcpy(&task->ata_task.fis, parameter, para_len);
- } else {
- memcpy(&task->ssp_task, parameter, para_len);
- }
- task->task_done = hisi_sas_task_done;
-
- task->slow_task->timer.function = hisi_sas_tmf_timedout;
- task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
- add_timer(&task->slow_task->timer);
-
- task->tmf = tmf;
-
- res = hisi_sas_queue_command(task, GFP_KERNEL);
- if (res) {
- del_timer_sync(&task->slow_task->timer);
- dev_err(dev, "abort tmf: executing internal task failed: %d\n",
- res);
- goto ex_err;
- }
-
- wait_for_completion(&task->slow_task->completion);
- res = TMF_RESP_FUNC_FAILED;
- /* Even TMF timed out, return direct. */
- if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
- if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
- struct hisi_sas_slot *slot = task->lldd_task;
-
- dev_err(dev, "abort tmf: TMF task timeout and not done\n");
- if (slot) {
- struct hisi_sas_cq *cq =
- &hisi_hba->cq[slot->dlvry_queue];
- /*
- * sync irq to avoid free'ing task
- * before using task in IO completion
- */
- synchronize_irq(cq->irq_no);
- slot->task = NULL;
- }
-
- goto ex_err;
- } else
- dev_err(dev, "abort tmf: TMF task timeout\n");
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
- res = TMF_RESP_FUNC_COMPLETE;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == TMF_RESP_FUNC_SUCC) {
- res = TMF_RESP_FUNC_SUCC;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_UNDERRUN) {
- /* no error, but return the number of bytes of
- * underrun
- */
- dev_warn(dev, "abort tmf: task to dev %016llx resp: 0x%x sts 0x%x underrun\n",
- SAS_ADDR(device->sas_addr),
- task->task_status.resp,
- task->task_status.stat);
- res = task->task_status.residual;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_DATA_OVERRUN) {
- dev_warn(dev, "abort tmf: blocked task error\n");
- res = -EMSGSIZE;
- break;
- }
-
- if (task->task_status.resp == SAS_TASK_COMPLETE &&
- task->task_status.stat == SAS_OPEN_REJECT) {
- dev_warn(dev, "abort tmf: open reject failed\n");
- res = -EIO;
- } else {
- dev_warn(dev, "abort tmf: task to dev %016llx resp: 0x%x status 0x%x\n",
- SAS_ADDR(device->sas_addr),
- task->task_status.resp,
- task->task_status.stat);
- }
- sas_free_task(task);
- task = NULL;
- }
-ex_err:
- if (retry == TASK_RETRY)
- dev_warn(dev, "abort tmf: executing internal task failed!\n");
- sas_free_task(task);
- return res;
-}

static void hisi_sas_fill_ata_reset_cmd(struct ata_device *dev,
bool reset, int pmp, u8 *fis)
@@ -1386,13 +1269,12 @@ static int hisi_sas_softreset_ata_disk(struct domain_device *device)
int rc = TMF_RESP_FUNC_FAILED;
struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
struct device *dev = hisi_hba->dev;
- int s = sizeof(struct host_to_dev_fis);

ata_for_each_link(link, ap, EDGE) {
int pmp = sata_srst_pmp(link);

hisi_sas_fill_ata_reset_cmd(link->device, 1, pmp, fis);
- rc = hisi_sas_exec_internal_tmf_task(device, fis, s, NULL);
+ rc = sas_execute_ata_cmd(device, fis, -1);
if (rc != TMF_RESP_FUNC_COMPLETE)
break;
}
@@ -1402,8 +1284,7 @@ static int hisi_sas_softreset_ata_disk(struct domain_device *device)
int pmp = sata_srst_pmp(link);

hisi_sas_fill_ata_reset_cmd(link->device, 0, pmp, fis);
- rc = hisi_sas_exec_internal_tmf_task(device, fis,
- s, NULL);
+ rc = sas_execute_ata_cmd(device, fis, -1);
if (rc != TMF_RESP_FUNC_COMPLETE)
dev_err(dev, "ata disk %016llx de-reset failed\n",
SAS_ADDR(device->sas_addr));
@@ -1513,10 +1394,8 @@ static void hisi_sas_send_ata_reset_each_phy(struct hisi_hba *hisi_hba,
struct asd_sas_port *sas_port,
struct domain_device *device)
{
- struct sas_tmf_task tmf_task = { .force_phy = 1 };
struct ata_port *ap = device->sata_dev.ap;
struct device *dev = hisi_hba->dev;
- int s = sizeof(struct host_to_dev_fis);
int rc = TMF_RESP_FUNC_FAILED;
struct ata_link *link;
u8 fis[20] = {0};
@@ -1529,10 +1408,8 @@ static void hisi_sas_send_ata_reset_each_phy(struct hisi_hba *hisi_hba,
ata_for_each_link(link, ap, EDGE) {
int pmp = sata_srst_pmp(link);

- tmf_task.phy_id = i;
hisi_sas_fill_ata_reset_cmd(link->device, 1, pmp, fis);
- rc = hisi_sas_exec_internal_tmf_task(device, fis, s,
- &tmf_task);
+ rc = sas_execute_ata_cmd(device, fis, i);
if (rc != TMF_RESP_FUNC_COMPLETE) {
dev_err(dev, "phy%d ata reset failed rc=%d\n",
i, rc);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index ef5d8f64edd8..d8f58d92df4a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -2492,6 +2492,7 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_cmd_hdr *hdr = slot->cmd_hdr;
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_port *port = to_hisi_sas_port(sas_port);
+ struct sas_ata_task *ata_task = &task->ata_task;
struct sas_tmf_task *tmf = slot->tmf;
u8 *buf_cmd;
int has_data = 0, hdr_tag = 0;
@@ -2505,9 +2506,9 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
else
dw0 |= 4 << CMD_HDR_CMD_OFF;

- if (tmf && tmf->force_phy) {
+ if (tmf && ata_task->force_phy) {
dw0 |= CMD_HDR_FORCE_PHY_MSK;
- dw0 |= (1 << tmf->phy_id) << CMD_HDR_PHY_ID_OFF;
+ dw0 |= (1 << ata_task->force_phy_id) << CMD_HDR_PHY_ID_OFF;
}

hdr->dw0 = cpu_to_le32(dw0);
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index a315715b3622..350d5e29e25c 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -856,3 +856,11 @@ void sas_ata_wait_eh(struct domain_device *dev)
ap = dev->sata_dev.ap;
ata_port_wait_eh(ap);
}
+
+int sas_execute_ata_cmd(struct domain_device *device, u8 *fis, int force_phy_id)
+{
+ struct sas_tmf_task tmf_task = {};
+ return sas_execute_tmf(device, fis, sizeof(struct host_to_dev_fis),
+ force_phy_id, &tmf_task);
+}
+EXPORT_SYMBOL_GPL(sas_execute_ata_cmd);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 3e2e3bcbbf67..7cc80a025702 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -939,6 +939,12 @@ int sas_execute_tmf(struct domain_device *device, void *parameter,
task->task_proto = device->tproto;

if (dev_is_sata(device)) {
+ task->ata_task.device_control_reg_update = 1;
+ if (force_phy_id >= 0) {
+ task->ata_task.force_phy = true;
+ task->ata_task.force_phy_id = force_phy_id;
+ }
+ memcpy(&task->ata_task.fis, parameter, para_len);
} else {
memcpy(&task->ssp_task, parameter, para_len);
}
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8a9ffe689d1f..02d28cd691e4 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -553,6 +553,9 @@ struct sas_ata_task {
u8 stp_affil_pol:1;

u8 device_control_reg_update:1;
+
+ bool force_phy;
+ int force_phy_id;
};

struct sas_smp_task {
@@ -580,10 +583,6 @@ struct sas_ssp_task {
struct sas_tmf_task {
u8 tmf;
u16 tag_of_task_to_be_managed;
-
- /* Temp */
- int force_phy;
- int phy_id;
};

struct sas_task {
@@ -729,10 +728,13 @@ int sas_clear_task_set(struct domain_device *dev, u8 *lun);
int sas_lu_reset(struct domain_device *dev, u8 *lun);
int sas_query_task(struct sas_task *task, u16 tag);
int sas_abort_task(struct sas_task *task, u16 tag);
+int sas_execute_ata_cmd(struct domain_device *device, u8 *fis,
+ int force_phy_id);

int sas_notify_port_event(struct asd_sas_phy *phy, enum port_event event,
gfp_t gfp_flags);
int sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event,
gfp_t gfp_flags);

+
#endif /* _SASLIB_H_ */
--
2.26.2

2022-01-25 16:51:38

by Jinpu Wang

[permalink] [raw]
Subject: Re: [PATCH 02/16] scsi: libsas: Delete lldd_clear_aca callback

On Tue, Jan 25, 2022 at 12:38 PM John Garry <[email protected]> wrote:
>
> This callback is never called, so remove support.
>
> Signed-off-by: John Garry <[email protected]>
Indeed.
Reviewed-by: Jack Wang <[email protected]>
Thx!
> ---
> Documentation/scsi/libsas.rst | 2 --
> drivers/scsi/aic94xx/aic94xx.h | 1 -
> drivers/scsi/aic94xx/aic94xx_init.c | 1 -
> drivers/scsi/aic94xx/aic94xx_tmf.c | 9 ---------
> drivers/scsi/hisi_sas/hisi_sas_main.c | 12 ------------
> drivers/scsi/isci/init.c | 1 -
> drivers/scsi/isci/task.c | 18 ------------------
> drivers/scsi/isci/task.h | 4 ----
> drivers/scsi/mvsas/mv_init.c | 1 -
> drivers/scsi/mvsas/mv_sas.c | 11 -----------
> drivers/scsi/mvsas/mv_sas.h | 1 -
> drivers/scsi/pm8001/pm8001_init.c | 1 -
> drivers/scsi/pm8001/pm8001_sas.c | 8 --------
> drivers/scsi/pm8001/pm8001_sas.h | 1 -
> include/scsi/libsas.h | 1 -
> 15 files changed, 72 deletions(-)
>
> diff --git a/Documentation/scsi/libsas.rst b/Documentation/scsi/libsas.rst
> index 6589dfefbc02..305a253d5c3b 100644
> --- a/Documentation/scsi/libsas.rst
> +++ b/Documentation/scsi/libsas.rst
> @@ -207,7 +207,6 @@ Management Functions (TMFs) described in SAM::
> /* Task Management Functions. Must be called from process context. */
> int (*lldd_abort_task)(struct sas_task *);
> int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
> - int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
> int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
> int (*lldd_I_T_nexus_reset)(struct domain_device *);
> int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
> @@ -262,7 +261,6 @@ can look like this (called last thing from probe())
>
> my_ha->sas_ha.lldd_abort_task = my_abort_task;
> my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set;
> - my_ha->sas_ha.lldd_clear_aca = my_clear_aca;
> my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set;
> my_ha->sas_ha.lldd_I_T_nexus_reset= NULL; (2)
> my_ha->sas_ha.lldd_lu_reset = my_lu_reset;
> diff --git a/drivers/scsi/aic94xx/aic94xx.h b/drivers/scsi/aic94xx/aic94xx.h
> index 8f24180646c2..f595bc2ee45e 100644
> --- a/drivers/scsi/aic94xx/aic94xx.h
> +++ b/drivers/scsi/aic94xx/aic94xx.h
> @@ -60,7 +60,6 @@ void asd_set_dmamode(struct domain_device *dev);
> /* ---------- TMFs ---------- */
> int asd_abort_task(struct sas_task *);
> int asd_abort_task_set(struct domain_device *, u8 *lun);
> -int asd_clear_aca(struct domain_device *, u8 *lun);
> int asd_clear_task_set(struct domain_device *, u8 *lun);
> int asd_lu_reset(struct domain_device *, u8 *lun);
> int asd_I_T_nexus_reset(struct domain_device *dev);
> diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
> index 7a78606598c4..954d0c5ae2e2 100644
> --- a/drivers/scsi/aic94xx/aic94xx_init.c
> +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> @@ -960,7 +960,6 @@ static struct sas_domain_function_template aic94xx_transport_functions = {
>
> .lldd_abort_task = asd_abort_task,
> .lldd_abort_task_set = asd_abort_task_set,
> - .lldd_clear_aca = asd_clear_aca,
> .lldd_clear_task_set = asd_clear_task_set,
> .lldd_I_T_nexus_reset = asd_I_T_nexus_reset,
> .lldd_lu_reset = asd_lu_reset,
> diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
> index 0eb6e206a2b4..0ff0d6506ccf 100644
> --- a/drivers/scsi/aic94xx/aic94xx_tmf.c
> +++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
> @@ -644,15 +644,6 @@ int asd_abort_task_set(struct domain_device *dev, u8 *lun)
> return res;
> }
>
> -int asd_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_ACA, 0);
> -
> - if (res == TMF_RESP_FUNC_COMPLETE)
> - asd_clear_nexus_I_T_L(dev, lun);
> - return res;
> -}
> -
> int asd_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_TASK_SET, 0);
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index a05ec7aece5a..f014e458edbb 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -1801,17 +1801,6 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
> return rc;
> }
>
> -static int hisi_sas_clear_aca(struct domain_device *device, u8 *lun)
> -{
> - struct hisi_sas_tmf_task tmf_task;
> - int rc;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
> -
> - return rc;
> -}
> -
> #define I_T_NEXUS_RESET_PHYUP_TIMEOUT (2 * HZ)
>
> static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device)
> @@ -2341,7 +2330,6 @@ static struct sas_domain_function_template hisi_sas_transport_ops = {
> .lldd_control_phy = hisi_sas_control_phy,
> .lldd_abort_task = hisi_sas_abort_task,
> .lldd_abort_task_set = hisi_sas_abort_task_set,
> - .lldd_clear_aca = hisi_sas_clear_aca,
> .lldd_I_T_nexus_reset = hisi_sas_I_T_nexus_reset,
> .lldd_lu_reset = hisi_sas_lu_reset,
> .lldd_query_task = hisi_sas_query_task,
> diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
> index aade707c5553..e294d5d961eb 100644
> --- a/drivers/scsi/isci/init.c
> +++ b/drivers/scsi/isci/init.c
> @@ -193,7 +193,6 @@ static struct sas_domain_function_template isci_transport_ops = {
> /* Task Management Functions. Must be called from process context. */
> .lldd_abort_task = isci_task_abort_task,
> .lldd_abort_task_set = isci_task_abort_task_set,
> - .lldd_clear_aca = isci_task_clear_aca,
> .lldd_clear_task_set = isci_task_clear_task_set,
> .lldd_I_T_nexus_reset = isci_task_I_T_nexus_reset,
> .lldd_lu_reset = isci_task_lu_reset,
> diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
> index 3fd88d72a0c0..403bfee34d84 100644
> --- a/drivers/scsi/isci/task.c
> +++ b/drivers/scsi/isci/task.c
> @@ -625,24 +625,6 @@ int isci_task_abort_task_set(
> }
>
>
> -/**
> - * isci_task_clear_aca() - This function is one of the SAS Domain Template
> - * functions. This is one of the Task Management functoins called by libsas.
> - * @d_device: This parameter specifies the domain device associated with this
> - * request.
> - * @lun: This parameter specifies the lun associated with this request.
> - *
> - * status, zero indicates success.
> - */
> -int isci_task_clear_aca(
> - struct domain_device *d_device,
> - u8 *lun)
> -{
> - return TMF_RESP_FUNC_FAILED;
> -}
> -
> -
> -
> /**
> * isci_task_clear_task_set() - This function is one of the SAS Domain Template
> * functions. This is one of the Task Management functoins called by libsas.
> diff --git a/drivers/scsi/isci/task.h b/drivers/scsi/isci/task.h
> index cae168b8916f..f96633fa6939 100644
> --- a/drivers/scsi/isci/task.h
> +++ b/drivers/scsi/isci/task.h
> @@ -140,10 +140,6 @@ int isci_task_abort_task_set(
> struct domain_device *d_device,
> u8 *lun);
>
> -int isci_task_clear_aca(
> - struct domain_device *d_device,
> - u8 *lun);
> -
> int isci_task_clear_task_set(
> struct domain_device *d_device,
> u8 *lun);
> diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> index dcae2d4464f9..21429e3a3632 100644
> --- a/drivers/scsi/mvsas/mv_init.c
> +++ b/drivers/scsi/mvsas/mv_init.c
> @@ -64,7 +64,6 @@ static struct sas_domain_function_template mvs_transport_ops = {
>
> .lldd_abort_task = mvs_abort_task,
> .lldd_abort_task_set = mvs_abort_task_set,
> - .lldd_clear_aca = mvs_clear_aca,
> .lldd_clear_task_set = mvs_clear_task_set,
> .lldd_I_T_nexus_reset = mvs_I_T_nexus_reset,
> .lldd_lu_reset = mvs_lu_reset,
> diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
> index 1e52bc7febfa..fd273c3e069e 100644
> --- a/drivers/scsi/mvsas/mv_sas.c
> +++ b/drivers/scsi/mvsas/mv_sas.c
> @@ -1553,17 +1553,6 @@ int mvs_abort_task_set(struct domain_device *dev, u8 *lun)
> return rc;
> }
>
> -int mvs_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - int rc = TMF_RESP_FUNC_FAILED;
> - struct mvs_tmf_task tmf_task;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - rc = mvs_debug_issue_ssp_tmf(dev, lun, &tmf_task);
> -
> - return rc;
> -}
> -
> int mvs_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> int rc = TMF_RESP_FUNC_FAILED;
> diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
> index 8ff976c9967e..fa654c73beee 100644
> --- a/drivers/scsi/mvsas/mv_sas.h
> +++ b/drivers/scsi/mvsas/mv_sas.h
> @@ -441,7 +441,6 @@ int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time);
> int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags);
> int mvs_abort_task(struct sas_task *task);
> int mvs_abort_task_set(struct domain_device *dev, u8 *lun);
> -int mvs_clear_aca(struct domain_device *dev, u8 *lun);
> int mvs_clear_task_set(struct domain_device *dev, u8 * lun);
> void mvs_port_formed(struct asd_sas_phy *sas_phy);
> void mvs_port_deformed(struct asd_sas_phy *sas_phy);
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index d8a2121cb8d9..b8cf1bae4040 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -123,7 +123,6 @@ static struct sas_domain_function_template pm8001_transport_ops = {
>
> .lldd_abort_task = pm8001_abort_task,
> .lldd_abort_task_set = pm8001_abort_task_set,
> - .lldd_clear_aca = pm8001_clear_aca,
> .lldd_clear_task_set = pm8001_clear_task_set,
> .lldd_I_T_nexus_reset = pm8001_I_T_nexus_reset,
> .lldd_lu_reset = pm8001_lu_reset,
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index 160ee8b228c9..4368122ab475 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -1357,14 +1357,6 @@ int pm8001_abort_task_set(struct domain_device *dev, u8 *lun)
> return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
> }
>
> -int pm8001_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - struct pm8001_tmf_task tmf_task;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
> -}
> -
> int pm8001_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> struct pm8001_tmf_task tmf_task;
> diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
> index a17da1cebce1..3ea53a0d0cc1 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.h
> +++ b/drivers/scsi/pm8001/pm8001_sas.h
> @@ -649,7 +649,6 @@ int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time);
> int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags);
> int pm8001_abort_task(struct sas_task *task);
> int pm8001_abort_task_set(struct domain_device *dev, u8 *lun);
> -int pm8001_clear_aca(struct domain_device *dev, u8 *lun);
> int pm8001_clear_task_set(struct domain_device *dev, u8 *lun);
> int pm8001_dev_found(struct domain_device *dev);
> void pm8001_dev_gone(struct domain_device *dev);
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 698f2032807b..d59618898619 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -637,7 +637,6 @@ struct sas_domain_function_template {
> /* Task Management Functions. Must be called from process context. */
> int (*lldd_abort_task)(struct sas_task *);
> int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
> - int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
> int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
> int (*lldd_I_T_nexus_reset)(struct domain_device *);
> int (*lldd_ata_check_ready)(struct domain_device *);
> --
> 2.26.2
>

2022-01-27 12:47:31

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 1/25/22 20:32, John Garry wrote:
> The LLDD TMF code is almost identical between hisi_sas, pm8001, and mvsas
> drivers.
>
> This series factors out that code into libsas, thus reducing much
> duplication and giving a net reduction of ~250 LoC.
>
> There are some subtle differences between the core TMF handler and each
> of the LLDDs old implementation, so any review and testing is appreciated.
>
> Some other minor patches are thrown in:
> - Delete unused macro in hisi_sas driver
> - Delete unused libsas callback
> - Add enum for response frame datapres field
>
> I have another follow-up series to factor out the internal abort code,
> which is common to hisi_sas and pm8001 drivers.
>
> Based on v5.17-rc1
>
> John Garry (16):
> scsi: libsas: Use enum for response frame DATAPRES field
> scsi: libsas: Delete lldd_clear_aca callback
> scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
> scsi: libsas: Move SMP task handlers to core
> scsi: libsas: Add struct sas_tmf_task
> scsi: libsas: Add sas_task.tmf
> scsi: libsas: Add sas_execute_tmf()
> scsi: libsas: Add sas_execute_ssp_tmf()
> scsi: libsas: Add TMF handler exec complete callback
> scsi: libsas: Add TMF handler aborted callback
> scsi: libsas: Add sas_abort_task_set()
> scsi: libsas: Add sas_clear_task_set()
> scsi: libsas: Add sas_lu_reset()
> scsi: libsas: Add sas_query_task()
> scsi: libsas: Add sas_abort_task()
> scsi: libsas: Add sas_execute_ata_cmd()
>
> Documentation/scsi/libsas.rst | 2 -
> drivers/scsi/aic94xx/aic94xx.h | 1 -
> drivers/scsi/aic94xx/aic94xx_init.c | 1 -
> drivers/scsi/aic94xx/aic94xx_tmf.c | 9 -
> drivers/scsi/hisi_sas/hisi_sas.h | 9 +-
> drivers/scsi/hisi_sas/hisi_sas_main.c | 235 ++++---------------------
> drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +-
> drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 9 +-
> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 +-
> drivers/scsi/isci/init.c | 1 -
> drivers/scsi/isci/task.c | 18 --
> drivers/scsi/isci/task.h | 4 -
> drivers/scsi/libsas/sas_ata.c | 8 +
> drivers/scsi/libsas/sas_expander.c | 24 +--
> drivers/scsi/libsas/sas_internal.h | 6 +
> drivers/scsi/libsas/sas_scsi_host.c | 220 +++++++++++++++++++++++
> drivers/scsi/libsas/sas_task.c | 12 +-
> drivers/scsi/mvsas/mv_defs.h | 5 -
> drivers/scsi/mvsas/mv_init.c | 5 +-
> drivers/scsi/mvsas/mv_sas.c | 177 +------------------
> drivers/scsi/mvsas/mv_sas.h | 3 -
> drivers/scsi/pm8001/pm8001_hwi.c | 4 +-
> drivers/scsi/pm8001/pm8001_init.c | 4 +-
> drivers/scsi/pm8001/pm8001_sas.c | 180 +++----------------
> drivers/scsi/pm8001/pm8001_sas.h | 13 +-
> include/scsi/libsas.h | 23 ++-
> 26 files changed, 353 insertions(+), 624 deletions(-)
>

John,

I did some light testing of this series (boot + some fio runs) and
everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
SAS/SATA HBA (rev 06)" HBA (x86_64 host).

Of note is that "make W=1 M=drivers/scsi" complains with:

drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or
member 'circularQ' not described in 'process_one_iomb'

And sparse/make C=1 complains about:

drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
'sas_form_port' - different lock contexts for basic block

But I have not checked if it is something that your series touch.

And there is a ton of complaints about __le32 use in the pm80xx code...
I can try to have a look at these if you want, on top of your series.

Cheers.

--
Damien Le Moal
Western Digital Research

2022-01-27 17:56:51

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 27/01/2022 06:37, Damien Le Moal wrote:

Hi Damien,

> I did some light testing of this series (boot + some fio runs) and
> everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
> SAS/SATA HBA (rev 06)" HBA (x86_64 host).

Yeah, unfortunately these steps prob won't exercise much of the code
changes here since I figure error handling would not kick in.

However using this same adapter type on my arm64 system has error
handling kick in almost straight away - and the handling looks sane. A
silver lining, I suppose ..

>
> Of note is that "make W=1 M=drivers/scsi" complains with:
>
> drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or
> member 'circularQ' not described in 'process_one_iomb'

That's per-existing. I'll send a patch for that now along with another
fix I found for that driver. ....

>
> And sparse/make C=1 complains about:
>
> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
> 'sas_form_port' - different lock contexts for basic block

I think it's talking about the port->phy_list_lock usage - it prob
doesn't like segments where we fall out a loop with the lock held (which
was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.

>
> But I have not checked if it is something that your series touch.
>
> And there is a ton of complaints about __le32 use in the pm80xx code...
> I can try to have a look at these if you want, on top of your series.

I really need to get make C=1 working for me - it segfaults in any env I
have :(

Thanks,
John

2022-01-27 17:56:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/16] scsi: libsas: Delete lldd_clear_aca callback

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-27 17:58:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 16/16] scsi: libsas: Add sas_execute_ata_cmd()

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-27 17:58:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 06/16] scsi: libsas: Add sas_task.tmf

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-27 17:59:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 15/16] scsi: libsas: Add sas_abort_task()

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-27 17:59:33

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 11/16] scsi: libsas: Add sas_abort_task_set()

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-27 20:03:54

by chenxiang (M)

[permalink] [raw]
Subject: Re: [PATCH 02/16] scsi: libsas: Delete lldd_clear_aca callback



?? 2022/1/25 19:32, John Garry д??:
> This callback is never called, so remove support.
>
> Signed-off-by: John Garry <[email protected]>

Reviewed-by: Xiang Chen <[email protected]>

> ---
> Documentation/scsi/libsas.rst | 2 --
> drivers/scsi/aic94xx/aic94xx.h | 1 -
> drivers/scsi/aic94xx/aic94xx_init.c | 1 -
> drivers/scsi/aic94xx/aic94xx_tmf.c | 9 ---------
> drivers/scsi/hisi_sas/hisi_sas_main.c | 12 ------------
> drivers/scsi/isci/init.c | 1 -
> drivers/scsi/isci/task.c | 18 ------------------
> drivers/scsi/isci/task.h | 4 ----
> drivers/scsi/mvsas/mv_init.c | 1 -
> drivers/scsi/mvsas/mv_sas.c | 11 -----------
> drivers/scsi/mvsas/mv_sas.h | 1 -
> drivers/scsi/pm8001/pm8001_init.c | 1 -
> drivers/scsi/pm8001/pm8001_sas.c | 8 --------
> drivers/scsi/pm8001/pm8001_sas.h | 1 -
> include/scsi/libsas.h | 1 -
> 15 files changed, 72 deletions(-)
>
> diff --git a/Documentation/scsi/libsas.rst b/Documentation/scsi/libsas.rst
> index 6589dfefbc02..305a253d5c3b 100644
> --- a/Documentation/scsi/libsas.rst
> +++ b/Documentation/scsi/libsas.rst
> @@ -207,7 +207,6 @@ Management Functions (TMFs) described in SAM::
> /* Task Management Functions. Must be called from process context. */
> int (*lldd_abort_task)(struct sas_task *);
> int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
> - int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
> int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
> int (*lldd_I_T_nexus_reset)(struct domain_device *);
> int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
> @@ -262,7 +261,6 @@ can look like this (called last thing from probe())
>
> my_ha->sas_ha.lldd_abort_task = my_abort_task;
> my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set;
> - my_ha->sas_ha.lldd_clear_aca = my_clear_aca;
> my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set;
> my_ha->sas_ha.lldd_I_T_nexus_reset= NULL; (2)
> my_ha->sas_ha.lldd_lu_reset = my_lu_reset;
> diff --git a/drivers/scsi/aic94xx/aic94xx.h b/drivers/scsi/aic94xx/aic94xx.h
> index 8f24180646c2..f595bc2ee45e 100644
> --- a/drivers/scsi/aic94xx/aic94xx.h
> +++ b/drivers/scsi/aic94xx/aic94xx.h
> @@ -60,7 +60,6 @@ void asd_set_dmamode(struct domain_device *dev);
> /* ---------- TMFs ---------- */
> int asd_abort_task(struct sas_task *);
> int asd_abort_task_set(struct domain_device *, u8 *lun);
> -int asd_clear_aca(struct domain_device *, u8 *lun);
> int asd_clear_task_set(struct domain_device *, u8 *lun);
> int asd_lu_reset(struct domain_device *, u8 *lun);
> int asd_I_T_nexus_reset(struct domain_device *dev);
> diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
> index 7a78606598c4..954d0c5ae2e2 100644
> --- a/drivers/scsi/aic94xx/aic94xx_init.c
> +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> @@ -960,7 +960,6 @@ static struct sas_domain_function_template aic94xx_transport_functions = {
>
> .lldd_abort_task = asd_abort_task,
> .lldd_abort_task_set = asd_abort_task_set,
> - .lldd_clear_aca = asd_clear_aca,
> .lldd_clear_task_set = asd_clear_task_set,
> .lldd_I_T_nexus_reset = asd_I_T_nexus_reset,
> .lldd_lu_reset = asd_lu_reset,
> diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
> index 0eb6e206a2b4..0ff0d6506ccf 100644
> --- a/drivers/scsi/aic94xx/aic94xx_tmf.c
> +++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
> @@ -644,15 +644,6 @@ int asd_abort_task_set(struct domain_device *dev, u8 *lun)
> return res;
> }
>
> -int asd_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_ACA, 0);
> -
> - if (res == TMF_RESP_FUNC_COMPLETE)
> - asd_clear_nexus_I_T_L(dev, lun);
> - return res;
> -}
> -
> int asd_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> int res = asd_initiate_ssp_tmf(dev, lun, TMF_CLEAR_TASK_SET, 0);
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index a05ec7aece5a..f014e458edbb 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -1801,17 +1801,6 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
> return rc;
> }
>
> -static int hisi_sas_clear_aca(struct domain_device *device, u8 *lun)
> -{
> - struct hisi_sas_tmf_task tmf_task;
> - int rc;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
> -
> - return rc;
> -}
> -
> #define I_T_NEXUS_RESET_PHYUP_TIMEOUT (2 * HZ)
>
> static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device)
> @@ -2341,7 +2330,6 @@ static struct sas_domain_function_template hisi_sas_transport_ops = {
> .lldd_control_phy = hisi_sas_control_phy,
> .lldd_abort_task = hisi_sas_abort_task,
> .lldd_abort_task_set = hisi_sas_abort_task_set,
> - .lldd_clear_aca = hisi_sas_clear_aca,
> .lldd_I_T_nexus_reset = hisi_sas_I_T_nexus_reset,
> .lldd_lu_reset = hisi_sas_lu_reset,
> .lldd_query_task = hisi_sas_query_task,
> diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
> index aade707c5553..e294d5d961eb 100644
> --- a/drivers/scsi/isci/init.c
> +++ b/drivers/scsi/isci/init.c
> @@ -193,7 +193,6 @@ static struct sas_domain_function_template isci_transport_ops = {
> /* Task Management Functions. Must be called from process context. */
> .lldd_abort_task = isci_task_abort_task,
> .lldd_abort_task_set = isci_task_abort_task_set,
> - .lldd_clear_aca = isci_task_clear_aca,
> .lldd_clear_task_set = isci_task_clear_task_set,
> .lldd_I_T_nexus_reset = isci_task_I_T_nexus_reset,
> .lldd_lu_reset = isci_task_lu_reset,
> diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
> index 3fd88d72a0c0..403bfee34d84 100644
> --- a/drivers/scsi/isci/task.c
> +++ b/drivers/scsi/isci/task.c
> @@ -625,24 +625,6 @@ int isci_task_abort_task_set(
> }
>
>
> -/**
> - * isci_task_clear_aca() - This function is one of the SAS Domain Template
> - * functions. This is one of the Task Management functoins called by libsas.
> - * @d_device: This parameter specifies the domain device associated with this
> - * request.
> - * @lun: This parameter specifies the lun associated with this request.
> - *
> - * status, zero indicates success.
> - */
> -int isci_task_clear_aca(
> - struct domain_device *d_device,
> - u8 *lun)
> -{
> - return TMF_RESP_FUNC_FAILED;
> -}
> -
> -
> -
> /**
> * isci_task_clear_task_set() - This function is one of the SAS Domain Template
> * functions. This is one of the Task Management functoins called by libsas.
> diff --git a/drivers/scsi/isci/task.h b/drivers/scsi/isci/task.h
> index cae168b8916f..f96633fa6939 100644
> --- a/drivers/scsi/isci/task.h
> +++ b/drivers/scsi/isci/task.h
> @@ -140,10 +140,6 @@ int isci_task_abort_task_set(
> struct domain_device *d_device,
> u8 *lun);
>
> -int isci_task_clear_aca(
> - struct domain_device *d_device,
> - u8 *lun);
> -
> int isci_task_clear_task_set(
> struct domain_device *d_device,
> u8 *lun);
> diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> index dcae2d4464f9..21429e3a3632 100644
> --- a/drivers/scsi/mvsas/mv_init.c
> +++ b/drivers/scsi/mvsas/mv_init.c
> @@ -64,7 +64,6 @@ static struct sas_domain_function_template mvs_transport_ops = {
>
> .lldd_abort_task = mvs_abort_task,
> .lldd_abort_task_set = mvs_abort_task_set,
> - .lldd_clear_aca = mvs_clear_aca,
> .lldd_clear_task_set = mvs_clear_task_set,
> .lldd_I_T_nexus_reset = mvs_I_T_nexus_reset,
> .lldd_lu_reset = mvs_lu_reset,
> diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
> index 1e52bc7febfa..fd273c3e069e 100644
> --- a/drivers/scsi/mvsas/mv_sas.c
> +++ b/drivers/scsi/mvsas/mv_sas.c
> @@ -1553,17 +1553,6 @@ int mvs_abort_task_set(struct domain_device *dev, u8 *lun)
> return rc;
> }
>
> -int mvs_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - int rc = TMF_RESP_FUNC_FAILED;
> - struct mvs_tmf_task tmf_task;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - rc = mvs_debug_issue_ssp_tmf(dev, lun, &tmf_task);
> -
> - return rc;
> -}
> -
> int mvs_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> int rc = TMF_RESP_FUNC_FAILED;
> diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
> index 8ff976c9967e..fa654c73beee 100644
> --- a/drivers/scsi/mvsas/mv_sas.h
> +++ b/drivers/scsi/mvsas/mv_sas.h
> @@ -441,7 +441,6 @@ int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time);
> int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags);
> int mvs_abort_task(struct sas_task *task);
> int mvs_abort_task_set(struct domain_device *dev, u8 *lun);
> -int mvs_clear_aca(struct domain_device *dev, u8 *lun);
> int mvs_clear_task_set(struct domain_device *dev, u8 * lun);
> void mvs_port_formed(struct asd_sas_phy *sas_phy);
> void mvs_port_deformed(struct asd_sas_phy *sas_phy);
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index d8a2121cb8d9..b8cf1bae4040 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -123,7 +123,6 @@ static struct sas_domain_function_template pm8001_transport_ops = {
>
> .lldd_abort_task = pm8001_abort_task,
> .lldd_abort_task_set = pm8001_abort_task_set,
> - .lldd_clear_aca = pm8001_clear_aca,
> .lldd_clear_task_set = pm8001_clear_task_set,
> .lldd_I_T_nexus_reset = pm8001_I_T_nexus_reset,
> .lldd_lu_reset = pm8001_lu_reset,
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index 160ee8b228c9..4368122ab475 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -1357,14 +1357,6 @@ int pm8001_abort_task_set(struct domain_device *dev, u8 *lun)
> return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
> }
>
> -int pm8001_clear_aca(struct domain_device *dev, u8 *lun)
> -{
> - struct pm8001_tmf_task tmf_task;
> -
> - tmf_task.tmf = TMF_CLEAR_ACA;
> - return pm8001_issue_ssp_tmf(dev, lun, &tmf_task);
> -}
> -
> int pm8001_clear_task_set(struct domain_device *dev, u8 *lun)
> {
> struct pm8001_tmf_task tmf_task;
> diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
> index a17da1cebce1..3ea53a0d0cc1 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.h
> +++ b/drivers/scsi/pm8001/pm8001_sas.h
> @@ -649,7 +649,6 @@ int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time);
> int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags);
> int pm8001_abort_task(struct sas_task *task);
> int pm8001_abort_task_set(struct domain_device *dev, u8 *lun);
> -int pm8001_clear_aca(struct domain_device *dev, u8 *lun);
> int pm8001_clear_task_set(struct domain_device *dev, u8 *lun);
> int pm8001_dev_found(struct domain_device *dev);
> void pm8001_dev_gone(struct domain_device *dev);
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 698f2032807b..d59618898619 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -637,7 +637,6 @@ struct sas_domain_function_template {
> /* Task Management Functions. Must be called from process context. */
> int (*lldd_abort_task)(struct sas_task *);
> int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
> - int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
> int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
> int (*lldd_I_T_nexus_reset)(struct domain_device *);
> int (*lldd_ata_check_ready)(struct domain_device *);

2022-01-27 21:45:04

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 1/27/22 19:17, John Garry wrote:
> On 27/01/2022 06:37, Damien Le Moal wrote:
>
> Hi Damien,
>
>> I did some light testing of this series (boot + some fio runs) and
>> everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
>> SAS/SATA HBA (rev 06)" HBA (x86_64 host).
>
> Yeah, unfortunately these steps prob won't exercise much of the code
> changes here since I figure error handling would not kick in.

I have SMR drives on that adapter, and generating errors with these is
*really* easy :)
ZBC test suite forces errors to check ASC/ASCQ so it will be a good test.

>
> However using this same adapter type on my arm64 system has error
> handling kick in almost straight away - and the handling looks sane. A
> silver lining, I suppose ..
>
>>
>> Of note is that "make W=1 M=drivers/scsi" complains with:
>>
>> drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or
>> member 'circularQ' not described in 'process_one_iomb'
>
> That's per-existing. I'll send a patch for that now along with another
> fix I found for that driver. ....
>
>>
>> And sparse/make C=1 complains about:
>>
>> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
>> 'sas_form_port' - different lock contexts for basic block
>
> I think it's talking about the port->phy_list_lock usage - it prob
> doesn't like segments where we fall out a loop with the lock held (which
> was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.
>
>>
>> But I have not checked if it is something that your series touch.
>>
>> And there is a ton of complaints about __le32 use in the pm80xx code...
>> I can try to have a look at these if you want, on top of your series.
>
> I really need to get make C=1 working for me - it segfaults in any env I
> have :(

Weird... All I did was install sparse (dnf install) and it works for me.
Running Fedora.

I will send patches for these.

>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-01-27 23:17:47

by chenxiang (M)

[permalink] [raw]
Subject: Re: [PATCH 06/16] scsi: libsas: Add sas_task.tmf



?? 2022/1/25 19:32, John Garry д??:
> Add a pointer to a sas_tmf_task to the sas_task struct, as this will be
> used when the common LLDD TMF code is factored out.
>
> Also set it for the LLDDs to store per-sas_task TMF info.
>
> Signed-off-by: John Garry <[email protected]>
> ---
> drivers/scsi/hisi_sas/hisi_sas_main.c | 25 +++++++++---------------
> drivers/scsi/mvsas/mv_sas.c | 15 ++++++--------
> drivers/scsi/pm8001/pm8001_sas.c | 28 ++++++++++-----------------
> include/scsi/libsas.h | 1 +
> 4 files changed, 26 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index db6f8ea7864d..4f146aa50423 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -400,8 +400,7 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
> struct hisi_sas_slot *slot,
> struct hisi_sas_dq *dq,
> struct hisi_sas_device *sas_dev,
> - struct hisi_sas_internal_abort *abort,
> - struct sas_tmf_task *tmf)
> + struct hisi_sas_internal_abort *abort)
> {
> struct hisi_sas_cmd_hdr *cmd_hdr_base;
> int dlvry_queue_slot, dlvry_queue;
> @@ -427,8 +426,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
> cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
> slot->cmd_hdr = &cmd_hdr_base[dlvry_queue_slot];
>
> - slot->tmf = tmf;
> - slot->is_internal = tmf;

In kernel 5.17-rc1(with above two lines on), it seems there is a issue
for ata disk reset command as tmf = null while it is internal command.

> task->lldd_task = slot;
>
> memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
> @@ -471,8 +468,7 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
> spin_unlock(&dq->lock);
> }
>
> -static int hisi_sas_task_exec(struct sas_task *task, gfp_t gfp_flags,
> - struct sas_tmf_task *tmf)
> +static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
> {
> int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
> struct domain_device *device = task->dev;
> @@ -583,11 +579,11 @@ static int hisi_sas_task_exec(struct sas_task *task, gfp_t gfp_flags,
> slot->task = task;
> slot->port = port;
>
> - slot->tmf = tmf;
> - slot->is_internal = tmf;
> + slot->tmf = task->tmf;
> + slot->is_internal = task->tmf;
>
> /* protect task_prep and start_delivery sequence */
> - hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, NULL, tmf);
> + hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, NULL);
>
> return 0;
>
> @@ -1115,11 +1111,6 @@ static void hisi_sas_dev_gone(struct domain_device *device)
> up(&hisi_hba->sem);
> }
>
> -static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
> -{
> - return hisi_sas_task_exec(task, gfp_flags, NULL);
> -}
> -
> static int hisi_sas_phy_set_linkrate(struct hisi_hba *hisi_hba, int phy_no,
> struct sas_phy_linkrates *r)
> {
> @@ -1273,7 +1264,9 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
> task->slow_task->timer.expires = jiffies + TASK_TIMEOUT;
> add_timer(&task->slow_task->timer);
>
> - res = hisi_sas_task_exec(task, GFP_KERNEL, tmf);
> + task->tmf = tmf;
> +
> + res = hisi_sas_queue_command(task, GFP_KERNEL);
> if (res) {
> del_timer_sync(&task->slow_task->timer);
> dev_err(dev, "abort tmf: executing internal task failed: %d\n",
> @@ -2054,7 +2047,7 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
> slot->port = port;
> slot->is_internal = true;
>
> - hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, abort, NULL);
> + hisi_sas_task_deliver(hisi_hba, slot, dq, sas_dev, abort);
>
> return 0;
>
> diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
> index 22bffaaf2eb0..5105d55eb00c 100644
> --- a/drivers/scsi/mvsas/mv_sas.c
> +++ b/drivers/scsi/mvsas/mv_sas.c
> @@ -840,14 +840,14 @@ static int mvs_task_prep(struct sas_task *task, struct mvs_info *mvi, int is_tmf
> return rc;
> }
>
> -static int mvs_task_exec(struct sas_task *task, gfp_t gfp_flags,
> - struct completion *completion, int is_tmf,
> - struct sas_tmf_task *tmf)
> +int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags)
> {
> struct mvs_info *mvi = NULL;
> u32 rc = 0;
> u32 pass = 0;
> unsigned long flags = 0;
> + struct sas_tmf_task *tmf = task->tmf;
> + int is_tmf = !!task->tmf;
>
> mvi = ((struct mvs_device *)task->dev->lldd_dev)->mvi_info;
>
> @@ -864,11 +864,6 @@ static int mvs_task_exec(struct sas_task *task, gfp_t gfp_flags,
> return rc;
> }
>
> -int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags)
> -{
> - return mvs_task_exec(task, gfp_flags, NULL, 0, NULL);
> -}
> -
> static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
> {
> u32 slot_idx = rx_desc & RXQ_SLOT_MASK;
> @@ -1300,7 +1295,9 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
> task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
> add_timer(&task->slow_task->timer);
>
> - res = mvs_task_exec(task, GFP_KERNEL, NULL, 1, tmf);
> + task->tmf = tmf;
> +
> + res = mvs_queue_command(task, GFP_KERNEL);
>
> if (res) {
> del_timer(&task->slow_task->timer);
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index 0a3701d97549..323b172243b8 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -371,15 +371,14 @@ static int sas_find_local_port_id(struct domain_device *dev)
>
> #define DEV_IS_GONE(pm8001_dev) \
> ((!pm8001_dev || (pm8001_dev->dev_type == SAS_PHY_UNUSED)))
> +
> /**
> - * pm8001_task_exec - queue the task(ssp, smp && ata) to the hardware.
> + * pm8001_queue_command - register for upper layer used, all IO commands sent
> + * to HBA are from this interface.
> * @task: the task to be execute.
> - * @gfp_flags: gfp_flags.
> - * @is_tmf: if it is task management task.
> - * @tmf: the task management IU
> + * @gfp_flags: gfp_flags
> */
> -static int pm8001_task_exec(struct sas_task *task,
> - gfp_t gfp_flags, int is_tmf, struct sas_tmf_task *tmf)
> +int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
> {
> struct domain_device *dev = task->dev;
> struct pm8001_hba_info *pm8001_ha;
> @@ -390,6 +389,8 @@ static int pm8001_task_exec(struct sas_task *task,
> u32 tag = 0xdeadbeef, rc = 0, n_elem = 0;
> unsigned long flags = 0;
> enum sas_protocol task_proto = t->task_proto;
> + struct sas_tmf_task *tmf = task->tmf;
> + int is_tmf = !!task->tmf;
>
> if (!dev->port) {
> struct task_status_struct *tsm = &t->task_status;
> @@ -507,17 +508,6 @@ static int pm8001_task_exec(struct sas_task *task,
> return rc;
> }
>
> -/**
> - * pm8001_queue_command - register for upper layer used, all IO commands sent
> - * to HBA are from this interface.
> - * @task: the task to be execute.
> - * @gfp_flags: gfp_flags
> - */
> -int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
> -{
> - return pm8001_task_exec(task, gfp_flags, 0, NULL);
> -}
> -
> /**
> * pm8001_ccb_task_free - free the sg for ssp and smp command, free the ccb.
> * @pm8001_ha: our hba card information
> @@ -752,7 +742,9 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
> task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
> add_timer(&task->slow_task->timer);
>
> - res = pm8001_task_exec(task, GFP_KERNEL, 1, tmf);
> + task->tmf = tmf;
> +
> + res = pm8001_queue_command(task, GFP_KERNEL);
>
> if (res) {
> del_timer(&task->slow_task->timer);
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 30b9afec1e1d..802b4e04e3a1 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -611,6 +611,7 @@ struct sas_task {
> void *lldd_task; /* for use by LLDDs */
> void *uldd_task;
> struct sas_task_slow *slow_task;
> + struct sas_tmf_task *tmf;
> };
>
> struct sas_task_slow {

2022-01-28 08:37:19

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 06/16] scsi: libsas: Add sas_task.tmf

On 27/01/2022 12:55, chenxiang (M) wrote:
>> - slot->tmf = tmf;
>> - slot->is_internal = tmf;
> In kernel 5.17-rc1(with above two lines on), it seems there is a issue
> for ata disk reset command as tmf = null while it is internal command.

ok, thanks for the notice. Would this fix it:

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index a05ec7aece5a..0e12c9329ee5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -427,8 +427,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
slot->cmd_hdr = &cmd_hdr_base[dlvry_queue_slot];

- slot->tmf = tmf;
- slot->is_internal = tmf;
task->lldd_task = slot;

memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
@@ -1380,12 +1378,13 @@ static int hisi_sas_softreset_ata_disk(struct
domain_device *device)
struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
struct device *dev = hisi_hba->dev;
int s = sizeof(struct host_to_dev_fis);
+ struct hisi_sas_tmf_task tmf = {};

ata_for_each_link(link, ap, EDGE) {
int pmp = sata_srst_pmp(link);

hisi_sas_fill_ata_reset_cmd(link->device, 1, pmp, fis);
- rc = hisi_sas_exec_internal_tmf_task(device, fis, s, NULL);
+ rc = hisi_sas_exec_internal_tmf_task(device, fis, s, &tmf);
if (rc != TMF_RESP_FUNC_COMPLETE)
break;
}
@@ -1396,7 +1395,7 @@ static int hisi_sas_softreset_ata_disk(struct
domain_device *device)

hisi_sas_fill_ata_reset_cmd(link->device, 0, pmp, fis);
rc = hisi_sas_exec_internal_tmf_task(device, fis,
- s, NULL);
+ s, &tmf);
if (rc != TMF_RESP_FUNC_COMPLETE)
dev_err(dev, "ata disk %016llx de-reset failed\n",
SAS_ADDR(device->sas_addr));

>
>> task->lldd_task = slot;
>>



2022-01-30 00:36:11

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 1/27/22 19:17, John Garry wrote:
> On 27/01/2022 06:37, Damien Le Moal wrote:
>
> Hi Damien,
>
>> I did some light testing of this series (boot + some fio runs) and
>> everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
>> SAS/SATA HBA (rev 06)" HBA (x86_64 host).
>
> Yeah, unfortunately these steps prob won't exercise much of the code
> changes here since I figure error handling would not kick in.
>
> However using this same adapter type on my arm64 system has error
> handling kick in almost straight away - and the handling looks sane. A
> silver lining, I suppose ..

I ran some more tests. In particular, I ran libzbc compliance tests on a
20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
series, I see command timeout that take forever to recover from, with
the drive revalidation failing after that.

[ 385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
[ 405.561099] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
timeout.
[ 405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is aborted
[ 405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
aborted
[ 411.192602] ata21.00: qc timeout (cmd 0xec)
[ 431.672122] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
timeout.
[ 431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 431.685544] ata21.00: revalidation failed (errno=-5)
[ 441.911948] ata21.00: qc timeout (cmd 0xec)
[ 462.391545] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
timeout.
[ 462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 462.404992] ata21.00: revalidation failed (errno=-5)
[ 492.598769] ata21.00: qc timeout (cmd 0xec)
...

So there is a problem. Need to dig into this. I see this issue only with
libzbc passthrough tests. fio runs with libaio are fine.

>> And sparse/make C=1 complains about:
>>
>> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
>> 'sas_form_port' - different lock contexts for basic block
>
> I think it's talking about the port->phy_list_lock usage - it prob
> doesn't like segments where we fall out a loop with the lock held (which
> was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.
>
>>
>> But I have not checked if it is something that your series touch.
>>
>> And there is a ton of complaints about __le32 use in the pm80xx code...
>> I can try to have a look at these if you want, on top of your series.
>
> I really need to get make C=1 working for me - it segfaults in any env I
> have :(

I now have a 12 patch series that fixes *all* the sparse warnings. Some
of the fixes were trivial, but most of them are simply hard bugs with
the handling of le32 struct field values. There is no way that this
driver is working as-is on big-endian machines. Some calculations are
actually done using cpu_to_le32() values !

But even though these fixes should have essentially no effect on
little-endian x86_64, with my series applied, I see the same command
timeout problem as with your libsas update, and both series together
result in the same timeout issue too.

So it looks like "fixing" the code actually is revealing some other bug
that was previously hidden... This will take some time to debug.

Another problem I noticed: doing "rmmod pm80xx; modprobe pm80xx" result
in a failure of device scans. I get loops of "link is slow to respond
->reset". For the above tests, I had to reboot every time I changed the
driver module code. Another thing to look at.

Will try to spend some more time on this next week.

Cheers.


>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-01-30 14:34:32

by chenxiang (M)

[permalink] [raw]
Subject: Re: [PATCH 06/16] scsi: libsas: Add sas_task.tmf



在 2022/1/28 0:01, John Garry 写道:
> On 27/01/2022 12:55, chenxiang (M) wrote:
>>> - slot->tmf = tmf;
>>> - slot->is_internal = tmf;
>> In kernel 5.17-rc1(with above two lines on), it seems there is a issue
>> for ata disk reset command as tmf = null while it is internal command.
>
> ok, thanks for the notice. Would this fix it:

Yes, it should fix it, and i just notice that it seems solve the issue
in the last patch of this patchset (including this fix in the last patch).

>
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index a05ec7aece5a..0e12c9329ee5 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -427,8 +427,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
> cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
> slot->cmd_hdr = &cmd_hdr_base[dlvry_queue_slot];
>
> - slot->tmf = tmf;
> - slot->is_internal = tmf;
> task->lldd_task = slot;
>
> memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
> @@ -1380,12 +1378,13 @@ static int hisi_sas_softreset_ata_disk(struct
> domain_device *device)
> struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
> struct device *dev = hisi_hba->dev;
> int s = sizeof(struct host_to_dev_fis);
> + struct hisi_sas_tmf_task tmf = {};
>
> ata_for_each_link(link, ap, EDGE) {
> int pmp = sata_srst_pmp(link);
>
> hisi_sas_fill_ata_reset_cmd(link->device, 1, pmp, fis);
> - rc = hisi_sas_exec_internal_tmf_task(device, fis, s, NULL);
> + rc = hisi_sas_exec_internal_tmf_task(device, fis, s, &tmf);
> if (rc != TMF_RESP_FUNC_COMPLETE)
> break;
> }
> @@ -1396,7 +1395,7 @@ static int hisi_sas_softreset_ata_disk(struct
> domain_device *device)
>
> hisi_sas_fill_ata_reset_cmd(link->device, 0, pmp, fis);
> rc = hisi_sas_exec_internal_tmf_task(device, fis,
> - s, NULL);
> + s, &tmf);
> if (rc != TMF_RESP_FUNC_COMPLETE)
> dev_err(dev, "ata disk %016llx de-reset failed\n",
> SAS_ADDR(device->sas_addr));
>
>>
>>> task->lldd_task = slot;
>
>
>
> .
>

2022-01-30 14:35:41

by chenxiang (M)

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

Hi John,


?? 2022/1/25 19:32, John Garry д??:
> The LLDD TMF code is almost identical between hisi_sas, pm8001, and mvsas
> drivers.
>
> This series factors out that code into libsas, thus reducing much
> duplication and giving a net reduction of ~250 LoC.
>
> There are some subtle differences between the core TMF handler and each
> of the LLDDs old implementation, so any review and testing is appreciated.
>
> Some other minor patches are thrown in:
> - Delete unused macro in hisi_sas driver
> - Delete unused libsas callback
> - Add enum for response frame datapres field
>
> I have another follow-up series to factor out the internal abort code,
> which is common to hisi_sas and pm8001 drivers.
>
> Based on v5.17-rc1

We have a full test on this patchset on KunPeng920 (for hisi_sas part),
and all of those testcases are passed, so please feel free to add :
Tested-by: [email protected]


> John Garry (16):
> scsi: libsas: Use enum for response frame DATAPRES field
> scsi: libsas: Delete lldd_clear_aca callback
> scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
> scsi: libsas: Move SMP task handlers to core
> scsi: libsas: Add struct sas_tmf_task
> scsi: libsas: Add sas_task.tmf
> scsi: libsas: Add sas_execute_tmf()
> scsi: libsas: Add sas_execute_ssp_tmf()
> scsi: libsas: Add TMF handler exec complete callback
> scsi: libsas: Add TMF handler aborted callback
> scsi: libsas: Add sas_abort_task_set()
> scsi: libsas: Add sas_clear_task_set()
> scsi: libsas: Add sas_lu_reset()
> scsi: libsas: Add sas_query_task()
> scsi: libsas: Add sas_abort_task()
> scsi: libsas: Add sas_execute_ata_cmd()
>
> Documentation/scsi/libsas.rst | 2 -
> drivers/scsi/aic94xx/aic94xx.h | 1 -
> drivers/scsi/aic94xx/aic94xx_init.c | 1 -
> drivers/scsi/aic94xx/aic94xx_tmf.c | 9 -
> drivers/scsi/hisi_sas/hisi_sas.h | 9 +-
> drivers/scsi/hisi_sas/hisi_sas_main.c | 235 ++++---------------------
> drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +-
> drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 9 +-
> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 +-
> drivers/scsi/isci/init.c | 1 -
> drivers/scsi/isci/task.c | 18 --
> drivers/scsi/isci/task.h | 4 -
> drivers/scsi/libsas/sas_ata.c | 8 +
> drivers/scsi/libsas/sas_expander.c | 24 +--
> drivers/scsi/libsas/sas_internal.h | 6 +
> drivers/scsi/libsas/sas_scsi_host.c | 220 +++++++++++++++++++++++
> drivers/scsi/libsas/sas_task.c | 12 +-
> drivers/scsi/mvsas/mv_defs.h | 5 -
> drivers/scsi/mvsas/mv_init.c | 5 +-
> drivers/scsi/mvsas/mv_sas.c | 177 +------------------
> drivers/scsi/mvsas/mv_sas.h | 3 -
> drivers/scsi/pm8001/pm8001_hwi.c | 4 +-
> drivers/scsi/pm8001/pm8001_init.c | 4 +-
> drivers/scsi/pm8001/pm8001_sas.c | 180 +++----------------
> drivers/scsi/pm8001/pm8001_sas.h | 13 +-
> include/scsi/libsas.h | 23 ++-
> 26 files changed, 353 insertions(+), 624 deletions(-)
>

2022-01-30 19:29:15

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 28/01/2022 06:37, Damien Le Moal wrote:

Hi Damien,

>> However using this same adapter type on my arm64 system has error
>> handling kick in almost straight away - and the handling looks sane. A
>> silver lining, I suppose ..
> I ran some more tests. In particular, I ran libzbc compliance tests on a
> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
> series, I see command timeout that take forever to recover from, with
> the drive revalidation failing after that.
>
> [ 385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [ 385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
> [ 405.561099] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
> timeout.
> [ 405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is aborted
> [ 405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
> aborted
> [ 411.192602] ata21.00: qc timeout (cmd 0xec)
> [ 431.672122] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
> timeout.
> [ 431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 431.685544] ata21.00: revalidation failed (errno=-5)
> [ 441.911948] ata21.00: qc timeout (cmd 0xec)
> [ 462.391545] pm80xx0:: pm8001_exec_internal_task_abort 757:TMF task
> timeout.
> [ 462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 462.404992] ata21.00: revalidation failed (errno=-5)
> [ 492.598769] ata21.00: qc timeout (cmd 0xec)
> ...
>
> So there is a problem. Need to dig into this. I see this issue only with
> libzbc passthrough tests. fio runs with libaio are fine.

Thanks for the notice. I think that I also saw a hang, but, IIRC, it
happened on mainline for me - but it's hard to know if I broke something
if it is already broke in another way. That is why I wanted this card
working properly...

Anyway, I will investigate more.

>
>>> And sparse/make C=1 complains about:
>>>
>>> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
>>> 'sas_form_port' - different lock contexts for basic block
>> I think it's talking about the port->phy_list_lock usage - it prob
>> doesn't like segments where we fall out a loop with the lock held (which
>> was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.
>>
>>> But I have not checked if it is something that your series touch.
>>>
>>> And there is a ton of complaints about __le32 use in the pm80xx code...
>>> I can try to have a look at these if you want, on top of your series.
>> I really need to get make C=1 working for me - it segfaults in any env I
>> have:(
> I now have a 12 patch series that fixes*all* the sparse warnings. Some
> of the fixes were trivial, but most of them are simply hard bugs with
> the handling of le32 struct field values. There is no way that this
> driver is working as-is on big-endian machines. Some calculations are
> actually done using cpu_to_le32() values !

Great, I'll have a look when you send them.

>
> But even though these fixes should have essentially no effect on
> little-endian x86_64, with my series applied, I see the same command
> timeout problem as with your libsas update, and both series together
> result in the same timeout issue too.
>
> So it looks like "fixing" the code actually is revealing some other bug
> that was previously hidden... This will take some time to debug.
>
> Another problem I noticed: doing "rmmod pm80xx; modprobe pm80xx" result
> in a failure of device scans. I get loops of "link is slow to respond
> ->reset". For the above tests, I had to reboot every time I changed the
> driver module code. Another thing to look at.

Sounds odd, I would expect everything runs from afresh when insmod.

Thanks,
John

2022-02-01 20:44:07

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 28/01/2022 09:09, John Garry wrote:
>> I ran some more tests. In particular, I ran libzbc compliance tests on a
>> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
>> series, I see command timeout that take forever to recover from, with
>> the drive revalidation failing after that.
>>
>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>> aborted
>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>> aborted
>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>> ...
>>
>> So there is a problem. Need to dig into this. I see this issue only with
>> libzbc passthrough tests. fio runs with libaio are fine.
>
> Thanks for the notice. I think that I also saw a hang, but, IIRC, it
> happened on mainline for me - but it's hard to know if I broke something
> if it is already broke in another way. That is why I wanted this card
> working properly...

Hi Damien,

From testing mainline, I can see a hang on my arm64 system for SAS
disks. I think that the reason is the we don't finish some commands in
EH properly for pm8001:
- In EH, we attempt to abort the task in sas_scsi_find_task() ->
lldd_abort_task()
The default return from pm8001_exec_internal_tmf_task() is
-TMF_RESP_FUNC_FAILED, so if the TMF does not execute properly we return
this value
- sas_scsi_find_task() cannot handle -TMF_RESP_FUNC_FAILED, and returns
-TMF_RESP_FUNC_FAILED directly to sas_eh_handle_sas_errors(), which,
again, does not handle -TMF_RESP_FUNC_FAILED. So we don't progress to
ever finish the comand.

This looks like the correct fix for mainline:

--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -766,7 +766,7 @@ static int pm8001_exec_internal_tmf_task(struct
domain_device *dev,
pm8001_dev, DS_OPERATIONAL);
wait_for_completion(&completion_setstate);
}
- res = -TMF_RESP_FUNC_FAILED;
+ res = TMF_RESP_FUNC_FAILED;

That's effectively the same as what I have in this series in
sas_execute_tmf().

However your testing is a SATA device, which I'll check further.

Thanks,
John

2022-02-04 01:22:00

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 2/4/22 00:55, John Garry wrote:
> On 03/02/2022 09:44, Damien Le Moal wrote:
>
> Hi Damien,
>
>>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>
> Contrary to mentioning TMF in the log, this is not a TMF but rather an
> internal abort timing out. I don't think that this should ever happen.
> This command should just abort pending IO commands in the controller and
> not send anything to the target. So for this to timeout means a HW fault
> or driver bug. And I did not touch this code for pm8001.
>
>>>>> timeout.
>>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>>> aborted
>>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>>> aborted
>>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>> timeout.
>>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>> timeout.
>>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>>> ...
>>>>>
>
> Do you have a fuller dmesg with my series?

Here it is below. Conditions: I rebased everything on Linus latest
master, applied your series and the patch you sent below, rebooted with
pm80xx module option logging_level=31.

Device scan is all OK. With the system idle, I simply start libzbc tests
on the SATA SMR drive I have on the HBA.

The first command issued is 0x63 == NCQ NON DATA and seems to be OK, but
it seems that there are inconsistencies. And that may be what breaks the
adapter/driver state because everything after that command miserably
fails and does not complete.

The inconsistency is this line says:
[ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
Which seems to be sensical for NCQ_NON_DATA command, but then, this line
seems wrong:
[ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8

I need to go and check the specs what the FIS reply format is for
NCQ_NON_DATA.


[ 137.187184] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
[ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
[ 137.199339] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
command 0x63 inb 4
[ 137.207577] pm80xx0:: pm8001_mpi_msg_consume 1446:: CI=46 PI=47
msgHeader=8104200d
[ 137.215399] pm80xx0:: mpi_sata_completion 2481:IO_SUCCESS
[ 137.220961] pm80xx0:: mpi_sata_completion 2503:SAS_PROTO_RESPONSE
len = 20
[ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
[ 137.233878] pm80xx0:: pm8001_mpi_msg_free_set 1403: CI=47 PI=47
[ 137.236696] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
[ 137.247102] pm80xx0:: pm80xx_chip_sata_req 4585:DMA
[ 137.252186] pm80xx0:: pm80xx_chip_sata_req 4593:FPDMA
[ 137.257400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
command 0x65 inb f
[ 167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
[ 167.519049] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 2, abort
task tag = 0x1
[ 187.969173] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
timeout.
[ 187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
[ 187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
aborted
[ 188.144734] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
[ 188.151555] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
[ 188.156648] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
command 0xec inb 8
[ 193.600813] ata21.00: qc timeout (cmd 0xec)
[ 193.605976] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 4, abort
task tag = 0x3
[ 214.080236] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
timeout.
[ 214.087563] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 214.093929] ata21.00: revalidation failed (errno=-5)
[ 214.256233] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
[ 214.263043] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
[ 214.268128] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
command 0xec inb 8
[ 224.319899] ata21.00: qc timeout (cmd 0xec)
[ 224.324317] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 6, abort
task tag = 0x5
[ 244.799433] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
timeout.
[ 244.806750] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 244.813110] ata21.00: revalidation failed (errno=-5)
[ 244.975500] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
[ 244.982314] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
[ 244.987400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
command 0xec inb 8
[ 275.006814] ata21.00: qc timeout (cmd 0xec)
[ 275.011236] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 8, abort
task tag = 0x7
[ 295.486387] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
timeout.
[ 295.494025] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 295.500390] ata21.00: revalidation failed (errno=-5)
[ 295.509179] ata21.00: disable device
[ 295.670556] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
tries: 1
[ 295.670570] sd 19:0:2:0: [sdj] REPORT ZONES start lba 0 failed
[ 295.689972] sd 19:0:2:0: [sdj] REPORT ZONES: Result: hostbyte=DID_OK
driverbyte=DRIVER_OK
[ 295.700621] sd 19:0:2:0: [sdj] Sense Key : Not Ready [current]
[ 295.708975] sd 19:0:2:0: [sdj] Add. Sense: Logical unit not ready,
hard reset required
[ 295.719331] sd 19:0:2:0: [sdj] 0 4096-byte logical blocks: (0 B/0 B)
[ 295.728727] sd 19:0:2:0: [sdj] Write Protect is on
[ 295.737928] sdj: detected capacity change from 39063650304 to 0
[ 295.826347] sd 19:0:2:0: [sdj] tag#158 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 295.838761] sd 19:0:2:0: [sdj] tag#158 CDB: Test Unit Ready 00 00 00
00 00 00
[ 295.920864] sd 19:0:2:0: [sdj] tag#780 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 295.933341] sd 19:0:2:0: [sdj] tag#780 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.010417] sd 19:0:2:0: [sdj] tag#248 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.022275] sd 19:0:2:0: [sdj] tag#248 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.101348] sd 19:0:2:0: [sdj] tag#949 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.113924] sd 19:0:2:0: [sdj] tag#949 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.192258] sd 19:0:2:0: [sdj] tag#25 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.204788] sd 19:0:2:0: [sdj] tag#25 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.284546] sd 19:0:2:0: [sdj] tag#273 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.297126] sd 19:0:2:0: [sdj] tag#273 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.376896] sd 19:0:2:0: [sdj] tag#653 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.389475] sd 19:0:2:0: [sdj] tag#653 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.468095] sd 19:0:2:0: [sdj] tag#159 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.479974] sd 19:0:2:0: [sdj] tag#159 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.560861] sd 19:0:2:0: [sdj] tag#274 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.572792] sd 19:0:2:0: [sdj] tag#274 CDB: Test Unit Ready 00 00 00
00 00 00
[ 296.654506] sd 19:0:2:0: [sdj] tag#727 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 296.667151] sd 19:0:2:0: [sdj] tag#727 CDB: Test Unit Ready 00 00 00
00 00 00

After these messages, the tests exit on failure (drive dropped) and
there are no more messages. Doing rmmod or anything else get stuck too.
I have to reset the machine to get back to a good state.

I am starting to think that NCQ NON DATA command is being very badly
handled... Switching the tests to run with all non-NCQ commands is
working fine, albeit horribly slow (much slower than with other HBAs,
e.g. Broadcom).

Digging...

>
> ...
>
>>> }
>>> - res = -TMF_RESP_FUNC_FAILED;
>>> + res = TMF_RESP_FUNC_FAILED;
>>>
>>> That's effectively the same as what I have in this series in
>>> sas_execute_tmf().
>>>
>>> However your testing is a SATA device, which I'll check further.
>> This did not help. Still seeing 100% reproducible hangs.
>
> OK, but I think that we should also have this change as the mainline
> codes looks broken to be begin with:
>
> --->8 ---
>
> [PATCH] scsi: libsas: Handle all errors in sas_scsi_find_task()
>
> LLDD TMFs callbacks may return linux or other error codes instead of TMF
> codes. This may cause problems in sas_scsi_find_task() ->
> .lldd_query_task(), as only TMF codes are handled there. As such, we may
> not return a task_disposition type. Function sas_eh_handle_sas_errors()
> only handles that type, and may exit error
> handling early for unrecognised types.
>
> So use TASK_ABORT_FAILED for non-TMF types returned from
> .lldd_query_task(), on the assumption that the command may still be
> alive and error handling should be escalated.
>
> Signed-off-by: John Garry <[email protected]>
>
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c
> b/drivers/scsi/libsas/sas_scsi_host.c
> index 53d8b7ede0cd..02274f471308 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -316,8 +316,11 @@ static enum task_disposition
> sas_scsi_find_task(struct sas_task *task)
> pr_notice("%s: task 0x%p failed to abort\n",
> __func__, task);
> return TASK_ABORT_FAILED;
> + default:
> + pr_notice("%s: task 0x%p result code %d not handled, assuming
> failed\n",
> + __func__, task, res);
> + return TASK_ABORT_FAILED;
> }
> -
> }
> }
> return res;
>
> ---8< ----
>
>>
>> I did a lot of testing/digging today,
>
> Thanks for the effort!
>
> > and the hang cause seems to be
> > missing task completions.
>> At random, a task times out as its completion
>
> That sounds fimilar to my general issue running this driver on an arm64
> host...
>
>> does not come, and subsequent abort trial for the task fail, revalidate
>> fails
>
> I assume SMP IOs fail if revalidation fails - if this is the case, then
> the controller seems to be in bad state.
>
>> and the device is dropped (capacity goes to 0). But at that point,
>> doing rmmod/modprobe to reset the device does not work. sync cache
>> command issued at rmmod time never completes. I end up needing to power
>> cycle the machine every time...
>>
>> No clue about the root cause yet, but it definitely seem to be related
>> to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
>> everything is fine and the tests run to completion without any issue.
>>
>> I wonder if their is a tag management bug somewhere...
>
> Maybe. Not sure.
>
> On a related point, Hannes' change here could avoid it:
>
> https://lore.kernel.org/linux-scsi/[email protected]/
>
>>
>> I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
>> drops and retake the ata port lock. No other ATA driver do that since
>> the ata completion also take that lock. The ata port lock is taken
>> before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
>> So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
>> ata_qc_issue()) seems like a very bad idea. I removed that and
>> everything work the same way (the lld execute does not sleep). But that
>> did not solve the hang problem.
>
> I would need to check why this is done again. Before my time...
>
>>
>> Of note is this is all with your libsas patches applied. Without the
>> patches, I have KASAN screaming at me about use-after-free in completion
>> context. With your patches, KASAN is silent.
>>
>> Another thing: this driver does not allow changing the max qd... Very
>> annoying.
>>
>> echo 1 > /sys/block/sdX/device/queue_depth
>>
>> has no effect. QD stays at 32 for an ATA drive. Need to look into that too.
>
> I had a look at this. It seems that we fail in
> __ata_change_queue_depth() -> ata_scsi_find_dev() returning NULL.
>
> Thanks again for your effort, I will continue to look.
>
> john


--
Damien Le Moal
Western Digital Research

2022-02-04 10:51:07

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 2/4/22 09:36, Damien Le Moal wrote:
> On 2/4/22 00:55, John Garry wrote:
>> On 03/02/2022 09:44, Damien Le Moal wrote:
>>
>> Hi Damien,
>>
>>>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>
>> Contrary to mentioning TMF in the log, this is not a TMF but rather an
>> internal abort timing out. I don't think that this should ever happen.
>> This command should just abort pending IO commands in the controller and
>> not send anything to the target. So for this to timeout means a HW fault
>> or driver bug. And I did not touch this code for pm8001.
>>
>>>>>> timeout.
>>>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>>>> aborted
>>>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>>>> aborted
>>>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>>> timeout.
>>>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>>> timeout.
>>>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>>>> ...
>>>>>>
>>
>> Do you have a fuller dmesg with my series?
>
> Here it is below. Conditions: I rebased everything on Linus latest
> master, applied your series and the patch you sent below, rebooted with
> pm80xx module option logging_level=31.
>
> Device scan is all OK. With the system idle, I simply start libzbc tests
> on the SATA SMR drive I have on the HBA.
>
> The first command issued is 0x63 == NCQ NON DATA and seems to be OK, but
> it seems that there are inconsistencies. And that may be what breaks the
> adapter/driver state because everything after that command miserably
> fails and does not complete.
>
> The inconsistency is this line says:
> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
> seems wrong:
> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
>
> I need to go and check the specs what the FIS reply format is for
> NCQ_NON_DATA.
>
>
> [ 137.187184] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
> [ 137.199339] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
> command 0x63 inb 4
> [ 137.207577] pm80xx0:: pm8001_mpi_msg_consume 1446:: CI=46 PI=47
> msgHeader=8104200d
> [ 137.215399] pm80xx0:: mpi_sata_completion 2481:IO_SUCCESS
> [ 137.220961] pm80xx0:: mpi_sata_completion 2503:SAS_PROTO_RESPONSE
> len = 20
> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
> [ 137.233878] pm80xx0:: pm8001_mpi_msg_free_set 1403: CI=47 PI=47
> [ 137.236696] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
> [ 137.247102] pm80xx0:: pm80xx_chip_sata_req 4585:DMA
> [ 137.252186] pm80xx0:: pm80xx_chip_sata_req 4593:FPDMA
> [ 137.257400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
> command 0x65 inb f
> [ 167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [ 167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
> [ 167.519049] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 2, abort
> task tag = 0x1
> [ 187.969173] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
> timeout.
> [ 187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
> [ 187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
> aborted
> [ 188.144734] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
> [ 188.151555] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
> [ 188.156648] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
> command 0xec inb 8
> [ 193.600813] ata21.00: qc timeout (cmd 0xec)
> [ 193.605976] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 4, abort
> task tag = 0x3
> [ 214.080236] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
> timeout.
> [ 214.087563] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 214.093929] ata21.00: revalidation failed (errno=-5)
> [ 214.256233] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
> [ 214.263043] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
> [ 214.268128] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
> command 0xec inb 8
> [ 224.319899] ata21.00: qc timeout (cmd 0xec)
> [ 224.324317] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 6, abort
> task tag = 0x5
> [ 244.799433] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
> timeout.
> [ 244.806750] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 244.813110] ata21.00: revalidation failed (errno=-5)
> [ 244.975500] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
> [ 244.982314] pm80xx0:: pm80xx_chip_sata_req 4588:PIO
> [ 244.987400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
> command 0xec inb 8
> [ 275.006814] ata21.00: qc timeout (cmd 0xec)
> [ 275.011236] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 8, abort
> task tag = 0x7
> [ 295.486387] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
> timeout.
> [ 295.494025] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 295.500390] ata21.00: revalidation failed (errno=-5)
> [ 295.509179] ata21.00: disable device
> [ 295.670556] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
> tries: 1
> [ 295.670570] sd 19:0:2:0: [sdj] REPORT ZONES start lba 0 failed
> [ 295.689972] sd 19:0:2:0: [sdj] REPORT ZONES: Result: hostbyte=DID_OK
> driverbyte=DRIVER_OK
> [ 295.700621] sd 19:0:2:0: [sdj] Sense Key : Not Ready [current]
> [ 295.708975] sd 19:0:2:0: [sdj] Add. Sense: Logical unit not ready,
> hard reset required
> [ 295.719331] sd 19:0:2:0: [sdj] 0 4096-byte logical blocks: (0 B/0 B)
> [ 295.728727] sd 19:0:2:0: [sdj] Write Protect is on
> [ 295.737928] sdj: detected capacity change from 39063650304 to 0
> [ 295.826347] sd 19:0:2:0: [sdj] tag#158 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 295.838761] sd 19:0:2:0: [sdj] tag#158 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 295.920864] sd 19:0:2:0: [sdj] tag#780 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 295.933341] sd 19:0:2:0: [sdj] tag#780 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.010417] sd 19:0:2:0: [sdj] tag#248 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.022275] sd 19:0:2:0: [sdj] tag#248 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.101348] sd 19:0:2:0: [sdj] tag#949 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.113924] sd 19:0:2:0: [sdj] tag#949 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.192258] sd 19:0:2:0: [sdj] tag#25 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.204788] sd 19:0:2:0: [sdj] tag#25 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.284546] sd 19:0:2:0: [sdj] tag#273 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.297126] sd 19:0:2:0: [sdj] tag#273 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.376896] sd 19:0:2:0: [sdj] tag#653 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.389475] sd 19:0:2:0: [sdj] tag#653 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.468095] sd 19:0:2:0: [sdj] tag#159 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.479974] sd 19:0:2:0: [sdj] tag#159 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.560861] sd 19:0:2:0: [sdj] tag#274 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.572792] sd 19:0:2:0: [sdj] tag#274 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [ 296.654506] sd 19:0:2:0: [sdj] tag#727 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [ 296.667151] sd 19:0:2:0: [sdj] tag#727 CDB: Test Unit Ready 00 00 00
> 00 00 00
>
> After these messages, the tests exit on failure (drive dropped) and
> there are no more messages. Doing rmmod or anything else get stuck too.
> I have to reset the machine to get back to a good state.
>
> I am starting to think that NCQ NON DATA command is being very badly
> handled... Switching the tests to run with all non-NCQ commands is
> working fine, albeit horribly slow (much slower than with other HBAs,
> e.g. Broadcom).
>
> Digging...

I missed a KASAN splat during device scan on boot:

33.725184]
==================================================================
[ 33.746168] BUG: KASAN: use-after-free in __lock_acquire+0x41a5/0x5b50
[ 33.764181] Read of size 8 at addr ffff88818a318660 by task
kworker/u64:6/583
[ 33.786726]
[ 33.802181] CPU: 6 PID: 583 Comm: kworker/u64:6 Not tainted
5.17.0-rc2+ #1425
[ 33.823777] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS
2.1 06/02/2021
[ 33.845112] Workqueue: events_unbound async_run_entry_fn
[ 33.864185] Call Trace:
[ 33.880147] <TASK>
[ 33.896183] dump_stack_lvl+0x45/0x59
[ 33.913180] print_address_description.constprop.0+0x1f/0x120
[ 33.932108] ? __lock_acquire+0x41a5/0x5b50
[ 33.949156] kasan_report.cold+0x83/0xdf
[ 33.965184] ? __lock_acquire+0x41a5/0x5b50
[ 33.982178] __lock_acquire+0x41a5/0x5b50
[ 33.998919] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 34.019831] ? pm8001_mpi_build_cmd+0x3ad/0x780 [pm80xx]
[ 34.039171] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
[ 34.057941] ? pm80xx_chip_sata_req+0xa78/0x1cb0 [pm80xx]
[ 34.076181] lock_acquire+0x194/0x490
[ 34.092185] ? pm8001_queue_command+0x842/0x1150 [pm80xx]
[ 34.113174] ? lock_release+0x6b0/0x6b0
[ 34.130184] _raw_spin_lock+0x2c/0x40
[ 34.147848] ? pm8001_queue_command+0x842/0x1150 [pm80xx]
[ 34.165964] pm8001_queue_command+0x842/0x1150 [pm80xx]
[ 34.185178] ? __raw_spin_lock_init+0x3b/0x110
[ 34.202748] sas_ata_qc_issue+0x6d8/0xba0 [libsas]
[ 34.220097] ata_qc_issue+0x4f8/0xcc0 [libata]
[ 34.242185] ata_exec_internal_sg+0xacd/0x1790 [libata]
[ 34.261176] ? ata_qc_issue+0xcc0/0xcc0 [libata]
[ 34.279059] ? memset+0x20/0x40
[ 34.296170] ata_read_log_page+0x361/0x5d0 [libata]
[ 34.313999] ? ata_dev_set_feature+0x330/0x330 [libata]
[ 34.332183] ? ata_dev_set_feature+0x330/0x330 [libata]
[ 34.353174] ? vsprintf+0x10/0x10
[ 34.369194] ata_identify_page_supported+0xb8/0x2e0 [libata]
[ 34.388168] ata_dev_configure+0x161b/0x4b90 [libata]
[ 34.407172] ? _raw_spin_trylock_bh+0x50/0x70
[ 34.424188] ? ata_do_dev_read_id+0xe0/0xe0 [libata]
[ 34.442100] ? ata_hpa_resize+0xce0/0xce0 [libata]
[ 34.458927] ? memcpy+0x39/0x60
[ 34.475172] ? ata_dev_read_id+0xf70/0xf70 [libata]
[ 34.492172] ata_dev_revalidate+0x172/0x2b0 [libata]
[ 34.508945] ata_do_set_mode+0x11f5/0x2410 [libata]
[ 34.525762] ? find_held_lock+0x2c/0x110
[ 34.542169] ? ata_dev_revalidate+0x2b0/0x2b0 [libata]
[ 34.558174] ? ata_eh_recover+0x181e/0x33e0 [libata]
[ 34.575698] ata_set_mode+0x2e8/0x3f0 [libata]
[ 34.592016] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 34.609166] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 34.625180] ? trace_hardirqs_on+0x1c/0x110
[ 34.641190] ata_eh_recover+0x23be/0x33e0 [libata]
[ 34.658188] ? sas_ata_hard_reset+0x310/0x310 [libsas]
[ 34.675144] ? sas_ata_qc_fill_rtf+0x90/0x90 [libsas]
[ 34.691182] ? ata_link_nr_enabled+0x50/0x50 [libata]
[ 34.708173] ? find_held_lock+0x2c/0x110
[ 34.723717] ? lock_release+0x1dd/0x6b0
[ 34.740168] ? ata_scsi_port_error_handler+0x4d1/0xe60 [libata]
[ 34.758180] ? sas_ata_hard_reset+0x310/0x310 [libsas]
[ 34.774868] ? sas_ata_hard_reset+0x310/0x310 [libsas]
[ 34.791740] ? sas_ata_qc_fill_rtf+0x90/0x90 [libsas]
[ 34.808169] ata_do_eh+0x75/0x150 [libata]
[ 34.824178] ? trace_hardirqs_on+0x1c/0x110
[ 34.840176] ata_scsi_port_error_handler+0x536/0xe60 [libata]
[ 34.857110] ? sas_fail_probe.constprop.0+0xb5/0xb5 [libsas]
[ 34.874168] async_sas_ata_eh+0xcf/0x112 [libsas]
[ 34.890978] async_run_entry_fn+0x93/0x500
[ 34.907184] process_one_work+0x7f0/0x1310
[ 34.923188] ? lock_release+0x6b0/0x6b0
[ 34.939169] ? pwq_dec_nr_in_flight+0x230/0x230
[ 34.955754] ? rwlock_bug.part.0+0x90/0x90
[ 34.972232] worker_thread+0x598/0xf70
[ 34.987993] ? process_one_work+0x1310/0x1310
[ 35.004971] kthread+0x28f/0x330
[ 35.020168] ? kthread_complete_and_exit+0x20/0x20
[ 35.037168] ret_from_fork+0x1f/0x30
[ 35.052525] </TASK>
[ 35.066623]
[ 35.081181] Allocated by task 583:
[ 35.097188] kasan_save_stack+0x1e/0x40
[ 35.113930] __kasan_slab_alloc+0x64/0x80
[ 35.130571] kmem_cache_alloc+0x1a6/0x450
[ 35.148041] sas_alloc_task+0x1b/0x80 [libsas]
[ 35.163187] sas_ata_qc_issue+0x1a8/0xba0 [libsas]
[ 35.180145] ata_qc_issue+0x4f8/0xcc0 [libata]
[ 35.195456] ata_exec_internal_sg+0xacd/0x1790 [libata]
[ 35.213179] ata_read_log_page+0x361/0x5d0 [libata]
[ 35.228915] ata_identify_page_supported+0xb8/0x2e0 [libata]
[ 35.246182] ata_dev_configure+0x161b/0x4b90 [libata]
[ 35.262189] ata_dev_revalidate+0x172/0x2b0 [libata]
[ 35.280168] ata_do_set_mode+0x11f5/0x2410 [libata]
[ 35.296178] ata_set_mode+0x2e8/0x3f0 [libata]
[ 35.311176] ata_eh_recover+0x23be/0x33e0 [libata]
[ 35.327168] ata_do_eh+0x75/0x150 [libata]
[ 35.342188] ata_scsi_port_error_handler+0x536/0xe60 [libata]
[ 35.360166] async_sas_ata_eh+0xcf/0x112 [libsas]
[ 35.376365] async_run_entry_fn+0x93/0x500
[ 35.392175] process_one_work+0x7f0/0x1310
[ 35.408169] worker_thread+0x598/0xf70
[ 35.424172] kthread+0x28f/0x330
[ 35.439850] ret_from_fork+0x1f/0x30
[ 35.456177]
[ 35.469109] Freed by task 0:
[ 35.482174] kasan_save_stack+0x1e/0x40
[ 35.496889] kasan_set_track+0x21/0x30
[ 35.512165] kasan_set_free_info+0x20/0x30
[ 35.527168] __kasan_slab_free+0xd8/0x110
[ 35.542454] kmem_cache_free.part.0+0x67/0x170
[ 35.559111] mpi_sata_completion+0x99c/0x2d70 [pm80xx]
[ 35.576044] process_oq+0xbd2/0x7c20 [pm80xx]
[ 35.592169] pm80xx_chip_isr+0x94/0x130 [pm80xx]
[ 35.608180] tasklet_action_common.constprop.0+0x24b/0x2f0
[ 35.625171] __do_softirq+0x1b5/0x82d
[ 35.640187]
[ 35.653755] The buggy address belongs to the object at ffff88818a318640
[ 35.653755] which belongs to the cache sas_task of size 320
[ 35.688176] The buggy address is located 32 bytes inside of
[ 35.688176] 320-byte region [ffff88818a318640, ffff88818a318780)
[ 35.723167] The buggy address belongs to the page:
[ 35.740168] page:000000006ae5e64e refcount:1 mapcount:0
mapping:0000000000000000 index:0xffff88818a3184c0 pfn:0x18a318
[ 35.762689] flags: 0x20000000000200(slab|node=0|zone=2)
[ 35.780183] raw: 0020000000000200 ffff888100f02440 ffff888100f02440
ffff888100f09e00
[ 35.800300] raw: ffff88818a3184c0 ffff88818a318040 0000000100000008
0000000000000000
[ 35.821174] page dumped because: kasan: bad access detected
[ 35.840167]
[ 35.855170] Memory state around the buggy address:
[ 35.873169] ffff88818a318500: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 35.893918] ffff88818a318580: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 35.913172] >ffff88818a318600: fc fc fc fc fc fc fc fc fa fb fb fb fb
fb fb fb
[ 35.932176] ^
[ 35.953171] ffff88818a318680: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 35.974171] ffff88818a318700: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 35.994168]
==================================================================

This is the submission path, not completion. The code is:

(gdb) list *(pm8001_queue_command+0x842)
0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
486 atomic_dec(&pm8001_dev->running_req);
487 goto err_out_tag;
488 }
489 /* TODO: select normal or high priority */
490 spin_lock(&t->task_state_lock);
491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
492 spin_unlock(&t->task_state_lock);
493 } while (0);
494 rc = 0;
495 goto out_done;

So the task is already completed when the submission path tries to set
the state flag ? Debugging...


--
Damien Le Moal
Western Digital Research

2022-02-04 11:48:59

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 03/02/2022 09:44, Damien Le Moal wrote:

Hi Damien,

>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task

Contrary to mentioning TMF in the log, this is not a TMF but rather an
internal abort timing out. I don't think that this should ever happen.
This command should just abort pending IO commands in the controller and
not send anything to the target. So for this to timeout means a HW fault
or driver bug. And I did not touch this code for pm8001.

>>>> timeout.
>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>> aborted
>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>> aborted
>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>> timeout.
>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>> timeout.
>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>> ...
>>>>

Do you have a fuller dmesg with my series?

...

>> }
>> - res = -TMF_RESP_FUNC_FAILED;
>> + res = TMF_RESP_FUNC_FAILED;
>>
>> That's effectively the same as what I have in this series in
>> sas_execute_tmf().
>>
>> However your testing is a SATA device, which I'll check further.
> This did not help. Still seeing 100% reproducible hangs.

OK, but I think that we should also have this change as the mainline
codes looks broken to be begin with:

--->8 ---

[PATCH] scsi: libsas: Handle all errors in sas_scsi_find_task()

LLDD TMFs callbacks may return linux or other error codes instead of TMF
codes. This may cause problems in sas_scsi_find_task() ->
.lldd_query_task(), as only TMF codes are handled there. As such, we may
not return a task_disposition type. Function sas_eh_handle_sas_errors()
only handles that type, and may exit error
handling early for unrecognised types.

So use TASK_ABORT_FAILED for non-TMF types returned from
.lldd_query_task(), on the assumption that the command may still be
alive and error handling should be escalated.

Signed-off-by: John Garry <[email protected]>

diff --git a/drivers/scsi/libsas/sas_scsi_host.c
b/drivers/scsi/libsas/sas_scsi_host.c
index 53d8b7ede0cd..02274f471308 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -316,8 +316,11 @@ static enum task_disposition
sas_scsi_find_task(struct sas_task *task)
pr_notice("%s: task 0x%p failed to abort\n",
__func__, task);
return TASK_ABORT_FAILED;
+ default:
+ pr_notice("%s: task 0x%p result code %d not handled, assuming
failed\n",
+ __func__, task, res);
+ return TASK_ABORT_FAILED;
}
-
}
}
return res;

---8< ----

>
> I did a lot of testing/digging today,

Thanks for the effort!

> and the hang cause seems to be
> missing task completions.
> At random, a task times out as its completion

That sounds fimilar to my general issue running this driver on an arm64
host...

> does not come, and subsequent abort trial for the task fail, revalidate
> fails

I assume SMP IOs fail if revalidation fails - if this is the case, then
the controller seems to be in bad state.

> and the device is dropped (capacity goes to 0). But at that point,
> doing rmmod/modprobe to reset the device does not work. sync cache
> command issued at rmmod time never completes. I end up needing to power
> cycle the machine every time...
>
> No clue about the root cause yet, but it definitely seem to be related
> to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
> everything is fine and the tests run to completion without any issue.
>
> I wonder if their is a tag management bug somewhere...

Maybe. Not sure.

On a related point, Hannes' change here could avoid it:

https://lore.kernel.org/linux-scsi/[email protected]/

>
> I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
> drops and retake the ata port lock. No other ATA driver do that since
> the ata completion also take that lock. The ata port lock is taken
> before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
> So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
> ata_qc_issue()) seems like a very bad idea. I removed that and
> everything work the same way (the lld execute does not sleep). But that
> did not solve the hang problem.

I would need to check why this is done again. Before my time...

>
> Of note is this is all with your libsas patches applied. Without the
> patches, I have KASAN screaming at me about use-after-free in completion
> context. With your patches, KASAN is silent.
>
> Another thing: this driver does not allow changing the max qd... Very
> annoying.
>
> echo 1 > /sys/block/sdX/device/queue_depth
>
> has no effect. QD stays at 32 for an ATA drive. Need to look into that too.

I had a look at this. It seems that we fail in
__ata_change_queue_depth() -> ata_scsi_find_dev() returning NULL.

Thanks again for your effort, I will continue to look.

john

2022-02-04 17:58:53

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 04/02/2022 11:27, Damien Le Moal wrote:
>> As I mentioned, having this fail is a red flag. If I was pushed to guess
>> what has happened, I'd say the FW is faulting due to some erroneous
>> driver behaviour.
> I am still thinking that there is something wrong with the handling of
> NCQ NON DATA command. There are several places where the code determines
> non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
> the fpdma bucket, which is wrong.
>

Ok, I will have a look at this. We made some libsas changes related to
this not so long ago to "fix" something, see:

53de092f47f ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc
as NODATA")

176ddd89171d ("scsi: libsas: Reset num_scatter if libata marks qc as
NODATA")

>>> This is the submission path, not completion. The code is:
>>>
>>> (gdb) list *(pm8001_queue_command+0x842)
>>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>>> 486 atomic_dec(&pm8001_dev->running_req);
>>> 487 goto err_out_tag;
>>> 488 }
>>> 489 /* TODO: select normal or high priority */
>>> 490 spin_lock(&t->task_state_lock);
>>> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>>> 492 spin_unlock(&t->task_state_lock);
>>> 493 } while (0);
>>> 494 rc = 0;
>>> 495 goto out_done;
>>>
>>> So the task is already completed when the submission path tries to set
>>> the state flag ? Debugging...
>> Yeah, that's how it looks.
>>
>> I already mentioned this problem here:
>>
>> https://lore.kernel.org/linux-scsi/[email protected]/
>>
>> Maybe we should just fix it now to rule it out of possibly causing other
>> issues... I was reluctant to fix it as many places seems to need to be
>> touched. Let me check it.
> Here is my current fix:
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c
> b/drivers/scsi/pm8001/pm8001_sas.c
> index 1b95c73d12d1..16c101577dd3 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> ccb->ccb_tag = tag;
> ccb->task = t;
> ccb->device = pm8001_dev;
> +
> + /* TODO: select normal or high priority */
> + atomic_inc(&pm8001_dev->running_req);
> + spin_lock(&t->task_state_lock);
> + t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> + spin_unlock(&t->task_state_lock);
> +
> switch (task_proto) {
> case SAS_PROTOCOL_SMP:
> - atomic_inc(&pm8001_dev->running_req);
> rc = pm8001_task_prep_smp(pm8001_ha, ccb);
> break;
> case SAS_PROTOCOL_SSP:
> - atomic_inc(&pm8001_dev->running_req);
> if (is_tmf)
> rc = pm8001_task_prep_ssp_tm(pm8001_ha,
> ccb, tmf);
> @@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> break;
> case SAS_PROTOCOL_SATA:
> case SAS_PROTOCOL_STP:
> - atomic_inc(&pm8001_dev->running_req);
> rc = pm8001_task_prep_ata(pm8001_ha, ccb);
> break;
> default:
> @@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>
> if (rc) {
> pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
> + spin_lock(&t->task_state_lock);
> + t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
> + spin_unlock(&t->task_state_lock);
> atomic_dec(&pm8001_dev->running_req);
> goto err_out_tag;
> }
> - /* TODO: select normal or high priority */
> - spin_lock(&t->task_state_lock);
> - t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> - spin_unlock(&t->task_state_lock);
> } while (0);
> rc = 0;
> goto out_done;
>
> With this, No KASAN complaint. I will send a proper patch ASAP.

I had just prepared a patch, but different, attached.

>
> Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
> It is set and unset only, never tested anywhere in libsas nor pm8001
> driver. This flag seems totally useless to me, unless this is something
> that the HW can see ?

Right, it is only checked by isci AFAIC. And it is not something which
HW can see.

Since libsas does not check it the semantics are ill-defined. However I
think that it's worth setting it completeness; I thought it was better
setting it just before the command is delivered to HW, but the
implementation touches more SW. Simpler approach may be better since it
aligns with how this flag may be cleared in pm8001_chip_sata_req() under
asusmption it is already set.

Thanks,
John


Attachments:
0001-scsi-pm8001-Set-SAS_TASK_AT_INITIATOR-before-deliver.patch (6.03 kB)

2022-02-04 20:34:48

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 2/4/22 19:36, John Garry wrote:
> On 04/02/2022 03:02, Damien Le Moal wrote:
>>> The inconsistency is this line says:
>>> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
>>> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
>>> seems wrong:
>>> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
>>>
>>> I need to go and check the specs what the FIS reply format is for
>>> NCQ_NON_DATA.
>>>
>>>
>>> [ 137.187184] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
>>> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
>>> [ 137.199339] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
>>> command 0x63 inb 4
>>> [ 137.207577] pm80xx0:: pm8001_mpi_msg_consume 1446:: CI=46 PI=47
>>> msgHeader=8104200d
>>> [ 137.215399] pm80xx0:: mpi_sata_completion 2481:IO_SUCCESS
>>> [ 137.220961] pm80xx0:: mpi_sata_completion 2503:SAS_PROTO_RESPONSE
>>> len = 20
>>> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
>>> [ 137.233878] pm80xx0:: pm8001_mpi_msg_free_set 1403: CI=47 PI=47
>>> [ 137.236696] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
>>> [ 137.247102] pm80xx0:: pm80xx_chip_sata_req 4585:DMA
>>> [ 137.252186] pm80xx0:: pm80xx_chip_sata_req 4593:FPDMA
>>> [ 137.257400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
>>> command 0x65 inb f
>>> [ 167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>> [ 167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
>>> [ 167.519049] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 2, abort
>>> task tag = 0x1
>>> [ 187.969173] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
>>> timeout.
>
> As I mentioned, having this fail is a red flag. If I was pushed to guess
> what has happened, I'd say the FW is faulting due to some erroneous
> driver behaviour.

I am still thinking that there is something wrong with the handling of
NCQ NON DATA command. There are several places where the code determines
non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
the fpdma bucket, which is wrong.

>> This is the submission path, not completion. The code is:
>>
>> (gdb) list *(pm8001_queue_command+0x842)
>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>> 486 atomic_dec(&pm8001_dev->running_req);
>> 487 goto err_out_tag;
>> 488 }
>> 489 /* TODO: select normal or high priority */
>> 490 spin_lock(&t->task_state_lock);
>> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>> 492 spin_unlock(&t->task_state_lock);
>> 493 } while (0);
>> 494 rc = 0;
>> 495 goto out_done;
>>
>> So the task is already completed when the submission path tries to set
>> the state flag ? Debugging...
>
> Yeah, that's how it looks.
>
> I already mentioned this problem here:
>
> https://lore.kernel.org/linux-scsi/[email protected]/
>
> Maybe we should just fix it now to rule it out of possibly causing other
> issues... I was reluctant to fix it as many places seems to need to be
> touched. Let me check it.

Here is my current fix:

diff --git a/drivers/scsi/pm8001/pm8001_sas.c
b/drivers/scsi/pm8001/pm8001_sas.c
index 1b95c73d12d1..16c101577dd3 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)
ccb->ccb_tag = tag;
ccb->task = t;
ccb->device = pm8001_dev;
+
+ /* TODO: select normal or high priority */
+ atomic_inc(&pm8001_dev->running_req);
+ spin_lock(&t->task_state_lock);
+ t->task_state_flags |= SAS_TASK_AT_INITIATOR;
+ spin_unlock(&t->task_state_lock);
+
switch (task_proto) {
case SAS_PROTOCOL_SMP:
- atomic_inc(&pm8001_dev->running_req);
rc = pm8001_task_prep_smp(pm8001_ha, ccb);
break;
case SAS_PROTOCOL_SSP:
- atomic_inc(&pm8001_dev->running_req);
if (is_tmf)
rc = pm8001_task_prep_ssp_tm(pm8001_ha,
ccb, tmf);
@@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)
break;
case SAS_PROTOCOL_SATA:
case SAS_PROTOCOL_STP:
- atomic_inc(&pm8001_dev->running_req);
rc = pm8001_task_prep_ata(pm8001_ha, ccb);
break;
default:
@@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)

if (rc) {
pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
+ spin_lock(&t->task_state_lock);
+ t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
+ spin_unlock(&t->task_state_lock);
atomic_dec(&pm8001_dev->running_req);
goto err_out_tag;
}
- /* TODO: select normal or high priority */
- spin_lock(&t->task_state_lock);
- t->task_state_flags |= SAS_TASK_AT_INITIATOR;
- spin_unlock(&t->task_state_lock);
} while (0);
rc = 0;
goto out_done;

With this, No KASAN complaint. I will send a proper patch ASAP.

Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
It is set and unset only, never tested anywhere in libsas nor pm8001
driver. This flag seems totally useless to me, unless this is something
that the HW can see ?

--
Damien Le Moal
Western Digital Research

2022-02-07 07:53:06

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 04/02/2022 03:02, Damien Le Moal wrote:
>> The inconsistency is this line says:
>> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
>> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
>> seems wrong:
>> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
>>
>> I need to go and check the specs what the FIS reply format is for
>> NCQ_NON_DATA.
>>
>>
>> [ 137.187184] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
>> [ 137.193944] pm80xx0:: pm80xx_chip_sata_req 4581:no data
>> [ 137.199339] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
>> command 0x63 inb 4
>> [ 137.207577] pm80xx0:: pm8001_mpi_msg_consume 1446:: CI=46 PI=47
>> msgHeader=8104200d
>> [ 137.215399] pm80xx0:: mpi_sata_completion 2481:IO_SUCCESS
>> [ 137.220961] pm80xx0:: mpi_sata_completion 2503:SAS_PROTO_RESPONSE
>> len = 20
>> [ 137.228015] pm80xx0:: mpi_sata_completion 2515:FPDMA len = 8
>> [ 137.233878] pm80xx0:: pm8001_mpi_msg_free_set 1403: CI=47 PI=47
>> [ 137.236696] pm80xx0:: pm8001_queue_command 408:pm8001_task_exec device
>> [ 137.247102] pm80xx0:: pm80xx_chip_sata_req 4585:DMA
>> [ 137.252186] pm80xx0:: pm80xx_chip_sata_req 4593:FPDMA
>> [ 137.257400] pm80xx0:: pm80xx_chip_sata_req 4682:Sending Normal SATA
>> command 0x65 inb f
>> [ 167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>> [ 167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
>> [ 167.519049] pm80xx0:: pm8001_chip_abort_task 4607:cmd_tag = 2, abort
>> task tag = 0x1
>> [ 187.969173] pm80xx0:: pm8001_exec_internal_task_abort 753:TMF task
>> timeout.

As I mentioned, having this fail is a red flag. If I was pushed to guess
what has happened, I'd say the FW is faulting due to some erroneous
driver behaviour.

>> [ 187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
>> [ 187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
>> aborted



>>
>> After these messages, the tests exit on failure (drive dropped) and
>> there are no more messages. Doing rmmod or anything else get stuck too.
>> I have to reset the machine to get back to a good state.
>>
>> I am starting to think that NCQ NON DATA command is being very badly
>> handled... Switching the tests to run with all non-NCQ commands is
>> working fine, albeit horribly slow (much slower than with other HBAs,
>> e.g. Broadcom).
>>
>> Digging...
> I missed a KASAN splat during device scan on boot:
>
> 33.725184]
> ==================================================================
> [ 33.746168] BUG: KASAN: use-after-free in __lock_acquire+0x41a5/0x5b50
> [ 33.764181] Read of size 8 at addr ffff88818a318660 by task
> kworker/u64:6/583

...

> ==================================================================
>
> This is the submission path, not completion. The code is:
>
> (gdb) list *(pm8001_queue_command+0x842)
> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
> 486 atomic_dec(&pm8001_dev->running_req);
> 487 goto err_out_tag;
> 488 }
> 489 /* TODO: select normal or high priority */
> 490 spin_lock(&t->task_state_lock);
> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> 492 spin_unlock(&t->task_state_lock);
> 493 } while (0);
> 494 rc = 0;
> 495 goto out_done;
>
> So the task is already completed when the submission path tries to set
> the state flag ? Debugging...

Yeah, that's how it looks.

I already mentioned this problem here:

https://lore.kernel.org/linux-scsi/[email protected]/

Maybe we should just fix it now to rule it out of possibly causing other
issues... I was reluctant to fix it as many places seems to need to be
touched. Let me check it.

Thanks,
John



2022-02-07 10:27:27

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 2/1/22 00:58, John Garry wrote:
> On 28/01/2022 09:09, John Garry wrote:
>>> I ran some more tests. In particular, I ran libzbc compliance tests on a
>>> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
>>> series, I see command timeout that take forever to recover from, with
>>> the drive revalidation failing after that.
>>>
>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>> aborted
>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>> aborted
>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>> ...
>>>
>>> So there is a problem. Need to dig into this. I see this issue only with
>>> libzbc passthrough tests. fio runs with libaio are fine.
>>
>> Thanks for the notice. I think that I also saw a hang, but, IIRC, it
>> happened on mainline for me - but it's hard to know if I broke something
>> if it is already broke in another way. That is why I wanted this card
>> working properly...
>
> Hi Damien,
>
> From testing mainline, I can see a hang on my arm64 system for SAS
> disks. I think that the reason is the we don't finish some commands in
> EH properly for pm8001:
> - In EH, we attempt to abort the task in sas_scsi_find_task() ->
> lldd_abort_task()
> The default return from pm8001_exec_internal_tmf_task() is
> -TMF_RESP_FUNC_FAILED, so if the TMF does not execute properly we return
> this value
> - sas_scsi_find_task() cannot handle -TMF_RESP_FUNC_FAILED, and returns
> -TMF_RESP_FUNC_FAILED directly to sas_eh_handle_sas_errors(), which,
> again, does not handle -TMF_RESP_FUNC_FAILED. So we don't progress to
> ever finish the comand.
>
> This looks like the correct fix for mainline:
>
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -766,7 +766,7 @@ static int pm8001_exec_internal_tmf_task(struct
> domain_device *dev,
> pm8001_dev, DS_OPERATIONAL);
> wait_for_completion(&completion_setstate);
> }
> - res = -TMF_RESP_FUNC_FAILED;
> + res = TMF_RESP_FUNC_FAILED;
>
> That's effectively the same as what I have in this series in
> sas_execute_tmf().
>
> However your testing is a SATA device, which I'll check further.

This did not help. Still seeing 100% reproducible hangs.

I did a lot of testing/digging today, and the hang cause seems to be
missing task completions. At random, a task times out as its completion
does not come, and subsequent abort trial for the task fail, revalidate
fails and the device is dropped (capacity goes to 0). But at that point,
doing rmmod/modprobe to reset the device does not work. sync cache
command issued at rmmod time never completes. I end up needing to power
cycle the machine every time...

No clue about the root cause yet, but it definitely seem to be related
to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
everything is fine and the tests run to completion without any issue.

I wonder if their is a tag management bug somewhere...

I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
drops and retake the ata port lock. No other ATA driver do that since
the ata completion also take that lock. The ata port lock is taken
before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
ata_qc_issue()) seems like a very bad idea. I removed that and
everything work the same way (the lld execute does not sleep). But that
did not solve the hang problem.

Of note is this is all with your libsas patches applied. Without the
patches, I have KASAN screaming at me about use-after-free in completion
context. With your patches, KASAN is silent.

Another thing: this driver does not allow changing the max qd... Very
annoying.

echo 1 > /sys/block/sdX/device/queue_depth

has no effect. QD stays at 32 for an ATA drive. Need to look into that too.


--
Damien Le Moal
Western Digital Research

2022-02-09 11:20:14

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 2/7/22 22:09, John Garry wrote:
> Hi Damien,
>
> On 04/02/2022 03:02, Damien Le Moal wrote:
>> This is the submission path, not completion. The code is:
>>
>> (gdb) list *(pm8001_queue_command+0x842)
>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>> 486 atomic_dec(&pm8001_dev->running_req);
>> 487 goto err_out_tag;
>> 488 }
>> 489 /* TODO: select normal or high priority */
>> 490 spin_lock(&t->task_state_lock);
>> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>> 492 spin_unlock(&t->task_state_lock);
>> 493 } while (0);
>> 494 rc = 0;
>> 495 goto out_done;
>>
>> So the task is already completed when the submission path tries to set
>> the state flag ? Debugging...
>
> JFYI, I am putting together a patch to drop SAS_TASK_AT_INITIATOR
> altogether. I just need to ensure that the logic in the isci code is
> correct.

Great. Less dead code for the pm8001 driver :)

I was busy with a nasty libata bug today so I did not continue my
investigations. Will try tomorrow.


>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-02-09 12:12:18

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

Hi Damien,

On 04/02/2022 03:02, Damien Le Moal wrote:
> This is the submission path, not completion. The code is:
>
> (gdb) list *(pm8001_queue_command+0x842)
> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
> 486 atomic_dec(&pm8001_dev->running_req);
> 487 goto err_out_tag;
> 488 }
> 489 /* TODO: select normal or high priority */
> 490 spin_lock(&t->task_state_lock);
> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> 492 spin_unlock(&t->task_state_lock);
> 493 } while (0);
> 494 rc = 0;
> 495 goto out_done;
>
> So the task is already completed when the submission path tries to set
> the state flag ? Debugging...

JFYI, I am putting together a patch to drop SAS_TASK_AT_INITIATOR
altogether. I just need to ensure that the logic in the isci code is
correct.

Thanks,
John