2015-08-26 04:15:31

by Nicholas A. Bellinger

[permalink] [raw]
Subject: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

From: Nicholas Bellinger <[email protected]>

Hi James & Co,

This series is a mpt3sas forward port of Calvin Owens' in-flight
reference counting bugfixes for mpt2sas LLD code here:

[PATCH v4 0/2] Fixes for memory corruption in mpt2sas
http://marc.info/?l=linux-scsi&m=143951695904115&w=2

The differences between mpt2sas + mpt3sas in this area are very,
very small, and is required to address a NULL pointer dereference
OOPsen in _scsih_probe_sas() -> mpt3sas_transport_port_add() ->
sas_port_add_phy() that I've been hitting occasionally on boot
during initial LUN scan.

So far this code has been tested on v3.14.47 with a small cluster
using SAS3008 HBAs plus a few preceeding upstream mpt3sas patches
so these patches apply cleanly, and with the changes in place the
original OOPsen appears to be resolved.

This patch series is cut atop v4.2-rc1 code, and barring any
objections from Avago folks et al., should be considered along
with Calvin's mpt2sas patch set for v4.3-rc1 code.

Thank you,

--nab

Nicholas Bellinger (2):
mpt3sas: Refcount sas_device objects and fix unsafe list usage
mpt3sas: Refcount fw_events and fix unsafe list usage

drivers/scsi/mpt3sas/mpt3sas_base.h | 23 +-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 592 ++++++++++++++++++++++---------
drivers/scsi/mpt3sas/mpt3sas_transport.c | 12 +-
3 files changed, 449 insertions(+), 178 deletions(-)

--
1.9.1


2015-08-26 04:15:56

by Nicholas A. Bellinger

[permalink] [raw]
Subject: [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage

From: Nicholas Bellinger <[email protected]>

These objects can be referenced concurrently throughout the driver, we
need a way to make sure threads can't delete them out from under
each other. This patch adds the refcount, and refactors the code to use
it.

Additionally, we cannot iterate over the sas_device_list without
holding the lock, or we risk corrupting random memory if items are
added or deleted as we iterate. This patch refactors _scsih_probe_sas()
to use the sas_device_list in a safe way.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code.

Cc: Calvin Owens <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sreekanth Reddy <[email protected]>
Cc: MPT-FusionLinux.pdl <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/scsi/mpt3sas/mpt3sas_base.h | 23 +-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 476 +++++++++++++++++++++----------
drivers/scsi/mpt3sas/mpt3sas_transport.c | 12 +-
3 files changed, 355 insertions(+), 156 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index afa8816..fe29ac0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -209,6 +209,7 @@ struct Mpi2ManufacturingPage11_t {
* @flags: MPT_TARGET_FLAGS_XXX flags
* @deleted: target flaged for deletion
* @tm_busy: target is busy with TM request.
+ * @sdev: The sas_device associated with this target
*/
struct MPT3SAS_TARGET {
struct scsi_target *starget;
@@ -218,6 +219,7 @@ struct MPT3SAS_TARGET {
u32 flags;
u8 deleted;
u8 tm_busy;
+ struct _sas_device *sdev;
};


@@ -294,6 +296,7 @@ struct _internal_cmd {
* @responding: used in _scsih_sas_device_mark_responding
* @fast_path: fast path feature enable bit
* @pfa_led_on: flag for PFA LED status
+ * @refcount: reference count for deletion
*
*/
struct _sas_device {
@@ -315,8 +318,24 @@ struct _sas_device {
u8 responding;
u8 fast_path;
u8 pfa_led_on;
+ struct kref refcount;
};

+static inline void sas_device_get(struct _sas_device *s)
+{
+ kref_get(&s->refcount);
+}
+
+static inline void sas_device_free(struct kref *r)
+{
+ kfree(container_of(r, struct _sas_device, refcount));
+}
+
+static inline void sas_device_put(struct _sas_device *s)
+{
+ kref_put(&s->refcount, sas_device_free);
+}
+
/**
* struct _raid_device - raid volume link list
* @list: sas device list
@@ -1043,7 +1062,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
struct MPT3SAS_ADAPTER *ioc, u16 handle);
struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
-struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
+struct _sas_device *mpt3sas_get_sdev_by_addr(
+ struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
+struct _sas_device *__mpt3sas_get_sdev_by_addr(
struct MPT3SAS_ADAPTER *ioc, u64 sas_address);

void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..7142e3b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
}
}

+static struct _sas_device *
+__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+ struct MPT3SAS_TARGET *tgt_priv)
+{
+ struct _sas_device *ret;
+
+ assert_spin_locked(&ioc->sas_device_lock);
+
+ ret = tgt_priv->sdev;
+ if (ret)
+ sas_device_get(ret);
+
+ return ret;
+}
+
+static struct _sas_device *
+mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+ struct MPT3SAS_TARGET *tgt_priv)
+{
+ struct _sas_device *ret;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+ return ret;
+}
+
+struct _sas_device *
+__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
+ u64 sas_address)
+{
+ struct _sas_device *sas_device;
+
+ assert_spin_locked(&ioc->sas_device_lock);
+
+ list_for_each_entry(sas_device, &ioc->sas_device_list, list)
+ if (sas_device->sas_address == sas_address)
+ goto found_device;
+
+ list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
+ if (sas_device->sas_address == sas_address)
+ goto found_device;
+
+ return NULL;
+
+found_device:
+ sas_device_get(sas_device);
+ return sas_device;
+}
+
+
/**
- * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
+ * mpt3sas_get_sdev_by_addr - sas device search
* @ioc: per adapter object
* @sas_address: sas address
* Context: Calling function should acquire ioc->sas_device_lock
@@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
* object.
*/
struct _sas_device *
-mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
+mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
u64 sas_address)
{
struct _sas_device *sas_device;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+ sas_address);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+ return sas_device;
+}
+
+static struct _sas_device *
+__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+{
+ struct _sas_device *sas_device;
+
+ assert_spin_locked(&ioc->sas_device_lock);

list_for_each_entry(sas_device, &ioc->sas_device_list, list)
- if (sas_device->sas_address == sas_address)
- return sas_device;
+ if (sas_device->handle == handle)
+ goto found_device;

list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
- if (sas_device->sas_address == sas_address)
- return sas_device;
+ if (sas_device->handle == handle)
+ goto found_device;

return NULL;
+
+found_device:
+ sas_device_get(sas_device);
+ return sas_device;
}

/**
- * _scsih_sas_device_find_by_handle - sas device search
+ * mpt3sas_get_sdev_by_handle - sas device search
* @ioc: per adapter object
* @handle: sas device handle (assigned by firmware)
* Context: Calling function should acquire ioc->sas_device_lock
@@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
* object.
*/
static struct _sas_device *
-_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
{
struct _sas_device *sas_device;
+ unsigned long flags;

- list_for_each_entry(sas_device, &ioc->sas_device_list, list)
- if (sas_device->handle == handle)
- return sas_device;
-
- list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
- if (sas_device->handle == handle)
- return sas_device;
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

- return NULL;
+ return sas_device;
}

/**
@@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
* @sas_device: the sas_device object
* Context: This function will acquire ioc->sas_device_lock.
*
- * Removing object and freeing associated memory from the ioc->sas_device_list.
+ * If sas_device is on the list, remove it and decrement its reference count.
*/
static void
_scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
@@ -585,10 +655,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,

if (!sas_device)
return;
-
+ /*
+ * The lock serializes access to the list, but we still need to verify
+ * that nobody removed the entry while we were waiting on the lock.
+ */
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- list_del(&sas_device->list);
- kfree(sas_device);
+ if (!list_empty(&sas_device->list)) {
+ list_del_init(&sas_device->list);
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}

@@ -609,12 +684,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
return;

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- if (sas_device)
- list_del(&sas_device->list);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+ if (sas_device) {
+ list_del_init(&sas_device->list);
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- if (sas_device)
+
+ if (sas_device) {
_scsih_remove_device(ioc, sas_device);
+ sas_device_put(sas_device);
+ }
}

/**
@@ -635,13 +715,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
return;

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- sas_address);
- if (sas_device)
- list_del(&sas_device->list);
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
+ if (sas_device) {
+ list_del_init(&sas_device->list);
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- if (sas_device)
+
+ if (sas_device) {
_scsih_remove_device(ioc, sas_device);
+ sas_device_put(sas_device);
+ }
}

/**
@@ -664,6 +748,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
(unsigned long long)sas_device->sas_address));

spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ sas_device_get(sas_device);
list_add_tail(&sas_device->list, &ioc->sas_device_list);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

@@ -705,6 +790,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
(unsigned long long)sas_device->sas_address));

spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ sas_device_get(sas_device);
list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
_scsih_determine_boot_device(ioc, sas_device, 0);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1083,12 +1169,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
goto not_sata;
if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
goto not_sata;
+
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- sas_device_priv_data->sas_target->sas_address);
- if (sas_device && sas_device->device_info &
- MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
- max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+ sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
+ if (sas_device) {
+ if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+ max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

not_sata:
@@ -1145,12 +1234,13 @@ _scsih_target_alloc(struct scsi_target *starget)
/* sas/sata devices */
spin_lock_irqsave(&ioc->sas_device_lock, flags);
rphy = dev_to_rphy(starget->dev.parent);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
rphy->identify.sas_address);

if (sas_device) {
sas_target_priv_data->handle = sas_device->handle;
sas_target_priv_data->sas_address = sas_device->sas_address;
+ sas_target_priv_data->sdev = sas_device;
sas_device->starget = starget;
sas_device->id = starget->id;
sas_device->channel = starget->channel;
@@ -1200,13 +1290,21 @@ _scsih_target_destroy(struct scsi_target *starget)

spin_lock_irqsave(&ioc->sas_device_lock, flags);
rphy = dev_to_rphy(starget->dev.parent);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- rphy->identify.sas_address);
+ sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
if (sas_device && (sas_device->starget == starget) &&
(sas_device->id == starget->id) &&
(sas_device->channel == starget->channel))
sas_device->starget = NULL;

+ if (sas_device) {
+ /*
+ * Corresponding get() is in _scsih_target_alloc()
+ */
+ sas_target_priv_data->sdev = NULL;
+ sas_device_put(sas_device);
+
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

out:
@@ -1262,7 +1360,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)

if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
sas_target_priv_data->sas_address);
if (sas_device && (sas_device->starget == NULL)) {
sdev_printk(KERN_INFO, sdev,
@@ -1270,6 +1368,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
__func__, __LINE__);
sas_device->starget = starget;
}
+
+ if (sas_device)
+ sas_device_put(sas_device);
+
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}

@@ -1304,10 +1406,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)

if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- sas_target_priv_data->sas_address);
+ sas_device = __mpt3sas_get_sdev_from_target(ioc,
+ sas_target_priv_data);
if (sas_device && !sas_target_priv_data->num_luns)
sas_device->starget = NULL;
+
+ if (sas_device)
+ sas_device_put(sas_device);
+
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}

@@ -1743,7 +1849,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
}

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
sas_device_priv_data->sas_target->sas_address);
if (!sas_device) {
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1777,12 +1883,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
ds, (unsigned long long)
sas_device->enclosure_logical_id, sas_device->slot);

+ sas_device_put(sas_device);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

if (!ssp_target)
_scsih_display_sata_capabilities(ioc, handle, sdev);

-
_scsih_change_queue_depth(sdev, qdepth);

if (ssp_target) {
@@ -2173,8 +2279,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
device_str, (unsigned long long)priv_target->sas_address);
} else {
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- priv_target->sas_address);
+ sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
if (sas_device) {
if (priv_target->flags &
MPT_TARGET_FLAGS_RAID_COMPONENT) {
@@ -2193,6 +2298,8 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
"enclosure_logical_id(0x%016llx), slot(%d)\n",
(unsigned long long)sas_device->enclosure_logical_id,
sas_device->slot);
+
+ sas_device_put(sas_device);
}
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
@@ -2268,10 +2375,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
{
struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
struct MPT3SAS_DEVICE *sas_device_priv_data;
- struct _sas_device *sas_device;
- unsigned long flags;
+ struct _sas_device *sas_device = NULL;
u16 handle;
int r;
+ struct scsi_target *starget = scmd->device->sdev_target;
+ struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;

sdev_printk(KERN_INFO, scmd->device,
"attempting device reset! scmd(%p)\n", scmd);
@@ -2291,12 +2399,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
handle = 0;
if (sas_device_priv_data->sas_target->flags &
MPT_TARGET_FLAGS_RAID_COMPONENT) {
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc,
- sas_device_priv_data->sas_target->handle);
+ sas_device = mpt3sas_get_sdev_from_target(ioc,
+ target_priv_data);
if (sas_device)
handle = sas_device->volume_handle;
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
} else
handle = sas_device_priv_data->sas_target->handle;

@@ -2313,6 +2419,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
out:
sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+ if (sas_device)
+ sas_device_put(sas_device);
+
return r;
}

@@ -2327,11 +2437,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
{
struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
struct MPT3SAS_DEVICE *sas_device_priv_data;
- struct _sas_device *sas_device;
- unsigned long flags;
+ struct _sas_device *sas_device = NULL;
u16 handle;
int r;
struct scsi_target *starget = scmd->device->sdev_target;
+ struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;

starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
scmd);
@@ -2351,12 +2461,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
handle = 0;
if (sas_device_priv_data->sas_target->flags &
MPT_TARGET_FLAGS_RAID_COMPONENT) {
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc,
- sas_device_priv_data->sas_target->handle);
+ sas_device = mpt3sas_get_sdev_from_target(ioc,
+ target_priv_data);
if (sas_device)
handle = sas_device->volume_handle;
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
} else
handle = sas_device_priv_data->sas_target->handle;

@@ -2373,6 +2481,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
out:
starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+ if (sas_device)
+ sas_device_put(sas_device);
+
return r;
}

@@ -2683,15 +2795,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,

list_for_each_entry(mpt3sas_port,
&sas_expander->sas_port_list, port_list) {
- if (mpt3sas_port->remote_identify.device_type ==
- SAS_END_DEVICE) {
+ if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device =
- mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- mpt3sas_port->remote_identify.sas_address);
- if (sas_device)
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+ mpt3sas_port->remote_identify.sas_address);
+ if (sas_device) {
set_bit(sas_device->handle,
- ioc->blocking_handles);
+ ioc->blocking_handles);
+ sas_device_put(sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
}
@@ -2759,7 +2871,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
{
Mpi2SCSITaskManagementRequest_t *mpi_request;
u16 smid;
- struct _sas_device *sas_device;
+ struct _sas_device *sas_device = NULL;
struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
u64 sas_address = 0;
unsigned long flags;
@@ -2792,7 +2904,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
return;

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
if (sas_device && sas_device->starget &&
sas_device->starget->hostdata) {
sas_target_priv_data = sas_device->starget->hostdata;
@@ -2814,14 +2926,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
if (!smid) {
delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
if (!delayed_tr)
- return;
+ goto out;
INIT_LIST_HEAD(&delayed_tr->list);
delayed_tr->handle = handle;
list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
dewtprintk(ioc, pr_info(MPT3SAS_FMT
"DELAYED:tr:handle(0x%04x), (open)\n",
ioc->name, handle));
- return;
+ goto out;
}

dewtprintk(ioc, pr_info(MPT3SAS_FMT
@@ -2835,6 +2947,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
mpt3sas_base_put_smid_hi_priority(ioc, smid);
mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
+out:
+ if (sas_device)
+ sas_device_put(sas_device);
}

/**
@@ -3685,7 +3800,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
char *desc_scsi_state = ioc->tmp_string;
u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
struct _sas_device *sas_device = NULL;
- unsigned long flags;
struct scsi_target *starget = scmd->device->sdev_target;
struct MPT3SAS_TARGET *priv_target = starget->hostdata;
char *device_str = NULL;
@@ -3813,9 +3927,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
device_str, (unsigned long long)priv_target->sas_address);
} else {
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- priv_target->sas_address);
+ sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
if (sas_device) {
pr_warn(MPT3SAS_FMT
"\tsas_address(0x%016llx), phy(%d)\n",
@@ -3825,8 +3937,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
"\tenclosure_logical_id(0x%016llx), slot(%d)\n",
ioc->name, (unsigned long long)
sas_device->enclosure_logical_id, sas_device->slot);
+
+ sas_device_put(sas_device);
}
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}

pr_warn(MPT3SAS_FMT
@@ -3878,7 +3991,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
Mpi2SepRequest_t mpi_request;
struct _sas_device *sas_device;

- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
if (!sas_device)
return;

@@ -3893,7 +4006,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
&mpi_request)) != 0) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
__FILE__, __LINE__, __func__);
- return;
+ goto out;
}
sas_device->pfa_led_on = 1;

@@ -3902,8 +4015,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
"enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
le32_to_cpu(mpi_reply.IOCLogInfo)));
- return;
+ goto out;
}
+out:
+ sas_device_put(sas_device);
}
/**
* _scsih_turn_off_pfa_led - turn off Fault LED
@@ -3986,19 +4101,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)

/* only handle non-raid devices */
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
if (!sas_device) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
+ goto out_unlock;
}
starget = sas_device->starget;
sas_target_priv_data = starget->hostdata;

if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
- ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
- }
+ ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
+ goto out_unlock;
+
starget_printk(KERN_WARNING, starget, "predicted fault\n");
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

@@ -4012,7 +4125,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
if (!event_reply) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
ioc->name, __FILE__, __LINE__, __func__);
- return;
+ goto out;
}

event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
@@ -4029,6 +4142,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
mpt3sas_ctl_add_to_event_log(ioc, event_reply);
kfree(event_reply);
+out:
+ if (sas_device)
+ sas_device_put(sas_device);
+ return;
+
+out_unlock:
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ goto out;
}

/**
@@ -4772,13 +4893,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,

spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
sas_address);

- if (!sas_device) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
- }
+ if (!sas_device)
+ goto out_unlock;

if (unlikely(sas_device->handle != handle)) {
starget = sas_device->starget;
@@ -4796,20 +4915,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
pr_err(MPT3SAS_FMT
"device is not present handle(0x%04x), flags!!!\n",
ioc->name, handle);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
+ goto out_unlock;
}

/* check if there were any issues with discovery */
if (_scsih_check_access_status(ioc, sas_address, handle,
- sas_device_pg0.AccessStatus)) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
- }
+ sas_device_pg0.AccessStatus))
+ goto out_unlock;

spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
_scsih_ublock_io_device(ioc, sas_address);
+ if (sas_device)
+ sas_device_put(sas_device);
+ return;

+out_unlock:
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ if (sas_device)
+ sas_device_put(sas_device);
}

/**
@@ -4834,7 +4957,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
u32 ioc_status;
u64 sas_address;
u32 device_info;
- unsigned long flags;

if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
@@ -4870,13 +4992,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
sas_device_pg0.AccessStatus))
return -1;

- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- sas_address);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-
- if (sas_device)
+ sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
+ if (sas_device) {
+ sas_device_put(sas_device);
return -1;
+ }

sas_device = kzalloc(sizeof(struct _sas_device),
GFP_KERNEL);
@@ -4886,6 +5006,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
return 0;
}

+ kref_init(&sas_device->refcount);
sas_device->handle = handle;
if (_scsih_get_sas_address(ioc,
le16_to_cpu(sas_device_pg0.ParentDevHandle),
@@ -4917,6 +5038,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
else
_scsih_sas_device_add(ioc, sas_device);

+ sas_device_put(sas_device);
return 0;
}

@@ -4965,8 +5087,6 @@ _scsih_remove_device(struct MPT3SAS_ADAPTER *ioc,
ioc->name, __func__,
sas_device->handle, (unsigned long long)
sas_device->sas_address));
-
- kfree(sas_device);
}

#ifdef CONFIG_SCSI_MPT3SAS_LOGGING
@@ -5291,25 +5411,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,

spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_address = le64_to_cpu(event_data->SASAddress);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
- sas_address);
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);

- if (!sas_device || !sas_device->starget) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
- }
+ if (!sas_device || !sas_device->starget)
+ goto out;

target_priv_data = sas_device->starget->hostdata;
- if (!target_priv_data) {
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- return;
- }
+ if (!target_priv_data)
+ goto out;

if (event_data->ReasonCode ==
MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
target_priv_data->tm_busy = 1;
else
target_priv_data->tm_busy = 0;
+
+out:
+ if (sas_device)
+ sas_device_put(sas_device);
+
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}

@@ -5793,7 +5913,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
u16 handle = le16_to_cpu(element->PhysDiskDevHandle);

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
if (sas_device) {
sas_device->volume_handle = 0;
sas_device->volume_wwid = 0;
@@ -5812,6 +5932,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
/* exposing raid component */
if (starget)
starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
+
+ sas_device_put(sas_device);
}

/**
@@ -5840,7 +5962,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
&volume_wwid);

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
if (sas_device) {
set_bit(handle, ioc->pd_handles);
if (sas_device->starget && sas_device->starget->hostdata) {
@@ -5860,6 +5982,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
if (starget)
starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
+
+ sas_device_put(sas_device);
}

/**
@@ -5892,7 +6016,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
Mpi2EventIrConfigElement_t *element)
{
struct _sas_device *sas_device;
- unsigned long flags;
u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
Mpi2ConfigReply_t mpi_reply;
Mpi2SasDevicePage0_t sas_device_pg0;
@@ -5902,11 +6025,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,

set_bit(handle, ioc->pd_handles);

- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
if (sas_device) {
_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
+ sas_device_put(sas_device);
return;
}

@@ -6183,7 +6305,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
u16 handle, parent_handle;
u32 state;
struct _sas_device *sas_device;
- unsigned long flags;
Mpi2ConfigReply_t mpi_reply;
Mpi2SasDevicePage0_t sas_device_pg0;
u32 ioc_status;
@@ -6212,12 +6333,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
case MPI2_RAID_PD_STATE_HOT_SPARE:

set_bit(handle, ioc->pd_handles);
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);

- if (sas_device)
+ sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+ if (sas_device) {
+ sas_device_put(sas_device);
return;
+ }

if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
&sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
@@ -6673,6 +6794,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
struct _raid_device *raid_device, *raid_device_next;
struct list_head tmp_list;
unsigned long flags;
+ LIST_HEAD(head);

pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
ioc->name);
@@ -6680,14 +6802,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
/* removing unresponding end devices */
pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
ioc->name);
+
+ /*
+ * Iterate, pulling off devices marked as non-responding. We become the
+ * owner for the reference the list had on any object we prune.
+ */
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
list_for_each_entry_safe(sas_device, sas_device_next,
- &ioc->sas_device_list, list) {
+ &ioc->sas_device_list, list) {
if (!sas_device->responding)
- mpt3sas_device_remove_by_sas_address(ioc,
- sas_device->sas_address);
+ list_move_tail(&sas_device->list, &head);
else
sas_device->responding = 0;
}
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+ /*
+ * Now, uninitialize and remove the unresponding devices we pruned.
+ */
+ list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
+ _scsih_remove_device(ioc, sas_device);
+ list_del_init(&sas_device->list);
+ sas_device_put(sas_device);
+ }

/* removing unresponding volumes */
if (ioc->ir_firmware) {
@@ -6841,11 +6978,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
}
phys_disk_num = pd_pg0.PhysDiskNum;
handle = le16_to_cpu(pd_pg0.DevHandle);
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- if (sas_device)
+ sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+ if (sas_device) {
+ sas_device_put(sas_device);
continue;
+ }
if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
&sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
handle) != 0)
@@ -6966,12 +7103,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
if (!(_scsih_is_end_device(
le32_to_cpu(sas_device_pg0.DeviceInfo))))
continue;
- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = mpt3sas_get_sdev_by_addr(ioc,
le64_to_cpu(sas_device_pg0.SASAddress));
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
- if (sas_device)
+ if (sas_device) {
+ sas_device_put(sas_device);
continue;
+ }
parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
@@ -7598,6 +7735,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
}
}

+static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
+{
+ struct _sas_device *sas_device = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ if (!list_empty(&ioc->sas_device_init_list)) {
+ sas_device = list_first_entry(&ioc->sas_device_init_list,
+ struct _sas_device, list);
+ sas_device_get(sas_device);
+ }
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+ return sas_device;
+}
+
+static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
+ struct _sas_device *sas_device)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+
+ /*
+ * Since we dropped the lock during the call to port_add(), we need to
+ * be careful here that somebody else didn't move or delete this item
+ * while we were busy with other things.
+ *
+ * If it was on the list, we need a put() for the reference the list
+ * had. Either way, we need a get() for the destination list.
+ */
+ if (!list_empty(&sas_device->list)) {
+ list_del_init(&sas_device->list);
+ sas_device_put(sas_device);
+ }
+
+ sas_device_get(sas_device);
+ list_add_tail(&sas_device->list, &ioc->sas_device_list);
+
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
/**
* _scsih_probe_sas - reporting sas devices to sas transport
* @ioc: per adapter object
@@ -7607,17 +7786,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
static void
_scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
{
- struct _sas_device *sas_device, *next;
- unsigned long flags;
-
- /* SAS Device List */
- list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
- list) {
+ struct _sas_device *sas_device;

+ while ((sas_device = get_next_sas_device(ioc))) {
if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
- sas_device->sas_address_parent)) {
- list_del(&sas_device->list);
- kfree(sas_device);
+ sas_device->sas_address_parent)) {
+ _scsih_sas_device_remove(ioc, sas_device);
+ sas_device_put(sas_device);
continue;
} else if (!sas_device->starget) {
/*
@@ -7628,17 +7803,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
*/
if (!ioc->is_driver_loading) {
mpt3sas_transport_port_remove(ioc,
- sas_device->sas_address,
- sas_device->sas_address_parent);
- list_del(&sas_device->list);
- kfree(sas_device);
+ sas_device->sas_address,
+ sas_device->sas_address_parent);
+ _scsih_sas_device_remove(ioc, sas_device);
+ sas_device_put(sas_device);
continue;
}
}

- spin_lock_irqsave(&ioc->sas_device_lock, flags);
- list_move_tail(&sas_device->list, &ioc->sas_device_list);
- spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ sas_device_make_active(ioc, sas_device);
+ sas_device_put(sas_device);
}
}

diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index efb98af..2f9d038 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -1306,15 +1306,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
int rc;

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
rphy->identify.sas_address);
if (sas_device) {
*identifier = sas_device->enclosure_logical_id;
rc = 0;
+ sas_device_put(sas_device);
} else {
*identifier = 0;
rc = -ENXIO;
}
+
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
return rc;
}
@@ -1334,12 +1336,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
int rc;

spin_lock_irqsave(&ioc->sas_device_lock, flags);
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
rphy->identify.sas_address);
- if (sas_device)
+ if (sas_device) {
rc = sas_device->slot;
- else
+ sas_device_put(sas_device);
+ } else {
rc = -ENXIO;
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
return rc;
}
--
1.9.1

2015-08-26 04:15:34

by Nicholas A. Bellinger

[permalink] [raw]
Subject: [PATCH 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage

From: Nicholas Bellinger <[email protected]>

The fw_event_work struct is concurrently referenced at shutdown, so
add a refcount to protect it, and refactor the code to use it.

Additionally, refactor _scsih_fw_event_cleanup_queue() such that it
no longer iterates over the list without holding the lock, since
_firmware_event_work() concurrently deletes items from the list.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code.

Cc: Calvin Owens <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sreekanth Reddy <[email protected]>
Cc: MPT-FusionLinux.pdl <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 116 ++++++++++++++++++++++++++++-------
1 file changed, 94 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 7142e3b..4cb7e89 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -175,6 +175,7 @@ struct sense_info {
* @VP_ID: virtual port id
* @ignore: flag meaning this event has been marked to ignore
* @event: firmware event MPI2_EVENT_XXX defined in mpt2_ioc.h
+ * @refcount: reference count for fw_event_work
* @event_data: reply event data payload follows
*
* This object stored on ioc->fw_event_list.
@@ -191,9 +192,37 @@ struct fw_event_work {
u8 VP_ID;
u8 ignore;
u16 event;
+ struct kref refcount;
char event_data[0] __aligned(4);
};

+static void fw_event_work_free(struct kref *r)
+{
+ kfree(container_of(r, struct fw_event_work, refcount));
+}
+
+static void fw_event_work_get(struct fw_event_work *fw_work)
+{
+ kref_get(&fw_work->refcount);
+}
+
+static void fw_event_work_put(struct fw_event_work *fw_work)
+{
+ kref_put(&fw_work->refcount, fw_event_work_free);
+}
+
+static struct fw_event_work *alloc_fw_event_work(int len)
+{
+ struct fw_event_work *fw_event;
+
+ fw_event = kzalloc(sizeof(*fw_event) + len, GFP_ATOMIC);
+ if (!fw_event)
+ return NULL;
+
+ kref_init(&fw_event->refcount);
+ return fw_event;
+}
+
/* raid transport support */
static struct raid_template *mpt3sas_raid_template;

@@ -2542,32 +2571,36 @@ _scsih_fw_event_add(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
return;

spin_lock_irqsave(&ioc->fw_event_lock, flags);
+ fw_event_work_get(fw_event);
INIT_LIST_HEAD(&fw_event->list);
list_add_tail(&fw_event->list, &ioc->fw_event_list);
INIT_WORK(&fw_event->work, _firmware_event_work);
+ fw_event_work_get(fw_event);
queue_work(ioc->firmware_event_thread, &fw_event->work);
spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
}

/**
- * _scsih_fw_event_free - delete fw_event
+ * _scsih_fw_event_del_from_list - delete fw_event from the list
* @ioc: per adapter object
* @fw_event: object describing the event
* Context: This function will acquire ioc->fw_event_lock.
*
- * This removes firmware event object from link list, frees associated memory.
+ * If the fw_event is on the fw_event_list, remove it and do a put.
*
* Return nothing.
*/
static void
-_scsih_fw_event_free(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
+_scsih_fw_event_del_from_list(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
*fw_event)
{
unsigned long flags;

spin_lock_irqsave(&ioc->fw_event_lock, flags);
- list_del(&fw_event->list);
- kfree(fw_event);
+ if (!list_empty(&fw_event->list)) {
+ list_del_init(&fw_event->list);
+ fw_event_work_put(fw_event);
+ }
spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
}

@@ -2587,14 +2620,14 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc,

if (ioc->is_driver_loading)
return;
- fw_event = kzalloc(sizeof(*fw_event) + sizeof(*event_data),
- GFP_ATOMIC);
+ fw_event = alloc_fw_event_work(sizeof(*event_data));
if (!fw_event)
return;
fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG;
fw_event->ioc = ioc;
memcpy(fw_event->event_data, event_data, sizeof(*event_data));
_scsih_fw_event_add(ioc, fw_event);
+ fw_event_work_put(fw_event);
}

/**
@@ -2610,12 +2643,13 @@ _scsih_error_recovery_delete_devices(struct MPT3SAS_ADAPTER *ioc)

if (ioc->is_driver_loading)
return;
- fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+ fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
fw_event->event = MPT3SAS_REMOVE_UNRESPONDING_DEVICES;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+ fw_event_work_put(fw_event);
}

/**
@@ -2629,12 +2663,29 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
{
struct fw_event_work *fw_event;

- fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+ fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
fw_event->event = MPT3SAS_PORT_ENABLE_COMPLETE;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+ fw_event_work_put(fw_event);
+}
+
+static struct fw_event_work *dequeue_next_fw_event(struct MPT3SAS_ADAPTER *ioc)
+{
+ unsigned long flags;
+ struct fw_event_work *fw_event = NULL;
+
+ spin_lock_irqsave(&ioc->fw_event_lock, flags);
+ if (!list_empty(&ioc->fw_event_list)) {
+ fw_event = list_first_entry(&ioc->fw_event_list,
+ struct fw_event_work, list);
+ list_del_init(&fw_event->list);
+ }
+ spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+ return fw_event;
}

/**
@@ -2649,17 +2700,25 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
static void
_scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc)
{
- struct fw_event_work *fw_event, *next;
+ struct fw_event_work *fw_event;

if (list_empty(&ioc->fw_event_list) ||
!ioc->firmware_event_thread || in_interrupt())
return;

- list_for_each_entry_safe(fw_event, next, &ioc->fw_event_list, list) {
- if (cancel_delayed_work_sync(&fw_event->delayed_work)) {
- _scsih_fw_event_free(ioc, fw_event);
- continue;
- }
+ while ((fw_event = dequeue_next_fw_event(ioc))) {
+ /*
+ * Wait on the fw_event to complete. If this returns 1, then
+ * the event was never executed, and we need a put for the
+ * reference the delayed_work had on the fw_event.
+ *
+ * If it did execute, we wait for it to finish, and the put will
+ * happen from _firmware_event_work()
+ */
+ if (cancel_work_sync(&fw_event->work))
+ fw_event_work_put(fw_event);
+
+ fw_event_work_put(fw_event);
}
}

@@ -4071,13 +4130,14 @@ _scsih_send_event_to_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
{
struct fw_event_work *fw_event;

- fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+ fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
fw_event->event = MPT3SAS_TURN_ON_PFA_LED;
fw_event->device_handle = handle;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+ fw_event_work_put(fw_event);
}

/**
@@ -7200,10 +7260,11 @@ mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase)
static void
_mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
{
+ _scsih_fw_event_del_from_list(ioc, fw_event);
+
/* the queue is being flushed so ignore this event */
- if (ioc->remove_host ||
- ioc->pci_error_recovery) {
- _scsih_fw_event_free(ioc, fw_event);
+ if (ioc->remove_host || ioc->pci_error_recovery) {
+ fw_event_work_put(fw_event);
return;
}

@@ -7214,8 +7275,17 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
fw_event->event_data);
break;
case MPT3SAS_REMOVE_UNRESPONDING_DEVICES:
- while (scsi_host_in_recovery(ioc->shost) || ioc->shost_recovery)
+ while (scsi_host_in_recovery(ioc->shost) ||
+ ioc->shost_recovery) {
+ /*
+ * If we're unloading, bail. Otherwise, this can become
+ * an infinite loop.
+ */
+ if (ioc->remove_host)
+ goto out;
+
ssleep(1);
+ }
_scsih_remove_unresponding_sas_devices(ioc);
_scsih_scan_for_devices_after_reset(ioc);
break;
@@ -7260,7 +7330,8 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
_scsih_sas_ir_operation_status_event(ioc, fw_event);
break;
}
- _scsih_fw_event_free(ioc, fw_event);
+out:
+ fw_event_work_put(fw_event);
}

/**
@@ -7376,7 +7447,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
}

sz = le16_to_cpu(mpi_reply->EventDataLength) * 4;
- fw_event = kzalloc(sizeof(*fw_event) + sz, GFP_ATOMIC);
+ fw_event = alloc_fw_event_work(sz);
if (!fw_event) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
ioc->name, __FILE__, __LINE__, __func__);
@@ -7389,6 +7460,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
fw_event->VP_ID = mpi_reply->VP_ID;
fw_event->event = event;
_scsih_fw_event_add(ioc, fw_event);
+ fw_event_work_put(fw_event);
return 1;
}

--
1.9.1

2015-08-26 23:55:14

by Calvin Owens

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> Hi James & Co,
>
> This series is a mpt3sas forward port of Calvin Owens' in-flight
> reference counting bugfixes for mpt2sas LLD code here:
>
> [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> http://marc.info/?l=linux-scsi&m=143951695904115&w=2

Nicholas,

I'm glad to hear these fixes were helpful! I was planning on porting to
mpt3sas in the near future, thanks for saving me the trouble :)

Calvin

> The differences between mpt2sas + mpt3sas in this area are very,
> very small, and is required to address a NULL pointer dereference
> OOPsen in _scsih_probe_sas() -> mpt3sas_transport_port_add() ->
> sas_port_add_phy() that I've been hitting occasionally on boot
> during initial LUN scan.
>
> So far this code has been tested on v3.14.47 with a small cluster
> using SAS3008 HBAs plus a few preceeding upstream mpt3sas patches
> so these patches apply cleanly, and with the changes in place the
> original OOPsen appears to be resolved.
>
> This patch series is cut atop v4.2-rc1 code, and barring any
> objections from Avago folks et al., should be considered along
> with Calvin's mpt2sas patch set for v4.3-rc1 code.
>
> Thank you,
>
> --nab
>
> Nicholas Bellinger (2):
> mpt3sas: Refcount sas_device objects and fix unsafe list usage
> mpt3sas: Refcount fw_events and fix unsafe list usage
>
> drivers/scsi/mpt3sas/mpt3sas_base.h | 23 +-
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 592 ++++++++++++++++++++++---------
> drivers/scsi/mpt3sas/mpt3sas_transport.c | 12 +-
> 3 files changed, 449 insertions(+), 178 deletions(-)
>
> --
> 1.9.1
>

2015-08-26 23:58:38

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Wed, 2015-08-26 at 16:54 -0700, Calvin Owens wrote:
> On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <[email protected]>
> >
> > Hi James & Co,
> >
> > This series is a mpt3sas forward port of Calvin Owens' in-flight
> > reference counting bugfixes for mpt2sas LLD code here:
> >
> > [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> > http://marc.info/?l=linux-scsi&m=143951695904115&w=2
>
> Nicholas,
>
> I'm glad to hear these fixes were helpful! I was planning on porting to
> mpt3sas in the near future, thanks for saving me the trouble :)
>

Would you mind giving this series the once over, and add your
Reviewed-by as well..?

Sreekanth + Avago folks, can you please do the same for Calvin's mpt2sas
series and this series..?

This has been a long-standing issue with mpt*sas LLD code and really
needs to be addressed for upstream.

Thank you,

--nab

2015-08-27 02:07:09

by Calvin Owens

[permalink] [raw]
Subject: Re: [PATCH 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> The fw_event_work struct is concurrently referenced at shutdown, so
> add a refcount to protect it, and refactor the code to use it.
>
> Additionally, refactor _scsih_fw_event_cleanup_queue() such that it
> no longer iterates over the list without holding the lock, since
> _firmware_event_work() concurrently deletes items from the list.
>
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code.
>
> Cc: Calvin Owens <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Sreekanth Reddy <[email protected]>
> Cc: MPT-FusionLinux.pdl <[email protected]>
> Signed-off-by: Nicholas Bellinger <[email protected]>
> ---
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 116 ++++++++++++++++++++++++++++-------
> 1 file changed, 94 insertions(+), 22 deletions(-)

Looks good, thanks again for this.

Reviewed-by: Calvin Owens <[email protected]>

2015-08-27 02:20:56

by Calvin Owens

[permalink] [raw]
Subject: Re: [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> These objects can be referenced concurrently throughout the driver, we
> need a way to make sure threads can't delete them out from under
> each other. This patch adds the refcount, and refactors the code to use
> it.
>
> Additionally, we cannot iterate over the sas_device_list without
> holding the lock, or we risk corrupting random memory if items are
> added or deleted as we iterate. This patch refactors _scsih_probe_sas()
> to use the sas_device_list in a safe way.
>
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code.
>
> Cc: Calvin Owens <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Sreekanth Reddy <[email protected]>
> Cc: MPT-FusionLinux.pdl <[email protected]>
> Signed-off-by: Nicholas Bellinger <[email protected]>
> ---
> drivers/scsi/mpt3sas/mpt3sas_base.h | 23 +-
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 476 +++++++++++++++++++++----------
> drivers/scsi/mpt3sas/mpt3sas_transport.c | 12 +-
> 3 files changed, 355 insertions(+), 156 deletions(-)

Looks good, thanks again for this.

Reviewed-by: Calvin Owens <[email protected]>

> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..fe29ac0 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -209,6 +209,7 @@ struct Mpi2ManufacturingPage11_t {
> * @flags: MPT_TARGET_FLAGS_XXX flags
> * @deleted: target flaged for deletion
> * @tm_busy: target is busy with TM request.
> + * @sdev: The sas_device associated with this target
> */
> struct MPT3SAS_TARGET {
> struct scsi_target *starget;
> @@ -218,6 +219,7 @@ struct MPT3SAS_TARGET {
> u32 flags;
> u8 deleted;
> u8 tm_busy;
> + struct _sas_device *sdev;
> };
>
>
> @@ -294,6 +296,7 @@ struct _internal_cmd {
> * @responding: used in _scsih_sas_device_mark_responding
> * @fast_path: fast path feature enable bit
> * @pfa_led_on: flag for PFA LED status
> + * @refcount: reference count for deletion
> *
> */
> struct _sas_device {
> @@ -315,8 +318,24 @@ struct _sas_device {
> u8 responding;
> u8 fast_path;
> u8 pfa_led_on;
> + struct kref refcount;
> };
>
> +static inline void sas_device_get(struct _sas_device *s)
> +{
> + kref_get(&s->refcount);
> +}
> +
> +static inline void sas_device_free(struct kref *r)
> +{
> + kfree(container_of(r, struct _sas_device, refcount));
> +}
> +
> +static inline void sas_device_put(struct _sas_device *s)
> +{
> + kref_put(&s->refcount, sas_device_free);
> +}
> +
> /**
> * struct _raid_device - raid volume link list
> * @list: sas device list
> @@ -1043,7 +1062,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
> struct MPT3SAS_ADAPTER *ioc, u16 handle);
> struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
> struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> -struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
> +struct _sas_device *mpt3sas_get_sdev_by_addr(
> + struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> +struct _sas_device *__mpt3sas_get_sdev_by_addr(
> struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
>
> void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..7142e3b 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
> }
> }
>
> +static struct _sas_device *
> +__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> + struct MPT3SAS_TARGET *tgt_priv)
> +{
> + struct _sas_device *ret;
> +
> + assert_spin_locked(&ioc->sas_device_lock);
> +
> + ret = tgt_priv->sdev;
> + if (ret)
> + sas_device_get(ret);
> +
> + return ret;
> +}
> +
> +static struct _sas_device *
> +mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> + struct MPT3SAS_TARGET *tgt_priv)
> +{
> + struct _sas_device *ret;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> + return ret;
> +}
> +
> +struct _sas_device *
> +__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
> + u64 sas_address)
> +{
> + struct _sas_device *sas_device;
> +
> + assert_spin_locked(&ioc->sas_device_lock);
> +
> + list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> + if (sas_device->sas_address == sas_address)
> + goto found_device;
> +
> + list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> + if (sas_device->sas_address == sas_address)
> + goto found_device;
> +
> + return NULL;
> +
> +found_device:
> + sas_device_get(sas_device);
> + return sas_device;
> +}
> +
> +
> /**
> - * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
> + * mpt3sas_get_sdev_by_addr - sas device search
> * @ioc: per adapter object
> * @sas_address: sas address
> * Context: Calling function should acquire ioc->sas_device_lock
> @@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
> * object.
> */
> struct _sas_device *
> -mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> +mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
> u64 sas_address)
> {
> struct _sas_device *sas_device;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> + sas_address);
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> + return sas_device;
> +}
> +
> +static struct _sas_device *
> +__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +{
> + struct _sas_device *sas_device;
> +
> + assert_spin_locked(&ioc->sas_device_lock);
>
> list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> - if (sas_device->sas_address == sas_address)
> - return sas_device;
> + if (sas_device->handle == handle)
> + goto found_device;
>
> list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> - if (sas_device->sas_address == sas_address)
> - return sas_device;
> + if (sas_device->handle == handle)
> + goto found_device;
>
> return NULL;
> +
> +found_device:
> + sas_device_get(sas_device);
> + return sas_device;
> }
>
> /**
> - * _scsih_sas_device_find_by_handle - sas device search
> + * mpt3sas_get_sdev_by_handle - sas device search
> * @ioc: per adapter object
> * @handle: sas device handle (assigned by firmware)
> * Context: Calling function should acquire ioc->sas_device_lock
> @@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> * object.
> */
> static struct _sas_device *
> -_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> {
> struct _sas_device *sas_device;
> + unsigned long flags;
>
> - list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> - if (sas_device->handle == handle)
> - return sas_device;
> -
> - list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> - if (sas_device->handle == handle)
> - return sas_device;
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> - return NULL;
> + return sas_device;
> }
>
> /**
> @@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> * @sas_device: the sas_device object
> * Context: This function will acquire ioc->sas_device_lock.
> *
> - * Removing object and freeing associated memory from the ioc->sas_device_list.
> + * If sas_device is on the list, remove it and decrement its reference count.
> */
> static void
> _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> @@ -585,10 +655,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
>
> if (!sas_device)
> return;
> -
> + /*
> + * The lock serializes access to the list, but we still need to verify
> + * that nobody removed the entry while we were waiting on the lock.
> + */
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + if (!list_empty(&sas_device->list)) {
> + list_del_init(&sas_device->list);
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -609,12 +684,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - if (sas_device)
> - list_del(&sas_device->list);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> + if (sas_device) {
> + list_del_init(&sas_device->list);
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - if (sas_device)
> +
> + if (sas_device) {
> _scsih_remove_device(ioc, sas_device);
> + sas_device_put(sas_device);
> + }
> }
>
> /**
> @@ -635,13 +715,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - sas_address);
> - if (sas_device)
> - list_del(&sas_device->list);
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
> + if (sas_device) {
> + list_del_init(&sas_device->list);
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - if (sas_device)
> +
> + if (sas_device) {
> _scsih_remove_device(ioc, sas_device);
> + sas_device_put(sas_device);
> + }
> }
>
> /**
> @@ -664,6 +748,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
> (unsigned long long)sas_device->sas_address));
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + sas_device_get(sas_device);
> list_add_tail(&sas_device->list, &ioc->sas_device_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> @@ -705,6 +790,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
> (unsigned long long)sas_device->sas_address));
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + sas_device_get(sas_device);
> list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
> _scsih_determine_boot_device(ioc, sas_device, 0);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1083,12 +1169,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
> goto not_sata;
> if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
> goto not_sata;
> +
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - sas_device_priv_data->sas_target->sas_address);
> - if (sas_device && sas_device->device_info &
> - MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> - max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> + sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
> + if (sas_device) {
> + if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> + max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> +
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> not_sata:
> @@ -1145,12 +1234,13 @@ _scsih_target_alloc(struct scsi_target *starget)
> /* sas/sata devices */
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> rphy = dev_to_rphy(starget->dev.parent);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> rphy->identify.sas_address);
>
> if (sas_device) {
> sas_target_priv_data->handle = sas_device->handle;
> sas_target_priv_data->sas_address = sas_device->sas_address;
> + sas_target_priv_data->sdev = sas_device;
> sas_device->starget = starget;
> sas_device->id = starget->id;
> sas_device->channel = starget->channel;
> @@ -1200,13 +1290,21 @@ _scsih_target_destroy(struct scsi_target *starget)
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> rphy = dev_to_rphy(starget->dev.parent);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - rphy->identify.sas_address);
> + sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
> if (sas_device && (sas_device->starget == starget) &&
> (sas_device->id == starget->id) &&
> (sas_device->channel == starget->channel))
> sas_device->starget = NULL;
>
> + if (sas_device) {
> + /*
> + * Corresponding get() is in _scsih_target_alloc()
> + */
> + sas_target_priv_data->sdev = NULL;
> + sas_device_put(sas_device);
> +
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> out:
> @@ -1262,7 +1360,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)
>
> if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> sas_target_priv_data->sas_address);
> if (sas_device && (sas_device->starget == NULL)) {
> sdev_printk(KERN_INFO, sdev,
> @@ -1270,6 +1368,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
> __func__, __LINE__);
> sas_device->starget = starget;
> }
> +
> + if (sas_device)
> + sas_device_put(sas_device);
> +
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -1304,10 +1406,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)
>
> if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - sas_target_priv_data->sas_address);
> + sas_device = __mpt3sas_get_sdev_from_target(ioc,
> + sas_target_priv_data);
> if (sas_device && !sas_target_priv_data->num_luns)
> sas_device->starget = NULL;
> +
> + if (sas_device)
> + sas_device_put(sas_device);
> +
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -1743,7 +1849,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
> }
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> sas_device_priv_data->sas_target->sas_address);
> if (!sas_device) {
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1777,12 +1883,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
> ds, (unsigned long long)
> sas_device->enclosure_logical_id, sas_device->slot);
>
> + sas_device_put(sas_device);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> if (!ssp_target)
> _scsih_display_sata_capabilities(ioc, handle, sdev);
>
> -
> _scsih_change_queue_depth(sdev, qdepth);
>
> if (ssp_target) {
> @@ -2173,8 +2279,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
> device_str, (unsigned long long)priv_target->sas_address);
> } else {
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - priv_target->sas_address);
> + sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
> if (sas_device) {
> if (priv_target->flags &
> MPT_TARGET_FLAGS_RAID_COMPONENT) {
> @@ -2193,6 +2298,8 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
> "enclosure_logical_id(0x%016llx), slot(%d)\n",
> (unsigned long long)sas_device->enclosure_logical_id,
> sas_device->slot);
> +
> + sas_device_put(sas_device);
> }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> @@ -2268,10 +2375,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
> {
> struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
> struct MPT3SAS_DEVICE *sas_device_priv_data;
> - struct _sas_device *sas_device;
> - unsigned long flags;
> + struct _sas_device *sas_device = NULL;
> u16 handle;
> int r;
> + struct scsi_target *starget = scmd->device->sdev_target;
> + struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>
> sdev_printk(KERN_INFO, scmd->device,
> "attempting device reset! scmd(%p)\n", scmd);
> @@ -2291,12 +2399,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
> handle = 0;
> if (sas_device_priv_data->sas_target->flags &
> MPT_TARGET_FLAGS_RAID_COMPONENT) {
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc,
> - sas_device_priv_data->sas_target->handle);
> + sas_device = mpt3sas_get_sdev_from_target(ioc,
> + target_priv_data);
> if (sas_device)
> handle = sas_device->volume_handle;
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> } else
> handle = sas_device_priv_data->sas_target->handle;
>
> @@ -2313,6 +2419,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
> out:
> sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
> ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> + if (sas_device)
> + sas_device_put(sas_device);
> +
> return r;
> }
>
> @@ -2327,11 +2437,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
> {
> struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
> struct MPT3SAS_DEVICE *sas_device_priv_data;
> - struct _sas_device *sas_device;
> - unsigned long flags;
> + struct _sas_device *sas_device = NULL;
> u16 handle;
> int r;
> struct scsi_target *starget = scmd->device->sdev_target;
> + struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>
> starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
> scmd);
> @@ -2351,12 +2461,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
> handle = 0;
> if (sas_device_priv_data->sas_target->flags &
> MPT_TARGET_FLAGS_RAID_COMPONENT) {
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc,
> - sas_device_priv_data->sas_target->handle);
> + sas_device = mpt3sas_get_sdev_from_target(ioc,
> + target_priv_data);
> if (sas_device)
> handle = sas_device->volume_handle;
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> } else
> handle = sas_device_priv_data->sas_target->handle;
>
> @@ -2373,6 +2481,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
> out:
> starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
> ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> + if (sas_device)
> + sas_device_put(sas_device);
> +
> return r;
> }
>
> @@ -2683,15 +2795,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,
>
> list_for_each_entry(mpt3sas_port,
> &sas_expander->sas_port_list, port_list) {
> - if (mpt3sas_port->remote_identify.device_type ==
> - SAS_END_DEVICE) {
> + if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device =
> - mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - mpt3sas_port->remote_identify.sas_address);
> - if (sas_device)
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> + mpt3sas_port->remote_identify.sas_address);
> + if (sas_device) {
> set_bit(sas_device->handle,
> - ioc->blocking_handles);
> + ioc->blocking_handles);
> + sas_device_put(sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> }
> @@ -2759,7 +2871,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> {
> Mpi2SCSITaskManagementRequest_t *mpi_request;
> u16 smid;
> - struct _sas_device *sas_device;
> + struct _sas_device *sas_device = NULL;
> struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
> u64 sas_address = 0;
> unsigned long flags;
> @@ -2792,7 +2904,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> if (sas_device && sas_device->starget &&
> sas_device->starget->hostdata) {
> sas_target_priv_data = sas_device->starget->hostdata;
> @@ -2814,14 +2926,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> if (!smid) {
> delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
> if (!delayed_tr)
> - return;
> + goto out;
> INIT_LIST_HEAD(&delayed_tr->list);
> delayed_tr->handle = handle;
> list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
> dewtprintk(ioc, pr_info(MPT3SAS_FMT
> "DELAYED:tr:handle(0x%04x), (open)\n",
> ioc->name, handle));
> - return;
> + goto out;
> }
>
> dewtprintk(ioc, pr_info(MPT3SAS_FMT
> @@ -2835,6 +2947,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
> mpt3sas_base_put_smid_hi_priority(ioc, smid);
> mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
> +out:
> + if (sas_device)
> + sas_device_put(sas_device);
> }
>
> /**
> @@ -3685,7 +3800,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
> char *desc_scsi_state = ioc->tmp_string;
> u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
> struct _sas_device *sas_device = NULL;
> - unsigned long flags;
> struct scsi_target *starget = scmd->device->sdev_target;
> struct MPT3SAS_TARGET *priv_target = starget->hostdata;
> char *device_str = NULL;
> @@ -3813,9 +3927,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
> pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
> device_str, (unsigned long long)priv_target->sas_address);
> } else {
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - priv_target->sas_address);
> + sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
> if (sas_device) {
> pr_warn(MPT3SAS_FMT
> "\tsas_address(0x%016llx), phy(%d)\n",
> @@ -3825,8 +3937,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
> "\tenclosure_logical_id(0x%016llx), slot(%d)\n",
> ioc->name, (unsigned long long)
> sas_device->enclosure_logical_id, sas_device->slot);
> +
> + sas_device_put(sas_device);
> }
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> pr_warn(MPT3SAS_FMT
> @@ -3878,7 +3991,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> Mpi2SepRequest_t mpi_request;
> struct _sas_device *sas_device;
>
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> + sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> if (!sas_device)
> return;
>
> @@ -3893,7 +4006,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> &mpi_request)) != 0) {
> pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
> __FILE__, __LINE__, __func__);
> - return;
> + goto out;
> }
> sas_device->pfa_led_on = 1;
>
> @@ -3902,8 +4015,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> "enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
> ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
> le32_to_cpu(mpi_reply.IOCLogInfo)));
> - return;
> + goto out;
> }
> +out:
> + sas_device_put(sas_device);
> }
> /**
> * _scsih_turn_off_pfa_led - turn off Fault LED
> @@ -3986,19 +4101,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>
> /* only handle non-raid devices */
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> if (!sas_device) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> + goto out_unlock;
> }
> starget = sas_device->starget;
> sas_target_priv_data = starget->hostdata;
>
> if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
> - ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> - }
> + ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
> + goto out_unlock;
> +
> starget_printk(KERN_WARNING, starget, "predicted fault\n");
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> @@ -4012,7 +4125,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> if (!event_reply) {
> pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
> ioc->name, __FILE__, __LINE__, __func__);
> - return;
> + goto out;
> }
>
> event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
> @@ -4029,6 +4142,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
> mpt3sas_ctl_add_to_event_log(ioc, event_reply);
> kfree(event_reply);
> +out:
> + if (sas_device)
> + sas_device_put(sas_device);
> + return;
> +
> +out_unlock:
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + goto out;
> }
>
> /**
> @@ -4772,13 +4893,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> sas_address);
>
> - if (!sas_device) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> - }
> + if (!sas_device)
> + goto out_unlock;
>
> if (unlikely(sas_device->handle != handle)) {
> starget = sas_device->starget;
> @@ -4796,20 +4915,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
> pr_err(MPT3SAS_FMT
> "device is not present handle(0x%04x), flags!!!\n",
> ioc->name, handle);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> + goto out_unlock;
> }
>
> /* check if there were any issues with discovery */
> if (_scsih_check_access_status(ioc, sas_address, handle,
> - sas_device_pg0.AccessStatus)) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> - }
> + sas_device_pg0.AccessStatus))
> + goto out_unlock;
>
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> _scsih_ublock_io_device(ioc, sas_address);
> + if (sas_device)
> + sas_device_put(sas_device);
> + return;
>
> +out_unlock:
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + if (sas_device)
> + sas_device_put(sas_device);
> }
>
> /**
> @@ -4834,7 +4957,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
> u32 ioc_status;
> u64 sas_address;
> u32 device_info;
> - unsigned long flags;
>
> if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
> MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
> @@ -4870,13 +4992,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
> sas_device_pg0.AccessStatus))
> return -1;
>
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - sas_address);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -
> - if (sas_device)
> + sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
> + if (sas_device) {
> + sas_device_put(sas_device);
> return -1;
> + }
>
> sas_device = kzalloc(sizeof(struct _sas_device),
> GFP_KERNEL);
> @@ -4886,6 +5006,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
> return 0;
> }
>
> + kref_init(&sas_device->refcount);
> sas_device->handle = handle;
> if (_scsih_get_sas_address(ioc,
> le16_to_cpu(sas_device_pg0.ParentDevHandle),
> @@ -4917,6 +5038,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
> else
> _scsih_sas_device_add(ioc, sas_device);
>
> + sas_device_put(sas_device);
> return 0;
> }
>
> @@ -4965,8 +5087,6 @@ _scsih_remove_device(struct MPT3SAS_ADAPTER *ioc,
> ioc->name, __func__,
> sas_device->handle, (unsigned long long)
> sas_device->sas_address));
> -
> - kfree(sas_device);
> }
>
> #ifdef CONFIG_SCSI_MPT3SAS_LOGGING
> @@ -5291,25 +5411,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_address = le64_to_cpu(event_data->SASAddress);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> - sas_address);
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
>
> - if (!sas_device || !sas_device->starget) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> - }
> + if (!sas_device || !sas_device->starget)
> + goto out;
>
> target_priv_data = sas_device->starget->hostdata;
> - if (!target_priv_data) {
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - return;
> - }
> + if (!target_priv_data)
> + goto out;
>
> if (event_data->ReasonCode ==
> MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
> target_priv_data->tm_busy = 1;
> else
> target_priv_data->tm_busy = 0;
> +
> +out:
> + if (sas_device)
> + sas_device_put(sas_device);
> +
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -5793,7 +5913,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
> u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> if (sas_device) {
> sas_device->volume_handle = 0;
> sas_device->volume_wwid = 0;
> @@ -5812,6 +5932,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
> /* exposing raid component */
> if (starget)
> starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
> +
> + sas_device_put(sas_device);
> }
>
> /**
> @@ -5840,7 +5962,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
> &volume_wwid);
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> if (sas_device) {
> set_bit(handle, ioc->pd_handles);
> if (sas_device->starget && sas_device->starget->hostdata) {
> @@ -5860,6 +5982,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
> _scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
> if (starget)
> starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
> +
> + sas_device_put(sas_device);
> }
>
> /**
> @@ -5892,7 +6016,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
> Mpi2EventIrConfigElement_t *element)
> {
> struct _sas_device *sas_device;
> - unsigned long flags;
> u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
> Mpi2ConfigReply_t mpi_reply;
> Mpi2SasDevicePage0_t sas_device_pg0;
> @@ -5902,11 +6025,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
>
> set_bit(handle, ioc->pd_handles);
>
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> if (sas_device) {
> _scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
> + sas_device_put(sas_device);
> return;
> }
>
> @@ -6183,7 +6305,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
> u16 handle, parent_handle;
> u32 state;
> struct _sas_device *sas_device;
> - unsigned long flags;
> Mpi2ConfigReply_t mpi_reply;
> Mpi2SasDevicePage0_t sas_device_pg0;
> u32 ioc_status;
> @@ -6212,12 +6333,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
> case MPI2_RAID_PD_STATE_HOT_SPARE:
>
> set_bit(handle, ioc->pd_handles);
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> - if (sas_device)
> + sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> + if (sas_device) {
> + sas_device_put(sas_device);
> return;
> + }
>
> if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
> &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
> @@ -6673,6 +6794,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
> struct _raid_device *raid_device, *raid_device_next;
> struct list_head tmp_list;
> unsigned long flags;
> + LIST_HEAD(head);
>
> pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
> ioc->name);
> @@ -6680,14 +6802,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
> /* removing unresponding end devices */
> pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
> ioc->name);
> +
> + /*
> + * Iterate, pulling off devices marked as non-responding. We become the
> + * owner for the reference the list had on any object we prune.
> + */
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_for_each_entry_safe(sas_device, sas_device_next,
> - &ioc->sas_device_list, list) {
> + &ioc->sas_device_list, list) {
> if (!sas_device->responding)
> - mpt3sas_device_remove_by_sas_address(ioc,
> - sas_device->sas_address);
> + list_move_tail(&sas_device->list, &head);
> else
> sas_device->responding = 0;
> }
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> + /*
> + * Now, uninitialize and remove the unresponding devices we pruned.
> + */
> + list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
> + _scsih_remove_device(ioc, sas_device);
> + list_del_init(&sas_device->list);
> + sas_device_put(sas_device);
> + }
>
> /* removing unresponding volumes */
> if (ioc->ir_firmware) {
> @@ -6841,11 +6978,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
> }
> phys_disk_num = pd_pg0.PhysDiskNum;
> handle = le16_to_cpu(pd_pg0.DevHandle);
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - if (sas_device)
> + sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> + if (sas_device) {
> + sas_device_put(sas_device);
> continue;
> + }
> if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
> &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
> handle) != 0)
> @@ -6966,12 +7103,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
> if (!(_scsih_is_end_device(
> le32_to_cpu(sas_device_pg0.DeviceInfo))))
> continue;
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = mpt3sas_get_sdev_by_addr(ioc,
> le64_to_cpu(sas_device_pg0.SASAddress));
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> - if (sas_device)
> + if (sas_device) {
> + sas_device_put(sas_device);
> continue;
> + }
> parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
> if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
> pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
> @@ -7598,6 +7735,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
> }
> }
>
> +static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
> +{
> + struct _sas_device *sas_device = NULL;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> + if (!list_empty(&ioc->sas_device_init_list)) {
> + sas_device = list_first_entry(&ioc->sas_device_init_list,
> + struct _sas_device, list);
> + sas_device_get(sas_device);
> + }
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> + return sas_device;
> +}
> +
> +static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
> + struct _sas_device *sas_device)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +
> + /*
> + * Since we dropped the lock during the call to port_add(), we need to
> + * be careful here that somebody else didn't move or delete this item
> + * while we were busy with other things.
> + *
> + * If it was on the list, we need a put() for the reference the list
> + * had. Either way, we need a get() for the destination list.
> + */
> + if (!list_empty(&sas_device->list)) {
> + list_del_init(&sas_device->list);
> + sas_device_put(sas_device);
> + }
> +
> + sas_device_get(sas_device);
> + list_add_tail(&sas_device->list, &ioc->sas_device_list);
> +
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +}
> +
> /**
> * _scsih_probe_sas - reporting sas devices to sas transport
> * @ioc: per adapter object
> @@ -7607,17 +7786,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
> static void
> _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> {
> - struct _sas_device *sas_device, *next;
> - unsigned long flags;
> -
> - /* SAS Device List */
> - list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> - list) {
> + struct _sas_device *sas_device;
>
> + while ((sas_device = get_next_sas_device(ioc))) {
> if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> - sas_device->sas_address_parent)) {
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + sas_device->sas_address_parent)) {
> + _scsih_sas_device_remove(ioc, sas_device);
> + sas_device_put(sas_device);
> continue;
> } else if (!sas_device->starget) {
> /*
> @@ -7628,17 +7803,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> */
> if (!ioc->is_driver_loading) {
> mpt3sas_transport_port_remove(ioc,
> - sas_device->sas_address,
> - sas_device->sas_address_parent);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + sas_device->sas_address,
> + sas_device->sas_address_parent);
> + _scsih_sas_device_remove(ioc, sas_device);
> + sas_device_put(sas_device);
> continue;
> }
> }
>
> - spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_move_tail(&sas_device->list, &ioc->sas_device_list);
> - spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + sas_device_make_active(ioc, sas_device);
> + sas_device_put(sas_device);
> }
> }
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> index efb98af..2f9d038 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> @@ -1306,15 +1306,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
> int rc;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> rphy->identify.sas_address);
> if (sas_device) {
> *identifier = sas_device->enclosure_logical_id;
> rc = 0;
> + sas_device_put(sas_device);
> } else {
> *identifier = 0;
> rc = -ENXIO;
> }
> +
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> return rc;
> }
> @@ -1334,12 +1336,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
> int rc;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> + sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> rphy->identify.sas_address);
> - if (sas_device)
> + if (sas_device) {
> rc = sas_device->slot;
> - else
> + sas_device_put(sas_device);
> + } else {
> rc = -ENXIO;
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> return rc;
> }
> --
> 1.9.1
>

2015-08-27 05:07:54

by Sreekanth Reddy

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

HI Nicholas & Calvin,

Thanks for the patchset. Sure We will review and we do some unit
testing on this patch series. Currently my bandwidth is occupied with
some internal activity, so by end of next week I will acknowledge this
series if all the thing are fine with this patch series.

Thanks,
Sreekanth

On Thu, Aug 27, 2015 at 5:28 AM, Nicholas A. Bellinger
<[email protected]> wrote:
> On Wed, 2015-08-26 at 16:54 -0700, Calvin Owens wrote:
>> On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
>> > From: Nicholas Bellinger <[email protected]>
>> >
>> > Hi James & Co,
>> >
>> > This series is a mpt3sas forward port of Calvin Owens' in-flight
>> > reference counting bugfixes for mpt2sas LLD code here:
>> >
>> > [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
>> > http://marc.info/?l=linux-scsi&m=143951695904115&w=2
>>
>> Nicholas,
>>
>> I'm glad to hear these fixes were helpful! I was planning on porting to
>> mpt3sas in the near future, thanks for saving me the trouble :)
>>
>
> Would you mind giving this series the once over, and add your
> Reviewed-by as well..?
>
> Sreekanth + Avago folks, can you please do the same for Calvin's mpt2sas
> series and this series..?
>
> This has been a long-standing issue with mpt*sas LLD code and really
> needs to be addressed for upstream.
>
> Thank you,
>
> --nab
>

2015-08-27 07:07:44

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

Hi Sreekanth,

On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> HI Nicholas & Calvin,
>
> Thanks for the patchset. Sure We will review and we do some unit
> testing on this patch series. Currently my bandwidth is occupied with
> some internal activity, so by end of next week I will acknowledge this
> series if all the thing are fine with this patch series.
>

Thanks for the quick response.

So Calvin's mpt2sas series has been merged into scsi.git/for-next.

Since this series is AFAICT functionally identical to Calvin's changes,
I assume JEJB will be merging it up for v4.3-rc1 soon too.

Also, we're probably not the only ones hitting this class of OOPsen, so
it will certainly need to be picked up for stable at some point, and any
further regressions will need to be addressed.

Please keep the list posted of your testing progress.

For reference, here are prerequisites need to apply atop v3.14.y:

4dc06fd mpt3sas: delay scsi_add_host call to work with scsi-mq
35b6236 mpt3sas: combine fw_event_work and its event_data
62c4da4 mpt3sas: correct scsi_{target,device} hostdata allocation

--nab

2015-08-27 14:40:19

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> HI Nicholas & Calvin,
>
> Thanks for the patchset. Sure We will review and we do some unit
> testing on this patch series. Currently my bandwidth is occupied with
> some internal activity, so by end of next week I will acknowledge this
> series if all the thing are fine with this patch series.

Calvin responded to your review feedback and that series has been
outstanding for a while, so I'm not going to drop it from the misc tree.
However, I will reorder to make it ready for the second push. You have
until Friday week to find a problem with it.

James

2015-08-27 14:48:24

by Sreekanth Reddy

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

Ok James, by 9/4/2015 we will provide our feedback over this patch series.

Thanks,
Sreekanth


On Thu, Aug 27, 2015 at 8:10 PM, James Bottomley
<[email protected]> wrote:
> On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
>> HI Nicholas & Calvin,
>>
>> Thanks for the patchset. Sure We will review and we do some unit
>> testing on this patch series. Currently my bandwidth is occupied with
>> some internal activity, so by end of next week I will acknowledge this
>> series if all the thing are fine with this patch series.
>
> Calvin responded to your review feedback and that series has been
> outstanding for a while, so I'm not going to drop it from the misc tree.
> However, I will reorder to make it ready for the second push. You have
> until Friday week to find a problem with it.
>
> James
>
>

2015-08-27 19:15:54

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > HI Nicholas & Calvin,
> >
> > Thanks for the patchset. Sure We will review and we do some unit
> > testing on this patch series. Currently my bandwidth is occupied with
> > some internal activity, so by end of next week I will acknowledge this
> > series if all the thing are fine with this patch series.
>
> Calvin responded to your review feedback and that series has been
> outstanding for a while, so I'm not going to drop it from the misc tree.
> However, I will reorder to make it ready for the second push. You have
> until Friday week to find a problem with it.
>

James, as mentioned this series is functionally identical to Calvin's
mpt2sas series.

Please consider merging it to scsi.git/for-next, so both series are
together and in-sync.

--nab

2015-08-28 20:25:42

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > HI Nicholas & Calvin,
> > >
> > > Thanks for the patchset. Sure We will review and we do some unit
> > > testing on this patch series. Currently my bandwidth is occupied with
> > > some internal activity, so by end of next week I will acknowledge this
> > > series if all the thing are fine with this patch series.
> >
> > Calvin responded to your review feedback and that series has been
> > outstanding for a while, so I'm not going to drop it from the misc tree.
> > However, I will reorder to make it ready for the second push. You have
> > until Friday week to find a problem with it.
> >
>
> James, as mentioned this series is functionally identical to Calvin's
> mpt2sas series.
>
> Please consider merging it to scsi.git/for-next, so both series are
> together and in-sync.

Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
the mpt_sas code bases. This patch is also dangerous: the early
versions left unremoved objects lying around, so getting some stress
testing from avago is very useful. At this point in the cycle, the risk
vs reward of doing a blind upport to mpt3_sas is just too great and the
time for review and stress testing too limited within the merge window.

James

2015-08-30 07:22:22

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > HI Nicholas & Calvin,
> > > >
> > > > Thanks for the patchset. Sure We will review and we do some unit
> > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > some internal activity, so by end of next week I will acknowledge this
> > > > series if all the thing are fine with this patch series.
> > >
> > > Calvin responded to your review feedback and that series has been
> > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > However, I will reorder to make it ready for the second push. You have
> > > until Friday week to find a problem with it.
> > >
> >
> > James, as mentioned this series is functionally identical to Calvin's
> > mpt2sas series.
> >
> > Please consider merging it to scsi.git/for-next, so both series are
> > together and in-sync.
>
> Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> the mpt_sas code bases. This patch is also dangerous: the early
> versions left unremoved objects lying around, so getting some stress
> testing from avago is very useful. At this point in the cycle, the risk
> vs reward of doing a blind upport to mpt3_sas is just too great and the
> time for review and stress testing too limited within the merge window.

To clarify, this series is Calvin's latest -v4 mpt2sas changes that
you've already merged into for-next, and that have been applied (by
hand) to v4.2-rc1 mpt3sas code.

If you look closer, this series is an obvious bug-fix for a class of
long-standing bugs within mpt*sas, and I don't see how keeping the
broken list_head dereferences in one LLD, but not the other makes any
sense at this point.

Unfortunately, the mpt3sas patches you've merged this week add yet more
bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage. Really,
adding more broken code to mpt3sas can't possibly be better than just
merging this bug-fix series.

Here's are two cases that required fixing to apply this series atop
latest scsi.git/for-next:

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 85ff0dd..897153b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2866,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
struct scsi_device *sdev;
struct _sas_device *sas_device;

- sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+ sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
if (!sas_device)
return;

@@ -2882,6 +2890,8 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
continue;
_scsih_internal_device_block(sdev, sas_device_priv_data);
}
+
+ sas_device_put(sas_device);
}

/**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 18f1de5..6074b11 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
rphy->identify = mpt3sas_port->remote_identify;

if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
- sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+ sas_device = __mpt3sas_get_sdev_by_addr(ioc,
mpt3sas_port->remote_identify.sas_address);
if (!sas_device) {
dfailprintk(ioc, printk(MPT3SAS_FMT
@@ -750,8 +750,10 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
ioc->name, __FILE__, __LINE__, __func__);
}

- if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE)
+ if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
sas_device->pend_sas_rphy_add = 0;
+ sas_device_put(sas_device);
+ }

if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
dev_printk(KERN_INFO, &rphy->dev,

Also, I'm currently using the -v1 series on v3.14.47 atop 40 nodes with
12 HDDs per HBA. (480 total), and the number of HBAs using this series
will double over the next week. The specific hardware setup is:

LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

Thus far, it has resolved the original OOPsen bug that would appear
occasionally during boot with a failing HDD. So far, no other new
regressions have appeared.

That said, I'll be posting the updated -v2 atop current scsi/for-next
shortly, and will push to target-pending/for-next-merge for now to be
picked up for 0-day + linux-next.

Please consider picking it up for v4.3-rc1, otherwise I'll plan to push
to Linus with Sreekanth's ACK, barring any new regressions or other
specific -v2 code comments.

--nab

2015-08-30 16:14:46

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

On Sun, 2015-08-30 at 00:22 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> > On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > > HI Nicholas & Calvin,
> > > > >
> > > > > Thanks for the patchset. Sure We will review and we do some unit
> > > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > > some internal activity, so by end of next week I will acknowledge this
> > > > > series if all the thing are fine with this patch series.
> > > >
> > > > Calvin responded to your review feedback and that series has been
> > > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > > However, I will reorder to make it ready for the second push. You have
> > > > until Friday week to find a problem with it.
> > > >
> > >
> > > James, as mentioned this series is functionally identical to Calvin's
> > > mpt2sas series.
> > >
> > > Please consider merging it to scsi.git/for-next, so both series are
> > > together and in-sync.
> >
> > Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> > the mpt_sas code bases. This patch is also dangerous: the early
> > versions left unremoved objects lying around, so getting some stress
> > testing from avago is very useful. At this point in the cycle, the risk
> > vs reward of doing a blind upport to mpt3_sas is just too great and the
> > time for review and stress testing too limited within the merge window.
>
> To clarify, this series is Calvin's latest -v4 mpt2sas changes that
> you've already merged into for-next, and that have been applied (by
> hand) to v4.2-rc1 mpt3sas code.
>
> If you look closer, this series is an obvious bug-fix for a class of
> long-standing bugs within mpt*sas, and I don't see how keeping the
> broken list_head dereferences in one LLD, but not the other makes any
> sense at this point.

Look, Nic, the original patches went through four reviews with issues
uncovered and fixed at most of them. They're now sitting at the head of
the misc tree while they get stress tested so they can easily be ejected
without affecting the rest of the tree should anything fail.

They're hardly "an obvious bug fix" they actually introduce a separated
lifetime for _sas_device and fw_event_work.

Separated lifetime patches should always be a last resort because they
introduce a whole new set of lifetime handling rules for the objects.
Get one of them wrong and either you free it too early (oops) or pin it
forever. The general rule of thumb is never do separated lifetime
objects unless you really really have to. Ideally pin them to existing
core objects. The only reason I'm considering this is because no one
can see how to pin _sas_device to the natural core object (scsi_device).
The firmware event one doesn't make sense to me at all so I'm relying on
the reviewers. Ideally a fw event is allocated in the event setup and
freed in the event fire (or cancel) I really don't see we need a
separated lifetime object here. you have a callback to mediate exactly
whether the event fired or not in the cancel, so there's no ambiguity
about ownership.

Simply because of this, your upport to mp3sas can't go in without the
same level of review and testing.

> Unfortunately, the mpt3sas patches you've merged this week add yet more
> bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage. Really,
> adding more broken code to mpt3sas can't possibly be better than just
> merging this bug-fix series.

Well, it had two reviewers per patch, one of whom was Martin Petersen
who doesn't give reviewed by lightly. Funnily enough it seems the list
lost your review feedback for this patch set.

James