2017-03-29 09:41:47

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 0/2] Fix sysfs recursive removal splats in isci

This series fixes a sysfs warning caused by isci not being able to cope with
recursive sysfs path removals which are in place since commit bcdde7e
("sysfs: make __sysfs_remove_dir() recursive").

The mvsas, aic94xx and pm8001 and hisi_sas patches have been compile tested
only hence they have no callstack of the affected path in their changelogs.

I'm not sure whether to mark this patches as stable or not. I tend to say no
here, although we've seen complaints/bug reports on lkml and the scsi list.

Johannes Thumshirn (6):
scsi: sas: flush destruct workqueue on device unregister
scsi: isci: remove the SAS host after the SCSI host
aic94xx: remove the SAS host after the SCSI host
scsi: hisi_sas: remove the SAS host after the SCSI host
mvsas: remove the SAS host after the SCSI host
scsi: pm8001: remove the SAS host after the SCSI host

drivers/scsi/aic94xx/aic94xx_init.c | 13 ++++++++++---
drivers/scsi/hisi_sas/hisi_sas_main.c | 10 ++++++++--
drivers/scsi/isci/init.c | 9 ++++++++-
drivers/scsi/libsas/sas_discover.c | 4 ++++
drivers/scsi/mvsas/mv_init.c | 13 +++++++++++--
drivers/scsi/pm8001/pm8001_init.c | 14 ++++++++++++--
6 files changed, 53 insertions(+), 10 deletions(-)
--
1.8.5.6



2017-03-29 09:41:28

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 3/6] aic94xx: remove the SAS host after the SCSI host

After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
removal path of kernfs to make it recursive we have to remove the SAS host
before the SCSI host or we will see sysfs warnings on not found sysfs groups for
kobjects.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/aic94xx/aic94xx_init.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 662b232..362d65a 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -701,13 +701,20 @@ static int asd_register_sas_ha(struct asd_ha_struct *asd_ha)

static int asd_unregister_sas_ha(struct asd_ha_struct *asd_ha)
{
+ struct Scsi_Host *shost = asd_ha->sas_ha.core.shost;
+ unsigned long flags;
int err;

- scsi_remove_host(asd_ha->sas_ha.core.shost);
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
err = sas_unregister_ha(&asd_ha->sas_ha);

- sas_remove_host(asd_ha->sas_ha.core.shost);
- scsi_host_put(asd_ha->sas_ha.core.shost);
+ sas_remove_host(shost);
+ scsi_remove_host(shost);
+ scsi_host_put(shost);

kfree(asd_ha->sas_ha.sas_phy);
kfree(asd_ha->sas_ha.sas_port);
--
1.8.5.6

2017-03-29 09:41:33

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
removal path of kernfs to make it recursive we have to remove the SAS host
before the SCSI host or we will see sysfs warnings on not found sysfs groups for
kobjects.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/pm8001/pm8001_init.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index 417368c..9116e9c 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -1086,11 +1086,21 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
{
struct sas_ha_struct *sha = pci_get_drvdata(pdev);
struct pm8001_hba_info *pm8001_ha;
+ unsigned long flags;
+ struct Scsi_Host *shost;
int i, j;
pm8001_ha = sha->lldd_ha;
- scsi_remove_host(pm8001_ha->shost);
+ shost = pm8001_ha->shost;
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
sas_unregister_ha(sha);
- sas_remove_host(pm8001_ha->shost);
+ sas_remove_host(shost);
+ scsi_remove_host(shost);
+
list_del(&pm8001_ha->list);
PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
--
1.8.5.6

2017-03-29 09:41:37

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 5/6] mvsas: remove the SAS host after the SCSI host

After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
removal path of kernfs to make it recursive we have to remove the SAS host
before the SCSI host or we will see sysfs warnings on not found sysfs groups for
kobjects.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/mvsas/mv_init.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 8280046..32b86c8 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -634,17 +634,26 @@ static void mvs_pci_remove(struct pci_dev *pdev)
unsigned short core_nr, i = 0;
struct sas_ha_struct *sha = pci_get_drvdata(pdev);
struct mvs_info *mvi = NULL;
+ struct Scsi_Host *shost;
+ unsigned long flags;

core_nr = ((struct mvs_prv_info *)sha->lldd_ha)->n_host;
mvi = ((struct mvs_prv_info *)sha->lldd_ha)->mvi[0];
+ shost = mvi->shost;

#ifdef CONFIG_SCSI_MVSAS_TASKLET
tasklet_kill(&((struct mvs_prv_info *)sha->lldd_ha)->mv_tasklet);
#endif

- scsi_remove_host(mvi->shost);
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+
sas_unregister_ha(sha);
- sas_remove_host(mvi->shost);
+ sas_remove_host(shost);
+ scsi_remove_host(shost);

MVS_CHIP_DISP->interrupt_disable(mvi);
free_irq(mvi->pdev->irq, sha);
--
1.8.5.6

2017-03-29 09:42:37

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

In the advent of an SAS device unregister we have to wait for all destruct
works to be done to not accidently delay deletion of a SAS rphy or it's
children to the point when we're removing the SCSI or SAS hosts.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/libsas/sas_discover.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de662..75b18f1 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
}

if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
+ struct sas_discovery *disc = &dev->port->disc;
+ struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
+
sas_rphy_unlink(dev->rphy);
list_move_tail(&dev->disco_list_node, &port->destroy_list);
sas_discover_event(dev->port, DISCE_DESTRUCT);
+ flush_work(&sw->work);
}
}

--
1.8.5.6

2017-03-29 09:42:35

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 2/2] scsi: isci: remove the SAS host after the SCSI host

After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
removal path of kernfs to make it recursive we have to remove the SAS host
before the SCSI host or we will see sysfs warnings like the below when
triggering the the removal of the SAS HBA PCI device like with writing to the
sysfs pci remove file:

echo 1 > /sys/module/isci/drivers/pci:isci/<device>/remove

WARNING: CPU: 2 PID: 5 at fs/sysfs/group.c:241 sysfs_remove_group+0xc3/0xd0
sysfs group 'power' not found for kobject 'end_device-6:0'
CPU: 16 PID: 5884 Comm: echo Not tainted 4.11.0-rc3-libsas+ #504
Call Trace:
dump_stack+0x85/0xc2
__warn+0xc6/0xe0
warn_slowpath_fmt+0x4a/0x50
sysfs_remove_group+0xc3/0xd0
dpm_sysfs_remove+0x52/0x60
device_del+0x13c/0x360
? device_remove_file+0x14/0x20
attribute_container_class_device_del+0x15/0x20
transport_remove_classdev+0x4c/0x60
? transport_add_class_device+0x40/0x40
attribute_container_device_trigger+0xb3/0xc0
transport_remove_device+0x10/0x20
sas_port_delete+0x12d/0x160 [scsi_transport_sas]
sas_deform_port+0x1bf/0x1d0 [libsas]
sas_unregister_ports+0x36/0x50 [libsas]
sas_unregister_ha+0x1b/0x40 [libsas]
isci_unregister+0x2a/0x40 [isci]
isci_pci_remove+0x52/0xb0 [isci]
? __pm_runtime_resume+0x56/0x80
pci_device_remove+0x34/0xb0
device_release_driver_internal+0x158/0x210
device_release_driver+0xd/0x10
pci_stop_bus_device+0x85/0x90
pci_stop_and_remove_bus_device_locked+0x15/0x30
remove_store+0x59/0x70
dev_attr_store+0x13/0x20
sysfs_kf_write+0x40/0x50
kernfs_fop_write+0x130/0x1b0
__vfs_write+0x23/0x130
? rcu_read_lock_sched_held+0x6d/0x80
? rcu_sync_lockdep_assert+0x2a/0x50
? __sb_start_write+0xd7/0x1e0
? vfs_write+0x1a4/0x1f0
vfs_write+0xc6/0x1f0
SyS_write+0x44/0xa0
entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/isci/init.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 0b5b5db..afa6b25 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -267,15 +267,22 @@ static int isci_register_sas_ha(struct isci_host *isci_host)
static void isci_unregister(struct isci_host *isci_host)
{
struct Scsi_Host *shost;
+ unsigned long flags;

if (!isci_host)
return;

shost = to_shost(isci_host);
- scsi_remove_host(shost);
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
sas_unregister_ha(&isci_host->sas_ha);

sas_remove_host(shost);
+ scsi_remove_host(shost);
scsi_host_put(shost);
}

--
1.8.5.6

2017-03-29 09:43:14

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 4/6] scsi: hisi_sas: remove the SAS host after the SCSI host

After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
removal path of kernfs to make it recursive we have to remove the SAS host
before the SCSI host or we will see sysfs warnings on not found sysfs groups for
kobjects.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/hisi_sas/hisi_sas_main.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 53637a9..34476c2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1582,10 +1582,16 @@ int hisi_sas_remove(struct platform_device *pdev)
struct sas_ha_struct *sha = platform_get_drvdata(pdev);
struct hisi_hba *hisi_hba = sha->lldd_ha;
struct Scsi_Host *shost = sha->core.shost;
+ unsigned long flags;
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (scsi_host_set_state(shost, SHOST_CANCEL))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);

- scsi_remove_host(sha->core.shost);
sas_unregister_ha(sha);
- sas_remove_host(sha->core.shost);
+ sas_remove_host(shost);
+ scsi_remove_host(shost);

hisi_sas_free(hisi_hba);
kfree(shost);
--
1.8.5.6

2017-03-29 10:17:17

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 2/2] scsi: isci: remove the SAS host after the SCSI host

On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings like the below when
> triggering the the removal of the SAS HBA PCI device like with writing to the
> sysfs pci remove file:
>
> echo 1 > /sys/module/isci/drivers/pci:isci/<device>/remove
>
> WARNING: CPU: 2 PID: 5 at fs/sysfs/group.c:241 sysfs_remove_group+0xc3/0xd0
> sysfs group 'power' not found for kobject 'end_device-6:0'
> CPU: 16 PID: 5884 Comm: echo Not tainted 4.11.0-rc3-libsas+ #504
> Call Trace:
> dump_stack+0x85/0xc2
> __warn+0xc6/0xe0
> warn_slowpath_fmt+0x4a/0x50
> sysfs_remove_group+0xc3/0xd0
> dpm_sysfs_remove+0x52/0x60
> device_del+0x13c/0x360
> ? device_remove_file+0x14/0x20
> attribute_container_class_device_del+0x15/0x20
> transport_remove_classdev+0x4c/0x60
> ? transport_add_class_device+0x40/0x40
> attribute_container_device_trigger+0xb3/0xc0
> transport_remove_device+0x10/0x20
> sas_port_delete+0x12d/0x160 [scsi_transport_sas]
> sas_deform_port+0x1bf/0x1d0 [libsas]
> sas_unregister_ports+0x36/0x50 [libsas]
> sas_unregister_ha+0x1b/0x40 [libsas]
> isci_unregister+0x2a/0x40 [isci]
> isci_pci_remove+0x52/0xb0 [isci]
> ? __pm_runtime_resume+0x56/0x80
> pci_device_remove+0x34/0xb0
> device_release_driver_internal+0x158/0x210
> device_release_driver+0xd/0x10
> pci_stop_bus_device+0x85/0x90
> pci_stop_and_remove_bus_device_locked+0x15/0x30
> remove_store+0x59/0x70
> dev_attr_store+0x13/0x20
> sysfs_kf_write+0x40/0x50
> kernfs_fop_write+0x130/0x1b0
> __vfs_write+0x23/0x130
> ? rcu_read_lock_sched_held+0x6d/0x80
> ? rcu_sync_lockdep_assert+0x2a/0x50
> ? __sb_start_write+0xd7/0x1e0
> ? vfs_write+0x1a4/0x1f0
> vfs_write+0xc6/0x1f0
> SyS_write+0x44/0xa0
> entry_SYSCALL_64_fastpath+0x23/0xc6
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/isci/init.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2017-03-29 10:17:56

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 3/6] aic94xx: remove the SAS host after the SCSI host

On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
> kobjects.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/aic94xx/aic94xx_init.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2017-03-29 10:18:52

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
> kobjects.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/pm8001/pm8001_init.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2017-03-29 10:18:02

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 4/6] scsi: hisi_sas: remove the SAS host after the SCSI host

On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
> kobjects.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/hisi_sas/hisi_sas_main.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2017-03-29 10:18:51

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 5/6] mvsas: remove the SAS host after the SCSI host

On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
> kobjects.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/mvsas/mv_init.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)

Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2017-03-29 10:27:38

by Jack Wang

[permalink] [raw]
Subject: Re: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

On Wed, Mar 29, 2017 at 11:41 AM, Johannes Thumshirn <[email protected]> wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
> kobjects.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/pm8001/pm8001_init.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index 417368c..9116e9c 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -1086,11 +1086,21 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
> {
> struct sas_ha_struct *sha = pci_get_drvdata(pdev);
> struct pm8001_hba_info *pm8001_ha;
> + unsigned long flags;
> + struct Scsi_Host *shost;
> int i, j;
> pm8001_ha = sha->lldd_ha;
> - scsi_remove_host(pm8001_ha->shost);
> + shost = pm8001_ha->shost;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + if (scsi_host_set_state(shost, SHOST_CANCEL))
> + WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +
> sas_unregister_ha(sha);
> - sas_remove_host(pm8001_ha->shost);
> + sas_remove_host(shost);
> + scsi_remove_host(shost);
> +
> list_del(&pm8001_ha->list);
> PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
> PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
> --
> 1.8.5.6
>
Thanks Johannes for taking care of this. Looks good to me,

I have a question regarding the scsi_host_set_state change, why do we need that?
Can't we simply change the order of sas_remove_host and scsi_remove_host?

Cheers?
--
Jack Wang

2017-03-29 10:30:50

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

On 03/29/2017 12:27 PM, Jinpu Wang wrote:
> On Wed, Mar 29, 2017 at 11:41 AM, Johannes Thumshirn <[email protected]> wrote:
>> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed the
>> removal path of kernfs to make it recursive we have to remove the SAS host
>> before the SCSI host or we will see sysfs warnings on not found sysfs groups for
>> kobjects.
>>
>> Signed-off-by: Johannes Thumshirn <[email protected]>
>> ---
>> drivers/scsi/pm8001/pm8001_init.c | 14 ++++++++++++--
>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
>> index 417368c..9116e9c 100644
>> --- a/drivers/scsi/pm8001/pm8001_init.c
>> +++ b/drivers/scsi/pm8001/pm8001_init.c
>> @@ -1086,11 +1086,21 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
>> {
>> struct sas_ha_struct *sha = pci_get_drvdata(pdev);
>> struct pm8001_hba_info *pm8001_ha;
>> + unsigned long flags;
>> + struct Scsi_Host *shost;
>> int i, j;
>> pm8001_ha = sha->lldd_ha;
>> - scsi_remove_host(pm8001_ha->shost);
>> + shost = pm8001_ha->shost;
>> +
>> + spin_lock_irqsave(shost->host_lock, flags);
>> + if (scsi_host_set_state(shost, SHOST_CANCEL))
>> + WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
>> + spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>> sas_unregister_ha(sha);
>> - sas_remove_host(pm8001_ha->shost);
>> + sas_remove_host(shost);
>> + scsi_remove_host(shost);
>> +
>> list_del(&pm8001_ha->list);
>> PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
>> PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
>> --
>> 1.8.5.6
>>
> Thanks Johannes for taking care of this. Looks good to me,
>
> I have a question regarding the scsi_host_set_state change, why do we need that?
> Can't we simply change the order of sas_remove_host and scsi_remove_host?
>
If we don't do that I/O might still be coming in for the attached ports
while we're trying to remove the same. Which might cause all sorts of
race conditions.
So it's better to set the host state to CANCEL to stop I/O before
removing the ports.

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

2017-03-29 11:17:14

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On 29/03/2017 10:41, Johannes Thumshirn wrote:
> In the advent of an SAS device unregister we have to wait for all destruct
> works to be done to not accidently delay deletion of a SAS rphy or it's
> children to the point when we're removing the SCSI or SAS hosts.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/scsi/libsas/sas_discover.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index 60de662..75b18f1 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
> }
>
> if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
> + struct sas_discovery *disc = &dev->port->disc;
> + struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
> +
> sas_rphy_unlink(dev->rphy);
> list_move_tail(&dev->disco_list_node, &port->destroy_list);
> sas_discover_event(dev->port, DISCE_DESTRUCT);
> + flush_work(&sw->work);

I quickly tested plugging out the expander and we never get past this
call to flush - a hang results:

root@(none)$ [ 243.357088] INFO: task kworker/u32:1:106 blocked for
more than 120 seconds.
[ 243.364030] Not tainted 4.11.0-rc1-13687-g2562e6a-dirty #1388
[ 243.370282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 243.378086] kworker/u32:1 D 0 106 2 0x00000000
[ 243.383566] Workqueue: scsi_wq_0 sas_phye_loss_of_signal
[ 243.388863] Call trace:
[ 243.391314] [<ffff000008085d70>] __switch_to+0xa4/0xb0
[ 243.396442] [<ffff0000088f1134>] __schedule+0x1b4/0x5d0
[ 243.401654] [<ffff0000088f1588>] schedule+0x38/0x9c
[ 243.406520] [<ffff0000088f4540>] schedule_timeout+0x194/0x294
[ 243.412249] [<ffff0000088f202c>] wait_for_common+0xb0/0x144
[ 243.417805] [<ffff0000088f20d4>] wait_for_completion+0x14/0x1c
[ 243.423623] [<ffff0000080d5bd4>] flush_work+0xe0/0x1a8
[ 243.428747] [<ffff000008598158>] sas_unregister_dev+0xf8/0x110
[ 243.434563] [<ffff000008598304>] sas_unregister_domain_devices+0x4c/0xc8
[ 243.441242] [<ffff000008596884>] sas_deform_port+0x14c/0x15c
[ 243.446886] [<ffff000008596508>] sas_phye_loss_of_signal+0x48/0x54
[ 243.453048] [<ffff0000080d6164>] process_one_work+0x138/0x2d8
[ 243.458776] [<ffff0000080d635c>] worker_thread+0x58/0x424
[ 243.464161] [<ffff0000080dc16c>] kthread+0xf4/0x120
[ 243.469024] [<ffff0000080836c0>] ret_from_fork+0x10/0x50
[ 364.189094] INFO: task kworker/u32:1:106 blocked for more than 120
seconds.
[ 364.196035] Not tainted 4.11.0-rc1-13687-g2562e6a-dirty #1388
[ 364.202281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 364.210085] kworker/u32:1 D 0 106 2 0x00000000
[ 364.215558] Workqueue: scsi_wq_0 sas_phye_loss_of_signal
[ 364.220855] Call trace:
[ 364.223303] [<ffff000008085d70>] __switch_to+0xa4/0xb0
[ 364.228428] [<ffff0000088f1134>] __schedule+0x1b4/0x5d0
[ 364.233640] [<ffff0000088f1588>] schedule+0x38/0x9c
[ 364.238506] [<ffff0000088f4540>] schedule_timeout+0x194/0x294
[ 364.244237] [<ffff0000088f202c>] wait_for_common+0xb0/0x144
[ 364.249793] [<ffff0000088f20d4>] wait_for_completion+0x14/0x1c
[ 364.255610] [<ffff0000080d5bd4>] flush_work+0xe0/0x1a8
[ 364.260736] [<ffff000008598158>] sas_unregister_dev+0xf8/0x110
[ 364.266551] [<ffff000008598304>] sas_unregister_domain_devices+0x4c/0xc8
[ 364.273230] [<ffff000008596884>] sas_deform_port+0x14c/0x15c
[ 364.278872] [<ffff000008596508>] sas_phye_loss_of_signal+0x48/0x54
[ 364.285034] [<ffff0000080d6164>] process_one_work+0x138/0x2d8
[ 364.290763] [<ffff0000080d635c>] worker_thread+0x58/0x424
[ 364.296147] [<ffff0000080dc16c>] kthread+0xf4/0x120
[ 364.301013] [<ffff0000080836c0>] ret_from_fork+0x10/0x50

Is the issue that we are trying to flush the queue when we are working
in the same queue context?

Thanks,
John

> }
> }
>
>


2017-03-29 11:30:01

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
> On 29/03/2017 10:41, Johannes Thumshirn wrote:
> >In the advent of an SAS device unregister we have to wait for all destruct
> >works to be done to not accidently delay deletion of a SAS rphy or it's
> >children to the point when we're removing the SCSI or SAS hosts.
> >
> >Signed-off-by: Johannes Thumshirn <[email protected]>
> >---
> > drivers/scsi/libsas/sas_discover.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> >index 60de662..75b18f1 100644
> >--- a/drivers/scsi/libsas/sas_discover.c
> >+++ b/drivers/scsi/libsas/sas_discover.c
> >@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
> > }
> >
> > if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
> >+ struct sas_discovery *disc = &dev->port->disc;
> >+ struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
> >+
> > sas_rphy_unlink(dev->rphy);
> > list_move_tail(&dev->disco_list_node, &port->destroy_list);
> > sas_discover_event(dev->port, DISCE_DESTRUCT);
> >+ flush_work(&sw->work);
>
> I quickly tested plugging out the expander and we never get past this call
> to flush - a hang results:

Can you activat lockdep so we can see which lock it is that we're blocking on?

It's most likely in sas_unregister_common_dev() but this function takes two spin
locks, port->dev_list_lock and ha->lock.

Thanks a lot,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-03-29 11:39:00

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix sysfs recursive removal splats in isci

On Wed, 2017-03-29 at 11:41 +0200, Johannes Thumshirn wrote:
> This series fixes a sysfs warning caused by isci not being able to
> cope with recursive sysfs path removals which are in place since
> commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive").
>
> The mvsas, aic94xx and pm8001 and hisi_sas patches have been compile
> tested only hence they have no callstack of the affected path in
> their changelogs.
>
> I'm not sure whether to mark this patches as stable or not. I tend to
> say no here, although we've seen complaints/bug reports on lkml and
> the scsi list.

What happens to the SYNC CACHE for devices with write back caches? It
looks like you've already torn down most of the sas objects by the time
they're sent, so do they actually reach the device (or worse, do they
hang the system by not making progress)?

Assuming the above is OK, what about putting the state change inside
sas_remove_ha()? It's better than making every driver do it.

James

2017-03-29 11:54:47

by John Garry

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On 29/03/2017 12:29, Johannes Thumshirn wrote:
> On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
>> On 29/03/2017 10:41, Johannes Thumshirn wrote:
>>> In the advent of an SAS device unregister we have to wait for all destruct
>>> works to be done to not accidently delay deletion of a SAS rphy or it's
>>> children to the point when we're removing the SCSI or SAS hosts.
>>>
>>> Signed-off-by: Johannes Thumshirn <[email protected]>
>>> ---
>>> drivers/scsi/libsas/sas_discover.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>>> index 60de662..75b18f1 100644
>>> --- a/drivers/scsi/libsas/sas_discover.c
>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>> @@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
>>> }
>>>
>>> if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
>>> + struct sas_discovery *disc = &dev->port->disc;
>>> + struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
>>> +
>>> sas_rphy_unlink(dev->rphy);
>>> list_move_tail(&dev->disco_list_node, &port->destroy_list);
>>> sas_discover_event(dev->port, DISCE_DESTRUCT);
>>> + flush_work(&sw->work);
>>
>> I quickly tested plugging out the expander and we never get past this call
>> to flush - a hang results:
>
> Can you activat lockdep so we can see which lock it is that we're blocking on?
>

I have it on:
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y

> It's most likely in sas_unregister_common_dev() but this function takes two spin
> locks, port->dev_list_lock and ha->lock.
>

We can see from the callstack I provided that we're working in workqueue
scsi_wq_0 and trying to flush that same queue.

Much appreciated,
John

> Thanks a lot,
> Johannes
>


2017-03-29 12:26:39

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 12:53:28PM +0100, John Garry wrote:
> On 29/03/2017 12:29, Johannes Thumshirn wrote:
> >On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
> >>On 29/03/2017 10:41, Johannes Thumshirn wrote:
> >>>In the advent of an SAS device unregister we have to wait for all destruct
> >>>works to be done to not accidently delay deletion of a SAS rphy or it's
> >>>children to the point when we're removing the SCSI or SAS hosts.
> >>>
> >>>Signed-off-by: Johannes Thumshirn <[email protected]>
> >>>---
> >>>drivers/scsi/libsas/sas_discover.c | 4 ++++
> >>>1 file changed, 4 insertions(+)
> >>>
> >>>diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> >>>index 60de662..75b18f1 100644
> >>>--- a/drivers/scsi/libsas/sas_discover.c
> >>>+++ b/drivers/scsi/libsas/sas_discover.c
> >>>@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
> >>> }
> >>>
> >>> if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
> >>>+ struct sas_discovery *disc = &dev->port->disc;
> >>>+ struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
> >>>+
> >>> sas_rphy_unlink(dev->rphy);
> >>> list_move_tail(&dev->disco_list_node, &port->destroy_list);
> >>> sas_discover_event(dev->port, DISCE_DESTRUCT);
> >>>+ flush_work(&sw->work);
> >>
> >>I quickly tested plugging out the expander and we never get past this call
> >>to flush - a hang results:
> >
> >Can you activat lockdep so we can see which lock it is that we're blocking on?
> >
>
> I have it on:
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_LOCKD=y
> CONFIG_LOCKD_V4=y
>
> >It's most likely in sas_unregister_common_dev() but this function takes two spin
> >locks, port->dev_list_lock and ha->lock.
> >
>
> We can see from the callstack I provided that we're working in workqueue
> scsi_wq_0 and trying to flush that same queue.

Aaahh, now I get what's happening (with some kicks^Whelp from Hannes I admit).

The sas_unregister_dev() comes from the work queued by notify_phy_event(). So this patch must be
replaced by (untested):

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index cdbb293..e1e6492 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -375,6 +375,7 @@ void sas_remove_children(struct device *dev)
*/
void sas_remove_host(struct Scsi_Host *shost)
{
+ scsi_flush_work(shost);
sas_remove_children(&shost->shost_gendev);
}
EXPORT_SYMBOL(sas_remove_host);

John, mind giving that one a shot in your test setup as well?

Thanks,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-03-29 12:36:35

by Jack Wang

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 2:26 PM, Johannes Thumshirn <[email protected]> wrote:
> On Wed, Mar 29, 2017 at 12:53:28PM +0100, John Garry wrote:
>> On 29/03/2017 12:29, Johannes Thumshirn wrote:
>> >On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
>> >>On 29/03/2017 10:41, Johannes Thumshirn wrote:
>> >>>In the advent of an SAS device unregister we have to wait for all destruct
>> >>>works to be done to not accidently delay deletion of a SAS rphy or it's
>> >>>children to the point when we're removing the SCSI or SAS hosts.
>> >>>
>> >>>Signed-off-by: Johannes Thumshirn <[email protected]>
>> >>>---
>> >>>drivers/scsi/libsas/sas_discover.c | 4 ++++
>> >>>1 file changed, 4 insertions(+)
>> >>>
>> >>>diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> >>>index 60de662..75b18f1 100644
>> >>>--- a/drivers/scsi/libsas/sas_discover.c
>> >>>+++ b/drivers/scsi/libsas/sas_discover.c
>> >>>@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
>> >>> }
>> >>>
>> >>> if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
>> >>>+ struct sas_discovery *disc = &dev->port->disc;
>> >>>+ struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
>> >>>+
>> >>> sas_rphy_unlink(dev->rphy);
>> >>> list_move_tail(&dev->disco_list_node, &port->destroy_list);
>> >>> sas_discover_event(dev->port, DISCE_DESTRUCT);
>> >>>+ flush_work(&sw->work);
>> >>
>> >>I quickly tested plugging out the expander and we never get past this call
>> >>to flush - a hang results:
>> >
>> >Can you activat lockdep so we can see which lock it is that we're blocking on?
>> >
>>
>> I have it on:
>> CONFIG_LOCKDEP_SUPPORT=y
>> CONFIG_LOCKD=y
>> CONFIG_LOCKD_V4=y
>>
>> >It's most likely in sas_unregister_common_dev() but this function takes two spin
>> >locks, port->dev_list_lock and ha->lock.
>> >
>>
>> We can see from the callstack I provided that we're working in workqueue
>> scsi_wq_0 and trying to flush that same queue.
>
> Aaahh, now I get what's happening (with some kicks^Whelp from Hannes I admit).
>
> The sas_unregister_dev() comes from the work queued by notify_phy_event(). So this patch must be
> replaced by (untested):
>
> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> index cdbb293..e1e6492 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -375,6 +375,7 @@ void sas_remove_children(struct device *dev)
> */
> void sas_remove_host(struct Scsi_Host *shost)
> {
> + scsi_flush_work(shost);
> sas_remove_children(&shost->shost_gendev);
> }
> EXPORT_SYMBOL(sas_remove_host);
>
> John, mind giving that one a shot in your test setup as well?
>
> Thanks,
> Johannes
>
> --
> Johannes Thumshirn Storage
> [email protected] +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Haha, I have same idea :)
Have no test env, so if John could test it, it will be great.
--
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: [email protected]
URL: https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

2017-03-29 12:48:00

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 02:36:11PM +0200, Jinpu Wang wrote:
> On Wed, Mar 29, 2017 at 2:26 PM, Johannes Thumshirn <[email protected]> wrote:
> > On Wed, Mar 29, 2017 at 12:53:28PM +0100, John Garry wrote:
> >> On 29/03/2017 12:29, Johannes Thumshirn wrote:
> >> >On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
> >> >>On 29/03/2017 10:41, Johannes Thumshirn wrote:
> >> >>>In the advent of an SAS device unregister we have to wait for all destruct
> >> >>>works to be done to not accidently delay deletion of a SAS rphy or it's
> >> >>>children to the point when we're removing the SCSI or SAS hosts.
> >> >>>
> >> >>>Signed-off-by: Johannes Thumshirn <[email protected]>
> >> >>>---
> >> >>>drivers/scsi/libsas/sas_discover.c | 4 ++++
> >> >>>1 file changed, 4 insertions(+)
> >> >>>
> >> >>>diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> >> >>>index 60de662..75b18f1 100644
> >> >>>--- a/drivers/scsi/libsas/sas_discover.c
> >> >>>+++ b/drivers/scsi/libsas/sas_discover.c
> >> >>>@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
> >> >>> }
> >> >>>
> >> >>> if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
> >> >>>+ struct sas_discovery *disc = &dev->port->disc;
> >> >>>+ struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
> >> >>>+
> >> >>> sas_rphy_unlink(dev->rphy);
> >> >>> list_move_tail(&dev->disco_list_node, &port->destroy_list);
> >> >>> sas_discover_event(dev->port, DISCE_DESTRUCT);
> >> >>>+ flush_work(&sw->work);
> >> >>
> >> >>I quickly tested plugging out the expander and we never get past this call
> >> >>to flush - a hang results:
> >> >
> >> >Can you activat lockdep so we can see which lock it is that we're blocking on?
> >> >
> >>
> >> I have it on:
> >> CONFIG_LOCKDEP_SUPPORT=y
> >> CONFIG_LOCKD=y
> >> CONFIG_LOCKD_V4=y
> >>
> >> >It's most likely in sas_unregister_common_dev() but this function takes two spin
> >> >locks, port->dev_list_lock and ha->lock.
> >> >
> >>
> >> We can see from the callstack I provided that we're working in workqueue
> >> scsi_wq_0 and trying to flush that same queue.
> >
> > Aaahh, now I get what's happening (with some kicks^Whelp from Hannes I admit).
> >
> > The sas_unregister_dev() comes from the work queued by notify_phy_event(). So this patch must be
> > replaced by (untested):
> >
> > diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> > index cdbb293..e1e6492 100644
> > --- a/drivers/scsi/scsi_transport_sas.c
> > +++ b/drivers/scsi/scsi_transport_sas.c
> > @@ -375,6 +375,7 @@ void sas_remove_children(struct device *dev)
> > */
> > void sas_remove_host(struct Scsi_Host *shost)
> > {
> > + scsi_flush_work(shost);
> > sas_remove_children(&shost->shost_gendev);
> > }
> > EXPORT_SYMBOL(sas_remove_host);
> >
> > John, mind giving that one a shot in your test setup as well?

Well, don't mind. It doesn't work in my test setup.

I'm back to the drawing board...

Anyways thanks,
Johannes
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-03-29 12:51:22

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 02:47:54PM +0200, Johannes Thumshirn wrote:
> > > John, mind giving that one a shot in your test setup as well?
>
> Well, don't mind. It doesn't work in my test setup.
>
> I'm back to the drawing board...

Please ignore my last mail, apparently it's a wise idea to do a git stash pop
after a git stash m)

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-03-29 15:23:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix sysfs recursive removal splats in isci

Hello,

On Wed, Mar 29, 2017 at 11:41:07AM +0200, Johannes Thumshirn wrote:
> This series fixes a sysfs warning caused by isci not being able to cope with
> recursive sysfs path removals which are in place since commit bcdde7e
> ("sysfs: make __sysfs_remove_dir() recursive").

Thanks for fixing these.

> The mvsas, aic94xx and pm8001 and hisi_sas patches have been compile tested
> only hence they have no callstack of the affected path in their changelogs.
>
> I'm not sure whether to mark this patches as stable or not. I tend to say no
> here, although we've seen complaints/bug reports on lkml and the scsi list.

Given that the failures aren't critical or all that common (only
happens on controller removal), I agree that not cc'ing stable is the
right call here.

Thanks.

--
tejun