2018-05-29 03:52:07

by Sinan Kaya

Subject: [PATCH V3 1/2] PCI: Try to clean up resources via remove if shutdown doesn't exist

It is up to a driver to implement the shutdown() callback. If shutdown()
is not implemented, a PCI device can have pending interrupts and even
perform DMA transactions while the system is going down.

If kexec is in use, this can damage the newly booting kexec kernel
or even prevent it from booting altogether. Fall back to calling the
remove() callback if shutdown() isn't implemented for a given driver.

Signed-off-by: Sinan Kaya <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779
Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
Cc: [email protected]
Reported-by: Ryan Finnie <[email protected]>
---
drivers/pci/pci-driver.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index cbda0e6..75a00fe 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -477,8 +477,17 @@ static void pci_device_shutdown(struct device *dev)
 
 	pm_runtime_resume(dev);
 
+	/*
+	 * Try shutdown callback if it exists, otherwise fallback to remove
+	 * callback. PCI drivers can do DMA and have pending interrupts.
+	 * Leaving the DMA and interrupts pending could damage the newly
+	 * booting kexec kernel as well as prevent it from booting altogether
+	 * if the pending interrupt is level.
+	 */
 	if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
+	else if (drv && drv->remove)
+		drv->remove(pci_dev);
 
 	/*
 	 * If this is a kexec reboot, turn off Bus Master bit on the
--
2.7.4
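
The dispatch order in the hunk above can be tried outside the kernel. What
follows is only a minimal userspace sketch, not the kernel code: struct
mock_dev, struct mock_driver, mock_device_shutdown() and mock_quiesce() are
hypothetical stand-ins for struct pci_dev, struct pci_driver and
pci_device_shutdown(), and the return value exists only so the chosen path
can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;

/* Hypothetical stand-in for struct pci_driver: only the two callbacks. */
struct mock_driver {
	void (*shutdown)(struct mock_dev *dev);
	void (*remove)(struct mock_dev *dev);
};

/* Hypothetical stand-in for struct pci_dev. */
struct mock_dev {
	const struct mock_driver *drv;	/* NULL when no driver is bound */
	int quiesced;			/* set by whichever callback runs */
};

enum shutdown_path { PATH_NONE, PATH_SHUTDOWN, PATH_REMOVE };

/*
 * Prefer shutdown(); fall back to remove() when the driver does not
 * provide shutdown(); do nothing when neither exists or no driver is bound.
 */
static enum shutdown_path mock_device_shutdown(struct mock_dev *dev)
{
	const struct mock_driver *drv;

	assert(dev != NULL);
	drv = dev->drv;

	if (drv && drv->shutdown) {
		drv->shutdown(dev);
		return PATH_SHUTDOWN;
	} else if (drv && drv->remove) {
		drv->remove(dev);
		return PATH_REMOVE;
	}
	return PATH_NONE;
}

/* Example callback: pretend interrupts and DMA have been stopped. */
static void mock_quiesce(struct mock_dev *dev)
{
	dev->quiesced = 1;
}
```

With a driver that sets only .remove = mock_quiesce, mock_device_shutdown()
takes the PATH_REMOVE branch and the device ends up quiesced; with no driver
bound it returns PATH_NONE and touches nothing.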



2018-05-29 03:52:11

by Sinan Kaya

Subject: [PATCH V3 2/2] scsi: hpsa: drop shutdown callback

Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during
shutdown") was added to the kernel to shut down pending PCIe port
service interrupts during reboot so that a newly started kexec kernel
wouldn't observe pending interrupts.

pcie_port_device_remove() disables the root port and switches by
calling pci_disable_device() after all PCIe service drivers are shut down.

This has been found to cause crashes on HP DL360 Gen9 machines during
reboot because the hpsa driver does not clear the bus master bit, by
calling pci_disable_device(), during its shutdown procedure.

Drop the shutdown callback and do an orderly cleanup by using remove().

Signed-off-by: Sinan Kaya <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779
Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
Cc: [email protected]
Reported-by: Ryan Finnie <[email protected]>
---
drivers/scsi/hpsa.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 3a9eca1..3dbef28 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -8970,7 +8970,6 @@ static struct pci_driver hpsa_pci_driver = {
 	.probe = hpsa_init_one,
 	.remove = hpsa_remove_one,
 	.id_table = hpsa_pci_device_id, /* id_table */
-	.shutdown = hpsa_shutdown,
 	.suspend = hpsa_suspend,
 	.resume = hpsa_resume,
 };
--
2.7.4


2018-05-30 01:50:19

by Ryan Finnie

Subject: Re: [PATCH V3 1/2] PCI: Try to clean up resources via remove if shutdown doesn't exist

On 05/28/2018 02:21 PM, Sinan Kaya wrote:
> It is up to a driver to implement the shutdown() callback. If shutdown()
> is not implemented, a PCI device can have pending interrupts and even
> perform DMA transactions while the system is going down.
>
> If kexec is in use, this can damage the newly booting kexec kernel
> or even prevent it from booting altogether. Fall back to calling the
> remove() callback if shutdown() isn't implemented for a given driver.
>
> Signed-off-by: Sinan Kaya <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779
> Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
> Cc: [email protected]
> Reported-by: Ryan Finnie <[email protected]>

Tested successfully on DL360 Gen9 and DL380 Gen9.

Tested-by: Ryan Finnie <[email protected]>

2018-05-30 01:51:13

by Ryan Finnie

Subject: Re: [PATCH V3 2/2] scsi: hpsa: drop shutdown callback

On 05/28/2018 02:21 PM, Sinan Kaya wrote:
> Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during
> shutdown") was added to the kernel to shut down pending PCIe port
> service interrupts during reboot so that a newly started kexec kernel
> wouldn't observe pending interrupts.
>
> pcie_port_device_remove() disables the root port and switches by
> calling pci_disable_device() after all PCIe service drivers are shut down.
>
> This has been found to cause crashes on HP DL360 Gen9 machines during
> reboot because the hpsa driver does not clear the bus master bit, by
> calling pci_disable_device(), during its shutdown procedure.
>
> Drop the shutdown callback and do an orderly cleanup by using remove().
>
> Signed-off-by: Sinan Kaya <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779
> Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
> Cc: [email protected]
> Reported-by: Ryan Finnie <[email protected]>

Tested successfully on DL360 Gen9 and DL380 Gen9.

Tested-by: Ryan Finnie <[email protected]>


2018-05-31 01:09:36

by Sinan Kaya

Subject: Re: [PATCH V3 2/2] scsi: hpsa: drop shutdown callback

On 2018-05-30 15:25, Don Brace wrote:
>> -----Original Message-----
>> From: Ryan Finnie [mailto:[email protected]]
>> Sent: Tuesday, May 29, 2018 8:50 PM
>> To: Sinan Kaya <[email protected]>; [email protected];
>> [email protected]
>> Cc: [email protected];
>> [email protected];
>> [email protected]; Don Brace <[email protected]>; James E.J.
>> Bottomley <[email protected]>; Martin K. Petersen
>> <[email protected]>; esc.storagedev
>> <[email protected]>; open list:HEWLETT-PACKARD SMART ARRAY
>> RAID DRIVER (hpsa) <[email protected]>; open list <linux-
>> [email protected]>
>> Subject: Re: [PATCH V3 2/2] scsi: hpsa: drop shutdown callback
>>
>> On 05/28/2018 02:21 PM, Sinan Kaya wrote:
>> > Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during
>> > shutdown") was added to the kernel to shut down pending PCIe port
>> > service interrupts during reboot so that a newly started kexec kernel
>> > wouldn't observe pending interrupts.
>> >
>> > pcie_port_device_remove() disables the root port and switches by
>> > calling pci_disable_device() after all PCIe service drivers are shut down.
>> >
>> > This has been found to cause crashes on HP DL360 Gen9 machines during
>> > reboot because the hpsa driver does not clear the bus master bit, by
>> > calling pci_disable_device(), during its shutdown procedure.
>> >
>> > Drop the shutdown callback and do an orderly cleanup by using remove().
>> >
>> > Signed-off-by: Sinan Kaya <[email protected]>
>> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779
>> > Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
>> > Cc: [email protected]
>> > Reported-by: Ryan Finnie <[email protected]>
>>
>> Tested successfully on DL360 Gen9 and DL380 Gen9.
>>
>> Tested-by: Ryan Finnie <[email protected]>
>
> The shutdown path issues a cache flush to the controller.
> Without this flush, you will see "Dirty Cache" messages at POST.
> It is best to keep the shutdown path.
>

I have seen that shutdown() is also called from remove().

remove() is supposed to do a safe cleanup too. If it is leaving the hw
in an inconsistent state even though it is calling shutdown, it is yet
another bug.

> Thanks,
> Don Brace
> ESC - Smart Storage
> Microsemi Corporation
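
The point argued above, that a remove() which itself calls shutdown() still
gets the cache flush, can be modelled in a few lines. This is a sketch under
stated assumptions, not the hpsa code: struct mock_ctrl and every mock_*()
function are hypothetical, and each hardware effect is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller state; each flag models one hardware effect. */
struct mock_ctrl {
	bool cache_flushed;	/* the controller cache was flushed */
	bool bus_master;	/* Bus Master bit is still set */
	bool registered;	/* controller is still registered */
};

/* Models a shutdown path that only flushes the controller cache. */
static void mock_shutdown(struct mock_ctrl *c)
{
	assert(c);
	c->cache_flushed = true;
}

/* Models pci_disable_device() clearing the Bus Master bit. */
static void mock_disable_device(struct mock_ctrl *c)
{
	c->bus_master = false;
}

/*
 * Models a remove() that reuses the shutdown path and then finishes the
 * teardown, so the flush happens on this path as well.
 */
static void mock_remove(struct mock_ctrl *c)
{
	mock_shutdown(c);
	c->registered = false;
	mock_disable_device(c);
}
```

In this model mock_shutdown() alone leaves bus_master set, which is the
state the patch description blames for the crashes, while mock_remove()
both flushes the cache and clears the bit.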

2018-06-01 13:35:35

by Sinan Kaya

Subject: Re: [PATCH V3 2/2] scsi: hpsa: drop shutdown callback

On 5/30/2018 9:08 PM, [email protected] wrote:
> I have seen that shutdown() is also called from remove().
>
> remove() is supposed to do a safe cleanup too. If it is leaving the hw in an inconsistent state even though it is calling shutdown, it is yet another bug.

Let's try to be constructive. I'll post a patch that adds the
pci_disable_device() call to shutdown() only, as in my original proposal.

Somebody can deal with remove() another day.

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
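
For illustration, the alternative proposed above (keep the shutdown path and
add the device disable to it) might look like the sketch below. It is not
the posted follow-up patch. All names are hypothetical, and doing the flush
before the disable is an assumption made here on the grounds that the flush
presumably needs the device to still be enabled.

```c
#include <assert.h>
#include <stddef.h>

enum step { STEP_FLUSH_CACHE, STEP_DISABLE_DEVICE };

#define MAX_STEPS 4

/* Hypothetical controller that logs the order of teardown steps. */
struct mock_ctrl {
	enum step log[MAX_STEPS];
	size_t nsteps;
	int bus_master;		/* 1 while the Bus Master bit is set */
};

static void record(struct mock_ctrl *c, enum step s)
{
	assert(c->nsteps < MAX_STEPS);
	c->log[c->nsteps++] = s;
}

/* Models the cache flush that avoids "Dirty Cache" messages at POST. */
static void mock_flush_cache(struct mock_ctrl *c)
{
	record(c, STEP_FLUSH_CACHE);
}

/* Models pci_disable_device() clearing the Bus Master bit. */
static void mock_disable_device(struct mock_ctrl *c)
{
	record(c, STEP_DISABLE_DEVICE);
	c->bus_master = 0;
}

/* Shutdown keeps the flush and gains the disable step afterwards. */
static void mock_shutdown(struct mock_ctrl *c)
{
	mock_flush_cache(c);		/* assumed to need a live device */
	mock_disable_device(c);
}
```

After mock_shutdown() the log holds the flush followed by the disable, and
bus_master is 0, so both concerns raised in this thread are covered by the
one callback.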