The fix of Powerpc hotplug driver (drivers/pci/hotplug/pnv_php.c)
addresses below two issues.
1. Kernel Crash during hot unplug of bridge/switch slot.
2. DPC-Support Enablement - Previously, when we do a hot-unplug
operation on a bridge slot, all the ports and devices behind the
bridge-ports would be hot-unplugged/offline, but when we do a hot-plug
operation on the same bridge slot, all the ports and devices behind the
bridge would not get hot-plugged/online. In this case, Only the first
port of the bridge gets enabled and the remaining port/devices remain
unplugged/offline. After the fix, The hot-unplug and hot-plug
operations on the slot associated with the bridge started behaving
correctly and became in sync. Now, after the hot plug operation on the
same slot, all the bridge ports and devices behind the bridge become
hot-plugged/online/restored in the same manner as it was before the
hot-unplug operation.
Krishna Kumar (2):
pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
arch/powerpc: hotplug driver bridge support
arch/powerpc/include/asm/ppc-pci.h | 4 +++
arch/powerpc/kernel/pci-hotplug.c | 5 ++--
arch/powerpc/kernel/pci_dn.c | 42 ++++++++++++++++++++++++++++++
drivers/pci/hotplug/pnv_php.c | 3 +--
4 files changed, 49 insertions(+), 5 deletions(-)
--
2.44.0
Description of the problem: The hotplug driver for powerpc
(pci/hotplug/pnv_php.c) gives kernel crash when we try to
hot-unplug/disable the PCIe switch/bridge from the PHB.
Root Cause of Crash: The crash is due to the reason that, though the msi
data structure has been released during disable/hot-unplug path and it
has been assigned with NULL, still during unregistartion the code was
again trying to explicitly disable the msi which causes the Null pointer
dereference and kernel crash.
Proposed Fix : The fix is to correct the check during unregistration path
so that the code should not try to invoke pci_disable_msi/msix() if its
data structure is already freed.
Signed-off-by: Krishna Kumar <[email protected]>
---
Command used for reproducing the bug:
echo 0 > /sys/bus/pci/slots/C5/power
where C5 is slot associated with bridge.
Snippet of Crash:
Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000010
Faulting instruction address: 0xc000000000fad7d4
Oops: Kernel access of bad area, sig: 11 [#1]
Hardware name: 5105-22E POWER9 0x4e1203 opal:v7.0-39-g4660e63a PowerNV
NIP [c000000000fad7d4] mutex_lock+0x34/0x88
LR [c000000000fad7c8] mutex_lock+0x28/0x88
Call Trace:
[c00000017075f940] [c000000000fad7c8] mutex_lock+0x28/0x88 (unreliable)
[c00000017075f970] [c000000000214464] msi_lock_descs+0x28/0x3c
[c00000017075f990] [c0000000008e8be8] pci_disable_msi+0x68/0xa8
[c00000017075f9c0] [c00000000090f0a4] pnv_php_disable_irq+0x2a0/0x2b0
[c00000017075fab0] [c00000000090f128] pnv_php_free_slot+0x74/0xc4
[c00000017075fb30] [c000000000912184] pnv_php_disable_slot.part.0+0x1b8/0x24c
[c00000017075fc00] [c000000000902df0] power_write_file+0xf8/0x18c
[c00000017075fc80] [c0000000008f84d8] pci_slot_attr_store+0x40/0x5c
[c00000017075fca0] [c0000000006b4834] sysfs_kf_write+0x64/0x78
[c00000017075fcc0] [c0000000006b3300] kernfs_fop_write_iter+0x1b8/0x2dc
[c00000017075fd10] [c0000000005b3eb0] vfs_write+0x224/0x4e8
[c00000017075fdc0] [c0000000005b44b0] ksys_write+0x88/0x150
[c00000017075fe10] [c000000000030864] system_call_exception+0x124/0x320
[c00000017075fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x7fffb9748774
Root-Cause: Its safe to call pci_disable_msi() if its associated data structre are
not freed (during bailout path). But when the driver code disables the
bridge during hot-unplug operation, its msi data structure becomes NULL
(php_slot->pdev->dev.msi.data:0000000000000000). This happens before
unregistration and during disable path in function
msi_device_data_release(). In this case, its not safe to explicitly call
pci_disable_msi/msix() due to NULL pointer dereference. But since the
current code does so, the crash is happening at the line
mutex_lock(&dev->msi.data->mutex).
FIX: In the current code, there are two paths to invoke
pci_disable_msi/msix(). In the error/bailout path, first argument of the
check - if(disable_device || irq > 0), i.e. disable_device is true, so
it will always invoke pci_disable_msi/msix(), it will never depend on
second argument. In this path it's fine to call pci_disable_msi/msix().
During the slot releasing/disable/hot-unpug path the disable_device is
false, irq is having old value which is making the overall check true
and causing the crash. Of course, we should not choose the old/stale
value of irq but should choose php_slot->irq for check. Also, since
php_slot->irq value is always 0 before the check, so it does not matter
if it will not be included into the check. So, the check can be formed
with only one argument i.e. disable_device. Based on its value
pci_disable_msi/msix() will be invoked and this is the fix for the
crash. During the bailout path its value will be true and during the
hot-unplug operation on the bridge slot, its value will be false.
drivers/pci/hotplug/pnv_php.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 694349be9d0a..573a41869c15 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -40,7 +40,6 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
bool disable_device)
{
struct pci_dev *pdev = php_slot->pdev;
- int irq = php_slot->irq;
u16 ctrl;
if (php_slot->irq > 0) {
@@ -59,7 +58,7 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
php_slot->wq = NULL;
}
- if (disable_device || irq > 0) {
+ if (disable_device) {
if (pdev->msix_enabled)
pci_disable_msix(pdev);
else if (pdev->msi_enabled)
--
2.44.0
There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled by
doing hot-plug operation on that slot. Only the first port of the bridge
gets enabled and the remaining port/devices remain unplugged. The hot
plug/unplug operation is done by the hotplug driver
(drivers/pci/hotplug/pnv_php.c).
Root Cause Analysis: This behavior is due to missing code for the DPC
switch/bridge. The existing driver depends on pci_hp_add_devices()
function for device enablement. This function calls pci_scan_slot() on
only one device-node/port of the bridge, not on all the siblings'
device-node/port.
The missing code needs to be added which will find all the sibling
device-nodes/bridge-ports and will run explicit pci_scan_slot() on
those. A new function has been added for this purpose which gets
invoked from pci_hp_add_devices(). This new function
pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
Signed-off-by: Krishna Kumar <[email protected]>
---
Command for reproducing the issue :
For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power
where C5 is slot associated with bridge.
Scenario/Tests:
Output of lspci -nn before test is given below. This snippet contains
devices used for testing on Powernv machine.
0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:08:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
0004:09:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
Output of lspci -tv before test is as follows:
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
| | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | \-03.0-[0a-0e]--
| \-00.1 PMC-Sierra Inc. Device 4052
C5(bridge) and C6(End Point) slot address are as below:
# cat /sys/bus/pci/slots/C5/address
0004:02:00
# cat /sys/bus/pci/slots/C6/address
0004:09:00
Hot-unplug operation on slot associated with bridge:
# echo 0 > /sys/bus/pci/slots/C5/power
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
| \-00.1 PMC-Sierra Inc. Device 4052
From the above lspci -tv output, it can be observed that hot unplug
operation has removed all the PMC-Sierra bridge ports like:
00.0-[03-07], 01.0-[08], 02.0-[09], 03.0-[0a-0e] and the SAS devices
behind the bridge-port. Without the fix, when the hot plug operation is
done on the same slot, it adds only the first bridge port and doesn't
restore all the bridge-ports and devices that it unplugged earlier.
Below snippet shows this.
Hot-plug operation on the bridge slot without the fix:
# echo 1 > /sys/bus/pci/slots/C5/power
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
| \-00.1 PMC-Sierra Inc. Device 4052
After the fix, it restores all the devices in the same manner how it
unplugged them earlier during the hot unplug operation. The below snippet
shows the same.
Hot-plug operation on bridge slot with the fix:
# echo 1 > /sys/bus/pci/slots/C5/power
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
| | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | \-03.0-[0a-0e]--
| \-00.1 PMC-Sierra Inc. Device 4052
Removal of End point device behind bridge are also intact and behaving
correctly.
Hot-unplug operation on Endpoint device C6:
# echo 0 > /sys/bus/pci/slots/C6/power
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
| | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | +-02.0-[09]--
| | \-03.0-[0a-0e]--
| \-00.1 PMC-Sierra Inc. Device 4052
Hot-plug operation on Endpoint device C6:
# echo 1 > /sys/bus/pci/slots/C6/power
# lspci -tv
+-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
| | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
| | \-03.0-[0a-0e]--
| \-00.1 PMC-Sierra Inc. Device 4052
arch/powerpc/include/asm/ppc-pci.h | 4 +++
arch/powerpc/kernel/pci-hotplug.c | 5 ++--
arch/powerpc/kernel/pci_dn.c | 42 ++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index a8b7e8682f5b..a5d5ee4ff7c0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -28,6 +28,10 @@ struct pci_dn;
void *pci_traverse_device_nodes(struct device_node *start,
void *(*fn)(struct device_node *, void *),
void *data);
+
+void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start,
+ struct pci_bus *bus);
+
extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
#if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0fe251c6ac2c..639a3d592fe2 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -106,7 +106,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices);
*/
void pci_hp_add_devices(struct pci_bus *bus)
{
- int slotno, mode, max;
+ int mode, max;
struct pci_dev *dev;
struct pci_controller *phb;
struct device_node *dn = pci_bus_to_OF_node(bus);
@@ -129,8 +129,7 @@ void pci_hp_add_devices(struct pci_bus *bus)
* order for fully rescan all the way down to pick them up.
* They can have been removed during partial hotplug.
*/
- slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
- pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+ pci_traverse_sibling_nodes_and_scan_slot(dn, bus);
max = bus->busn_res.start;
/*
* Scan bridges that are already configured. We don't touch
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 38561d6a2079..2e202f9cec21 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -493,4 +493,46 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdev->dev.archdata.pci_data = pdn;
}
+
+void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
+{
+ struct device_node *dn;
+ struct device_node *parent;
+ int slotno;
+
+ const __be32 *classp1;
+ u32 class1 = 0;
+
+ classp1 = of_get_property(start->child, "class-code", NULL);
+ if (classp1)
+ class1 = of_read_number(classp1, 1);
+
+ /* Call of pci_scan_slot for non-bridge/EP case */
+ if (!((class1 >> 8) == PCI_CLASS_BRIDGE_PCI)) {
+ slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
+ pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+ return NULL;
+ }
+
+ /* Iterate all siblings */
+ parent = start;
+ for_each_child_of_node(parent, dn) {
+ const __be32 *classp;
+ u32 class = 0;
+
+ classp = of_get_property(dn, "class-code", NULL);
+ if (classp)
+ class = of_read_number(classp, 1);
+
+ /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
+ if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
+ slotno = PCI_SLOT(PCI_DN(dn)->devfn);
+ pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+ }
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_traverse_sibling_nodes_and_scan_slot);
+
DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
--
2.44.0
Thu, May 09, 2024 at 05:35:54PM +0530, Krishna Kumar kirjoitti:
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge. The existing driver depends on pci_hp_add_devices()
> function for device enablement. This function calls pci_scan_slot() on
> only one device-node/port of the bridge, not on all the siblings'
> device-node/port.
>
> The missing code needs to be added which will find all the sibling
> device-nodes/bridge-ports and will run explicit pci_scan_slot() on
> those. A new function has been added for this purpose which gets
> invoked from pci_hp_add_devices(). This new function
> pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
> bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
>
>
One blank line is enough here.
> Signed-off-by: Krishna Kumar <[email protected]>
..
> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
> +{
> + struct device_node *dn;
> + struct device_node *parent;
> + int slotno;
> +
> + const __be32 *classp1;
> + u32 class1 = 0;
> + classp1 = of_get_property(start->child, "class-code", NULL);
> + if (classp1)
> + class1 = of_read_number(classp1, 1);
What's wrong with of_property_read_u32()?
> + /* Call of pci_scan_slot for non-bridge/EP case */
> + if (!((class1 >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> + slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> + return NULL;
> + }
> +
> + /* Iterate all siblings */
> + parent = start;
> + for_each_child_of_node(parent, dn) {
> + const __be32 *classp;
> + u32 class = 0;
> +
> + classp = of_get_property(dn, "class-code", NULL);
> + if (classp)
> + class = of_read_number(classp, 1);
Ditto.
> + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
> + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> + slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> + }
> + }
> +
> + return NULL;
> +}
--
With Best Regards,
Andy Shevchenko
Hello Krishna,
Is "arch" in commit title really required?
On 09/05/24 17:35, Krishna Kumar wrote:
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge. The existing driver depends on pci_hp_add_devices()
> function for device enablement. This function calls pci_scan_slot() on
> only one device-node/port of the bridge, not on all the siblings'
> device-node/port.
>
> The missing code needs to be added which will find all the sibling
> device-nodes/bridge-ports and will run explicit pci_scan_slot() on
> those. A new function has been added for this purpose which gets
> invoked from pci_hp_add_devices(). This new function
> pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
> bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
>
>
> Signed-off-by: Krishna Kumar <[email protected]>
> ---
>
> Command for reproducing the issue :
>
> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
> For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power
>
> where C5 is slot associated with bridge.
>
> Scenario/Tests:
> Output of lspci -nn before test is given below. This snippet contains
> devices used for testing on Powernv machine.
>
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:08:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
> 0004:09:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>
> Output of lspci -tv before test is as follows:
>
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
> | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | \-03.0-[0a-0e]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
> C5(bridge) and C6(End Point) slot address are as below:
> # cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> # cat /sys/bus/pci/slots/C6/address
> 0004:09:00
>
> Hot-unplug operation on slot associated with bridge:
> # echo 0 > /sys/bus/pci/slots/C5/power
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
> From the above lspci -tv output, it can be observed that hot unplug
> operation has removed all the PMC-Sierra bridge ports like:
> 00.0-[03-07], 01.0-[08], 02.0-[09], 03.0-[0a-0e] and the SAS devices
> behind the bridge-port. Without the fix, when the hot plug operation is
> done on the same slot, it adds only the first bridge port and doesn't
> restore all the bridge-ports and devices that it unplugged earlier.
> Below snippet shows this.
>
> Hot-plug operation on the bridge slot without the fix:
> # echo 1 > /sys/bus/pci/slots/C5/power
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
> After the fix, it restores all the devices in the same manner how it
> unplugged them earlier during the hot unplug operation. The below snippet
> shows the same.
> Hot-plug operation on bridge slot with the fix:
> # echo 1 > /sys/bus/pci/slots/C5/power
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
> | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | \-03.0-[0a-0e]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
> Removal of End point device behind bridge are also intact and behaving
> correctly.
> Hot-unplug operation on Endpoint device C6:
> # echo 0 > /sys/bus/pci/slots/C6/power
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
> | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | +-02.0-[09]--
> | | \-03.0-[0a-0e]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
> Hot-plug operation on Endpoint device C6:
> # echo 1 > /sys/bus/pci/slots/C6/power
> # lspci -tv
> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
> | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
> | | \-03.0-[0a-0e]--
> | \-00.1 PMC-Sierra Inc. Device 4052
>
>
>
> arch/powerpc/include/asm/ppc-pci.h | 4 +++
> arch/powerpc/kernel/pci-hotplug.c | 5 ++--
> arch/powerpc/kernel/pci_dn.c | 42 ++++++++++++++++++++++++++++++
> 3 files changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index a8b7e8682f5b..a5d5ee4ff7c0 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -28,6 +28,10 @@ struct pci_dn;
> void *pci_traverse_device_nodes(struct device_node *start,
> void *(*fn)(struct device_node *, void *),
> void *data);
> +
> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start,
> + struct pci_bus *bus);
> +
> extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
>
> #if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 0fe251c6ac2c..639a3d592fe2 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -106,7 +106,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices);
> */
> void pci_hp_add_devices(struct pci_bus *bus)
> {
> - int slotno, mode, max;
> + int mode, max;
> struct pci_dev *dev;
> struct pci_controller *phb;
> struct device_node *dn = pci_bus_to_OF_node(bus);
> @@ -129,8 +129,7 @@ void pci_hp_add_devices(struct pci_bus *bus)
> * order for fully rescan all the way down to pick them up.
> * They can have been removed during partial hotplug.
> */
> - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
> - pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> + pci_traverse_sibling_nodes_and_scan_slot(dn, bus);
> max = bus->busn_res.start;
> /*
> * Scan bridges that are already configured. We don't touch
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38561d6a2079..2e202f9cec21 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -493,4 +493,46 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
> pdn = pci_get_pdn(pdev);
> pdev->dev.archdata.pci_data = pdn;
> }
> +
> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
Two things regarding the return type of the above function.
1. Function only returns NULL
2. Caller of this function doesn't take any action based on the return
value of this function.
How about changing the return type from void * to just void?
> +{
> + struct device_node *dn;
> + struct device_node *parent;
parent variable is not really required.
> + int slotno;
> +
> + const __be32 *classp1;
> + u32 class1 = 0;
> +
> + classp1 = of_get_property(start->child, "class-code", NULL);
> + if (classp1)
> + class1 = of_read_number(classp1, 1);
> +
> + /* Call of pci_scan_slot for non-bridge/EP case */
> + if (!((class1 >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> + slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> + return NULL;
> + }
> +
> + /* Iterate all siblings */
> + parent = start;
> + for_each_child_of_node(parent, dn) {
> + const __be32 *classp;
> + u32 class = 0;
> +
> + classp = of_get_property(dn, "class-code", NULL);
> + if (classp)
> + class = of_read_number(classp, 1);
> +
> + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
> + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> + slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> + }
> + }
> +
> + return NULL;
Some level of code duplication can be avoided if we push both cases
1. Call of pci_scan_slot for non-bridge/EP case
2. Call of pci_scan_slot for bridge port
inside `for_each_child_of_node` macro.
Something like this.
u32 class;
int slotno;
const __be32 *classp;
struct device_node *dn;
for_each_child_of_node(start, dn) {
class = 0;
classp = of_get_property(dn, "class-code", NULL);
if (classp)
class = of_read_number(classp, 1);
/* Call of pci_scan_slot for non-bridge/EP case */
if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI) && start->child ==
dn) {
slotno = PCI_SLOT(PCI_DN(dn)->devfn);
pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
of_node_put(dn);
return;
}
/* Call of pci_scan_slot for bridge port */
if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
slotno = PCI_SLOT(PCI_DN(dn)->devfn);
pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
}
}
> +}
> +EXPORT_SYMBOL_GPL(pci_traverse_sibling_nodes_and_scan_slot);
What is the need for exporting the above function?
> +
> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
- Sourabh Jain
Thanks Andy for review. I have incorporated your comments and will be
sending the patch soon.
On 5/11/24 11:12, Andy Shevchenko wrote:
> Thu, May 09, 2024 at 05:35:54PM +0530, Krishna Kumar kirjoitti:
>> There is an issue with the hotplug operation when it's done on the
>> bridge/switch slot. The bridge-port and devices behind the bridge, which
>> become offline by hot-unplug operation, don't get hot-plugged/enabled by
>> doing hot-plug operation on that slot. Only the first port of the bridge
>> gets enabled and the remaining port/devices remain unplugged. The hot
>> plug/unplug operation is done by the hotplug driver
>> (drivers/pci/hotplug/pnv_php.c).
>>
>> Root Cause Analysis: This behavior is due to missing code for the DPC
>> switch/bridge. The existing driver depends on pci_hp_add_devices()
>> function for device enablement. This function calls pci_scan_slot() on
>> only one device-node/port of the bridge, not on all the siblings'
>> device-node/port.
>>
>> The missing code needs to be added which will find all the sibling
>> device-nodes/bridge-ports and will run explicit pci_scan_slot() on
>> those. A new function has been added for this purpose which gets
>> invoked from pci_hp_add_devices(). This new function
>> pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
>> bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
>>
>>
> One blank line is enough here.
I have incorporated this.
>
>> Signed-off-by: Krishna Kumar <[email protected]>
> ...
>
>> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus)
>> +{
>> + struct device_node *dn;
>> + struct device_node *parent;
>> + int slotno;
>> +
>> + const __be32 *classp1;
>> + u32 class1 = 0;
>> + classp1 = of_get_property(start->child, "class-code", NULL);
>> + if (classp1)
>> + class1 = of_read_number(classp1, 1);
> What's wrong with of_property_read_u32()?
I have incorporated this.
>
>
>> + /* Call of pci_scan_slot for non-bridge/EP case */
>> + if (!((class1 >> 8) == PCI_CLASS_BRIDGE_PCI)) {
>> + slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
>> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
>> + return NULL;
>> + }
>> +
>> + /* Iterate all siblings */
>> + parent = start;
>> + for_each_child_of_node(parent, dn) {
>> + const __be32 *classp;
>> + u32 class = 0;
>> +
>> + classp = of_get_property(dn, "class-code", NULL);
>> + if (classp)
>> + class = of_read_number(classp, 1);
> Ditto.
>
>> + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
>> + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
>> + slotno = PCI_SLOT(PCI_DN(dn)->devfn);
>> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
>> + }
>> + }
>> +
>> + return NULL;
>> +}
Best Regards,
Krishna
Thanks Sourabh for review. I have incorporated your comments and will be
sending my patch soon.
Best Regards,
Krishna
On 5/13/24 15:15, Sourabh Jain wrote:
> Hello Krishna,
>
> Is "arch" in commit title really required?
>
> On 09/05/24 17:35, Krishna Kumar wrote:
>> There is an issue with the hotplug operation when it's done on the
>> bridge/switch slot. The bridge-port and devices behind the bridge, which
>> become offline by hot-unplug operation, don't get hot-plugged/enabled by
>> doing hot-plug operation on that slot. Only the first port of the bridge
>> gets enabled and the remaining port/devices remain unplugged. The hot
>> plug/unplug operation is done by the hotplug driver
>> (drivers/pci/hotplug/pnv_php.c).
>>
>> Root Cause Analysis: This behavior is due to missing code for the DPC
>> switch/bridge. The existing driver depends on pci_hp_add_devices()
>> function for device enablement. This function calls pci_scan_slot() on
>> only one device-node/port of the bridge, not on all the siblings'
>> device-node/port.
>>
>> The missing code needs to be added which will find all the sibling
>> device-nodes/bridge-ports and will run explicit pci_scan_slot() on
>> those. A new function has been added for this purpose which gets
>> invoked from pci_hp_add_devices(). This new function
>> pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
>> bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
>>
>>
>> Signed-off-by: Krishna Kumar <[email protected]>
>> ---
>>
>> Command for reproducing the issue :
>>
>> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
>> For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power
>>
>> where C5 is slot associated with bridge.
>>
>> Scenario/Tests:
>> Output of lspci -nn before test is given below. This snippet contains
>> devices used for testing on Powernv machine.
>>
>> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:08:00.0 Serial Attached SCSI controller [0107]:
>> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>> 0004:09:00.0 Serial Attached SCSI controller [0107]:
>> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>>
>> Output of lspci -tv before test is as follows:
>>
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>> | | +-01.0-[08]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | +-02.0-[09]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | \-03.0-[0a-0e]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>> C5(bridge) and C6(End Point) slot address are as below:
>> # cat /sys/bus/pci/slots/C5/address
>> 0004:02:00
>> # cat /sys/bus/pci/slots/C6/address
>> 0004:09:00
>>
>> Hot-unplug operation on slot associated with bridge:
>> # echo 0 > /sys/bus/pci/slots/C5/power
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>> From the above lspci -tv output, it can be observed that hot unplug
>> operation has removed all the PMC-Sierra bridge ports like:
>> 00.0-[03-07], 01.0-[08], 02.0-[09], 03.0-[0a-0e] and the SAS devices
>> behind the bridge-port. Without the fix, when the hot plug operation is
>> done on the same slot, it adds only the first bridge port and doesn't
>> restore all the bridge-ports and devices that it unplugged earlier.
>> Below snippet shows this.
>>
>> Hot-plug operation on the bridge slot without the fix:
>> # echo 1 > /sys/bus/pci/slots/C5/power
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>> After the fix, it restores all the devices in the same manner how it
>> unplugged them earlier during the hot unplug operation. The below
>> snippet
>> shows the same.
>> Hot-plug operation on bridge slot with the fix:
>> # echo 1 > /sys/bus/pci/slots/C5/power
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>> | | +-01.0-[08]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | +-02.0-[09]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | \-03.0-[0a-0e]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>> Removal of End point device behind bridge are also intact and behaving
>> correctly.
>> Hot-unplug operation on Endpoint device C6:
>> # echo 0 > /sys/bus/pci/slots/C6/power
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>> | | +-01.0-[08]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | +-02.0-[09]--
>> | | \-03.0-[0a-0e]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>> Hot-plug operation on Endpoint device C6:
>> # echo 1 > /sys/bus/pci/slots/C6/power
>> # lspci -tv
>> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>> | | +-01.0-[08]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | +-02.0-[09]----00.0 Broadcom / LSI
>> SAS3216 PCI-Express Fusion-MPT SAS-3
>> | | \-03.0-[0a-0e]--
>> | \-00.1 PMC-Sierra Inc. Device 4052
>>
>>
>>
>> arch/powerpc/include/asm/ppc-pci.h | 4 +++
>> arch/powerpc/kernel/pci-hotplug.c | 5 ++--
>> arch/powerpc/kernel/pci_dn.c | 42 ++++++++++++++++++++++++++++++
>> 3 files changed, 48 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/ppc-pci.h
>> b/arch/powerpc/include/asm/ppc-pci.h
>> index a8b7e8682f5b..a5d5ee4ff7c0 100644
>> --- a/arch/powerpc/include/asm/ppc-pci.h
>> +++ b/arch/powerpc/include/asm/ppc-pci.h
>> @@ -28,6 +28,10 @@ struct pci_dn;
>> void *pci_traverse_device_nodes(struct device_node *start,
>> void *(*fn)(struct device_node *, void *),
>> void *data);
>> +
>> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node
>> *start,
>> + struct pci_bus *bus);
>> +
>> extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
>> #if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c
>> b/arch/powerpc/kernel/pci-hotplug.c
>> index 0fe251c6ac2c..639a3d592fe2 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -106,7 +106,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices);
>> */
>> void pci_hp_add_devices(struct pci_bus *bus)
>> {
>> - int slotno, mode, max;
>> + int mode, max;
>> struct pci_dev *dev;
>> struct pci_controller *phb;
>> struct device_node *dn = pci_bus_to_OF_node(bus);
>> @@ -129,8 +129,7 @@ void pci_hp_add_devices(struct pci_bus *bus)
>> * order for fully rescan all the way down to pick them up.
>> * They can have been removed during partial hotplug.
>> */
>> - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
>> - pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
>> + pci_traverse_sibling_nodes_and_scan_slot(dn, bus);
>> max = bus->busn_res.start;
>> /*
>> * Scan bridges that are already configured. We don't touch
>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>> index 38561d6a2079..2e202f9cec21 100644
>> --- a/arch/powerpc/kernel/pci_dn.c
>> +++ b/arch/powerpc/kernel/pci_dn.c
>> @@ -493,4 +493,46 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
>> pdn = pci_get_pdn(pdev);
>> pdev->dev.archdata.pci_data = pdn;
>> }
>> +
>> +void *pci_traverse_sibling_nodes_and_scan_slot(struct device_node
>> *start, struct pci_bus *bus)
>
> Two things regarding the return type of the above function.
>
> 1. Function only returns NULL
> 2. Caller of this function doesn't take any action based on the return
> value of this function.
>
> How about changing the return type from void * to just void?
>
>> +{
>> + struct device_node *dn;
>> + struct device_node *parent;
>
> parent variable is not really required.
>
>> + int slotno;
>> +
>> + const __be32 *classp1;
>> + u32 class1 = 0;
>> +
>> + classp1 = of_get_property(start->child, "class-code", NULL);
>> + if (classp1)
>> + class1 = of_read_number(classp1, 1);
>> +
>> + /* Call of pci_scan_slot for non-bridge/EP case */
>> + if (!((class1 >> 8) == PCI_CLASS_BRIDGE_PCI)) {
>> + slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
>> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
>> + return NULL;
>> + }
>> +
>> + /* Iterate all siblings */
>> + parent = start;
>> + for_each_child_of_node(parent, dn) {
>> + const __be32 *classp;
>> + u32 class = 0;
>> +
>> + classp = of_get_property(dn, "class-code", NULL);
>> + if (classp)
>> + class = of_read_number(classp, 1);
>> +
>> + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */
>> + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
>> + slotno = PCI_SLOT(PCI_DN(dn)->devfn);
>> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
>> + }
>> + }
>> +
>> + return NULL;
>
> Some level of code duplication can be avoided if we push both cases
> 1. Call of pci_scan_slot for non-bridge/EP case
> 2. Call of pci_scan_slot for bridge port
>
> inside `for_each_child_of_node` macro.
>
> Something like this.
>
> u32 class;
> int slotno;
> const __be32 *classp;
> struct device_node *dn;
>
> for_each_child_of_node(start, dn) {
> class = 0;
> classp = of_get_property(dn, "class-code", NULL);
> if (classp)
> class = of_read_number(classp, 1);
>
> /* Call of pci_scan_slot for non-bridge/EP case */
> if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI) && start->child ==
> dn) {
> slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> of_node_put(dn);
> return;
> }
>
> /* Call of pci_scan_slot for bridge port */
> if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> slotno = PCI_SLOT(PCI_DN(dn)->devfn);
> pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> }
> }
>
>
>> +}
>> +EXPORT_SYMBOL_GPL(pci_traverse_sibling_nodes_and_scan_slot);
>
> What is the need for exporting the above function?
>
>> +
>> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
>
> - Sourabh Jain