2005-10-03 18:57:42

by linas

[permalink] [raw]
Subject: [PATCH] ppc64: Crash in DLPAR code on PCI hotplug add


08-hotplug-bugfix.patch

In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add
of a hotplug slot will crash the system, with the following (abbreviated)
stack trace:

cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0]
pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0
lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118
c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable)
c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64
c000000000060754 .notifier_call_chain+0x68/0x9c

The root cause was that the phb was not marked "dynamic", and so instead
of having kmalloc() being called, the __init __alloc_bootmem() was called,
resulting in access of garage data. The patch below fixes this crash,
and adds some docs to clarify the code.

Signed-off-by: Linas Vepstas <[email protected]>


Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:45:58.011393833 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:52:26.421786761 -0500
@@ -121,6 +121,12 @@
return NULL;
}

+/** pci_devs_phb_init_dynamic -- setup pci devices under this PHB
+ *
+ * This routine is called both during boot, (before the memory
+ * subsystem is set up, before kmalloc is valid) and during the
+ * dynamic lpar operation of adding a PHB to a running system.
+ */
void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)
{
struct device_node * dn = (struct device_node *) phb->arch_data;
@@ -201,17 +207,19 @@
.notifier_call = pci_dn_reconfig_notifier,
};

-/*
- * Actually initialize the phbs.
- * The buswalk on this phb has not happened yet.
+/** pci_devs_phb_init -- Initialize phbs and pci devs under them.
+ *
+ * When this is called, the buswalk of PHB's has not happened yet.
*/
void __init pci_devs_phb_init(void)
{
struct pci_controller *phb, *tmp;

/* This must be done first so the device nodes have valid pci info! */
- list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
+ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
pci_devs_phb_init_dynamic(phb);
+ phb->is_dynamic = 1;
+ }

pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb);
}


2005-10-03 21:24:14

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH] ppc64: Crash in DLPAR code on PCI hotplug add

On Mon, Oct 03, 2005 at 01:57:39PM -0500, linas wrote:
> The root cause was that the phb was not marked "dynamic", and so instead
> of having kmalloc() being called, the __init __alloc_bootmem() was called,
> resulting in access of garage data. The patch below fixes this crash,
> and adds some docs to clarify the code.

> +/** pci_devs_phb_init_dynamic -- setup pci devices under this PHB
> + *
> + * This routine is called both during boot, (before the memory
> + * subsystem is set up, before kmalloc is valid) and during the
> + * dynamic lpar operation of adding a PHB to a running system.
> + */
> void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)

Please, add docs in a proper way:

/**
* foo - bar
* a: b
*
* Does bar.
*/

2005-10-03 21:49:43

by linas

[permalink] [raw]
Subject: Re: [PATCH] ppc64: Crash in DLPAR code on PCI hotplug add

On Tue, Oct 04, 2005 at 01:34:30AM +0400, Alexey Dobriyan was heard to remark:
>
> Please, add docs in a proper way:

Done, new patch attached.

--linas

08-hotplug-bugfix.patch

In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add
of a hotplug slot will crash the system, with the following (abbreviated)
stack trace:

cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0]
pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0
lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118
c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable)
c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64
c000000000060754 .notifier_call_chain+0x68/0x9c

The root cause was that the phb was not marked "dynamic", and so instead
of having kmalloc() being called, the __init __alloc_bootmem() was called,
resulting in access of garage data. The patch below fixes this crash,
and adds some docs to clarify the code.

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:45:58.000000000 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c 2005-10-03 16:46:33.816658976 -0500
@@ -121,6 +121,14 @@
return NULL;
}

+/**
+ * pci_devs_phb_init_dynamic - setup pci devices under this PHB
+ * phb: pci-to-host bridge (top-level bridge connecting to cpu)
+ *
+ * This routine is called both during boot, (before the memory
+ * subsystem is set up, before kmalloc is valid) and during the
+ * dynamic lpar operation of adding a PHB to a running system.
+ */
void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)
{
struct device_node * dn = (struct device_node *) phb->arch_data;
@@ -201,17 +209,24 @@
.notifier_call = pci_dn_reconfig_notifier,
};

-/*
- * Actually initialize the phbs.
- * The buswalk on this phb has not happened yet.
+/**
+ * pci_devs_phb_init - Initialize phbs and pci devs under them.
+ *
+ * This routine walks over all phb's (pci-host bridges) on the
+ * system, and sets up assorted pci-related structures
+ * (including pci info in the device node structs) for each
+ * pci device found underneath. This routine runs once,
+ * early in the boot sequence.
*/
void __init pci_devs_phb_init(void)
{
struct pci_controller *phb, *tmp;

/* This must be done first so the device nodes have valid pci info! */
- list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
+ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
pci_devs_phb_init_dynamic(phb);
+ phb->is_dynamic = 1;
+ }

pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb);
}

2005-10-04 20:30:27

by linas

[permalink] [raw]
Subject: Re: [PATCH] ppc64: Crash in DLPAR code on PCI hotplug add


After discussion with John Rose, I relize that this patch breaks
something else, and so its no good. I'll try to come up with a
different patch, which will unfortunately be a bit more complex.

--linas

On Mon, Oct 03, 2005 at 01:57:39PM -0500, linas was heard to remark:
>
> 08-hotplug-bugfix.patch
>
> In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add
> of a hotplug slot will crash the system, with the following (abbreviated)
> stack trace:
>
> cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0]
> pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0
> lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118
> c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable)
> c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64
> c000000000060754 .notifier_call_chain+0x68/0x9c
>
> The root cause was that the phb was not marked "dynamic", and so instead
> of having kmalloc() being called, the __init __alloc_bootmem() was called,
> resulting in access of garage data. The patch below fixes this crash,
> and adds some docs to clarify the code.
>
> Signed-off-by: Linas Vepstas <[email protected]>
>
>
> Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c
> ===================================================================
> --- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:45:58.011393833 -0500
> +++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:52:26.421786761 -0500
> @@ -121,6 +121,12 @@
> return NULL;
> }
>
> +/** pci_devs_phb_init_dynamic -- setup pci devices under this PHB
> + *
> + * This routine is called both during boot, (before the memory
> + * subsystem is set up, before kmalloc is valid) and during the
> + * dynamic lpar operation of adding a PHB to a running system.
> + */
> void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)
> {
> struct device_node * dn = (struct device_node *) phb->arch_data;
> @@ -201,17 +207,19 @@
> .notifier_call = pci_dn_reconfig_notifier,
> };
>
> -/*
> - * Actually initialize the phbs.
> - * The buswalk on this phb has not happened yet.
> +/** pci_devs_phb_init -- Initialize phbs and pci devs under them.
> + *
> + * When this is called, the buswalk of PHB's has not happened yet.
> */
> void __init pci_devs_phb_init(void)
> {
> struct pci_controller *phb, *tmp;
>
> /* This must be done first so the device nodes have valid pci info! */
> - list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
> + list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
> pci_devs_phb_init_dynamic(phb);
> + phb->is_dynamic = 1;
> + }
>
> pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb);
> }
> _______________________________________________
> Linuxppc64-dev mailing list
> [email protected]
> https://ozlabs.org/mailman/listinfo/linuxppc64-dev
>

2005-10-04 23:59:26

by linas

[permalink] [raw]
Subject: [PATCH 1/2] ppc64: Crash in DLPAR code on PCI hotplug add


Paul,

A new-improved variant of the previous patch in this thread.
Please apply.

08-hotplug-bugfix.patch

In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add
of a hotplug slot will crash the system, with the following (abbreviated)
stack trace:

cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0]
pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0
lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118
c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable)
c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64
c000000000060754 .notifier_call_chain+0x68/0x9c

The root cause was that __init __alloc_bootmem() was called long after
boot had finished, resulting in a crash because this routine is undefined
after boot time. The patch below fixes this crash, and adds some docs to
clarify the code.

p.s. congrats to all for getting slashdotted on this yesterday!

Signed-off-by: Linas Vepstas <[email protected]>


Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-10-03 13:45:58.000000000 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c 2005-10-04 15:37:49.761245845 -0500
@@ -44,7 +44,7 @@
u32 *regs;
struct pci_dn *pdn;

- if (phb->is_dynamic)
+ if (mem_init_done)
pdn = kmalloc(sizeof(*pdn), GFP_KERNEL);
else
pdn = alloc_bootmem(sizeof(*pdn));
@@ -121,6 +121,14 @@
return NULL;
}

+/**
+ * pci_devs_phb_init_dynamic - setup pci devices under this PHB
+ * phb: pci-to-host bridge (top-level bridge connecting to cpu)
+ *
+ * This routine is called both during boot, (before the memory
+ * subsystem is set up, before kmalloc is valid) and during the
+ * dynamic lpar operation of adding a PHB to a running system.
+ */
void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)
{
struct device_node * dn = (struct device_node *) phb->arch_data;
@@ -201,9 +209,14 @@
.notifier_call = pci_dn_reconfig_notifier,
};

-/*
- * Actually initialize the phbs.
- * The buswalk on this phb has not happened yet.
+/**
+ * pci_devs_phb_init - Initialize phbs and pci devs under them.
+ *
+ * This routine walks over all phb's (pci-host bridges) on the
+ * system, and sets up assorted pci-related structures
+ * (including pci info in the device node structs) for each
+ * pci device found underneath. This routine runs once,
+ * early in the boot sequence.
*/
void __init pci_devs_phb_init(void)
{

2005-10-05 00:01:19

by linas

[permalink] [raw]
Subject: [PATCH 2/2] ppc64: Crash in DLPAR code on remove operation


This patch fixes two bugs related to dlpar slot removal and add.

-- Both crashes are due to the fact the some children
of pci nodes are not pci nodes themselves, and thus do not
have pci_dn structures. For example:
/pci@800000020000002/pci@2,3/usb@1/hub@1
/pci@800000020000002/pci@2,3/usb@1,1/hub@1

Strangely, though, sometimes the following appears,
and I don't quite understand why.
/interrupt-controller@3fe0000a400

A typical stack trace:
Vector: 300 (Data Access) at [c0000000555637d0]
pc: c000000000202a50: .dlpar_add_slot+0x108/0x410
c000000000202e78 .add_slot_store+0x7c/0xac
c000000000202da0 .dlpar_attr_store+0x48/0x64
c0000000000f8ee4 .sysfs_write_file+0x100/0x1a0

A similar stack trace is involved for the slot remove.

This code survived testing, of adding and removing different slots,
23 times each, so far, as of this writing.

Signed-off-by: Linas Vepstas <[email protected]>


Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pSeries_iommu.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pSeries_iommu.c 2005-10-04 16:47:09.175705100 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pSeries_iommu.c 2005-10-04 17:12:54.123928903 -0500
@@ -478,10 +478,13 @@
{
int err = NOTIFY_OK;
struct device_node *np = node;
- struct pci_dn *pci = np->data;
+ struct pci_dn *pci;

switch (action) {
case PSERIES_RECONFIG_REMOVE:
+ pci = PCI_DN(np);
+ if (!pci)
+ return NOTIFY_OK;
if (pci->iommu_table &&
get_property(np, "ibm,dma-window", NULL))
iommu_free_table(np);
Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-10-04 15:37:49.761245845 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci_dn.c 2005-10-04 17:58:47.344628793 -0500
@@ -195,7 +195,10 @@

switch (action) {
case PSERIES_RECONFIG_ADD:
- pci = np->parent->data;
+ pci = PCI_DN(np->parent);
+ if (!pci)
+ return NOTIFY_OK;
+
update_dn_pci_info(np, pci->phb);
break;
default:

2005-10-05 00:05:10

by linas

[permalink] [raw]
Subject: [PATCH] rpaphp: PCI Hotplug crash on PHB DLPAR add


This patch fixes a bug related to dlpar PHB add, after a PHB removal.

-- The crash was due to the PHB not having a pci_dn structure yet,
when the phb is being added.

This code survived testing, of adding and removeig the PHB and all slots
underneath it, 17 times so far, as of this writing.

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/drivers/pci/hotplug/rpadlpar_core.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-10-04 16:40:12.539168432 -0500
+++ linux-2.6.14-rc2-git6/drivers/pci/hotplug/rpadlpar_core.c 2005-10-04 17:55:43.165471615 -0500
@@ -303,7 +303,7 @@
{
struct pci_controller *phb;

- if (PCI_DN(dn)->phb) {
+ if (PCI_DN(dn) && PCI_DN(dn)->phb) {
/* PHB already exists */
return -EINVAL;
}