2005-09-30 00:48:04

by linas

[permalink] [raw]
Subject: [PATCH 0/7] ppc64: Assorted minor EEH cleanups


The following seven patches perform a variety of cleanups and restructurings
of the EEH code, in preparation for the addition of code that will recover
from EEH events. The frst few patches are nearly janitorial, mostly tweaking
whitespace; alhough the later patches are more serious and actually fix bugs.

These are all small, and should be easy to review. I beleive that these
patches should not be controversial in any way, and thus should be ready
to be applied.

They compile but (ahem) are not tested, as I just now discovered that
I cannot even boot 2.6.14-rc2-git6 in any shape or form; it panics complaining
of "junk in gzipped archive". Of course, I am of the firmest conviction that
my patches are spotless, and need no testing, anyway. So there. :)

Signed-off-by: Linas Vepstas <[email protected]>

--linas


2005-09-30 00:51:45

by linas

[permalink] [raw]
Subject: [PATCH 1/7] ppc64: EEH typos, include files, macros, whitespace


01-eeh-minor-cleanup.patch

This patch performs some minor cleanup of the eeh.c file, including:
-- trim some trailing whitespace
-- remove extraneous #includes
-- use the macro PCI_DN uniformly, instead of the void pointer chase.
-- typos in comments
-- improved debug printk's

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 11:03:52.084426836 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 13:09:10.306972480 -0500
@@ -1,32 +1,31 @@
/*
* eeh.c
* Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
- *
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
- *
+ *
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
- *
+ *
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

-#include <linux/bootmem.h>
#include <linux/init.h>
#include <linux/list.h>
-#include <linux/mm.h>
#include <linux/notifier.h>
#include <linux/pci.h>
#include <linux/proc_fs.h>
#include <linux/rbtree.h>
#include <linux/seq_file.h>
#include <linux/spinlock.h>
+#include <asm/atomic.h>
#include <asm/eeh.h>
#include <asm/io.h>
#include <asm/machdep.h>
@@ -49,8 +48,8 @@
* were "empty": all reads return 0xff's and all writes are silently
* ignored. EEH slot isolation events can be triggered by parity
* errors on the address or data busses (e.g. during posted writes),
- * which in turn might be caused by dust, vibration, humidity,
- * radioactivity or plain-old failed hardware.
+ * which in turn might be caused by low voltage on the bus, dust,
+ * vibration, humidity, radioactivity or plain-old failed hardware.
*
* Note, however, that one of the leading causes of EEH slot
* freeze events are buggy device drivers, buggy device microcode,
@@ -256,18 +255,17 @@

dn = pci_device_to_OF_node(dev);
if (!dn) {
- printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n",
- pci_name(dev));
+ printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
return;
}

/* Skip any devices for which EEH is not enabled. */
- pdn = dn->data;
+ pdn = PCI_DN(dn);
if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
pdn->eeh_mode & EEH_MODE_NOCHECK) {
#ifdef DEBUG
- printk(KERN_INFO "PCI: skip building address cache for=%s\n",
- pci_name(dev));
+ printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
+ pci_name(dev), pdn->node->full_name);
#endif
return;
}
@@ -410,16 +408,16 @@
* @dn: device node to read
* @rets: array to return results in
*/
-static int read_slot_reset_state(struct device_node *dn, int rets[])
+static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
{
int token, outputs;
- struct pci_dn *pdn = dn->data;

if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
token = ibm_read_slot_reset_state2;
outputs = 4;
} else {
token = ibm_read_slot_reset_state;
+ rets[2] = 0; /* fake PE Unavailable info */
outputs = 3;
}

@@ -496,7 +494,7 @@

/**
* eeh_token_to_phys - convert EEH address token to phys address
- * @token i/o token, should be address in the form 0xE....
+ * @token i/o token, should be address in the form 0xA....
*/
static inline unsigned long eeh_token_to_phys(unsigned long token)
{
@@ -522,7 +520,7 @@
* will query firmware for the EEH status.
*
* Returns 0 if there has not been an EEH error; otherwise returns
- * a non-zero value and queues up a solt isolation event notification.
+ * a non-zero value and queues up a slot isolation event notification.
*
* It is safe to call this routine in an interrupt context.
*/
@@ -542,7 +540,7 @@

if (!dn)
return 0;
- pdn = dn->data;
+ pdn = PCI_DN(dn);

/* Access to IO BARs might get this far and still not want checking. */
if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
@@ -562,7 +560,7 @@
atomic_inc(&eeh_fail_count);
if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
/* re-read the slot reset state */
- if (read_slot_reset_state(dn, rets) != 0)
+ if (read_slot_reset_state(pdn, rets) != 0)
rets[0] = -1; /* reset state unknown */
eeh_panic(dev, rets[0]);
}
@@ -576,7 +574,7 @@
* function zero of a multi-function device.
* In any case they must share a common PHB.
*/
- ret = read_slot_reset_state(dn, rets);
+ ret = read_slot_reset_state(pdn, rets);
if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) {
__get_cpu_var(false_positives)++;
return 0;
@@ -635,7 +633,6 @@
* @token i/o token, should be address in the form 0xA....
* @val value, should be all 1's (XXX why do we need this arg??)
*
- * Check for an eeh failure at the given token address.
* Check for an EEH failure at the given token address. Call this
* routine if the result of a read was all 0xff's and you want to
* find out if this is due to an EEH slot freeze event. This routine
@@ -680,7 +677,7 @@
u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
u32 *regs;
int enable;
- struct pci_dn *pdn = dn->data;
+ struct pci_dn *pdn = PCI_DN(dn);

pdn->eeh_mode = 0;

@@ -732,7 +729,7 @@

/* This device doesn't support EEH, but it may have an
* EEH parent, in which case we mark it as supported. */
- if (dn->parent && dn->parent->data
+ if (dn->parent && PCI_DN(dn->parent)
&& (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
/* Parent supports EEH. */
pdn->eeh_mode |= EEH_MODE_SUPPORTED;
@@ -745,7 +742,7 @@
dn->full_name);
}

- return NULL;
+ return NULL;
}

/*
@@ -793,13 +790,11 @@
for (phb = of_find_node_by_name(NULL, "pci"); phb;
phb = of_find_node_by_name(phb, "pci")) {
unsigned long buid;
- struct pci_dn *pci;

buid = get_phb_buid(phb);
- if (buid == 0 || phb->data == NULL)
+ if (buid == 0 || PCI_DN(phb) == NULL)
continue;

- pci = phb->data;
info.buid_lo = BUID_LO(buid);
info.buid_hi = BUID_HI(buid);
traverse_pci_devices(phb, early_enable_eeh, &info);
@@ -828,11 +823,13 @@
struct pci_controller *phb;
struct eeh_early_enable_info info;

- if (!dn || !dn->data)
+ if (!dn || !PCI_DN(dn))
return;
phb = PCI_DN(dn)->phb;
if (NULL == phb || 0 == phb->buid) {
- printk(KERN_WARNING "EEH: Expected buid but found none\n");
+ printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+ dn->full_name);
+ dump_stack();
return;
}

2005-09-30 00:53:41

by linas

[permalink] [raw]
Subject: [PATCH 2/7] ppc64: EEH PCI address cache cleanups


02-eeh-addr-cache-cleanup.patch

This is a minor patch to clean up a buglet related to the PCI address cache.
(The buglet doesn't manifes itself unless there are also bugs elsewhere,
which is why its minor.). Also:

-- Improved debug printing.
-- Declare some private routines as static
-- Adds reference counting to struct pci_dn->pcidev structure

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 13:09:10.306972480 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 13:41:44.868689967 -0500
@@ -219,9 +219,9 @@
while (*p) {
parent = *p;
piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
- if (alo < piar->addr_lo) {
+ if (ahi < piar->addr_lo) {
p = &parent->rb_left;
- } else if (ahi > piar->addr_hi) {
+ } else if (alo > piar->addr_hi) {
p = &parent->rb_right;
} else {
if (dev != piar->pcidev ||
@@ -240,6 +240,11 @@
piar->pcidev = dev;
piar->flags = flags;

+#ifdef DEBUG
+ printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
+ alo, ahi, pci_name (dev));
+#endif
+
rb_link_node(&piar->rb_node, parent, p);
rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);

@@ -301,7 +306,7 @@
* we maintain a cache of devices that can be quickly searched.
* This routine adds a device to that cache.
*/
-void pci_addr_cache_insert_device(struct pci_dev *dev)
+static void pci_addr_cache_insert_device(struct pci_dev *dev)
{
unsigned long flags;

@@ -344,7 +349,7 @@
* the tree multiple times (once per resource).
* But so what; device removal doesn't need to be that fast.
*/
-void pci_addr_cache_remove_device(struct pci_dev *dev)
+static void pci_addr_cache_remove_device(struct pci_dev *dev)
{
unsigned long flags;

@@ -366,6 +371,9 @@
{
struct pci_dev *dev = NULL;

+ if (!eeh_subsystem_enabled)
+ return;
+
spin_lock_init(&pci_io_addr_cache_root.piar_lock);

while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
@@ -837,7 +845,7 @@
info.buid_lo = BUID_LO(phb->buid);
early_enable_eeh(dn, &info);
}
-EXPORT_SYMBOL(eeh_add_device_early);
+EXPORT_SYMBOL_GPL(eeh_add_device_early);

/**
* eeh_add_device_late - perform EEH initialization for the indicated pci device
@@ -848,6 +856,8 @@
*/
void eeh_add_device_late(struct pci_dev *dev)
{
+ struct device_node *dn;
+
if (!dev || !eeh_subsystem_enabled)
return;

@@ -855,9 +865,13 @@
printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
#endif

+ pci_dev_get (dev);
+ dn = pci_device_to_OF_node(dev);
+ PCI_DN(dn)->pcidev = dev;
+
pci_addr_cache_insert_device (dev);
}
-EXPORT_SYMBOL(eeh_add_device_late);
+EXPORT_SYMBOL_GPL(eeh_add_device_late);

/**
* eeh_remove_device - undo EEH setup for the indicated pci device
@@ -868,6 +882,7 @@
*/
void eeh_remove_device(struct pci_dev *dev)
{
+ struct device_node *dn;
if (!dev || !eeh_subsystem_enabled)
return;

@@ -876,8 +891,12 @@
printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
#endif
pci_addr_cache_remove_device(dev);
+
+ dn = pci_device_to_OF_node(dev);
+ PCI_DN(dn)->pcidev = NULL;
+ pci_dev_put (dev);
}
-EXPORT_SYMBOL(eeh_remove_device);
+EXPORT_SYMBOL_GPL(eeh_remove_device);

static int proc_eeh_show(struct seq_file *m, void *v)
{
Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci.h
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/pci.h 2005-09-29 13:09:10.306972480 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/pci.h 2005-09-29 13:36:08.550882158 -0500
@@ -39,10 +39,6 @@
void pci_devs_phb_init(void);
void pci_devs_phb_init_dynamic(struct pci_controller *phb);

-/* PCI address cache management routines */
-void pci_addr_cache_insert_device(struct pci_dev *dev);
-void pci_addr_cache_remove_device(struct pci_dev *dev);
-
/* From rtas_pci.h */
void init_pci_config_tokens (void);
unsigned long get_phb_buid (struct device_node *);

2005-09-30 00:54:55

by linas

[permalink] [raw]
Subject: [PATCH 3/7] ppc64: EEH Add event/internal state statistics


03-eeh-statistics.patch

This minor patch adds some statistics-gathering counters that allow the
behaviour of the EEH subsystem o be monitored. While far from perfect,
it does provide a rudimentary device that makes understanding of the
current state of the system a bit easier.

Signed-off-by: Linas Vepstas <[email protected]>


Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 13:52:08.188222887 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 16:05:53.025549160 -0500
@@ -102,6 +102,10 @@
static int eeh_error_buf_size;

/* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
static DEFINE_PER_CPU(unsigned long, false_positives);
static DEFINE_PER_CPU(unsigned long, ignored_failures);
@@ -493,8 +497,6 @@
notifier_call_chain (&eeh_notifier_chain,
EEH_NOTIFY_FREEZE, event);

- __get_cpu_var(slot_resets)++;
-
pci_dev_put(event->dev);
kfree(event);
}
@@ -546,17 +548,24 @@
if (!eeh_subsystem_enabled)
return 0;

- if (!dn)
+ if (!dn) {
+ __get_cpu_var(no_dn)++;
return 0;
+ }
pdn = PCI_DN(dn);

/* Access to IO BARs might get this far and still not want checking. */
if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
pdn->eeh_mode & EEH_MODE_NOCHECK) {
+ __get_cpu_var(ignored_check)++;
+#ifdef DEBUG
+ printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name);
+#endif
return 0;
}

if (!pdn->eeh_config_addr) {
+ __get_cpu_var(no_cfg_addr)++;
return 0;
}

@@ -590,6 +599,7 @@

/* prevent repeated reports of this failure */
pdn->eeh_mode |= EEH_MODE_ISOLATED;
+ __get_cpu_var(slot_resets)++;

reset_state = rets[0];

@@ -657,8 +667,10 @@
/* Finding the phys addr + pci device; this is pretty quick. */
addr = eeh_token_to_phys((unsigned long __force) token);
dev = pci_get_device_by_addr(addr);
- if (!dev)
+ if (!dev) {
+ __get_cpu_var(no_device)++;
return val;
+ }

dn = pci_device_to_OF_node(dev);
eeh_dn_check_failure (dn, dev);
@@ -903,12 +915,17 @@
unsigned int cpu;
unsigned long ffs = 0, positives = 0, failures = 0;
unsigned long resets = 0;
+ unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;

for_each_cpu(cpu) {
ffs += per_cpu(total_mmio_ffs, cpu);
positives += per_cpu(false_positives, cpu);
failures += per_cpu(ignored_failures, cpu);
resets += per_cpu(slot_resets, cpu);
+ no_dev += per_cpu(no_device, cpu);
+ no_dn += per_cpu(no_dn, cpu);
+ no_cfg += per_cpu(no_cfg_addr, cpu);
+ no_check += per_cpu(ignored_check, cpu);
}

if (0 == eeh_subsystem_enabled) {
@@ -916,13 +933,17 @@
seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
} else {
seq_printf(m, "EEH Subsystem is enabled\n");
- seq_printf(m, "eeh_total_mmio_ffs=%ld\n"
- "eeh_false_positives=%ld\n"
- "eeh_ignored_failures=%ld\n"
- "eeh_slot_resets=%ld\n"
- "eeh_fail_count=%d\n",
- ffs, positives, failures, resets,
- eeh_fail_count.counter);
+ seq_printf(m,
+ "no device=%ld\n"
+ "no device node=%ld\n"
+ "no config address=%ld\n"
+ "check not wanted=%ld\n"
+ "eeh_total_mmio_ffs=%ld\n"
+ "eeh_false_positives=%ld\n"
+ "eeh_ignored_failures=%ld\n"
+ "eeh_slot_resets=%ld\n",
+ no_dev, no_dn, no_cfg, no_check,
+ ffs, positives, failures, resets);
}

return 0;

2005-09-30 00:57:01

by linas

[permalink] [raw]
Subject: [PATCH 4/7] ppc64: EEH PCI slot error details abstraction


04-eeh-slot-error-detail.patch

This patch encapsulates a section of code that reports the EEH event.
The new subroutine can be used in several places to report the eror.

Signed-off-by: Linas Vepstas <[email protected]>


Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 16:05:53.025549160 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 16:06:25.583986100 -0500
@@ -397,6 +397,28 @@
/* --------------------------------------------------------------- */
/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */

+void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
+{
+ unsigned long flags;
+ int rc;
+
+ /* Log the error with the rtas logger */
+ spin_lock_irqsave(&slot_errbuf_lock, flags);
+ memset(slot_errbuf, 0, eeh_error_buf_size);
+
+ rc = rtas_call(ibm_slot_error_detail,
+ 8, 1, NULL, pdn->eeh_config_addr,
+ BUID_HI(pdn->phb->buid),
+ BUID_LO(pdn->phb->buid), NULL, 0,
+ virt_to_phys(slot_errbuf),
+ eeh_error_buf_size,
+ severity);
+
+ if (rc == 0)
+ log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
+ spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+}
+
/**
* eeh_register_notifier - Register to find out about EEH events.
* @nb: notifier block to callback on events
@@ -454,9 +476,12 @@
* Since the panic_on_oops sysctl is used to halt the system
* in light of potential corruption, we can use it here.
*/
- if (panic_on_oops)
+ if (panic_on_oops) {
+ struct device_node *dn = pci_device_to_OF_node(dev);
+ eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
pci_name(dev));
+ }
else {
__get_cpu_var(ignored_failures)++;
printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
@@ -539,7 +564,7 @@
int ret;
int rets[3];
unsigned long flags;
- int rc, reset_state;
+ int reset_state;
struct eeh_event *event;
struct pci_dn *pdn;

@@ -603,20 +628,7 @@

reset_state = rets[0];

- spin_lock_irqsave(&slot_errbuf_lock, flags);
- memset(slot_errbuf, 0, eeh_error_buf_size);
-
- rc = rtas_call(ibm_slot_error_detail,
- 8, 1, NULL, pdn->eeh_config_addr,
- BUID_HI(pdn->phb->buid),
- BUID_LO(pdn->phb->buid), NULL, 0,
- virt_to_phys(slot_errbuf),
- eeh_error_buf_size,
- 1 /* Temporary Error */);
-
- if (rc == 0)
- log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
- spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+ eeh_slot_error_detail (pdn, 1 /* Temporary Error */);

printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
rets[0], dn->name, dn->full_name);
@@ -783,6 +795,8 @@
struct device_node *phb, *np;
struct eeh_early_enable_info info;

+ spin_lock_init(&slot_errbuf_lock);
+
np = of_find_node_by_path("/rtas");
if (np == NULL)
return;

2005-09-30 00:58:35

by linas

[permalink] [raw]
Subject: [PATCH 5/7] ppc64: EEH handle empty PCI slot failure



05-eeh-empty-slot-error.patch

Performing PCI config-space reads to empty PCI slots can lead to reports of
"permanent failure" from the firmware. Ignore permanent failures on empty slots.

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 16:06:25.583986100 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 16:06:33.567867154 -0500
@@ -617,7 +617,32 @@
* In any case they must share a common PHB.
*/
ret = read_slot_reset_state(pdn, rets);
- if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) {
+
+ /* If the call to firmware failed, punt */
+ if (ret != 0) {
+ printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
+ ret, dn->full_name);
+ __get_cpu_var(false_positives)++;
+ return 0;
+ }
+
+ /* If EEH is not supported on this device, punt. */
+ if (rets[1] != 1) {
+ printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
+ ret, dn->full_name);
+ __get_cpu_var(false_positives)++;
+ return 0;
+ }
+
+ /* If not the kind of error we know about, punt. */
+ if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
+ __get_cpu_var(false_positives)++;
+ return 0;
+ }
+
+ /* Note that config-io to empty slots may fail;
+ * we recognize empty because they don't have children. */
+ if ((rets[0] == 5) && (dn->child == NULL)) {
__get_cpu_var(false_positives)++;
return 0;
}
@@ -650,7 +675,7 @@
/* Most EEH events are due to device driver bugs. Having
* a stack trace will help the device-driver authors figure
* out what happened. So print that out. */
- dump_stack();
+ if (rets[0] != 5) dump_stack();
schedule_work(&eeh_event_wq);

return 0;

2005-09-30 01:00:42

by linas

[permalink] [raw]
Subject: [PATCH 6/7] ppc64: EEH Avoid racing reports of errors



06-eeh-report-race.patch

When a PCI slot is isolated, all PCI functions under that slot are affected.
If hese functions have separate device drivers, the EEH isolation event
might be reported multiple times. This patch adds a lock to prevent the
racing of such multiple reports. It also marks every device under the slot
as having experienced an EEH event, so that multiple reports may be
recognized more easily.

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 16:06:33.567867154 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 16:06:39.106090510 -0500
@@ -96,6 +96,9 @@

static int eeh_subsystem_enabled;

+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_SPINLOCK(confirm_error_lock);
+
/* Buffer for reporting slot-error-detail rtas calls */
static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
static DEFINE_SPINLOCK(slot_errbuf_lock);
@@ -544,6 +547,55 @@
return pa | (token & (PAGE_SIZE-1));
}

+/**
+ * Return the "partitionable endpoint" (pe) under which this device lies
+ */
+static struct device_node * find_device_pe(struct device_node *dn)
+{
+ while ((dn->parent) && PCI_DN(dn->parent) &&
+ (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+ dn = dn->parent;
+ }
+ return dn;
+}
+
+/** Mark all devices that are peers of this device as failed.
+ * Mark the device driver too, so that it can see the failure
+ * immediately; this is critical, since some drivers poll
+ * status registers in interrupts ... If a driver is polling,
+ * and the slot is frozen, then the driver can deadlock in
+ * an interrupt context, which is bad.
+ */
+
+static inline void __eeh_mark_slot (struct device_node *dn)
+{
+ while (dn) {
+ PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
+
+ if (dn->child)
+ __eeh_mark_slot (dn->child);
+ dn = dn->sibling;
+ }
+}
+
+static inline void __eeh_clear_slot (struct device_node *dn)
+{
+ while (dn) {
+ PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
+ if (dn->child)
+ __eeh_clear_slot (dn->child);
+ dn = dn->sibling;
+ }
+}
+
+static inline void eeh_clear_slot (struct device_node *dn)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&confirm_error_lock, flags);
+ __eeh_clear_slot (dn);
+ spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
/**
* eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
* @dn device node
@@ -567,6 +619,8 @@
int reset_state;
struct eeh_event *event;
struct pci_dn *pdn;
+ struct device_node *pe_dn;
+ int rc = 0;

__get_cpu_var(total_mmio_ffs)++;

@@ -594,10 +648,14 @@
return 0;
}

- /*
- * If we already have a pending isolation event for this
- * slot, we know it's bad already, we don't need to check...
+ /* If we already have a pending isolation event for this
+ * slot, we know it's bad already, we don't need to check.
+ * Do this checking under a lock; as multiple PCI devices
+ * in one slot might report errors simultaneously, and we
+ * only want one error recovery routine running.
*/
+ spin_lock_irqsave(&confirm_error_lock, flags);
+ rc = 1;
if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
atomic_inc(&eeh_fail_count);
if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
@@ -606,7 +664,7 @@
rets[0] = -1; /* reset state unknown */
eeh_panic(dev, rets[0]);
}
- return 0;
+ goto dn_unlock;
}

/*
@@ -623,7 +681,8 @@
printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
ret, dn->full_name);
__get_cpu_var(false_positives)++;
- return 0;
+ rc = 0;
+ goto dn_unlock;
}

/* If EEH is not supported on this device, punt. */
@@ -631,25 +690,33 @@
printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
ret, dn->full_name);
__get_cpu_var(false_positives)++;
- return 0;
+ rc = 0;
+ goto dn_unlock;
}

/* If not the kind of error we know about, punt. */
if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
__get_cpu_var(false_positives)++;
- return 0;
+ rc = 0;
+ goto dn_unlock;
}

/* Note that config-io to empty slots may fail;
* we recognize empty because they don't have children. */
if ((rets[0] == 5) && (dn->child == NULL)) {
__get_cpu_var(false_positives)++;
- return 0;
+ rc = 0;
+ goto dn_unlock;
}

- /* prevent repeated reports of this failure */
- pdn->eeh_mode |= EEH_MODE_ISOLATED;
- __get_cpu_var(slot_resets)++;
+ __get_cpu_var(slot_resets)++;
+
+ /* Avoid repeated reports of this failure, including problems
+ * with other functions on this device, and functions under
+ * bridges. */
+ pe_dn = find_device_pe (dn);
+ __eeh_mark_slot (pe_dn);
+ spin_unlock_irqrestore(&confirm_error_lock, flags);

reset_state = rets[0];

@@ -678,10 +745,14 @@
if (rets[0] != 5) dump_stack();
schedule_work(&eeh_event_wq);

- return 0;
+ return 1;
+
+dn_unlock:
+ spin_unlock_irqrestore(&confirm_error_lock, flags);
+ return rc;
}

-EXPORT_SYMBOL(eeh_dn_check_failure);
+EXPORT_SYMBOL_GPL(eeh_dn_check_failure);

/**
* eeh_check_failure - check if all 1's data is due to EEH slot freeze
@@ -820,6 +891,7 @@
struct device_node *phb, *np;
struct eeh_early_enable_info info;

+ spin_lock_init(&confirm_error_lock);
spin_lock_init(&slot_errbuf_lock);

np = of_find_node_by_path("/rtas");

2005-09-30 01:02:41

by linas

[permalink] [raw]
Subject: [PATCH 7/7] ppc64: EEH Halt if bad drivers spin in error condition


07-eeh-spin-counter.patch

One an EEH event is triggers, all further I/O to a device is blocked (until
reset). Bad device drivers may end up spinning in their interrupt handlers,
trying to read an interrupt status register that will never change state.
This patch moves that spin counter to a per-device structure, and adds
some diagnostic prints to help locate the bad driver.

Signed-off-by: Linas Vepstas <[email protected]>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c 2005-09-29 16:29:00.726884805 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c 2005-09-29 16:32:05.550949258 -0500
@@ -78,14 +78,12 @@

static struct notifier_block *eeh_notifier_chain;

-/*
- * If a device driver keeps reading an MMIO register in an interrupt
+/* If a device driver keeps reading an MMIO register in an interrupt
* handler after a slot isolation event has occurred, we assume it
* is broken and panic. This sets the threshold for how many read
* attempts we allow before panicking.
*/
-#define EEH_MAX_FAILS 1000
-static atomic_t eeh_fail_count;
+#define EEH_MAX_FAILS 100000

/* RTAS tokens */
static int ibm_set_eeh_option;
@@ -521,7 +519,6 @@
"%s\n", event->reset_state,
pci_name(event->dev));

- atomic_set(&eeh_fail_count, 0);
notifier_call_chain (&eeh_notifier_chain,
EEH_NOTIFY_FREEZE, event);

@@ -657,12 +654,18 @@
spin_lock_irqsave(&confirm_error_lock, flags);
rc = 1;
if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
- atomic_inc(&eeh_fail_count);
- if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
+ pdn->eeh_check_count ++;
+ if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
+ printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
+ pdn->eeh_check_count);
+ dump_stack();
+
/* re-read the slot reset state */
if (read_slot_reset_state(pdn, rets) != 0)
rets[0] = -1; /* reset state unknown */
- eeh_panic(dev, rets[0]);
+
+ /* If we are here, then we hit an infinite loop. Stop. */
+ panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
}
goto dn_unlock;
}
@@ -808,6 +811,8 @@
struct pci_dn *pdn = PCI_DN(dn);

pdn->eeh_mode = 0;
+ pdn->eeh_check_count = 0;
+ pdn->eeh_freeze_count = 0;

if (status && strcmp(status, "ok") != 0)
return NULL; /* ignore devices with bad status */

2005-09-30 04:49:25

by Doug Maxey

[permalink] [raw]
Subject: Re: [PATCH 7/7] ppc64: EEH Halt if bad drivers spin in error condition


On Thu, 29 Sep 2005 20:02:28 CDT, linas wrote:
>
>07-eeh-spin-counter.patch
>
>One an EEH event is triggers, all further I/O to a device is blocked (until
>reset). Bad device drivers may end up spinning in their interrupt handlers,
>trying to read an interrupt status register that will never change state.
>This patch moves that spin counter to a per-device structure, and adds
>some diagnostic prints to help locate the bad driver.
>

Which struct gets the element?

++doug

2005-09-30 14:58:28

by linas

[permalink] [raw]
Subject: Re: [PATCH 7/7] ppc64: EEH Halt if bad drivers spin in error condition

On Thu, Sep 29, 2005 at 11:49:09PM -0500, Doug Maxey was heard to remark:
>
> On Thu, 29 Sep 2005 20:02:28 CDT, linas wrote:
> >
> >07-eeh-spin-counter.patch
> >
> >One an EEH event is triggers, all further I/O to a device is blocked (until
> >reset). Bad device drivers may end up spinning in their interrupt handlers,
> >trying to read an interrupt status register that will never change state.
> >This patch moves that spin counter to a per-device structure, and adds
> >some diagnostic prints to help locate the bad driver.
> >
>
> Which struct gets the element?

struct pci_dn, which Paulus recently introduced; it splits off the pci
parts from struct device_node. Think of it as holding all the firmaware
and arch-specific peices that can't be jammed in the generic struct pci_dev.

--linas

2005-09-30 22:29:25

by linas

[permalink] [raw]
Subject: Re: [PATCH 0/7] ppc64: Assorted minor EEH cleanups

On Thu, Sep 29, 2005 at 07:48:00PM -0500, linas was heard to remark:
>
> They compile but (ahem) are not tested,

They are now tested. They work (I had a corupted initrd yesterday).
Please apply and foward as soon as possible.

During testing I found two unrelated bugs; wasn't able to squeeze out
patches for today; maybe monday.

Paul, these are:
1) You added an eeh_capable flag that is never initialized, and so
this blocks operation. I don't think this flag is needed, as it
duplicates a bitflag in eeh_mode. (Unless your plan is to use
bitfields; do you want to use C language bitfields?)

2) PCI hotplug is broken because the flag phb->is_dynamic is never
set to one. As a result, hotplug add calls __alloc_bootmem
instead of kmalloc(), and crashes. I was testing a potential
patch just now, but the clock ran out.

--linas

p.s. I hope to spit out the rest of the patces, including the kthread
handling, early next week. I've got things mostly ported, and am
testing. Let me know how to best coordinate on this.


2005-10-05 11:23:24

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 1/7] ppc64: EEH typos, include files, macros, whitespace

Linas writes:

> 01-eeh-minor-cleanup.patch

Some trivial comments on a trivial patch... :)

> - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n",
> - pci_name(dev));
> + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));

This makes the line go over 80 columns, which seems unnecessary.

> - * @token i/o token, should be address in the form 0xE....
> + * @token i/o token, should be address in the form 0xA....

I think the virtual addresses we get from ioremap these days start
with 0xD00008...

Regards,
Paul.

2005-10-05 11:23:46

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 3/7] ppc64: EEH Add event/internal state statistics

Linas writes:

> 03-eeh-statistics.patch

> + if (!dn) {
> + __get_cpu_var(no_dn)++;

We have to make sure we are not preemptible when we use
__get_cpu_var, since it uses smp_processor_id(). It's not clear to me
that we have ensured that in every case where we use __get_cpu_var.
Are you sure that we hold a spinlock, or are at interrupt level, or
have explicitly disabled preemption at every point where we use
__get_cpu_var?

Regards,
Paul.

2005-10-05 11:23:24

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 6/7] ppc64: EEH Avoid racing reports of errors

Linas writes:

> 06-eeh-report-race.patch

> +/** Mark all devices that are peers of this device as failed.
> + * Mark the device driver too, so that it can see the failure
> + * immediately; this is critical, since some drivers poll
> + * status registers in interrupts ... If a driver is polling,
> + * and the slot is frozen, then the driver can deadlock in
> + * an interrupt context, which is bad.
> + */
> +
> +static inline void __eeh_mark_slot (struct device_node *dn)
> +{
> + while (dn) {
> + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
> +
> + if (dn->child)
> + __eeh_mark_slot (dn->child);
> + dn = dn->sibling;
> + }
> +}

So this does the device node that we pass in, plus all the nodes that
come after it in its parent's list of children. On that basis I
expected you to pass in the first child of the EADS bridge, but I see:

> + pe_dn = find_device_pe (dn);
> + __eeh_mark_slot (pe_dn);

My understanding is that pe_dn will end up pointing to the device node
for the EADS bridge. Shouldn't you pass in pe_dn->child here, or
alternatively rearrange __eeh_mark_slot to do the node you give it
plus its children (recursively)?

Two other comments about __eeh_mark_slot: (1) despite the comment, the
function doesn't do anything to any pci_dev or pci_driver (not that it
should be touching any pci_driver), and (2) a recursive function can't
really be inline (unless gcc is smart enough to turn arbitrary
recursive functions into iterative functions, which I doubt :).

Regards,
Paul.

2005-10-07 14:59:55

by linas

[permalink] [raw]
Subject: Re: [PATCH 3/7] ppc64: EEH Add event/internal state statistics

On Wed, Oct 05, 2005 at 09:14:58PM +1000, Paul Mackerras was heard to remark:
> Linas writes:
>
> > 03-eeh-statistics.patch
>
> > + if (!dn) {
> > + __get_cpu_var(no_dn)++;
>
> We have to make sure we are not preemptible when we use
> __get_cpu_var, since it uses smp_processor_id(). It's not clear to me
> that we have ensured that in every case where we use __get_cpu_var.
> Are you sure that we hold a spinlock, or are at interrupt level, or
> have explicitly disabled preemption at every point where we use
> __get_cpu_var?

Tese used to be plain-old global variables, but someone submitted
a patch that to turn them into the __get_cpu_var() form. I don't
know why; there's no real performance reason, since these are almost
never incremented, except a bit during boot. What if we just change
them back to global vars?

I've also day-dreamed about moving these stats to somewhere in
in the /sys directory. Any suggestions there?

--linas

2005-10-07 15:23:13

by linas

[permalink] [raw]
Subject: Re: [PATCH 6/7] ppc64: EEH Avoid racing reports of errors

On Wed, Oct 05, 2005 at 09:23:11PM +1000, Paul Mackerras was heard to remark:
> Linas writes:
>
> > 06-eeh-report-race.patch
>
> Shouldn't you pass in pe_dn->child here, or
> alternatively rearrange __eeh_mark_slot to do the node you give it
> plus its children (recursively)?

Yes; that's right; this gets fixed in a later patch in the series.
I guess this one snuck by while I was trying to sync up all the
different patches I was carrying :-/

> Two other comments about __eeh_mark_slot: (1) despite the comment, the
> function doesn't do anything to any pci_dev or pci_driver

The comment is also a "back port" of function that shows up in a later
patch, and so indeed is inappropriate for this patch. Again, my excuse
is that I got sloppy while juggling all of these patchlets. Sorry.

> (not that it
> should be touching any pci_driver),

One problem I was seeing was that after getting an EEH error,
some device drivers would start spinning in thier interrupt handlers.
I tried to break out of this spin-loop by adding a call to a
function that asked "am I the victim of an EEH event"?
Unfortunately, the first implementation of this call was not
interrupt safe (pci_device_to_OF_node calls traverse_pci_devices).
While scratching my head on to how to best fix this, I decided that
the best thing to do would be to mark up the pci driver with a flag;
that way, the driver can look up te EEH state without any further ado.

One might be able to get rid of this state in pci_driver,
although it seemed generically useful to have. For example,
later on, I futzed with a version that disabled the irq line
for that adapter "as soon as possible", and that seems to also
work, at least on an SMP machine. On a non-SMP machine, there
is still the danger that the device driver is spinning with
interrupts disabled, waiting on a status regiser to change,
that will never change. (And because of the deadlock, the
code to disable a given irq line never runs). Its all
depends on how the device driver got written.

> and (2) a recursive function can't
> really be inline

Well, no, but at least the first level call can be inlined; I assumed
that gcc would do at least that, but didn't check.

--linas

2005-10-07 19:46:47

by linas

[permalink] [raw]
Subject: Re: [PATCH 1/7] ppc64: EEH typos, include files, macros, whitespace

On Wed, Oct 05, 2005 at 09:11:43PM +1000, Paul Mackerras was heard to remark:
>
> This makes the line go over 80 columns, which seems unnecessary.

Hm, I code with tabstops set to 3, on a 132-column terminal.
It looked goofy there.

> > - * @token i/o token, should be address in the form 0xE....
> > + * @token i/o token, should be address in the form 0xA....
>
> I think the virtual addresses we get from ioremap these days start
> with 0xD00008...

Ah, didn't realize this changed. I was simultaneously debugging
a 2.4 kernel (for other reasons) when I noticed this.

--linas