2003-02-13 00:21:10

by Bjorn Helgaas

Subject: tg3: back-to-back register write bug workaround causes MCA

The attached change to tg3 causes an MCA on an HP zx6000
(a 2-CPU IA64 box). This is with Marcelo's current 2.4.x BK tree
plus the ia64 patch. Backing out the change below makes the
MCA go away.

Driver attach:
tg3.c:v1.4 (Feb 1, 2003)
PCI: Found IRQ 56 for device 20:02.0
eth1: Tigon3 [partno(BCM95700A6) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:38:d9:67

lspci -vv:
20:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
	Subsystem: Hewlett-Packard Company: Unknown device 12a4
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 192 (16000ns min), cache line size 20
	Interrupt: pin A routed to IRQ 56
	Region 0: Memory at 0000000090800000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] PCI-X non-bridge device.
		Command: DPERE- ERO- RBC=0 OST=0
		Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
		Address: 968ec1a842f280a8 Data: ca46

The MCA occurs when dhclient configures the interface. The PCI bus
address is 0x0000000090806804, and the root bridge says the error is
"received no DEVSEL# when mastering I/O bus".

Is there any information I can collect to help diagnose the problem?

Bjorn
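
As a quick sanity check on the bus address reported above, the register
offset can be recovered by subtracting the Region 0 base shown by lspci.
The following is a minimal userspace sketch, not driver code; the helper
names bar_offset() and fault_offset() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the report above. */
#define BAR0_BASE   UINT64_C(0x0000000090800000) /* Region 0 from lspci */
#define BAR0_SIZE   UINT64_C(0x10000)            /* [size=64K] */
#define FAULT_ADDR  UINT64_C(0x0000000090806804) /* address logged with the MCA */

/* Hypothetical helper: returns 1 and stores the offset into the BAR if
 * addr lies inside [base, base + size), otherwise returns 0. */
static int bar_offset(uint64_t base, uint64_t size, uint64_t addr,
		      uint64_t *off)
{
	if (addr < base || addr - base >= size)
		return 0;
	*off = addr - base;
	return 1;
}

/* Offset of the faulting access within the tg3 register window. */
static uint64_t fault_offset(void)
{
	uint64_t off = 0;
	int inside = bar_offset(BAR0_BASE, BAR0_SIZE, FAULT_ADDR, &off);

	assert(inside);	/* the logged address must fall within Region 0 */
	(void)inside;
	return off;
}
```

fault_offset() evaluates to 0x6804, so the aborted cycle targeted a
register inside the tg3 MMIO window rather than a stray address. Which
register lives at that offset should be checked against the definitions
in tg3.h.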


# --------------------------------------------
# 03/01/30 [email protected] 1.953.2.5
# [TG3]: Workaround 5701 back-to-back register write bug.
# --------------------------------------------
#
diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c
--- a/drivers/net/tg3.c Wed Feb 12 15:42:52 2003
+++ b/drivers/net/tg3.c Wed Feb 12 15:42:53 2003
@@ -165,6 +165,8 @@
 		spin_unlock_irqrestore(&tp->indirect_lock, flags);
 	} else {
 		writel(val, tp->regs + off);
+		if ((tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) != 0)
+			readl(tp->regs + off);
 	}
 }

@@ -5961,6 +5963,14 @@
 			pci_write_config_word(tp->pdev, PCI_COMMAND, pci_cmd);
 		}
 	}
+
+	/* Back to back register writes can cause problems on this chip,
+	 * the workaround is to read back all reg writes except those to
+	 * mailbox regs. See tg3_write_indirect_reg32().
+	 */
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		tp->tg3_flags |= TG3_FLAG_5701_REG_WRITE_BUG;
+
 	if ((pci_state_reg & PCISTATE_BUS_SPEED_HIGH) != 0)
 		tp->tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
 	if ((pci_state_reg & PCISTATE_BUS_32BIT) != 0)
diff -Nru a/drivers/net/tg3.h b/drivers/net/tg3.h
--- a/drivers/net/tg3.h Wed Feb 12 15:42:53 2003
+++ b/drivers/net/tg3.h Wed Feb 12 15:42:53 2003
@@ -1795,6 +1795,7 @@
 #define TG3_FLAG_USE_LINKCHG_REG	0x00000008
 #define TG3_FLAG_USE_MI_INTERRUPT	0x00000010
 #define TG3_FLAG_ENABLE_ASF		0x00000020
+#define TG3_FLAG_5701_REG_WRITE_BUG	0x00000040
 #define TG3_FLAG_POLL_SERDES		0x00000080
 #define TG3_FLAG_MBOX_WRITE_REORDER	0x00000100
 #define TG3_FLAG_PCIX_TARGET_HWBUG	0x00000200


2003-02-13 03:37:00

by David Miller

Subject: Re: tg3: back-to-back register write bug workaround causes MCA

From: Bjorn Helgaas <[email protected]>
Date: Wed, 12 Feb 2003 17:30:53 -0700

> The attached change to tg3 causes an MCA on an HP zx6000
> (a 2-CPU IA64 box). This is with Marcelo's current 2.4.x BK tree
> plus the ia64 patch. Backing out the change below makes the
> MCA go away.

This sounds like either a bug in your ia64's PCI chipset or
in the tigon3 device.

Which means the only viable solution is to fail to probe this
tigon3 chip on ia64 systems using the same PCI host controller
as you have.

Can you ask an ia64 PCI host controller expert if there are any
known errata in this area?

2003-02-14 17:16:57

by Bjorn Helgaas

Subject: Re: tg3: back-to-back register write bug workaround causes MCA

On Wednesday 12 February 2003 8:32 pm, David S. Miller wrote:
> This sounds like either a bug in your ia64's PCI chipset or
> in the tigon3 device.

We haven't captured a PCI trace yet, but I think we have a good
lead. The MCA occurs in tg3_chip_reset(), which does:

	tw32(GRC_MISC_CFG, GRC_MISC_CFG_CORECLK_RESET);

The comment immediately after the reset:

	/* Flush PCI posted writes. The normal MMIO registers
	 * are inaccessible at this time so this is the only
	 * way to make this reliably. I tried to use indirect
	 * register read/write but this upset some 5701 variants.
	 */
	pci_read_config_dword(tp->pdev, PCI_COMMAND, &val);

says that the MMIO registers are inaccessible at this time.
Presumably they became inaccessible when tg3_write_indirect_reg32()
did the write to GRC_MISC_CFG, so the read-after-write added for
TG3_FLAG_5701_REG_WRITE_BUG then reads an inaccessible register.

One unusual thing about our ia64 chipset (and our parisc chipset)
is that it's typically configured so PCI master aborts cause an MCA.
My understanding is that most other PCI controllers basically ignore
master aborts, so the aborted read would just return -1 instead of
causing an MCA.
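
To make the difference concrete, here is a small userspace model of the
two bridge policies. It is only a sketch under simplifying assumptions:
struct sim_dev, sim_writel() and sim_readl() are invented names, writes
to a non-responding device are silently dropped, and a counter stands in
for the MCA.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NREGS 4

/* Hypothetical model of a device behind a host bridge; not tg3 code. */
struct sim_dev {
	uint32_t regs[SIM_NREGS];
	int responding;		/* 0 while the device does not claim cycles */
	int abort_is_fatal;	/* bridge policy: master abort raises an MCA */
	int machine_checks;	/* fatal errors the bridge has reported */
};

/* Posted write: the CPU never sees a failure; the data is dropped if
 * the device is not responding (a simplification). */
static void sim_writel(struct sim_dev *d, unsigned int idx, uint32_t val)
{
	assert(idx < SIM_NREGS);
	if (d->responding)
		d->regs[idx] = val;
}

/* Read: if no target claims the cycle, the bridge master-aborts. A
 * lenient bridge just returns all ones; a strict one also reports a
 * fatal error. */
static uint32_t sim_readl(struct sim_dev *d, unsigned int idx)
{
	assert(idx < SIM_NREGS);
	if (!d->responding) {
		if (d->abort_is_fatal)
			d->machine_checks++;
		return 0xffffffffu;
	}
	return d->regs[idx];
}
```

With abort_is_fatal clear, a read from a non-responding device quietly
yields 0xffffffff; with it set, the same read also bumps
machine_checks, which is the behaviour described for this chipset.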

The following change (though not correct because it ignores
TG3_FLAG_PCIX_TARGET_HWBUG) avoids the MCA:

--- 1.57/drivers/net/tg3.c Fri Feb 14 09:24:48 2003
+++ edited/drivers/net/tg3.c Fri Feb 14 09:26:49 2003
@@ -3059,7 +3059,7 @@
 		}
 	}

-	tw32(GRC_MISC_CFG, GRC_MISC_CFG_CORECLK_RESET);
+	writel(GRC_MISC_CFG_CORECLK_RESET, tp->regs + GRC_MISC_CFG);

 	/* Flush PCI posted writes. The normal MMIO registers
 	 * are inaccessible at this time so this is the only

So perhaps we need a special-case path for resetting, so we don't
try to access the registers while they're disabled.

Bjorn
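
One way to picture the special-case reset path suggested above is the
following userspace mock. It is a sketch only, not a proposed patch:
struct mock_nic, the mock_* functions and MOCK_CORECLK_RESET are
hypothetical, the bit value is illustrative, and the config-space flush
is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_CORECLK_RESET 0x1u	/* illustrative bit, not the real value */

/* Hypothetical stand-in for the driver's private state; not struct tg3. */
struct mock_nic {
	uint32_t misc_cfg;	/* the single register being modelled */
	int write_bug;		/* plays the role of TG3_FLAG_5701_REG_WRITE_BUG */
	int in_reset;		/* MMIO window inaccessible when non-zero */
	int unsafe_reads;	/* MMIO reads issued while in_reset was set */
	int config_reads;	/* config-space reads, safe during reset */
};

static void mock_mmio_write(struct mock_nic *n, uint32_t val)
{
	n->misc_cfg = val;
	if (val & MOCK_CORECLK_RESET)
		n->in_reset = 1;	/* registers vanish once reset is written */
}

static uint32_t mock_mmio_read(struct mock_nic *n)
{
	if (n->in_reset) {
		n->unsafe_reads++;	/* would be a master abort on real hardware */
		return 0xffffffffu;
	}
	return n->misc_cfg;
}

/* Normal path: write, then read back when the workaround flag is set. */
static void mock_write_flush(struct mock_nic *n, uint32_t val)
{
	mock_mmio_write(n, val);
	if (n->write_bug)
		(void)mock_mmio_read(n);
}

/* Reset path: write without the MMIO read-back, then flush the posted
 * write through config space, as the driver comment describes. */
static void mock_chip_reset(struct mock_nic *n)
{
	assert(n != 0);
	mock_mmio_write(n, MOCK_CORECLK_RESET);
	n->config_reads++;	/* models pci_read_config_dword(PCI_COMMAND) */
}
```

Issuing the reset through mock_write_flush() with write_bug set records
one unsafe read, while mock_chip_reset() records none and performs a
single config-space read instead. A real fix would also have to honour
TG3_FLAG_PCIX_TARGET_HWBUG, which this mock ignores.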