2004-03-18 09:46:56

by Mikael Pettersson

Subject: tulip (pnic) errors in 2.6.5-rc1

2.6.5-rc1 causes my Netgear FA310TX to
fill the kernel log with messages like:

eth1: In tulip_rx(), entry 57 004c0728.
eth1: interrupt csr5=0x02670050 new csr5=0x02660010.
eth1: exiting interrupt, csr5=0x2660010.
In tulip_rx(), entry 58 004c0728.
eth1: In tulip_rx(), entry 58 004c0728.
eth1: interrupt csr5=0x02670050 new csr5=0x02660010.
eth1: exiting interrupt, csr5=0x2660010.
In tulip_rx(), entry 59 006a0300.
eth1: In tulip_rx(), entry 59 006a0300.
eth1: interrupt csr5=0x02670050 new csr5=0x02660010.
eth1: exiting interrupt, csr5=0x2660010.
eth1: interrupt csr5=0x02670014 new csr5=0x02660010.
eth1: exiting interrupt, csr5=0x2660010.

and on and on and on ...

No previous kernel had this problem.

2.6.4 identifies the nic as:

PCI: Found IRQ 10 for device 0000:00:0b.0
tulip1: MII transceiver #1 config 3000 status 7829 advertising 01e1.
eth1: Lite-On 82c168 PNIC rev 32 at 0xa000, XX:XX:XX:XX:XX:XX, IRQ 10.
eth1: Setting full-duplex based on MII#1 link partner capability of 41e1.

/Mikael


2004-03-18 09:52:27

by Jeff Garzik

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

===== drivers/scsi/libata-core.c 1.26 vs edited =====
--- 1.26/drivers/scsi/libata-core.c Mon Mar 15 11:43:58 2004
+++ edited/drivers/scsi/libata-core.c Thu Mar 18 03:36:13 2004
@@ -2263,9 +2263,12 @@
mb(); /* make sure PRD table writes are visible to controller */
writel(ap->prd_dma, mmio + ATA_DMA_TABLE_OFS);

- /* specify data direction */
- /* FIXME: redundant to later start-dma command? */
- writeb(rw ? 0 : ATA_DMA_WR, mmio + ATA_DMA_CMD);
+ /* specify data direction, triple-check start bit is clear */
+ dmactl = readb(mmio + ATA_DMA_CMD);
+ dmactl &= ~(ATA_DMA_WR | ATA_DMA_START);
+ if (!rw)
+ dmactl |= ATA_DMA_WR;
+ writeb(dmactl, mmio + ATA_DMA_CMD);

/* clear interrupt, error bits */
host_stat = readb(mmio + ATA_DMA_STATUS);
@@ -2275,7 +2278,6 @@
ap->ops->exec_command(ap, &qc->tf);

/* start host DMA transaction */
- dmactl = readb(mmio + ATA_DMA_CMD);
writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);

/* Strictly, one may wish to issue a readb() here, to
@@ -2308,9 +2310,12 @@
/* load PRD table addr. */
outl(ap->prd_dma, ap->ioaddr.bmdma_addr + ATA_DMA_TABLE_OFS);

- /* specify data direction */
- /* FIXME: redundant to later start-dma command? */
- outb(rw ? 0 : ATA_DMA_WR, ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
+ /* specify data direction, triple-check start bit is clear */
+ dmactl = inb(ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
+ dmactl &= ~(ATA_DMA_WR | ATA_DMA_START);
+ if (!rw)
+ dmactl |= ATA_DMA_WR;
+ outb(dmactl, ap->ioaddr.bmdma_addr + ATA_DMA_CMD);

/* clear interrupt, error bits */
host_stat = inb(ap->ioaddr.bmdma_addr + ATA_DMA_STATUS);
@@ -2321,7 +2326,6 @@
ap->ops->exec_command(ap, &qc->tf);

/* start host DMA transaction */
- dmactl = inb(ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
outb(dmactl | ATA_DMA_START,
ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
}
@@ -2344,14 +2348,16 @@
void *mmio = (void *) ap->ioaddr.bmdma_addr;

/* clear start/stop bit */
- writeb(0, mmio + ATA_DMA_CMD);
+ writeb(readb(mmio + ATA_DMA_CMD) & ~ATA_DMA_START,
+ mmio + ATA_DMA_CMD);

/* ack intr, err bits */
writeb(host_stat | ATA_DMA_INTR | ATA_DMA_ERR,
mmio + ATA_DMA_STATUS);
} else {
/* clear start/stop bit */
- outb(0, ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
+ outb(inb(ap->ioaddr.bmdma_addr + ATA_DMA_CMD) & ~ATA_DMA_START,
+ ap->ioaddr.bmdma_addr + ATA_DMA_CMD);

/* ack intr, err bits */
outb(host_stat | ATA_DMA_INTR | ATA_DMA_ERR,


Attachments:
patch (2.41 kB)

2004-03-18 09:57:09

by Jeff Garzik

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

diff -Nru a/drivers/net/tulip/21142.c b/drivers/net/tulip/21142.c
--- a/drivers/net/tulip/21142.c Mon Mar 15 21:48:00 2004
+++ b/drivers/net/tulip/21142.c Mon Mar 15 21:48:00 2004
@@ -29,7 +29,7 @@
void t21142_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr12 = inl(ioaddr + CSR12);
int next_tick = 60*HZ;
@@ -103,7 +103,7 @@

void t21142_start_nway(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr14 = ((tp->sym_advertise & 0x0780) << 9) |
((tp->sym_advertise & 0x0020) << 1) | 0xffbf;
@@ -131,7 +131,7 @@

void t21142_lnk_change(struct net_device *dev, int csr5)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr12 = inl(ioaddr + CSR12);

diff -Nru a/drivers/net/tulip/eeprom.c b/drivers/net/tulip/eeprom.c
--- a/drivers/net/tulip/eeprom.c Mon Mar 15 21:48:08 2004
+++ b/drivers/net/tulip/eeprom.c Mon Mar 15 21:48:08 2004
@@ -136,7 +136,7 @@
static struct mediatable *last_mediatable;
static unsigned char *last_ee_data;
static int controller_index;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
unsigned char *ee_data = tp->eeprom;
int i;

diff -Nru a/drivers/net/tulip/interrupt.c b/drivers/net/tulip/interrupt.c
--- a/drivers/net/tulip/interrupt.c Mon Mar 15 21:48:00 2004
+++ b/drivers/net/tulip/interrupt.c Mon Mar 15 21:48:00 2004
@@ -63,7 +63,7 @@

int tulip_refill_rx(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int entry;
int refilled = 0;

@@ -109,7 +109,7 @@

int tulip_poll(struct net_device *dev, int *budget)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int entry = tp->cur_rx % RX_RING_SIZE;
int rx_work_limit = *budget;
int received = 0;
@@ -191,9 +191,9 @@
&& (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
skb->dev = dev;
skb_reserve(skb, 2); /* 16 byte align the IP header */
- pci_dma_sync_single(tp->pdev,
- tp->rx_buffers[entry].mapping,
- pkt_len, PCI_DMA_FROMDEVICE);
+ pci_dma_sync_single_for_cpu(tp->pdev,
+ tp->rx_buffers[entry].mapping,
+ pkt_len, PCI_DMA_FROMDEVICE);
#if ! defined(__alpha__)
eth_copy_and_sum(skb, tp->rx_buffers[entry].skb->tail,
pkt_len, 0);
@@ -203,6 +203,9 @@
tp->rx_buffers[entry].skb->tail,
pkt_len);
#endif
+ pci_dma_sync_single_for_device(tp->pdev,
+ tp->rx_buffers[entry].mapping,
+ pkt_len, PCI_DMA_FROMDEVICE);
} else { /* Pass up the skb already on the Rx ring. */
char *temp = skb_put(skb = tp->rx_buffers[entry].skb,
pkt_len);
@@ -354,7 +357,7 @@

static int tulip_rx(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int entry = tp->cur_rx % RX_RING_SIZE;
int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx;
int received = 0;
@@ -412,9 +415,9 @@
&& (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
skb->dev = dev;
skb_reserve(skb, 2); /* 16 byte align the IP header */
- pci_dma_sync_single(tp->pdev,
- tp->rx_buffers[entry].mapping,
- pkt_len, PCI_DMA_FROMDEVICE);
+ pci_dma_sync_single_for_cpu(tp->pdev,
+ tp->rx_buffers[entry].mapping,
+ pkt_len, PCI_DMA_FROMDEVICE);
#if ! defined(__alpha__)
eth_copy_and_sum(skb, tp->rx_buffers[entry].skb->tail,
pkt_len, 0);
@@ -424,6 +427,9 @@
tp->rx_buffers[entry].skb->tail,
pkt_len);
#endif
+ pci_dma_sync_single_for_device(tp->pdev,
+ tp->rx_buffers[entry].mapping,
+ pkt_len, PCI_DMA_FROMDEVICE);
} else { /* Pass up the skb already on the Rx ring. */
char *temp = skb_put(skb = tp->rx_buffers[entry].skb,
pkt_len);
@@ -465,7 +471,7 @@
{
#ifdef __hppa__
int csr12 = inl(dev->base_addr + CSR12) & 0xff;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);

if (csr12 != tp->csr12_shadow) {
/* ack interrupt */
@@ -490,7 +496,7 @@
irqreturn_t tulip_interrupt(int irq, void *dev_instance, struct pt_regs *regs)
{
struct net_device *dev = (struct net_device *)dev_instance;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr5;
int missed;
diff -Nru a/drivers/net/tulip/media.c b/drivers/net/tulip/media.c
--- a/drivers/net/tulip/media.c Mon Mar 15 21:48:00 2004
+++ b/drivers/net/tulip/media.c Mon Mar 15 21:48:00 2004
@@ -48,7 +48,7 @@

int tulip_mdio_read(struct net_device *dev, int phy_id, int location)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int i;
int read_cmd = (0xf6 << 10) | ((phy_id & 0x1f) << 5) | location;
int retval = 0;
@@ -111,7 +111,7 @@

void tulip_mdio_write(struct net_device *dev, int phy_id, int location, int val)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int i;
int cmd = (0x5002 << 16) | ((phy_id & 0x1f) << 23) | (location<<18) | (val & 0xffff);
long ioaddr = dev->base_addr;
@@ -171,7 +171,7 @@
void tulip_select_media(struct net_device *dev, int startup)
{
long ioaddr = dev->base_addr;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
struct mediatable *mtable = tp->mtable;
u32 new_csr6;
int i;
@@ -374,7 +374,7 @@
*/
int tulip_check_duplex(struct net_device *dev)
{
- struct tulip_private *tp = dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
unsigned int bmsr, lpa, negotiated, new_csr6;

bmsr = tulip_mdio_read(dev, tp->phys[0], MII_BMSR);
@@ -420,7 +420,7 @@

void __devinit tulip_find_mii (struct net_device *dev, int board_idx)
{
- struct tulip_private *tp = dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int phyn, phy_idx = 0;
int mii_reg0;
int mii_advert;
diff -Nru a/drivers/net/tulip/pnic.c b/drivers/net/tulip/pnic.c
--- a/drivers/net/tulip/pnic.c Mon Mar 15 21:48:02 2004
+++ b/drivers/net/tulip/pnic.c Mon Mar 15 21:48:02 2004
@@ -20,7 +20,7 @@

void pnic_do_nway(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
u32 phy_reg = inl(ioaddr + 0xB8);
u32 new_csr6 = tp->csr6 & ~0x40C40200;
@@ -53,7 +53,7 @@

void pnic_lnk_change(struct net_device *dev, int csr5)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int phy_reg = inl(ioaddr + 0xB8);

@@ -89,7 +89,7 @@
void pnic_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int next_tick = 60*HZ;

diff -Nru a/drivers/net/tulip/pnic2.c b/drivers/net/tulip/pnic2.c
--- a/drivers/net/tulip/pnic2.c Mon Mar 15 21:48:00 2004
+++ b/drivers/net/tulip/pnic2.c Mon Mar 15 21:48:00 2004
@@ -84,7 +84,7 @@
void pnic2_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int next_tick = 60*HZ;

@@ -100,7 +100,7 @@

void pnic2_start_nway(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr14;
int csr12;
@@ -175,7 +175,7 @@

void pnic2_lnk_change(struct net_device *dev, int csr5)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr14;

diff -Nru a/drivers/net/tulip/timer.c b/drivers/net/tulip/timer.c
--- a/drivers/net/tulip/timer.c Mon Mar 15 21:47:59 2004
+++ b/drivers/net/tulip/timer.c Mon Mar 15 21:47:59 2004
@@ -20,7 +20,7 @@
void tulip_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
u32 csr12 = inl(ioaddr + CSR12);
int next_tick = 2*HZ;
@@ -135,7 +135,7 @@
void mxic_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int next_tick = 60*HZ;

@@ -152,7 +152,7 @@
void comet_timer(unsigned long data)
{
struct net_device *dev = (struct net_device *)data;
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int next_tick = 60*HZ;

diff -Nru a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
--- a/drivers/net/tulip/tulip_core.c Mon Mar 15 21:48:02 2004
+++ b/drivers/net/tulip/tulip_core.c Mon Mar 15 21:48:02 2004
@@ -253,7 +253,7 @@
static struct net_device_stats *tulip_get_stats(struct net_device *dev);
static int private_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
static void set_rx_mode(struct net_device *dev);
-
+static void poll_tulip(struct net_device *dev);


static void tulip_set_power_state (struct tulip_private *tp,
@@ -276,7 +276,7 @@

static void tulip_up(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int next_tick = 3*HZ;
int i;
@@ -499,7 +499,7 @@

static void tulip_tx_timeout(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
unsigned long flags;

@@ -587,7 +587,7 @@
/* Initialize the Rx and Tx rings, along with various 'dev' bits. */
static void tulip_init_ring(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int i;

tp->susp_rx = 0;
@@ -638,7 +638,7 @@
static int
tulip_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int entry;
u32 flag;
dma_addr_t mapping;
@@ -724,7 +724,7 @@
static void tulip_down (struct net_device *dev)
{
long ioaddr = dev->base_addr;
- struct tulip_private *tp = (struct tulip_private *) dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
unsigned long flags;

del_timer_sync (&tp->timer);
@@ -764,7 +764,7 @@
static int tulip_close (struct net_device *dev)
{
long ioaddr = dev->base_addr;
- struct tulip_private *tp = (struct tulip_private *) dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
int i;

netif_stop_queue (dev);
@@ -811,7 +811,7 @@

static struct net_device_stats *tulip_get_stats(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;

if (netif_running(dev)) {
@@ -830,7 +830,7 @@

static int netdev_ethtool_ioctl(struct net_device *dev, void *useraddr)
{
- struct tulip_private *np = dev->priv;
+ struct tulip_private *np = netdev_priv(dev);
u32 ethcmd;

if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd)))
@@ -855,7 +855,7 @@
/* Provide ioctl() calls to examine the MII xcvr state. */
static int private_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)
{
- struct tulip_private *tp = dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
struct mii_ioctl_data *data = (struct mii_ioctl_data *) & rq->ifr_data;
const unsigned int phy_idx = 0;
@@ -964,7 +964,7 @@

static void build_setup_frame_hash(u16 *setup_frm, struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
u16 hash_table[32];
struct dev_mc_list *mclist;
int i;
@@ -995,7 +995,7 @@

static void build_setup_frame_perfect(u16 *setup_frm, struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
struct dev_mc_list *mclist;
int i;
u16 *eaddrs;
@@ -1023,7 +1023,7 @@

static void set_rx_mode(struct net_device *dev)
{
- struct tulip_private *tp = (struct tulip_private *)dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
long ioaddr = dev->base_addr;
int csr6;

@@ -1150,7 +1150,7 @@
static void __devinit tulip_mwi_config (struct pci_dev *pdev,
struct net_device *dev)
{
- struct tulip_private *tp = dev->priv;
+ struct tulip_private *tp = netdev_priv(dev);
u8 cache;
u16 pci_command;
u32 csr0;
@@ -1373,7 +1373,7 @@
* initialize private data structure 'tp'
* it is zeroed and aligned in alloc_etherdev
*/
- tp = dev->priv;
+ tp = netdev_priv(dev);

tp->rx_ring = pci_alloc_consistent(pdev,
sizeof(struct tulip_rx_desc) * RX_RING_SIZE +
@@ -1618,6 +1618,9 @@
dev->get_stats = tulip_get_stats;
dev->do_ioctl = private_ioctl;
dev->set_multicast_list = set_rx_mode;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ dev->poll_controller = &poll_tulip;
+#endif

if (register_netdev(dev))
goto err_out_free_ring;
@@ -1756,7 +1759,7 @@
if (!dev)
return;

- tp = dev->priv;
+ tp = netdev_priv(dev);
pci_free_consistent (pdev,
sizeof (struct tulip_rx_desc) * RX_RING_SIZE +
sizeof (struct tulip_tx_desc) * TX_RING_SIZE,
@@ -1774,6 +1777,22 @@
/* pci_power_off (pdev, -1); */
}

+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+
+static void poll_tulip (struct net_device *dev)
+{
+ /* disable_irq here is not very nice, but with the lockless
+ interrupt handler we have no other choice. */
+ disable_irq(dev->irq);
+ tulip_interrupt (dev->irq, dev, NULL);
+ enable_irq(dev->irq);
+}
+#endif

static struct pci_driver tulip_driver = {
.name = DRV_NAME,


Attachments:
patch (15.21 kB)

2004-03-18 10:42:54

by Mikael Pettersson

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

Jeff Garzik writes:
> er, oops... lemme find the right patch...

No change, still a flood of those tulip_rx() interrupt messages.

2004-03-18 10:48:22

by Jeff Garzik

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

Mikael Pettersson wrote:
> Jeff Garzik writes:
> > er, oops... lemme find the right patch...
>
> No change, still a flood of those tulip_rx() interrupt messages.

hmmm. Well, it is something unrelated to tulip driver, then.

Did you recently change module options, or forget to disable tulip_debug
in modprobe.conf or modules.conf ?

if (tulip_debug > 4)
printk(KERN_DEBUG "%s: exiting interrupt, csr5=%#4.4x.\n",
dev->name, inl(ioaddr + CSR5));

Those messages only appear if a non-default verbosity has been selected.

Jeff



2004-03-18 12:45:20

by Mikael Pettersson

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

Jeff Garzik writes:
> Mikael Pettersson wrote:
> > Jeff Garzik writes:
> > > er, oops... lemme find the right patch...
> >
> > No change, still a flood of those tulip_rx() interrupt messages.
>
> hmmm. Well, it is something unrelated to tulip driver, then.
>
> Did you recently change module options, or forget to disable tulip_debug
> in modprobe.conf or modules.conf ?
>
> if (tulip_debug > 4)
> printk(KERN_DEBUG "%s: exiting interrupt, csr5=%#4.4x.\n",
> dev->name, inl(ioaddr + CSR5));
>
> Those messages only appear if a non-default verbosity has been selected.

I had the same .config and kernel boot parameters as for 2.6.4,
except I disabled modules and everything non-essential, and
didn't apply my private patches.

440BX chipset, no I/O-APIC, no ACPI, no PREEMPT, direct PCI access,
two FA310TXs (eth0 idle, eth1 had light traffic).

2004-03-18 16:30:00

by Mikael Pettersson

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

Jeff Garzik writes:
> Mikael Pettersson wrote:
> > Jeff Garzik writes:
> > > er, oops... lemme find the right patch...
> >
> > No change, still a flood of those tulip_rx() interrupt messages.
>
> hmmm. Well, it is something unrelated to tulip driver, then.

Testing older -bk versions I've found that 2.6.4-bk2
is Ok but 2.6.4-bk3 has this message flood problem.

2004-03-18 16:43:40

by Randy.Dunlap

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

On Thu, 18 Mar 2004 17:29:39 +0100 Mikael Pettersson wrote:

| Jeff Garzik writes:
| > Mikael Pettersson wrote:
| > > Jeff Garzik writes:
| > > > er, oops... lemme find the right patch...
| > >
| > > No change, still a flood of those tulip_rx() interrupt messages.
| >
| > hmmm. Well, it is something unrelated to tulip driver, then.
|
| Testing older -bk versions I've found that 2.6.4-bk2
| is Ok but 2.6.4-bk3 has this message flood problem.

That looks like exactly where the netdev_priv() patch went
in -- the one that Jeff asked you to back out and test again.
So I would have to ask you to verify that backing that patch
out didn't help, while we continue to look in other places
for possible problems...

Thanks,
--
~Randy

2004-03-18 17:12:56

by Randy.Dunlap

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

On Thu, 18 Mar 2004 17:29:39 +0100 Mikael Pettersson wrote:

| Jeff Garzik writes:
| > Mikael Pettersson wrote:
| > > Jeff Garzik writes:
| > > > er, oops... lemme find the right patch...
| > >
| > > No change, still a flood of those tulip_rx() interrupt messages.
| >
| > hmmm. Well, it is something unrelated to tulip driver, then.
|
| Testing older -bk versions I've found that 2.6.4-bk2
| is Ok but 2.6.4-bk3 has this message flood problem.

Other than the netdev_priv() changes, I see removal of
KERNEL_SYSCALLS and the addition of CONFIG_NET_POLL_CONTROLLER.
Are you enabling CONFIG_NET_POLL_CONTROLLER?
If so, can you test with it disabled?

Thanks,
--
~Randy

2004-03-18 17:40:27

by Mikael Pettersson

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

Randy.Dunlap writes:
> On Thu, 18 Mar 2004 17:29:39 +0100 Mikael Pettersson wrote:
>
> | Jeff Garzik writes:
> | > Mikael Pettersson wrote:
> | > > Jeff Garzik writes:
> | > > > er, oops... lemme find the right patch...
> | > >
> | > > No change, still a flood of those tulip_rx() interrupt messages.
> | >
> | > hmmm. Well, it is something unrelated to tulip driver, then.
> |
> | Testing older -bk versions I've found that 2.6.4-bk2
> | is Ok but 2.6.4-bk3 has this message flood problem.
>
> Other than the netdev_priv() changes, I see removal of
> KERNEL_SYSCALLS and the addition of CONFIG_NET_POLL_CONTROLLER.
> Are you enabling CONFIG_NET_POLL_CONTROLLER?
> If so, can you test with it disabled?

No, it wasn't NET_POLL_CONTROLLER (I didn't enable it).

I split the bk2->bk3 patch into pieces and traced the bug to the
netdev_priv() change in loopback.c. The bug is that loopback's
dev is static and its ->priv actually points to kmalloc'd space.
The netdev_priv() transformation is invalid for anything not
allocated with alloc_etherdev or alloc_netdev, so the patch
to loopback caused it to access and clobber static memory just
after its own dev.

It was my bad luck that tulip was the victim of that clobber.

Reverting the patch below solves the problem.

/Mikael

diff -ruN linux-2.6.4-bk2/drivers/net/loopback.c linux-2.6.4-bk3/drivers/net/loopback.c
--- linux-2.6.4-bk2/drivers/net/loopback.c 2004-03-11 14:01:28.000000000 +0100
+++ linux-2.6.4-bk3/drivers/net/loopback.c 2004-03-18 16:12:28.000000000 +0100
@@ -123,7 +123,7 @@
*/
static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
{
- struct net_device_stats *stats = (struct net_device_stats *)dev->priv;
+ struct net_device_stats *stats = netdev_priv(dev);

skb_orphan(skb);

2004-03-18 17:44:48

by Randy.Dunlap

Subject: Re: tulip (pnic) errors in 2.6.5-rc1

On Thu, 18 Mar 2004 18:39:54 +0100 Mikael Pettersson wrote:

| Randy.Dunlap writes:
| > On Thu, 18 Mar 2004 17:29:39 +0100 Mikael Pettersson wrote:
| >
| > | Jeff Garzik writes:
| > | > Mikael Pettersson wrote:
| > | > > Jeff Garzik writes:
| > | > > > er, oops... lemme find the right patch...
| > | > >
| > | > > No change, still a flood of those tulip_rx() interrupt messages.
| > | >
| > | > hmmm. Well, it is something unrelated to tulip driver, then.
| > |
| > | Testing older -bk versions I've found that 2.6.4-bk2
| > | is Ok but 2.6.4-bk3 has this message flood problem.
| >
| > Other than the netdev_priv() changes, I see removal of
| > KERNEL_SYSCALLS and the addition of CONFIG_NET_POLL_CONTROLLER.
| > Are you enabling CONFIG_NET_POLL_CONTROLLER?
| > If so, can you test with it disabled?
|
| No it wasn't NET_POLL_CONTROLLER (I didn't enable it).
|
| I split the bk2->bk3 patch into pieces and traced the bug to the
| netdev_priv() change in loopback.c. The bug is that loopback's
| dev is static and its ->priv actually points to kmalloc'd space.
| The netdev_priv() transformation is invalid for anything not
| allocated with alloc_etherdev or alloc_netdev, so the patch
| to loopback caused it to access and clobber static memory just
| after its own dev.
|
| It was my bad luck that tulip was the victim of that clobber.
|
| Reverting the patch below solves the problem.
|
| /Mikael
|
| diff -ruN linux-2.6.4-bk2/drivers/net/loopback.c linux-2.6.4-bk3/drivers/net/loopback.c
| --- linux-2.6.4-bk2/drivers/net/loopback.c 2004-03-11 14:01:28.000000000 +0100
| +++ linux-2.6.4-bk3/drivers/net/loopback.c 2004-03-18 16:12:28.000000000 +0100
| @@ -123,7 +123,7 @@
| */
| static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
| {
| - struct net_device_stats *stats = (struct net_device_stats *)dev->priv;
| + struct net_device_stats *stats = netdev_priv(dev);
|
| skb_orphan(skb);
|

Thanks for checking, and sorry about the trouble.

BTW, the loopback patch has already been reverted.

--
~Randy