2001-12-04 22:58:16

by Tim Hockin

[permalink] [raw]
Subject: [PATCH] eepro100 - need testers

diff -ruN 2.4.14-orig/drivers/net/eepro100.c 2.4.14-cobalt/drivers/net/eepro100.c
--- 2.4.14-orig/drivers/net/eepro100.c Tue Dec 4 14:30:09 2001
+++ 2.4.14-cobalt/drivers/net/eepro100.c Tue Dec 4 14:03:35 2001
@@ -64,8 +64,8 @@

/* A few values that may be tweaked. */
/* The ring sizes should be a power of two for efficiency. */
-#define TX_RING_SIZE 32
-#define RX_RING_SIZE 32
+#define TX_RING_SIZE 64
+#define RX_RING_SIZE 1024
/* How much slots multicast filter setup may take.
Do not descrease without changing set_rx_mode() implementaion. */
#define TX_MULTICAST_SIZE 2
@@ -1067,6 +1071,50 @@
outw(CUStart | SCBMaskEarlyRx | SCBMaskFlowCtl, ioaddr + SCBCmd);
}

+/*
+ * Sometimes the receiver stops making progress. This routine knows how to
+ * get it going again, without losing packets or being otherwise nasty like
+ * a chip reset would be. Previously the driver had a whole sequence
+ * of if RxSuspended, if it's no buffers do one thing, if it's no resources,
+ * do another, etc. But those things don't really matter. Separate logic
+ * in the ISR provides for allocating buffers--the other half of operation
+ * is just making sure the receiver is active. speedo_rx_soft_reset does that.
+ * This problem with the old, more involved algorithm is shown up under
+ * ping floods on the order of 60K packets/second on a 100Mbps fdx network.
+ */
+static void
+speedo_rx_soft_reset(struct net_device *dev)
+{
+ struct speedo_private *sp = dev->priv;
+ struct RxFD *rfd;
+ long ioaddr;
+
+ ioaddr = dev->base_addr;
+ wait_for_cmd_done(ioaddr + SCBCmd);
+ if (inb(ioaddr + SCBCmd) != 0) {
+ printk("%s: previous command stalled\n", dev->name);
+ return;
+ }
+ /*
+ * Put the hardware into a known state.
+ */
+ outb(RxAbort, ioaddr + SCBCmd);
+
+ rfd = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
+
+ rfd->rx_buf_addr = 0xffffffff;
+
+ wait_for_cmd_done(ioaddr + SCBCmd);
+
+ if (inb(ioaddr + SCBCmd) != 0) {
+ printk("%s: RxAbort command stalled\n", dev->name);
+ return;
+ }
+ outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
+ ioaddr + SCBPointer);
+ outb(RxStart, ioaddr + SCBCmd);
+}
+
/* Media monitoring and control. */
static void speedo_timer(unsigned long data)
{
@@ -1500,82 +1591,37 @@
if ((status & 0xfc00) == 0)
break;

- /* Always check if all rx buffers are allocated. --SAW */
- speedo_refill_rx_buffers(dev, 0);
-
if ((status & 0x5000) || /* Packet received, or Rx error. */
(sp->rx_ring_state&(RrNoMem|RrPostponed)) == RrPostponed)
/* Need to gather the postponed packet. */
speedo_rx(dev);

- if (status & 0x1000) {
- spin_lock(&sp->lock);
- if ((status & 0x003c) == 0x0028) { /* No more Rx buffers. */
- struct RxFD *rxf;
- printk(KERN_WARNING "%s: card reports no RX buffers.\n",
- dev->name);
- rxf = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
- if (rxf == NULL) {
- if (speedo_debug > 2)
- printk(KERN_DEBUG
- "%s: NULL cur_rx in speedo_interrupt().\n",
- dev->name);
- sp->rx_ring_state |= RrNoMem|RrNoResources;
- } else if (rxf == sp->last_rxf) {
- if (speedo_debug > 2)
- printk(KERN_DEBUG
- "%s: cur_rx is last in speedo_interrupt().\n",
- dev->name);
- sp->rx_ring_state |= RrNoMem|RrNoResources;
- } else
- outb(RxResumeNoResources, ioaddr + SCBCmd);
- } else if ((status & 0x003c) == 0x0008) { /* No resources. */
- struct RxFD *rxf;
- printk(KERN_WARNING "%s: card reports no resources.\n",
- dev->name);
- rxf = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
- if (rxf == NULL) {
- if (speedo_debug > 2)
- printk(KERN_DEBUG
- "%s: NULL cur_rx in speedo_interrupt().\n",
- dev->name);
- sp->rx_ring_state |= RrNoMem|RrNoResources;
- } else if (rxf == sp->last_rxf) {
- if (speedo_debug > 2)
- printk(KERN_DEBUG
- "%s: cur_rx is last in speedo_interrupt().\n",
- dev->name);
- sp->rx_ring_state |= RrNoMem|RrNoResources;
- } else {
- /* Restart the receiver. */
- outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
- ioaddr + SCBPointer);
- outb(RxStart, ioaddr + SCBCmd);
- }
- }
- sp->stats.rx_errors++;
- spin_unlock(&sp->lock);
- }
+ /* Always check if all rx buffers are allocated. --SAW */
+ speedo_refill_rx_buffers(dev, 0);

- if ((sp->rx_ring_state&(RrNoMem|RrNoResources)) == RrNoResources) {
- printk(KERN_WARNING
- "%s: restart the receiver after a possible hang.\n",
- dev->name);
- spin_lock(&sp->lock);
- /* Restart the receiver.
- I'm not sure if it's always right to restart the receiver
- here but I don't know another way to prevent receiver hangs.
- 1999/12/25 SAW */
- outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
- ioaddr + SCBPointer);
- outb(RxStart, ioaddr + SCBCmd);
- sp->rx_ring_state &= ~RrNoResources;
- spin_unlock(&sp->lock);
+ spin_lock(&sp->lock);
+ /*
+ * The chip may have suspended reception for various reasons.
+ * Check for that, and re-prime it should this be the case.
+ */
+ switch ((status >> 2) & 0xf) {
+ case 0: /* Idle */
+ break;
+ case 1: /* Suspended */
+ case 2: /* No resources (RxFDs) */
+ case 9: /* Suspended with no more RBDs */
+ case 10: /* No resources due to no RBDs */
+ case 12: /* Ready with no RBDs */
+ speedo_rx_soft_reset(dev);
+ break;
+ case 3: case 5: case 6: case 7: case 8:
+ case 11: case 13: case 14: case 15:
+ /* these are all reserved values */
+ break;
}

/* User interrupt, Command/Tx unit interrupt or CU not active. */
if (status & 0xA400) {
- spin_lock(&sp->lock);
speedo_tx_buffer_gc(dev);
if (sp->tx_full
&& (int)(sp->cur_tx - sp->dirty_tx) < TX_QUEUE_UNFULL) {
@@ -1583,8 +1629,8 @@
sp->tx_full = 0;
netif_wake_queue(dev); /* Attention: under a spinlock. --SAW */
}
- spin_unlock(&sp->lock);
}
+ spin_unlock(&sp->lock);

if (--boguscnt < 0) {
printk(KERN_ERR "%s: Too much work at interrupt, status=0x%4.4x.\n",
@@ -1702,6 +1748,7 @@
int entry = sp->cur_rx % RX_RING_SIZE;
int rx_work_limit = sp->dirty_rx + RX_RING_SIZE - sp->cur_rx;
int alloc_ok = 1;
+ int npkts = 0;

if (speedo_debug > 4)
printk(KERN_DEBUG " In speedo_rx().\n");
@@ -1768,6 +1815,7 @@
memcpy(skb_put(skb, pkt_len), sp->rx_skbuff[entry]->tail,
pkt_len);
#endif
+ npkts++;
} else {
/* Pass up the already-filled skbuff. */
skb = sp->rx_skbuff[entry];
@@ -1778,6 +1826,7 @@
}
sp->rx_skbuff[entry] = NULL;
skb_put(skb, pkt_len);
+ npkts++;
sp->rx_ringp[entry] = NULL;
pci_unmap_single(sp->pdev, sp->rx_ring_dma[entry],
PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_FROMDEVICE);
@@ -1798,7 +1847,8 @@
/* Try hard to refill the recently taken buffers. */
speedo_refill_rx_buffers(dev, 1);

- sp->last_rx_time = jiffies;
+ if (npkts)
+ sp->last_rx_time = jiffies;

return 0;
}


Attachments:
sparker-eepro100.diff (6.77 kB)

2001-12-04 23:17:56

by Edward Muller

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

What are the eepro100 issues? I have two machines with two eepro100's in
each machine.

I am running 2.4.16 on them and I've been testing nbd and the errors
that I have been having MAY be caused by the network going out (as per a
discussion on the ENBD list).

On Tue, 2001-12-04 at 17:57, Tim Hockin wrote:
> This patch was developed here to resolve a number of eepro100 issues we
> were seeing. I'd like to get people to try this on their eepro100 chips and
> beat on it for a while.
>
> volunteers?
>
> Tim
> --
> Tim Hockin
> Systems Software Engineer
> Sun Microsystems, Cobalt Server Appliances
> [email protected]
> ----
>

> diff -ruN 2.4.14-orig/drivers/net/eepro100.c 2.4.14-cobalt/drivers/net/eepro100.c
> --- 2.4.14-orig/drivers/net/eepro100.c Tue Dec 4 14:30:09 2001
> +++ 2.4.14-cobalt/drivers/net/eepro100.c Tue Dec 4 14:03:35 2001
> @@ -64,8 +64,8 @@
>
> /* A few values that may be tweaked. */
> /* The ring sizes should be a power of two for efficiency. */
> -#define TX_RING_SIZE 32
> -#define RX_RING_SIZE 32
> +#define TX_RING_SIZE 64
> +#define RX_RING_SIZE 1024
> /* How much slots multicast filter setup may take.
> Do not descrease without changing set_rx_mode() implementaion. */
> #define TX_MULTICAST_SIZE 2
> @@ -1067,6 +1071,50 @@
> outw(CUStart | SCBMaskEarlyRx | SCBMaskFlowCtl, ioaddr + SCBCmd);
> }
>
> +/*
> + * Sometimes the receiver stops making progress. This routine knows how to
> + * get it going again, without losing packets or being otherwise nasty like
> + * a chip reset would be. Previously the driver had a whole sequence
> + * of if RxSuspended, if it's no buffers do one thing, if it's no resources,
> + * do another, etc. But those things don't really matter. Separate logic
> + * in the ISR provides for allocating buffers--the other half of operation
> + * is just making sure the receiver is active. speedo_rx_soft_reset does that.
> + * This problem with the old, more involved algorithm is shown up under
> + * ping floods on the order of 60K packets/second on a 100Mbps fdx network.
> + */
> +static void
> +speedo_rx_soft_reset(struct net_device *dev)
> +{
> + struct speedo_private *sp = dev->priv;
> + struct RxFD *rfd;
> + long ioaddr;
> +
> + ioaddr = dev->base_addr;
> + wait_for_cmd_done(ioaddr + SCBCmd);
> + if (inb(ioaddr + SCBCmd) != 0) {
> + printk("%s: previous command stalled\n", dev->name);
> + return;
> + }
> + /*
> + * Put the hardware into a known state.
> + */
> + outb(RxAbort, ioaddr + SCBCmd);
> +
> + rfd = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
> +
> + rfd->rx_buf_addr = 0xffffffff;
> +
> + wait_for_cmd_done(ioaddr + SCBCmd);
> +
> + if (inb(ioaddr + SCBCmd) != 0) {
> + printk("%s: RxAbort command stalled\n", dev->name);
> + return;
> + }
> + outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
> + ioaddr + SCBPointer);
> + outb(RxStart, ioaddr + SCBCmd);
> +}
> +
> /* Media monitoring and control. */
> static void speedo_timer(unsigned long data)
> {
> @@ -1500,82 +1591,37 @@
> if ((status & 0xfc00) == 0)
> break;
>
> - /* Always check if all rx buffers are allocated. --SAW */
> - speedo_refill_rx_buffers(dev, 0);
> -
> if ((status & 0x5000) || /* Packet received, or Rx error. */
> (sp->rx_ring_state&(RrNoMem|RrPostponed)) == RrPostponed)
> /* Need to gather the postponed packet. */
> speedo_rx(dev);
>
> - if (status & 0x1000) {
> - spin_lock(&sp->lock);
> - if ((status & 0x003c) == 0x0028) { /* No more Rx buffers. */
> - struct RxFD *rxf;
> - printk(KERN_WARNING "%s: card reports no RX buffers.\n",
> - dev->name);
> - rxf = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
> - if (rxf == NULL) {
> - if (speedo_debug > 2)
> - printk(KERN_DEBUG
> - "%s: NULL cur_rx in speedo_interrupt().\n",
> - dev->name);
> - sp->rx_ring_state |= RrNoMem|RrNoResources;
> - } else if (rxf == sp->last_rxf) {
> - if (speedo_debug > 2)
> - printk(KERN_DEBUG
> - "%s: cur_rx is last in speedo_interrupt().\n",
> - dev->name);
> - sp->rx_ring_state |= RrNoMem|RrNoResources;
> - } else
> - outb(RxResumeNoResources, ioaddr + SCBCmd);
> - } else if ((status & 0x003c) == 0x0008) { /* No resources. */
> - struct RxFD *rxf;
> - printk(KERN_WARNING "%s: card reports no resources.\n",
> - dev->name);
> - rxf = sp->rx_ringp[sp->cur_rx % RX_RING_SIZE];
> - if (rxf == NULL) {
> - if (speedo_debug > 2)
> - printk(KERN_DEBUG
> - "%s: NULL cur_rx in speedo_interrupt().\n",
> - dev->name);
> - sp->rx_ring_state |= RrNoMem|RrNoResources;
> - } else if (rxf == sp->last_rxf) {
> - if (speedo_debug > 2)
> - printk(KERN_DEBUG
> - "%s: cur_rx is last in speedo_interrupt().\n",
> - dev->name);
> - sp->rx_ring_state |= RrNoMem|RrNoResources;
> - } else {
> - /* Restart the receiver. */
> - outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
> - ioaddr + SCBPointer);
> - outb(RxStart, ioaddr + SCBCmd);
> - }
> - }
> - sp->stats.rx_errors++;
> - spin_unlock(&sp->lock);
> - }
> + /* Always check if all rx buffers are allocated. --SAW */
> + speedo_refill_rx_buffers(dev, 0);
>
> - if ((sp->rx_ring_state&(RrNoMem|RrNoResources)) == RrNoResources) {
> - printk(KERN_WARNING
> - "%s: restart the receiver after a possible hang.\n",
> - dev->name);
> - spin_lock(&sp->lock);
> - /* Restart the receiver.
> - I'm not sure if it's always right to restart the receiver
> - here but I don't know another way to prevent receiver hangs.
> - 1999/12/25 SAW */
> - outl(sp->rx_ring_dma[sp->cur_rx % RX_RING_SIZE],
> - ioaddr + SCBPointer);
> - outb(RxStart, ioaddr + SCBCmd);
> - sp->rx_ring_state &= ~RrNoResources;
> - spin_unlock(&sp->lock);
> + spin_lock(&sp->lock);
> + /*
> + * The chip may have suspended reception for various reasons.
> + * Check for that, and re-prime it should this be the case.
> + */
> + switch ((status >> 2) & 0xf) {
> + case 0: /* Idle */
> + break;
> + case 1: /* Suspended */
> + case 2: /* No resources (RxFDs) */
> + case 9: /* Suspended with no more RBDs */
> + case 10: /* No resources due to no RBDs */
> + case 12: /* Ready with no RBDs */
> + speedo_rx_soft_reset(dev);
> + break;
> + case 3: case 5: case 6: case 7: case 8:
> + case 11: case 13: case 14: case 15:
> + /* these are all reserved values */
> + break;
> }
>
> /* User interrupt, Command/Tx unit interrupt or CU not active. */
> if (status & 0xA400) {
> - spin_lock(&sp->lock);
> speedo_tx_buffer_gc(dev);
> if (sp->tx_full
> && (int)(sp->cur_tx - sp->dirty_tx) < TX_QUEUE_UNFULL) {
> @@ -1583,8 +1629,8 @@
> sp->tx_full = 0;
> netif_wake_queue(dev); /* Attention: under a spinlock. --SAW */
> }
> - spin_unlock(&sp->lock);
> }
> + spin_unlock(&sp->lock);
>
> if (--boguscnt < 0) {
> printk(KERN_ERR "%s: Too much work at interrupt, status=0x%4.4x.\n",
> @@ -1702,6 +1748,7 @@
> int entry = sp->cur_rx % RX_RING_SIZE;
> int rx_work_limit = sp->dirty_rx + RX_RING_SIZE - sp->cur_rx;
> int alloc_ok = 1;
> + int npkts = 0;
>
> if (speedo_debug > 4)
> printk(KERN_DEBUG " In speedo_rx().\n");
> @@ -1768,6 +1815,7 @@
> memcpy(skb_put(skb, pkt_len), sp->rx_skbuff[entry]->tail,
> pkt_len);
> #endif
> + npkts++;
> } else {
> /* Pass up the already-filled skbuff. */
> skb = sp->rx_skbuff[entry];
> @@ -1778,6 +1826,7 @@
> }
> sp->rx_skbuff[entry] = NULL;
> skb_put(skb, pkt_len);
> + npkts++;
> sp->rx_ringp[entry] = NULL;
> pci_unmap_single(sp->pdev, sp->rx_ring_dma[entry],
> PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_FROMDEVICE);
> @@ -1798,7 +1847,8 @@
> /* Try hard to refill the recently taken buffers. */
> speedo_refill_rx_buffers(dev, 1);
>
> - sp->last_rx_time = jiffies;
> + if (npkts)
> + sp->last_rx_time = jiffies;
>
> return 0;
> }
--
-------------------------------
Edward Muller
Director of IS

973-715-0230 (cell)
212-487-9064 x115 (NYC)

http://www.learningpatterns.com
-------------------------------

2001-12-05 01:26:38

by Kurt Roeckx

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

On Tue, Dec 04, 2001 at 02:57:35PM -0800, Tim Hockin wrote:
> -#define TX_RING_SIZE 32
> -#define RX_RING_SIZE 32
> +#define TX_RING_SIZE 64
> +#define RX_RING_SIZE 1024

Why do I have the feeling that you're just changing those values
so you get less chance of having the problem? Are there any
other reason why you change this? It might even be a good idea
to test it with lower values.


> - } else if ((status & 0x003c) == 0x0008) { /* No resources. */
> - struct RxFD *rxf;
> - printk(KERN_WARNING "%s: card reports no resources.\n",
> - dev->name);

[...]

> + switch ((status >> 2) & 0xf) {
> + case 0: /* Idle */
> + break;
> + case 1: /* Suspended */
> + case 2: /* No resources (RxFDs) */
> + case 9: /* Suspended with no more RBDs */
> + case 10: /* No resources due to no RBDs */
> + case 12: /* Ready with no RBDs */
> + speedo_rx_soft_reset(dev);
> + break;

You can also argue that you're trying to fix the problem by
hiding it. It would be useful that it still reported the same
error message, so you can see that if it happens again with the
patch that it no longer locks up.


Kurt

2001-12-05 17:09:15

by Steve Parker

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

At 05:26 PM 12/4/2001 , Kurt Roeckx wrote:
>On Tue, Dec 04, 2001 at 02:57:35PM -0800, Tim Hockin wrote:
> > -#define TX_RING_SIZE 32
> > -#define RX_RING_SIZE 32
> > +#define TX_RING_SIZE 64
> > +#define RX_RING_SIZE 1024
>
>Why do I have the feeling that you're just changing those values
>so you get less chance of having the problem? Are there any
>other reason why you change this? It might even be a good idea
>to test it with lower values.

If you test with lower values, I find that the problem happens so often that
bidirectional TCP bulk throughput tests on 100Mbits/sec ethernet are
significantly
lower. As Tim pointed out, the RX ring size is chosen based on being large
enough
to receive steadily and only require the ISR to come by and empty it once
every jiffy.
In order to provide good performance and survivability on maximum packet
rate loads,
it needs to be 1024, although it's moderately good on 512, on my 300MHz K6
system.



> > - } else if ((status & 0x003c) == 0x0008) { /* No
> resources. */
> > - struct RxFD *rxf;
> > - printk(KERN_WARNING "%s: card reports no
> resources.\n",
> > - dev->name);
>
>[...]
>
> > + switch ((status >> 2) & 0xf) {
> > + case 0: /* Idle */
> > + break;
> > + case 1: /* Suspended */
> > + case 2: /* No resources (RxFDs) */
> > + case 9: /* Suspended with no more RBDs */
> > + case 10: /* No resources due to no RBDs */
> > + case 12: /* Ready with no RBDs */
> > + speedo_rx_soft_reset(dev);
> > + break;
>
>You can also argue that you're trying to fix the problem by
>hiding it. It would be useful that it still reported the same
>error message, so you can see that if it happens again with the
>patch that it no longer locks up.

The printk significantly reduces TCP throughput on my tests, it doesn't tell me
about an interesting condition, so I removed it. This state happens any
time the chip receives
more than a ring buffer full before the ISR can empty it, which is
something which
is always possible.

And any way, why would you care to know? Is there something you'ld imagine
doing because you saw the message?


Cheers,

~sparker

2001-12-05 19:37:47

by Mike Fedyk

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

On Wed, Dec 05, 2001 at 08:59:45AM -0800, Steve Parker wrote:
> At 05:26 PM 12/4/2001 , Kurt Roeckx wrote:
> >On Tue, Dec 04, 2001 at 02:57:35PM -0800, Tim Hockin wrote:
> >> -#define TX_RING_SIZE 32
> >> -#define RX_RING_SIZE 32
> >> +#define TX_RING_SIZE 64
> >> +#define RX_RING_SIZE 1024
> >
> >Why do I have the feeling that you're just changing those values
> >so you get less chance of having the problem? Are there any
> >other reason why you change this? It might even be a good idea
> >to test it with lower values.
>
> If you test with lower values, I find that the problem happens so often that
> bidirectional TCP bulk throughput tests on 100Mbits/sec ethernet are
> significantly
> lower. As Tim pointed out, the RX ring size is chosen based on being large
> enough
> to receive steadily and only require the ISR to come by and empty it once
> every jiffy.
> In order to provide good performance and survivability on maximum packet
> rate loads,
> it needs to be 1024, although it's moderately good on 512, on my 300MHz K6
> system.
>

So, if I choose to plug an eepro100 into a pentium 75 (or comperable on
other pci based arch), am I going to get massive RX_RING overflows? If so,
then the ring size should probably be sized based on bogomips...

mf

2001-12-06 23:25:20

by Alan

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

> This patch was developed here to resolve a number of eepro100 issues we
> were seeing. I'd like to get people to try this on their eepro100 chips and
> beat on it for a while.

Works for me. Its the first eepro100 driver that wont choke eventually on
my i810 board and its also the only one that will recover the board after
a soft boot when it had previously started spewing errors

Alan

2001-12-06 23:30:00

by Tim Hockin

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

Alan Cox wrote:

> Works for me. Its the first eepro100 driver that wont choke eventually on
> my i810 board and its also the only one that will recover the board after
> a soft boot when it had previously started spewing errors


woohoo! Glad to know, thanks Alan.

--
Tim Hockin
Systems Software Engineer
Sun Microsystems, Cobalt Server Appliances
[email protected]

2001-12-06 23:37:26

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

Alan Cox wrote:
>
> > This patch was developed here to resolve a number of eepro100 issues we
> > were seeing. I'd like to get people to try this on their eepro100 chips and
> > beat on it for a while.
>
> Works for me. Its the first eepro100 driver that wont choke eventually on
> my i810 board and its also the only one that will recover the board after
> a soft boot when it had previously started spewing errors

This patch got me thinking about net driver ring sizes in general. When
you are talking thousands of packets per second at 100 mbit, a larger
ring size than the average 32-64 seems to make sense too.

--
Jeff Garzik | Only so many songs can be sung
Building 1024 | with two lips, two lungs, and one tongue.
MandrakeSoft | - nomeansno

2001-12-07 01:06:59

by Tim Hockin

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

Jeff Garzik wrote:

> This patch got me thinking about net driver ring sizes in general. When
> you are talking thousands of packets per second at 100 mbit, a larger
> ring size than the average 32-64 seems to make sense too.

Well, the math for teh very worst case is something like:

100,000,000 bits/sec
/8
= 12500000 bytes/sec
/64 bytes/ping
= 195312.5 ping/sec
/100
= 1953 ping/jiffy
rounded to 2048
/2 = 1024 rx buffers per 1/2 jiffie.

1024 means you can withstand a wire-speed storm while interrupting twice
per jiffy.


--
Tim Hockin
Systems Software Engineer
Sun Microsystems, Cobalt Server Appliances
[email protected]

2001-12-07 01:30:53

by Leif Sawyer

[permalink] [raw]
Subject: RE: [PATCH] eepro100 - need testers

Tim Hockin responded to
> Jeff Garzik who wrote:
>
>> This patch got me thinking about net driver ring sizes in
>> general. When you are talking thousands of packets per second
>> at 100 mbit, a larger ring size than the average 32-64 seems
>> to make sense too.
>
> Well, the math for the very worst case is something like:
>
> 100,000,000 bits/sec
> /8 = 12500000 bytes/sec
> /64 bytes/ping = 195312.5 ping/sec
> /100 = 1953 ping/jiffy
> rounded to 2048 /2 = 1024 rx buffers per 1/2 jiffie.
>
> 1024 means you can withstand a wire-speed storm while
> interrupting twice per jiffy.

Given this, and the ever-upward climb in ethernet speed,
what would be the dangers involved in making this a run-time
option?

As soon as we detect the device, we know what it's max speed is,
and we can then build the ring size base on that knowledge.

just some ignorant thoughts..

2001-12-10 03:42:43

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] eepro100 - need testers

Before this patch, I would see out-of-resource messages when I ran
50Mbps+ traffic + bonnie++ on a P-III 550Mhz machine. With this patch,
I see no error messages, and traffic is flowing fine...

So, seems like a winner to me!

PS. I don't currently have any machines available to test the
cmd-timeout issues with the eepro driver and some NICs. Has
anyone tested to see if this patch actually fixes those problems too?

Thanks,
Ben

Tim Hockin wrote:

> This patch was developed here to resolve a number of eepro100 issues we
> were seeing. I'd like to get people to try this on their eepro100 chips and
> beat on it for a while.
>
> volunteers?
>
> Tim


--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear