2003-03-18 20:16:48

by Vladimir B. Savkin

Subject: eepro100+NAPI failure

Hi!

I'm planning to deploy Intel-based fast ethernet NICs on a busy
router so I decided to try NAPI.

I grabbed eepro100-napi-020619.tar.gz from
ftp://robur.slu.se/pub/Linux/net-development/NAPI/
and applied eepro100-napi.patch from the archive.

The patch is pretty old and some hand-merging was required to
patch eepro100.c from 2.4.21-pre5. I'm attaching the resulting
diff to this message.

To test NAPI performance, I blasted a stream of short packets
from another host using pktgen. The bad news is that the receiving host
stopped responding to packets immediately. The effect is 100%
reproducible. The host eventually comes back, though; the black-out can
last from a second to several minutes. Here is what I was able to catch
after 'ethtool -s eth0 msglvl 0xfff':

Mar 18 22:39:18 intermap kernel: eth0: interrupt status=0x4050.
Mar 18 22:39:18 intermap kernel: switching to poll,status=4050
Mar 18 22:39:18 intermap kernel: eth0: exiting interrupt, status=0x4050.
Mar 18 22:39:18 intermap kernel: In speedo_poll().
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap last message repeated 25 times
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap kernel: eth0: interrupt status=0x4050.
Mar 18 22:39:18 intermap kernel: switching to poll,status=4050
Mar 18 22:39:18 intermap kernel: eth0: exiting interrupt, status=0x4050.
Mar 18 22:39:18 intermap kernel: done,received=29
Mar 18 22:39:18 intermap kernel: In speedo_poll().
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap last message repeated 13 times
Mar 18 22:39:18 intermap kernel: not done,received=14
Mar 18 22:39:18 intermap kernel: In speedo_poll().
Mar 18 22:39:18 intermap kernel: In speedo_rx().
Mar 18 22:39:18 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:18 intermap last message repeated 63 times
Mar 18 22:39:18 intermap kernel: No resource,reset
Mar 18 22:39:18 intermap kernel: done,received=64
Mar 18 22:39:19 intermap kernel: eth0: Media control tick, status 0040.
Mar 18 22:39:23 intermap last message repeated 2 times
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x2040.
Mar 18 22:39:23 intermap kernel: switching to poll,status=2040
Mar 18 22:39:23 intermap kernel: scavenge candidate 63 status 400ca000.
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x0040.
Mar 18 22:39:23 intermap kernel: eth0: exiting interrupt, status=0x0040.
Mar 18 22:39:23 intermap kernel: In speedo_poll().
Mar 18 22:39:23 intermap kernel: In speedo_rx().
Mar 18 22:39:23 intermap kernel: No resource,reset
Mar 18 22:39:23 intermap kernel: received==0
Mar 18 22:39:23 intermap kernel: done,received=1
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x4050.
Mar 18 22:39:23 intermap kernel: switching to poll,status=4050
Mar 18 22:39:23 intermap kernel: eth0: exiting interrupt, status=0x4050.
Mar 18 22:39:23 intermap kernel: In speedo_poll().
Mar 18 22:39:23 intermap kernel: In speedo_rx().
Mar 18 22:39:23 intermap kernel: speedo_rx() status 0000a020 len 64.
Mar 18 22:39:23 intermap kernel: done,received=1
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x4050.
Mar 18 22:39:23 intermap kernel: switching to poll,status=4050
Mar 18 22:39:23 intermap kernel: eth0: exiting interrupt, status=0x4050.
Mar 18 22:39:23 intermap kernel: In speedo_poll().
Mar 18 22:39:23 intermap kernel: In speedo_rx().
Mar 18 22:39:23 intermap kernel: speedo_rx() status 0000a020 len 78.
Mar 18 22:39:23 intermap kernel: done,received=1
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0xa050.
Mar 18 22:39:23 intermap kernel: scavenge candidate 0 status 600ca000.
Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x0050.
Mar 18 22:39:23 intermap kernel: eth0: exiting interrupt, status=0x0050.
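
The status= values in the log above can be decoded with a small helper. This is a minimal sketch, assuming the SCB status word layout used by the stock eepro100 driver: interrupt cause bits in the high byte, command unit state in bits 7:6, receive unit state in bits 5:2. The bit names and state tables are assumptions based on that layout, not something the log itself states, and decode_scb_status is a hypothetical helper name.

```python
# Assumed interrupt cause bits in the high byte of the SCB status word.
IRQ_BITS = [
    (0x8000, "CX"),   # command with interrupt bit completed
    (0x4000, "FR"),   # frame received
    (0x2000, "CNA"),  # command unit left the active state
    (0x1000, "RNR"),  # receive unit left the ready state
    (0x0800, "MDI"),  # MDI read/write finished
    (0x0400, "SWI"),  # software-generated interrupt
]

# Assumed unit state encodings in the low byte.
CU_STATES = {0: "idle", 1: "suspended"}
RU_STATES = {0x0: "idle", 0x1: "suspended", 0x2: "no resources", 0x4: "ready"}


def decode_scb_status(status):
    """Split a 16-bit status word into (interrupt causes, CU state, RU state)."""
    causes = [name for bit, name in IRQ_BITS if status & bit]
    cu = CU_STATES.get((status >> 6) & 0x3, "active")
    ru_raw = (status >> 2) & 0xF
    ru = RU_STATES.get(ru_raw, "other (0x%x)" % ru_raw)
    return causes, cu, ru


if __name__ == "__main__":
    # Status values that appear in the log excerpt.
    for value in (0x4050, 0x2040, 0x0040, 0xA050):
        print("0x%04x" % value, decode_scb_status(value))
```

Under these assumptions, 0x4050 reads as "frame received, receive unit ready", while 0x0040 reads as "no pending cause, receive unit idle".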

Notice the three "eth0: Media control tick, status 0040." messages in a
row: this is precisely the black-out period. During normal activity,
it prints "eth0: Media control tick, status 0050." every 2 seconds.
Moreover, the "No resource,reset" messages above are the only ones that
were captured during the test run.
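
The same check can be scripted for longer captures. The following is a rough sketch, assuming syslog-style lines like the ones above, including the "last message repeated N times" compression. The function name summarize and the sample list are made up for illustration.

```python
import re
from collections import Counter

TICK_RE = re.compile(r"Media control tick, status ([0-9a-fA-F]{4})")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def summarize(lines):
    """Count media-tick status values and "No resource,reset" events.

    A "last message repeated N times" line adds N to whatever the
    immediately preceding message was counted as.
    """
    ticks = Counter()
    resets = 0
    last = None  # ("tick", status), ("reset", None) or None
    for line in lines:
        rep = REPEAT_RE.search(line)
        if rep:
            n = int(rep.group(1))
            if last and last[0] == "tick":
                ticks[last[1]] += n
            elif last and last[0] == "reset":
                resets += n
            continue
        tick = TICK_RE.search(line)
        if tick:
            last = ("tick", tick.group(1).lower())
            ticks[last[1]] += 1
        elif "No resource,reset" in line:
            last = ("reset", None)
            resets += 1
        else:
            last = None
    return ticks, resets


# A few lines copied from the log excerpt in this message.
sample = [
    "Mar 18 22:39:18 intermap kernel: No resource,reset",
    "Mar 18 22:39:18 intermap kernel: done,received=64",
    "Mar 18 22:39:19 intermap kernel: eth0: Media control tick, status 0040.",
    "Mar 18 22:39:23 intermap last message repeated 2 times",
    "Mar 18 22:39:23 intermap kernel: eth0: interrupt status=0x2040.",
]

if __name__ == "__main__":
    ticks, resets = summarize(sample)
    print("tick statuses:", dict(ticks))
    print("No resource,reset events:", resets)
```

On a full capture, a run of ticks with status 0040 where 0050 is normally expected would mark the black-out window.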

This is lspci info about this NIC:

00:0f.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at d5500000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at c800 [size=64]
        Region 2: Memory at d5400000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at <unassigned> [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

Can anyone help me make NAPI work? Does anyone even use NAPI
with eepro100? I guess not many people do, since the patch is pretty old
and I could not find it ported to 2.4.21-pre.


:wq
With best regards,
Vladimir Savkin.


Attachments:
(No filename) (5.68 kB)
eepro100-napi-for2.4.21-pre5.diff (10.48 kB)

2003-03-18 20:43:52

by Martin Josefsson

Subject: Re: eepro100+NAPI failure

On Tue, 2003-03-18 at 21:27, Vladimir B. Savkin wrote:
> Hi!
>
> I'm planning to deploy Intel-based fast ethernet NICs on a busy
> router so I decided to try NAPI.

> To test NAPI performance, I blasted a stream of short packets
> from another host using pktgen. The bad news is that the receiving host
> stopped responding to packets immediately. The effect is 100%
> reproducible. The host eventually comes back, though; the black-out can
> last from a second to several minutes. Here is what I was able to catch
> after 'ethtool -s eth0 msglvl 0xfff':

I saw the same problem when I tested it a while back. I didn't have time
to investigate, so my routers are running without NAPI right now :(

> Can anyone help me make NAPI work? Does anyone even use NAPI
> with eepro100? I guess not many people do, since the patch is pretty old
> and I could not find it ported to 2.4.21-pre.

I haven't heard of anyone using it. My understanding is that the receive path
in the eepro100 chip can be quite fragile and has to be treated right or
it'll hang... maybe the NAPI patch changes things too much...

Anyway, please let me know if you manage to get it working.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

2003-03-19 13:32:46

by Vladimir B. Savkin

Subject: Re: eepro100+NAPI failure

On Tue, Mar 18, 2003 at 09:54:44PM +0100, Martin Josefsson wrote:
> > Can anyone help me make NAPI work? Does anyone even use NAPI
> > with eepro100? I guess not many people do, since the patch is pretty old
> > and I could not find it ported to 2.4.21-pre.
>
> I haven't heard of anyone using it. My understanding is that the receive path
> in the eepro100 chip can be quite fragile and has to be treated right or
> it'll hang... maybe the NAPI patch changes things too much...
>
> Anyway, please let me know if you manage to get it working.

It seems to work with this one:

02:03.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 02)
        Subsystem: IBM 82558B Ethernet Pro 10/100
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max)
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at e0000000 (32-bit, prefetchable) [size=4K]
        Region 1: I/O ports at a400 [size=32]
        Region 2: Memory at df000000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at <unassigned> [disabled] [size=1M]


No problem with more than 10^7 packets.
It just drops packets under heavy load, without live-locking,
so NAPI kinda works :)

Unfortunately, I could not get this NIC to work with oversized frames
to implement 802.1q, with either the eepro100 or the e100 driver :(

:wq
With best regards,
Vladimir Savkin.

2003-03-19 18:11:01

by Ion Badulescu

Subject: Re: eepro100+NAPI failure

On Wed, 19 Mar 2003 16:43:41 +0300, Vladimir B. Savkin wrote:

> It seems to work with this one:
>
> 02:03.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 02)
[...]
> Unfortunately, I could not get this NIC to work with oversized frames
> to implement 802.1q, with either the eepro100 or the e100 driver :(

Indeed, the 82557 does not support frames larger than 1500 bytes -- and
you'd need an extra 4 bytes for the vlan tag.

Only the 82558+ (PCI rev 4 or higher) supports huge frames.
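
To put numbers on this, here is a small sketch of the frame-size arithmetic. It assumes the 1500-byte limit refers to the payload (MTU), with the standard 14-byte Ethernet header and 4-byte FCS added on the wire. The revision threshold is the rule of thumb stated above, and the helper names are invented for this example.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
ETH_FCS = 4      # frame check sequence
VLAN_TAG = 4     # 802.1Q tag inserted after the source MAC
MTU = 1500       # standard Ethernet payload limit


def frame_len(payload=MTU, tagged=False):
    """On-wire frame length in bytes, without preamble."""
    return ETH_HEADER + (VLAN_TAG if tagged else 0) + payload + ETH_FCS


def supports_oversized_frames(pci_rev):
    """Rule of thumb from this thread: PCI revision 4 or higher."""
    return pci_rev >= 4


if __name__ == "__main__":
    print("untagged full-size frame:", frame_len())
    print("802.1Q tagged full-size frame:", frame_len(tagged=True))
    # lspci revisions of the two NICs in this thread (lspci prints hex).
    for rev in (0x08, 0x02):
        print("rev %02x oversized frames:" % rev, supports_oversized_frames(rev))
```

A full-size tagged frame is 1522 bytes against 1518 untagged, which is the extra 4 bytes the rev 02 card cannot take.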

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.