Hello,
I've updated the eepro100 driver for the 2.4 kernel branch.
So far, the most annoying initialization problem (which shows up as "card
reports no resources" messages) hasn't been fixed.
The driver is available at
ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.4/eepro100.c
The main changes are:
- fixes for 64-bit architectures (rx_copybreak, additional cpu_to_le32,
PCI_DMA_BIDIRECTIONAL for RX descriptors)
- a couple of timing fixes
- a lot of code cleanup, minor fixes.
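For context on the cpu_to_le32 item: the 8255x reads its in-memory descriptors in little-endian byte order, so on a big-endian host every value the driver writes into a descriptor has to be byte-swapped, which is what cpu_to_le32() does inside the kernel. Below is a minimal user-space sketch of the same conversion; store_le32() and load_le32() are hypothetical names for illustration only, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): store a 32-bit value in
 * little-endian byte order regardless of host endianness. Inside the
 * driver, cpu_to_le32() serves this purpose for descriptor fields. */
static void store_le32(uint8_t *dst, uint32_t val)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(val & 0xffu);          /* least significant byte first */
    dst[1] = (uint8_t)((val >> 8) & 0xffu);
    dst[2] = (uint8_t)((val >> 16) & 0xffu);
    dst[3] = (uint8_t)((val >> 24) & 0xffu);
}

/* Matching load, the user-space counterpart of le32_to_cpu(). */
static uint32_t load_le32(const uint8_t *src)
{
    assert(src != NULL);
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}
```

store_le32(buf, 0x12345678) leaves the bytes 78 56 34 12 in buf on any host; writing the bytes explicitly avoids depending on the host's byte order at all.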
See ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.4/eepro100.changelog
for a detailed log.
Best regards
Andrey
Andrey Savochkin wrote:
> I've updated the eepro100 driver for the 2.4 kernel branch.
> So far, the most annoying initialization problem (which shows up as "card
> reports no resources" messages) hasn't been fixed.
Hi Andrey,
I've been using an older EEPro100/B card until now and it's been working without any
problems ever since the transmitter bugs were fixed. The boot output looked like this:
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.35 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:A0:C9:41:F4:DE, IRQ 9.
Board assembly 667280-003, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x49caa8d6).
Receiver lock-up workaround activated.
Intel's web page says 667280-xxx is an EEPro100/B model, but I dunno which chipset.
Today I installed a new model with Wake-on-LAN support and ran into the
above-mentioned
eth0: card reports no RX buffers.
eth0: card reports no resources.
messages as well. Strangely, those messages only ever appear during bootup,
and *every* time. Shutting eth0 down and bringing it back up fixes the problem.
What puzzles me a bit is that the newer card (721383-xxx) is an 82559 chip,
according to the Intel site, but the boot output doesn't say so:
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.35 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:02:B3:1F:BA:5D, IRQ 9.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
If you have any patches or tests that would help find and fix this init
bug, I'd be glad to test them, since I can reliably reproduce the problem.
Regards,
Udo.
Hello,
On Thu, Nov 30, 2000 at 07:41:11PM +0100, Udo A. Steinberg wrote:
> I've been using an older EEPro100/B card until now and it's been working without any
> problems ever since the transmitter bugs were fixed. The boot output looked like this:
[snip]
> Today I installed a new model with Wake-on-LAN support and ran into the
> above-mentioned
>
> eth0: card reports no RX buffers.
> eth0: card reports no resources.
>
> messages as well. Strangely, those messages only ever appear during bootup,
> and *every* time. Shutting eth0 down and bringing it back up fixes the problem.
It's a known issue.
I've been promised that this issue would be looked up in Intel's errata by
people who have access to it, but I haven't got the results yet.
> What puzzles me a bit is that the newer card (721383-xxx) is an 82559 chip,
> according to the Intel site, but the boot output doesn't say so:
[snip]
The card itself doesn't report its revision in detail.
It can be checked with `lspci'.
Rev 8 is the 82559, if I remember, and rev 9 is the 82559ER.
> If you have any patches or tests that would help find and fix this init
> bug, I'd be glad to test them, since I can reliably reproduce the problem.
Sorry, no patches so far...
I can only suggest workarounds that reduce the likelihood of the failures.
Best regards
Andrey
Andrey Savochkin wrote:
>
> > eth0: card reports no RX buffers.
> > eth0: card reports no resources.
> It's a known issue.
> I've been promised that this issue would be looked up in Intel's errata by
> people who have access to it, but I haven't got the results yet.
I just figured out something interesting. Apparently there's a small timing
problem with setting up the NIC: if I put a `sleep 1' between setting up
the interface and setting up the gateway route, everything works pretty well.
So things now look like this:
/sbin/ifconfig lo 127.0.0.1
/sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
/sbin/ifconfig eth0 a.b.c.d broadcast x.y.225.255 netmask 255.255.255.0
/sbin/ifconfig eth0:0 a.b.c.d broadcast 172.16.255.255 netmask 255.255.0.0
sleep 1 # This does the trick
/sbin/route add default gw a.b.c.d netmask 0.0.0.0 metric 1
> The card itself doesn't report its revision in detail.
> It can be checked with `lspci'.
> Rev 8 is the 82559, if I remember, and rev 9 is the 82559ER.
http://support.intel.com/support/network/adapter/pro100/21397.htm
has a list of Board-Assembly IDs and the corresponding chip revisions.
Regards,
Udo.
On Fri, 1 Dec 2000 17:51:09 +0800, Andrey Savochkin <[email protected]> wrote:
> I've been promised that this issue would be looked up in Intel's errata by
> people who have access to it, but I haven't got the results yet.
There is nothing relevant in the errata, unfortunately...
> The card itself doesn't report its revision in detail.
> It can be checked with `lspci'.
> Rev 8 is the 82559, if I remember, and rev 9 is the 82559ER.
No, 82559ER has its own PCI id, 0x1209. There is also a newer 82559 chip
which reports a different PCI device id, 0x1030 (I have one of those).
For the old chips reporting 0x1229, revisions 1-3 are 82557, revisions
4-5 are 82558 and revisions 6-8 are 82559.
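The mapping above can be written down as a small lookup. This is only a sketch: it assumes the Intel vendor ID (0x8086) has already been checked, it covers nothing beyond the IDs and revisions quoted in this thread (including rev 9 under 0x1229, which is later identified in this thread as an 82559ER A-step), and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum i8255x_chip {
    CHIP_UNKNOWN,
    CHIP_82557,
    CHIP_82558,
    CHIP_82559,
    CHIP_82559ER,
};

/* Hypothetical classifier: PCI device ID + revision -> chip family,
 * using only the values quoted in this thread. */
static enum i8255x_chip i8255x_classify(uint16_t device_id, uint8_t rev)
{
    switch (device_id) {
    case 0x1209:                 /* 82559ER with its own device ID */
        return CHIP_82559ER;
    case 0x1030:                 /* newer 82559 with a different device ID */
        return CHIP_82559;
    case 0x1229:                 /* older parts: the revision tells them apart */
        if (rev >= 1 && rev <= 3)
            return CHIP_82557;
        if (rev >= 4 && rev <= 5)
            return CHIP_82558;
        if (rev >= 6 && rev <= 8)
            return CHIP_82559;
        if (rev == 9)            /* 82559ER A-step, before it got 0x1209 */
            return CHIP_82559ER;
        return CHIP_UNKNOWN;
    default:
        return CHIP_UNKNOWN;
    }
}

static const char *i8255x_chip_name(enum i8255x_chip chip)
{
    static const char *const names[] = {
        "unknown", "82557", "82558", "82559", "82559ER",
    };
    assert((unsigned)chip < sizeof(names) / sizeof(names[0]));
    return names[chip];
}
```

With the `lspci -n' values posted elsewhere in this thread, i8255x_classify(0x1229, 8) gives CHIP_82559 and i8255x_classify(0x1229, 4) gives CHIP_82558.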
Ion
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
Hello,
On Fri, Dec 01, 2000 at 01:45:24PM -0800, Ion Badulescu wrote:
> On Fri, 1 Dec 2000 17:51:09 +0800, Andrey Savochkin <[email protected]> wrote:
>
> > I've been promised that this issue would be looked up in Intel's errata by
> > people who have access to it, but I haven't got the results yet.
>
> There is nothing relevant in the errata, unfortunately...
Do you have it?
The symptoms are that the card triggers a Flow Control Pause condition (and
interrupt) in the last stages of initialization or right after.
And it happens with flow control explicitly turned off.
High network load considerably increases the chances of the event.
After that the card stops behaving sanely and reports status 0x7048.
It may be that we don't understand something in the initialization
sequence, or it may just be a plain hardware bug.
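To make the 0x7048 value easier to read, here is a hedged decoder for the SCB status word. The bit layout it assumes (interrupt/ack bits in the upper byte with FCP at bit 8, CU status in bits 7:6, RU status in bits 5:2) is my reading of Intel's 8255x documentation, and the macro, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper byte of the SCB status word: interrupt/ack bits (assumed layout). */
#define SCB_ST_CX   0x8000  /* command unit finished a command */
#define SCB_ST_FR   0x4000  /* frame received */
#define SCB_ST_CNA  0x2000  /* command unit left the active state */
#define SCB_ST_RNR  0x1000  /* receive unit not ready */
#define SCB_ST_MDI  0x0800  /* MDI read/write cycle done */
#define SCB_ST_SWI  0x0400  /* software interrupt */
#define SCB_ST_FCP  0x0100  /* flow control pause (82558 and later) */

struct scb_status_info {
    int frame_received;
    int cu_not_active;
    int rx_not_ready;
    int flow_ctrl_pause;
    unsigned cu_state;  /* bits 7:6: 0 idle, 1 suspended, 2/3 active */
    unsigned ru_state;  /* bits 5:2: 0 idle, 1 suspended, 2 no resources, 4 ready */
};

/* Split a raw status word into the fields discussed in this thread. */
static void scb_decode(uint16_t status, struct scb_status_info *out)
{
    assert(out != NULL);
    out->frame_received  = (status & SCB_ST_FR) != 0;
    out->cu_not_active   = (status & SCB_ST_CNA) != 0;
    out->rx_not_ready    = (status & SCB_ST_RNR) != 0;
    out->flow_ctrl_pause = (status & SCB_ST_FCP) != 0;
    out->cu_state = (status >> 6) & 0x3u;
    out->ru_state = (status >> 2) & 0xfu;
}
```

Decoding 0x7048 this way gives FR, CNA and RNR pending, the CU suspended and the RU in the "no resources" state, which matches the "card reports no resources" message; the FCP bit itself is not set in this particular snapshot.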
> > The card itself doesn't report its revision in detail.
> > It can be checked with `lspci'.
> > Rev 8 is the 82559, if I remember, and rev 9 is the 82559ER.
>
> No, 82559ER has its own PCI id, 0x1209. There is also a newer 82559 chip
> which reports a different PCI device id, 0x1030 (I have one of those).
Yes, you're right.
> For the old chips reporting 0x1229, revisions 1-3 are 82557, revisions
> 4-5 are 82558 and revisions 6-8 are 82559.
Best regards
Andrey V. Savochkin
On Mon, 4 Dec 2000, Andrey Savochkin wrote:
> > There is nothing relevant in the errata, unfortunately...
>
> Do you have it?
I have the manual in the office, so I can look at it again in a couple of
days. I've used it to hack on the BSDI driver...
> The symptoms are that the card triggers a Flow Control Pause condition (and
> interrupt) in the last stages of initialization or right after.
> And it happens with flow control explicitly turned off.
> High network load considerably increases the chances of the event.
> After that the card stops behaving sanely and reports status 0x7048.
Cool, I'll try to go over the driver init sequence by the end of the
weekend and let you know if I see anything wrong.
> It may be that we don't understand something in the initialization
> sequence, or it may just be a plain hardware bug.
Do you know if only one specific chip revision exhibits this problem? It
would really help track down the problem. If I remember correctly, 82557
doesn't have flow control at all, and 82558/9 have different
implementations -- one is proprietary (82558) and one is standard (82559).
Thanks,
Ion
Ion Badulescu wrote:
>
> Do you know if only one specific chip revision exhibits this problem? It
> would really help track down the problem. If I remember correctly, 82557
> doesn't have flow control at all, and 82558/9 have different
> implementations -- one is proprietary (82558) and one is standard (82559).
82559 has this problem for sure.
-Udo.
Hello,
On Tue, Dec 05, 2000 at 11:13:27AM -0800, Ion Badulescu wrote:
> On Mon, 4 Dec 2000, Andrey Savochkin wrote:
>
> > > There is nothing relevant in the errata, unfortunately...
> >
> > Do you have it?
>
> I have the manual in the office, so I can look at it again in a couple of
> days. I've used it to hack on the BSDI driver...
Fine!
> > The symptoms are that the card triggers a Flow Control Pause condition (and
> > interrupt) in the last stages of initialization or right after.
> > And it happens with flow control explicitly turned off.
> > High network load considerably increases the chances of the event.
> > After that the card stops behaving sanely and reports status 0x7048.
>
> Cool, I'll try to go over the driver init sequence by the end of the
> weekend and let you know if I see anything wrong.
Maybe there is a mandatory delay missing somewhere...
> > It may be that we don't understand something in the initialization
> > sequence, or it may just be a plain hardware bug.
>
> Do you know if only one specific chip revision exhibits this problem? It
> would really help track down the problem. If I remember correctly, 82557
> doesn't have flow control at all, and 82558/9 have different
> implementations -- one is proprietary (82558) and one is standard (82559).
I personally have seen it with 82559ER only.
But there have been some reports about 82559, too.
Best regards
Andrey
On Wed, 6 Dec 2000, Andrey Savochkin wrote:
> > > The symptoms are that the card triggers a Flow Control Pause condition (and
> > > interrupt) in the last stages of initialization or right after.
> > > And it happens with flow control explicitly turned off.
> > > High network load considerably increases the chances of the event.
> > > After that the card stops behaving sanely and reports status 0x7048.
> >
> > Cool, I'll try to go over the driver init sequence by the end of the
> > weekend and let you know if I see anything wrong.
>
> Maybe there is a mandatory delay missing somewhere...
Or it may be something else. The manual states that one of the differences
between 82558 and 82559 is that the latter defaults to advertising its
flow-control capability through NWay, whereas the former does not. Both
*do* support flow-control, though.
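For reference, the flow-control capability being advertised through NWay is bit 10 (0x0400) of the MII auto-negotiation advertisement register (register 4) defined by IEEE 802.3. Below is a minimal sketch of clearing it with a read-modify-write. It runs against a fake register file instead of real hardware: fake_phy, phy_read(), phy_write() and disable_pause_advert() are hypothetical stand-ins for the driver's mdio_read()/mdio_write(), and the sketch assumes the PHY actually lets software clear that bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANAR_REG    4       /* auto-negotiation advertisement register */
#define ANAR_PAUSE  0x0400  /* bit 10: PAUSE (flow control) capability */

/* Hypothetical stand-in for a PHY's 32 MII registers. */
struct fake_phy {
    uint16_t regs[32];
};

static uint16_t phy_read(struct fake_phy *phy, unsigned reg)
{
    assert(phy != NULL && reg < 32);
    return phy->regs[reg];
}

static void phy_write(struct fake_phy *phy, unsigned reg, uint16_t val)
{
    assert(phy != NULL && reg < 32);
    phy->regs[reg] = val;
}

/* Read register 4, drop the PAUSE bit, write it back to register 4.
 * The write must name the register explicitly, just like the read. */
static uint16_t disable_pause_advert(struct fake_phy *phy)
{
    uint16_t adv = phy_read(phy, ANAR_REG);
    adv &= (uint16_t)~ANAR_PAUSE;
    phy_write(phy, ANAR_REG, adv);
    return adv;
}
```

Starting from an advertisement value of 0x05e1 (10/100 half and full duplex plus PAUSE), the function writes back 0x01e1 and leaves every other register untouched.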
> > Do you know if only one specific chip revision exhibits this problem? It
> > would really help track down the problem. If I remember correctly, 82557
> > doesn't have flow control at all, and 82558/9 have different
> > implementations -- one is proprietary (82558) and one is standard (82559).
>
> I personally have seen it with 82559ER only.
> But there have been some reports about 82559, too.
The fact that apparently only the people using 82559 chips are seeing this
seems to confirm my analysis above.
If you could try the attached patch (and maybe pass it on to the other
people who are experiencing this problem), that would be great.
There is some other problem I haven't yet tracked down: if an 82559 is set
to autonegotiate and it successfully gets 100FDX, detaching and then
re-attaching the cable leaves the switch believing that the connection is
100HDX -- which obviously doesn't work. I solved this problem in the BSDI
driver by monitoring the link status and forcing a renegotiation whenever
it comes back up, but there should be an easier way to do this.
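A rough sketch of the link-monitoring workaround described above, using the standard MII bits (BMSR link status, BMCR auto-negotiation enable and restart). The link_watch struct and link_watch_poll() are hypothetical, the actual register reads and writes are left to the caller, and this is not the BSDI driver's code. One caveat: the BMSR link bit latches low, so a real driver would typically read BMSR twice and use the second value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BMCR_ANENABLE   0x1000  /* register 0, bit 12: auto-negotiation enable */
#define BMCR_ANRESTART  0x0200  /* register 0, bit 9: restart auto-negotiation */
#define BMSR_LSTATUS    0x0004  /* register 1, bit 2: link status */

struct link_watch {
    bool was_up;    /* link state seen at the previous poll */
};

/* Called periodically (e.g. from a driver timer) with the current BMSR
 * and BMCR values. Returns the BMCR value to write back, or 0 if nothing
 * needs to be done. On a down->up transition it asks the PHY to
 * renegotiate, so the link partner relearns speed and duplex. */
static uint16_t link_watch_poll(struct link_watch *w, uint16_t bmsr,
                                uint16_t bmcr)
{
    bool up;
    uint16_t action = 0;

    assert(w != NULL);
    up = (bmsr & BMSR_LSTATUS) != 0;
    if (up && !w->was_up)
        action = (uint16_t)(bmcr | BMCR_ANENABLE | BMCR_ANRESTART);
    w->was_up = up;
    return action;
}
```

Initialising was_up from the first BMSR read avoids a spurious renegotiation at startup; after that, only a down-to-up transition produces a BMCR value to write back.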
BTW, you were right: rev 9 is an 82559ER A-step. They changed it
afterwards to have its own PCI id.
Thanks,
Ion
--------------------------
--- linux-2.4/drivers/net/eepro100.c.old Fri Dec 8 11:22:18 2000
+++ linux-2.4/drivers/net/eepro100.c Fri Dec 8 11:29:28 2000
@@ -965,8 +965,12 @@
sp->flow_ctrl = sp->partner = 0;
sp->rx_mode = -1; /* Invalid -> always reset the mode. */
set_rx_mode(dev);
- if ((sp->phy[0] & 0x8000) == 0)
+ if ((sp->phy[0] & 0x8000) == 0) {
sp->advertising = mdio_read(ioaddr, sp->phy[0] & 0x1f, 4);
+ /* disable advertising the flow-control capability */
+ sp->advertising &= ~0x0400;
+ mdio_write(ioaddr, sp->phy[0] & 0x1f, sp->advertising);
+ }
if (speedo_debug > 2) {
printk(KERN_DEBUG "%s: Done speedo_open(), status %8.8x.\n",
@@ -1249,6 +1253,8 @@
#else
mdio_read(ioaddr, phy_addr, 0);
mdio_write(ioaddr, phy_addr, 0, mii_bmcr);
+ /* disable advertising the flow control capability */
+ advertising &= ~0x0400;
mdio_write(ioaddr, phy_addr, 4, advertising);
#endif
}
Ion Badulescu wrote:
> The fact that apparently only the people using 82559 chips are seeing this
> seems to confirm my analysis above.
>
> If you could try the attached patch (and maybe pass it on to the other
> people who are experiencing this problem), that would be great.
> + /* disable advertising the flow-control capability */
> + sp->advertising &= ~0x0400;
> + mdio_write(ioaddr, sp->phy[0] & 0x1f, sp->advertising);
missing a 4 (the register number) before sp->advertising here?
I've tried the patch, putting a 4 in the place noted above. It doesn't
help with the issue at all. Also interesting: if I pull the network plug,
my kernel hangs during bootup, either around starting syslogd/klogd or
around setting up the NIC (I haven't quite figured out which), and it
continues when I plug the cable back in.
-Udo.
On Fri, 8 Dec 2000, Udo A. Steinberg wrote:
> > + /* disable advertising the flow-control capability */
> > + sp->advertising &= ~0x0400;
> > + mdio_write(ioaddr, sp->phy[0] & 0x1f, sp->advertising);
>
> missing a 4 (the register number) before sp->advertising here?
Yes, sorry about that.
> I've tried the patch putting a 4 in the place noted above. It doesn't
> help with the issue at all.
Ok. Can you send me the entire dump? Also, it would be helpful if you
could try to determine when exactly it happens (upon insmod, upon ifconfig
up, or upon receiving some packets later).
> Also interesting: if I pull the network plug,
> my kernel hangs during bootup, either around starting syslogd/klogd or
> around setting up the NIC (I haven't quite figured out which), and it
> continues when I plug the cable back in.
Stupid question: are you sure this is not due to the DNS server being
unreachable?...
Thanks,
Ion
Ion Badulescu wrote:
>
> Ok. Can you send me the entire dump? Also, it would be helpful if you
> could try to determine when exactly it happens (upon insmod, upon ifconfig
> up, or upon receiving some packets later).
I have the eepro driver compiled into a monolithic kernel. After rebooting
a couple dozen times trying to find a pattern, I can't see one, except for
one fact worth noting.
As long as the network cable is pulled, everything's groovy. The kernel boots
nicely without triggering it, ifconfig doesn't trigger it, route doesn't
trigger it, but plugging the cable in triggers it immediately as soon as
packets start flowing.
* put cable in *
eth0: card reports no RX buffers.
eth0: card reports no resources.
eth0: card reports no RX buffers.
eth0: card reports no resources.
:> ifconfig eth0 down
eth0: 0 multicast blocks dropped.
Is it worth analyzing packet traffic and comparing with the timestamps in
syslog to see what kind of packet triggers it, or whether any packet
triggers it?
> Stupid question: are you sure this is not due to the DNS server being
> unreachable?...
Maybe it's due to the issues with the NIC. All interesting hosts are now
in /etc/hosts, so we'll see.
Other stuff that might be of interest:
PCI Info:
Bus 0, device 13, function 0:
Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 8).
IRQ 9.
Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xd4800000 [0xd4800fff].
I/O at 0x9800 [0x983f].
Non-prefetchable 32 bit memory at 0xd4000000 [0xd40fffff].
Card shares IRQ 9 with 2 other devices:
irq 9: 620 acpi, bttv, eth0
Need any other info?
-Udo.
> * put cable in *
>
> eth0: card reports no RX buffers.
> eth0: card reports no resources.
> eth0: card reports no RX buffers.
> eth0: card reports no resources.
you know, this might be entirely unrelated, but i had the exact same type of
problem with a brand new machine running a not-so-brand new EE100 nic. i
couldn't figure out what was wrong, since it was a literal replacement for an
earlier machine with the same general setup (except that one was a pentium-90, this
was a celeron-500-something) ... and in the p-90, that network card never gave
a hiccup.
the only way i could get it to stop was to change the network infrastructure.
this card was connected to a cisco catalyst 1000 24-port 10T switch and 2-port
100T switch. i stuck a generic repeater off one of the 100T ports, jacked the
ee100 into the repeater, and the problem *went away*.
i thought it was just an anomaly.
if it will help, i can get the info from that machine and post it to this
thread.
cheers,
josh fryman
At 20:57 08/12/2000, Ion Badulescu wrote:
>On Fri, 8 Dec 2000, Udo A. Steinberg wrote:
>
> > > + /* disable advertising the flow-control capability */
> > > + sp->advertising &= ~0x0400;
> > > + mdio_write(ioaddr, sp->phy[0] & 0x1f, sp->advertising);
> >
> > missing a 4 (the register number) before sp->advertising here?
>
>Yes, sorry about that.
>
> > I've tried the patch putting a 4 in the place noted above. It doesn't
> > help with the issue at all.
Just to say that the patch (including the added 4) fixed the "card reports no
resources" messages for me. Looking at my logs, the messages used to appear once
every 10-40 minutes. Now the box has been up for more than 5 hours with the
patch and test12-pre7, and not a single "no resources" message has been logged so far.
(Note: I upgraded the kernel at the same time as adding the patch, so it is
actually possible that vanilla test12-pre7 is fixed as well.)
My card is an Ether Express Pro 100, lspci says: Intel Corporation 82557
[Ethernet Pro 100] (rev 04) and lspci -n gives: class 0200: 10b7:9004
Just my 2p.
Anton
--
"Education is what remains after one has forgotten everything he
learned in school." - Albert Einstein
--
Anton Altaparmakov Voice: +44-(0)1223-333541(lab) / +44-(0)7712-632205(mobile)
Christ's College eMail: [email protected] / [email protected]
Cambridge CB2 3BU ICQ: 8561279
United Kingdom WWW: http://www-stu.christs.cam.ac.uk/~aia21/
Hi,
Anton Altaparmakov wrote:
> Just to say that the patch (including the added 4) fixed the "card reports no
> resources" messages for me. Looking at my logs, the messages used to appear once
> every 10-40 minutes. Now the box has been up for more than 5 hours with the
> patch and test12-pre7, and not a single "no resources" message has been logged so far.
> (Note: I upgraded the kernel at the same time as adding the patch, so it is
> actually possible that vanilla test12-pre7 is fixed as well.)
The problem here only ever happens at initialisation/first packets. Once the
network interface has been initialised properly it never produces those
messages anymore. Usually it helps to shut the NIC down with ifconfig and
bring it back up afterwards to properly initialise it.
If you are bored, try to reboot a couple dozen times and see if you still
see it. I have test12-pre7 also.
> My card is an Ether Express Pro 100, lspci says: Intel Corporation 82557
> [Ethernet Pro 100] (rev 04) and lspci -n gives: class 0200: 10b7:9004
Mine's a rev 08.
00:0d.0 Class 0200: 8086:1229 (rev 08)
-Udo.
On Mon, 11 Dec 2000, Udo A. Steinberg wrote:
> Anton Altaparmakov wrote:
>
> The problem here only ever happens at initialisation/first packets. Once the
> network interface has been initialised properly it never produces those
> messages anymore. Usually it helps to shut the NIC down with ifconfig and
> bring it back up afterwards to properly initialise it.
Actually I'm beginning to suspect that you might have a different problem
after all.
Bear with me for another couple of days, until I get near my Linux boxes
and can actually look more closely at things..
Anton Altaparmakov wrote:
> > My card is an Ether Express Pro 100, lspci says: Intel Corporation 82557
> > [Ethernet Pro 100] (rev 04)
So it's an i82558 A-step. That's interesting; the patch shouldn't have
made any difference on an i82558, at least according to the documentation.
Also according to the documentation (which I only realized later on), the
bit I'm turning off in the advertising word is supposed to be read-only..
This shows how much one can trust the docs, I guess. :)
> > and lspci -n gives: class 0200: 10b7:9004
Umm.. I don't think so. :) This is a 3Com 3c900B. You probably got the wrong
entry, in case you have multiple cards in that box.
> Mine's a rev 08.
>
> 00:0d.0 Class 0200: 8086:1229 (rev 08)
This is an i82559 C-step. What kind of switch is it attached to?
Also, if you feel like experimenting, edit speedo_interrupt() and change
outw(status & 0xfc00, ioaddr + SCBStatus);
to
outw(status & 0xff00, ioaddr + SCBStatus);
and see if the complete hangs when disconnecting the cable go away. The
docs say that the device deasserts the interrupt line only when all
interrupt sources have been acked -- so if we don't ack the FCP interrupt
but do somehow get one, we end up with an unending stream of interrupts.
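The effect of the two masks can be shown with a little arithmetic. The sketch below assumes, as the docs quoted above say, that the interrupt line stays asserted while any source bit in the upper byte of the status word remains unacknowledged, and that writing a 1 to a bit acknowledges it; unacked_sources() is a made-up helper, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define SCB_ACK_BITS  0xff00u  /* upper byte of the SCB status: interrupt sources */

/* Given the status read in the interrupt handler and the mask used for
 * the acknowledge write, return the interrupt bits that stay pending.
 * A non-zero result means the device keeps its interrupt line asserted. */
static uint16_t unacked_sources(uint16_t status, uint16_t ack_mask)
{
    uint16_t pending, acked;

    assert((ack_mask & ~SCB_ACK_BITS) == 0);  /* only the upper byte is acked */
    pending = (uint16_t)(status & SCB_ACK_BITS);
    acked = (uint16_t)(status & ack_mask);
    return (uint16_t)(pending & ~acked);
}
```

For a status of 0x4100 (a received frame plus an FCP event), acking with 0xfc00 leaves 0x0100 (the FCP bit) pending, while acking with 0xff00 leaves nothing.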
You could also try to make the printk() right next to that line *not*
depend on the debug level, and see what you get when the card barfs. Be
aware though that it will print one log line for each interrupt received,
so at the very least make your syslogd log kernel messages asynchronously
(without sync'ing after each line). On the other hand, since your problem
occurs as soon as the device is initialized, it shouldn't be too much of a
flood -- I hope.
Thanks,
Ion
Ion Badulescu wrote:
> This is an i82559 C-step. What kind of switch is it attached to?
It's a 3Com FDDI/Ethernet Linkswitch 2200 Rev 2.8
> Also, if you feel like experimenting, edit speedo_interrupt() and change
> outw(status & 0xfc00, ioaddr + SCBStatus);
> to
> outw(status & 0xff00, ioaddr + SCBStatus);
[rest snipped]
Ok, I'll try that this afternoon and post the results, since it's 4:30am
now and I have yet to try the sleep thing.
-Udo.
At 03:16 11/12/2000, Ion Badulescu wrote:
>On Mon, 11 Dec 2000, Udo A. Steinberg wrote:
>Anton Altaparmakov wrote:
> > > My card is an Ether Express Pro 100, lcpci says: Intel Corporation 82557
> > > [Ethernet Pro 100] (rev 04)
>
>So it's an i82558 A-step. That's interesting, the patch shouldn't have
>made any difference on an i82558, at least according to the documentation.
I'll give test12-pre7 a try without the patch and see if the messages
reappear. With the patch, the box has been running all night without a
single "no resources" message from the EEPro.
> > > and lspci -n gives: class 0200: 10b7:9004
>
>Umm.. I don't think so. :) This is a 3Com 3c900B. You probably got the wrong
>entry, in case you have multiple cards in that box.
Sorry, I was off by one line (the box has several network cards; only the eepro
gives the "no resources" messages, the 3Coms are fine). The right line
is: 0200: 8086:1229 (rev 04)
Anton