2000-12-06 12:42:25

by Miles Lane

Subject: Re: The horrible hack from hell called A20

I reported problems with using my two Cardbus cards simultaneously
with previous test12 releases. The behavior has changed with pre6.

#1

When I run "ifup eth0", I get an error message:

SIOCADDRT: File exists
SIOCADDRT: File exists

This happens even when my 3c575 Cardbus ethernet card is the
only Cardbus card inserted. This behavior existed in pre4, too,
though.

#2

If I insert both my 3c575 and Belkin BusPort Mobile USB host-controller
and then enable both of them, "modprobe usb-ohci" hangs. If I then
attempt "modprobe -r 3c59x", that process hangs, too. lsmod shows:

usb-ohci 15072 1 (initializing)
3c59x 0 0 (deleted)
usbcore 50384 1 (autoclean) [usb-ohci]

Then, when I try to shut the machine down, the shutdown process
hangs when trying to close down eth0.

I am including my entire dmesg output. I apologize for this, but
I am not sure what parts of the logfile are definitely irrelevant
to this report.

Linux version 2.4.0-test12 (root@agate) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #5 Wed Dec 6 00:48:18 PST 2000
BIOS-provided physical RAM map:
BIOS-e820: 000000000009f800 @ 0000000000000000 (usable)
BIOS-e820: 0000000000000800 @ 000000000009f800 (reserved)
BIOS-e820: 0000000000010000 @ 00000000000f0000 (reserved)
BIOS-e820: 0000000004f00000 @ 0000000000100000 (usable)
BIOS-e820: 0000000000010000 @ 00000000ffff0000 (reserved)
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
Scan SMP from c009f800 for 4096 bytes.
On node 0 totalpages: 20480
zone(0): 4096 pages.
zone(1): 16384 pages.
zone(2): 0 pages.
mapped APIC to ffffe000 (01156000)
Kernel command line: auto BOOT_IMAGE=Serial-Debug ro root=305 pci=biosirq console=ttyS0,38400 console=tty0
Initializing CPU#0
Detected 232.112 MHz processor.
Console: colour VGA+ 80x43
Calibrating delay loop... 462.03 BogoMIPS
Memory: 78616k/81920k available (1032k kernel code, 2916k reserved, 82k data, 204k init, 0k highmem)
Dentry-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
VFS: Diskquotas version dquot_6.4.0 initialized
CPU: Before vendor init, caps: 0183f9ff 00000000 00000000, vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After vendor init, caps: 0183f9ff 00000000 00000000 00000000
CPU: After generic, caps: 0183f9ff 00000000 00000000 00000000
CPU: Common caps: 0183f9ff 00000000 00000000 00000000
CPU: Intel Pentium II (Deschutes) stepping 00
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.37 (20001109) Richard Gooch ([email protected])
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfda13, last bus=0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
got res[10000000:10000fff] for resource 0 of Texas Instruments PCI1131
got res[10001000:10001fff] for resource 0 of Texas Instruments PCI1131 (#2)
Limiting direct PCI/PCI transfers.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd v1.8
pty: 256 Unix98 ptys configured
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfcf0-0xfcf7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfcf8-0xfcff, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA MK4006MAV, ATA DISK drive
hdc: TOSHIBA CD-ROM XM-1702BC, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 8007552 sectors (4100 MB), CHS=993/128/63, UDMA(33)
Partition check:
/dev/ide/host0/bus0/target0/lun0: p1 p2 < p5 p6 >
Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Linux PCMCIA Card Services 3.1.22
options: [pci] [cardbus]
PCI: Enabling device 00:04.0 (0000 -> 0002)
PCI: Assigned IRQ 11 for device 00:04.0
PCI: Enabling device 00:04.1 (0000 -> 0002)
PCI: Assigned IRQ 11 for device 00:04.1
Intel PCIC probe: not found.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
devfs: v0.102 (20000622) Richard Gooch ([email protected])
devfs: devfs_debug: 0x0
devfs: boot_options: 0x2
Yenta IRQ list 0698, PCI irq11
Socket status: 30000006
Yenta IRQ list 0698, PCI irq11
Socket status: 30000020
cs: cb_alloc(bus 1): vendor 0x10b7, device 0x5157
got res[1000:107f] for resource 0 of PCI device 10b7:5157
got res[10800000:1080007f] for resource 1 of PCI device 10b7:5157
got res[10800080:108000ff] for resource 2 of PCI device 10b7:5157
got res[10400000:1041ffff] for resource 6 of PCI device 10b7:5157
PCI: Enabling device 01:00.0 (0000 -> 0003)
PCI: Found IRQ 11 for device 01:00.0
PCI: The same IRQ used for device 00:04.0
call_usermodehelper[/sbin/hotplug]: no root fs
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 204k freed
Adding Swap: 108824k swap-space (priority -1)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
3c59x.c:LK1.1.11 13 Nov 2000 Donald Becker and others. http://www.scyld.com/network/vortex.html $Revision: 1.102.2.46 $
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3CCFE575BT Cyclone CardBus at 0x1000, PCI: Found IRQ 11 for device 01:00.0
PCI: The same IRQ used for device 00:04.0
PCI: Setting latency timer of device 01:00.0 to 64
00:10:4b:7c:9d:9d, IRQ 11
eth0: CardBus functions mapped 10800080->c5840080
8K byte-wide RAM 5:3 Rx:Tx split, MII interface.
MII transceiver found at address 0, status 782d.
Enabling bus-master transmits and whole-frame receives.
eth0: using default media MII
isapnp: Scanning for Pnp cards...
isapnp: No Plug & Play device found
snd: cs4231: port = 0x530, id = 0xa
snd: CS4231: VERSION (I25) = 0x3
snd: CS4231: ext version; rev = 0xe8, id = 0xe8
snd: CS4236: [0xf00] C1 (version) = 0xe8, ext = 0xe8
cs: cb_alloc(bus 5): vendor 0x1045, device 0xc861
got res[11000000:11000fff] for resource 0 of PCI device 1045:c861
PCI: Enabling device 05:00.0 (0000 -> 0002)
PCI: Found IRQ 11 for device 05:00.0
PCI: The same IRQ used for device 00:04.0
PCI: The same IRQ used for device 01:00.0
PCI: Found IRQ 11 for device 05:00.0
PCI: The same IRQ used for device 00:04.0
PCI: The same IRQ used for device 01:00.0
PCI: Setting latency timer of device 05:00.0 to 64
usb-ohci.c: USB OHCI at membase 0xc586b000, IRQ 11
usb-ohci.c: usb-05:00.0, PCI device 1045:c861
usb.c: new USB bus registered, assigned bus number 1
usb.c: kmalloc IF c2a60720, numif 1
usb.c: new device strings: Mfr=0, Product=2, SerialNumber=1
usb.c: USB device number 1 default language ID 0x0
Product: USB OHCI Root Hub
SerialNumber: c586b000
hub.c: USB hub found
hub.c: 2 ports detected
hub.c: standalone hub
hub.c: ganged power switching
hub.c: global over-current protection
hub.c: power on to power good time: 2ms
hub.c: hub controller current requirement: 0mA
hub.c: port removable status: RR
hub.c: local power source is good
hub.c: no over-current condition exists
hub.c: enabling power on all ports
usb.c: hub driver claimed interface c2a60720
usb.c: kusbd: /sbin/hotplug add 1
hub.c: port 2 connection change
hub.c: port 2, portstatus 301, change 1, 1.5 Mb/s
hub.c: port 2, portstatus 303, change 10, 1.5 Mb/s
hub.c: USB new device connect on bus1/2, assigned device number 2
usb.c: kmalloc IF c2a603a0, numif 1
usb.c: skipped 1 class/vendor specific interface descriptors
usb.c: new device strings: Mfr=1, Product=2, SerialNumber=0
usb.c: USB device number 2 default language ID 0x409
Manufacturer: Microsoft
Product: Microsoft IntelliMouse® Optical
usb.c: unhandled interfaces on device
usb.c: USB device 2 (vend/prod 0x45e/0x29) is not claimed by any active driver.
Length = 18
DescriptorType = 01
USB version = 1.10
Vendor:Product = 045e:0029
MaxPacketSize0 = 8
NumConfigurations = 1
Device version = 1.08
Device Class:SubClass:Protocol = 00:00:00
Per-interface classes
Configuration:
bLength = 9
bDescriptorType = 02
wTotalLength = 0022
bNumInterfaces = 01
bConfigurationValue = 01
iConfiguration = 00
bmAttributes = a0
MaxPower = 100mA

Interface: 0
Alternate Setting: 0
bLength = 9
bDescriptorType = 04
bInterfaceNumber = 00
bAlternateSetting = 00
bNumEndpoints = 01
bInterface Class:SubClass:Protocol = 03:01:02
iInterface = 00
Endpoint:
bLength = 7
bDescriptorType = 05
bEndpointAddress = 81 (in)
bmAttributes = 03 (Interrupt)
wMaxPacketSize = 0004
bInterval = 0a
usb.c: kusbd: /sbin/hotplug add 2




2000-12-06 20:36:26

by Linus Torvalds

Subject: Re: The horrible hack from hell called A20



On Wed, 6 Dec 2000, Miles Lane wrote:
>
> If I insert both my 3c575 and Belkin BusPort Mobile USB host-controller
> and then enable both of them, "modprobe usb-ohci" hangs. If I then
> attempt "modprobe -r 3c59x", that process hangs, too. lsmod shows:
>
> usb-ohci 15072 1 (initializing)
> 3c59x 0 0 (deleted)
> usbcore 50384 1 (autoclean) [usb-ohci]

The only thing in common between the two will be the fact that they do
share the same irq, and I'm not at all sure that those two drivers are
always happy about irq sharing.
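For context on the sharing point: in the dmesg above, the CardBus bridge (00:04.0), the 3c575 (01:00.0) and the OHCI controller (05:00.0) all end up on IRQ 11, so every interrupt on that line is offered to each registered handler, and each handler has to check its own device's status to decide whether the interrupt is its own. The sketch below is a userspace model of that dispatch pattern, not kernel code: `struct fake_dev`, `shared_handler()` and `dispatch_shared_irq()` are hypothetical, and the boolean "claimed" result is purely illustrative (2.4 interrupt handlers return void).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a single "interrupt pending" status bit. */
struct fake_dev {
    const char *name;
    bool irq_pending;
    int serviced;              /* how many interrupts this handler claimed */
};

/* A well-behaved shared handler: it only acts if its own device raised the line. */
static bool shared_handler(struct fake_dev *dev)
{
    if (!dev->irq_pending)
        return false;          /* not ours: leave it for the other handlers */
    dev->irq_pending = false;  /* acknowledge the interrupt at the device */
    dev->serviced++;
    return true;
}

/* Deliver one interrupt on a shared line: every registered handler is called. */
static int dispatch_shared_irq(struct fake_dev *devs, size_t n)
{
    int claimed = 0;
    for (size_t i = 0; i < n; i++)
        if (shared_handler(&devs[i]))
            claimed++;
    return claimed;
}
```

A handler that skipped the status check and did work unconditionally would run on every interrupt raised by its neighbours, which is the kind of interaction that makes shared lines fragile.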

Your dmesg output looks sane and happy, though. Both the USB and the 3c59x
drivers find their hardware, and claim to have successfully initialized
them. The USB driver even finds the stuff on the USB bus (a Microsoft
IntelliMouse), so it obviously works to a large degree. Similarly, the
ethernet driver happily finds everything, etc.

In fact, everything looks so happy that I bet that the reason the module
is stuck initializing is some setup problem, possibly because kusbd ends
up waiting on /sbin/hotplug or similar. It does not look like the drivers
themselves would have trouble, it looks much more like a modprobe-related
issue (maybe deadlocking on some semaphore or other lock).

I'd suggest two things:

- try not using modules. Does it "just work" for you then? (Both the OHCI
and the 3c59x driver should happily work with hotplug compiled right
into the kernel).

- try "strace"ing the whole modprobe thing, to see where it hangs, in
order to figure out what it is waiting for. I wonder if it's the
keventd changes.

Basically, I think this is a completely different problem, and not really
driver-related any more.

Linus

2000-12-06 23:07:44

by Miles Lane

Subject: Re: The horrible hack from hell called A20

Hi Linus,

Thanks for the reply.

I agree with your analysis of the information I reported
in this message. However, in previous related bug reports
I mentioned actual functional conflicts between the drivers.

Here is what goes wrong:

Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:32 agate kernel: eth0: using default media MII
Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:32 agate kernel: eth0: using default media MII
Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:33 agate kernel: eth0: using default media MII
Dec 6 04:21:33 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:33 agate kernel: eth0: using default media MII
Dec 6 04:21:33 agate kernel: eth0: Too much work in interrupt, status e003.
Dec 6 04:21:33 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:33 agate kernel: eth0: using default media MII
Dec 6 04:21:33 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
Dec 6 04:21:33 agate kernel: eth0: using default media MII
Dec 6 04:21:33 agate kernel: eth0: Host error, FIFO diagnostic register 0000.

The repro case is to simply get both drivers happily loaded, insert
my USB mouse and restart XFree86 so the USB mouse gets used, then
copy a file from an FTP site while moving the mouse.

Here's an strace of modprobe usb-ohci:


query_module(NULL, QM_SYMBOLS, 0x806afe0, 16384, 21890) = -1 ENOSPC (No space left on device)
brk(0x8076000) = 0x8076000
query_module(NULL, QM_SYMBOLS, { /* 930 entries */ }, 930) = 0
lseek(3, 0, SEEK_SET) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\1\0\3\0\1\0\0\0\0\0\0\0"..., 52) = 52
lseek(3, 15100, SEEK_SET) = 15100
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 560) = 560
lseek(3, 64, SEEK_SET) = 64
read(3, "WVS\213t$\0241\333\17\267V\0049\323}\34\215~\20\213\4\237"..., 13074) = 13074
lseek(3, 18756, SEEK_SET) = 18756
read(3, "\35\0\0\0\2:\0\0/\0\0\0\2:\0\0o\0\0\0\2;\0\0\213\0\0\0"..., 2248) = 2248
lseek(3, 13152, SEEK_SET) = 13152
read(3, "\0\0\0\0\254\377\377\377\271\377\377\377\254\377\377\377"..., 192) = 192
lseek(3, 21004, SEEK_SET) = 21004
read(3, "@\0\0\0\1\3\0\0D\0\0\0\1\3\0\0L\0\0\0\1\2\0\0P\0\0\0\1"..., 96) = 96
lseek(3, 13376, SEEK_SET) = 13376
read(3, "kernel_version=2.4.0-test12\0\0\0\0\0"..., 140) = 140
lseek(3, 13536, SEEK_SET) = 13536
read(3, "/usr/src/linux/include/linux/mou"..., 1393) = 1393
lseek(3, 21100, SEEK_SET) = 21100
read(3, "\350\2\0\0\1\2\0\0\354\2\0\0\1\2\0\0\360\2\0\0\1\2\0\0"..., 160) = 160
lseek(3, 14929, SEEK_SET) = 14929
read(3, "\0GCC: (GNU) egcs-2.91.66 1999031"..., 61) = 61
lseek(3, 14990, SEEK_SET) = 14990
read(3, "\0.symtab\0.strtab\0.shstrtab\0.text"..., 108) = 108
lseek(3, 15660, SEEK_SET) = 15660
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0"..., 1664) = 1664
brk(0x8077000) = 0x8077000
lseek(3, 17324, SEEK_SET) = 17324
read(3, "\0usb-ohci.c\0gcc2_compiled.\0__mod"..., 1430) = 1430
brk(0x8078000) = 0x8078000
lstat("/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/modules", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/modules/2.4.0-test12", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
lstat("/lib/modules/2.4.0-test12/kernel", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/modules/2.4.0-test12/kernel/drivers", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/modules/2.4.0-test12/kernel/drivers/usb", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/modules/2.4.0-test12/kernel/drivers/usb/usb-ohci.o", {st_mode=S_IFREG|0644, st_size=21260, ...}) = 0
stat("/lib/modules/2.4.0-test12/kernel/drivers/usb/usb-ohci.o", {st_mode=S_IFREG|0644, st_size=21260, ...}) = 0
create_module("usb-ohci", 15072) = 0xc5891000
brk(0x807c000) = 0x807c000
init_module("usb-ohci", 0x8077d10 <unfinished ...>

I will test again with usb-ohci and 3c59x built into
the kernel.

As always, many thanks for your help!

Miles


2000-12-07 02:59:10

by Linus Torvalds

Subject: Re: The horrible hack from hell called A20



On Wed, 6 Dec 2000, Miles Lane wrote:
>
> Here is what goes wrong:
>
> Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.

But it continues to work, right?

I bet that your ethernet card is just unhappy that it couldn't get DMA in
time, because the bus was so busy. Many of the busmastering ethernet
devices will start the packet send early, happy in the knowledge that
they'll usually have plenty of time to DMA the data by the time they need
it.

This works fine most of the time, but if you have a busy PCI bus and
you're doing things over a (potentially slow) PCI bridge like the Cardbus
bridge, you're taking chances. And sometimes those chances do not work out
ok. Especially if you have slow memory, which most laptops have.
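A toy model of the race described above. Everything in it is an assumption chosen for illustration, not a 3c575 parameter: one "tick" is the time the wire needs to send one byte, DMA moves a fixed number of bytes into the Tx FIFO per tick while it owns the bus, and a hypothetical bus stall (another master holding the bus) stops DMA for a while. `tx_underruns()` is a hypothetical helper that reports whether the FIFO runs dry mid-frame.

```c
#include <stdbool.h>

/*
 * Toy model of early transmit (all units and rates are illustrative):
 *  - the wire drains one byte per tick once transmission has started;
 *  - DMA adds dma_bytes_per_tick bytes per tick (must be > 0), except
 *    during a bus stall of stall_len ticks starting at tick stall_start;
 *  - transmission starts once start_threshold bytes are buffered, or
 *    once the whole frame is buffered if the threshold is larger.
 * Returns true if the wire ever needs a byte that DMA has not delivered.
 */
static bool tx_underruns(int frame_len, int start_threshold,
                         int dma_bytes_per_tick, int stall_start, int stall_len)
{
    int fetched = 0;           /* bytes DMA'd into the Tx FIFO so far */
    int sent = 0;              /* bytes already put on the wire */
    bool transmitting = false;

    for (int tick = 0; sent < frame_len; tick++) {
        bool stalled = tick >= stall_start && tick < stall_start + stall_len;

        if (!stalled && fetched < frame_len) {
            fetched += dma_bytes_per_tick;
            if (fetched > frame_len)
                fetched = frame_len;
        }
        if (!transmitting &&
            (fetched >= start_threshold || fetched == frame_len))
            transmitting = true;

        if (transmitting) {
            if (sent >= fetched)
                return true;   /* FIFO empty mid-frame: Tx underrun */
            sent++;
        }
    }
    return false;              /* whole frame sent without running dry */
}
```

With a 1514-byte frame, a 128-byte threshold and 4 bytes per tick, `tx_underruns(1514, 128, 4, 0, 0)` is false; a 200-tick stall starting at tick 40 makes it true; raising the threshold to 0x7ff (wait for the whole frame) makes the same stall harmless, at the cost of transmit latency.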

I suspect that the worst result of this is just a noisy driver: both on
the network (runt packets) and on the console. And it obviously will cause
performance to suffer too, due to retransmitting packets that failed,
and/or losing packets.

There may be some rule for the threshold for sending packets or something
else to make this happen less, so this is probably tweakable. But it
doesn't sound deadly (unless the driver causes this to result in a dead
network - does it?)

Linus

2000-12-07 07:15:24

by Miles Lane

Subject: Re: The horrible hack from hell called A20

Linus Torvalds wrote:

>
> On Wed, 6 Dec 2000, Miles Lane wrote:
>
>> Here is what goes wrong:
>>
>> Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
>
>
> But it continues to work, right?

I'll check. My system only has 80MB RAM, and I run Mozilla, which
pushes a lot of information into the swap space. When I encounter
this "Host error" problem, tons of messages start spewing into my
logs. This bogs my entire system down horribly.

<great educational material snipped>

I have reproduced this problem with all the drivers built
into the kernel.

I have also just tried a test pass with 3c59x built in and
USB built as modules. I booted with only the 3c575 inserted.
I got eth0 running and then loaded usb-ohci (with the enable
bus mastering change added). This resulted in modprobe hanging
again.

Now I'll try with all modules again and check to see whether eth0
is still usable.

Thanks,

Miles

2000-12-07 07:43:09

by Linus Torvalds

Subject: Re: The horrible hack from hell called A20



On Wed, 6 Dec 2000, Miles Lane wrote:
>
> I have also just tried a test pass with 3c59x built in and
> USB built as modules. I booted with only the 3c575 inserted.
> I got eth0 running and then loaded usb-ohci (with the enable
> bus mastering change added). This resulted in modprobe hanging
> again.

I bet you're hanging on the rtnl_semaphore due to having a /sbin/hotplug
policy.

Miles, mind trying out a really simple change in the
____call_usermodehelper() function in kernel/kmod.c?

Change: #if 0 out the whole block that says "if (retval >= 0)" and does
the waiting for the child. We shouldn't wait for the user mode helper:
that's just going to cause nasty deadlocks. Deadlocks like the one you
seem to be seeing, in fact.
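The suspected deadlock, modelled in userspace: the context that loads the driver holds a lock (rtnl_semaphore, in Linus's guess) while the kernel runs /sbin/hotplug and waits for it, and the hotplug policy in turn needs that same lock. This is an analogy only: a pthread mutex stands in for rtnl_semaphore and a thread stands in for the helper process; `register_device()` and `hotplug_helper()` are hypothetical names, and the one-second timeout exists only so the demo terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for the kernel's rtnl_semaphore (userspace analogy only). */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for /sbin/hotplug: it needs the lock the caller holds.
   It gives up after one second so the demo cannot hang forever. */
static void *hotplug_helper(void *arg)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;

    int rc = pthread_mutex_timedlock(&rtnl, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&rtnl);
    *(int *)arg = rc;          /* 0 on success, ETIMEDOUT if it was blocked */
    return NULL;
}

/* Models a registration path that starts the helper while holding rtnl.
   wait_under_lock != 0 mimics waiting for the helper before unlocking. */
static int register_device(int wait_under_lock)
{
    int helper_rc = -1;
    pthread_t helper;

    pthread_mutex_lock(&rtnl);
    if (pthread_create(&helper, NULL, hotplug_helper, &helper_rc) != 0) {
        pthread_mutex_unlock(&rtnl);
        return -1;
    }
    if (wait_under_lock) {
        pthread_join(helper, NULL);   /* helper can never get rtnl: times out */
        pthread_mutex_unlock(&rtnl);
    } else {
        pthread_mutex_unlock(&rtnl);  /* no waiting while the lock is held */
        pthread_join(helper, NULL);   /* joined only so the demo can report */
    }
    return helper_rc;
}
```

`register_device(1)` returns ETIMEDOUT (the helper was locked out for as long as it was willing to wait), while `register_device(0)` returns 0. Dropping the wait, as the `#if 0` change does, removes the cycle.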

Does your ifconfig problem go away with that fix?

Linus

2000-12-07 15:56:42

by Andrew Morton

Subject: Re: The horrible hack from hell called A20

Linus Torvalds wrote:
>
> On Wed, 6 Dec 2000, Miles Lane wrote:
> >
> > Here is what goes wrong:
> >
> > Dec 6 04:21:32 agate kernel: eth0: Host error, FIFO diagnostic register 0000.
>
> But it continues to work, right?
>
> I bet that your ethernet card is just unhappy that it couldn't get DMA in
> time, because the bus was so busy. Many of the busmastering ethernet
> devices will start the packet send early, happy in the knowledge that
> they'll usually have plenty of time to DMA the data by the time they need
> it.
>
> This works fine most of the time, but if you have a busy PCI bus and
> you're doing things over a (potentially slow) PCI bridge like the Cardbus
> bridge, you're taking chances. And sometimes those chances do not work out
> ok. Especially if you have slow memory, which most laptops have.
>
> I suspect that the worst result of this is just a noisy driver: both on
> the network (runt packets) and on the console. And it obviously will cause
> performance to suffer too, due to retransmitting packets that failed,
> and/or losing packets.
>
> There may be some rule for the threshold for sending packets or something
> else to make this happen less, so this is probably tweakable. But it
> doesn't sound deadly (unless the driver causes this to result in a dead
> network - does it?)
>

We initialise the 3com NICs so that the DMA of Tx frames doesn't
commence until 1536 free bytes are available in the Tx FIFO. I assume
this is to make the most of the NIC's ability to bus-master-transfer an
entire frame in one slurp. But this is irrelevant.

We initialise the NIC so it starts putting data on the wire after 128
bytes are in the Tx FIFO. So yes, there is an opportunity for another
bus master to interrupt the slurp and to hold the bus for so long that
the NIC gets a TX underrun. But surely not by just wiggling the mouse
around?
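A back-of-the-envelope check of how much slack the 128-byte start threshold leaves. It assumes the 3c575 is running at 100 Mbit/s (the thread does not say what speed was negotiated), a 33 MHz PCI clock, and ignores preamble and inter-frame gap; `drain_time_us()` and `pci_cycles()` are hypothetical helpers. If DMA delivers nothing further once the NIC starts sending, the wire empties those 128 bytes in about 10 µs, roughly 340 PCI clocks, so another bus master holding the bus for that long is enough for an underrun.

```c
/* Time, in microseconds, for the wire to send `bytes` at `mbit_per_s`.
   1 Mbit/s is 1 bit per microsecond, so bits / (Mbit/s) is already in us. */
static double drain_time_us(unsigned bytes, double mbit_per_s)
{
    return (bytes * 8.0) / mbit_per_s;
}

/* Number of bus clocks in `us` microseconds at a `pci_mhz` clock. */
static double pci_cycles(double us, double pci_mhz)
{
    return us * pci_mhz;
}
```

At 10 Mbit/s the same 128 bytes last about 102 µs, ten times the margin.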

I have seen just one report of a person getting Tx underruns. The
driver recovered OK. But Miles is reporting "Host error". This is
different. The 3com datasheet says:

This bit is set when a catastrophic error related to the bus
interface occurs.

The errors that set this bit are PCI target abort and PCI master
abort.

This bit is cleared by issuing the GlobalReset command...

This is a very rare problem. Trolling the vortex archives comes up
with a few comments from Das Nicmeisters:

> Donald Becker wrote:
> Another PCMCIA setup bug, except this one is much harder to track down.
> The CardBus bridge chip isn't configured correctly.
> This is a real bus problem, not a false report.

> David Hinds wrote:
> I've gotten a few reports of these PCI bus errors. They have indeed
> been very hard to track down, since they are specific to particular
> hardware combinations, and I've never been able to reproduce them.

> Donald Becker wrote:
> I've gotten this error on my Vaio 505TR, but I've never been able to
> reproduce it when I'm ready to observe it.

Miles, could you please apply the below patch? It'll give us a
little more info about the PCI error. Bit 31 of `bus status' is
MasterAbort and bit 30 is TargetAbort.
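For reading that number, a minimal decoder for the two bits named above (bit 31 MasterAbort, bit 30 TargetAbort); the other bits of the register are deliberately not interpreted. The macro names and `describe_bus_status()` are hypothetical, not identifiers from 3c59x.c.

```c
#include <stdint.h>

/* Bit assignments as described in this thread; nothing else is decoded. */
#define BUS_STATUS_MASTER_ABORT 0x80000000u   /* bit 31 */
#define BUS_STATUS_TARGET_ABORT 0x40000000u   /* bit 30 */

static const char *describe_bus_status(uint32_t bus_status)
{
    int master = (bus_status & BUS_STATUS_MASTER_ABORT) != 0;
    int target = (bus_status & BUS_STATUS_TARGET_ABORT) != 0;

    if (master && target)
        return "master abort + target abort";
    if (master)
        return "master abort";
    if (target)
        return "target abort";
    return "no abort bit set";
}
```

Feed it the `%8.8x` value from the patched driver's "PCI bus error" message; note that the patch only prints that message when `vortex_debug` is nonzero.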

Also, you can disable the start-tx-after-128-bytes feature by uncommenting

// wait_for_completion(dev, SetTxStart|0x07ff);

near the end of vortex_up(). With this change the NIC won't start
transmitting until it has the entire frame onboard. It shouldn't make
any difference (hah).
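Why 0x07ff amounts to "wait for the whole frame": as I read 3c59x.c, a command word written to EL3_CMD carries the opcode in bits 15..11 and an 11-bit argument in bits 10..0 (compare `TotalReset | 0xff` in the patch), and SetTxStart is opcode 19. Treat both as assumptions to check against your tree; the helper names below are hypothetical. 0x07ff is the largest value the argument field can hold, and if the threshold is in bytes (as the 128-byte description suggests) it exceeds the largest untagged Ethernet frame of 1514 bytes without CRC, so the threshold is never reached before the frame is complete. A coarser unit would only make the threshold larger, so the conclusion holds either way.

```c
#include <stdint.h>

/* Assumed EL3-style layout: opcode in bits 15..11, argument in bits 10..0. */
#define EL3_CMD_SHIFT  11
#define EL3_ARG_MASK   0x07ffu
#define SET_TX_START   (19u << EL3_CMD_SHIFT)   /* assumed opcode value */

#define ETH_MAX_FRAME  1514u    /* largest untagged frame, excluding CRC */

/* Build a command word; an out-of-range argument is masked to 11 bits. */
static uint16_t el3_command(unsigned opcode_bits, unsigned arg)
{
    return (uint16_t)(opcode_bits | (arg & EL3_ARG_MASK));
}

/* Nonzero if a Tx start threshold (in bytes) can never be reached early. */
static int is_store_and_forward(unsigned tx_start_bytes)
{
    return tx_start_bytes > ETH_MAX_FRAME;
}
```

Under these assumptions `el3_command(SET_TX_START, 0x07ff)` is 0x9fff, the word the commented-out line would write.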

This does look like a Cardbus bridge problem.



--- linux-2.4.0-test12-pre7/drivers/net/3c59x.c Tue Nov 21 20:11:20 2000
+++ linux-akpm/drivers/net/3c59x.c Fri Dec 8 02:24:11 2000
@@ -203,7 +203,7 @@
#include <linux/delay.h>

static char version[] __devinitdata =
-"3c59x.c:LK1.1.11 13 Nov 2000 Donald Becker and others. http://www.scyld.com/network/vortex.html " "$Revision: 1.102.2.46 $\n";
+"3c59x.c:LK1.1.11 13 Nov 2000 Donald Becker and others. http://www.scyld.com/network/vortex.html " "$Revision: 1.102.2.40 $\n";

MODULE_AUTHOR("Donald Becker <[email protected]>");
MODULE_DESCRIPTION("3Com 3c59x/3c90x/3c575 series Vortex/Boomerang/Cyclone driver");
@@ -843,10 +843,15 @@
{
int rc;

- rc = vortex_probe1 (pdev, pci_resource_start (pdev, 0), pdev->irq,
- ent->driver_data, vortex_cards_found);
- if (rc == 0)
- vortex_cards_found++;
+ /* wake up and enable device */
+ if (pci_enable_device (pdev)) {
+ rc = -EIO;
+ } else {
+ rc = vortex_probe1 (pdev, pci_resource_start (pdev, 0), pdev->irq,
+ ent->driver_data, vortex_cards_found);
+ if (rc == 0)
+ vortex_cards_found++;
+ }
return rc;
}

@@ -863,7 +868,7 @@
struct vortex_private *vp;
int option;
unsigned int eeprom[0x40], checksum = 0; /* EEPROM contents */
- int i;
+ int i, step;
struct net_device *dev;
static int printed_version;
int retval;
@@ -912,12 +917,6 @@
vp->must_free_region = 1;
}

- /* wake up and enable device */
- if (pci_enable_device (pdev)) {
- retval = -EIO;
- goto free_region;
- }
-
/* enable bus-mastering if necessary */
if (vci->flags & PCI_USES_MASTER)
pci_set_master (pdev);
@@ -1025,6 +1024,13 @@
dev->irq);
#endif

+ EL3WINDOW(4);
+ step = (inb(ioaddr + Wn4_NetDiag) & 0x1e) >> 1;
+ printk(KERN_INFO " product code '%c%c' rev %02x.%d date %02d-"
+ "%02d-%02d\n", eeprom[6]&0xff, eeprom[6]>>8, eeprom[0x14],
+ step, (eeprom[4]>>5) & 15, eeprom[4] & 31, eeprom[4]>>9);
+
+
if (pdev && vci->drv_flags & HAS_CB_FNS) {
unsigned long fn_st_addr; /* Cardbus function status space */
unsigned short n;
@@ -1148,14 +1154,19 @@
return retval;
}

-static void wait_for_completion(struct net_device *dev, int cmd)
+#define wait_for_completion(dev, cmd) _wait_for_completion(dev, cmd, __LINE__)
+
+static void _wait_for_completion(struct net_device *dev, int cmd, int line)
{
- int i = 4000;
+ int i;

outw(cmd, dev->base_addr + EL3_CMD);
- while (--i > 0) {
- if (!(inw(dev->base_addr + EL3_STATUS) & CmdInProgress))
+ for (i = 0; i < 4000000; i++) {
+ if (!(inw(dev->base_addr + EL3_STATUS) & CmdInProgress)) {
+ if (i > 1000)
+ printk("wait_for_completion: line=%d, count=%d\n", line, i);
return;
+ }
}
printk(KERN_ERR "%s: command 0x%04x did not complete! Status=0x%x\n",
dev->name, cmd, inw(dev->base_addr + EL3_STATUS));
@@ -1331,6 +1342,7 @@
set_rx_mode(dev);
outw(StatsEnable, ioaddr + EL3_CMD); /* Turn on statistics. */

+// wait_for_completion(dev, SetTxStart|0x07ff);
outw(RxEnable, ioaddr + EL3_CMD); /* Enable the receiver. */
outw(TxEnable, ioaddr + EL3_CMD); /* Enable transmitter. */
/* Allow status bits to be seen. */
@@ -1663,6 +1675,12 @@
dev->name, fifo_diag);
/* Adapter failure requires Tx/Rx reset and reinit. */
if (vp->full_bus_master_tx) {
+ int bus_status = inl(ioaddr + PktStatus);
+ /* 0x80000000 PCI master abort. */
+ /* 0x40000000 PCI target abort. */
+ if (vortex_debug)
+ printk(KERN_ERR "%s: PCI bus error, bus status %8.8x\n", dev->name, bus_status);
+
/* In this case, blow the card away */
vortex_down(dev);
wait_for_completion(dev, TotalReset | 0xff);