2002-10-23 01:12:48

by Peter Chubb

Subject: Ejecting an orinoco card causes hang


Hi Davids,
I see the following problems with the orinoco plus cardbus plus
yenta_socket system on 2.5.44.
I'm using a Netgear MA401.

1. cardctl reset gives a warning:
orinoco_lock() called with hw_unavailable.
I added a call to dump_stack() where the message was being printed
out --- it's happening when pcmcia_release_configuration() calls
set_socket, which calls yenta_get_socket() which calls set_cis_map
which causes an interrupt, and then orinoco_interrupt reports the
problem. So it's probably benign.

2. cardctl eject gives a warning, Bad: scheduling while atomic. I
think this is a generic problem, not orinoco-specific ---
pcmcia_eject_card() disables interrupts, then calls do_shutdown()
which calls cs_sleep(), and cs_sleep() tries to sleep (but with
interrupts disabled, bad)

3. Manually ejecting the card (without doing a cardctl eject first)
locks the machine solid. Nothing in the logs, nothing on the
screen. I suspect it's disabling interrupts then doing something
silly.

4. Transferring lots of data causes the link to collapse, and the
logs to fill up with `eth0: Error -110 writing Tx descriptor to
BAP' messages

--
Dr Peter Chubb [email protected]
You are lost in a maze of BitKeeper repositories, all almost the same.


2002-10-23 06:32:46

by David Gibson

Subject: Re: Ejecting an orinoco card causes hang

On Wed, Oct 23, 2002 at 11:18:52AM +1000, [email protected] wrote:
>
> Hi Davids,
> I see the following problems with the orinoco plus cardbus plus
> yenta_socket system on 2.5.44.
> I'm using a Netgear MA401.
>
> 1. cardctl reset gives a warning:
> orinoco_lock() called with hw_unavailable.
> I added a call to dump_stack() where the message was being printed
> out --- it's happening when pcmcia_release_configuration() calls
> set_socket, which calls yenta_get_socket() which calls set_cis_map
> which causes an interrupt, and then orinoco_interrupt reports the
> problem. So it's probably benign.

Yes, that's probably right. In fact the hw_unavailable flag exists
specifically to stop orinoco_interrupt() and others doing anything
worse than giving a warning if called at this sort of time. It would
certainly be bad to go ahead and access the hardware at this point.

We've already cleared the INTEN register at this point, so we're not
expecting to get an interrupt. But I guess that interrupt line is
shared with something else.
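
For illustration, a minimal user-space sketch of this kind of guard. It is
not the orinoco code: struct fake_priv, fake_read_evstat() and
fake_interrupt() are hypothetical names, and the "hardware" is just a struct
field. It only shows a handler on a possibly shared interrupt line returning
early, without any register access, while hw_unavailable is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's private state. */
struct fake_priv {
    bool hw_unavailable;   /* set while the card is being reset or released */
    uint16_t evstat;       /* pretend event-status register */
    unsigned hw_reads;     /* counts simulated hardware accesses */
    unsigned handled;      /* counts events actually serviced */
};

/* Pretend register read; a real driver would touch the card here. */
static uint16_t fake_read_evstat(struct fake_priv *priv)
{
    priv->hw_reads++;
    return priv->evstat;
}

/* Returns 1 if the interrupt was ours and was serviced, 0 otherwise. */
static int fake_interrupt(struct fake_priv *priv)
{
    assert(priv != NULL);

    /* On a shared line we can be called at any time, so bail out
     * before touching the hardware if it is marked unavailable. */
    if (priv->hw_unavailable)
        return 0;

    uint16_t ev = fake_read_evstat(priv);
    if (ev == 0)
        return 0;          /* not our interrupt */

    priv->evstat = 0;      /* acknowledge the pretend event */
    priv->handled++;
    return 1;
}
```

Calling fake_interrupt() with hw_unavailable set leaves hw_reads at zero,
which is the property that matters here.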

In the long term that warning should probably disappear (we should
just do nothing safely and silently). For now it is still useful for
tracking down real problems.

> 2. cardctl eject gives a warning, Bad: scheduling while atomic. I
> think this is a generic problem, not orinoco-specific ---
> pcmcia_eject_card() disables interrupts, then calls do_shutdown()
> which calls cs_sleep(), and cs_sleep() tries to sleep (but with
> interrupts disabled, bad)

I think that's correct.
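
A toy user-space model of the ordering problem, for illustration only. None
of these functions are kernel APIs: model_irq_disable(), model_sleep() and
the two eject variants are made-up names, and whether the real shutdown path
can be reordered like this is an assumption, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_off;          /* models "interrupts are disabled" */
static int  atomic_violations; /* counts sleeps attempted while atomic */

static void model_irq_disable(void)
{
    irqs_off = true;
}

static void model_irq_enable(void)
{
    assert(irqs_off);          /* an unbalanced enable is a bug in the model */
    irqs_off = false;
}

/* Models a function that may sleep, in the role cs_sleep() plays above. */
static void model_sleep(void)
{
    if (irqs_off)
        atomic_violations++;   /* where the kernel would print its warning */
}

/* The ordering described in the report: sleep inside the irq-off region. */
static void eject_buggy(void)
{
    model_irq_disable();
    model_sleep();
    model_irq_enable();
}

/* One possible reordering: keep the irq-off region short and non-sleeping,
 * and only sleep once interrupts are back on. */
static void eject_reordered(void)
{
    model_irq_disable();
    /* short critical section that never sleeps */
    model_irq_enable();
    model_sleep();
}
```

In this model eject_buggy() bumps atomic_violations and eject_reordered()
does not.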

> 3. Manually ejecting the card (without doing a cardctl eject first)
> locks the machine solid. Nothing in the logs, nothing on the
> screen. I suspect it's disabling interrupts then doing something
> silly.

I suspect this may be another PCMCIA rather than orinoco problem,
although I'm not sure. If it's happening in the orinoco driver, I
have no idea where it could be - I've generally been careful to have
timeouts and checks to handle the device suddenly disappearing.

Do you get a hang if you ifconfig down the interface, but don't
cardctl eject the card? I've also heard that some PCMCIA hardware
can't reliably cope with hot unplug like this.

> 4. Transferring lots of data causes the link to collapse, and the
> logs to fill up with `eth0: Error -110 writing Tx descriptor to
> BAP' messages

:-( this sounds like one of the perennial problems we've had with some
cards. The firmware falls over, and I haven't been able to figure out
what we've done to upset it.
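
For reference, -110 is -ETIMEDOUT on x86 Linux. A minimal sketch of the kind
of bounded wait that produces such an error when the card stops responding.
This is not the driver's actual BAP code: struct fake_card, FAKE_BUSY and
fake_wait_not_busy() are hypothetical, and the poll count is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_BUSY 0x8000u      /* hypothetical "busy" bit */

/* Simulated card: reports BUSY for a number of reads, or forever. */
struct fake_card {
    int busy_polls_left;       /* negative means "never becomes ready" */
};

static uint16_t fake_read_offset_reg(struct fake_card *card)
{
    if (card->busy_polls_left < 0)
        return FAKE_BUSY;      /* firmware has fallen over */
    if (card->busy_polls_left > 0) {
        card->busy_polls_left--;
        return FAKE_BUSY;
    }
    return 0;
}

/* Poll until the busy bit clears, giving up after max_polls reads.
 * Returns 0 on success or -ETIMEDOUT if the card never became ready. */
static int fake_wait_not_busy(struct fake_card *card, int max_polls)
{
    assert(card != NULL && max_polls > 0);

    for (int i = 0; i < max_polls; i++) {
        if (!(fake_read_offset_reg(card) & FAKE_BUSY))
            return 0;
        /* a real driver would delay briefly between polls */
    }
    return -ETIMEDOUT;
}
```

The point of the bound is that a dead or removed card gives an error return
rather than an endless loop.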

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-10-23 07:49:01

by Russell King

Subject: Re: Ejecting an orinoco card causes hang

On Wed, Oct 23, 2002 at 11:18:52AM +1000, [email protected] wrote:
> 4. Transferring lots of data causes the link to collapse, and the
> logs to fill up with `eth0: Error -110 writing Tx descriptor to
> BAP' messages

I see this type of behaviour with an Orinoco Silver card while trying to
set the mode/essid. I took the wvlan_cs code from my RH7.2 box and dropped
it into 2.5 - seems to work (although how reliable it is I don't know yet;
I need to get something for this card to talk to.)

http://ftp.linux.org.uk/pub/linux/rmk/wireless/wvlan_cs-2.5.44.diff

Another difference that I noticed was that when no AP is in range, and the
ESSID has never been set, orinoco v0.07 reports "unspecified SSID!!!" as
the ESSID, as does wvlan_cs on the same RH7.2 kernel and with wvlan_cs on
2.5.44. However, orinoco 0.13a reports an empty string.

Looking at the bytes that orinoco 0.13a reads off the card, it seems that
the card returns a zero length word, followed by the string
"unspecified SSID!!!".
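
A small sketch of why a driver that trusts the length word would report an
empty string here. The record layout (a 16-bit little-endian length followed
by the bytes) is an assumption based only on the description above, and
ssid_from_record() is a hypothetical helper, not a function from either
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an SSID out of a length-prefixed record, trusting the prefix.
 * rec:      raw bytes, assumed to start with a 16-bit little-endian length
 * rec_len:  number of valid bytes in rec
 * out:      destination buffer, always NUL-terminated on return
 * Returns the number of SSID bytes copied. */
static size_t ssid_from_record(const uint8_t *rec, size_t rec_len,
                               char *out, size_t out_size)
{
    assert(rec != NULL && out != NULL && out_size > 0);

    if (rec_len < 2) {
        out[0] = '\0';
        return 0;
    }

    size_t len = (size_t)rec[0] | ((size_t)rec[1] << 8);

    /* Clamp to what is actually in the record and what fits in out. */
    if (len > rec_len - 2)
        len = rec_len - 2;
    if (len > out_size - 1)
        len = out_size - 1;

    memcpy(out, rec + 2, len);
    out[len] = '\0';
    return len;
}
```

With a zero length word in front of "unspecified SSID!!!" this returns an
empty string, while code that ignores the prefix and prints the buffer would
show the text.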

Also, (iirc) I could make the card happier with the orinoco 0.13a driver
if I made it read excess bytes when reading the BAP (like wvlan_cs does.)
However, this didn't completely solve the problem - I still saw what I
think are firmware crashes.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-12-04 04:50:20

by David Gibson

Subject: Re: Ejecting an orinoco card causes hang

On Wed, Oct 23, 2002 at 08:55:02AM +0100, Russell King wrote:
> On Wed, Oct 23, 2002 at 11:18:52AM +1000, [email protected] wrote:
> > 4. Transferring lots of data causes the link to collapse, and the
> > logs to fill up with `eth0: Error -110 writing Tx descriptor to
> > BAP' messages
>
> I see this type of behaviour with an Orinoco Silver card while trying to
> set the mode/essid. I took the wvlan_cs code from my RH7.2 box and dropped
> it into 2.5 - seems to work (although how reliable it is I don't know yet;
> I need to get something for this card to talk to.)

Sadly, I'm still battling this particular problem. However, I have
just fixed a bug which could cause hangs on eject. It's in the
"testing" version at
http://www.ozlabs.org/people/dgibson/dldwd/testing.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson