2013-10-22 04:16:39

by Alexandre Oliva

Subject: RTL8187B is racy

It's been at least a year since I first noticed that, in WiFi-busy
environments such as airports, hotels and Free Software conferences, my
Yeeloong laptop with an RTL8187B WiFi card will freeze or oops shortly
after I enable WiFi. This problem doesn't seem to happen when I'm at
home, probably because of the low WiFi traffic. The problem occurs
while running 3.11.* and 3.10.* kernels, but not 3.4.* or 3.0.*.

I couldn't find any changes to the rtl8187 module that explain this
misbehavior, so I suspect it's some new source of parallelism in the
mac80211 layer that has exposed the lack of synchronization in uses of
rx_queue and b_tx_status.queue. Indeed, I found many uses of these
queues that don't take locks to ensure consistency. Unfortunately,
adding spin locks around all uses causes harder freezes and/or complaints
about scheduling in atomic contexts, depending on which race I hit
first. Without any changes, the problem I get most often is a crash
within rtl8187b_status_cb, when skb_unlink attempts to dereference a
NULL pointer. Testing skb->prev and skb->next before entering the
branch where the skb is removed seemed to make the error a little bit
less frequent, but surely not enough for the machine to remain up for
very long while WiFi is enabled.
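
To illustrate the kind of race described above, here is a minimal
userspace sketch in C. It is not the rtl8187 code: struct node, struct
queue, queue_tail and queue_take are hypothetical stand-ins for an skb,
its queue and the queue helpers, and a pthread mutex stands in for the
queue's spin lock. The point it tries to show is that the lookup and the
unlink happen inside one critical section, so a second caller that loses
the race gets NULL back instead of unlinking an entry whose prev and next
pointers have already been cleared.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb sitting on a queue. */
struct node {
    struct node *prev, *next;
    int id;
};

/* Hypothetical stand-in for the queue head plus its lock. */
struct queue {
    struct node head;      /* sentinel: head.next == &head when empty */
    pthread_mutex_t lock;  /* assumption: plays the role of the spin lock */
    unsigned int len;
};

static void queue_init(struct queue *q)
{
    q->head.prev = q->head.next = &q->head;
    q->len = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Append n at the tail while holding the lock. */
static void queue_tail(struct queue *q, struct node *n)
{
    pthread_mutex_lock(&q->lock);
    n->prev = q->head.prev;
    n->next = &q->head;
    q->head.prev->next = n;
    q->head.prev = n;
    q->len++;
    pthread_mutex_unlock(&q->lock);
}

/*
 * Find the entry with the given id and unlink it, all under one lock.
 * Returns the removed entry, or NULL if it is no longer on the queue
 * (for instance because another context already took it).
 */
static struct node *queue_take(struct queue *q, int id)
{
    struct node *n, *found = NULL;

    pthread_mutex_lock(&q->lock);
    for (n = q->head.next; n != &q->head; n = n->next) {
        if (n->id == id) {
            assert(q->len > 0);
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->prev = n->next = NULL;  /* mark as off-queue */
            q->len--;
            found = n;
            break;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return found;
}
```

In kernel code the same shape would also have to respect the context it
runs in: nothing that can sleep may be called while a spin lock is held,
which is presumably where the scheduling-in-atomic-context complaints come
from.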

Is this a known problem? Any suggestions on what I could try next to
fix the problem?

Thanks in advance,

--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer


2013-10-22 15:32:40

by Larry Finger

Subject: Re: RTL8187B is racy

On 10/21/2013 11:07 PM, Alexandre Oliva wrote:
> It's been at least a year since I first noticed that, in WiFi-busy
> environments such as airports, hotels and Free Software conferences, my
> Yeeloong laptop with an RTL8187B WiFi card will freeze or oops shortly
> after I enable WiFi. This problem doesn't seem to happen when I'm at
> home, probably because of the low WiFi traffic. The problem occurs
> while running 3.11.* and 3.10.* kernels, but not 3.4.* or 3.0.*.
>
> I couldn't find any changes to the rtl8187 module that explain this
> misbehavior, so I suspect it's some new source of parallelism in the
> mac80211 layer that has exposed the lack of synchronization in uses of
> rx_queue and b_tx_status.queue. Indeed, I found many uses of these
> queues that don't take locks to ensure consistency. Unfortunately,
> adding spin locks around all uses causes harder freezes and/or complaints
> about scheduling in atomic contexts, depending on which race I hit
> first. Without any changes, the problem I get most often is a crash
> within rtl8187b_status_cb, when skb_unlink attempts to dereference a
> NULL pointer. Testing skb->prev and skb->next before entering the
> branch where the skb is removed seemed to make the error a little bit
> less frequent, but surely not enough for the machine to remain up for
> very long while WiFi is enabled.
>
> Is this a known problem? Any suggestions on what I could try next to
> fix the problem?

No, the problem has not previously been reported. From your description of the
situation where it happens, the problem requires a lot of same-channel, same-AP
traffic. I will try to duplicate that condition here. Although I have an
RTL8187B device, I seldom use it as the case on the USB stick is falling apart.
I will need to do some repair on it so that it holds together.

After inspecting the code in rtl8187b_status_cb, I did notice that it does a lot
of things that should be done by mac80211. As you have been testing code
modifications, I assume that you will be able to test any patches that I generate.

Larry


2013-10-22 19:31:02

by Alexandre Oliva

Subject: Re: RTL8187B is racy

On Oct 22, 2013, Larry Finger <[email protected]> wrote:

> After inspecting the code in rtl8187b_status_cb, I did notice that it
> does a lot of things that should be done by mac80211. As you have been
> testing code modifications, I assume that you will be able to test any
> patches that I generate.

Yeah, I can easily build and test patches here, at my workplace at home.
The only catch is that the module already works here; it seems to fail
only in busier environments, which I only get into once a month or so.
As long as we're not in much of a hurry, I can have things set up so
that, whenever I hit the problem with the module as it is now, I have a
patched module handy to test. So, I'm looking forward to your patches
and/or suggestions on what else to try. Just please make sure you Cc:
me, so that I won't miss it.

Thanks a lot!

--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer