2011-05-01 18:03:10

by Shawn Nock

Subject: rtlwifi: regression 39-rc5 (rtl8192ce)


--
Shawn Nock (OpenPGP: 0x8132E623)


Attachments:
rtlwifi_bug.txt (1.64 kB)
rtlwifi bug backtrace
dmesg-39rc5.txt (63.29 kB)
dmesg from affected host

2011-05-01 20:12:18

by Larry Finger

Subject: Re: rtlwifi: regression 39-rc5 (rtl8192ce)

On 05/01/2011 12:59 PM, Shawn Nock wrote:
>
> During heavy network traffic (esp. flash video) I see the attached NULL
> pointer dereference in 2.6.39-rc5 on an IBM Thinkpad x120e. Immediately
> afterward a "scheduling while atomic" bug is reported and the system
> becomes unresponsive. I am unable to produce this problem in 2.6.38.4.
>
> See attached backtrace and dmesg. Please let me know what I can collect
> to make this problem easier to troubleshoot.

Could you also supply the instruction byte sequence for the oops? It is the
"Code:" line of the dump.
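
If the oops text was saved to a file (for example from dmesg or a serial console), the "Code:" line can be pulled out mechanically. The following is only a minimal sketch under that assumption; the function names `extract_code_lines` and `faulting_offset` are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Matches the "Code:" line of an oops, with or without a dmesg timestamp prefix.
CODE_RE = re.compile(r"\bCode:\s*(.+)$")


def extract_code_lines(log_text):
    """Return the byte sequence from every 'Code:' line found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = CODE_RE.search(line)
        if match:
            results.append(match.group(1).strip())
    return results


def faulting_offset(code_bytes):
    """Return the index of the byte wrapped in <..>, or None if no byte is marked.

    The kernel wraps the first byte of the faulting instruction in angle brackets.
    """
    for index, token in enumerate(code_bytes.split()):
        if token.startswith("<") and token.endswith(">"):
            return index
    return None
```

Passing the contents of the saved log to `extract_code_lines` yields the byte strings to paste into a reply; `faulting_offset` shows where the faulting instruction starts within that sequence.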

I think you are using a 32-bit system. Is that correct?

Is there one URL that exposes this problem? If so, please send that as well.

Bisection is not likely to be of much help with this problem. Between 2.6.38 and
2.6.39, driver rtl8192cu was added. It shares a lot of code with rtl8192ce, thus
an extensive reorganization took place.

Larry

2011-05-02 14:29:09

by Shawn Nock

Subject: Re: rtlwifi: regression 39-rc5 (rtl8192ce)

Larry Finger <[email protected]> writes:

> On 05/01/2011 12:59 PM, Shawn Nock wrote:
>> During heavy network traffic (esp. flash video) I see the attached
>> NULL pointer dereference in 2.6.39-rc5 on an IBM Thinkpad
>> x120e. Immediately afterward a "scheduling while atomic" bug is
>> reported and the system becomes unresponsive. I am unable to produce
>> this problem in 2.6.38.4.
>> See attached backtrace and dmesg. Please let me know what I can
>> collect to make this problem easier to troubleshoot.
>
> Could you also supply the instruction byte sequence for the oops? It
> is the "Code:" line of the dump.

I had not been able to capture that, as the scheduler oops was triggered
during the call stack dump (before the Code line).

> I think you are using a 32-bit system. Is that correct?

Correct.

> Is there one URL that exposes this problem? If so, please send that as
> well.

It had been speedtest.net (which *used* to crash this all the
time). I've since re-configured the kernel to allow it to build faster
(fedora .config trimmed to remove a lot of the unneeded modules). I can
no longer trigger this BUG.

I originally thought that this may be because I enabled the 8192CU
driver, but a clean kernel without it also doesn't trigger the bug. My new
hypothesis is that building the kernel for the AMD processor type was a
mistake (the only other significant change was that I went back to
Pentium Pro as the CPU target).
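
To check which options actually changed between two builds, the two .config files can be compared directly. This is a minimal sketch, assuming both files are available as text; `parse_config` and `diff_configs` are hypothetical helper names. In the usage note, `CONFIG_M686` is the Pentium Pro processor family option, while `CONFIG_MK8` is used purely for illustration, since the thread does not say which AMD option was selected.

```python
def parse_config(text):
    """Map option name -> value for every 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' are comments and are skipped,
    so a disabled option is simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for options whose value differs.

    None stands for 'not set' on that side of the comparison.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

For instance, if the old file had `CONFIG_MK8=y` and the new one has `CONFIG_M686=y` instead, `diff_configs` reports both options, with `None` on the side where each is not set.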

As far as I can tell this problem is resolved (20+ hours without issue,
less than 20 min previously) and was probably caused by GCC not being
aware of this (newish) AMD processor.

Sorry for the false alarm and thanks,
Shawn

--
Shawn Nock (OpenPGP: 0x8132E623)

2011-05-02 14:49:12

by Larry Finger

Subject: Re: rtlwifi: regression 39-rc5 (rtl8192ce)

On 05/02/2011 09:26 AM, Shawn Nock wrote:
> Larry Finger <[email protected]> writes:
>
>> On 05/01/2011 12:59 PM, Shawn Nock wrote:
>>> During heavy network traffic (esp. flash video) I see the attached
>>> NULL pointer dereference in 2.6.39-rc5 on an IBM Thinkpad
>>> x120e. Immediately afterward a "scheduling while atomic" bug is
>>> reported and the system becomes unresponsive. I am unable to produce
>>> this problem in 2.6.38.4.
>>> See attached backtrace and dmesg. Please let me know what I can
>>> collect to make this problem easier to troubleshoot.
>>
>> Could you also supply the instruction byte sequence for the oops? It
>> is the "Code:" line of the dump.
>
> I had not been able to capture that, as the scheduler oops was triggered
> during the call stack dump (before the Code line).
>
>> I think you are using a 32-bit system. Is that correct?
>
> Correct.
>
>> Is there one URL that exposes this problem? If so, please send that as
>> well.
>
> It had been speedtest.net (which *used* to crash this all the
> time). I've since re-configured the kernel to allow it to build faster
> (fedora .config trimmed to remove a lot of the unneeded modules). I can
> no longer trigger this BUG.
>
> I originally thought that this may be because I enabled the 8192CU
> driver, but a clean kernel without it also doesn't trigger the bug. My new
> hypothesis is that building the kernel for the AMD processor type was a
> mistake (the only other significant change was that I went back to
> Pentium Pro as the CPU target).
>
> As far as I can tell this problem is resolved (20+ hours without issue,
> less than 20 min previously) and was probably caused by GCC not being
> aware of this (newish) AMD processor.
>
> Sorry for the false alarm and thanks,

No problem. Let me know if it resurfaces.

Larry