2010-06-02 17:51:30

by Reinette Chatre

Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock

On Mon, 2010-05-31 at 13:12 -0700, Nils Radtke wrote:

> This line indicates the first timestamp _after_ the crash:
> May 31 17:35:19 localhost kernel: [ 69.488456]
>
> The crash happened after site A and on site B. Just arrived, opened lid and *crash*.
>
> I noticed in iwl-agn-rs.c:2080:
> BUG_ON(window->average_tpt != ((window->success_ratio *
> tbl->expected_tpt[index] + 64) / 128));
> Could that be again the point that hit me today when the machine crashed once?
> Would you mind changing this into a milder WARN? That way I wouldn't hit the wall
> that hard. And I would notice it anyway while skimming the logs as we still are on the
> hunt. It's more maintainable if it's a WARN in the src instead of me patching it w/ any
> update..
>
> Wasn't this BUG_ON a WARNING in .33.3? (didn't check..)

Seems like you performed the testing without the patch that we used to
address the hang issue from the beginning of this thread. Please see
http://marc.info/?l=linux-wireless&m=127290931304496&w=2 - that thread
also explains why the patch is not in 2.6.34.

I think it is time to move this discussion to a bug report so that it
can be tracked better. Please open a new bug at
http://bugzilla.intellinuxwireless.org/

Reinette



2010-06-04 16:57:40

by Nils Radtke

Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock

Hi Reinette,

BTW, this:
Jun 3 12:05:43 localhost kernel: [174170.391756] iwlagn 0000:03:00.0:
TX Power requested while scanning!
happened even w/o toggling the radio switch, so it does not seem to be
tied to the switch alone.

On mer 2010-06-02 @ 10-51-25 -0700, reinette chatre wrote:
# On Mon, 2010-05-31 at 13:12 -0700, Nils Radtke wrote:
#
# > This line indicates the first timestamp _after_ the crash:
# > May 31 17:35:19 localhost kernel: [ 69.488456]
# >
# > The crash happened after site A and on site B. Just arrived, opened lid and *crash*.
# >
# > I noticed in iwl-agn-rs.c:2080:
# > BUG_ON(window->average_tpt != ((window->success_ratio *
# > tbl->expected_tpt[index] + 64) / 128));
# > Could that be again the point that hit me today when the machine crashed once?
# > Would you mind changing this into a milder WARN? That way I wouldn't hit the wall
# > that hard. And I would notice it anyway while skimming the logs as we still are on the
# > hunt. It's more maintainable if it's a WARN in the src instead of me patching it w/ any
# > update..
# >
# > Wasn't this BUG_ON a WARNING in .33.3? (didn't check..)
#
# Seems like you performed the testing without the patch that we used to
# address the hang issue from the beginning of this thread. Please see
Indeed, that's what it feels like. That one is just so annoying..
You can't work w/ the kernel drivers. That's a shame.
BTW, if the patch for the BUG_ON has been in the kernel src since 2.6.28, that might
explain a lot of earlier crashes that I was never able to track down.
Even more so as in those days I didn't have a chance to do more on this. Unlike now.

# http://marc.info/?l=linux-wireless&m=127290931304496&w=2 - that thread
# also explains why the patch is not in 2.6.34.
It should definitely and absolutely be merged (change the BUG_ON into WARNING).
Even if, like hypothesized, the bug is hidden elsewhere, a BUG_ON doesn't get
me far, it's killing every chance to advance to a solution. How am I supposed
to investigate w/ the kernel crashing? BTW, I don't like working w/ a Linux
kernel that kills my work regularly, I think that's understandable. If I needed
a break from work, I'd set an alarm.
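
To illustrate the distinction argued above, here is a minimal user-space sketch in C, not the driver code. It reuses only the arithmetic of the quoted check (adding 64 before dividing by 128 rounds to the nearest integer). The names SOFT_WARN_ON, soft_warn_on, expected_average_tpt and check_window are hypothetical stand-ins invented for this example; the kernel's own BUG_ON and WARN_ON macros are not available outside the kernel.

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for a warning-style check:
 * it logs the failed condition and returns it, so the caller can
 * recover instead of halting (which is what a BUG-style check does). */
static int soft_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    return cond;
}
#define SOFT_WARN_ON(cond) soft_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Same arithmetic as the quoted check: multiply, then divide by 128,
 * with +64 so that the integer division rounds to nearest. */
static int expected_average_tpt(int success_ratio, int expected_tpt)
{
    return (success_ratio * expected_tpt + 64) / 128;
}

/* Returns 0 when the stored average matches the recomputed value,
 * -1 (after logging a warning) when it does not. */
static int check_window(int average_tpt, int success_ratio, int expected_tpt)
{
    if (SOFT_WARN_ON(average_tpt !=
                     expected_average_tpt(success_ratio, expected_tpt)))
        return -1;
    return 0;
}
```

With this shape, an inconsistent window produces a log line and an error return that the caller can act on, e.g. check_window(49, 64, 100) returns -1 while the program keeps running.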

I've seen a bug report on this issue in the Red Hat bug tracker referencing my remark
that this BUG_ON only gets hit w/ Cisco APs. There's a wide range of AP manufacturers
out there in the city. But only Cisco APs are crashing this driver. Admittedly, only
at one single location, but it's a Cisco all the same. Always the same MAC, unless they
tend to reassign MAC addresses, though..

I think it's a tough one, if an AP is able to crash the driver.

I haven't yet received a comment of yours regarding my many other questions in
my previous message. I am willing to help investigate more, assist in other ways
than testing only (always only doing testing isn't a way to keep up fun..)

# I think it is time to move this discussion to a bug report so that it
# can be tracked better. Please open a new bug at
# http://bugzilla.intellinuxwireless.org/
As you wish. It's probably a good idea. But I'm still waiting for the registration
mail from bz; I registered yesterday.

So, please see to it that the patch turning the BUG_ON into a
WARNING finds its way back in.

Thank you very much,


Nils Radtke

2010-06-08 17:46:32

by Reinette Chatre

Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock

On Fri, 2010-06-04 at 09:57 -0700, Nils Radtke wrote:
> I haven't yet received a comment of yours regarding my many other questions in
> my previous message. I am willing to help investigate more, assist in other ways
> than testing only (always only doing testing isn't a way to keep up fun..)

Your messages contain references to many issues and it is becoming
increasingly hard to keep track of them all in a single email thread.
Since the system crash is clearly the big issue I would like to focus on
that and get that resolved. This is why I proposed that you create bug
reports to help track your various issues better.

Reinette

2010-06-10 14:22:58

by Nils Radtke

Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock


Hi Reinette,

Thanks for your message.

Yes, you're right about the multiple bugs one thread thing.

I only got registered w/ the wireless ml today because the
system simply did not send me a registration message.

For the bug reports to be created it will take me some time.
I'll firstly report the main issue, the 2 other ones afterwards.
Would it be ok cross referencing i.e. to the log and such
between the reports?

Should I paste all the mail messages in separate report messages
(belonging to one bug report, of course) or should I paste some
links to the thread?

Cheers,

Nils

@John: Yes, you're right, but the 2.6.33.4 tree I have here still
has the BUG_ON in.


On Tue 2010-06-08 @ 10-46-29AM -0700, reinette chatre wrote:
# On Fri, 2010-06-04 at 09:57 -0700, Nils Radtke wrote:
# > I haven't yet received a comment of yours regarding my many other questions in
# > my previous message. I am willing to help investigate more, assist in other ways
# > than testing only (always only doing testing isn't a way to keep up fun..)
#
# Your messages contain references to many issues and it is becoming
# increasingly hard to keep track of them all in a single email thread.
# Since the system crash is clearly the big issue I would like to focus on
# that and get that resolved. This is why I proposed that you create bug
# reports to help track your various issues better.
#
# Reinette
#
#


2010-06-10 16:19:34

by Reinette Chatre

Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock

On Thu, 2010-06-10 at 07:22 -0700, Nils Radtke wrote:
> For the bug reports to be created it will take me some time.
> I'll firstly report the main issue, the 2 other ones afterwards.

Sounds great. Thanks

> Would it be ok cross referencing i.e. to the log and such
> between the reports?
> Should I paste all the mail messages in separate report messages
> (belonging to one bug report, of course) or should I paste some
> links to the thread?

I find it most convenient if all information related to the bug is
contained in the bug report. Links can be used.

Reinette