Dear all,
kernel 2.6.33-rc4
I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
2.6.32.3 works fine.
I believe that it is related to the network, because sometimes I can
actually log in (GNOME session), and as soon as I do something network
related it suddenly hangs hard; not even SysRq works anymore.
Interestingly it only happens at a specific AP where the ESSID is
hidden (at work). At home I can work without any problems (ESSID not
hidden).
Unfortunately I cannot set up a serial console or similar.
Is there anything else I can provide to help track this down?
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
POGES (pl.n.)
The lumps of dry powder that remain after cooking a packet soup.
--- Douglas Adams, The Meaning of Liff
On Sat, 2010-01-16 at 10:30 -0800, Norbert Preining wrote:
> On Fr, 15 Jan 2010, reinette chatre wrote:
> > > kernel 2.6.33-rc4
> > >
> > > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > > 2.6.32.3 works fine.
> > >
> > > I believe that it is related to the network, because sometimes I can
> > > actually log in (gnomes session) and as soon as I do some network
> > > related suddenly hard hang, not even Sysrq working anymore.
> > >
> > > Interestingly it only happens at a specific AP where the ESSID is
> > > hidden (at work). At home I can work without any problems (ESSID not
> > > hidden).
> > >
> > > Unfortunately I cannot set up a serial console or similar.
> >
> > Does that mean no netconsole either? Does anything show up in the logs?
> > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > that time and hopefully something will be captured in the logs when the
> > problem occurs.
>
> Before I can test this on monday, something else, I just got BUG_ON:
> Jan 17 03:28:58 mithrandir kernel: [34535.207253] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:0a:79:eb:56:10 tid = 0
> Jan 17 03:28:58 mithrandir kernel: [34535.331218] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=139, seq_idx=3435, seq=54960
> Jan 17 03:28:58 mithrandir kernel: [34535.331275] iwlagn 0000:06:00.0: Received BA when not expected
> Jan 17 03:28:58 mithrandir kernel: [34535.331816] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=146, seq_idx=3442, seq=55072
> Jan 17 03:28:58 mithrandir kernel: [34535.331915] iwlagn 0000:06:00.0: Received BA when not expected
> Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456
>
> Actually many many many of these lines.
>
What you are seeing here is currently being looked into at
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2098 - could you
please add your information there?
Reinette
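A note for readers decoding the log lines quoted above: in 802.11 the sequence control field carries a 4-bit fragment number in its low bits and a 12-bit sequence number above it, so the seq_idx printed in each message is simply seq >> 4 (for example 54960 >> 4 = 3435). The following is a minimal userspace C sketch of that arithmetic, not driver code. The function names are hypothetical, and the idea that the driver expects idx to equal the low 8 bits of the sequence number is an assumption inferred from the message text, not something confirmed in this thread.

```c
#include <stdint.h>

/*
 * 802.11 sequence control layout: bits 0-3 hold the fragment number,
 * bits 4-15 hold the sequence number.
 */
static inline uint16_t seq_to_sn(uint16_t seq_ctrl)
{
	return (uint16_t)((seq_ctrl & 0xFFF0u) >> 4);
}

/*
 * ASSUMPTION (not confirmed in this thread): the reported queue index
 * is expected to equal the low 8 bits of the sequence number.
 * Returns nonzero when idx and seq_ctrl agree under that assumption.
 */
static inline int idx_matches_seq(unsigned int idx, uint16_t seq_ctrl)
{
	return idx == (unsigned int)(seq_to_sn(seq_ctrl) & 0xFFu);
}
```

Under that assumption, the first logged pair (idx=139, seq=54960) does not agree: the sequence number 3435 has low byte 107, while the reported index is 139.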
Hi Reinette,
On Mo, 18 Jan 2010, reinette chatre wrote:
> > > Does that mean no netconsole either? Does anything show up in the logs?
> > > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > > that time and hopefully something will be captured in the logs when the
> > > problem occurs.
I tried it today, but had "real work" (university job) to do. It worked,
and I found out that it happened (up to now) *NOT* when I was only
pinging a server, but when I ssh-ed into my server it hung.
More testing tomorrow (here it is already 2am).
BTW, the logs were empty, unfortunately; complete hard hang.
> > Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456
> >
> > Actually many many many of these lines.
> >
>
> What you are seeing here is currently being looked into at
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2098 - could you
> please add your information there?
I did that, although I was not sure what information to provide.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
THURNBY (n.)
A rucked-up edge of carpet or linoleum which everyone says someone
will trip over and break a leg unless it gets fixed. After a year or
two someone trips over it and breaks a leg.
--- Douglas Adams, The Meaning of Liff
Hi Reinette,
On Fr, 15 Jan 2010, reinette chatre wrote:
> > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > 2.6.32.3 works fine.
> >
> > I believe that it is related to the network, because sometimes I can
> > actually log in (gnomes session) and as soon as I do some network
> > related suddenly hard hang, not even Sysrq working anymore.
> >
> > Interestingly it only happens at a specific AP where the ESSID is
> > hidden (at work). At home I can work without any problems (ESSID not
> > hidden).
> >
> > Unfortunately I cannot set up a serial console or similar.
>
> Does that mean no netconsole either? Does anything show up in the logs?
> Is it easy to reproduce? If so, perhaps you can have increased debug at
> that time and hopefully something will be captured in the logs when the
> problem occurs.
Ok, I can confirm that setting up the network is not the problem, nor
is pinging other hosts. But ssh-ing into another server
made it go boom. From the screenshot I attach, it looks like something
in the TCP code (which would explain why it does not happen with pings);
further down I see tcp_data_snd_check.
I managed to switch in time to a console with tail -f syslog before
it locked up hard. The log files are empty, but I got a photo of the screen
which has some hopefully useful information. I cannot scroll up or down
anymore ...
If you want me to create a bug report, or you create one in bugzilla,
I can also upload it there, but I attach it for now.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
SOLENT (adj.)
Descriptive of the state of serene self-knowledge reached through
drink.
--- Douglas Adams, The Meaning of Liff
Hi Norbert,
On Mon, 2010-01-18 at 21:47 -0800, Norbert Preining wrote:
> On Fr, 15 Jan 2010, reinette chatre wrote:
> > > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > > 2.6.32.3 works fine.
> > >
> > > I believe that it is related to the network, because sometimes I can
> > > actually log in (gnomes session) and as soon as I do some network
> > > related suddenly hard hang, not even Sysrq working anymore.
> > >
> > > Interestingly it only happens at a specific AP where the ESSID is
> > > hidden (at work). At home I can work without any problems (ESSID not
> > > hidden).
> > >
> > > Unfortunately I cannot set up a serial console or similar.
> >
> > Does that mean no netconsole either? Does anything show up in the logs?
> > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > that time and hopefully something will be captured in the logs when the
> > problem occurs.
>
> Ok, I can confirm that setting up the network is not the problem, nor
> is it pinging other hosts. But ssh-ing into another server
> made it go boom. From the screenshot I attach it looks like something
> in TCP code (that explains why it does not happen in pings), below
> I see tcp_data_snd_check
>
> I managed to swithc in time to a console with tail -f syslog before
> it hard locked up. The log files are empty, but I got a screenshot photo
> which has some hopefully useful information. I cannot scroll up or down
> anymore ...
>
> If you want me to create a bug report or you create one in bugzilla,
> I can also upload it htere, but I attach it for now.
I see that it fails in skb_pull after being called from one of the RX
handlers. Let's add Johannes.
Johannes, does anything perhaps look familiar to you in this trace?
Thank you
Reinette
On Tue, 2010-01-19 at 09:01 -0800, reinette chatre wrote:
> > If you want me to create a bug report or you create one in bugzilla,
> > I can also upload it htere, but I attach it for now.
>
> I see that it fails in skb_pull after being called from one of the RX
> handlers. Let's add Johannes.
>
> Johannes, does anything perhaps look familiar to you in this trace?
Sorry, no, seems weird. The trace is not very useful unfortunately, is
this with CONFIG_FRAME_POINTER?
johannes
On Di, 19 Jan 2010, Johannes Berg wrote:
> Sorry, no, seems weird. The trace is not very useful unfortunately, is
> this with CONFIG_FRAME_POINTER?
# CONFIG_FRAME_POINTER is not set
Do you need it?
Other things for the .config needed?
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
GREAT TOSSON (n.)
A fat book containing four words and six cartoons which cost £6.95.
--- Douglas Adams, The Meaning of Liff
On Mon, 2010-01-18 at 22:47 -0700, Norbert Preining wrote:
> Hi Reinette,
>
> On Fr, 15 Jan 2010, reinette chatre wrote:
> > > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > > 2.6.32.3 works fine.
> > >
> > > I believe that it is related to the network, because sometimes I can
> > > actually log in (gnomes session) and as soon as I do some network
> > > related suddenly hard hang, not even Sysrq working anymore.
> > >
> > > Interestingly it only happens at a specific AP where the ESSID is
> > > hidden (at work). At home I can work without any problems (ESSID not
> > > hidden).
> > >
> > > Unfortunately I cannot set up a serial console or similar.
> >
> > Does that mean no netconsole either? Does anything show up in the logs?
> > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > that time and hopefully something will be captured in the logs when the
> > problem occurs.
>
> Ok, I can confirm that setting up the network is not the problem, nor
> is it pinging other hosts. But ssh-ing into another server
> made it go boom. From the screenshot I attach it looks like something
> in TCP code (that explains why it does not happen in pings), below
> I see tcp_data_snd_check
>
> I managed to swithc in time to a console with tail -f syslog before
> it hard locked up. The log files are empty, but I got a screenshot photo
> which has some hopefully useful information. I cannot scroll up or down
> anymore ...
Looks like this is the BUG_ON in skb_pull. Could you please check whether
this patch helps? BTW, are you using swiotlb?
diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c
index 6f36b6e..2f8978f 100644
--- a/drivers/net/wireless/iwlwifi/iwl-rx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-rx.c
@@ -1031,6 +1031,11 @@ void iwl_rx_reply_rx(struct iwl_priv *priv,
 		return;
 	}
+	if (len < ieee80211_hdrlen(header->frame_control)) {
+		IWL_DEBUG_RX(priv, "Packet size is too small %d\n", len);
+		return;
+	}
+
 	/* This will be used in several places later */
 	rate_n_flags = le32_to_cpu(phy_res->rate_n_flags);
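To illustrate the idea behind the patch above: it rejects a received frame whose length is smaller than its own 802.11 header before any later code tries to strip that header. The following is a minimal userspace sketch of the same guard, assuming a toy buffer type. `struct toy_buf` and `toy_pull` are hypothetical names invented for illustration; this is not the kernel's `struct sk_buff` or `skb_pull`, and it does not reproduce their exact behaviour.

```c
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer; NOT the kernel's sk_buff. */
struct toy_buf {
	const unsigned char *data;	/* start of the remaining payload */
	size_t len;			/* bytes remaining from data onwards */
};

/*
 * Strip hdr_len bytes from the front of the buffer.
 * Returns the new data pointer, or NULL (leaving the buffer untouched)
 * when the buffer is shorter than the header it claims to carry.
 */
static inline const unsigned char *toy_pull(struct toy_buf *b, size_t hdr_len)
{
	if (hdr_len > b->len)
		return NULL;	/* too short: refuse instead of underflowing */
	b->data += hdr_len;
	b->len -= hdr_len;
	return b->data;
}
```

With a 10-byte buffer, asking to pull a 24-byte header returns NULL and leaves the length at 10, which is the kind of early rejection the patch adds at the driver level.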
On Wed, 2010-01-20 at 01:36 +0100, Norbert Preining wrote:
> On Di, 19 Jan 2010, Johannes Berg wrote:
> > Sorry, no, seems weird. The trace is not very useful unfortunately, is
> > this with CONFIG_FRAME_POINTER?
>
> # CONFIG_FRAME_POINTER is not set
>
> Do you need it?
The stacktrace would be a lot more useful with it set, yes. Other than
that, I don't know. If there's a way to make your display resolution
higher that might be useful so more info fits on the screen, or maybe
trimming the stack trace depth (though I don't know if that's possible,
I do know it is on powerpc because I added it there but not sure on x86)
All assuming you can reproduce this issue, of course.
johannes
Dear all,
On Mi, 20 Jan 2010, Zhu Yi wrote:
> Looks like this this is the BUG_ON in skb_pull. Please try if this patch
> help? BTW, are you using swiotlb?
On Mi, 20 Jan 2010, Johannes Berg wrote:
> > # CONFIG_FRAME_POINTER is not set
>
> The stacktrace would be a lot more useful with it set, yes. Other than
> that, I don't know. If there's a way to make your display resolution
> higher that might be useful so more info fits on the screen, or maybe
> trimming the stack trace depth (though I don't know if that's possible,
> I do know it is on powerpc because I added it there but not sure on x86)
>
> All assuming you can reproduce this issue, of course.
@Zhu: the patch didn't help. I patched it into the kernel and also enabled
CONFIG_FRAME_POINTER, which led to the same hang (not surprisingly, since the
patch only adds more debug output ;-)
This time, unfortunately, there was too much output to actually capture it.
@Johannes: 100% reproducible. Every time I boot into 33-rc4 and ssh into
any remote place it goes boom. 100%.
Maybe another tidbit might help: with 2.6.32.3 I sometimes have
hiccups with WLAN:
[ 996.514491] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:24:c4:ab:bb:42 tid = 0
and the connection needs 10-20 secs (hard to guess) until it is
alive again.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
AITH (n.)
The single bristle that sticks out sideways on a cheap paintbrush.
--- Douglas Adams, The Meaning of Liff
Do you have a point and shoot camera that can shoot video? I've used
one in the past to capture debug info that scrolls by too quickly.
John
On Wed, Jan 20, 2010 at 4:28 PM, Norbert Preining <[email protected]> wrote:
> Dear all,
>
> On Mi, 20 Jan 2010, Zhu Yi wrote:
>> Looks like this this is the BUG_ON in skb_pull. Please try if this patch
>> help? BTW, are you using swiotlb?
>
> On Mi, 20 Jan 2010, Johannes Berg wrote:
>> > # CONFIG_FRAME_POINTER is not set
>>
>> The stacktrace would be a lot more useful with it set, yes. Other than
>> that, I don't know. If there's a way to make your display resolution
>> higher that might be useful so more info fits on the screen, or maybe
>> trimming the stack trace depth (though I don't know if that's possible,
>> I do know it is on powerpc because I added it there but not sure on x86)
>>
>> All assuming you can reproduce this issue, of course.
>
>
> @Zhu: the patch didn't help. I patched it into the kernel plus activated
> CONFIG_FRAME_POINTER which led to the same hang (not surprisingly, the
> patch does only debug more ;-)
>
> This time unfortunately I there was too much output to actually capture it.
>
> @Johannes: 100% reproducible. Everytime I boot into 33-rc4 and ssh into
> any remote place it goes boom. 100%.
>
> Maybe another tidbig might help: With 2.6.32.3 it happens that I have
> hickups with WLAN:
> [ 996.514491] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:24:c4:ab:bb:42 tid = 0
> and the connections needs 10-20secs (hard to guess) until it is
> back alive.
>
> Best wishes
>
> Norbert
Hi everyone,
On Mi, 20 Jan 2010, Zhu Yi wrote:
> Looks like this this is the BUG_ON in skb_pull. Please try if this patch
> help? BTW, are you using swiotlb?
As I said, no, it does not help.
I am currently running 2.6.33-rc5, and the bug is 100% reproducible
at my workplace.
Is there anything more I can do?
Should we move this to a bugzilla entry?
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
CAMER (n.)
A mis-tossed caber.
--- Douglas Adams, The Meaning of Liff
On Wed, 2010-01-27 at 07:37 -0800, Norbert Preining wrote:
> Should we move that to a bugzilla entry?
>
Please do. Thank you very much
Reinette
On Mi, 27 Jan 2010, reinette chatre wrote:
> > Should we move that to a bugzilla entry?
> >
>
> Please do. Thank you very much
Done that,
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2155
Should that be a bug in the kernel bugzilla as regression, too?
I mean 2.6.32.N does not suffer from that.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
OSHKOSH (n., vb.)
The noise made by someone who has just been grossly flattered and is
trying to make light of it.
--- Douglas Adams, The Meaning of Liff
On Thu, 2010-01-28 at 00:41 -0800, Norbert Preining wrote:
> Done that,
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2155
Thank you.
>
> Should that be a bug in the kernel bugzilla as regression, too?
> I mean 2.6.32.N does not suffer from that.
It is easier for our team to track bugs in our bugzilla. That is where
we will be working on resolving this issue. If you would like to submit
a kernel bugzilla entry for the purpose of tracking a regression, you are
welcome to. Please then use it only for that purpose and point people to
our bugzilla for the details and latest status of this issue.
Thank you
Reinette
Hi Norbert,
On Fri, 2010-01-15 at 07:22 -0800, Norbert Preining wrote:
> kernel 2.6.33-rc4
>
> I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> 2.6.32.3 works fine.
>
> I believe that it is related to the network, because sometimes I can
> actually log in (gnomes session) and as soon as I do some network
> related suddenly hard hang, not even Sysrq working anymore.
>
> Interestingly it only happens at a specific AP where the ESSID is
> hidden (at work). At home I can work without any problems (ESSID not
> hidden).
>
> Unfortunately I cannot set up a serial console or similar.
Does that mean no netconsole either? Does anything show up in the logs?
Is it easy to reproduce? If so, perhaps you can have increased debug at
that time and hopefully something will be captured in the logs when the
problem occurs.
>
> Is there still anything else I can provide you for tracking that down.
Can you try to boot without X and attempt a command line association
(using iw, iwconfig or wpa_supplicant) to reproduce?
Reinette
On Fr, 15 Jan 2010, reinette chatre wrote:
> Can you try to boot without X and attempt a command line association
> (using iw, iwconfig or wpa_supplicant) to reproduce?
I will try on Monday, when I am back at work.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
PIDDLETRENTHIDE (n.)
A trouser stain caused by a wimbledon (q.v.). Not to be confused with
a botley (q.v.)
--- Douglas Adams, The Meaning of Liff
On Fr, 15 Jan 2010, reinette chatre wrote:
> > kernel 2.6.33-rc4
> >
> > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > 2.6.32.3 works fine.
> >
> > I believe that it is related to the network, because sometimes I can
> > actually log in (gnomes session) and as soon as I do some network
> > related suddenly hard hang, not even Sysrq working anymore.
> >
> > Interestingly it only happens at a specific AP where the ESSID is
> > hidden (at work). At home I can work without any problems (ESSID not
> > hidden).
> >
> > Unfortunately I cannot set up a serial console or similar.
>
> Does that mean no netconsole either? Does anything show up in the logs?
> Is it easy to reproduce? If so, perhaps you can have increased debug at
> that time and hopefully something will be captured in the logs when the
> problem occurs.
Before I can test this on Monday, something else: I just got a BUG_ON:
Jan 17 03:28:58 mithrandir kernel: [34535.207253] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:0a:79:eb:56:10 tid = 0
Jan 17 03:28:58 mithrandir kernel: [34535.331218] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=139, seq_idx=3435, seq=54960
Jan 17 03:28:58 mithrandir kernel: [34535.331275] iwlagn 0000:06:00.0: Received BA when not expected
Jan 17 03:28:58 mithrandir kernel: [34535.331816] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=146, seq_idx=3442, seq=55072
Jan 17 03:28:58 mithrandir kernel: [34535.331915] iwlagn 0000:06:00.0: Received BA when not expected
Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456
Actually many many many of these lines.
Best wishes
Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
LARGOWARD (n.)
Motorists' name for the kind of pedestrian who stands beside a main
road and waves on the traffic, as if it's their right of way.
--- Douglas Adams, The Meaning of Liff