[I am not sure whether this is ieee80211 or iwl3945 issue so put both
maintainers into loop]
Hi,
I have just noticed that my CPU0 is hogged by del_timer_sync called from
iwl3945_rs_free_sta and it doesn't seem to be able to do any progress.
Two consequent sysrq+t show that:
kworker/u:23 R running 0 2190 2 0x00000000
cdc39dec c046e940 00000000 c06cfa40 c06cfa40 eebbdf1c 00003f38 00000000
00000000 f4df3200 e22e8000 eef8c800 00003f38 cdc39da8 c0156944 00000000
f62029c0 cdc39de4 c014d8bd 00000000 eebbc346 00003f38 00000046 f62025c0
Call Trace:
[<c046e940>] ? __schedule+0x86a/0x893
[<c0156944>] ? tick_program_event+0x24/0x29
[<c014d8bd>] ? hrtimer_interrupt+0x120/0x1b2
[<c012a7dd>] ? get_parent_ip+0xb/0x31
[<c0472b69>] ? add_preempt_count+0x95/0x98
[<c01161d2>] ? smp_apic_timer_interrupt+0x6b/0x7e
[<c0470a0e>] ? apic_timer_interrupt+0x2a/0x30
[<c013d8ff>] ? lock_timer_base.isra.29+0x14/0x41
[<c013d948>] ? try_to_del_timer_sync+0x1c/0xb9
[<c013da16>] ? del_timer_sync+0x31/0x3c
[<f813a431>] ? iwl3945_rs_free_sta+0x10/0x12 [iwl3945]
[<f81c2f24>] ? __sta_info_free.isra.24+0x25/0x38 [mac80211]
[<f81c3cb4>] ? __sta_info_destroy+0x239/0x260 [mac80211]
[<f81c404c>] ? sta_info_flush+0x43/0x6e [mac80211]
[<f81c92e3>] ? ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
[<f81c9478>] ? ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
[<c031136f>] ? dev_printk+0x2b/0x2d
[<f81cbb42>] ? ieee80211_sta_work+0x113/0x126 [mac80211]
[<f81ce0f9>] ? ieee80211_iface_work+0x23e/0x250 [mac80211]
[<c0146673>] ? process_one_work+0x19b/0x2e6
[<f81cdebb>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
[<c0146ba1>] ? worker_thread+0x136/0x1ee
[<c0146a6b>] ? manage_workers.isra.23+0x14f/0x14f
[<c0149eb9>] ? kthread+0x67/0x6c
[<c0149e52>] ? kthread_worker_fn+0x119/0x119
[<c04755b6>] ? kernel_thread_helper+0x6/0x10
and
kworker/u:23 R running 0 2190 2 0x00000000
00000000 0001103c 00000000 00000000 cdc39d7c c0115c2f cdc39d98 c0155b4c
00000001 f62025c0 f62025c0 9f8a8400 00003f62 cdc39da8 c0156944 00000000
f62029c0 cdc39de4 c014d8bd 00000000 9f4d7f93 00003f62 00000046 f62025c0
Call Trace:
[<c0115c2f>] ? lapic_next_event+0x1b/0x1f
[<c0155b4c>] ? clockevents_program_event+0xea/0x108
[<c0156944>] ? tick_program_event+0x24/0x29
[<c014d8bd>] ? hrtimer_interrupt+0x120/0x1b2
[<c012a7dd>] ? get_parent_ip+0xb/0x31
[<c0472ac7>] ? sub_preempt_count+0x81/0x8e
[<c0138913>] ? irq_exit+0x90/0x92
[<c01161d7>] ? smp_apic_timer_interrupt+0x70/0x7e
[<c0470a0e>] ? apic_timer_interrupt+0x2a/0x30
[<c013d8fb>] ? lock_timer_base.isra.29+0x10/0x41
[<c013d948>] ? try_to_del_timer_sync+0x1c/0xb9
[<c013da16>] ? del_timer_sync+0x31/0x3c
[<f813a431>] ? iwl3945_rs_free_sta+0x10/0x12 [iwl3945]
[<f81c2f24>] ? __sta_info_free.isra.24+0x25/0x38 [mac80211]
[<f81c3cb4>] ? __sta_info_destroy+0x239/0x260 [mac80211]
[<f81c404c>] ? sta_info_flush+0x43/0x6e [mac80211]
[<f81c92e3>] ? ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
[<f81c9478>] ? ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
[<c031136f>] ? dev_printk+0x2b/0x2d
[<f81cbb42>] ? ieee80211_sta_work+0x113/0x126 [mac80211]
[<f81ce0f9>] ? ieee80211_iface_work+0x23e/0x250 [mac80211]
[<c0146673>] ? process_one_work+0x19b/0x2e6
[<f81cdebb>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
[<c0146ba1>] ? worker_thread+0x136/0x1ee
[<c0146a6b>] ? manage_workers.isra.23+0x14f/0x14f
[<c0149eb9>] ? kthread+0x67/0x6c
[<c0149e52>] ? kthread_worker_fn+0x119/0x119
[<c04755b6>] ? kernel_thread_helper+0x6/0x10
Looks we are stuck in lock_timer_base loop. I am not familiar with the
code but either base is NULL or we are racing with timer->base changes.
I guess that the first one sounds more probable. The timer in question
is &rs_sta->rate_scale_flush.
The disassoc came after resume from RAM when the network was gone (I
have moved to the office from home). I didn't see this issue with 3.1
kernel and early 3.20rc* were too unstable on my laptop so I had to skip
them.
config is attached.
Let me know if you need any further information.
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Wed 30-11-11 15:23:16, Stanislaw Gruszka wrote:
> On Wed, Nov 30, 2011 at 11:10:28AM +0100, Michal Hocko wrote:
> > On Tue 29-11-11 12:39:07, Stanislaw Gruszka wrote:
> > > On Tue, Nov 29, 2011 at 11:07:27AM +0100, Michal Hocko wrote:
> > > > [I am not sure whether this is ieee80211 or iwl3945 issue so put both
> > > > maintainers into loop]
> > > The only changed we had in iwlegacy between 3.1 and 3.2-rc, was only
> > > adjustment to mac80211 changes. However I think this is iwlegacy issue,
> > > just for some reason bug did not trigger before.
> >
> > I have double checked 3.1 and cannot reproduce it.
> > Anyway, I have put:
> >
> > diff --git a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> > index 8faeaf2..9221ed4 100644
> > --- a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> > +++ b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> > @@ -432,6 +432,7 @@ static void iwl3945_rs_free_sta(void *iwl_priv, struct ieee80211_sta *sta,
> > * to use iwl_priv to print out debugging) since it may not be fully
> > * initialized at this point.
> > */
> > + printk("XXX: deleting time: %x\n", rs_sta->rate_scale_flush.base);
> > del_timer_sync(&rs_sta->rate_scale_flush);
> > }
> >
> > And the timer base is really NULL when the issue happens. So, somebody
> > probably removed the timer already?
>
> I think we call rs_ops->free_sta without rs_ops->alloc_sta, otherwise
> I don't know how it could be NULL in iwl3945_rs_free_sta (excluding memory
> corruption or bug in timer internals).
>
> I suspect this could be a regression introduced by commit:
>
> commit 07ba55d7f1d0da174c9bc545c713b44cee760197
> Author: Arik Nemtsov <[email protected]>
> Date: Wed Sep 28 14:12:53 2011 +0300
>
> nl80211/mac80211: allow adding TDLS peers as stations
>
> I'm attaching patch with revert of relevant hunk, because full revert
> would be hard currently. Does it workaround problem for you?
No, didn't help unfortunately...
>
> > > Is this problem 100% reproducible for you ?
> >
> > Yes, it seems to be sufficient to suspend to RAM while associated and
> > turn off the AP before waking up the machine.
> > I wasn't able to reproduce just by turning of the AP while associated
> > without suspend.
>
> I'm not able to recreate, but I'm not using your config as my system
> user-space has problem to startup with it :-(
>
> Stanislaw
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index b1b1bb3..f773dbb 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -1144,9 +1144,8 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
> changed |= BSS_CHANGED_BSSID | BSS_CHANGED_HT;
> ieee80211_bss_info_change_notify(sdata, changed);
>
> - /* remove AP and TDLS peers */
> if (remove_sta)
> - sta_info_flush(local, sdata);
> + sta_info_destroy_addr(sdata, bssid);
>
> del_timer_sync(&sdata->u.mgd.conn_mon_timer);
> del_timer_sync(&sdata->u.mgd.bcn_mon_timer);
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Wed, Nov 30, 2011 at 11:10:28AM +0100, Michal Hocko wrote:
> On Tue 29-11-11 12:39:07, Stanislaw Gruszka wrote:
> > On Tue, Nov 29, 2011 at 11:07:27AM +0100, Michal Hocko wrote:
> > > [I am not sure whether this is ieee80211 or iwl3945 issue so put both
> > > maintainers into loop]
> > The only changed we had in iwlegacy between 3.1 and 3.2-rc, was only
> > adjustment to mac80211 changes. However I think this is iwlegacy issue,
> > just for some reason bug did not trigger before.
>
> I have double checked 3.1 and cannot reproduce it.
> Anyway, I have put:
>
> diff --git a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> index 8faeaf2..9221ed4 100644
> --- a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> +++ b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
> @@ -432,6 +432,7 @@ static void iwl3945_rs_free_sta(void *iwl_priv, struct ieee80211_sta *sta,
> * to use iwl_priv to print out debugging) since it may not be fully
> * initialized at this point.
> */
> + printk("XXX: deleting time: %x\n", rs_sta->rate_scale_flush.base);
> del_timer_sync(&rs_sta->rate_scale_flush);
> }
>
> And the timer base is really NULL when the issue happens. So, somebody
> probably removed the timer already?
I think we call rs_ops->free_sta without rs_ops->alloc_sta, otherwise
I don't know how it could be NULL in iwl3945_rs_free_sta (excluding memory
corruption or bug in timer internals).
I suspect this could be a regression introduced by commit:
commit 07ba55d7f1d0da174c9bc545c713b44cee760197
Author: Arik Nemtsov <[email protected]>
Date: Wed Sep 28 14:12:53 2011 +0300
nl80211/mac80211: allow adding TDLS peers as stations
I'm attaching patch with revert of relevant hunk, because full revert
would be hard currently. Does it workaround problem for you?
> > Is this problem 100% reproducible for you ?
>
> Yes, it seems to be sufficient to suspend to RAM while associated and
> turn off the AP before waking up the machine.
> I wasn't able to reproduce just by turning of the AP while associated
> without suspend.
I'm not able to recreate, but I'm not using your config as my system
user-space has problem to startup with it :-(
Stanislaw
On Tue 29-11-11 12:39:07, Stanislaw Gruszka wrote:
> Hi Michal
>
> On Tue, Nov 29, 2011 at 11:07:27AM +0100, Michal Hocko wrote:
> > [I am not sure whether this is ieee80211 or iwl3945 issue so put both
> > maintainers into loop]
> The only changed we had in iwlegacy between 3.1 and 3.2-rc, was only
> adjustment to mac80211 changes. However I think this is iwlegacy issue,
> just for some reason bug did not trigger before.
I have double checked 3.1 and cannot reproduce it.
Anyway, I have put:
diff --git a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
index 8faeaf2..9221ed4 100644
--- a/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
+++ b/drivers/net/wireless/iwlegacy/iwl-3945-rs.c
@@ -432,6 +432,7 @@ static void iwl3945_rs_free_sta(void *iwl_priv, struct ieee80211_sta *sta,
* to use iwl_priv to print out debugging) since it may not be fully
* initialized at this point.
*/
+ printk("XXX: deleting time: %x\n", rs_sta->rate_scale_flush.base);
del_timer_sync(&rs_sta->rate_scale_flush);
}
And the timer base is really NULL when the issue happens. So, somebody
probably removed the timer already?
> Is this problem 100% reproducible for you ?
Yes, it seems to be sufficient to suspend to RAM while associated and
turn off the AP before waking up the machine.
I wasn't able to reproduce just by turning of the AP while associated
without suspend.
> If so I'll try to reproduce it locally, otherwise I'll review related
> code to find reason of that problem.
>
> Thanks
> Stanislaw
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Wed, Nov 30, 2011 at 03:03:16PM +0100, Michal Hocko wrote:
> No, didn't help unfortunately...
Lets gather some more messages. Please apply attached simple patch,
recompile with:
CONFIG_MAC80211_DEBUG_MENU=y
CONFIG_MAC80211_VERBOSE_DEBUG=y
and provide dmesg.
Thanks
Stanislaw
Hi Michal
On Tue, Nov 29, 2011 at 11:07:27AM +0100, Michal Hocko wrote:
> [I am not sure whether this is ieee80211 or iwl3945 issue so put both
> maintainers into loop]
The only changed we had in iwlegacy between 3.1 and 3.2-rc, was only
adjustment to mac80211 changes. However I think this is iwlegacy issue,
just for some reason bug did not trigger before.
Is this problem 100% reproducible for you ? If so I'll try to reproduce
it locally, otherwise I'll review related code to find reason of that
problem.
Thanks
Stanislaw
On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
> On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> > [ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
> > [ 349.316076] ------------[ cut here ]------------
> > [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
>
> Johannes pointed on irc that this problem is most likely caused by:
>
> commit f785d83a19bca326f79d127a413e35769afc0105
> Author: Eliad Peller <[email protected]>
> Date: Mon Aug 8 16:50:22 2011 +0300
>
> mac80211: clear sta.drv_priv on reconfiguration
>
> Michal, could you confirm that? If that is the root of the problem, we
> need patch for iwlegacy (and iwlwifi) that allocate separate memory for
> rc structures.
Should I just revert it?
>
> Thanks
> Stanislaw
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Thu, Dec 01, 2011 at 01:44:40PM +0100, Michal Hocko wrote:
> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> > > [ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
> > > [ 349.316076] ------------[ cut here ]------------
> > > [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
> >
> > Johannes pointed on irc that this problem is most likely caused by:
> >
> > commit f785d83a19bca326f79d127a413e35769afc0105
> > Author: Eliad Peller <[email protected]>
> > Date: Mon Aug 8 16:50:22 2011 +0300
> >
> > mac80211: clear sta.drv_priv on reconfiguration
> >
> > Michal, could you confirm that? If that is the root of the problem, we
> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
> > rc structures.
Actually this require quite big changes in iwlegacy at least, perhaps
also in iwlwifi. I prefer to revert commit as temporary fix, and add
memset(0) to drivers .add_sta callback, in drivers which need it. After
changing iwlegcy/iwlwifi we could move memset(0) back to ieee80211_reconfig().
Eliad, thoughts?
> Should I just revert it?
Yes, or remove proper line :-)
Stanislaw
On Tue, Nov 29, 2011 at 11:07:27AM +0100, Michal Hocko wrote:
> Looks we are stuck in lock_timer_base loop. I am not familiar with the
> code but either base is NULL or we are racing with timer->base changes.
> I guess that the first one sounds more probable. The timer in question
> is &rs_sta->rate_scale_flush.
FYI
If you could apply below commits in -tip/core/debugobjects
dc4218b timer: Use debugobjects to catch deletion of uninitialized timers
fb16b8c timer: Setup uninitialized timer with a stub callback
b84d435 debugobjects: Extend to assert that an object is initialized
feac18d debugobjects: Be smarter about static objects
and enable DEBUG_OBJECTS_TIMERS, maybe you can catch something easily.
Thanks,
Yong
On Thu 01-12-11 20:04:01, Yong Zhang wrote:
> On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> > On Thu 01-12-11 09:58:45, Michal Hocko wrote:
> > [...]
> > > I will try to turn on timer debugging suggested by Yong Zhang in other
> > > email.
> >
> > OK, so applied the following commits:
> > $ git shortlog master..
> > Christine Chan (2):
> > timer: Use debugobjects to catch deletion of uninitialized timers
> > debugobjects: Extend to assert that an object is initialized
> >
> > Michal Hocko (1):
> > Debugging patch
> >
> > Stephen Boyd (2):
> > timer: Setup uninitialized timer with a stub callback
> > debugobjects: Be smarter about static objects
> >
> > Debugging patch is the one from Stanislaw posted earlier in the thread.
> > It prevents del_timer_sync and warns on NULL timer in iwl3945_rs_free_sta.
> > Also turned on the suggested DEBUG_OBJECTS_TIMERS.
> >
> > I do not see any timer related warnings. I will try it with the slightly
> > modified Debugging patch and just warn without returning before calling
> > del_timer_sync.
> >
> > [ 349.316076] ------------[ cut here ]------------
> > [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
> > [ 349.316105] Hardware name: LIFEBOOK S7110
> > [ 349.316110] Modules linked in: aes_i586 aes_generic i915 fbcon font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fuse tun coretemp hwmon snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss iwl3945 snd_pcm iwl_legacy snd_seq_oss snd_seq_midi_event mac80211 snd_seq snd_timer snd_seq_device cfg80211 snd video snd_page_alloc fujitsu_laptop backlight [last unloaded: scsi_wait_scan]
> > [ 349.316203] Pid: 12, comm: kworker/u:1 Tainted: G W 3.2.0-rc3-timerstest-00009-g762dd0e #20
>
> ^
> seems there was warning but we did't catch it, right?
> not sure if it's related.
Not related. It is just the previous attempt. I just forgot to set
loglevel before the first test.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Thu, Dec 1, 2011 at 4:05 PM, Michal Hocko <[email protected]> wrote:
> On Thu 01-12-11 13:44:40, Michal Hocko wrote:
>> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
>> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
>> > > [ ?349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
>> > > [ ?349.316076] ------------[ cut here ]------------
>> > > [ ?349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
>> >
>> > Johannes pointed on irc that this problem is most likely caused by:
>> >
>> > commit f785d83a19bca326f79d127a413e35769afc0105
>> > Author: Eliad Peller <[email protected]>
>> > Date: ? Mon Aug 8 16:50:22 2011 +0300
>> >
>> > ? ? mac80211: clear sta.drv_priv on reconfiguration
>> >
>> > Michal, could you confirm that? If that is the root of the problem, we
>> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
>> > rc structures.
>>
>> Should I just revert it?
>
> Reverted and guess what ;)
> Yeah, it works. So feel free to add
> Reported-adn-Tested-by: Michal Hocko <[email protected]>
> to the revert.
>
cool. after giving it a second look, i agree with Stanislaw - let's
continue with the revert.
> Anyway, I have really no idea about the 80211 stack but the patch
> doesn't seem to be correct from the layering POV. Why should generic
> layer clear something that is driver private data (or at least the
> naming suggests that it is driver specific data)?
the driver should "know" the station (and its private data) only
between the .sta_add() and .sta_remove() callbacks.
on suspend, we do call sta_remove(), so it's valid to clear the private data.
the problem is that the rate control also uses the same private data,
and is valid also after sta_remove() was called. i think the right
solution is to split the generic private data and the rate control
private data.
Eliad.
On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> [ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
> [ 349.316076] ------------[ cut here ]------------
> [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
Johannes pointed on irc that this problem is most likely caused by:
commit f785d83a19bca326f79d127a413e35769afc0105
Author: Eliad Peller <[email protected]>
Date: Mon Aug 8 16:50:22 2011 +0300
mac80211: clear sta.drv_priv on reconfiguration
Michal, could you confirm that? If that is the root of the problem, we
need patch for iwlegacy (and iwlwifi) that allocate separate memory for
rc structures.
Thanks
Stanislaw
On Thu, Dec 1, 2011 at 2:13 PM, Eliad Peller <[email protected]> wrote:
> On Thu, Dec 1, 2011 at 2:55 PM, Stanislaw Gruszka <[email protected]> wrote:
>> On Thu, Dec 01, 2011 at 01:44:40PM +0100, Michal Hocko wrote:
>>> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
>>> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
>>> > > [ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
>>> > > [ 349.316076] ------------[ cut here ]------------
>>> > > [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
>>> >
>>> > Johannes pointed on irc that this problem is most likely caused by:
>>> >
>>> > commit f785d83a19bca326f79d127a413e35769afc0105
>>> > Author: Eliad Peller <[email protected]>
>>> > Date: Mon Aug 8 16:50:22 2011 +0300
>>> >
>>> > mac80211: clear sta.drv_priv on reconfiguration
>>> >
>>> > Michal, could you confirm that? If that is the root of the problem, we
>>> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
>>> > rc structures.
>> Actually this require quite big changes in iwlegacy at least, perhaps
>> also in iwlwifi. I prefer to revert commit as temporary fix, and add
>> memset(0) to drivers .add_sta callback, in drivers which need it. After
>> changing iwlegcy/iwlwifi we could move memset(0) back to ieee80211_reconfig().
>>
>> Eliad, thoughts?
>>
> i don't really care about temporarily reverting it.
> but maybe you can just add an "initialized" flag to iwl3945_rs_sta,
> and check it on free_sta?
>
Isnt't the mainline handling "user-friendliness first" by reverting
"buggy" commits (especially if there is no fix/patch in sight)?
A proper fix can be done later.
Just my 2 cents.
- Sedat -
> Eliad.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu 01-12-11 13:44:40, Michal Hocko wrote:
> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> > > [ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
> > > [ 349.316076] ------------[ cut here ]------------
> > > [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
> >
> > Johannes pointed on irc that this problem is most likely caused by:
> >
> > commit f785d83a19bca326f79d127a413e35769afc0105
> > Author: Eliad Peller <[email protected]>
> > Date: Mon Aug 8 16:50:22 2011 +0300
> >
> > mac80211: clear sta.drv_priv on reconfiguration
> >
> > Michal, could you confirm that? If that is the root of the problem, we
> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
> > rc structures.
>
> Should I just revert it?
Reverted and guess what ;)
Yeah, it works. So feel free to add
Reported-adn-Tested-by: Michal Hocko <[email protected]>
to the revert.
Anyway, I have really no idea about the 80211 stack but the patch
doesn't seem to be correct from the layering POV. Why should generic
layer clear something that is driver private data (or at least the
naming suggests that it is driver specific data)?
Thanks a lot for your help.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Thu 01-12-11 09:58:45, Michal Hocko wrote:
[...]
> I will try to turn on timer debugging suggested by Yong Zhang in other
> email.
OK, so applied the following commits:
$ git shortlog master..
Christine Chan (2):
timer: Use debugobjects to catch deletion of uninitialized timers
debugobjects: Extend to assert that an object is initialized
Michal Hocko (1):
Debugging patch
Stephen Boyd (2):
timer: Setup uninitialized timer with a stub callback
debugobjects: Be smarter about static objects
Debugging patch is the one from Stanislaw posted earlier in the thread.
It prevents del_timer_sync and warns on NULL timer in iwl3945_rs_free_sta.
Also turned on the suggested DEBUG_OBJECTS_TIMERS.
I do not see any timer related warnings. I will try it with the slightly
modified Debugging patch and just warn without returning before calling
del_timer_sync.
[...]
[ 305.800081] iwl3945_rs_alloc_sta rs_sta f4be1ac0
[ 305.800091] ieee80211 phy0: Allocated STA 00:0b:6b:3c:8c:e4
[ 305.800098] ieee80211 phy0: Inserted dummy STA 00:0b:6b:3c:8c:e4
[ 305.800112] ieee80211 phy0: device no longer idle - working
[ 305.800120] wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1)
[ 305.807868] wlan0: RX ReassocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=1)
[ 305.807878] wlan0: associated
[ 305.808067] ieee80211 phy0: Inserted STA 00:0b:6b:3c:8c:e4
[ 305.808081] ieee80211 phy0: WMM queue=2 aci=0 acm=0 aifs=2 cWmin=7 cWmax=1023 txop=64 uapsd=0
[ 305.808095] ieee80211 phy0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0 uapsd=0
[ 305.808108] ieee80211 phy0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94 uapsd=0
[ 305.808121] ieee80211 phy0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47 uapsd=0
[ 316.368064] wlan0: no IPv6 routers present
[ 339.387891] PM: Syncing filesystems ... done.
[ 339.480244] PM: Preparing system for mem sleep
[ 340.735244] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 340.752248] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 340.772279] PM: Entering mem sleep
[ 340.909961] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 340.915156] sd 0:0:0:0: [sda] Stopping disk
[ 340.959367] serial 00:0a: disabled
[ 340.963655] serial 00:0a: wake-up capability disabled by ACPI
[ 340.968461] PIIX_IDE 0000:00:1f.1: PCI INT A disabled
[ 340.970913] ehci_hcd 0000:00:1d.7: PCI INT A disabled
[ 340.972114] uhci_hcd 0000:00:1d.3: PCI INT D disabled
[ 340.972127] uhci_hcd 0000:00:1d.2: PCI INT C disabled
[ 340.972140] uhci_hcd 0000:00:1d.1: PCI INT B disabled
[ 340.972152] uhci_hcd 0000:00:1d.0: PCI INT A disabled
[ 340.982942] i915 0000:00:02.0: PCI INT A disabled
[ 341.276198] snd_hda_intel 0000:00:1b.0: PCI INT A disabled
[ 341.376137] snd_hda_intel 0000:00:1b.0: power state changed by ACPI to D3
[ 342.922781] ata_piix 0000:00:1f.2: PCI INT B disabled
[ 343.024183] PM: suspend of devices complete after 2247.084 msecs
[ 343.128551] sky2 0000:02:00.0: PME# enabled
[ 343.218047] sky2 0000:02:00.0: wake-up capability enabled by ACPI
[ 343.464325] ehci_hcd 0000:00:1d.7: PME# enabled
[ 343.554551] ehci_hcd 0000:00:1d.7: wake-up capability enabled by ACPI
[ 343.660211] uhci_hcd 0000:00:1d.3: wake-up capability enabled by ACPI
[ 343.748740] uhci_hcd 0000:00:1d.2: wake-up capability enabled by ACPI
[ 343.837261] uhci_hcd 0000:00:1d.1: wake-up capability enabled by ACPI
[ 343.925774] uhci_hcd 0000:00:1d.0: wake-up capability enabled by ACPI
[ 344.012708] PM: late suspend of devices complete after 900.012 msecs
[ 344.099624] ACPI: Preparing to enter system sleep state S3
[ 344.186393] PM: Saving platform NVS memory
[ 344.272082] Disabling non-boot CPUs ...
[ 344.460060] CPU 1 is now offline
[ 344.543874] ACPI: Low-level resume complete
[ 344.543874] PM: Restoring platform NVS memory
[ 344.543874] Enabling non-boot CPUs ...
[ 344.554434] Booting Node 0 Processor 1 APIC 0x1
[ 344.558267] smpboot cpu 1: start_ip = 9b000
[ 344.358012] Initializing CPU#1
[ 344.358012] Calibrating delay loop (skipped) already calibrated this CPU
[ 344.580289] CPU1 is up
[ 344.593922] ACPI: Waking up from system sleep state S3
[ 344.598517] i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 344.600866] pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
[ 344.603204] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 344.606598] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100006, writing 0x100002)
[ 344.609076] pci 0000:00:1c.0: restoring config space at offset 0xf (was 0x40100, writing 0x4010b)
[ 344.611429] pci 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x80c180a1)
[ 344.613759] pci 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xf000f000)
[ 344.616070] pci 0000:00:1c.0: restoring config space at offset 0x7 (was 0x20000000, writing 0x20002020)
[ 344.618358] pci 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x20200)
[ 344.620666] pci 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[ 344.622936] pci 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[ 344.625217] pci 0000:00:1c.1: restoring config space at offset 0xf (was 0x40200, writing 0x4020b)
[ 344.627483] pci 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x80918081)
[ 344.629779] pci 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0x80708060)
[ 344.632079] pci 0000:00:1c.1: restoring config space at offset 0x7 (was 0x20000000, writing 0x20005050)
[ 344.634367] pci 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[ 344.636633] pci 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[ 344.638923] pci 0000:00:1c.2: restoring config space at offset 0xf (was 0x40300, writing 0x4030b)
[ 344.641200] pci 0000:00:1c.2: restoring config space at offset 0x9 (was 0x10001, writing 0x80518041)
[ 344.643412] pci 0000:00:1c.2: restoring config space at offset 0x8 (was 0x0, writing 0x80308010)
[ 344.645622] pci 0000:00:1c.2: restoring config space at offset 0x7 (was 0x20000000, writing 0x20004040)
[ 344.647810] pci 0000:00:1c.2: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[ 344.650002] pci 0000:00:1c.2: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[ 344.652223] uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 344.654461] uhci_hcd 0000:00:1d.0: wake-up capability disabled by ACPI
[ 344.656676] uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 344.658924] uhci_hcd 0000:00:1d.1: wake-up capability disabled by ACPI
[ 344.661154] uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 344.663418] uhci_hcd 0000:00:1d.2: wake-up capability disabled by ACPI
[ 344.665710] uhci_hcd 0000:00:1d.3: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 344.668058] uhci_hcd 0000:00:1d.3: wake-up capability disabled by ACPI
[ 344.670384] ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900006, writing 0x2900002)
[ 344.672780] ehci_hcd 0000:00:1d.7: wake-up capability disabled by ACPI
[ 344.675140] ehci_hcd 0000:00:1d.7: PME# disabled
[ 344.677545] pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x87f18401)
[ 344.680014] pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xf020f020)
[ 344.682483] pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0xa2800000, writing 0xa2803030)
[ 344.685001] pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100004, writing 0x100007)
[ 344.687548] PIIX_IDE 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
[ 344.690119] ata_piix 0000:00:1f.2: restoring config space at offset 0xf (was 0x200, writing 0x20b)
[ 344.692848] sky2 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 344.695461] sky2 0000:02:00.0: restoring config space at offset 0x6 (was 0x1, writing 0x2001)
[ 344.697999] sky2 0000:02:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xf0000004)
[ 344.700491] sky2 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 344.702963] sky2 0000:02:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[ 344.705707] iwl3945 0000:05:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 344.708232] iwl3945 0000:05:00.0: restoring config space at offset 0x4 (was 0x0, writing 0x80100000)
[ 344.710706] iwl3945 0000:05:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 344.713210] iwl3945 0000:05:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100406)
[ 344.715819] pci 0000:08:03.0: restoring config space at offset 0xf (was 0xc00100, writing 0x1c0010b)
[ 344.718384] pci 0000:08:03.0: restoring config space at offset 0xe (was 0x1, writing 0x30fd)
[ 344.720917] pci 0000:08:03.0: restoring config space at offset 0xd (was 0x1, writing 0x3001)
[ 344.723403] pci 0000:08:03.0: restoring config space at offset 0xc (was 0x1, writing 0x34fd)
[ 344.725882] pci 0000:08:03.0: restoring config space at offset 0xb (was 0x1, writing 0x3401)
[ 344.728332] pci 0000:08:03.0: restoring config space at offset 0xa (was 0x0, writing 0x8bfff000)
[ 344.730758] pci 0000:08:03.0: restoring config space at offset 0x9 (was 0x0, writing 0x88000000)
[ 344.733180] pci 0000:08:03.0: restoring config space at offset 0x8 (was 0x0, writing 0x87fff000)
[ 344.735574] pci 0000:08:03.0: restoring config space at offset 0x7 (was 0x0, writing 0x84000000)
[ 344.737980] pci 0000:08:03.0: restoring config space at offset 0x6 (was 0x0, writing 0xb00c0908)
[ 344.740358] pci 0000:08:03.0: restoring config space at offset 0x4 (was 0x0, writing 0x80000000)
[ 344.742715] pci 0000:08:03.0: restoring config space at offset 0x1 (was 0x4100080, writing 0x4100007)
[ 344.745107] pci 0000:08:03.4: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 344.747449] pci 0000:08:03.4: restoring config space at offset 0x5 (was 0x0, writing 0xf0201000)
[ 344.749778] pci 0000:08:03.4: restoring config space at offset 0x4 (was 0x0, writing 0xf0200000)
[ 344.752072] pci 0000:08:03.4: restoring config space at offset 0x3 (was 0x0, writing 0x2010)
[ 344.754298] pci 0000:08:03.4: restoring config space at offset 0x1 (was 0x2100080, writing 0x2100017)
[ 344.756787] PM: early resume of devices complete after 158.326 msecs
[ 344.759029] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 344.761243] i915 0000:00:02.0: setting latency timer to 64
[ 344.761338] pci 0000:00:1c.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 344.761345] pci 0000:00:1c.0: setting latency timer to 64
[ 344.761361] pci 0000:00:1c.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
[ 344.761367] pci 0000:00:1c.1: setting latency timer to 64
[ 344.761380] pci 0000:00:1c.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
[ 344.761386] pci 0000:00:1c.2: setting latency timer to 64
[ 344.761401] uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[ 344.761409] uhci_hcd 0000:00:1d.0: setting latency timer to 64
[ 344.761439] usb usb2: root hub lost power or was reset
[ 344.761460] uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 20 (level, low) -> IRQ 20
[ 344.761468] uhci_hcd 0000:00:1d.1: setting latency timer to 64
[ 344.761496] usb usb3: root hub lost power or was reset
[ 344.761514] uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 344.761522] uhci_hcd 0000:00:1d.2: setting latency timer to 64
[ 344.761549] usb usb4: root hub lost power or was reset
[ 344.761567] uhci_hcd 0000:00:1d.3: PCI INT D -> GSI 16 (level, low) -> IRQ 16
[ 344.761576] uhci_hcd 0000:00:1d.3: setting latency timer to 64
[ 344.761604] usb usb5: root hub lost power or was reset
[ 344.761624] ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[ 344.761632] ehci_hcd 0000:00:1d.7: setting latency timer to 64
[ 344.761675] pci 0000:00:1e.0: setting latency timer to 64
[ 344.761694] PIIX_IDE 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 344.761700] PIIX_IDE 0000:00:1f.1: setting latency timer to 64
[ 344.761723] ata_piix 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[ 344.761730] ata_piix 0000:00:1f.2: setting latency timer to 64
[ 344.762438] sky2 0000:02:00.0: wake-up capability disabled by ACPI
[ 344.762462] sky2 0000:02:00.0: PME# disabled
[ 344.768820] pci 0000:08:03.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 344.769454] serial 00:0a: activated
[ 344.772573] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
[ 344.772828] hda: UDMA/33 mode selected
[ 344.835139] sd 0:0:0:0: [sda] Starting disk
[ 344.897591] render error detected, EIR: 0x00000010
[ 344.899856] page table error
[ 344.900004] PGTBL_ER: 0x00000003
[ 344.900004] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 344.900004] render error detected, EIR: 0x00000010
[ 344.900004] page table error
[ 344.900004] PGTBL_ER: 0x00000003
[ 345.168077] usb 5-2: reset full-speed USB device number 3 using uhci_hcd
[ 345.260090] snd_hda_intel 0000:00:1b.0: power state changed by ACPI to D0
[ 345.263106] snd_hda_intel 0000:00:1b.0: power state changed by ACPI to D0
[ 345.266113] snd_hda_intel 0000:00:1b.0: power state changed by ACPI to D0
[ 345.269115] snd_hda_intel 0000:00:1b.0: power state changed by ACPI to D0
[ 345.271430] snd_hda_intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 345.273684] snd_hda_intel 0000:00:1b.0: setting latency timer to 64
[ 345.275966] snd_hda_intel 0000:00:1b.0: irq 41 for MSI/MSI-X
[ 345.572088] usb 2-2: reset low-speed USB device number 2 using uhci_hcd
[ 346.608301] ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
[ 346.611333] ata1.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
[ 346.614399] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 346.632440] ata1.00: configured for UDMA/133
[ 346.744496] PM: resume of devices complete after 1985.491 msecs
[ 346.747333] PM: Finishing wakeup.
[ 346.749785] Restarting tasks ... done.
[ 346.789225] video LNXVIDEO:00: Restoring backlight state
[ 347.244077] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 1/5
[ 347.744100] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 2/5
[ 348.004058] wlan0: detected beacon loss from AP - sending probe request
[ 348.244038] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 3/5
[ 348.744062] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 4/5
[ 349.244066] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, disconnecting.
[ 349.316045] ieee80211 phy0: Removed STA 00:0b:6b:3c:8c:e4
[ 349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
[ 349.316076] ------------[ cut here ]------------
[ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
[ 349.316105] Hardware name: LIFEBOOK S7110
[ 349.316110] Modules linked in: aes_i586 aes_generic i915 fbcon font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fuse tun coretemp hwmon snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss iwl3945 snd_pcm iwl_legacy snd_seq_oss snd_seq_midi_event mac80211 snd_seq snd_timer snd_seq_device cfg80211 snd video snd_page_alloc fujitsu_laptop backlight [last unloaded: scsi_wait_scan]
[ 349.316203] Pid: 12, comm: kworker/u:1 Tainted: G W 3.2.0-rc3-timerstest-00009-g762dd0e #20
[ 349.316209] Call Trace:
[ 349.316223] [<c01331b4>] warn_slowpath_common+0x68/0x7d
[ 349.316237] [<f8169454>] ? iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
[ 349.316247] [<c01331dd>] warn_slowpath_null+0x14/0x18
[ 349.316260] [<f8169454>] iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
[ 349.316283] [<f8102f94>] __sta_info_free.isra.24+0x28/0x60 [mac80211]
[ 349.316304] [<f8103df9>] __sta_info_destroy+0x259/0x283 [mac80211]
[ 349.316328] [<f8104194>] sta_info_flush+0x43/0x6e [mac80211]
[ 349.316353] [<f8109453>] ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
[ 349.316379] [<f81095e8>] ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
[ 349.316390] [<c031208f>] ? dev_printk+0x2b/0x2d
[ 349.316416] [<f810bde2>] ieee80211_sta_work+0x1b0/0x1c3 [mac80211]
[ 349.316446] [<f810e38a>] ieee80211_iface_work+0x23e/0x250 [mac80211]
[ 349.316457] [<c01468af>] process_one_work+0x19b/0x2e6
[ 349.316483] [<f810e14c>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
[ 349.316492] [<c0146ddd>] worker_thread+0x136/0x1ee
[ 349.316501] [<c0146ca7>] ? manage_workers.isra.23+0x14f/0x14f
[ 349.316509] [<c014a0f5>] kthread+0x67/0x6c
[ 349.316517] [<c014a08e>] ? kthread_worker_fn+0x119/0x119
[ 349.316527] [<c04762f6>] kernel_thread_helper+0x6/0x10
[ 349.316532] ---[ end trace 5cf96275a244e7ca ]---
[ 349.316538] ieee80211 phy0: Destroyed STA 00:0b:6b:3c:8c:e4
[ 349.316556] ieee80211 phy0: device now idle
[ 349.328105] cfg80211: Calling CRDA to update world regulatory domain
[ 349.416952] ieee80211 phy0: device no longer idle - scanning
[ 353.091211] ieee80211 phy0: device now idle
[ 358.092230] ieee80211 phy0: device no longer idle - scanning
[ 361.772470] ieee80211 phy0: device now idle
[ 366.774826] ieee80211 phy0: device no longer idle - scanning
[ 370.448401] ieee80211 phy0: device now idle
[ 375.455127] ieee80211 phy0: device no longer idle - scanning
[ 379.127638] ieee80211 phy0: device now idle
[ 384.131847] ieee80211 phy0: device no longer idle - scanning
[ 387.806711] ieee80211 phy0: device now idle
[ 396.708125] sky2 0000:02:00.0: lan0: Link is up at 1000 Mbps, full duplex, flow control both
[ 407.120063] lan0: no IPv6 routers present
[ 434.003896] ieee80211 phy0: device no longer idle - scanning
[ 438.694843] ieee80211 phy0: device now idle
[ 441.485602] process `skype' is using obsolete setsockopt SO_BSDCOMPAT
[ 494.004619] ieee80211 phy0: device no longer idle - scanning
[ 498.698796] ieee80211 phy0: device now idle
[ 554.004822] ieee80211 phy0: device no longer idle - scanning
[ 558.697908] ieee80211 phy0: device now idle
[ 614.003956] ieee80211 phy0: device no longer idle - scanning
[ 618.700408] ieee80211 phy0: device now idle
[ 674.004164] ieee80211 phy0: device no longer idle - scanning
[ 678.702423] ieee80211 phy0: device now idle
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Wed 30-11-11 16:39:30, Stanislaw Gruszka wrote:
> On Wed, Nov 30, 2011 at 03:03:16PM +0100, Michal Hocko wrote:
> > No, didn't help unfortunately...
>
> Lets gather some more messages. Please apply attached simple patch,
> recompile with:
> CONFIG_MAC80211_DEBUG_MENU=y
> CONFIG_MAC80211_VERBOSE_DEBUG=y
$ zgrep "MAC80211.*DEBUG" /proc/config.gz | grep -v ^\#
CONFIG_MAC80211_DEBUG_MENU=y
CONFIG_MAC80211_VERBOSE_DEBUG=y
> and provide dmesg.
[ 262.007061] ieee80211 phy0: device no longer idle - scanning
[ 266.699301] ieee80211 phy0: device now idle
[ 342.002248] ieee80211 phy0: device no longer idle - scanning
[ 346.695045] ieee80211 phy0: device now idle
[ 351.813276] ieee80211 phy0: device no longer idle - scanning
[ 355.487602] ieee80211 phy0: device now idle
[ 360.491257] ieee80211 phy0: device no longer idle - scanning
[ 364.170612] ieee80211 phy0: device now idle
[ 369.175101] ieee80211 phy0: device no longer idle - scanning
[ 372.847072] ieee80211 phy0: device now idle
[ 372.848534] ieee80211 phy0: device no longer idle - working
[ 372.848545] wlan0: authenticate with 00:0b:6b:3c:8c:e4 (try 1)
[ 372.850295] wlan0: authenticated
[ 372.850317] ieee80211 phy0: device now idle
[ 372.852870] iwl3945_rs_alloc_sta rs_sta f411dac0
[ 372.852877] ieee80211 phy0: Allocated STA 00:0b:6b:3c:8c:e4
[ 372.852882] ieee80211 phy0: Inserted dummy STA 00:0b:6b:3c:8c:e4
[ 372.852897] ieee80211 phy0: device no longer idle - working
[ 372.852903] wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1)
[ 372.910215] wlan0: RX ReassocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=1)
[ 372.910223] wlan0: associated
[ 372.910341] ieee80211 phy0: Inserted STA 00:0b:6b:3c:8c:e4
[ 372.910352] ieee80211 phy0: WMM queue=2 aci=0 acm=0 aifs=2 cWmin=7 cWmax=1023 txop=64 uapsd=0
[ 372.910362] ieee80211 phy0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0 uapsd=0
[ 372.910372] ieee80211 phy0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94 uapsd=0
[ 372.910381] ieee80211 phy0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47 uapsd=0
[ 383.384042] wlan0: no IPv6 routers present
[ 426.375395] PM: Syncing filesystems ... done.
[ 426.430760] PM: Preparing system for mem sleep
[...]
suspen
resume with AP deactivated
[...]
[ 431.324532] iwl3945 0000:05:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 431.324601] iwl3945 0000:05:00.0: restoring config space at offset 0x4 (was 0x0, writing 0x80100000)
[ 431.324616] iwl3945 0000:05:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 431.324636] iwl3945 0000:05:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100406)
[...]
[ 434.064050] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 1/5
[ 434.250089] usb 5-2: New USB device found, idVendor=0c24, idProduct=000f
[ 434.250098] usb 5-2: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 434.564034] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 2/5
[ 435.004096] wlan0: detected beacon loss from AP - sending probe request
[ 435.064044] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 3/5
[ 435.564047] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, try 4/5
[ 436.064065] ieee80211 phy0: wlan0: No probe response from AP 00:0b:6b:3c:8c:e4 after 500ms, disconnecting.
[ 436.148044] ieee80211 phy0: Removed STA 00:0b:6b:3c:8c:e4
[ 436.148069] iwl3945_rs_free_sta rs_sta f411dac0
[ 436.148076] ------------[ cut here ]------------
[ 436.148097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
[ 436.148103] Hardware name: LIFEBOOK S7110
[ 436.148107] Modules linked in: aes_i586 aes_generic i915 fbcon font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fuse tun coretemp hwmon arc4 snd_hda_codec_realtek iwl3945 snd_hda_intel iwl_legacy snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi_event mac80211 snd_seq snd_timer snd_seq_device snd fujitsu_laptop video backlight cfg80211 snd_page_alloc [last unloaded: scsi_wait_scan]
[ 436.148195] Pid: 1330, comm: kworker/u:4 Tainted: G W 3.2.0-rc3-mac80211test-00005-g1b97fa2 #19
[ 436.148200] Call Trace:
[ 436.148214] [<c01331ac>] warn_slowpath_common+0x68/0x7d
[ 436.148229] [<f8223454>] ? iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
[ 436.148238] [<c01331d5>] warn_slowpath_null+0x14/0x18
[ 436.148251] [<f8223454>] iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
[ 436.148274] [<f81c2f94>] __sta_info_free.isra.24+0x28/0x60 [mac80211]
[ 436.148296] [<f81c3df9>] __sta_info_destroy+0x259/0x283 [mac80211]
[ 436.148320] [<f81c4194>] sta_info_flush+0x43/0x6e [mac80211]
[ 436.148346] [<f81c9453>] ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
[ 436.148372] [<f81c95e8>] ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
[ 436.148382] [<c0311367>] ? dev_printk+0x2b/0x2d
[ 436.148409] [<f81cbde2>] ieee80211_sta_work+0x1b0/0x1c3 [mac80211]
[ 436.148439] [<f81ce38a>] ieee80211_iface_work+0x23e/0x250 [mac80211]
[ 436.148450] [<c0146673>] process_one_work+0x19b/0x2e6
[ 436.148476] [<f81ce14c>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
[ 436.148485] [<c0146ba1>] worker_thread+0x136/0x1ee
[ 436.148494] [<c0146a6b>] ? manage_workers.isra.23+0x14f/0x14f
[ 436.148502] [<c0149eb9>] kthread+0x67/0x6c
[ 436.148510] [<c0149e52>] ? kthread_worker_fn+0x119/0x119
[ 436.148520] [<c0475576>] kernel_thread_helper+0x6/0x10
[ 436.148525] ---[ end trace f37fc1957f2d5230 ]---
[ 436.148532] ieee80211 phy0: Destroyed STA 00:0b:6b:3c:8c:e4
[ 436.148546] ieee80211 phy0: device now idle
[ 436.164140] cfg80211: Calling CRDA to update world regulatory domain
[ 436.248933] ieee80211 phy0: device no longer idle - scanning
[ 439.932216] ieee80211 phy0: device now idle
[ 444.930668] ieee80211 phy0: device no longer idle - scanning
[ 448.604546] ieee80211 phy0: device now idle
[ 453.608897] ieee80211 phy0: device no longer idle - scanning
[ 457.283583] ieee80211 phy0: device now idle
[ 462.287692] ieee80211 phy0: device no longer idle - scanning
[ 465.964143] ieee80211 phy0: device now idle
[ 470.967145] ieee80211 phy0: device no longer idle - scanning
[ 474.642423] ieee80211 phy0: device now idle
So we are hitting NULL timer in the iwl3945_rs_free_sta again. Is it
possible that the timer was just deactivated during suspend? I do not
see any code doing that in the iwl driver but maybe there is some
generic functionality at 80211 layer.
I will try to turn on timer debugging suggested by Yong Zhang in other
email.
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Thu, Dec 1, 2011 at 2:55 PM, Stanislaw Gruszka <[email protected]> wrote:
> On Thu, Dec 01, 2011 at 01:44:40PM +0100, Michal Hocko wrote:
>> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
>> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
>> > > [ ?349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
>> > > [ ?349.316076] ------------[ cut here ]------------
>> > > [ ?349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
>> >
>> > Johannes pointed on irc that this problem is most likely caused by:
>> >
>> > commit f785d83a19bca326f79d127a413e35769afc0105
>> > Author: Eliad Peller <[email protected]>
>> > Date: ? Mon Aug 8 16:50:22 2011 +0300
>> >
>> > ? ? mac80211: clear sta.drv_priv on reconfiguration
>> >
>> > Michal, could you confirm that? If that is the root of the problem, we
>> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
>> > rc structures.
> Actually this require quite big changes in iwlegacy at least, perhaps
> also in iwlwifi. I prefer to revert commit as temporary fix, ?and add
> memset(0) to drivers .add_sta callback, in drivers which need it. After
> changing iwlegcy/iwlwifi we could move memset(0) back to ieee80211_reconfig().
>
> Eliad, thoughts?
>
i don't really care about temporarily reverting it.
but maybe you can just add an "initialized" flag to iwl3945_rs_sta,
and check it on free_sta?
Eliad.
On Thu, Dec 01, 2011 at 04:23:19PM +0200, Eliad Peller wrote:
> On Thu, Dec 1, 2011 at 4:05 PM, Michal Hocko <[email protected]> wrote:
> > On Thu 01-12-11 13:44:40, Michal Hocko wrote:
> >> On Thu 01-12-11 12:59:12, Stanislaw Gruszka wrote:
> >> > On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> >> > > [ ?349.316070] iwl3945_rs_free_sta rs_sta f4be1ac0
> >> > > [ ?349.316076] ------------[ cut here ]------------
> >> > > [ ?349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
> >> >
> >> > Johannes pointed on irc that this problem is most likely caused by:
> >> >
> >> > commit f785d83a19bca326f79d127a413e35769afc0105
> >> > Author: Eliad Peller <[email protected]>
> >> > Date: ? Mon Aug 8 16:50:22 2011 +0300
> >> >
> >> > ? ? mac80211: clear sta.drv_priv on reconfiguration
> >> >
> >> > Michal, could you confirm that? If that is the root of the problem, we
> >> > need patch for iwlegacy (and iwlwifi) that allocate separate memory for
> >> > rc structures.
> >>
> >> Should I just revert it?
> >
> > Reverted and guess what ;)
> > Yeah, it works. So feel free to add
> > Reported-adn-Tested-by: Michal Hocko <[email protected]>
> > to the revert.
> >
> cool. after giving it a second look, i agree with Stanislaw - let's
> continue with the revert.
I'm reverting it now.
> > Anyway, I have really no idea about the 80211 stack but the patch
> > doesn't seem to be correct from the layering POV. Why should generic
> > layer clear something that is driver private data (or at least the
> > naming suggests that it is driver specific data)?
>
> the driver should "know" the station (and its private data) only
> between the .sta_add() and .sta_remove() callbacks.
> on suspend, we do call sta_remove(), so it's valid to clear the private data.
>
> the problem is that the rate control also uses the same private data,
> and is valid also after sta_remove() was called. i think the right
> solution is to split the generic private data and the rate control
> private data.
Sounds like a reasonable plan to me -- thanks for volunteering! :-)
John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.
On Thu, Dec 01, 2011 at 03:13:42PM +0200, Eliad Peller wrote:
> i don't really care about temporarily reverting it.
> but maybe you can just add an "initialized" flag to iwl3945_rs_sta,
> and check it on free_sta?
This is not right solution, that will prevent hangs though, but
we generally wipe out used data, so any other problem could be
triggered.
Stanislaw
On Thu, Dec 01, 2011 at 11:55:38AM +0100, Michal Hocko wrote:
> On Thu 01-12-11 09:58:45, Michal Hocko wrote:
> [...]
> > I will try to turn on timer debugging suggested by Yong Zhang in other
> > email.
>
> OK, so applied the following commits:
> $ git shortlog master..
> Christine Chan (2):
> timer: Use debugobjects to catch deletion of uninitialized timers
> debugobjects: Extend to assert that an object is initialized
>
> Michal Hocko (1):
> Debugging patch
>
> Stephen Boyd (2):
> timer: Setup uninitialized timer with a stub callback
> debugobjects: Be smarter about static objects
>
> Debugging patch is the one from Stanislaw posted earlier in the thread.
> It prevents del_timer_sync and warns on NULL timer in iwl3945_rs_free_sta.
> Also turned on the suggested DEBUG_OBJECTS_TIMERS.
>
> I do not see any timer related warnings. I will try it with the slightly
> modified Debugging patch and just warn without returning before calling
> del_timer_sync.
>
> [ 349.316076] ------------[ cut here ]------------
> [ 349.316097] WARNING: at drivers/net/wireless/iwlegacy/iwl-3945-rs.c:438 iwl3945_rs_free_sta+0x33/0x42 [iwl3945]()
> [ 349.316105] Hardware name: LIFEBOOK S7110
> [ 349.316110] Modules linked in: aes_i586 aes_generic i915 fbcon font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fuse tun coretemp hwmon snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss iwl3945 snd_pcm iwl_legacy snd_seq_oss snd_seq_midi_event mac80211 snd_seq snd_timer snd_seq_device cfg80211 snd video snd_page_alloc fujitsu_laptop backlight [last unloaded: scsi_wait_scan]
> [ 349.316203] Pid: 12, comm: kworker/u:1 Tainted: G W 3.2.0-rc3-timerstest-00009-g762dd0e #20
^
seems there was warning but we did't catch it, right?
not sure if it's related.
Thanks,
Yong
> [ 349.316209] Call Trace:
> [ 349.316223] [<c01331b4>] warn_slowpath_common+0x68/0x7d
> [ 349.316237] [<f8169454>] ? iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
> [ 349.316247] [<c01331dd>] warn_slowpath_null+0x14/0x18
> [ 349.316260] [<f8169454>] iwl3945_rs_free_sta+0x33/0x42 [iwl3945]
> [ 349.316283] [<f8102f94>] __sta_info_free.isra.24+0x28/0x60 [mac80211]
> [ 349.316304] [<f8103df9>] __sta_info_destroy+0x259/0x283 [mac80211]
> [ 349.316328] [<f8104194>] sta_info_flush+0x43/0x6e [mac80211]
> [ 349.316353] [<f8109453>] ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
> [ 349.316379] [<f81095e8>] ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
> [ 349.316390] [<c031208f>] ? dev_printk+0x2b/0x2d
> [ 349.316416] [<f810bde2>] ieee80211_sta_work+0x1b0/0x1c3 [mac80211]
> [ 349.316446] [<f810e38a>] ieee80211_iface_work+0x23e/0x250 [mac80211]
> [ 349.316457] [<c01468af>] process_one_work+0x19b/0x2e6
> [ 349.316483] [<f810e14c>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
> [ 349.316492] [<c0146ddd>] worker_thread+0x136/0x1ee
> [ 349.316501] [<c0146ca7>] ? manage_workers.isra.23+0x14f/0x14f
> [ 349.316509] [<c014a0f5>] kthread+0x67/0x6c
> [ 349.316517] [<c014a08e>] ? kthread_worker_fn+0x119/0x119
> [ 349.316527] [<c04762f6>] kernel_thread_helper+0x6/0x10
> [ 349.316532] ---[ end trace 5cf96275a244e7ca ]---
> [ 349.316538] ieee80211 phy0: Destroyed STA 00:0b:6b:3c:8c:e4
> [ 349.316556] ieee80211 phy0: device now idle
> [ 349.328105] cfg80211: Calling CRDA to update world regulatory domain
> [ 349.416952] ieee80211 phy0: device no longer idle - scanning
> [ 353.091211] ieee80211 phy0: device now idle
> [ 358.092230] ieee80211 phy0: device no longer idle - scanning
> [ 361.772470] ieee80211 phy0: device now idle
> [ 366.774826] ieee80211 phy0: device no longer idle - scanning
> [ 370.448401] ieee80211 phy0: device now idle
> [ 375.455127] ieee80211 phy0: device no longer idle - scanning
> [ 379.127638] ieee80211 phy0: device now idle
> [ 384.131847] ieee80211 phy0: device no longer idle - scanning
> [ 387.806711] ieee80211 phy0: device now idle
> [ 396.708125] sky2 0000:02:00.0: lan0: Link is up at 1000 Mbps, full duplex, flow control both
> [ 407.120063] lan0: no IPv6 routers present
> [ 434.003896] ieee80211 phy0: device no longer idle - scanning
> [ 438.694843] ieee80211 phy0: device now idle
> [ 441.485602] process `skype' is using obsolete setsockopt SO_BSDCOMPAT
> [ 494.004619] ieee80211 phy0: device no longer idle - scanning
> [ 498.698796] ieee80211 phy0: device now idle
> [ 554.004822] ieee80211 phy0: device no longer idle - scanning
> [ 558.697908] ieee80211 phy0: device now idle
> [ 614.003956] ieee80211 phy0: device no longer idle - scanning
> [ 618.700408] ieee80211 phy0: device now idle
> [ 674.004164] ieee80211 phy0: device no longer idle - scanning
> [ 678.702423] ieee80211 phy0: device now idle
>
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic