2009-05-05 05:04:33

by Maxim Levitsky

Subject: [BUG] Crda oopses the system

Here is what I see:


> May 4 16:35:14 maxim-laptop kernel: [16939.109054] Process crda (pid: 29344, threadinfo ffff88007deaa000, task ffff880067369600)
> May 4 16:35:14 maxim-laptop kernel: [16939.109058] Stack:
> May 4 16:35:14 maxim-laptop kernel: [16939.109061] 0000000000000034 ffff88003fb8a640 0000000000000000 ffff8800634f5c20
> May 4 16:35:14 maxim-laptop kernel: [16939.109069] ffff88007deaba28 ffffffffa00eb3d0 000004f87deab9c8 ffff88003fb8a640
> May 4 16:35:14 maxim-laptop kernel: [16939.109077] 0000000000000000 ffff8800634f5c24 ffff8800634f5c2c ffff8800634f5c34
> May 4 16:35:14 maxim-laptop kernel: [16939.109086] Call Trace:
> May 4 16:35:14 maxim-laptop kernel: [16939.109091] [<ffffffffa00eb3d0>] nl80211_set_reg+0x100/0x2b0 [cfg80211]
> May 4 16:35:14 maxim-laptop kernel: [16939.109107] [<ffffffff803e697f>] ? nla_parse+0xef/0x110
> May 4 16:35:14 maxim-laptop kernel: [16939.109118] [<ffffffff80513716>] genl_rcv_msg+0x1b6/0x1f0
> May 4 16:35:14 maxim-laptop kernel: [16939.109126] [<ffffffff80513560>] ? genl_rcv_msg+0x0/0x1f0
> May 4 16:35:14 maxim-laptop kernel: [16939.109132] [<ffffffff80512d49>] netlink_rcv_skb+0x89/0xb0
> May 4 16:35:14 maxim-laptop kernel: [16939.109140] [<ffffffff80513547>] genl_rcv+0x27/0x40
> May 4 16:35:14 maxim-laptop kernel: [16939.109146] [<ffffffff805128a9>] ? netlink_sendmsg+0x159/0x300
> May 4 16:35:14 maxim-laptop kernel: [16939.109153] [<ffffffff80512734>] netlink_unicast+0x2c4/0x2e0
> May 4 16:35:14 maxim-laptop kernel: [16939.109161] [<ffffffff804f356e>] ? __alloc_skb+0x6e/0x150
> May 4 16:35:14 maxim-laptop kernel: [16939.109169] [<ffffffff8051294e>] netlink_sendmsg+0x1fe/0x300
> May 4 16:35:14 maxim-laptop kernel: [16939.109176] [<ffffffff804ea607>] sock_sendmsg+0x127/0x140
> May 4 16:35:14 maxim-laptop kernel: [16939.109183] [<ffffffff8025be50>] ? autoremove_wake_function+0x0/0x40
> May 4 16:35:14 maxim-laptop kernel: [16939.109193] [<ffffffff8029f986>] ? get_page_from_freelist+0x3b6/0x650
> May 4 16:35:14 maxim-laptop kernel: [16939.109201] [<ffffffff80299985>] ? find_lock_page+0x25/0x70
> May 4 16:35:14 maxim-laptop kernel: [16939.109208] [<ffffffff804e924b>] ? move_addr_to_kernel+0x2b/0x40
> May 4 16:35:14 maxim-laptop kernel: [16939.109214] [<ffffffff804f4b8c>] ? verify_iovec+0x3c/0xd0
> May 4 16:35:14 maxim-laptop kernel: [16939.109221] [<ffffffff804ea7a9>] sys_sendmsg+0x189/0x320
> May 4 16:35:14 maxim-laptop kernel: [16939.109228] [<ffffffff804eb635>] ? move_addr_to_user+0x65/0x80
> May 4 16:35:14 maxim-laptop kernel: [16939.109235] [<ffffffff802b1651>] ? handle_mm_fault+0x1e1/0x830
> May 4 16:35:14 maxim-laptop kernel: [16939.109243] [<ffffffff803d8f21>] ? __up_read+0x91/0xb0
> May 4 16:35:14 maxim-laptop kernel: [16939.109252] [<ffffffff8020bf2b>] system_call_fastpath+0x16/0x1b
> May 4 16:35:14 maxim-laptop kernel: [16939.109261] Code: a1 00 00 00 0f be 50 39 0f be 70 38 48 c7 c7 70 07 0f a0 31 c0 e8 15 da 49 e0 e9 55 fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 e5 38 01 00 4c 89 e7 e8 05 f9 ff ff 49 89
> May 4 16:35:14 maxim-laptop kernel: [16939.109329] RIP [<ffffffffa00e4f58>] set_regdom+0x428/0x4c0 [cfg80211]
> May 4 16:35:14 maxim-laptop kernel: [16939.109344] RSP <ffff88007deab978>
> May 4 16:35:14 maxim-laptop kernel: [16939.109350] ---[ end trace 695815cef5ce0efe ]---
>

This happens at my university, where the APs send a country code, and thus NM calls crda to apply it.
(I had already applied it in the initscripts.)

Intel 3945 device, iwlwifi.git commit
#5a94b6d38100b7056a5a347e5c51359d924d305d

Best regards,
Maxim Levitsky



2009-05-31 12:47:15

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-31 at 02:22 -0400, Luis R. Rodriguez wrote:
> On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
> > I am talking about
> >
> > BUG_ON(!country_ie_regdomain);
> > in net/wireless/reg.c
>
> Please try this patch and leave a window open with this running:
>
> iw event
>
> Please be sure to grab iw from git, not sure if the reg events
> have all gone into an official release yet. What I'm looking for
> is whether or not a previous 11d setting was already processed
> or if the !country_ie_regdomain condition happens from the first
> 11d processing.
>
> Luis
>
> diff --git a/net/wireless/reg.c b/net/wireless/reg.c
> index f87ac1d..1b60dfc 100644
> --- a/net/wireless/reg.c
> +++ b/net/wireless/reg.c
> @@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
> * the country IE rd with what CRDA believes that country should have
> */
>
> - BUG_ON(!country_ie_regdomain);
> + if (WARN_ON(!country_ie_regdomain)) {
> + kfree(rd);
> + rd = NULL;
> + return -EINVAL;
> + }
> BUG_ON(rd == country_ie_regdomain);
>
> /*
> @@ -2268,6 +2272,8 @@ int regulatory_init(void)
> if (IS_ERR(reg_pdev))
> return PTR_ERR(reg_pdev);
>
> + country_ie_regdomain = NULL;
> +
> spin_lock_init(&reg_requests_lock);
> spin_lock_init(&reg_pending_beacons_lock);
>


I'll test this today.
I have iw from git.

Best regards,
Maxim Levitsky
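
The patch quoted above swaps BUG_ON(), which stops the calling task with an oops, for WARN_ON(), which logs a warning, evaluates to the tested condition, and lets the caller return an error. A minimal userspace sketch of that pattern follows. It assumes hypothetical stand-ins (struct regdomain, warn_on(), set_regdom_sketch()) that only imitate the kernel code and are not its real definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's regulatory domain structure. */
struct regdomain {
    char alpha2[3];
};

/* Hypothetical stand-in for the global checked by the patch.
 * NULL means no country IE regulatory domain has been recorded. */
const struct regdomain *country_ie_regdomain = NULL;

/* Userspace imitation of WARN_ON(): report the problem and return the
 * condition so the caller can recover, instead of aborting. */
int warn_on(int cond, const char *what)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", what);
    return cond;
}

/* Takes ownership of rd. Returns 0 on success, or -EINVAL (after
 * freeing rd) when the country IE regulatory domain is missing. */
int set_regdom_sketch(struct regdomain *rd)
{
    assert(rd != NULL);

    if (warn_on(country_ie_regdomain == NULL,
                "country_ie_regdomain is NULL")) {
        free(rd);
        return -EINVAL;
    }

    /* Real code would go on to use rd; this sketch just releases it. */
    free(rd);
    return 0;
}
```

With this shape, a missing country IE regulatory domain produces a log entry and an error code for the caller rather than a dead process holding locks.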


2009-05-10 19:34:47

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
> >
> > I have recompiled the kernel with debugging info.
> >
> > This is new backtrace:
> >
> >
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203085] ------------[ cut here ]------------
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203096] kernel BUG at /home/maxim/software/kernel/linux-2.6/net/wireless/reg.c:2039!
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203103] invalid opcode: 0000 [#1] PREEMPT SMP
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203115] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203121] CPU 0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203127] Modules linked in: iwl3945 iwlcore mac80211 cfg80211 cpufreq_stats af_packet nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative cpufreq_userspace acpi_cpufreq coretemp sbp2 snd_hda_codec_realtek snd_hda_intel joydev snd_hda_codec uvcvideo snd_hwdep videodev v4l1_compat acer_wmi rfkill v4l2_compat_ioctl32 sdhci_pci uhci_hcd snd_pcm backlight psmouse serio_raw ohci1394 sdhci iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc ehci_hcd usbcore evdev wmi fuse [last unloaded: cfg80211]
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203258] Pid: 20876, comm: crda Tainted: P 2.6.30-rc4-wl #58 Aspire 5720
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203265] RIP: 0010:[<ffffffffa0c9611e>] [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203291] RSP: 0018:ffff8800638c1978 EFLAGS: 00010246
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203296] RAX: ffff88005176df68 RBX: ffff880034b601a0 RCX: ffffffffa0ca9540
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203302] RDX: ffff880034b60000 RSI: 0000000000000000 RDI: 0000000000000000
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203308] RBP: ffff8800638c1998 R08: 0000000000000001 R09: 0000000000000001
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004f889a90
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203320] R13: ffff88007e18e6e0 R14: 0000000000000001 R15: 0000000000000001
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203326] FS: 00007f1c069c56f0(0000) GS:ffff880001025000(0000) knlGS:0000000000000000
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203338] CR2: 00007f1c0632fdb0 CR3: 000000007fa65000 CR4: 00000000000006e0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203344] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203350] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203357] Process crda (pid: 20876, threadinfo ffff8800638c0000, task ffff880067093e80)
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203362] Stack:
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203366] 0000000000000000 ffff88004f889a90 0000000000000000 ffff88007e18e6e0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203379] ffff8800638c1a28 ffffffffa0c9c672 ffff88005e15ab40 ffff88004f889a90
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203393] 0000000000000000 ffff88007e18e6e4 ffff88007e18e6ec ffff88007e18e6f4
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203410] Call Trace:
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203416] [<ffffffffa0c9c672>] nl80211_set_reg+0x112/0x2c0 [cfg80211]
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203436] [<ffffffff80412c8f>] ? nla_parse+0xef/0x110
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203450] [<ffffffff8054dac6>] genl_rcv_msg+0x1b6/0x1f0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203462] [<ffffffff8054d910>] ? genl_rcv_msg+0x0/0x1f0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203471] [<ffffffff8054d0e9>] netlink_rcv_skb+0x89/0xb0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203479] [<ffffffff8054d8ee>] genl_rcv+0x2e/0x50
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203488] [<ffffffff8054c917>] ? netlink_unicast+0x117/0x2e0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203498] [<ffffffff8054cac4>] netlink_unicast+0x2c4/0x2e0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203508] [<ffffffff8052c4b3>] ? __alloc_skb+0x73/0x160
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203519] [<ffffffff8054ccde>] netlink_sendmsg+0x1fe/0x300
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203528] [<ffffffff80522f07>] sock_sendmsg+0x127/0x140
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203537] [<ffffffff80522d61>] ? sock_recvmsg+0x141/0x160
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203546] [<ffffffff80260ad0>] ? autoremove_wake_function+0x0/0x40
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203558] [<ffffffff80293602>] ? __rcu_read_unlock+0xa2/0xc0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203567] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203578] [<ffffffff805219d0>] ? move_addr_to_kernel+0x30/0x40
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203588] [<ffffffff8052db51>] ? verify_iovec+0x41/0xd0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203597] [<ffffffff805230ae>] sys_sendmsg+0x18e/0x320
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203607] [<ffffffff805c4505>] ? _spin_unlock_irqrestore+0x65/0x80
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203619] [<ffffffff805c77b1>] ? sub_preempt_count+0x51/0x60
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203628] [<ffffffff80403c21>] ? __up_read+0x91/0xb0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203639] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203648] [<ffffffff805c3e4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203658] [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203670] Code: 90 91 00 00 00 0f be b0 90 00 00 00 48 c7 c7 60 1b ca a0 31 c0 e8 13 9f 92 df e9 4f fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 17 43 01 00 4c 89 e7 e8 df f8 ff ff 49 89
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203815] RIP [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203834] RSP <ffff8800638c1978>
> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203842] ---[ end trace 9723f71e550687a4 ]---
> >>
> >
> >
> > This is GDB output - can be inaccurate - I have pulled latest wireless-testing , and rebuild kernel again.
> > I will read the source, and try to fix this
> > This is 100% reproducible
> >
> >
> >
> >> (gdb) l *nl80211_set_reg+0x112
> >> 0x96a2 is in nl80211_set_reg (/home/maxim/software/kernel/linux-2.6/net/wireless/nl80211.c:2587).
> >> 2582
> >> 2583 BUG_ON(rule_idx != num_rules);
> >> 2584
> >> 2585 mutex_lock(&cfg80211_mutex);
> >> 2586 r = set_regdom(rd);
> >> 2587 mutex_unlock(&cfg80211_mutex);
> >> 2588 return r;
> >> 2589
> >> 2590 bad_reg:
> >> 2591 kfree(rd);
> >> (gdb)
> >
>
> Hm I don't see a BUG_ON at my net/wireless/reg.c:2039, can you please
> try with wireless-testing or paste the line 2039 in your
> net/wrieless/reg.c. Also if you can provide steps to how to get the
> code you have and how to reproduce it would help.
>
> Luis

Sorry for the confusion.
This is the iwlwifi.git tree.


Like I said, the backtrace and the gdb output are from slightly different
kernels.


My university AP broadcasts the reg domain in beacons.
This always triggers the oops.
(I use the latest git NM and wpa_supplicant.)

I will investigate this more deeply; currently the above information just
isn't enough to see what is going on.

Thanks,
Best regards,
Maxim Levitsky
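
Each call trace entry above has the form symbol+0xOFFSET/0xSIZE: an offset into the function, followed by the function's total size. The symbol+0xOFFSET part is what the quoted gdb command `l *nl80211_set_reg+0x112` uses. A small sketch that splits such an entry, assuming a hypothetical helper parse_sym_ref() that is not part of any kernel tool:

```c
#include <assert.h>
#include <stdio.h>

/* One "symbol+0xOFFSET/0xSIZE" reference from an oops call trace. */
struct sym_ref {
    char name[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total size of the function in bytes */
};

/* Hypothetical helper: parse text such as "set_regdom+0x43e/0x4d0".
 * Returns 0 on success and -1 if the text does not have that shape. */
int parse_sym_ref(const char *text, struct sym_ref *out)
{
    assert(text != NULL && out != NULL);

    /* %63[^+] reads the symbol name up to the '+'; each %lx accepts
     * a hexadecimal number with an optional 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->name, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

For the RIP line of the backtrace above, this would yield the name set_regdom, offset 0x43e and size 0x4d0.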




2009-05-19 14:17:30

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Thu, 2009-05-14 at 10:20 -0400, Bob Copeland wrote:
> On Wed, May 13, 2009 at 8:07 PM, Luis R. Rodriguez <[email protected]> wrote:
> >>
> >> I wish I had time to set up, an ath5k in AP mode. There are rumors that
> >> it works more or less now, then I could use the same beacon frame to
> >> test.
> >
> > It does and if it doesn't its a bug.
>
> Just FYI, it's known to have problems if the STA uses power saving --
> ath5k never updates the beacon. I have a patch that should work in the
> relevant bugzilla but I haven't been back to retest it (at first I
> thought it was causing hangs, but later realized the hangs were due to
> other bugs in w-t).
>


To tell the truth, I haven't taken my laptop with me, so I still
haven't tested this. It just happened that I had to do so.


I will either make my ath5k send the same beacon at home, or bring it here
next week.


In any case, I think it won't hurt to merge these patches.


Best regards, and really sorry,
Maxim Levitsky


2009-05-12 22:07:38

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Tue, 2009-05-12 at 15:00 -0700, Luis R. Rodriguez wrote:
> On Tue, May 12, 2009 at 10:34 AM, Maxim Levitsky
> <[email protected]> wrote:
> > On Tue, 2009-05-12 at 10:24 -0700, Luis R. Rodriguez wrote:
> >> On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
> >> > On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
> >> >> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
> >> >> <[email protected]> wrote:
> >> >> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
> >> >> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> >> >> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> >> >> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
> >> >> >> >
> >> >> >> > I have recompiled the kernel with debugging info.
> >> >> >> >
> >> >> >> > This is new backtrace:
> >> >
> >> > Ok, here is what I found:
> >> >
> >> > The real BUG_ON that fires up is at
> >> >
> >> > wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
> >> >
> >> > Confirmed by replacing it with printk.
> >> >
> >> >
> >> > I attach a beacon sample that triggers this:
> >>
> >> Thanks for that. Was this from wireless-testing? If so as of what SHA1
> >> sum? Can you provide steps to reproduce? What AP? What card?
> > wireless-testing, b2382a4aeff07a481ccf860e4f716b48b52e3781
>
> That HEAD is buggy, its missing a fix for the minstrel/pid fix.
> Although since you are running iwl3945 it explains why you haven't
> seen that oops.
>
> Will look into this -- thanks. How easy can you reproduce BTW? Does it
> happen immediately upon assoc? Does it happen with other APs at the
> university?
It always happens when I try to associate.
In fact, after a boot, this oops is in dmesg, and the whole system is
semi-frozen (can't get root, etc., probably due to locks held).


As for other APs, I don't see many other APs there (you mean other ESSIDs,
right?).

Most of them are encrypted anyway.


I have just compiled HEAD of wireless-testing; I will see how well it
works tomorrow.


Thanks in advance,
Maxim Levitsky
>
> Luis


2009-05-14 00:07:33

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Wed, May 13, 2009 at 4:47 PM, Maxim Levitsky <[email protected]> wrote:
> On Wed, 2009-05-13 at 16:28 -0700, Luis R. Rodriguez wrote:
>> On Wed, May 13, 2009 at 4:22 PM, Maxim Levitsky <[email protected]> wrote:
>> > On Wed, 2009-05-13 at 16:12 -0700, Luis R. Rodriguez wrote:
>> >> On Wed, May 13, 2009 at 4:08 PM, Maxim Levitsky <[email protected]> wrote:
>> >> > On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:
>> >>
>> >> >> Anyway please try the patches I posted and I'll finish reviewing this
>> >> >> in the meantime.
>> >> > I don't see anything yet.
>> >>
>> >> wget this and git am it, it has all the patches:
>> >>
>> >> http://bombadil.infradead.org/~mcgrof/patches/wl/race-fixes-2009-05-13.patch
>> >>
>> >>   Luis
>> >
>> >
>> > Thanks a lot.
>> > I will test this as soon as be at university again. (next sunday).
>>
>> Oh wow, that seems like eons to me. Hm..  yeah oh well.
> For me it isn't... I have a pile of homeworks....
>
> Anyway, I was supposed to be at lectures tomorrow, but there will be 'a
> student day', which I obviously don't attend, only to continue linux
> hacking...
>
> So thanks,
>        Maxim Levitsky
>
> I wish I had time to set up, an ath5k in AP mode. There are rumors that
> it works more or less now, then I could use the same beacon frame to
> test.

It does, and if it doesn't, it's a bug.

Luis

2009-05-31 21:13:44

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-31 at 23:54 +0300, Maxim Levitsky wrote:
> On Sun, 2009-05-31 at 15:47 +0300, Maxim Levitsky wrote:
> > On Sun, 2009-05-31 at 02:22 -0400, Luis R. Rodriguez wrote:
> > > On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
> > > > I am talking about
> > > >
> > > > BUG_ON(!country_ie_regdomain);
> > > > in net/wireless/reg.c
> > >
> > > Please try this patch and leave a window open with this running:
> > >
> > > iw event
> > >
> > > Please be sure to grab iw from git, not sure if the reg events
> > > have all gone into an official release yet. What I'm looking for
> > > is whether or not a previous 11d setting was already processed
> > > or if the !country_ie_regdomain condition happens from the first
> > > 11d processing.
> > >
> > > Luis
> > >
> > > diff --git a/net/wireless/reg.c b/net/wireless/reg.c
> > > index f87ac1d..1b60dfc 100644
> > > --- a/net/wireless/reg.c
> > > +++ b/net/wireless/reg.c
> > > @@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
> > > * the country IE rd with what CRDA believes that country should have
> > > */
> > >
> > > - BUG_ON(!country_ie_regdomain);
> > > + if (WARN_ON(!country_ie_regdomain)) {
> > > + kfree(rd);
> > > + rd = NULL;
> > > + return -EINVAL;
> > > + }
> > > BUG_ON(rd == country_ie_regdomain);
> > >
> > > /*
> > > @@ -2268,6 +2272,8 @@ int regulatory_init(void)
> > > if (IS_ERR(reg_pdev))
> > > return PTR_ERR(reg_pdev);
> > >
> > > + country_ie_regdomain = NULL;
> > > +
> > > spin_lock_init(&reg_requests_lock);
> > > spin_lock_init(&reg_pending_beacons_lock);
> > >
> >
> >
> > I'll test this today.
> > I have iw from git.
> >
> > Best regards,
> > Maxim Levitsky
> >
>
>
>
>
>
>
>
>
>
>
> Here it is:
>
>
> > wlan0 (phy #0): assoc 00:1b:9e:d8:77:02 -> 00:1b:77:f1:7c:29 status: 0: Successful
> > wlan0 (phy #0): disassoc 00:1b:77:f1:7c:29 -> 00:1b:9e:d8:77:02 reason 3: Deauthenticated because sending station is leaving (or has left) the IBSS or ESS
> > wlan0 (phy #0): scan finished
> > wlan0 (phy #0): auth 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> > wlan0 (phy #0): assoc 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> > phy #0: regulatory domain change: intersection used due to a request made by a country IE on phy0
> >
> dmesg attached (I use nvidia drivers)
>
>
>
> On top of that, there are a few more very glaring bugs in ath5k AP mode:
>
>
> 1 - beacons are sent only after I start hostapd twice (kill it, and
> start again)
>
> 2 - ath5k makes the kernel panic reliably after hostapd has been
> started a few times; I haven't captured the output yet.
> I remember seeing panics with ad-hoc as well.
> I mean blinking LEDs on the keyboard.
>
> 3 - couldn't transfer any frames between AP and client, only association
> works.
>
> I started hostapd, associated the client (using the link-local
> feature of NM), assigned both the AP and the client an IP address
> (ifconfig wlan0 10.1.0.1/24 on the AP, and ifconfig wlan0 10.1.0.2/24 on
> the client), and yet I couldn't receive even an ARP reply from the AP,
> and vice versa.
>
> I use hostapd and wpa_supplicant from the latest git.

4 - transfers freeze very often. I now understand that this isn't
related to transfer speed or anything like that: if the device is under
moderate load (a 1.1 Mbytes/s transfer via the AP to my main notebook), it
plays dead every 5 minutes or so. Even running 'iwlist scan', which is
supposed to reset the phy, doesn't help.

(This isn't related to AP mode)


You have the documentation; maybe you can look into what is wrong?



Best regards,
Maxim Levitsky




2009-05-13 23:47:55

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Wed, 2009-05-13 at 16:28 -0700, Luis R. Rodriguez wrote:
> On Wed, May 13, 2009 at 4:22 PM, Maxim Levitsky <[email protected]> wrote:
> > On Wed, 2009-05-13 at 16:12 -0700, Luis R. Rodriguez wrote:
> >> On Wed, May 13, 2009 at 4:08 PM, Maxim Levitsky <[email protected]> wrote:
> >> > On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:
> >>
> >> >> Anyway please try the patches I posted and I'll finish reviewing this
> >> >> in the meantime.
> >> > I don't see anything yet.
> >>
> >> wget this and git am it, it has all the patches:
> >>
> >> http://bombadil.infradead.org/~mcgrof/patches/wl/race-fixes-2009-05-13.patch
> >>
> >> Luis
> >
> >
> > Thanks a lot.
> > I will test this as soon as be at university again. (next sunday).
>
> Oh wow, that seems like eons to me. Hm.. yeah oh well.
For me it isn't... I have a pile of homework.

Anyway, I was supposed to be at lectures tomorrow, but there will be 'a
student day', which I obviously won't attend, so that I can continue Linux
hacking...

So thanks,
Maxim Levitsky

I wish I had time to set up an ath5k in AP mode. There are rumors that
it works more or less now; then I could use the same beacon frame to
test.


Best regards,
Maxim Levitsky


2009-05-13 23:08:29

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:
> On Tue, May 12, 2009 at 3:07 PM, Maxim Levitsky <[email protected]> wrote:
> > On Tue, 2009-05-12 at 15:00 -0700, Luis R. Rodriguez wrote:
> >> On Tue, May 12, 2009 at 10:34 AM, Maxim Levitsky
> >> <[email protected]> wrote:
> >> > On Tue, 2009-05-12 at 10:24 -0700, Luis R. Rodriguez wrote:
> >> >> On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
> >> >> > On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
> >> >> >> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
> >> >> >> <[email protected]> wrote:
> >> >> >> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
> >> >> >> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> >> >> >> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> >> >> >> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
> >> >> >> >> >
> >> >> >> >> > I have recompiled the kernel with debugging info.
> >> >> >> >> >
> >> >> >> >> > This is new backtrace:
> >> >> >
> >> >> > Ok, here is what I found:
> >> >> >
> >> >> > The real BUG_ON that fires up is at
> >> >> >
> >> >> > wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
> >> >> >
> >> >> > Confirmed by replacing it with printk.
> >> >> >
> >> >> >
> >> >> > I attach a beacon sample that triggers this:
> >> >>
> >> >> Thanks for that. Was this from wireless-testing? If so as of what SHA1
> >> >> sum? Can you provide steps to reproduce? What AP? What card?
> >> > wireless-testing, b2382a4aeff07a481ccf860e4f716b48b52e3781
> >>
> >> That HEAD is buggy, its missing a fix for the minstrel/pid fix.
> >> Although since you are running iwl3945 it explains why you haven't
> >> seen that oops.
> >>
> >> Will look into this -- thanks. How easy can you reproduce BTW? Does it
> >> happen immediately upon assoc? Does it happen with other APs at the
> >> university?
> > Happens always when I try to associate.
>
> Great so it will be easy to confirm a fix. Can you please try the 4
> patches I just posted? I haven't finished reviewing your case but so
> far from the pcap review (which was really helpful) I don't see
> anything funky from the beacon except that the FCS is bad. But that
> shouldn't matter.
This is a fault of the iwlwifi monitor mode; it always reports 0 there.

>
> Anyway please try the patches I posted and I'll finish reviewing this
> in the meantime.
I don't see anything yet.

>
> > Other APs, I don't see many other APs there, (you mean other essids,
> > right?)
>
> I meant other APs at the University on the same network.
This happens with all APs here (I obviously couldn't test them all;
more precisely, there was no AP that didn't trigger the bug).
I have now converted that BUG_ON to a printk, so I can use crda and still
see whether this bug exists.


>
> > Most of them are encrypted anyway.
>
> OK this should not matter.
>
> Luis


Thanks a lot.
Best regards,
Maxim Levitsky




2009-05-22 00:36:30

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Thu, May 21, 2009 at 5:20 PM, Maxim Levitsky <[email protected]> wrote:
> On Tue, 2009-05-19 at 17:17 +0300, Maxim Levitsky wrote:
>> On Thu, 2009-05-14 at 10:20 -0400, Bob Copeland wrote:
>> > On Wed, May 13, 2009 at 8:07 PM, Luis R. Rodriguez <[email protected]> wrote:
>> > >>
>> > >> I wish I had time to set up, an ath5k in AP mode. There are rumors that
>> > >> it works more or less now, then I could use the same beacon frame to
>> > >> test.
>> > >
>> > > It does and if it doesn't its a bug.
>> >
>> > Just FYI, it's known to have problems if the STA uses power saving --
>> > ath5k never updates the beacon.  I have a patch that should work in the
>> > relevant bugzilla but I haven't been back to retest it (at first I
>> > thought it was causing hangs, but later realized the hangs were due to
>> > other bugs in w-t).
>> >
>>
>
> I was just able to reproduce that bug on latest wireless-testing that I
> belive contains the patches you sent for me to test.

Latest wireless-testing *does* have those patches.

> I reproduced it against ath5k running in AP mode at home.
>
> I finally made the ath5k send beacons (although not much works besides
> this).
>
>
> Note that I noticed that this bug

You can get an oops when trying to associate to your ath5k AP? And it's
easily reproducible? What driver do you use as the STA? Can you
provide a trace?

> happens once at boot, if I set NM to
> use system settings,

What does this mean?

> when I try to connect again it doesn't happen. Its
> a race condition after all.

Is the oops not crashing your box? How are you able to try again? What
does trying again mean?

Luis

2009-05-13 23:12:29

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Wed, May 13, 2009 at 4:08 PM, Maxim Levitsky <[email protected]> wrote:
> On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:

>> Anyway please try the patches I posted and I'll finish reviewing this
>> in the meantime.
> I don't see anything yet.

wget this and git am it, it has all the patches:

http://bombadil.infradead.org/~mcgrof/patches/wl/race-fixes-2009-05-13.patch

Luis

2009-05-22 00:20:16

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Tue, 2009-05-19 at 17:17 +0300, Maxim Levitsky wrote:
> On Thu, 2009-05-14 at 10:20 -0400, Bob Copeland wrote:
> > On Wed, May 13, 2009 at 8:07 PM, Luis R. Rodriguez <[email protected]> wrote:
> > >>
> > >> I wish I had time to set up, an ath5k in AP mode. There are rumors that
> > >> it works more or less now, then I could use the same beacon frame to
> > >> test.
> > >
> > > It does and if it doesn't its a bug.
> >
> > Just FYI, it's known to have problems if the STA uses power saving --
> > ath5k never updates the beacon. I have a patch that should work in the
> > relevant bugzilla but I haven't been back to retest it (at first I
> > thought it was causing hangs, but later realized the hangs were due to
> > other bugs in w-t).
> >
>

I was just able to reproduce that bug on the latest wireless-testing, which
I believe contains the patches you sent me to test.
I reproduced it against ath5k running in AP mode at home.

I finally made the ath5k send beacons (although not much works besides
this).


Note that this bug happens once at boot if I set NM to use system
settings; when I try to connect again, it doesn't happen. It's
a race condition after all.

Best regards,
Maxim Levitsky



2009-05-13 23:29:09

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Wed, May 13, 2009 at 4:22 PM, Maxim Levitsky <[email protected]> wrote:
> On Wed, 2009-05-13 at 16:12 -0700, Luis R. Rodriguez wrote:
>> On Wed, May 13, 2009 at 4:08 PM, Maxim Levitsky <[email protected]> wrote:
>> > On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:
>>
>> >> Anyway please try the patches I posted and I'll finish reviewing this
>> >> in the meantime.
>> > I don't see anything yet.
>>
>> wget this and git am it, it has all the patches:
>>
>> http://bombadil.infradead.org/~mcgrof/patches/wl/race-fixes-2009-05-13.patch
>>
>>   Luis
>
>
> Thanks a lot.
> I will test this as soon as be at university again. (next sunday).

Oh wow, that seems like eons to me. Hm.. yeah oh well.

Luis

2009-05-05 18:47:05

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Mon, May 4, 2009 at 10:04 PM, Maxim Levitsky <[email protected]> wrote:
> Here what I see:
>
>
>> May 4 16:35:14 maxim-laptop kernel: [16939.109054] Process crda (pid: 29344, threadinfo ffff88007deaa000, task ffff880067369600)
>> May 4 16:35:14 maxim-laptop kernel: [16939.109058] Stack:
>> May 4 16:35:14 maxim-laptop kernel: [16939.109061] 0000000000000034 ffff88003fb8a640 0000000000000000 ffff8800634f5c20
>> May 4 16:35:14 maxim-laptop kernel: [16939.109069] ffff88007deaba28 ffffffffa00eb3d0 000004f87deab9c8 ffff88003fb8a640
>> May 4 16:35:14 maxim-laptop kernel: [16939.109077] 0000000000000000 ffff8800634f5c24 ffff8800634f5c2c ffff8800634f5c34
>> May 4 16:35:14 maxim-laptop kernel: [16939.109086] Call Trace:
>> May 4 16:35:14 maxim-laptop kernel: [16939.109091] [<ffffffffa00eb3d0>] nl80211_set_reg+0x100/0x2b0 [cfg80211]
>> May 4 16:35:14 maxim-laptop kernel: [16939.109107] [<ffffffff803e697f>] ? nla_parse+0xef/0x110
>> May 4 16:35:14 maxim-laptop kernel: [16939.109118] [<ffffffff80513716>] genl_rcv_msg+0x1b6/0x1f0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109126] [<ffffffff80513560>] ? genl_rcv_msg+0x0/0x1f0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109132] [<ffffffff80512d49>] netlink_rcv_skb+0x89/0xb0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109140] [<ffffffff80513547>] genl_rcv+0x27/0x40
>> May 4 16:35:14 maxim-laptop kernel: [16939.109146] [<ffffffff805128a9>] ? netlink_sendmsg+0x159/0x300
>> May 4 16:35:14 maxim-laptop kernel: [16939.109153] [<ffffffff80512734>] netlink_unicast+0x2c4/0x2e0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109161] [<ffffffff804f356e>] ? __alloc_skb+0x6e/0x150
>> May 4 16:35:14 maxim-laptop kernel: [16939.109169] [<ffffffff8051294e>] netlink_sendmsg+0x1fe/0x300
>> May 4 16:35:14 maxim-laptop kernel: [16939.109176] [<ffffffff804ea607>] sock_sendmsg+0x127/0x140
>> May 4 16:35:14 maxim-laptop kernel: [16939.109183] [<ffffffff8025be50>] ? autoremove_wake_function+0x0/0x40
>> May 4 16:35:14 maxim-laptop kernel: [16939.109193] [<ffffffff8029f986>] ? get_page_from_freelist+0x3b6/0x650
>> May 4 16:35:14 maxim-laptop kernel: [16939.109201] [<ffffffff80299985>] ? find_lock_page+0x25/0x70
>> May 4 16:35:14 maxim-laptop kernel: [16939.109208] [<ffffffff804e924b>] ? move_addr_to_kernel+0x2b/0x40
>> May 4 16:35:14 maxim-laptop kernel: [16939.109214] [<ffffffff804f4b8c>] ? verify_iovec+0x3c/0xd0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109221] [<ffffffff804ea7a9>] sys_sendmsg+0x189/0x320
>> May 4 16:35:14 maxim-laptop kernel: [16939.109228] [<ffffffff804eb635>] ? move_addr_to_user+0x65/0x80
>> May 4 16:35:14 maxim-laptop kernel: [16939.109235] [<ffffffff802b1651>] ? handle_mm_fault+0x1e1/0x830
>> May 4 16:35:14 maxim-laptop kernel: [16939.109243] [<ffffffff803d8f21>] ? __up_read+0x91/0xb0
>> May 4 16:35:14 maxim-laptop kernel: [16939.109252] [<ffffffff8020bf2b>] system_call_fastpath+0x16/0x1b
>> May =C2=A04 16:35:14 maxim-laptop kernel: [16939.109261] Code: a1 00=
00 00 0f be 50 39 0f be 70 38 48 c7 c7 70 07 0f a0 31 c0 e8 15 da 49 e=
0 e9 55 fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8=
b 35 e5 38 01 00 4c 89 e7 e8 05 f9 ff ff 49 89
>> May =C2=A04 16:35:14 maxim-laptop kernel: [16939.109329] RIP =C2=A0[=
<ffffffffa00e4f58>] set_regdom+0x428/0x4c0 [cfg80211]
>> May =C2=A04 16:35:14 maxim-laptop kernel: [16939.109344] =C2=A0RSP <=
ffff88007deab978>
>> May =C2=A04 16:35:14 maxim-laptop kernel: [16939.109350] ---[ end tr=
ace 695815cef5ce0efe ]---
>>
>
> This happens @ my university where APs send country code, and thus NM calls crda to apply it.
> (I already applied it in initscripts)
>
> intel 3945 device, iwlwifi.git commit
> #5a94b6d38100b7056a5a347e5c51359d924d305d

Please use wireless-testing.

Luis

2009-05-10 18:33:28

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
>> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
>
> I have recompiled the kernel with debugging info.
>
> This is new backtrace:
>
>
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203085] ------------[ cut here ]------------
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203096] kernel BUG at /home/maxim/software/kernel/linux-2.6/net/wireless/reg.c:2039!
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203103] invalid opcode: 0000 [#1] PREEMPT SMP
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203115] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203121] CPU 0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203127] Modules linked in: iwl3945 iwlcore mac80211 cfg80211 cpufreq_stats af_packet nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative cpufreq_userspace acpi_cpufreq coretemp sbp2 snd_hda_codec_realtek snd_hda_intel joydev snd_hda_codec uvcvideo snd_hwdep videodev v4l1_compat acer_wmi rfkill v4l2_compat_ioctl32 sdhci_pci uhci_hcd snd_pcm backlight psmouse serio_raw ohci1394 sdhci iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc ehci_hcd usbcore evdev wmi fuse [last unloaded: cfg80211]
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203258] Pid: 20876, comm: crda Tainted: P 2.6.30-rc4-wl #58 Aspire 5720
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203265] RIP: 0010:[<ffffffffa0c9611e>] [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203291] RSP: 0018:ffff8800638c1978 EFLAGS: 00010246
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203296] RAX: ffff88005176df68 RBX: ffff880034b601a0 RCX: ffffffffa0ca9540
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203302] RDX: ffff880034b60000 RSI: 0000000000000000 RDI: 0000000000000000
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203308] RBP: ffff8800638c1998 R08: 0000000000000001 R09: 0000000000000001
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004f889a90
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203320] R13: ffff88007e18e6e0 R14: 0000000000000001 R15: 0000000000000001
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203326] FS: 00007f1c069c56f0(0000) GS:ffff880001025000(0000) knlGS:0000000000000000
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203338] CR2: 00007f1c0632fdb0 CR3: 000000007fa65000 CR4: 00000000000006e0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203344] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203350] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203357] Process crda (pid: 20876, threadinfo ffff8800638c0000, task ffff880067093e80)
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203362] Stack:
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203366] 0000000000000000 ffff88004f889a90 0000000000000000 ffff88007e18e6e0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203379] ffff8800638c1a28 ffffffffa0c9c672 ffff88005e15ab40 ffff88004f889a90
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203393] 0000000000000000 ffff88007e18e6e4 ffff88007e18e6ec ffff88007e18e6f4
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203410] Call Trace:
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203416] [<ffffffffa0c9c672>] nl80211_set_reg+0x112/0x2c0 [cfg80211]
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203436] [<ffffffff80412c8f>] ? nla_parse+0xef/0x110
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203450] [<ffffffff8054dac6>] genl_rcv_msg+0x1b6/0x1f0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203462] [<ffffffff8054d910>] ? genl_rcv_msg+0x0/0x1f0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203471] [<ffffffff8054d0e9>] netlink_rcv_skb+0x89/0xb0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203479] [<ffffffff8054d8ee>] genl_rcv+0x2e/0x50
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203488] [<ffffffff8054c917>] ? netlink_unicast+0x117/0x2e0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203498] [<ffffffff8054cac4>] netlink_unicast+0x2c4/0x2e0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203508] [<ffffffff8052c4b3>] ? __alloc_skb+0x73/0x160
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203519] [<ffffffff8054ccde>] netlink_sendmsg+0x1fe/0x300
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203528] [<ffffffff80522f07>] sock_sendmsg+0x127/0x140
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203537] [<ffffffff80522d61>] ? sock_recvmsg+0x141/0x160
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203546] [<ffffffff80260ad0>] ? autoremove_wake_function+0x0/0x40
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203558] [<ffffffff80293602>] ? __rcu_read_unlock+0xa2/0xc0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203567] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203578] [<ffffffff805219d0>] ? move_addr_to_kernel+0x30/0x40
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203588] [<ffffffff8052db51>] ? verify_iovec+0x41/0xd0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203597] [<ffffffff805230ae>] sys_sendmsg+0x18e/0x320
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203607] [<ffffffff805c4505>] ? _spin_unlock_irqrestore+0x65/0x80
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203619] [<ffffffff805c77b1>] ? sub_preempt_count+0x51/0x60
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203628] [<ffffffff80403c21>] ? __up_read+0x91/0xb0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203639] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203648] [<ffffffff805c3e4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203658] [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203670] Code: 90 91 00 00 00 0f be b0 90 00 00 00 48 c7 c7 60 1b ca a0 31 c0 e8 13 9f 92 df e9 4f fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 17 43 01 00 4c 89 e7 e8 df f8 ff ff 49 89
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203815] RIP [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203834] RSP <ffff8800638c1978>
>> May 7 10:27:05 maxim-laptop kernel: [ 5411.203842] ---[ end trace 9723f71e550687a4 ]---
>>
>
>
> This is GDB output - can be inaccurate - I have pulled latest wireless-testing , and rebuild kernel again.
> I will read the source, and try to fix this
> This is 100% reproducible
>
>
>
>> (gdb) l *nl80211_set_reg+0x112
>> 0x96a2 is in nl80211_set_reg (/home/maxim/software/kernel/linux-2.6/net/wireless/nl80211.c:2587).
>> 2582
>> 2583          BUG_ON(rule_idx != num_rules);
>> 2584
>> 2585          mutex_lock(&cfg80211_mutex);
>> 2586          r = set_regdom(rd);
>> 2587          mutex_unlock(&cfg80211_mutex);
>> 2588          return r;
>> 2589
>> 2590   bad_reg:
>> 2591          kfree(rd);
>> (gdb)
>

Hm, I don't see a BUG_ON at my net/wireless/reg.c:2039. Can you please
try with wireless-testing, or paste line 2039 of your
net/wireless/reg.c? Also, if you can provide steps for getting the
code you have and for reproducing this, it would help.

Luis

2009-05-12 22:00:49

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Tue, May 12, 2009 at 10:34 AM, Maxim Levitsky
<[email protected]> wrote:
> On Tue, 2009-05-12 at 10:24 -0700, Luis R. Rodriguez wrote:
>> On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
>> > On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
>> >> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
>> >> <[email protected]> wrote:
>> >> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
>> >> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
>> >> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
>> >> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
>> >> >> >
>> >> >> > I have recompiled the kernel with debugging info.
>> >> >> >
>> >> >> > This is new backtrace:
>> >
>> > Ok, here is what I found:
>> >
>> > The real BUG_ON that fires up is at
>> >
>> > wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
>> >
>> > Confirmed by replacing it with printk.
>> >
>> >
>> > I attach a beacon sample that triggers this:
>>
>> Thanks for that. Was this from wireless-testing? If so as of what SHA1
>> sum? Can you provide steps to reproduce? What AP? What card?
> wireless-testing, b2382a4aeff07a481ccf860e4f716b48b52e3781

That HEAD is buggy; it's missing the minstrel/pid fix.
Although, since you are running iwl3945, that explains why you haven't
seen that oops.

Will look into this -- thanks. How easily can you reproduce it, BTW? Does it
happen immediately upon assoc? Does it happen with other APs at the
university?

Luis

2009-05-12 17:18:13

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
> <[email protected]> wrote:
> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
> >> >
> >> > I have recompiled the kernel with debugging info.
> >> >
> >> > This is new backtrace:

Ok, here is what I found:

The real BUG_ON that fires up is at

wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)

Confirmed by replacing it with printk.


I attach a beacon sample that triggers this:


Best regards,
Maxim Levitsky


Attachments:
beacon.pcap (248.00 B)

2009-05-06 20:34:16

by Pavel Roskin

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

Hello!

(Dropping the Intel list; I'm not subscribed to it and I don't think it's
Intel specific)

On Wed, 2009-05-06 at 13:47 +0300, Maxim Levitsky wrote:
> On Tue, 2009-05-05 at 11:46 -0700, Luis R. Rodriguez wrote:
> > On Mon, May 4, 2009 at 10:04 PM, Maxim Levitsky <[email protected]> wrote:
> > > Here what I see:
> > >
> > >
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109054] Process crda (pid: 29344, threadinfo ffff88007deaa000, task ffff880067369600)
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109058] Stack:
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109061] 0000000000000034 ffff88003fb8a640 0000000000000000 ffff8800634f5c20
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109069] ffff88007deaba28 ffffffffa00eb3d0 000004f87deab9c8 ffff88003fb8a640
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109077] 0000000000000000 ffff8800634f5c24 ffff8800634f5c2c ffff8800634f5c34
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109086] Call Trace:
> > >> May 4 16:35:14 maxim-laptop kernel: [16939.109091] [<ffffffffa00eb3d0>] nl80211_set_reg+0x100/0x2b0 [cfg80211]

It would be great if you could run:
gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko

And then inside gdb:
l *nl80211_set_reg+0x100

And then send us the output.
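
For readers unfamiliar with the notation: the `nl80211_set_reg+0x100/0x2b0` entry in the call trace reads as symbol, offset into the function, and total function size, and the `symbol+offset` part is what is handed to gdb's `l *` above. A minimal C sketch of splitting such an entry follows. The struct and function names (`trace_entry`, `parse_trace_entry`) are hypothetical helpers made up for illustration, not part of the kernel or gdb.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "symbol+0xoff/0xsize" call trace entry. */
struct trace_entry {
	char symbol[64];
	unsigned long offset; /* bytes from the start of the function */
	unsigned long size;   /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_trace_entry(const char *text, struct trace_entry *out)
{
	/* %lx accepts an optional 0x prefix, so the raw trace text can be used. */
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}
```

Calling `parse_trace_entry("nl80211_set_reg+0x100/0x2b0", &e)` should leave the symbol name in `e.symbol` and the two hex values in `e.offset` and `e.size`.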

--
Regards,
Pavel Roskin

2009-05-13 21:29:04

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Tue, May 12, 2009 at 3:07 PM, Maxim Levitsky <[email protected]> wrote:
> On Tue, 2009-05-12 at 15:00 -0700, Luis R. Rodriguez wrote:
>> On Tue, May 12, 2009 at 10:34 AM, Maxim Levitsky
>> <[email protected]> wrote:
>> > On Tue, 2009-05-12 at 10:24 -0700, Luis R. Rodriguez wrote:
>> >> On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
>> >> > On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
>> >> >> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
>> >> >> <[email protected]> wrote:
>> >> >> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
>> >> >> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
>> >> >> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
>> >> >> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
>> >> >> >> >
>> >> >> >> > I have recompiled the kernel with debugging info.
>> >> >> >> >
>> >> >> >> > This is new backtrace:
>> >> >
>> >> > Ok, here is what I found:
>> >> >
>> >> > The real BUG_ON that fires up is at
>> >> >
>> >> > wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
>> >> >
>> >> > Confirmed by replacing it with printk.
>> >> >
>> >> >
>> >> > I attach a beacon sample that triggers this:
>> >>
>> >> Thanks for that. Was this from wireless-testing? If so as of what SHA1
>> >> sum? Can you provide steps to reproduce? What AP? What card?
>> > wireless-testing, b2382a4aeff07a481ccf860e4f716b48b52e3781
>>
>> That HEAD is buggy; it's missing the minstrel/pid fix.
>> Although, since you are running iwl3945, that explains why you haven't
>> seen that oops.
>>
>> Will look into this -- thanks. How easily can you reproduce it, BTW? Does it
>> happen immediately upon assoc? Does it happen with other APs at the
>> university?
> Happens always when I try to associate.

Great so it will be easy to confirm a fix. Can you please try the 4
patches I just posted? I haven't finished reviewing your case but so
far from the pcap review (which was really helpful) I don't see
anything funky from the beacon except that the FCS is bad. But that
shouldn't matter.

Anyway please try the patches I posted and I'll finish reviewing this
in the meantime.

> Other APs, I don't see many other APs there, (you mean other essids,
> right?)

I meant other APs at the University on the same network.

> Most of them are encrypted anyway.

OK this should not matter.

Luis

2009-05-10 19:36:59

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
<[email protected]> wrote:
> On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
>> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <maximlevitsky@gmail.com> wrote:
>> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
>> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
>> >
>> > I have recompiled the kernel with debugging info.
>> >
>> > This is new backtrace:
>> >
>> >
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203085] ------------[ cut here ]------------
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203096] kernel BUG at /home/maxim/software/kernel/linux-2.6/net/wireless/reg.c:2039!
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203103] invalid opcode: 0000 [#1] PREEMPT SMP
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203115] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203121] CPU 0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203127] Modules linked in: iwl3945 iwlcore mac80211 cfg80211 cpufreq_stats af_packet nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative cpufreq_userspace acpi_cpufreq coretemp sbp2 snd_hda_codec_realtek snd_hda_intel joydev snd_hda_codec uvcvideo snd_hwdep videodev v4l1_compat acer_wmi rfkill v4l2_compat_ioctl32 sdhci_pci uhci_hcd snd_pcm backlight psmouse serio_raw ohci1394 sdhci iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc ehci_hcd usbcore evdev wmi fuse [last unloaded: cfg80211]
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203258] Pid: 20876, comm: crda Tainted: P 2.6.30-rc4-wl #58 Aspire 5720
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203265] RIP: 0010:[<ffffffffa0c9611e>] [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203291] RSP: 0018:ffff8800638c1978 EFLAGS: 00010246
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203296] RAX: ffff88005176df68 RBX: ffff880034b601a0 RCX: ffffffffa0ca9540
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203302] RDX: ffff880034b60000 RSI: 0000000000000000 RDI: 0000000000000000
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203308] RBP: ffff8800638c1998 R08: 0000000000000001 R09: 0000000000000001
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004f889a90
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203320] R13: ffff88007e18e6e0 R14: 0000000000000001 R15: 0000000000000001
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203326] FS: 00007f1c069c56f0(0000) GS:ffff880001025000(0000) knlGS:0000000000000000
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203338] CR2: 00007f1c0632fdb0 CR3: 000000007fa65000 CR4: 00000000000006e0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203344] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203350] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203357] Process crda (pid: 20876, threadinfo ffff8800638c0000, task ffff880067093e80)
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203362] Stack:
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203366] 0000000000000000 ffff88004f889a90 0000000000000000 ffff88007e18e6e0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203379] ffff8800638c1a28 ffffffffa0c9c672 ffff88005e15ab40 ffff88004f889a90
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203393] 0000000000000000 ffff88007e18e6e4 ffff88007e18e6ec ffff88007e18e6f4
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203410] Call Trace:
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203416] [<ffffffffa0c9c672>] nl80211_set_reg+0x112/0x2c0 [cfg80211]
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203436] [<ffffffff80412c8f>] ? nla_parse+0xef/0x110
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203450] [<ffffffff8054dac6>] genl_rcv_msg+0x1b6/0x1f0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203462] [<ffffffff8054d910>] ? genl_rcv_msg+0x0/0x1f0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203471] [<ffffffff8054d0e9>] netlink_rcv_skb+0x89/0xb0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203479] [<ffffffff8054d8ee>] genl_rcv+0x2e/0x50
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203488] [<ffffffff8054c917>] ? netlink_unicast+0x117/0x2e0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203498] [<ffffffff8054cac4>] netlink_unicast+0x2c4/0x2e0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203508] [<ffffffff8052c4b3>] ? __alloc_skb+0x73/0x160
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203519] [<ffffffff8054ccde>] netlink_sendmsg+0x1fe/0x300
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203528] [<ffffffff80522f07>] sock_sendmsg+0x127/0x140
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203537] [<ffffffff80522d61>] ? sock_recvmsg+0x141/0x160
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203546] [<ffffffff80260ad0>] ? autoremove_wake_function+0x0/0x40
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203558] [<ffffffff80293602>] ? __rcu_read_unlock+0xa2/0xc0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203567] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203578] [<ffffffff805219d0>] ? move_addr_to_kernel+0x30/0x40
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203588] [<ffffffff8052db51>] ? verify_iovec+0x41/0xd0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203597] [<ffffffff805230ae>] sys_sendmsg+0x18e/0x320
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203607] [<ffffffff805c4505>] ? _spin_unlock_irqrestore+0x65/0x80
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203619] [<ffffffff805c77b1>] ? sub_preempt_count+0x51/0x60
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203628] [<ffffffff80403c21>] ? __up_read+0x91/0xb0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203639] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203648] [<ffffffff805c3e4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203658] [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203670] Code: 90 91 00 00 00 0f be b0 90 00 00 00 48 c7 c7 60 1b ca a0 31 c0 e8 13 9f 92 df e9 4f fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 17 43 01 00 4c 89 e7 e8 df f8 ff ff 49 89
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203815] RIP [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203834] RSP <ffff8800638c1978>
>> >> May 7 10:27:05 maxim-laptop kernel: [ 5411.203842] ---[ end trace 9723f71e550687a4 ]---
>> >>
>> >
>> >
>> > This is GDB output - can be inaccurate - I have pulled latest wireless-testing , and rebuild kernel again.
>> > I will read the source, and try to fix this
>> > This is 100% reproducible
>> >
>> >
>> >
>> >> (gdb) l *nl80211_set_reg+0x112
>> >> 0x96a2 is in nl80211_set_reg (/home/maxim/software/kernel/linux-2.6/net/wireless/nl80211.c:2587).
>> >> 2582
>> >> 2583          BUG_ON(rule_idx != num_rules);
>> >> 2584
>> >> 2585          mutex_lock(&cfg80211_mutex);
>> >> 2586          r = set_regdom(rd);
>> >> 2587          mutex_unlock(&cfg80211_mutex);
>> >> 2588          return r;
>> >> 2589
>> >> 2590   bad_reg:
>> >> 2591          kfree(rd);
>> >> (gdb)
>> >
>>
>> Hm I don't see a BUG_ON at my net/wireless/reg.c:2039, can you please
>> try with wireless-testing or paste the line 2039 in your
>> net/wireless/reg.c. Also if you can provide steps to how to get the
>> code you have and how to reproduce it would help.
>>
>>   Luis
>
> Sorry for confusion....
> This is iwlwifi.git tree.

Like I said before please use wireless-testing, I don't know what goes
into that tree or if the fixes which I have posted get propagated.

Luis

2009-05-12 17:34:37

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Tue, 2009-05-12 at 10:24 -0700, Luis R. Rodriguez wrote:
> On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
> > On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
> >> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
> >> <[email protected]> wrote:
> >> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
> >> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
> >> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> >> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
> >> >> >
> >> >> > I have recompiled the kernel with debugging info.
> >> >> >
> >> >> > This is new backtrace:
> >
> > Ok, here is what I found:
> >
> > The real BUG_ON that fires up is at
> >
> > wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
> >
> > Confirmed by replacing it with printk.
> >
> >
> > I attach a beacon sample that triggers this:
>
> Thanks for that. Was this from wireless-testing? If so as of what SHA1
> sum? Can you provide steps to reproduce? What AP? What card?
wireless-testing, b2382a4aeff07a481ccf860e4f716b48b52e3781

To reproduce, connect to an AP that transmits a beacon frame
like the one I attached.
It always happens.

NM and wpa_supplicant are the git versions, running normally.


What AP? I don't know myself.

What card? iwl3945


Best regards,
Maxim Levitsky


2009-05-31 20:54:22

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-31 at 15:47 +0300, Maxim Levitsky wrote:
> On Sun, 2009-05-31 at 02:22 -0400, Luis R. Rodriguez wrote:
> > On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
> > > I am talking about
> > >
> > > BUG_ON(!country_ie_regdomain);
> > > in net/wireless/reg.c
> >
> > Please try this patch and leave a window open with this running:
> >
> > iw event
> >
> > Please be sure to grab iw from git, not sure if the reg events
> > have all gone into an official release yet. What I'm looking for
> > is whether or not a previous 11d setting was already processed
> > or if the !country_ie_regdomain condition happens from the first
> > 11d processing.
> >
> > Luis
> >
> > diff --git a/net/wireless/reg.c b/net/wireless/reg.c
> > index f87ac1d..1b60dfc 100644
> > --- a/net/wireless/reg.c
> > +++ b/net/wireless/reg.c
> > @@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
> >  	 * the country IE rd with what CRDA believes that country should have
> >  	 */
> >
> > -	BUG_ON(!country_ie_regdomain);
> > +	if (WARN_ON(!country_ie_regdomain)) {
> > +		kfree(rd);
> > +		rd = NULL;
> > +		return -EINVAL;
> > +	}
> >  	BUG_ON(rd == country_ie_regdomain);
> >
> >  	/*
> > @@ -2268,6 +2272,8 @@ int regulatory_init(void)
> >  	if (IS_ERR(reg_pdev))
> >  		return PTR_ERR(reg_pdev);
> >
> > +	country_ie_regdomain = NULL;
> > +
> >  	spin_lock_init(&reg_requests_lock);
> >  	spin_lock_init(&reg_pending_beacons_lock);
> >
>
>
> I'll test this today.
> I have iw from git.
>
> Best regards,
> Maxim Levitsky
>
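
For readers following the quoted patch: a minimal userspace sketch of the kind of change it makes, turning a fatal assertion into a logged warning plus an error return. Everything here is an assumption made for illustration: `regdomain_sketch`, `country_ie_rd_sketch`, `WARN_ON_SKETCH` and `set_regdom_sketch` are hypothetical stand-ins, not the kernel's own types, macros or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's regulatory domain structure. */
struct regdomain_sketch {
	char alpha2[3];
};

/* Plays the role of country_ie_regdomain: NULL until a country IE is pending. */
struct regdomain_sketch *country_ie_rd_sketch = NULL;

/* Rough analogue of WARN_ON(): log the condition, evaluate to 1, keep running. */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s\n", #cond), 1) : 0)

/*
 * Takes ownership of rd. With no pending country IE regdomain it frees rd
 * and returns -EINVAL instead of aborting the way a BUG_ON() would.
 */
int set_regdom_sketch(struct regdomain_sketch *rd)
{
	if (WARN_ON_SKETCH(!country_ie_rd_sketch)) {
		free(rd);
		return -EINVAL;
	}
	/* The real code would process rd against the country IE data here. */
	free(rd);
	return 0;
}
```

With `country_ie_rd_sketch` left at NULL, `set_regdom_sketch()` prints a warning and returns `-EINVAL`; once it points at something, the call returns 0. The point of the design is that the caller gets an error it can report while the system keeps running.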

Here it is:


> wlan0 (phy #0): assoc 00:1b:9e:d8:77:02 -> 00:1b:77:f1:7c:29 status: 0: Successful
> wlan0 (phy #0): disassoc 00:1b:77:f1:7c:29 -> 00:1b:9e:d8:77:02 reason 3: Deauthenticated because sending station is leaving (or has left) the IBSS or ESS
> wlan0 (phy #0): scan finished
> wlan0 (phy #0): auth 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> wlan0 (phy #0): assoc 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> phy #0: regulatory domain change: intersection used due to a request made by a country IE on phy0
>
dmesg attached (I use nvidia drivers)



On top of that, there are a few more glaring bugs in ath5k AP mode:


1 - Beacons are sent only after I start hostapd twice (kill it, and
start it again).

2 - ath5k makes the kernel panic (blinking LEDs on the keyboard), reliably,
after hostapd has been started a few times. I haven't captured the output yet.
I remember seeing panics in ad-hoc mode as well.

3 - I couldn't transfer any frames between the AP and the client; only
association works.

I started hostapd, associated the client (using the link-local
feature of NM), and assigned both the AP and the client an IP address
(ifconfig wlan0 10.1.0.1/24 on the AP, and ifconfig wlan0 10.1.0.2/24 on
the client), and yet I couldn't get even an ARP reply from the AP, or vice versa.

I use hostapd and wpa_supplicant from the latest git.




Attachments:
dmesg (63.77 kB)

2009-05-12 17:25:10

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [BUG] Crda oopses the system

On Tue, May 12, 2009 at 8:05 AM, Maxim Levitsky <[email protected]> wrote:
> On Sun, 2009-05-10 at 12:36 -0700, Luis R. Rodriguez wrote:
>> On Sun, May 10, 2009 at 12:34 PM, Maxim Levitsky
>> <[email protected]> wrote:
>> > On Sun, 2009-05-10 at 11:33 -0700, Luis R. Rodriguez wrote:
>> >> On Sun, May 10, 2009 at 5:20 AM, Maxim Levitsky <[email protected]> wrote:
>> >> > On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
>> >> >> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko
>> >> >
>> >> > I have recompiled the kernel with debugging info.
>> >> >
>> >> > This is new backtrace:
>
> Ok, here is what I found:
>
> The real BUG_ON that fires up is at
>
> wireless/reg.c: __set_regdom: BUG_ON(!country_ie_regdomain)
>
> Confirmed by replacing it with printk.
>
>
> I attach a beacon sample that triggers this:

Thanks for that. Was this from wireless-testing? If so as of what SHA1
sum? Can you provide steps to reproduce? What AP? What card?

Luis

2009-05-14 14:27:12

by Bob Copeland

Subject: Re: [BUG] Crda oopses the system

On Wed, May 13, 2009 at 8:07 PM, Luis R. Rodriguez <[email protected]> wrote:
>>
>> I wish I had time to set up, an ath5k in AP mode. There are rumors that
>> it works more or less now, then I could use the same beacon frame to
>> test.
>
> It does and if it doesn't its a bug.

Just FYI, it's known to have problems if the STA uses power saving --
ath5k never updates the beacon. I have a patch that should work in the
relevant bugzilla but I haven't been back to retest it (at first I
thought it was causing hangs, but later realized the hangs were due to
other bugs in w-t).

--
Bob Copeland %% http://www.bobcopeland.com

2009-05-06 15:36:11

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Tue, 2009-05-05 at 11:46 -0700, Luis R. Rodriguez wrote:
> On Mon, May 4, 2009 at 10:04 PM, Maxim Levitsky <[email protected]> wrote:
> > Here what I see:
> >
> >
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109054] Process crda (pid: 29344, threadinfo ffff88007deaa000, task ffff880067369600)
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109058] Stack:
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109061] 0000000000000034 ffff88003fb8a640 0000000000000000 ffff8800634f5c20
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109069] ffff88007deaba28 ffffffffa00eb3d0 000004f87deab9c8 ffff88003fb8a640
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109077] 0000000000000000 ffff8800634f5c24 ffff8800634f5c2c ffff8800634f5c34
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109086] Call Trace:
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109091] [<ffffffffa00eb3d0>] nl80211_set_reg+0x100/0x2b0 [cfg80211]
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109107] [<ffffffff803e697f>] ? nla_parse+0xef/0x110
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109118] [<ffffffff80513716>] genl_rcv_msg+0x1b6/0x1f0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109126] [<ffffffff80513560>] ? genl_rcv_msg+0x0/0x1f0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109132] [<ffffffff80512d49>] netlink_rcv_skb+0x89/0xb0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109140] [<ffffffff80513547>] genl_rcv+0x27/0x40
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109146] [<ffffffff805128a9>] ? netlink_sendmsg+0x159/0x300
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109153] [<ffffffff80512734>] netlink_unicast+0x2c4/0x2e0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109161] [<ffffffff804f356e>] ? __alloc_skb+0x6e/0x150
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109169] [<ffffffff8051294e>] netlink_sendmsg+0x1fe/0x300
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109176] [<ffffffff804ea607>] sock_sendmsg+0x127/0x140
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109183] [<ffffffff8025be50>] ? autoremove_wake_function+0x0/0x40
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109193] [<ffffffff8029f986>] ? get_page_from_freelist+0x3b6/0x650
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109201] [<ffffffff80299985>] ? find_lock_page+0x25/0x70
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109208] [<ffffffff804e924b>] ? move_addr_to_kernel+0x2b/0x40
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109214] [<ffffffff804f4b8c>] ? verify_iovec+0x3c/0xd0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109221] [<ffffffff804ea7a9>] sys_sendmsg+0x189/0x320
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109228] [<ffffffff804eb635>] ? move_addr_to_user+0x65/0x80
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109235] [<ffffffff802b1651>] ? handle_mm_fault+0x1e1/0x830
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109243] [<ffffffff803d8f21>] ? __up_read+0x91/0xb0
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109252] [<ffffffff8020bf2b>] system_call_fastpath+0x16/0x1b
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109261] Code: a1 00 00 00 0f be 50 39 0f be 70 38 48 c7 c7 70 07 0f a0 31 c0 e8 15 da 49 e0 e9 55 fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 e5 38 01 00 4c 89 e7 e8 05 f9 ff ff 49 89
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109329] RIP [<ffffffffa00e4f58>] set_regdom+0x428/0x4c0 [cfg80211]
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109344] RSP <ffff88007deab978>
> >> May 4 16:35:14 maxim-laptop kernel: [16939.109350] ---[ end trace 695815cef5ce0efe ]---
> >>
> >
> > This happens @ my university where APs send country code, and thus NM calls crda to apply it.
> > (I already applied it in initscripts)
> >
> > intel 3945 device, iwlwifi.git commit
> > #5a94b6d38100b7056a5a347e5c51359d924d305d
>
> Please use wireless-testing.
>
> Luis


I did; wireless-testing as of yesterday crashes the kernel in the same way.

Regards,
Maxim Levitsky


2009-05-13 23:22:36

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Wed, 2009-05-13 at 16:12 -0700, Luis R. Rodriguez wrote:
> On Wed, May 13, 2009 at 4:08 PM, Maxim Levitsky <[email protected]> wrote:
> > On Wed, 2009-05-13 at 14:28 -0700, Luis R. Rodriguez wrote:
>
> >> Anyway please try the patches I posted and I'll finish reviewing this
> >> in the meantime.
> > I don't see anything yet.
>
> wget this and git am it, it has all the patches:
>
> http://bombadil.infradead.org/~mcgrof/patches/wl/race-fixes-2009-05-13.patch
>
> Luis


Thanks a lot.
I will test this as soon as I am at the university again (next Sunday).

Thanks again,
Maxim Levitsky


2009-05-22 10:08:26

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Thu, 2009-05-21 at 17:36 -0700, Luis R. Rodriguez wrote:
> On Thu, May 21, 2009 at 5:20 PM, Maxim Levitsky <[email protected]> wrote:
> > On Tue, 2009-05-19 at 17:17 +0300, Maxim Levitsky wrote:
> >> On Thu, 2009-05-14 at 10:20 -0400, Bob Copeland wrote:
> >> > On Wed, May 13, 2009 at 8:07 PM, Luis R. Rodriguez <[email protected]> wrote:
> >> > >>
> >> > >> I wish I had time to set up, an ath5k in AP mode. There are rumors that
> >> > >> it works more or less now, then I could use the same beacon frame to
> >> > >> test.
> >> > >
> >> > > It does and if it doesn't its a bug.
> >> >
> >> > Just FYI, it's known to have problems if the STA uses power saving --
> >> > ath5k never updates the beacon. I have a patch that should work in the
> >> > relevant bugzilla but I haven't been back to retest it (at first I
> >> > thought it was causing hangs, but later realized the hangs were due to
> >> > other bugs in w-t).
> >> >
> >>
> >
> > I was just able to reproduce that bug on the latest wireless-testing, which I
> > believe contains the patches you sent for me to test.
>
> Latest wireless-testing *does* have those patches.
Yes I know, just to be sure.

>
> > I reproduced it against ath5k running in AP mode at home.
> >
> > I finally made the ath5k send beacons (although not much works besides
> > this).
> >
> >
> > Note that I noticed that this bug
>
> You can get an oops when trying to associate to your ath5k AP? And it's
> easily reproducible? What driver do you use as the STA? Can you
> provide a trace?
>
> > happens once at boot, if I set NM to
> > use system settings,
>
> What does this mean?
Well, I can't say for sure, but it appears that if I associate with the AP
and then repeat this, I don't see the bug again.


>
> > when I try to connect again it doesn't happen. Its
> > a race condition after all.
>
> Is the oops not crashing your box? How are you able to try again? What
> does trying again mean?
Well, it hasn't been an oops on my system for a long time.

I converted the BUG_ON to a printk, and so far I haven't seen any side
effects.
However, I don't want you to just remove this check unless it is bogus
(because that would bury the bug deeper).

On the other hand, I see no reason not to convert it to WARN_ON, so that it
won't oops user systems.

I am talking about

BUG_ON(!country_ie_regdomain);
in net/wireless/reg.c



>
> Luis


2009-05-10 12:20:30

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Wed, 2009-05-06 at 16:34 -0400, Pavel Roskin wrote:
> gdb /lib/modules/`uname -r`/kernel/net/wireless/cfg80211.ko

I have recompiled the kernel with debugging info.

This is new backtrace:


> May 7 10:27:05 maxim-laptop kernel: [ 5411.203085] ------------[ cut here ]------------
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203096] kernel BUG at /home/maxim/software/kernel/linux-2.6/net/wireless/reg.c:2039!
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203103] invalid opcode: 0000 [#1] PREEMPT SMP
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203115] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203121] CPU 0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203127] Modules linked in: iwl3945 iwlcore mac80211 cfg80211 cpufreq_stats af_packet nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative cpufreq_userspace acpi_cpufreq coretemp sbp2 snd_hda_codec_realtek snd_hda_intel joydev snd_hda_codec uvcvideo snd_hwdep videodev v4l1_compat acer_wmi rfkill v4l2_compat_ioctl32 sdhci_pci uhci_hcd snd_pcm backlight psmouse serio_raw ohci1394 sdhci iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc ehci_hcd usbcore evdev wmi fuse [last unloaded: cfg80211]
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203258] Pid: 20876, comm: crda Tainted: P 2.6.30-rc4-wl #58 Aspire 5720
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203265] RIP: 0010:[<ffffffffa0c9611e>] [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203291] RSP: 0018:ffff8800638c1978 EFLAGS: 00010246
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203296] RAX: ffff88005176df68 RBX: ffff880034b601a0 RCX: ffffffffa0ca9540
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203302] RDX: ffff880034b60000 RSI: 0000000000000000 RDI: 0000000000000000
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203308] RBP: ffff8800638c1998 R08: 0000000000000001 R09: 0000000000000001
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004f889a90
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203320] R13: ffff88007e18e6e0 R14: 0000000000000001 R15: 0000000000000001
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203326] FS: 00007f1c069c56f0(0000) GS:ffff880001025000(0000) knlGS:0000000000000000
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203338] CR2: 00007f1c0632fdb0 CR3: 000000007fa65000 CR4: 00000000000006e0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203344] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203350] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203357] Process crda (pid: 20876, threadinfo ffff8800638c0000, task ffff880067093e80)
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203362] Stack:
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203366] 0000000000000000 ffff88004f889a90 0000000000000000 ffff88007e18e6e0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203379] ffff8800638c1a28 ffffffffa0c9c672 ffff88005e15ab40 ffff88004f889a90
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203393] 0000000000000000 ffff88007e18e6e4 ffff88007e18e6ec ffff88007e18e6f4
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203410] Call Trace:
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203416] [<ffffffffa0c9c672>] nl80211_set_reg+0x112/0x2c0 [cfg80211]
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203436] [<ffffffff80412c8f>] ? nla_parse+0xef/0x110
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203450] [<ffffffff8054dac6>] genl_rcv_msg+0x1b6/0x1f0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203462] [<ffffffff8054d910>] ? genl_rcv_msg+0x0/0x1f0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203471] [<ffffffff8054d0e9>] netlink_rcv_skb+0x89/0xb0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203479] [<ffffffff8054d8ee>] genl_rcv+0x2e/0x50
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203488] [<ffffffff8054c917>] ? netlink_unicast+0x117/0x2e0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203498] [<ffffffff8054cac4>] netlink_unicast+0x2c4/0x2e0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203508] [<ffffffff8052c4b3>] ? __alloc_skb+0x73/0x160
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203519] [<ffffffff8054ccde>] netlink_sendmsg+0x1fe/0x300
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203528] [<ffffffff80522f07>] sock_sendmsg+0x127/0x140
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203537] [<ffffffff80522d61>] ? sock_recvmsg+0x141/0x160
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203546] [<ffffffff80260ad0>] ? autoremove_wake_function+0x0/0x40
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203558] [<ffffffff80293602>] ? __rcu_read_unlock+0xa2/0xc0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203567] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203578] [<ffffffff805219d0>] ? move_addr_to_kernel+0x30/0x40
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203588] [<ffffffff8052db51>] ? verify_iovec+0x41/0xd0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203597] [<ffffffff805230ae>] sys_sendmsg+0x18e/0x320
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203607] [<ffffffff805c4505>] ? _spin_unlock_irqrestore+0x65/0x80
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203619] [<ffffffff805c77b1>] ? sub_preempt_count+0x51/0x60
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203628] [<ffffffff80403c21>] ? __up_read+0x91/0xb0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203639] [<ffffffff802730e9>] ? trace_hardirqs_on_caller+0x29/0x1c0
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203648] [<ffffffff805c3e4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203658] [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203670] Code: 90 91 00 00 00 0f be b0 90 00 00 00 48 c7 c7 60 1b ca a0 31 c0 e8 13 9f 92 df e9 4f fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 48 8b 35 17 43 01 00 4c 89 e7 e8 df f8 ff ff 49 89
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203815] RIP [<ffffffffa0c9611e>] set_regdom+0x43e/0x4d0 [cfg80211]
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203834] RSP <ffff8800638c1978>
> May 7 10:27:05 maxim-laptop kernel: [ 5411.203842] ---[ end trace 9723f71e550687a4 ]---
>


This is the GDB output. It may be inaccurate, because I have since pulled the
latest wireless-testing and rebuilt the kernel.
I will read the source and try to fix this.
This is 100% reproducible.



> (gdb) l *nl80211_set_reg+0x112
> 0x96a2 is in nl80211_set_reg (/home/maxim/software/kernel/linux-2.6/net/wireless/nl80211.c:2587).
> 2582
> 2583 BUG_ON(rule_idx != num_rules);
> 2584
> 2585 mutex_lock(&cfg80211_mutex);
> 2586 r = set_regdom(rd);
> 2587 mutex_unlock(&cfg80211_mutex);
> 2588 return r;
> 2589
> 2590 bad_reg:
> 2591 kfree(rd);
> (gdb)
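
The call trace entries above have the form symbol+offset/size, which is
what gets pasted after "l *" in gdb (as in "l *nl80211_set_reg+0x112").
A minimal sketch in C of splitting such an entry into its parts follows;
parse_trace_symbol() is a hypothetical helper made up for illustration,
not something from the kernel or gdb. Note that offsets taken from a call
trace are return addresses, so gdb typically lists the line just after
the call.

```c
#include <stdio.h>

/*
 * Hypothetical helper: split a call-trace entry such as
 * "set_regdom+0x43e/0x4d0" into symbol name, offset and function size.
 * Returns 0 on success, -1 if the entry does not match that shape.
 */
static int parse_trace_symbol(const char *entry, char name[64],
			      unsigned long *offset, unsigned long *size)
{
	/* %63[^+] reads the symbol up to '+'; %lx accepts the 0x prefix. */
	if (sscanf(entry, "%63[^+]+%lx/%lx", name, offset, size) != 3)
		return -1;
	return 0;
}
```

For "set_regdom+0x43e/0x4d0" this yields the name set_regdom, offset
0x43e and size 0x4d0, i.e. the faulting instruction sits close to the end
of the function.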


Best regards,
Maxim Levitsky


2009-05-31 06:22:27

by Luis Chamberlain

Subject: Re: [BUG] Crda oopses the system

On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
> I am talking about
>
> BUG_ON(!country_ie_regdomain);
> in net/wireless/reg.c

Please try this patch and leave a window open with this running:

iw event

Please be sure to grab iw from git, not sure if the reg events
have all gone into an official release yet. What I'm looking for
is whether or not a previous 11d setting was already processed
or if the !country_ie_regdomain condition happens from the first
11d processing.

Luis

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index f87ac1d..1b60dfc 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
 	 * the country IE rd with what CRDA believes that country should have
 	 */

-	BUG_ON(!country_ie_regdomain);
+	if (WARN_ON(!country_ie_regdomain)) {
+		kfree(rd);
+		rd = NULL;
+		return -EINVAL;
+	}
 	BUG_ON(rd == country_ie_regdomain);

 	/*
@@ -2268,6 +2272,8 @@ int regulatory_init(void)
 	if (IS_ERR(reg_pdev))
 		return PTR_ERR(reg_pdev);

+	country_ie_regdomain = NULL;
+
 	spin_lock_init(&reg_requests_lock);
 	spin_lock_init(&reg_pending_beacons_lock);
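
For readers following along, the pattern the patch applies (warn and
reject the request instead of halting) can be modelled in userspace C.
This is only a sketch under stated assumptions: struct regdomain,
warn_on(), current_regdomain and set_regdom_country_ie() are hypothetical
stand-ins invented here, and assert() plays the role of the BUG_ON() that
the patch leaves in place. It is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ieee80211_regdomain. */
struct regdomain {
	char alpha2[3];
};

/* Models the pointer that country IE processing is expected to set. */
static struct regdomain *country_ie_regdomain;

/* Models the currently installed regulatory domain (owned here). */
static struct regdomain *current_regdomain;

/* Userspace analogue of WARN_ON(): log the condition and return it. */
static int warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * Simplified model of the country IE branch of __set_regdom().
 * Takes ownership of rd: frees it on failure, installs it on success.
 */
static int set_regdom_country_ie(struct regdomain *rd)
{
	/* Recoverable: reject the request instead of halting. */
	if (warn_on(country_ie_regdomain == NULL, "!country_ie_regdomain")) {
		free(rd);
		return -EINVAL;
	}

	/* Still fatal, like the BUG_ON() the patch keeps. */
	assert(rd != country_ie_regdomain);

	free(current_regdomain);
	current_regdomain = rd;
	return 0;
}
```

With country_ie_regdomain unset, the call logs a warning and returns
-EINVAL, so the caller (crda, in the thread) sees an error rather than
taking the system down. The underlying race is still there; the warning
only makes it visible without an oops.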


2009-06-01 00:33:45

by Bob Copeland

Subject: Re: [BUG] Crda oopses the system

On Sun, May 31, 2009 at 11:54:13PM +0300, Maxim Levitsky wrote:
> 2 - ath5k makes kernel panic, reliably after few times hostapd have
> started, I didn't yet captured the output.
> I remember to see panics with ad-hoc as well.
> I mean blinking leds on keyboard.

This should be fixed at least with a patch I recently posted to
linux-wireless.

> 3 - couldn't transfer any frames between AP and client, only association
> works.

Yeah I still plan to fix it, just haven't found cycles for it yet.

--
Bob Copeland %% http://www.bobcopeland.com


2009-05-31 22:24:45

by Luis R. Rodriguez

Subject: Re: [BUG] Crda oopses the system

On Sun, May 31, 2009 at 1:54 PM, Maxim Levitsky <[email protected]> wrote:
> On Sun, 2009-05-31 at 15:47 +0300, Maxim Levitsky wrote:
>> On Sun, 2009-05-31 at 02:22 -0400, Luis R. Rodriguez wrote:
>> > On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
>> > > I am talking about
>> > >
>> > > BUG_ON(!country_ie_regdomain);
>> > > in net/wireless/reg.c
>> >
>> > Please try this patch and leave a window open with this running:
>> >
>> > iw event
>> >
>> > Please be sure to grab iw from git, not sure if the reg events
>> > have all gone into an official release yet. What I'm looking for
>> > is whether or not a previous 11d setting was already processed
>> > or if the !country_ie_regdomain condition happens from the first
>> > 11d processing.
>> >
>> >   Luis
>> >
>> > diff --git a/net/wireless/reg.c b/net/wireless/reg.c
>> > index f87ac1d..1b60dfc 100644
>> > --- a/net/wireless/reg.c
>> > +++ b/net/wireless/reg.c
>> > @@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
>> >      * the country IE rd with what CRDA believes that country should have
>> >      */
>> >
>> > -   BUG_ON(!country_ie_regdomain);
>> > +   if (WARN_ON(!country_ie_regdomain)) {
>> > +           kfree(rd);
>> > +           rd = NULL;
>> > +           return -EINVAL;
>> > +   }
>> >     BUG_ON(rd == country_ie_regdomain);
>> >
>> >     /*
>> > @@ -2268,6 +2272,8 @@ int regulatory_init(void)
>> >     if (IS_ERR(reg_pdev))
>> >             return PTR_ERR(reg_pdev);
>> >
>> > +   country_ie_regdomain = NULL;
>> > +
>> >     spin_lock_init(&reg_requests_lock);
>> >     spin_lock_init(&reg_pending_beacons_lock);
>> >
>>
>>
>> I'll test this today.
>> I have iw from git.
>>
>> Best regards,
>>       Maxim Levitsky
>>
>
> Here it is:
>
>
>> wlan0 (phy #0): assoc 00:1b:9e:d8:77:02 -> 00:1b:77:f1:7c:29 status: 0: Successful
>> wlan0 (phy #0): disassoc 00:1b:77:f1:7c:29 -> 00:1b:9e:d8:77:02 reason 3: Deauthenticated because sending station is leaving (or has left) the IBSS or ESS
>> wlan0 (phy #0): scan finished
>> wlan0 (phy #0): auth 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
>> wlan0 (phy #0): assoc 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
>> phy #0: regulatory domain change: intersection used due to a request made by a country IE on phy0
>>
> dmesg attached (I use nvidia drivers)

I see only one userspace request *attempt* sent for the country IE in
your log (the "Calling CRDA" bits). So my assumption that, as far as
cfg80211 is concerned, we were only "trying" to send one request to
userspace for the given country IE is accurate. However, it does not
seem accurate that the kernel won't send two requests or that userspace
will not respond twice.

> On top of that there are few more very bold bugs in ath5k AP mode:

Your best bet is to report these separately.

Luis

2009-05-31 22:48:34

by Maxim Levitsky

Subject: Re: [BUG] Crda oopses the system

On Sun, 2009-05-31 at 15:24 -0700, Luis R. Rodriguez wrote:
> On Sun, May 31, 2009 at 1:54 PM, Maxim Levitsky <[email protected]> wrote:
> > On Sun, 2009-05-31 at 15:47 +0300, Maxim Levitsky wrote:
> >> On Sun, 2009-05-31 at 02:22 -0400, Luis R. Rodriguez wrote:
> >> > On Fri, May 22, 2009 at 01:08:22PM +0300, Maxim Levitsky wrote:
> >> > > I am talking about
> >> > >
> >> > > BUG_ON(!country_ie_regdomain);
> >> > > in net/wireless/reg.c
> >> >
> >> > Please try this patch and leave a window open with this running:
> >> >
> >> > iw event
> >> >
> >> > Please be sure to grab iw from git, not sure if the reg events
> >> > have all gone into an official release yet. What I'm looking for
> >> > is whether or not a previous 11d setting was already processed
> >> > or if the !country_ie_regdomain condition happens from the first
> >> > 11d processing.
> >> >
> >> > Luis
> >> >
> >> > diff --git a/net/wireless/reg.c b/net/wireless/reg.c
> >> > index f87ac1d..1b60dfc 100644
> >> > --- a/net/wireless/reg.c
> >> > +++ b/net/wireless/reg.c
> >> > @@ -2171,7 +2171,11 @@ static int __set_regdom(const struct ieee80211_regdomain *rd)
> >> > * the country IE rd with what CRDA believes that country should have
> >> > */
> >> >
> >> > - BUG_ON(!country_ie_regdomain);
> >> > + if (WARN_ON(!country_ie_regdomain)) {
> >> > + kfree(rd);
> >> > + rd = NULL;
> >> > + return -EINVAL;
> >> > + }
> >> > BUG_ON(rd == country_ie_regdomain);
> >> >
> >> > /*
> >> > @@ -2268,6 +2272,8 @@ int regulatory_init(void)
> >> > if (IS_ERR(reg_pdev))
> >> > return PTR_ERR(reg_pdev);
> >> >
> >> > + country_ie_regdomain = NULL;
> >> > +
> >> > spin_lock_init(&reg_requests_lock);
> >> > spin_lock_init(&reg_pending_beacons_lock);
> >> >
> >>
> >>
> >> I'll test this today.
> >> I have iw from git.
> >>
> >> Best regards,
> >> Maxim Levitsky
> >>
> >
> > Here it is:
> >
> >
> >> wlan0 (phy #0): assoc 00:1b:9e:d8:77:02 -> 00:1b:77:f1:7c:29 status: 0: Successful
> >> wlan0 (phy #0): disassoc 00:1b:77:f1:7c:29 -> 00:1b:9e:d8:77:02 reason 3: Deauthenticated because sending station is leaving (or has left) the IBSS or ESS
> >> wlan0 (phy #0): scan finished
> >> wlan0 (phy #0): auth 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> >> wlan0 (phy #0): assoc 00:23:4d:3c:80:27 -> 00:1b:77:f1:7c:29 status: 0: Successful
> >> phy #0: regulatory domain change: intersection used due to a request made by a country IE on phy0
> >>
> > dmesg attached (I use nvidia drivers)
>
> I see only one userspace request *attempt* sent for the country IE in
> your log ("Calling CRDA" bits). So my assumption that we were only
> "trying" to send to userspace one request for the given country IE as
> far as cfg80211 is concerned is accurate however it does not seem
> accurate that the kernel won't send two requests or that userspace
> will not respond twice.
>
> > On top of that there are few more very bold bugs in ath5k AP mode:
>
> Your best bet is to report these separately.
I did report some, and I will report the rest; this is just a reminder,
because today I had a really bad time with ath5k. I was trying to compile
a kernel on an Aspire One using distcc.

Thanks a lot for fixing the crda bug. I guess there is no need to test
it; I had already removed that check, and everything worked fine.



Best regards,
Maxim Levitsky