2007-09-02 23:59:21

by Michal Piotrowski

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

Hi,

[Adding netdev and wireless to CC]

On 02/09/07, Florian Lohoff <[email protected]> wrote:
>
> Hi,
> with current git i got this when "ifconfig eth1" down. eth1 had a mac
> address which looked really like an eth1394 ethernet although the module
> was not loaded. Something is really broken in 2.6.23-currentgit. I always get the
> sysfs rename issues which are discussed to be an udev issue. Then i see
> a eth1394 mac address on an interface which typically shouldn exist
> (udev should rename the wireless to eth1) and when issueing an
> ifconfig eth1 down i get a
>
> BUG: scheduling while atomic: ifconfig/0x00000002/4170
>
> On the next boot i see the eth1394 mac address on the wireless interface
> wmaster0_rename whereas eth1 is active (the wireless) and has the correct
> ip address. I dont get it - this all looks really messed up. udev is
> debian sid 114-2.
>
> eth0 Link encap:Ethernet HWaddr 00:17:42:13:45:8C
> UP BROADCAST MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> Interrupt:19
>
> eth1 Link encap:Ethernet HWaddr 00:18:DE:63:F0:B3
> inet addr:195.71.97.208 Bcast:195.71.97.223 Mask:255.255.255.224
> inet6 addr: fe80::218:deff:fe63:f0b3/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2079 errors:0 dropped:0 overruns:0 frame:0
> TX packets:2220 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:508959 (497.0 KiB) TX bytes:261123 (255.0 KiB)
>
> wmaster0_ Link encap:UNSPEC HWaddr 00-18-DE-63-F0-B3-30-3A-00-00-00-00-00-00-00-00
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>
>
> [ 14.300736] VFS: Mounted root (ext3 filesystem) readonly.
> [ 14.300902] Freeing unused kernel memory: 216k freed
> [ 17.618804] irda_init()
> [ 17.618817] NET: Registered protocol family 23
> [ 17.636399] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 19
> [ 17.636588] PCI: Setting latency timer of device 0000:02:00.0 to 64
> [ 17.636619] sky2 0000:02:00.0: v1.17 addr 0xf0000000 irq 19 Yukon-EC Ultra (0xb4) rev 2
> [ 17.648081] parport_pc 00:0c: reported by Plug and Play ACPI
> [ 17.648206] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
> [ 17.653652] sky2 eth0: addr 00:17:42:13:45:8c
> [ 17.680848] input: Video Bus as /class/input/input6
> [ 17.680961] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> [ 17.757019] usbcore: registered new interface driver usbfs
> [ 17.757139] usbcore: registered new interface driver hub
> [ 17.757264] usbcore: registered new device driver usb
> [ 17.824819] Yenta: CardBus bridge found at 0000:08:03.0 [10cf:131e]
> [ 17.824941] Yenta O2: res at 0x94/0xD4: 00/ea
> [ 17.825034] Yenta O2: enabling read prefetch/write burst
> [ 17.828363] hda: ATAPI 24X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
> [ 17.828838] Uniform CD-ROM driver Revision: 3.20
> [ 17.891481] USB Universal Host Controller Interface driver v3.0
> [ 17.891650] ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 22
> [ 17.891840] PCI: Setting latency timer of device 0000:00:1d.0 to 64
> [ 17.891844] uhci_hcd 0000:00:1d.0: UHCI Host Controller
> [ 17.892155] uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1
> [ 17.892327] uhci_hcd 0000:00:1d.0: irq 22, io base 0x00001820
> [ 17.892571] usb usb1: configuration #1 chosen from 1 choice
> [ 17.892689] hub 1-0:1.0: USB hub found
> [ 17.892784] hub 1-0:1.0: 2 ports detected
> [ 17.924265] found SMC SuperIO Chip (devid=0x7a rev=00 base=0x002e): LPC47N227
> [ 17.924390] smsc_superio_flat(): fir: 0x6e8, sir: 0x2e8, dma: 03, irq: 3, mode: 0x0e
> [ 17.924526] smsc_ircc_present: can't get sir_base of 0x2e8
> [ 17.954918] Yenta: ISA IRQ mask 0x0c38, PCI irq 19
> [ 17.955009] Socket status: 30000006
> [ 17.955094] Yenta: Raising subordinate bus# of parent bus (#08) from #09 to #0c
> [ 17.955225] pcmcia: parent PCI bridge I/O window: 0x3000 - 0x3fff
> [ 17.955315] cs: IO port probe 0x3000-0x3fff: clean.
> [ 17.955773] pcmcia: parent PCI bridge Memory window: 0xf0200000 - 0xf02fffff
> [ 17.955864] pcmcia: parent PCI bridge Memory window: 0x30000000 - 0x37ffffff
> [ 17.956497] Yenta: CardBus bridge found at 0000:08:03.1 [10cf:131e]
> [ 17.981605] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 0.1.14
> [ 17.981752] iwl3945: Copyright(c) 2003-2007 Intel Corporation
> [ 17.983847] sdhci: Secure Digital Host Controller Interface driver
> [ 17.983940] sdhci: Copyright(c) Pierre Ossman
> [ 17.998087] ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 20 (level, low) -> IRQ 18
> [ 17.998299] PCI: Setting latency timer of device 0000:00:1d.1 to 64
> [ 17.998303] uhci_hcd 0000:00:1d.1: UHCI Host Controller
> [ 17.998428] uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2
> [ 17.998601] uhci_hcd 0000:00:1d.1: irq 18, io base 0x00001840
> [ 17.998811] usb usb2: configuration #1 chosen from 1 choice
> [ 17.998933] hub 2-0:1.0: USB hub found
> [ 17.999023] hub 2-0:1.0: 2 ports detected
> [ 18.082862] Yenta: ISA IRQ mask 0x0438, PCI irq 19
> [ 18.082956] Socket status: 30000410
> [ 18.083041] Yenta: Raising subordinate bus# of parent bus (#08) from #0c to #10
> [ 18.083169] pcmcia: parent PCI bridge I/O window: 0x3000 - 0x3fff
> [ 18.083255] cs: IO port probe 0x3000-0x3fff: clean.
> [ 18.083710] pcmcia: parent PCI bridge Memory window: 0xf0200000 - 0xf02fffff
> [ 18.083798] pcmcia: parent PCI bridge Memory window: 0x30000000 - 0x37ffffff
> [ 18.105897] ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 20
> [ 18.106097] PCI: Setting latency timer of device 0000:00:1d.2 to 64
> [ 18.106101] uhci_hcd 0000:00:1d.2: UHCI Host Controller
> [ 18.106218] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3
> [ 18.106385] uhci_hcd 0000:00:1d.2: irq 20, io base 0x00001860
> [ 18.106594] usb usb3: configuration #1 chosen from 1 choice
> [ 18.106709] hub 3-0:1.0: USB hub found
> [ 18.106799] hub 3-0:1.0: 2 ports detected
> [ 18.213730] ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 19
> [ 18.213932] PCI: Setting latency timer of device 0000:00:1d.3 to 64
> [ 18.213938] uhci_hcd 0000:00:1d.3: UHCI Host Controller
> [ 18.214052] uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 4
> [ 18.214210] uhci_hcd 0000:00:1d.3: irq 19, io base 0x00001880
> [ 18.214414] usb usb4: configuration #1 chosen from 1 choice
> [ 18.214529] hub 4-0:1.0: USB hub found
> [ 18.214620] hub 4-0:1.0: 2 ports detected
> [ 18.318020] ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 22
> [ 18.324538] PCI: Setting latency timer of device 0000:00:1d.7 to 64
> [ 18.324544] ehci_hcd 0000:00:1d.7: EHCI Host Controller
> [ 18.324687] ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 5
> [ 18.324857] ehci_hcd 0000:00:1d.7: debug port 1
> [ 18.324951] PCI: cache line size of 32 is not supported by device 0000:00:1d.7
> [ 18.324959] ehci_hcd 0000:00:1d.7: irq 22, io mem 0xf0644000
> [ 18.328914] ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> [ 18.329163] usb usb5: configuration #1 chosen from 1 choice
> [ 18.329282] hub 5-0:1.0: USB hub found
> [ 18.329374] hub 5-0:1.0: 8 ports detected
> [ 18.429643] ACPI: PCI Interrupt 0000:08:03.4[A] -> GSI 16 (level, low) -> IRQ 19
> [ 18.480534] ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19] MMIO=[f0201000-f02017ff] Max Packet=[2048] IR/IT contexts=[8/8]
> [ 18.483902] ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 18 (level, low) -> IRQ 20
> [ 18.484135] PCI: Setting latency timer of device 0000:05:00.0 to 64
> [ 18.484153] iwl3945: Detected Intel PRO/Wireless 3945ABG Network Connection
> [ 18.595000] sdhci: SDHCI controller found at 0000:08:03.2 [1217:7120] (rev 1)
> [ 18.595114] ACPI: PCI Interrupt 0000:08:03.2[A] -> GSI 16 (level, low) -> IRQ 19
> [ 18.595299] sdhci:slot0: Unknown controller version (16). You may experience problems.
> [ 18.595581] mmc0: SDHCI at 0xf0202800 irq 19 PIO
> [ 18.614800] ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 17 (level, low) -> IRQ 23
> [ 18.614990] PCI: Setting latency timer of device 0000:00:1b.0 to 64
> [ 18.700659] iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
> [ 18.701906] wmaster0: Selected rate control algorithm 'iwl-3945-rs'
> [ 18.721016] pccard: PCMCIA card inserted into slot 1
> [ 18.721131] cs: memory probe 0xf0200000-0xf02fffff: excluding 0xf0200000-0xf020ffff
> [ 18.739151] pcmcia: registering new device pcmcia1.0
> [ 18.740929] net eth1: device_rename: sysfs_create_symlink failed (-17)
> [ 18.741075] net wlan0_rename: device_rename: sysfs_create_symlink failed (-17)
> [ 18.741625] udev: renamed network interface wmaster0 to eth1
> [ 18.779425] cs: IO port probe 0x100-0x3af:<3>si3054: cannot initialize. EXT MID = 0000
> [ 18.783468] clean.
> [ 18.783592] cs: IO port probe 0x3e0-0x4ff: excluding 0x4d0-0x4d7
> [ 18.784668] cs: IO port probe 0x820-0x8ff: clean.
> [ 18.785521] cs: IO port probe 0xc00-0xcf7: clean.
> [ 18.786456] cs: IO port probe 0xa00-0xaff: clean.
> [ 18.837550] cs: IO port probe 0x100-0x3af: clean.
> [ 18.839641] cs: IO port probe 0x3e0-0x4ff: excluding 0x4d0-0x4d7
> [ 18.840664] cs: IO port probe 0x820-0x8ff: clean.
> [ 18.841501] cs: IO port probe 0xc00-0xcf7: clean.
> [ 18.842426] cs: IO port probe 0xa00-0xaff: clean.
> [ 18.964571] usb 4-2: new full speed USB device using uhci_hcd and address 2
> [ 19.141976] usb 4-2: configuration #1 chosen from 1 choice
> [ 19.214394] Bluetooth: Core ver 2.11
> [ 19.214546] NET: Registered protocol family 31
> [ 19.214640] Bluetooth: HCI device and connection manager initialized
> [ 19.214737] Bluetooth: HCI socket layer initialized
> [ 19.257180] Bluetooth: HCI USB driver ver 2.9
> [ 19.257349] usbcore: registered new interface driver hci_usb
> [ 19.751313] ieee1394: Host added: ID:BUS[0-00:1023] GUID[00000e10036532e4]
> [ 50.113619] Adding 979924k swap on /dev/sda5. Priority:-1 extents:1 across:979924k
> [ 50.163229] EXT3 FS on sda6, internal journal
> [ 64.870417] NET: Registered protocol family 10
> [ 64.870613] lo: Disabled Privacy Extensions
> [ 71.028607] Bluetooth: L2CAP ver 2.8
> [ 71.028615] Bluetooth: L2CAP socket layer initialized
> [ 71.168926] Bluetooth: RFCOMM socket layer initialized
> [ 71.169019] Bluetooth: RFCOMM TTY layer initialized
> [ 71.169054] Bluetooth: RFCOMM ver 1.8
> [ 71.539346] ADDRCONF(NETDEV_UP): wlan0_rename: link is not ready
> [ 71.737809] sky2 eth0: enabling interface
> [ 71.740794] sky2 eth0: ram buffer 0K
> [ 71.741733] ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 142.968449] wlan0_rename: Initial auth_alg=0
> [ 142.968461] wlan0_rename: authenticate with AP 00:0c:41:de:12:e1
> [ 142.976883] wlan0_rename: RX authentication from 00:0c:41:de:12:e1 (alg=0 transaction=2 status=0)
> [ 142.976891] wlan0_rename: authenticated
> [ 142.976895] wlan0_rename: associate with AP 00:0c:41:de:12:e1
> [ 142.979389] wlan0_rename: authentication frame received from 00:0c:41:de:12:e1, but not in authenticate state - ignored
> [ 142.980404] wlan0_rename: RX AssocResp from 00:0c:41:de:12:e1 (capab=0x401 status=0 aid=1)
> [ 142.980412] wlan0_rename: associated
> [ 142.982914] ADDRCONF(NETDEV_CHANGE): wlan0_rename: link becomes ready
> [ 148.291754] ICMPv6 NA: someone advertises our address on wlan0_rename!
> [ 151.386585] wlan0_rename: duplicate address detected!
> [ 382.517007] BUG: scheduling while atomic: ifconfig/0x00000002/4170
> [ 382.517029] [<c032bf5c>] __sched_text_start+0x84/0x71c
> [ 382.517047] [<c032c5da>] __sched_text_start+0x702/0x71c
> [ 382.517065] [<c0174a6c>] link_path_walk+0xa9/0xb3
> [ 382.517079] [<c032c6ad>] wait_for_completion+0x65/0x9b
> [ 382.517090] [<c011f0a2>] default_wake_function+0x0/0xc
> [ 382.517103] [<c01334cf>] synchronize_rcu+0x2a/0x2f
> [ 382.517115] [<c0133091>] wakeme_after_rcu+0x0/0x8
> [ 382.517127] [<c02d5f4f>] dev_deactivate+0x89/0x9c
> [ 382.517138] [<c02c8abc>] dev_close+0x24/0x67
> [ 382.517150] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> [ 382.517176] [<c02c8ae3>] dev_close+0x4b/0x67
> [ 382.517183] [<c02c7f47>] dev_change_flags+0x9d/0x14e
> [ 382.517193] [<c03058c8>] devinet_ioctl+0x224/0x532
> [ 382.517201] [<c02c9589>] dev_ifsioc+0x113/0x396
> [ 382.517208] [<c02c8d59>] dev_load+0x24/0x4b
> [ 382.517214] [<c02be61d>] sock_ioctl+0x0/0x1be
> [ 382.517233] [<c02be7bc>] sock_ioctl+0x19f/0x1be
> [ 382.517239] [<c011a993>] do_page_fault+0x269/0x58e
> [ 382.517246] [<c02be61d>] sock_ioctl+0x0/0x1be
> [ 382.517254] [<c0176787>] do_ioctl+0x1f/0x62
> [ 382.517263] [<c0176a01>] vfs_ioctl+0x237/0x249
> [ 382.517270] [<c016b7d8>] do_sys_open+0xbb/0xc5
> [ 382.517280] [<c0176a46>] sys_ioctl+0x33/0x4d
> [ 382.517288] [<c0103e52>] sysenter_past_esp+0x5f/0x85
> [ 382.517303] =======================
> [ 382.528958] BUG: scheduling while atomic: ifconfig/0x00000002/4170
> [ 382.528972] [<c032bf5c>] __sched_text_start+0x84/0x71c
> [ 382.528996] [<c032c6ad>] wait_for_completion+0x65/0x9b
> [ 382.529005] [<c011f0a2>] default_wake_function+0x0/0xc
> [ 382.529014] [<c01334cf>] synchronize_rcu+0x2a/0x2f
> [ 382.529021] [<c0133091>] wakeme_after_rcu+0x0/0x8
> [ 382.529031] [<c02d5f4f>] dev_deactivate+0x89/0x9c
> [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> [ 382.529077] [<c02c8ae3>] dev_close+0x4b/0x67
> [ 382.529088] [<c02c7f47>] dev_change_flags+0x9d/0x14e
> [ 382.529101] [<c03058c8>] devinet_ioctl+0x224/0x532
> [ 382.529111] [<c02c9589>] dev_ifsioc+0x113/0x396
> [ 382.529122] [<c02c8d59>] dev_load+0x24/0x4b
> [ 382.529132] [<c02be61d>] sock_ioctl+0x0/0x1be
> [ 382.529153] [<c02be7bc>] sock_ioctl+0x19f/0x1be
> [ 382.529164] [<c011a993>] do_page_fault+0x269/0x58e
> [ 382.529174] [<c02be61d>] sock_ioctl+0x0/0x1be
> [ 382.529187] [<c0176787>] do_ioctl+0x1f/0x62
> [ 382.529201] [<c0176a01>] vfs_ioctl+0x237/0x249
> [ 382.529211] [<c016b7d8>] do_sys_open+0xbb/0xc5
> [ 382.529224] [<c0176a46>] sys_ioctl+0x33/0x4d
> [ 382.529235] [<c0103e52>] sysenter_past_esp+0x5f/0x85
> [ 382.529253] =======================
> [ 382.530962] Freeing alive inet6 address cd98f600
> --
> Florian Lohoff [email protected] +49-171-2280134
> Those who would give up a little freedom to get a little
> security shall soon have neither - Benjamin Franklin
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFG2wSWUaz2rXW+gJcRAq/8AKCEqN8Vylr12kRU9es5Y0K6vPNIXwCg0AnS
> tqx3+KmEGpVapGtzj8z3r1M=
> =EYJK
> -----END PGP SIGNATURE-----
>
>

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


2007-09-07 14:29:01

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > I don't like ASSERT_RTNL() much because it actually tries to lock it.
> > I'd be much happer if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> > something equivalent.
>
> Ah! It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
> or whatever.

I don't know why it tries that anyway. Maybe it's from semaphore days
where you couldn't check _is_locked()?

> > In any case, I have an updated patch I'll be sending soon, and it
> > requires a new list walking primitive I'll also send.
>
> Look forward to seeing it!

Will send in a minute.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-06 08:20:52

by Herbert Xu

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
>
> Probably tangential, but Herbert, is the call to synchronize_rcu() in
> dev_deactivate() really correct?

Yes it's still correct as of today's tree. Of course patches
such as preemptible RCU may change things but they'll need to
audit this anyway.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-09-06 12:38:39

by Herbert Xu

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

Johannes Berg <[email protected]> wrote:
>
> Hah, I suspected as much but didn't have a chance to look yet. I had
> plans to replace that sub_if_list with an RCU list and not require the
> lock there, but that's far off. Any ideas how to fix this? We can't
> reject the master stop so we have to walk the list, I guess we'll have
> to audit the other list manipulation places, I think they're all under
> RTNL.

Yeah I think they're all under RTNL too. So you don't need to
take the lock here at all since you should already have the RTNL.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-09-06 04:49:25

by Satyam Sharma

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

Hi,


> On 02/09/07, Florian Lohoff <[email protected]> wrote:
> >
> > Hi,
> > with current git i got this when "ifconfig eth1" down. eth1 had a mac
> > address which looked really like an eth1394 ethernet although the module
> > was not loaded. Something is really broken in 2.6.23-currentgit. I always get the

-currentgit would be -rc5 right?


> > sysfs rename issues which are discussed to be an udev issue. Then i see
> > a eth1394 mac address on an interface which typically shouldn exist
> > (udev should rename the wireless to eth1) and when issueing an
> > ifconfig eth1 down i get a
> >
> > BUG: scheduling while atomic: ifconfig/0x00000002/4170

Is this reproducible? Also, please send your .config.

BTW the calltrace below shows that eth1 was the wireless interface when
you tried to "down" it.


> > On the next boot i see the eth1394 mac address on the wireless interface
> > wmaster0_rename whereas eth1 is active (the wireless) and has the correct
> > ip address.

I don't think the "scheduling while atomic" bug you saw is related to
the other problem you're seeing ... these are probably a bunch of all
different issues, but I'm not sure if eth1394 is involved at all.


> > I dont get it - this all looks really messed up. udev is
> > debian sid 114-2.

True, messed up it indeed is.


> > eth0 Link encap:Ethernet HWaddr 00:17:42:13:45:8C
> > UP BROADCAST MULTICAST MTU:1500 Metric:1
> > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> > Interrupt:19
> >
> > eth1 Link encap:Ethernet HWaddr 00:18:DE:63:F0:B3
> > inet addr:195.71.97.208 Bcast:195.71.97.223 Mask:255.255.255.224
> > inet6 addr: fe80::218:deff:fe63:f0b3/64 Scope:Link
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > RX packets:2079 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:2220 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:508959 (497.0 KiB) TX bytes:261123 (255.0 KiB)
> >
> > wmaster0_ Link encap:UNSPEC HWaddr 00-18-DE-63-F0-B3-30-3A-00-00-00-00-00-00-00-00
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> >
> >
> > [ 14.300736] VFS: Mounted root (ext3 filesystem) readonly.
> > [ 14.300902] Freeing unused kernel memory: 216k freed
> > [ 17.618804] irda_init()
> > [ 17.618817] NET: Registered protocol family 23
> > [ 17.636399] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 19
> > [ 17.636588] PCI: Setting latency timer of device 0000:02:00.0 to 64
> > [ 17.636619] sky2 0000:02:00.0: v1.17 addr 0xf0000000 irq 19 Yukon-EC Ultra (0xb4) rev 2
> > [ 17.648081] parport_pc 00:0c: reported by Plug and Play ACPI
> > [ 17.648206] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
> > [ 17.653652] sky2 eth0: addr 00:17:42:13:45:8c
> > [ 17.680848] input: Video Bus as /class/input/input6
> > [ 17.680961] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> > [ 17.757019] usbcore: registered new interface driver usbfs
> > [ 17.757139] usbcore: registered new interface driver hub
> > [ 17.757264] usbcore: registered new device driver usb
> > [ 17.824819] Yenta: CardBus bridge found at 0000:08:03.0 [10cf:131e]
> > [ 17.824941] Yenta O2: res at 0x94/0xD4: 00/ea
> > [ 17.825034] Yenta O2: enabling read prefetch/write burst
> > [ 17.828363] hda: ATAPI 24X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
> > [ 17.828838] Uniform CD-ROM driver Revision: 3.20
> > [ 17.891481] USB Universal Host Controller Interface driver v3.0
> > [ 17.891650] ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 22
> > [ 17.891840] PCI: Setting latency timer of device 0000:00:1d.0 to 64
> > [ 17.891844] uhci_hcd 0000:00:1d.0: UHCI Host Controller
> > [ 17.892155] uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1
> > [ 17.892327] uhci_hcd 0000:00:1d.0: irq 22, io base 0x00001820
> > [ 17.892571] usb usb1: configuration #1 chosen from 1 choice
> > [ 17.892689] hub 1-0:1.0: USB hub found
> > [ 17.892784] hub 1-0:1.0: 2 ports detected
> > [ 17.924265] found SMC SuperIO Chip (devid=0x7a rev=00 base=0x002e): LPC47N227
> > [ 17.924390] smsc_superio_flat(): fir: 0x6e8, sir: 0x2e8, dma: 03, irq: 3, mode: 0x0e
> > [ 17.924526] smsc_ircc_present: can't get sir_base of 0x2e8
> > [ 17.954918] Yenta: ISA IRQ mask 0x0c38, PCI irq 19
> > [ 17.955009] Socket status: 30000006
> > [ 17.955094] Yenta: Raising subordinate bus# of parent bus (#08) from #09 to #0c
> > [ 17.955225] pcmcia: parent PCI bridge I/O window: 0x3000 - 0x3fff
> > [ 17.955315] cs: IO port probe 0x3000-0x3fff: clean.
> > [ 17.955773] pcmcia: parent PCI bridge Memory window: 0xf0200000 - 0xf02fffff
> > [ 17.955864] pcmcia: parent PCI bridge Memory window: 0x30000000 - 0x37ffffff
> > [ 17.956497] Yenta: CardBus bridge found at 0000:08:03.1 [10cf:131e]
> > [ 17.981605] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 0.1.14
> > [ 17.981752] iwl3945: Copyright(c) 2003-2007 Intel Corporation

Cc'ing ipw3945 ...


> > [ 17.983847] sdhci: Secure Digital Host Controller Interface driver
> > [ 17.983940] sdhci: Copyright(c) Pierre Ossman
> > [ 17.998087] ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 20 (level, low) -> IRQ 18
> > [ 17.998299] PCI: Setting latency timer of device 0000:00:1d.1 to 64
> > [ 17.998303] uhci_hcd 0000:00:1d.1: UHCI Host Controller
> > [ 17.998428] uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2
> > [ 17.998601] uhci_hcd 0000:00:1d.1: irq 18, io base 0x00001840
> > [ 17.998811] usb usb2: configuration #1 chosen from 1 choice
> > [ 17.998933] hub 2-0:1.0: USB hub found
> > [ 17.999023] hub 2-0:1.0: 2 ports detected
> > [ 18.082862] Yenta: ISA IRQ mask 0x0438, PCI irq 19
> > [ 18.082956] Socket status: 30000410
> > [ 18.083041] Yenta: Raising subordinate bus# of parent bus (#08) from #0c to #10
> > [ 18.083169] pcmcia: parent PCI bridge I/O window: 0x3000 - 0x3fff
> > [ 18.083255] cs: IO port probe 0x3000-0x3fff: clean.
> > [ 18.083710] pcmcia: parent PCI bridge Memory window: 0xf0200000 - 0xf02fffff
> > [ 18.083798] pcmcia: parent PCI bridge Memory window: 0x30000000 - 0x37ffffff
> > [ 18.105897] ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 20
> > [ 18.106097] PCI: Setting latency timer of device 0000:00:1d.2 to 64
> > [ 18.106101] uhci_hcd 0000:00:1d.2: UHCI Host Controller
> > [ 18.106218] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3
> > [ 18.106385] uhci_hcd 0000:00:1d.2: irq 20, io base 0x00001860
> > [ 18.106594] usb usb3: configuration #1 chosen from 1 choice
> > [ 18.106709] hub 3-0:1.0: USB hub found
> > [ 18.106799] hub 3-0:1.0: 2 ports detected
> > [ 18.213730] ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 19
> > [ 18.213932] PCI: Setting latency timer of device 0000:00:1d.3 to 64
> > [ 18.213938] uhci_hcd 0000:00:1d.3: UHCI Host Controller
> > [ 18.214052] uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 4
> > [ 18.214210] uhci_hcd 0000:00:1d.3: irq 19, io base 0x00001880
> > [ 18.214414] usb usb4: configuration #1 chosen from 1 choice
> > [ 18.214529] hub 4-0:1.0: USB hub found
> > [ 18.214620] hub 4-0:1.0: 2 ports detected
> > [ 18.318020] ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 22
> > [ 18.324538] PCI: Setting latency timer of device 0000:00:1d.7 to 64
> > [ 18.324544] ehci_hcd 0000:00:1d.7: EHCI Host Controller
> > [ 18.324687] ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 5
> > [ 18.324857] ehci_hcd 0000:00:1d.7: debug port 1
> > [ 18.324951] PCI: cache line size of 32 is not supported by device 0000:00:1d.7
> > [ 18.324959] ehci_hcd 0000:00:1d.7: irq 22, io mem 0xf0644000
> > [ 18.328914] ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> > [ 18.329163] usb usb5: configuration #1 chosen from 1 choice
> > [ 18.329282] hub 5-0:1.0: USB hub found
> > [ 18.329374] hub 5-0:1.0: 8 ports detected
> > [ 18.429643] ACPI: PCI Interrupt 0000:08:03.4[A] -> GSI 16 (level, low) -> IRQ 19
> > [ 18.480534] ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19] MMIO=[f0201000-f02017ff] Max Packet=[2048] IR/IT contexts=[8/8]
> > [ 18.483902] ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 18 (level, low) -> IRQ 20
> > [ 18.484135] PCI: Setting latency timer of device 0000:05:00.0 to 64
> > [ 18.484153] iwl3945: Detected Intel PRO/Wireless 3945ABG Network Connection

> > [ 18.595000] sdhci: SDHCI controller found at 0000:08:03.2 [1217:7120] (rev 1)
> > [ 18.595114] ACPI: PCI Interrupt 0000:08:03.2[A] -> GSI 16 (level, low) -> IRQ 19
> > [ 18.595299] sdhci:slot0: Unknown controller version (16). You may experience problems.
> > [ 18.595581] mmc0: SDHCI at 0xf0202800 irq 19 PIO
> > [ 18.614800] ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 17 (level, low) -> IRQ 23
> > [ 18.614990] PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > [ 18.700659] iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
> > [ 18.701906] wmaster0: Selected rate control algorithm 'iwl-3945-rs'

iwl3945 is wmaster0


> > [ 18.721016] pccard: PCMCIA card inserted into slot 1
> > [ 18.721131] cs: memory probe 0xf0200000-0xf02fffff: excluding 0xf0200000-0xf020ffff
> > [ 18.739151] pcmcia: registering new device pcmcia1.0

> > [ 18.740929] net eth1: device_rename: sysfs_create_symlink failed (-17)

> > [ 18.741075] net wlan0_rename: device_rename: sysfs_create_symlink failed (-17)

These look like interesting bits here ... sysfs_create_symlink() failed
with the infamous -EEXIST. Those were actually two different calls to
dev_change_name()->device_rename()->sysfs_create_symlink() it appears.

If this is reproducible, please rebuild with a kernel with debugging
config options enabled and reproduce.


> > [ 18.741625] udev: renamed network interface wmaster0 to eth1

But did this wmaster0 -> eth1 actually succeed, is the question.


> > [ 18.779425] cs: IO port probe 0x100-0x3af:<3>si3054: cannot initialize. EXT MID = 0000
> > [ 18.783468] clean.
> > [ 18.783592] cs: IO port probe 0x3e0-0x4ff: excluding 0x4d0-0x4d7
> > [ 18.784668] cs: IO port probe 0x820-0x8ff: clean.
> > [ 18.785521] cs: IO port probe 0xc00-0xcf7: clean.
> > [ 18.786456] cs: IO port probe 0xa00-0xaff: clean.
> > [ 18.837550] cs: IO port probe 0x100-0x3af: clean.
> > [ 18.839641] cs: IO port probe 0x3e0-0x4ff: excluding 0x4d0-0x4d7
> > [ 18.840664] cs: IO port probe 0x820-0x8ff: clean.
> > [ 18.841501] cs: IO port probe 0xc00-0xcf7: clean.
> > [ 18.842426] cs: IO port probe 0xa00-0xaff: clean.
> > [ 18.964571] usb 4-2: new full speed USB device using uhci_hcd and address 2
> > [ 19.141976] usb 4-2: configuration #1 chosen from 1 choice
> > [ 19.214394] Bluetooth: Core ver 2.11
> > [ 19.214546] NET: Registered protocol family 31
> > [ 19.214640] Bluetooth: HCI device and connection manager initialized
> > [ 19.214737] Bluetooth: HCI socket layer initialized
> > [ 19.257180] Bluetooth: HCI USB driver ver 2.9
> > [ 19.257349] usbcore: registered new interface driver hci_usb
> > [ 19.751313] ieee1394: Host added: ID:BUS[0-00:1023] GUID[00000e10036532e4]
> > [ 50.113619] Adding 979924k swap on /dev/sda5. Priority:-1 extents:1 across:979924k
> > [ 50.163229] EXT3 FS on sda6, internal journal
> > [ 64.870417] NET: Registered protocol family 10
> > [ 64.870613] lo: Disabled Privacy Extensions
> > [ 71.028607] Bluetooth: L2CAP ver 2.8
> > [ 71.028615] Bluetooth: L2CAP socket layer initialized
> > [ 71.168926] Bluetooth: RFCOMM socket layer initialized
> > [ 71.169019] Bluetooth: RFCOMM TTY layer initialized
> > [ 71.169054] Bluetooth: RFCOMM ver 1.8
> > [ 71.539346] ADDRCONF(NETDEV_UP): wlan0_rename: link is not ready
> > [ 71.737809] sky2 eth0: enabling interface
> > [ 71.740794] sky2 eth0: ram buffer 0K
> > [ 71.741733] ADDRCONF(NETDEV_UP): eth0: link is not ready

> > [ 142.968449] wlan0_rename: Initial auth_alg=0
> > [ 142.968461] wlan0_rename: authenticate with AP 00:0c:41:de:12:e1
> > [ 142.976883] wlan0_rename: RX authentication from 00:0c:41:de:12:e1 (alg=0 transaction=2 status=0)
> > [ 142.976891] wlan0_rename: authenticated
> > [ 142.976895] wlan0_rename: associate with AP 00:0c:41:de:12:e1
> > [ 142.979389] wlan0_rename: authentication frame received from 00:0c:41:de:12:e1, but not in authenticate state - ignored
> > [ 142.980404] wlan0_rename: RX AssocResp from 00:0c:41:de:12:e1 (capab=0x401 status=0 aid=1)
> > [ 142.980412] wlan0_rename: associated
> > [ 142.982914] ADDRCONF(NETDEV_CHANGE): wlan0_rename: link becomes ready
> > [ 148.291754] ICMPv6 NA: someone advertises our address on wlan0_rename!
> > [ 151.386585] wlan0_rename: duplicate address detected!


> > [ 382.517007] BUG: scheduling while atomic: ifconfig/0x00000002/4170
> > [ 382.517029] [<c032bf5c>] __sched_text_start+0x84/0x71c
> > [ 382.517047] [<c032c5da>] __sched_text_start+0x702/0x71c
> > [ 382.517065] [<c0174a6c>] link_path_walk+0xa9/0xb3
> > [ 382.517079] [<c032c6ad>] wait_for_completion+0x65/0x9b
> > [ 382.517090] [<c011f0a2>] default_wake_function+0x0/0xc
> > [ 382.517103] [<c01334cf>] synchronize_rcu+0x2a/0x2f
> > [ 382.517115] [<c0133091>] wakeme_after_rcu+0x0/0x8
> > [ 382.517127] [<c02d5f4f>] dev_deactivate+0x89/0x9c
> > [ 382.517138] [<c02c8abc>] dev_close+0x24/0x67
> > [ 382.517150] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> > [ 382.517176] [<c02c8ae3>] dev_close+0x4b/0x67
> > [ 382.517183] [<c02c7f47>] dev_change_flags+0x9d/0x14e
> > [ 382.517193] [<c03058c8>] devinet_ioctl+0x224/0x532
> > [ 382.517201] [<c02c9589>] dev_ifsioc+0x113/0x396
> > [ 382.517208] [<c02c8d59>] dev_load+0x24/0x4b
> > [ 382.517214] [<c02be61d>] sock_ioctl+0x0/0x1be
> > [ 382.517233] [<c02be7bc>] sock_ioctl+0x19f/0x1be
> > [ 382.517239] [<c011a993>] do_page_fault+0x269/0x58e
> > [ 382.517246] [<c02be61d>] sock_ioctl+0x0/0x1be
> > [ 382.517254] [<c0176787>] do_ioctl+0x1f/0x62
> > [ 382.517263] [<c0176a01>] vfs_ioctl+0x237/0x249
> > [ 382.517270] [<c016b7d8>] do_sys_open+0xbb/0xc5
> > [ 382.517280] [<c0176a46>] sys_ioctl+0x33/0x4d
> > [ 382.517288] [<c0103e52>] sysenter_past_esp+0x5f/0x85
> > [ 382.517303] =======================


> > [ 382.528958] BUG: scheduling while atomic: ifconfig/0x00000002/4170
> > [ 382.528972] [<c032bf5c>] __sched_text_start+0x84/0x71c
> > [ 382.528996] [<c032c6ad>] wait_for_completion+0x65/0x9b
> > [ 382.529005] [<c011f0a2>] default_wake_function+0x0/0xc
> > [ 382.529014] [<c01334cf>] synchronize_rcu+0x2a/0x2f
> > [ 382.529021] [<c0133091>] wakeme_after_rcu+0x0/0x8
> > [ 382.529031] [<c02d5f4f>] dev_deactivate+0x89/0x9c

Probably tangential, but Herbert, is the call to synchronize_rcu() in
dev_deactivate() really correct?

I saw that you put it in commit d4828d85d188dc70 about a year back
and it's supposed to ensure that all outstanding transmissions from
dev_queue_xmit() are over before proceeding. However, I think it may not
serve that purpose -- as the comment before synchronize_rcu() and its
use of call_rcu() [and not of call_rcu_bh()] indicate, synchronize_rcu()
is used to synchronize with all outstanding rcu_read_lock() uses. OTOH,
dev_queue_xmit() uses rcu_read_lock_*bh*() which isn't really the same --
bh / non-bh RCU callbacks have separate lists in the RCU implementation,
so I don't really see anything preventing old transmissions to remain
outstanding even after the synchronize_rcu() call in dev_deactivate (?)

Whether that has anything to do with this particular bug or not,
I have no idea ...


> > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> > [ 382.529077] [<c02c8ae3>] dev_close+0x4b/0x67
> > [ 382.529088] [<c02c7f47>] dev_change_flags+0x9d/0x14e
> > [ 382.529101] [<c03058c8>] devinet_ioctl+0x224/0x532
> > [ 382.529111] [<c02c9589>] dev_ifsioc+0x113/0x396
> > [ 382.529122] [<c02c8d59>] dev_load+0x24/0x4b
> > [ 382.529132] [<c02be61d>] sock_ioctl+0x0/0x1be
> > [ 382.529153] [<c02be7bc>] sock_ioctl+0x19f/0x1be
> > [ 382.529164] [<c011a993>] do_page_fault+0x269/0x58e
> > [ 382.529174] [<c02be61d>] sock_ioctl+0x0/0x1be
> > [ 382.529187] [<c0176787>] do_ioctl+0x1f/0x62
> > [ 382.529201] [<c0176a01>] vfs_ioctl+0x237/0x249
> > [ 382.529211] [<c016b7d8>] do_sys_open+0xbb/0xc5
> > [ 382.529224] [<c0176a46>] sys_ioctl+0x33/0x4d
> > [ 382.529235] [<c0103e52>] sysenter_past_esp+0x5f/0x85
> > [ 382.529253] =======================
> > [ 382.530962] Freeing alive inet6 address cd98f600

Unrelated, but I also wish the "BUG: scheduling while atomic" was the more
instructive show_regs() trace and not a dump_stack() ... I'll make such a
patch today itself.

2007-09-12 13:20:53

by David Miller

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

From: Johannes Berg <[email protected]>
Date: Thu, 06 Sep 2007 17:19:55 +0200

>
> Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
> whether the device is up? That way, we'd have seen the bug, but
> apparently nobody before Florian ever did a 'ip link set wmaster0 down'
> while the other interfaces were still open.

I've added this to net-2.6.24

2007-09-07 16:02:33

by Michael Büsch

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Friday 07 September 2007, Johannes Berg wrote:
> On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
>
> > Looks good to me from an RCU viewpoint. I cannot claim familiarity with
> > this code. I therefore especially like the indications of where RTNL
> > is held and not!!!
>
> :)
>
> > Some questions below based on a quick scan. And a global question:
> > should the comments about RTNL being held be replaced by ASSERT_RTNL()?
>
> I don't like ASSERT_RTNL() much because it actually tries to lock it.
> I'd be much happer if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> something equivalent.

What's the problem with trying to lock it?
In the paths where you insert this assertion, you will be locked.
So the trylock will fail and not cause any blocking or something else.
It's basically not more expensive than your mutex_locked test.
And the !mutex_locked test might not work on UP (Not sure, about
the current implementation.)

2007-09-06 08:28:37

by Satyam Sharma

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170



On Thu, 6 Sep 2007, Herbert Xu wrote:

> On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> >
> > > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
>
> This is where the bug is. You cannot call dev_close from an
> atomic context as i33380211_master_stop does it within spin
> locks.

Doh, of course! I must be blind ... and wait_for_completion()'s
might_sleep() clearly didn't trigger earlier because Florian must've
had CONFIG_DEBUG_SPINLOCK_SLEEP off in his .config ...

2007-09-06 12:45:11

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 18:22 +0530, Satyam Sharma wrote:

> Unless I missed something obvious (let me know if that's the case! :-)
> an RCU-protected list would suffer the same fate. list_for_each_xxx_rcu()
> must be under rcu_read_lock() which == preempt_disable() ...

Right. But I'm thinking that since all list manipulations are done under
RTNL we only need to protect against removing things from the list while
the RX or TX path is using it, so only it would have to use it under
rcu_read_lock() [which it already takes due to key management] so we
could get rid of that sub_if_lock completely. There is one issue with
this I know about and that is the management skb queue but I'll have to
take a closer look.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-06 12:43:32

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 20:36 +0800, Herbert Xu wrote:
> Johannes Berg <[email protected]> wrote:
> >
> > Hah, I suspected as much but didn't have a chance to look yet. I had
> > plans to replace that sub_if_list with an RCU list and not require the
> > lock there, but that's far off. Any ideas how to fix this? We can't
> > reject the master stop so we have to walk the list, I guess we'll have
> > to audit the other list manipulation places, I think they're all under
> > RTNL.
>
> Yeah I think they're all under RTNL too. So you don't need to
> take the lock here at all since you should already have the RTNL.

I'll take a look at them and post appropriate patches.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-07 13:26:18

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:

> Looks good to me from an RCU viewpoint. I cannot claim familiarity with
> this code. I therefore especially like the indications of where RTNL
> is held and not!!!

:)

> Some questions below based on a quick scan. And a global question:
> should the comments about RTNL being held be replaced by ASSERT_RTNL()?

I don't like ASSERT_RTNL() much because it actually tries to lock it.
I'd be much happer if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
something equivalent.

In any case, I have an updated patch I'll be sending soon, and it
requires a new list walking primitive I'll also send.

> > - write_lock_bh(&local->sub_if_lock);
> > + /* we're under RTNL so all this is fine */
> > if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
> > - write_unlock_bh(&local->sub_if_lock);
> > __ieee80211_if_del(local, sdata);
> > return -ENODEV;
> > }
> > - list_add(&sdata->list, &local->sub_if_list);
> > + list_add_tail_rcu(&sdata->list, &local->interfaces);
>
> The _rcu is required because this list isn't protected by RTNL?

Yes, not all walkers of the list are protected by the RTNL.

> > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > /* Remove all virtual interfaces that use this BSS
> > * as their sdata->bss */
> > struct ieee80211_sub_if_data *tsdata, *n;
> > - LIST_HEAD(tmp_list);
> >
> > - write_lock_bh(&local->sub_if_lock);
>
> This code is also protected by RTNL?

Yes.

> > ASSERT_RTNL();
>
> I -like- this!!! ;-)

:)

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-07 14:34:38

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > > > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > > > /* Remove all virtual interfaces that use this BSS
> > > > * as their sdata->bss */
> > > > struct ieee80211_sub_if_data *tsdata, *n;
> > > > - LIST_HEAD(tmp_list);
> > > >
> > > > - write_lock_bh(&local->sub_if_lock);
> > >
> > > This code is also protected by RTNL?
> >
> > Yes.
>
> Comment? (Or is it in the function header?)

Oh, forgot to say: yes, there is a comment further up and even an
ASSERT_RTNL()

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-13 07:15:08

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Wed, 2007-09-12 at 05:34 -0700, David Miller wrote:
> From: Johannes Berg <[email protected]>
> Date: Thu, 06 Sep 2007 17:19:55 +0200
>
> >
> > Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
> > whether the device is up? That way, we'd have seen the bug, but
> > apparently nobody before Florian ever did a 'ip link set wmaster0 down'
> > while the other interfaces were still open.
>
> I've added this to net-2.6.24

Great, thanks. Now I just hope John gets around to merging all the
accumulated fixes :)

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-07 14:37:30

by Johannes Berg

[permalink] [raw]
Subject: [RFC] mac80211: fix virtual interface locking

Florian Lohoff noticed a bug in mac80211: when bringing the
master interface down while other virtual interfaces are up
we call dev_close() under a spinlock which is not allowed.
This patch removes the sub_if_lock used by mac80211 in favour
of using an RCU list. All list manipulations are already done
under rtnl so are well protected against each other, and the
read-side locks we took in the RX and TX code are already in
RCU read-side critical sections.

Signed-off-by: Johannes Berg <[email protected]>
Cc: Florian Lohoff <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Michal Piotrowski <[email protected]>
Cc: Satyam Sharma <[email protected]>

---
If you want to test this you'll need to get the other pending patches,
as John is at KS he isn't pushing to Dave who is at KS too anyhow. Grab
from
http://johannes.sipsolutions.net/patches/net-2.6.24/all/2007-09-06-13:43/ patches 002-011, they are slated to go into net-2.6.24 if timing works out. I'll backport this fix to -stable when we actually get around to verifying it.

net/mac80211/ieee80211.c | 100 ++++++++++++++++++++---------------------
net/mac80211/ieee80211_i.h | 5 --
net/mac80211/ieee80211_iface.c | 31 +++++-------
net/mac80211/ieee80211_sta.c | 12 ++--
net/mac80211/rx.c | 9 +--
net/mac80211/tx.c | 10 ++--
6 files changed, 84 insertions(+), 83 deletions(-)

--- wireless-dev.orig/net/mac80211/ieee80211.c 2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211.c 2007-09-07 16:30:34.044429746 +0200
@@ -88,24 +88,31 @@ static struct dev_mc_list *ieee80211_get
return NULL;
}

- /* start of iteration, both unassigned */
- if (!mcd->cur && !mcd->sdata) {
- mcd->sdata = list_entry(local->sub_if_list.next,
- struct ieee80211_sub_if_data, list);
- mcd->cur = mcd->sdata->dev->mc_list;
- }
+ /*
+ * Prepare for iteration if not done already.
+ */
+ list_prepare_entry(mcd->sdata, &local->interfaces, list);

- if (mcd->cur)
+ if (mcd->cur) {
+ /*
+ * Iterate over the multicast addresses in
+ * the current device (mcd->sdata).
+ */
mcd->cur = mcd->cur->next;
+ }

- while (!mcd->cur) {
- /* reached end of interface list? */
- if (mcd->sdata->list.next == &local->sub_if_list)
- break;
- /* otherwise try next interface */
- mcd->sdata = list_entry(mcd->sdata->list.next,
- struct ieee80211_sub_if_data, list);
- mcd->cur = mcd->sdata->dev->mc_list;
+ if (!mcd->cur) {
+ /*
+ * Iterate over the devices until finding one (the
+ * first or the next) with multicast addresses.
+ */
+ list_for_each_entry_continue_rcu(mcd->sdata,
+ &local->interfaces,
+ list) {
+ mcd->cur = mcd->sdata->dev->mc_list;
+ if (mcd->cur)
+ break;
+ }
}

return mcd->cur;
@@ -145,9 +152,10 @@ static void ieee80211_configure_filter(s

/*
* We can iterate through the device list for the multicast
- * address list so need to lock it.
+ * address list so need to be in a RCU read-side section,
+ * the RTNL isn't held in this function.
*/
- read_lock(&local->sub_if_lock);
+ rcu_read_lock();

/* be a bit nasty */
new_flags |= (1<<31);
@@ -163,7 +171,7 @@ static void ieee80211_configure_filter(s
WARN_ON(mcd.cur);

local->filter_flags = new_flags & ~(1<<31);
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

netif_tx_unlock(local->mdev);
}
@@ -176,14 +184,13 @@ static int ieee80211_master_open(struct
struct ieee80211_sub_if_data *sdata;
int res = -EOPNOTSUPP;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(sdata, &local->interfaces, list) {
if (sdata->dev != dev && netif_running(sdata->dev)) {
res = 0;
break;
}
}
- read_unlock(&local->sub_if_lock);
return res;
}

@@ -192,11 +199,10 @@ static int ieee80211_master_stop(struct
struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
struct ieee80211_sub_if_data *sdata;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list)
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(sdata, &local->interfaces, list)
if (sdata->dev != dev && netif_running(sdata->dev))
dev_close(sdata->dev);
- read_unlock(&local->sub_if_lock);

return 0;
}
@@ -395,8 +401,8 @@ static int ieee80211_open(struct net_dev

sdata = IEEE80211_DEV_TO_SUB_IF(dev);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(nsdata, &local->sub_if_list, list) {
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(nsdata, &local->interfaces, list) {
struct net_device *ndev = nsdata->dev;

if (ndev != dev && ndev != local->mdev && netif_running(ndev) &&
@@ -405,10 +411,8 @@ static int ieee80211_open(struct net_dev
* check whether it may have the same address
*/
if (!identical_mac_addr_allowed(sdata->type,
- nsdata->type)) {
- read_unlock(&local->sub_if_lock);
+ nsdata->type))
return -ENOTUNIQ;
- }

/*
* can only add VLANs to enabled APs
@@ -419,7 +423,6 @@ static int ieee80211_open(struct net_dev
sdata->u.vlan.ap = nsdata;
}
}
- read_unlock(&local->sub_if_lock);

switch (sdata->type) {
case IEEE80211_IF_TYPE_WDS:
@@ -541,14 +544,13 @@ static int ieee80211_stop(struct net_dev
del_timer_sync(&sdata->u.sta.timer);
del_timer_sync(&sdata->u.sta.admit_timer);
/*
- * Holding the sub_if_lock for writing here blocks
- * out the receive path and makes sure it's not
- * currently processing a packet that may get
- * added to the queue.
+ * When we get here, the interface is marked down.
+ * Call synchronize_rcu() to wait for the RX path
+ * should it be using the interface and enqueuing
+ * frames at this very time on another CPU.
*/
- write_lock_bh(&local->sub_if_lock);
+ synchronize_rcu();
skb_queue_purge(&sdata->u.sta.skb_queue);
- write_unlock_bh(&local->sub_if_lock);

if (!local->ops->hw_scan &&
local->scan_dev == sdata->dev) {
@@ -1101,9 +1103,9 @@ void ieee80211_tx_status(struct ieee8021

rthdr->data_retries = status->retry_count;

- read_lock(&local->sub_if_lock);
+ rcu_read_lock();
monitors = local->monitors;
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
/*
* Using the monitors counter is possibly racy, but
* if the value is wrong we simply either clone the skb
@@ -1119,7 +1121,7 @@ void ieee80211_tx_status(struct ieee8021
continue;
monitors--;
if (monitors)
- skb2 = skb_clone(skb, GFP_KERNEL);
+ skb2 = skb_clone(skb, GFP_ATOMIC);
else
skb2 = NULL;
skb->dev = sdata->dev;
@@ -1134,7 +1136,7 @@ void ieee80211_tx_status(struct ieee8021
}
}
out:
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();
if (skb)
dev_kfree_skb(skb);
}
@@ -1222,8 +1224,7 @@ struct ieee80211_hw *ieee80211_alloc_hw(

INIT_LIST_HEAD(&local->modes_list);

- rwlock_init(&local->sub_if_lock);
- INIT_LIST_HEAD(&local->sub_if_list);
+ INIT_LIST_HEAD(&local->interfaces);

INIT_DELAYED_WORK(&local->scan_work, ieee80211_sta_scan_work);
ieee80211_rx_bss_list_init(mdev);
@@ -1242,7 +1243,8 @@ struct ieee80211_hw *ieee80211_alloc_hw(
sdata->u.ap.force_unicast_rateidx = -1;
sdata->u.ap.max_ratectrl_rateidx = -1;
ieee80211_if_sdata_init(sdata);
- list_add_tail(&sdata->list, &local->sub_if_list);
+ /* no RCU needed since we're still during init phase */
+ list_add_tail(&sdata->list, &local->interfaces);

tasklet_init(&local->tx_pending_tasklet, ieee80211_tx_pending,
(unsigned long)local);
@@ -1401,7 +1403,6 @@ void ieee80211_unregister_hw(struct ieee
{
struct ieee80211_local *local = hw_to_local(hw);
struct ieee80211_sub_if_data *sdata, *tmp;
- struct list_head tmp_list;
int i;

tasklet_kill(&local->tx_pending_tasklet);
@@ -1415,11 +1416,12 @@ void ieee80211_unregister_hw(struct ieee
if (local->apdev)
ieee80211_if_del_mgmt(local);

- write_lock_bh(&local->sub_if_lock);
- list_replace_init(&local->sub_if_list, &tmp_list);
- write_unlock_bh(&local->sub_if_lock);
-
- list_for_each_entry_safe(sdata, tmp, &tmp_list, list)
+ /*
+ * At this point, interface list manipulations are fine
+ * because the driver cannot be handing us frames any
+ * more and the tasklet is killed.
+ */
+ list_for_each_entry_safe(sdata, tmp, &local->interfaces, list)
__ieee80211_if_del(local, sdata);

rtnl_unlock();
--- wireless-dev.orig/net/mac80211/ieee80211_i.h 2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_i.h 2007-09-07 16:30:33.974429746 +0200
@@ -548,9 +548,8 @@ struct ieee80211_local {
ieee80211_rx_handler *rx_handlers;
ieee80211_tx_handler *tx_handlers;

- rwlock_t sub_if_lock; /* Protects sub_if_list. Cannot be taken under
- * sta_bss_lock or sta_lock. */
- struct list_head sub_if_list;
+ struct list_head interfaces;
+
int sta_scanning;
int scan_channel_idx;
enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
--- wireless-dev.orig/net/mac80211/ieee80211_iface.c 2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_iface.c 2007-09-07 16:30:34.244429746 +0200
@@ -79,16 +79,15 @@ int ieee80211_if_add(struct net_device *
ieee80211_debugfs_add_netdev(sdata);
ieee80211_if_set_type(ndev, type);

- write_lock_bh(&local->sub_if_lock);
+ /* we're under RTNL so all this is fine */
if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
- write_unlock_bh(&local->sub_if_lock);
__ieee80211_if_del(local, sdata);
return -ENODEV;
}
- list_add(&sdata->list, &local->sub_if_list);
+ list_add_tail_rcu(&sdata->list, &local->interfaces);
+
if (new_dev)
*new_dev = ndev;
- write_unlock_bh(&local->sub_if_lock);

return 0;

@@ -242,22 +241,22 @@ void ieee80211_if_reinit(struct net_devi
/* Remove all virtual interfaces that use this BSS
* as their sdata->bss */
struct ieee80211_sub_if_data *tsdata, *n;
- LIST_HEAD(tmp_list);

- write_lock_bh(&local->sub_if_lock);
- list_for_each_entry_safe(tsdata, n, &local->sub_if_list, list) {
+ list_for_each_entry_safe(tsdata, n, &local->interfaces, list) {
if (tsdata != sdata && tsdata->bss == &sdata->u.ap) {
printk(KERN_DEBUG "%s: removing virtual "
"interface %s because its BSS interface"
" is being removed\n",
sdata->dev->name, tsdata->dev->name);
- list_move_tail(&tsdata->list, &tmp_list);
+ list_del_rcu(&tsdata->list);
+ /*
+ * We have lots of time and can afford
+ * to sync for each interface
+ */
+ synchronize_rcu();
+ __ieee80211_if_del(local, tsdata);
}
}
- write_unlock_bh(&local->sub_if_lock);
-
- list_for_each_entry_safe(tsdata, n, &tmp_list, list)
- __ieee80211_if_del(local, tsdata);

kfree(sdata->u.ap.beacon_head);
kfree(sdata->u.ap.beacon_tail);
@@ -334,18 +333,16 @@ int ieee80211_if_remove(struct net_devic

ASSERT_RTNL();

- write_lock_bh(&local->sub_if_lock);
- list_for_each_entry_safe(sdata, n, &local->sub_if_list, list) {
+ list_for_each_entry_safe(sdata, n, &local->interfaces, list) {
if ((sdata->type == id || id == -1) &&
strcmp(name, sdata->dev->name) == 0 &&
sdata->dev != local->mdev) {
- list_del(&sdata->list);
- write_unlock_bh(&local->sub_if_lock);
+ list_del_rcu(&sdata->list);
+ synchronize_rcu();
__ieee80211_if_del(local, sdata);
return 0;
}
}
- write_unlock_bh(&local->sub_if_lock);
return -ENODEV;
}

--- wireless-dev.orig/net/mac80211/ieee80211_sta.c 2007-09-07 10:52:12.634441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_sta.c 2007-09-07 10:57:23.574441281 +0200
@@ -3597,8 +3597,8 @@ void ieee80211_scan_completed(struct iee
memset(&wrqu, 0, sizeof(wrqu));
wireless_send_event(dev, SIOCGIWSCAN, &wrqu, NULL);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {

/* No need to wake the master device. */
if (sdata->dev == local->mdev)
@@ -3612,7 +3612,7 @@ void ieee80211_scan_completed(struct iee

netif_wake_queue(sdata->dev);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

sdata = IEEE80211_DEV_TO_SUB_IF(dev);
if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
@@ -3749,8 +3749,8 @@ static int ieee80211_sta_start_scan(stru

local->sta_scanning = 1;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {

/* Don't stop the master interface, otherwise we can't transmit
* probes! */
@@ -3762,7 +3762,7 @@ static int ieee80211_sta_start_scan(stru
(sdata->u.sta.flags & IEEE80211_STA_ASSOCIATED))
ieee80211_send_nullfunc(local, sdata, 1);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

if (ssid) {
local->scan_ssid_len = ssid_len;
--- wireless-dev.orig/net/mac80211/rx.c 2007-09-07 10:52:12.654441281 +0200
+++ wireless-dev/net/mac80211/rx.c 2007-09-07 16:30:34.144429746 +0200
@@ -1522,8 +1522,9 @@ void __ieee80211_rx(struct ieee80211_hw
}

/*
- * key references are protected using RCU and this requires that
- * we are in a read-site RCU section during receive processing
+ * key references and virtual interfaces are protected using RCU
+ * and this requires that we are in a read-side RCU section during
+ * receive processing
*/
rcu_read_lock();

@@ -1578,8 +1579,7 @@ void __ieee80211_rx(struct ieee80211_hw

bssid = ieee80211_get_bssid(hdr, skb->len - radiotap_len);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
rx.flags |= IEEE80211_TXRXD_RXRA_MATCH;

if (!netif_running(sdata->dev))
@@ -1632,7 +1632,6 @@ void __ieee80211_rx(struct ieee80211_hw
&rx, sta);
} else
dev_kfree_skb(skb);
- read_unlock(&local->sub_if_lock);

end:
rcu_read_unlock();
--- wireless-dev.orig/net/mac80211/tx.c 2007-09-07 10:52:12.674441281 +0200
+++ wireless-dev/net/mac80211/tx.c 2007-09-07 12:20:51.174437343 +0200
@@ -291,8 +291,12 @@ static void purge_old_ps_buffers(struct
struct ieee80211_sub_if_data *sdata;
struct sta_info *sta;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ /*
+ * virtual interfaces are protected by RCU
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
struct ieee80211_if_ap *ap;
if (sdata->dev == local->mdev ||
sdata->type != IEEE80211_IF_TYPE_AP)
@@ -305,7 +309,7 @@ static void purge_old_ps_buffers(struct
}
total += skb_queue_len(&ap->ps_bc_buf);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

read_lock_bh(&local->sta_lock);
list_for_each_entry(sta, &local->sta_list, list) {



2007-09-06 12:39:16

by Satyam Sharma

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170



On Thu, 6 Sep 2007, Johannes Berg wrote:

> On Thu, 2007-09-06 at 16:23 +0800, Herbert Xu wrote:
> > On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> > >
> > > > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> >
> > This is where the bug is. You cannot call dev_close from an
> > atomic context as i33380211_master_stop does it within spin
> > locks.
>
> Hah, I suspected as much but didn't have a chance to look yet. I had
> plans to replace that sub_if_list with an RCU list and not require the
> lock there, but that's far off.

Unless I missed something obvious (let me know if that's the case! :-)
an RCU-protected list would suffer the same fate. list_for_each_xxx_rcu()
must be under rcu_read_lock() which == preempt_disable() ...

2007-09-06 08:23:12

by Herbert Xu

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
>
> > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]

This is where the bug is. You cannot call dev_close from an
atomic context as i33380211_master_stop does it within spin
locks.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-09-07 14:25:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Fri, Sep 07, 2007 at 03:27:15PM +0200, Johannes Berg wrote:
> On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
>
> > Looks good to me from an RCU viewpoint. I cannot claim familiarity with
> > this code. I therefore especially like the indications of where RTNL
> > is held and not!!!
>
> :)
>
> > Some questions below based on a quick scan. And a global question:
> > should the comments about RTNL being held be replaced by ASSERT_RTNL()?
>
> I don't like ASSERT_RTNL() much because it actually tries to lock it.
> I'd be much happer if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> something equivalent.

Ah! It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
or whatever.

> In any case, I have an updated patch I'll be sending soon, and it
> requires a new list walking primitive I'll also send.

Look forward to seeing it!

> > > - write_lock_bh(&local->sub_if_lock);
> > > + /* we're under RTNL so all this is fine */
> > > if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
> > > - write_unlock_bh(&local->sub_if_lock);
> > > __ieee80211_if_del(local, sdata);
> > > return -ENODEV;
> > > }
> > > - list_add(&sdata->list, &local->sub_if_list);
> > > + list_add_tail_rcu(&sdata->list, &local->interfaces);
> >
> > The _rcu is required because this list isn't protected by RTNL?
>
> Yes, not all walkers of the list are protected by the RTNL.

K.

> > > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > > /* Remove all virtual interfaces that use this BSS
> > > * as their sdata->bss */
> > > struct ieee80211_sub_if_data *tsdata, *n;
> > > - LIST_HEAD(tmp_list);
> > >
> > > - write_lock_bh(&local->sub_if_lock);
> >
> > This code is also protected by RTNL?
>
> Yes.

Comment? (Or is it in the function header?)

> > > ASSERT_RTNL();
> >
> > I -like- this!!! ;-)
>
> :)

Thanx, Paul

2007-09-07 19:19:51

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Fri, 2007-09-07 at 18:01 +0200, Michael Buesch wrote:

> What's the problem with trying to lock it?

I think I had a problem with it once when I inserted it into some code
that was atomic and it all blew up badly ;) Nothing important really but
it sort of made me not like it much.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-06 15:46:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, Sep 06, 2007 at 03:36:55PM +0200, Johannes Berg wrote:
> On Thu, 2007-09-06 at 20:36 +0800, Herbert Xu wrote:
>
> > Yeah I think they're all under RTNL too. So you don't need to
> > take the lock here at all since you should already have the RTNL.
>
> Ok, this patch gets rid of the lock. I'd appreciate if you could give it
> a quick look for obvious RCU abuse as I haven't tested it. It also
> doesn't apply against net-2.6.24 because I based it after the patches I
> have queued with John for net-2.6.24.

Looks good to me from an RCU viewpoint. I cannot claim familiarity with
this code. I therefore especially like the indications of where RTNL
is held and not!!!

Some questions below based on a quick scan. And a global question:
should the comments about RTNL being held be replaced by ASSERT_RTNL()?

Thanx, Paul

> johannes
>
> --- netdev-2.6.orig/net/mac80211/ieee80211.c 2007-09-06 15:35:23.324447686 +0200
> +++ netdev-2.6/net/mac80211/ieee80211.c 2007-09-06 15:35:23.394447686 +0200
> @@ -90,8 +90,9 @@ static struct dev_mc_list *ieee80211_get
>
> /* start of iteration, both unassigned */
> if (!mcd->cur && !mcd->sdata) {
> - mcd->sdata = list_entry(local->sub_if_list.next,
> - struct ieee80211_sub_if_data, list);
> + mcd->sdata = list_entry(
> + rcu_dereference(local->interfaces.next),
> + struct ieee80211_sub_if_data, list);
> mcd->cur = mcd->sdata->dev->mc_list;
> }
>
> @@ -100,10 +101,11 @@ static struct dev_mc_list *ieee80211_get
>
> while (!mcd->cur) {
> /* reached end of interface list? */
> - if (mcd->sdata->list.next == &local->sub_if_list)
> + if (rcu_dereference(mcd->sdata->list.next) ==
> + &local->interfaces)
> break;
> /* otherwise try next interface */
> - mcd->sdata = list_entry(mcd->sdata->list.next,
> + mcd->sdata = list_entry(rcu_dereference(mcd->sdata->list.next),
> struct ieee80211_sub_if_data, list);
> mcd->cur = mcd->sdata->dev->mc_list;
> }
> @@ -145,9 +147,10 @@ static void ieee80211_configure_filter(s
>
> /*
> * We can iterate through the device list for the multicast
> - * address list so need to lock it.
> + * address list so need to be in a RCU read-side section,
> + * the RTNL isn't held in this function.
> */
> - read_lock(&local->sub_if_lock);
> + rcu_read_lock();
>
> /* be a bit nasty */
> new_flags |= (1<<31);
> @@ -163,7 +166,7 @@ static void ieee80211_configure_filter(s
> WARN_ON(mcd.cur);
>
> local->filter_flags = new_flags & ~(1<<31);
> - read_unlock(&local->sub_if_lock);
> + rcu_read_unlock();
>
> netif_tx_unlock(local->mdev);
> }
> @@ -176,14 +179,13 @@ static int ieee80211_master_open(struct
> struct ieee80211_sub_if_data *sdata;
> int res = -EOPNOTSUPP;
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + /* we hold the RTNL here so can safely walk the list */
> + list_for_each_entry(sdata, &local->interfaces, list) {
> if (sdata->dev != dev && netif_running(sdata->dev)) {
> res = 0;
> break;
> }
> }
> - read_unlock(&local->sub_if_lock);
> return res;
> }
>
> @@ -192,11 +194,10 @@ static int ieee80211_master_stop(struct
> struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
> struct ieee80211_sub_if_data *sdata;
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list)
> + /* we hold the RTNL here so can safely walk the list */
> + list_for_each_entry(sdata, &local->interfaces, list)
> if (sdata->dev != dev && netif_running(sdata->dev))
> dev_close(sdata->dev);
> - read_unlock(&local->sub_if_lock);
>
> return 0;
> }
> @@ -430,8 +431,8 @@ static int ieee80211_open(struct net_dev
>
> sdata = IEEE80211_DEV_TO_SUB_IF(dev);
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(nsdata, &local->sub_if_list, list) {
> + /* we hold the RTNL here so can safely walk the list */
> + list_for_each_entry(nsdata, &local->interfaces, list) {
> struct net_device *ndev = nsdata->dev;
>
> if (ndev != dev && ndev != local->mdev && netif_running(ndev) &&
> @@ -440,10 +441,8 @@ static int ieee80211_open(struct net_dev
> * check whether it may have the same address
> */
> if (!identical_mac_addr_allowed(sdata->type,
> - nsdata->type)) {
> - read_unlock(&local->sub_if_lock);
> + nsdata->type))
> return -ENOTUNIQ;
> - }
>
> /*
> * can only add VLANs to enabled APs
> @@ -454,7 +453,6 @@ static int ieee80211_open(struct net_dev
> sdata->u.vlan.ap = nsdata;
> }
> }
> - read_unlock(&local->sub_if_lock);
>
> switch (sdata->type) {
> case IEEE80211_IF_TYPE_WDS:
> @@ -575,14 +573,13 @@ static int ieee80211_stop(struct net_dev
> sdata->u.sta.state = IEEE80211_DISABLED;
> del_timer_sync(&sdata->u.sta.timer);
> /*
> - * Holding the sub_if_lock for writing here blocks
> - * out the receive path and makes sure it's not
> - * currently processing a packet that may get
> - * added to the queue.
> + * When we get here, the interface is marked down.
> + * Call synchronize_rcu() to wait for the RX path
> + * should it be using the interface and enqueuing
> + * frames at this very time on another CPU.
> */
> - write_lock_bh(&local->sub_if_lock);
> + synchronize_rcu();
> skb_queue_purge(&sdata->u.sta.skb_queue);
> - write_unlock_bh(&local->sub_if_lock);
>
> if (!local->ops->hw_scan &&
> local->scan_dev == sdata->dev) {
> @@ -1134,9 +1131,9 @@ void ieee80211_tx_status(struct ieee8021
>
> rthdr->data_retries = status->retry_count;
>
> - read_lock(&local->sub_if_lock);
> + rcu_read_lock();
> monitors = local->monitors;
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> /*
> * Using the monitors counter is possibly racy, but
> * if the value is wrong we simply either clone the skb
> @@ -1152,7 +1149,7 @@ void ieee80211_tx_status(struct ieee8021
> continue;
> monitors--;
> if (monitors)
> - skb2 = skb_clone(skb, GFP_KERNEL);
> + skb2 = skb_clone(skb, GFP_ATOMIC);
> else
> skb2 = NULL;
> skb->dev = sdata->dev;
> @@ -1167,7 +1164,7 @@ void ieee80211_tx_status(struct ieee8021
> }
> }
> out:
> - read_unlock(&local->sub_if_lock);
> + rcu_read_unlock();
> if (skb)
> dev_kfree_skb(skb);
> }
> @@ -1255,8 +1252,7 @@ struct ieee80211_hw *ieee80211_alloc_hw(
>
> INIT_LIST_HEAD(&local->modes_list);
>
> - rwlock_init(&local->sub_if_lock);
> - INIT_LIST_HEAD(&local->sub_if_list);
> + INIT_LIST_HEAD(&local->interfaces);
>
> INIT_DELAYED_WORK(&local->scan_work, ieee80211_sta_scan_work);
> ieee80211_rx_bss_list_init(mdev);
> @@ -1275,7 +1271,8 @@ struct ieee80211_hw *ieee80211_alloc_hw(
> sdata->u.ap.force_unicast_rateidx = -1;
> sdata->u.ap.max_ratectrl_rateidx = -1;
> ieee80211_if_sdata_init(sdata);
> - list_add_tail(&sdata->list, &local->sub_if_list);
> + /* no RCU needed since we're still during init phase */
> + list_add_tail(&sdata->list, &local->interfaces);
>
> tasklet_init(&local->tx_pending_tasklet, ieee80211_tx_pending,
> (unsigned long)local);
> @@ -1434,7 +1431,6 @@ void ieee80211_unregister_hw(struct ieee
> {
> struct ieee80211_local *local = hw_to_local(hw);
> struct ieee80211_sub_if_data *sdata, *tmp;
> - struct list_head tmp_list;
> int i;
>
> tasklet_kill(&local->tx_pending_tasklet);
> @@ -1448,11 +1444,12 @@ void ieee80211_unregister_hw(struct ieee
> if (local->apdev)
> ieee80211_if_del_mgmt(local);
>
> - write_lock_bh(&local->sub_if_lock);
> - list_replace_init(&local->sub_if_list, &tmp_list);
> - write_unlock_bh(&local->sub_if_lock);
> -
> - list_for_each_entry_safe(sdata, tmp, &tmp_list, list)
> + /*
> + * At this point, interface list manipulations are fine
> + * because the driver cannot be handing us frames any
> + * more and the tasklet is killed.
> + */
> + list_for_each_entry_safe(sdata, tmp, &local->interfaces, list)
> __ieee80211_if_del(local, sdata);
>
> rtnl_unlock();
> --- netdev-2.6.orig/net/mac80211/ieee80211_i.h 2007-09-06 15:35:23.334447686 +0200
> +++ netdev-2.6/net/mac80211/ieee80211_i.h 2007-09-06 15:35:23.404447686 +0200
> @@ -481,9 +481,8 @@ struct ieee80211_local {
> ieee80211_rx_handler *rx_handlers;
> ieee80211_tx_handler *tx_handlers;
>
> - rwlock_t sub_if_lock; /* Protects sub_if_list. Cannot be taken under
> - * sta_bss_lock or sta_lock. */
> - struct list_head sub_if_list;
> + struct list_head interfaces;
> +
> int sta_scanning;
> int scan_channel_idx;
> enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
> --- netdev-2.6.orig/net/mac80211/ieee80211_iface.c 2007-09-06 15:35:23.334447686 +0200
> +++ netdev-2.6/net/mac80211/ieee80211_iface.c 2007-09-06 15:35:23.404447686 +0200
> @@ -79,16 +79,15 @@ int ieee80211_if_add(struct net_device *
> ieee80211_debugfs_add_netdev(sdata);
> ieee80211_if_set_type(ndev, type);
>
> - write_lock_bh(&local->sub_if_lock);
> + /* we're under RTNL so all this is fine */
> if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
> - write_unlock_bh(&local->sub_if_lock);
> __ieee80211_if_del(local, sdata);
> return -ENODEV;
> }
> - list_add(&sdata->list, &local->sub_if_list);
> + list_add_tail_rcu(&sdata->list, &local->interfaces);

The _rcu is required because this list isn't protected by RTNL?

> +
> if (new_dev)
> *new_dev = ndev;
> - write_unlock_bh(&local->sub_if_lock);
>
> return 0;
>
> @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> /* Remove all virtual interfaces that use this BSS
> * as their sdata->bss */
> struct ieee80211_sub_if_data *tsdata, *n;
> - LIST_HEAD(tmp_list);
>
> - write_lock_bh(&local->sub_if_lock);

This code is also protected by RTNL?

> - list_for_each_entry_safe(tsdata, n, &local->sub_if_list, list) {
> + list_for_each_entry_safe(tsdata, n, &local->interfaces, list) {
> if (tsdata != sdata && tsdata->bss == &sdata->u.ap) {
> printk(KERN_DEBUG "%s: removing virtual "
> "interface %s because its BSS interface"
> " is being removed\n",
> sdata->dev->name, tsdata->dev->name);
> - list_move_tail(&tsdata->list, &tmp_list);
> + list_del_rcu(&tsdata->list);
> + /*
> + * We have lots of time and can afford
> + * to sync for each interface
> + */
> + synchronize_rcu();
> + __ieee80211_if_del(local, tsdata);
> }
> }
> - write_unlock_bh(&local->sub_if_lock);
> -
> - list_for_each_entry_safe(tsdata, n, &tmp_list, list)
> - __ieee80211_if_del(local, tsdata);
>
> kfree(sdata->u.ap.beacon_head);
> kfree(sdata->u.ap.beacon_tail);
> @@ -318,18 +317,16 @@ int ieee80211_if_remove(struct net_devic
>
> ASSERT_RTNL();

I -like- this!!! ;-)

> - write_lock_bh(&local->sub_if_lock);
> - list_for_each_entry_safe(sdata, n, &local->sub_if_list, list) {
> + list_for_each_entry_safe(sdata, n, &local->interfaces, list) {
> if ((sdata->type == id || id == -1) &&
> strcmp(name, sdata->dev->name) == 0 &&
> sdata->dev != local->mdev) {
> - list_del(&sdata->list);
> - write_unlock_bh(&local->sub_if_lock);
> + list_del_rcu(&sdata->list);
> + synchronize_rcu();
> __ieee80211_if_del(local, sdata);
> return 0;
> }
> }
> - write_unlock_bh(&local->sub_if_lock);
> return -ENODEV;
> }
>
> --- netdev-2.6.orig/net/mac80211/ieee80211_sta.c 2007-09-06 15:35:23.224447686 +0200
> +++ netdev-2.6/net/mac80211/ieee80211_sta.c 2007-09-06 15:35:23.414447686 +0200
> @@ -2659,8 +2659,8 @@ void ieee80211_scan_completed(struct iee
> memset(&wrqu, 0, sizeof(wrqu));
> wireless_send_event(dev, SIOCGIWSCAN, &wrqu, NULL);
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
>
> /* No need to wake the master device. */
> if (sdata->dev == local->mdev)
> @@ -2674,7 +2674,7 @@ void ieee80211_scan_completed(struct iee
>
> netif_wake_queue(sdata->dev);
> }
> - read_unlock(&local->sub_if_lock);
> + rcu_read_unlock();
>
> sdata = IEEE80211_DEV_TO_SUB_IF(dev);
> if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
> @@ -2811,8 +2811,8 @@ static int ieee80211_sta_start_scan(stru
>
> local->sta_scanning = 1;
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
>
> /* Don't stop the master interface, otherwise we can't transmit
> * probes! */
> @@ -2824,7 +2824,7 @@ static int ieee80211_sta_start_scan(stru
> (sdata->u.sta.flags & IEEE80211_STA_ASSOCIATED))
> ieee80211_send_nullfunc(local, sdata, 1);
> }
> - read_unlock(&local->sub_if_lock);
> + rcu_read_unlock();
>
> if (ssid) {
> local->scan_ssid_len = ssid_len;
> --- netdev-2.6.orig/net/mac80211/rx.c 2007-09-06 15:35:23.214447686 +0200
> +++ netdev-2.6/net/mac80211/rx.c 2007-09-06 15:35:23.414447686 +0200
> @@ -1385,8 +1385,9 @@ void __ieee80211_rx(struct ieee80211_hw
> }
>
> /*
> - * key references are protected using RCU and this requires that
> - * we are in a read-site RCU section during receive processing
> + * key references and virtual interfaces are protected using RCU
> + * and this requires that we are in a read-side RCU section during
> + * receive processing
> */
> rcu_read_lock();
>
> @@ -1441,8 +1442,7 @@ void __ieee80211_rx(struct ieee80211_hw
>
> bssid = ieee80211_get_bssid(hdr, skb->len - radiotap_len);
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> rx.flags |= IEEE80211_TXRXD_RXRA_MATCH;
>
> if (!netif_running(sdata->dev))
> @@ -1494,7 +1494,6 @@ void __ieee80211_rx(struct ieee80211_hw
> &rx, sta);
> } else
> dev_kfree_skb(skb);
> - read_unlock(&local->sub_if_lock);
>
> end:
> rcu_read_unlock();
> --- netdev-2.6.orig/net/mac80211/tx.c 2007-09-06 15:35:23.074447686 +0200
> +++ netdev-2.6/net/mac80211/tx.c 2007-09-06 15:35:23.424447686 +0200
> @@ -291,8 +291,12 @@ static void purge_old_ps_buffers(struct
> struct ieee80211_sub_if_data *sdata;
> struct sta_info *sta;
>
> - read_lock(&local->sub_if_lock);
> - list_for_each_entry(sdata, &local->sub_if_list, list) {
> + /*
> + * virtual interfaces are protected by RCU
> + */
> + rcu_read_lock();
> +
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> struct ieee80211_if_ap *ap;
> if (sdata->dev == local->mdev ||
> sdata->type != IEEE80211_IF_TYPE_AP)
> @@ -305,7 +309,7 @@ static void purge_old_ps_buffers(struct
> }
> total += skb_queue_len(&ap->ps_bc_buf);
> }
> - read_unlock(&local->sub_if_lock);
> + rcu_read_unlock();
>
> read_lock_bh(&local->sta_lock);
> list_for_each_entry(sta, &local->sta_list, list) {
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-09-06 13:35:42

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 20:36 +0800, Herbert Xu wrote:

> Yeah I think they're all under RTNL too. So you don't need to
> take the lock here at all since you should already have the RTNL.

Ok, this patch gets rid of the lock. I'd appreciate if you could give it
a quick look for obvious RCU abuse as I haven't tested it. It also
doesn't apply against net-2.6.24 because I based it after the patches I
have queued with John for net-2.6.24.

johannes

--- netdev-2.6.orig/net/mac80211/ieee80211.c 2007-09-06 15:35:23.324447686 +0200
+++ netdev-2.6/net/mac80211/ieee80211.c 2007-09-06 15:35:23.394447686 +0200
@@ -90,8 +90,9 @@ static struct dev_mc_list *ieee80211_get

/* start of iteration, both unassigned */
if (!mcd->cur && !mcd->sdata) {
- mcd->sdata = list_entry(local->sub_if_list.next,
- struct ieee80211_sub_if_data, list);
+ mcd->sdata = list_entry(
+ rcu_dereference(local->interfaces.next),
+ struct ieee80211_sub_if_data, list);
mcd->cur = mcd->sdata->dev->mc_list;
}

@@ -100,10 +101,11 @@ static struct dev_mc_list *ieee80211_get

while (!mcd->cur) {
/* reached end of interface list? */
- if (mcd->sdata->list.next == &local->sub_if_list)
+ if (rcu_dereference(mcd->sdata->list.next) ==
+ &local->interfaces)
break;
/* otherwise try next interface */
- mcd->sdata = list_entry(mcd->sdata->list.next,
+ mcd->sdata = list_entry(rcu_dereference(mcd->sdata->list.next),
struct ieee80211_sub_if_data, list);
mcd->cur = mcd->sdata->dev->mc_list;
}
@@ -145,9 +147,10 @@ static void ieee80211_configure_filter(s

/*
* We can iterate through the device list for the multicast
- * address list so need to lock it.
+ * address list so need to be in a RCU read-side section,
+ * the RTNL isn't held in this function.
*/
- read_lock(&local->sub_if_lock);
+ rcu_read_lock();

/* be a bit nasty */
new_flags |= (1<<31);
@@ -163,7 +166,7 @@ static void ieee80211_configure_filter(s
WARN_ON(mcd.cur);

local->filter_flags = new_flags & ~(1<<31);
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

netif_tx_unlock(local->mdev);
}
@@ -176,14 +179,13 @@ static int ieee80211_master_open(struct
struct ieee80211_sub_if_data *sdata;
int res = -EOPNOTSUPP;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(sdata, &local->interfaces, list) {
if (sdata->dev != dev && netif_running(sdata->dev)) {
res = 0;
break;
}
}
- read_unlock(&local->sub_if_lock);
return res;
}

@@ -192,11 +194,10 @@ static int ieee80211_master_stop(struct
struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
struct ieee80211_sub_if_data *sdata;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list)
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(sdata, &local->interfaces, list)
if (sdata->dev != dev && netif_running(sdata->dev))
dev_close(sdata->dev);
- read_unlock(&local->sub_if_lock);

return 0;
}
@@ -430,8 +431,8 @@ static int ieee80211_open(struct net_dev

sdata = IEEE80211_DEV_TO_SUB_IF(dev);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(nsdata, &local->sub_if_list, list) {
+ /* we hold the RTNL here so can safely walk the list */
+ list_for_each_entry(nsdata, &local->interfaces, list) {
struct net_device *ndev = nsdata->dev;

if (ndev != dev && ndev != local->mdev && netif_running(ndev) &&
@@ -440,10 +441,8 @@ static int ieee80211_open(struct net_dev
* check whether it may have the same address
*/
if (!identical_mac_addr_allowed(sdata->type,
- nsdata->type)) {
- read_unlock(&local->sub_if_lock);
+ nsdata->type))
return -ENOTUNIQ;
- }

/*
* can only add VLANs to enabled APs
@@ -454,7 +453,6 @@ static int ieee80211_open(struct net_dev
sdata->u.vlan.ap = nsdata;
}
}
- read_unlock(&local->sub_if_lock);

switch (sdata->type) {
case IEEE80211_IF_TYPE_WDS:
@@ -575,14 +573,13 @@ static int ieee80211_stop(struct net_dev
sdata->u.sta.state = IEEE80211_DISABLED;
del_timer_sync(&sdata->u.sta.timer);
/*
- * Holding the sub_if_lock for writing here blocks
- * out the receive path and makes sure it's not
- * currently processing a packet that may get
- * added to the queue.
+ * When we get here, the interface is marked down.
+ * Call synchronize_rcu() to wait for the RX path
+ * should it be using the interface and enqueuing
+ * frames at this very time on another CPU.
*/
- write_lock_bh(&local->sub_if_lock);
+ synchronize_rcu();
skb_queue_purge(&sdata->u.sta.skb_queue);
- write_unlock_bh(&local->sub_if_lock);

if (!local->ops->hw_scan &&
local->scan_dev == sdata->dev) {
@@ -1134,9 +1131,9 @@ void ieee80211_tx_status(struct ieee8021

rthdr->data_retries = status->retry_count;

- read_lock(&local->sub_if_lock);
+ rcu_read_lock();
monitors = local->monitors;
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
/*
* Using the monitors counter is possibly racy, but
* if the value is wrong we simply either clone the skb
@@ -1152,7 +1149,7 @@ void ieee80211_tx_status(struct ieee8021
continue;
monitors--;
if (monitors)
- skb2 = skb_clone(skb, GFP_KERNEL);
+ skb2 = skb_clone(skb, GFP_ATOMIC);
else
skb2 = NULL;
skb->dev = sdata->dev;
@@ -1167,7 +1164,7 @@ void ieee80211_tx_status(struct ieee8021
}
}
out:
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();
if (skb)
dev_kfree_skb(skb);
}
@@ -1255,8 +1252,7 @@ struct ieee80211_hw *ieee80211_alloc_hw(

INIT_LIST_HEAD(&local->modes_list);

- rwlock_init(&local->sub_if_lock);
- INIT_LIST_HEAD(&local->sub_if_list);
+ INIT_LIST_HEAD(&local->interfaces);

INIT_DELAYED_WORK(&local->scan_work, ieee80211_sta_scan_work);
ieee80211_rx_bss_list_init(mdev);
@@ -1275,7 +1271,8 @@ struct ieee80211_hw *ieee80211_alloc_hw(
sdata->u.ap.force_unicast_rateidx = -1;
sdata->u.ap.max_ratectrl_rateidx = -1;
ieee80211_if_sdata_init(sdata);
- list_add_tail(&sdata->list, &local->sub_if_list);
+ /* no RCU needed since we're still during init phase */
+ list_add_tail(&sdata->list, &local->interfaces);

tasklet_init(&local->tx_pending_tasklet, ieee80211_tx_pending,
(unsigned long)local);
@@ -1434,7 +1431,6 @@ void ieee80211_unregister_hw(struct ieee
{
struct ieee80211_local *local = hw_to_local(hw);
struct ieee80211_sub_if_data *sdata, *tmp;
- struct list_head tmp_list;
int i;

tasklet_kill(&local->tx_pending_tasklet);
@@ -1448,11 +1444,12 @@ void ieee80211_unregister_hw(struct ieee
if (local->apdev)
ieee80211_if_del_mgmt(local);

- write_lock_bh(&local->sub_if_lock);
- list_replace_init(&local->sub_if_list, &tmp_list);
- write_unlock_bh(&local->sub_if_lock);
-
- list_for_each_entry_safe(sdata, tmp, &tmp_list, list)
+ /*
+ * At this point, interface list manipulations are fine
+ * because the driver cannot be handing us frames any
+ * more and the tasklet is killed.
+ */
+ list_for_each_entry_safe(sdata, tmp, &local->interfaces, list)
__ieee80211_if_del(local, sdata);

rtnl_unlock();
--- netdev-2.6.orig/net/mac80211/ieee80211_i.h 2007-09-06 15:35:23.334447686 +0200
+++ netdev-2.6/net/mac80211/ieee80211_i.h 2007-09-06 15:35:23.404447686 +0200
@@ -481,9 +481,8 @@ struct ieee80211_local {
ieee80211_rx_handler *rx_handlers;
ieee80211_tx_handler *tx_handlers;

- rwlock_t sub_if_lock; /* Protects sub_if_list. Cannot be taken under
- * sta_bss_lock or sta_lock. */
- struct list_head sub_if_list;
+ struct list_head interfaces;
+
int sta_scanning;
int scan_channel_idx;
enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
--- netdev-2.6.orig/net/mac80211/ieee80211_iface.c 2007-09-06 15:35:23.334447686 +0200
+++ netdev-2.6/net/mac80211/ieee80211_iface.c 2007-09-06 15:35:23.404447686 +0200
@@ -79,16 +79,15 @@ int ieee80211_if_add(struct net_device *
ieee80211_debugfs_add_netdev(sdata);
ieee80211_if_set_type(ndev, type);

- write_lock_bh(&local->sub_if_lock);
+ /* we're under RTNL so all this is fine */
if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
- write_unlock_bh(&local->sub_if_lock);
__ieee80211_if_del(local, sdata);
return -ENODEV;
}
- list_add(&sdata->list, &local->sub_if_list);
+ list_add_tail_rcu(&sdata->list, &local->interfaces);
+
if (new_dev)
*new_dev = ndev;
- write_unlock_bh(&local->sub_if_lock);

return 0;

@@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
/* Remove all virtual interfaces that use this BSS
* as their sdata->bss */
struct ieee80211_sub_if_data *tsdata, *n;
- LIST_HEAD(tmp_list);

- write_lock_bh(&local->sub_if_lock);
- list_for_each_entry_safe(tsdata, n, &local->sub_if_list, list) {
+ list_for_each_entry_safe(tsdata, n, &local->interfaces, list) {
if (tsdata != sdata && tsdata->bss == &sdata->u.ap) {
printk(KERN_DEBUG "%s: removing virtual "
"interface %s because its BSS interface"
" is being removed\n",
sdata->dev->name, tsdata->dev->name);
- list_move_tail(&tsdata->list, &tmp_list);
+ list_del_rcu(&tsdata->list);
+ /*
+ * We have lots of time and can afford
+ * to sync for each interface
+ */
+ synchronize_rcu();
+ __ieee80211_if_del(local, tsdata);
}
}
- write_unlock_bh(&local->sub_if_lock);
-
- list_for_each_entry_safe(tsdata, n, &tmp_list, list)
- __ieee80211_if_del(local, tsdata);

kfree(sdata->u.ap.beacon_head);
kfree(sdata->u.ap.beacon_tail);
@@ -318,18 +317,16 @@ int ieee80211_if_remove(struct net_devic

ASSERT_RTNL();

- write_lock_bh(&local->sub_if_lock);
- list_for_each_entry_safe(sdata, n, &local->sub_if_list, list) {
+ list_for_each_entry_safe(sdata, n, &local->interfaces, list) {
if ((sdata->type == id || id == -1) &&
strcmp(name, sdata->dev->name) == 0 &&
sdata->dev != local->mdev) {
- list_del(&sdata->list);
- write_unlock_bh(&local->sub_if_lock);
+ list_del_rcu(&sdata->list);
+ synchronize_rcu();
__ieee80211_if_del(local, sdata);
return 0;
}
}
- write_unlock_bh(&local->sub_if_lock);
return -ENODEV;
}

--- netdev-2.6.orig/net/mac80211/ieee80211_sta.c 2007-09-06 15:35:23.224447686 +0200
+++ netdev-2.6/net/mac80211/ieee80211_sta.c 2007-09-06 15:35:23.414447686 +0200
@@ -2659,8 +2659,8 @@ void ieee80211_scan_completed(struct iee
memset(&wrqu, 0, sizeof(wrqu));
wireless_send_event(dev, SIOCGIWSCAN, &wrqu, NULL);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {

/* No need to wake the master device. */
if (sdata->dev == local->mdev)
@@ -2674,7 +2674,7 @@ void ieee80211_scan_completed(struct iee

netif_wake_queue(sdata->dev);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

sdata = IEEE80211_DEV_TO_SUB_IF(dev);
if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
@@ -2811,8 +2811,8 @@ static int ieee80211_sta_start_scan(stru

local->sta_scanning = 1;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {

/* Don't stop the master interface, otherwise we can't transmit
* probes! */
@@ -2824,7 +2824,7 @@ static int ieee80211_sta_start_scan(stru
(sdata->u.sta.flags & IEEE80211_STA_ASSOCIATED))
ieee80211_send_nullfunc(local, sdata, 1);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

if (ssid) {
local->scan_ssid_len = ssid_len;
--- netdev-2.6.orig/net/mac80211/rx.c 2007-09-06 15:35:23.214447686 +0200
+++ netdev-2.6/net/mac80211/rx.c 2007-09-06 15:35:23.414447686 +0200
@@ -1385,8 +1385,9 @@ void __ieee80211_rx(struct ieee80211_hw
}

/*
- * key references are protected using RCU and this requires that
- * we are in a read-site RCU section during receive processing
+ * key references and virtual interfaces are protected using RCU
+ * and this requires that we are in a read-side RCU section during
+ * receive processing
*/
rcu_read_lock();

@@ -1441,8 +1442,7 @@ void __ieee80211_rx(struct ieee80211_hw

bssid = ieee80211_get_bssid(hdr, skb->len - radiotap_len);

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
rx.flags |= IEEE80211_TXRXD_RXRA_MATCH;

if (!netif_running(sdata->dev))
@@ -1494,7 +1494,6 @@ void __ieee80211_rx(struct ieee80211_hw
&rx, sta);
} else
dev_kfree_skb(skb);
- read_unlock(&local->sub_if_lock);

end:
rcu_read_unlock();
--- netdev-2.6.orig/net/mac80211/tx.c 2007-09-06 15:35:23.074447686 +0200
+++ netdev-2.6/net/mac80211/tx.c 2007-09-06 15:35:23.424447686 +0200
@@ -291,8 +291,12 @@ static void purge_old_ps_buffers(struct
struct ieee80211_sub_if_data *sdata;
struct sta_info *sta;

- read_lock(&local->sub_if_lock);
- list_for_each_entry(sdata, &local->sub_if_list, list) {
+ /*
+ * virtual interfaces are protected by RCU
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
struct ieee80211_if_ap *ap;
if (sdata->dev == local->mdev ||
sdata->type != IEEE80211_IF_TYPE_AP)
@@ -305,7 +309,7 @@ static void purge_old_ps_buffers(struct
}
total += skb_queue_len(&ap->ps_bc_buf);
}
- read_unlock(&local->sub_if_lock);
+ rcu_read_unlock();

read_lock_bh(&local->sta_lock);
list_for_each_entry(sta, &local->sta_list, list) {



2007-09-06 12:08:51

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 16:23 +0800, Herbert Xu wrote:
> On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> >
> > > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
>
> This is where the bug is. You cannot call dev_close from an
> atomic context as i33380211_master_stop does it within spin
> locks.

Hah, I suspected as much but didn't have a chance to look yet. I had
plans to replace that sub_if_list with an RCU list and not require the
lock there, but that's far off. Any ideas how to fix this? We can't
reject the master stop so we have to walk the list, I guess we'll have
to audit the other list manipulation places, I think they're all under
RTNL.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-09-06 09:05:48

by Florian Lohoff

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, Sep 06, 2007 at 02:11:46PM +0530, Satyam Sharma wrote:
> On Thu, 6 Sep 2007, Herbert Xu wrote:
>
> > On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> > >
> > > > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
> >
> > This is where the bug is. You cannot call dev_close from an
> > atomic context as i33380211_master_stop does it within spin
> > locks.
>
> Doh, of course! I must be blind ... and wait_for_completion()'s
> might_sleep() clearly didn't trigger earlier because Florian must've
> had CONFIG_DEBUG_SPINLOCK_SLEEP off in his .config ...

# CONFIG_DEBUG_SPINLOCK_SLEEP is not set

Exactly ...

Flo
--
Florian Lohoff [email protected] +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin


Attachments:
(No filename) (924.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2007-09-06 15:18:41

by Johannes Berg

[permalink] [raw]
Subject: Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

On Thu, 2007-09-06 at 16:23 +0800, Herbert Xu wrote:
> On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> >
> > > > [ 382.529041] [<c02c8abc>] dev_close+0x24/0x67
> > > > [ 382.529052] [<e01f402b>] ieee80211_master_stop+0x4a/0x6d [mac80211]
>
> This is where the bug is. You cannot call dev_close from an
> atomic context as i33380211_master_stop does it within spin
> locks.

Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
whether the device is up? That way, we'd have seen the bug, but
apparently nobody before Florian ever did a 'ip link set wmaster0 down'
while the other interfaces were still open.

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part