2012-02-14 15:51:35

by Ronald Wahl

[permalink] [raw]
Subject: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

Hi,

I just got the below traces with the rtlwifi driver in linux 3.2.5 and a
rtl8192cu chip. It looks like that _usb_read_sync() and kmalloc(...,
GFP_KERNEL) is called while the rcu lock is held inside
rtl92c_dm_refresh_rate_adaptive_mask.

BUG: sleeping function called from invalid context at mm/slub.c:935
in_atomic(): 1, irqs_disabled(): 0, pid: 128, name: kworker/0:1
[<c000e158>] (unwind_backtrace+0x0/0x12c) from [<c02db684>] (dump_stack+0x20/0x24)
[<c02db684>] (dump_stack+0x20/0x24) from [<c0017238>] (__might_sleep+0x104/0x124)
[<c0017238>] (__might_sleep+0x104/0x124) from [<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0)
[<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0) from [<bf20b398>] (_usb_read_sync+0x40/0xd0 [rtlwifi])
[<bf20b398>] (_usb_read_sync+0x40/0xd0 [rtlwifi]) from [<bf20b44c>] (_usb_read32_sync+0x24/0x28 [rtlwifi])
[<bf20b44c>] (_usb_read32_sync+0x24/0x28 [rtlwifi]) from [<bf2a0be4>] (rtl92cu_update_hal_rate_table+0x1c8/0x210 [rtl8192cu])
[<bf2a0be4>] (rtl92cu_update_hal_rate_table+0x1c8/0x210 [rtl8192cu]) from [<bf21e950>] (rtl92c_dm_watchdog+0x10ec/0x1390 [rtl8192c_common])
[<bf21e950>] (rtl92c_dm_watchdog+0x10ec/0x1390 [rtl8192c_common]) from [<bf200484>] (rtl_watchdog_wq_callback+0x2ec/0x320 [rtlwifi])
[<bf200484>] (rtl_watchdog_wq_callback+0x2ec/0x320 [rtlwifi]) from [<c00398c4>] (process_one_work+0x294/0x474)
[<c00398c4>] (process_one_work+0x294/0x474) from [<c003c0a8>] (worker_thread+0x214/0x344)
[<c003c0a8>] (worker_thread+0x214/0x344) from [<c0040330>] (kthread+0x94/0x9c)
[<c0040330>] (kthread+0x94/0x9c) from [<c0009eec>] (kernel_thread_exit+0x0/0x8)

BUG: scheduling while atomic: kworker/0:1/128/0x00000002
Modules linked in: rtl8192cu aes_generic r8712u(C) arc4 rtl8192c_common rtlwifi rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib ath9k_htc ath9k_common ath9k_hw carl9170 mac80211 firmware_class ath cfg80211 g_serial v4l2_int_device uvcvideo v4l2_common videodev [last unloaded: rtl8192cu]
[<c000e158>] (unwind_backtrace+0x0/0x12c) from [<c02db684>] (dump_stack+0x20/0x24)
[<c02db684>] (dump_stack+0x20/0x24) from [<c0017770>] (__schedule_bug+0x70/0x7c)
[<c0017770>] (__schedule_bug+0x70/0x7c) from [<c02dbe40>] (__schedule+0x68/0x548)
[<c02dbe40>] (__schedule+0x68/0x548) from [<c02dc454>] (schedule+0x80/0x84)
[<c02dc454>] (schedule+0x80/0x84) from [<c02dc76c>] (schedule_timeout+0x28/0x2d4)
[<c02dc76c>] (schedule_timeout+0x28/0x2d4) from [<c02dbc2c>] (wait_for_common+0x110/0x1e0)
[<c02dbc2c>] (wait_for_common+0x110/0x1e0) from [<c02dbdb0>] (wait_for_completion_timeout+0x1c/0x20)
[<c02dbdb0>] (wait_for_completion_timeout+0x1c/0x20) from [<c01e5738>] (usb_start_wait_urb+0x78/0x13c)
[<c01e5738>] (usb_start_wait_urb+0x78/0x13c) from [<c01e5a08>] (usb_control_msg+0xdc/0x108)
[<c01e5a08>] (usb_control_msg+0xdc/0x108) from [<bf20b3e8>] (_usb_read_sync+0x90/0xd0 [rtlwifi])
[<bf20b3e8>] (_usb_read_sync+0x90/0xd0 [rtlwifi]) from [<bf20b44c>] (_usb_read32_sync+0x24/0x28 [rtlwifi])
[<bf20b44c>] (_usb_read32_sync+0x24/0x28 [rtlwifi]) from [<bf2a0be4>] (rtl92cu_update_hal_rate_table+0x1c8/0x210 [rtl8192cu])
[<bf2a0be4>] (rtl92cu_update_hal_rate_table+0x1c8/0x210 [rtl8192cu]) from [<bf21e950>] (rtl92c_dm_watchdog+0x10ec/0x1390 [rtl8192c_common])
[<bf21e950>] (rtl92c_dm_watchdog+0x10ec/0x1390 [rtl8192c_common]) from [<bf200484>] (rtl_watchdog_wq_callback+0x2ec/0x320 [rtlwifi])
[<bf200484>] (rtl_watchdog_wq_callback+0x2ec/0x320 [rtlwifi]) from [<c00398c4>] (process_one_work+0x294/0x474)
[<c00398c4>] (process_one_work+0x294/0x474) from [<c003c0a8>] (worker_thread+0x214/0x344)
[<c003c0a8>] (worker_thread+0x214/0x344) from [<c0040330>] (kthread+0x94/0x9c)
[<c0040330>] (kthread+0x94/0x9c) from [<c0009eec>] (kernel_thread_exit+0x0/0x8)

I hope you can fix this.

Thanks,
Ronald

--
Ronald Wahl - [email protected] - Phone +49 375271349-0 Fax -99
Raritan Deutschland GmbH, Kornmarkt 7, 08056 Zwickau, Germany
USt-IdNr. DE813094160, Steuer-Nr. 227/117/01749
Amtsgericht Chemnitz HRB 23605
Geschäftsführung: Stuart Hopper, Burkhard Wessler


2012-02-15 13:40:23

by Ronald Wahl

[permalink] [raw]
Subject: Re: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

On 14.02.2012 22:04, Larry Finger wrote:
> On 02/14/2012 09:36 AM, Ronald Wahl wrote:
>> Hi,
>>
>> I just got the below traces with the rtlwifi driver in linux 3.2.5 and a
>> rtl8192cu chip. It looks like that _usb_read_sync() and kmalloc(...,
>> GFP_KERNEL) is called while the rcu lock is held inside
>> rtl92c_dm_refresh_rate_adaptive_mask.
>
> --snip--
>
>> I hope you can fix this.
>
> Thanks to your analysis, it was easy. Routine
> rtl92c_dm_refresh_rate_adaptive_mask() makes a callback to
> rtlpriv->cfg->ops->update_rate_tbl() with the lock held. For rtl8192ce,
> it is OK, but rtl8192cu kmallocs a data buffer. The irony is that
> rtl8192cu does not use the ieee80211_sta struct, which is what is being
> locked. The quick fix is to release the lock in the affected routine and
> reacquire it before the exit. The real fix will be to redefine the
> callback routines. That will be done for 3.4, but we need to get 3.2 and
> 3.3 fixed first.
>
> Could you please test this patch?

Thanks for the patch. It improves the situation but I found a second trace
during association:

BUG: sleeping function called from invalid context at mm/slub.c:935
in_atomic(): 1, irqs_disabled(): 0, pid: 11580, name: kworker/u:2
[<c000e158>] (unwind_backtrace+0x0/0x12c) from [<c02db684>] (dump_stack+0x20/0x24)
[<c02db684>] (dump_stack+0x20/0x24) from [<c0017238>] (__might_sleep+0x104/0x124)
[<c0017238>] (__might_sleep+0x104/0x124) from [<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0)
[<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0) from [<bf459398>] (_usb_read_sync+0x40/0xd0 [rtlwifi])
[<bf459398>] (_usb_read_sync+0x40/0xd0 [rtlwifi]) from [<bf45944c>] (_usb_read32_sync+0x24/0x28 [rtlwifi])
[<bf45944c>] (_usb_read32_sync+0x24/0x28 [rtlwifi]) from [<bf486c04>] (rtl92cu_update_hal_rate_table+0x1e8/0x24c [rtl8192cu])
[<bf486c04>] (rtl92cu_update_hal_rate_table+0x1e8/0x24c [rtl8192cu]) from [<bf451830>] (rtl_op_sta_add+0x114/0x13c [rtlwifi])
[<bf451830>] (rtl_op_sta_add+0x114/0x13c [rtlwifi]) from [<bf0bd4d0>] (sta_info_finish_insert+0x80/0x204 [mac80211])
[<bf0bd4d0>] (sta_info_finish_insert+0x80/0x204 [mac80211]) from [<bf0bdb18>] (sta_info_insert_non_ibss+0x108/0x140 [mac80211])
[<bf0bdb18>] (sta_info_insert_non_ibss+0x108/0x140 [mac80211]) from [<bf0bdfc0>] (sta_info_reinsert+0x4c/0x78 [mac80211])
[<bf0bdfc0>] (sta_info_reinsert+0x4c/0x78 [mac80211]) from [<bf0c7d4c>] (ieee80211_assoc_success+0x474/0x7c0 [mac80211])
[<bf0c7d4c>] (ieee80211_assoc_success+0x474/0x7c0 [mac80211]) from [<bf0c81c0>] (ieee80211_assoc_done+0x128/0x1e4 [mac80211])
[<bf0c81c0>] (ieee80211_assoc_done+0x128/0x1e4 [mac80211]) from [<bf0c98a8>] (ieee80211_work_work+0x2c0/0x11c0 [mac80211])
[<bf0c98a8>] (ieee80211_work_work+0x2c0/0x11c0 [mac80211]) from [<c00398c4>] (process_one_work+0x294/0x474)
[<c00398c4>] (process_one_work+0x294/0x474) from [<c003c0a8>] (worker_thread+0x214/0x344)
[<c003c0a8>] (worker_thread+0x214/0x344) from [<c0040330>] (kthread+0x94/0x9c)
[<c0040330>] (kthread+0x94/0x9c) from [<c0009eec>] (kernel_thread_exit+0x0/0x8)

Here it looks like the lock is held in the mac80211 layer ouside of the
lowlevel driver. I tried GFP_ATOMIC in _usb_read_sync() but this does not
work because on one hand usb_control_msg() calls non-atomic kmalloc() itself
and we also wait for completion of the usb transfer.

Note: I'm away until end of next week so I can't test any further patches
during that time.

thanks,
ron

--
Ronald Wahl - [email protected] - Phone +49 375271349-0 Fax -99
Raritan Deutschland GmbH, Kornmarkt 7, 08056 Zwickau, Germany
USt-IdNr. DE813094160, Steuer-Nr. 227/117/01749
Amtsgericht Chemnitz HRB 23605
Geschäftsführung: Stuart Hopper, Burkhard Wessler

2012-02-14 21:04:19

by Larry Finger

[permalink] [raw]
Subject: Re: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

On 02/14/2012 09:36 AM, Ronald Wahl wrote:
> Hi,
>
> I just got the below traces with the rtlwifi driver in linux 3.2.5 and a
> rtl8192cu chip. It looks like that _usb_read_sync() and kmalloc(...,
> GFP_KERNEL) is called while the rcu lock is held inside
> rtl92c_dm_refresh_rate_adaptive_mask.

--snip--

> I hope you can fix this.

Thanks to your analysis, it was easy. Routine
rtl92c_dm_refresh_rate_adaptive_mask() makes a callback to
rtlpriv->cfg->ops->update_rate_tbl() with the lock held. For rtl8192ce, it is
OK, but rtl8192cu kmallocs a data buffer. The irony is that rtl8192cu does not
use the ieee80211_sta struct, which is what is being locked. The quick fix is to
release the lock in the affected routine and reacquire it before the exit. The
real fix will be to redefine the callback routines. That will be done for 3.4,
but we need to get 3.2 and 3.3 fixed first.

Could you please test this patch?

Thanks,

Larry

Index: wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
===================================================================
--- wireless-testing-new.orig/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
+++ wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
@@ -2010,6 +2010,7 @@ void rtl92cu_update_hal_rate_table(struc
u8 curshortgi_20mhz = mac->sgi_20;
enum wireless_mode wirelessmode = mac->mode;

+ rcu_read_unlock();
ratr_value |= ((*(u16 *) (mcsrate))) << 12;
switch (wirelessmode) {
case WIRELESS_MODE_B:
@@ -2061,6 +2062,7 @@ void rtl92cu_update_hal_rate_table(struc
rtl_write_dword(rtlpriv, REG_ARFR0 + ratr_index * 4, ratr_value);
RT_TRACE(rtlpriv, COMP_RATR, DBG_DMESG, "%x\n",
rtl_read_dword(rtlpriv, REG_ARFR0));
+ rcu_read_lock();
}

void rtl92cu_update_hal_rate_mask(struct ieee80211_hw *hw, u8 rssi_level)

2012-02-15 17:43:52

by Larry Finger

[permalink] [raw]
Subject: Re: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

On 02/15/2012 10:15 AM, Ronald Wahl wrote:
> On 15.02.2012 16:27, Larry Finger wrote:
>> If possible, I would like one favor before you leave. These traces
>> should be a part of a report at bugzilla.kernel.org. Is this a
>> regression? I have assumed so.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=42775
>
> I cannot tell if this is a regression - I just startet to test the driver with
> this kernel.

Thanks for the report and the link.

Attached is a general fix that replaces the previous one and should fix both
problems. As we do synchronous reads, there is no real need to allocate a new
buffer for each one. I'm putting a 32-bit data buffer in the private data area
and eliminating the kmalloc/kfree process. The driver is obviously too
vulnerable to upstream changes to keep the current structure. Once again it is
ironic in that the sta struct that was protected in your second oops was also a
dummy, and that entire code is eliminated in the current version of mac80211.

Please test whenever you get a chance. I'll continue to test the patch, but will
not submit it until I hear back from you.

Thanks,

Larry


Index: linux-2.6/drivers/net/wireless/rtlwifi/usb.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/rtlwifi/usb.c
+++ linux-2.6/drivers/net/wireless/rtlwifi/usb.c
@@ -113,46 +113,38 @@ static int _usbctrl_vendorreq_sync_read(
return status;
}

-static u32 _usb_read_sync(struct usb_device *udev, u32 addr, u16 len)
+static u32 _usb_read_sync(struct rtl_priv *rtlpriv, u32 addr, u16 len)
{
+ struct device *dev = rtlpriv->io.dev;
+ struct usb_device *udev = to_usb_device(dev);
u8 request;
u16 wvalue;
u16 index;
- u32 *data;
+ u32 *data = &rtlpriv->usb_data;
u32 ret;

- data = kmalloc(sizeof(u32), GFP_KERNEL);
- if (!data)
- return -ENOMEM;
request = REALTEK_USB_VENQT_CMD_REQ;
index = REALTEK_USB_VENQT_CMD_IDX; /* n/a */

wvalue = (u16)addr;
_usbctrl_vendorreq_sync_read(udev, request, wvalue, index, data, len);
ret = *data;
- kfree(data);
return ret;
}

static u8 _usb_read8_sync(struct rtl_priv *rtlpriv, u32 addr)
{
- struct device *dev = rtlpriv->io.dev;
-
- return (u8)_usb_read_sync(to_usb_device(dev), addr, 1);
+ return (u8)_usb_read_sync(rtlpriv, addr, 1);
}

static u16 _usb_read16_sync(struct rtl_priv *rtlpriv, u32 addr)
{
- struct device *dev = rtlpriv->io.dev;
-
- return (u16)_usb_read_sync(to_usb_device(dev), addr, 2);
+ return (u16)_usb_read_sync(rtlpriv, addr, 2);
}

static u32 _usb_read32_sync(struct rtl_priv *rtlpriv, u32 addr)
{
- struct device *dev = rtlpriv->io.dev;
-
- return _usb_read_sync(to_usb_device(dev), addr, 4);
+ return _usb_read_sync(rtlpriv, addr, 4);
}

static void _usb_write_async(struct usb_device *udev, u32 addr, u32 val,
Index: linux-2.6/drivers/net/wireless/rtlwifi/wifi.h
===================================================================
--- linux-2.6.orig/drivers/net/wireless/rtlwifi/wifi.h
+++ linux-2.6/drivers/net/wireless/rtlwifi/wifi.h
@@ -1621,6 +1621,9 @@ struct rtl_priv {
interface or hardware */
unsigned long status;

+ /* data buffer for USB reads */
+ u32 usb_data;
+
/*This must be the last item so
that it points to the data allocated
beyond this structure like:



2012-02-15 16:15:28

by Ronald Wahl

[permalink] [raw]
Subject: Re: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

On 15.02.2012 16:27, Larry Finger wrote:
> If possible, I would like one favor before you leave. These traces
> should be a part of a report at bugzilla.kernel.org. Is this a
> regression? I have assumed so.

https://bugzilla.kernel.org/show_bug.cgi?id=42775

I cannot tell if this is a regression - I just startet to test the
driver with this kernel.

- ron

--
Ronald Wahl - [email protected] - Phone +49 375271349-0 Fax -99
Raritan Deutschland GmbH, Kornmarkt 7, 08056 Zwickau, Germany
USt-IdNr. DE813094160, Steuer-Nr. 227/117/01749
Amtsgericht Chemnitz HRB 23605
Geschäftsführung: Stuart Hopper, Burkhard Wessler

2012-02-15 15:27:09

by Larry Finger

[permalink] [raw]
Subject: Re: rtlwifi/rtl8192cu: scheduling while atomic / sleeping function called from invalid context

On 02/15/2012 07:40 AM, Ronald Wahl wrote:
> On 14.02.2012 22:04, Larry Finger wrote:
>> On 02/14/2012 09:36 AM, Ronald Wahl wrote:
>>> Hi,
>>>
>>> I just got the below traces with the rtlwifi driver in linux 3.2.5 and a
>>> rtl8192cu chip. It looks like that _usb_read_sync() and kmalloc(...,
>>> GFP_KERNEL) is called while the rcu lock is held inside
>>> rtl92c_dm_refresh_rate_adaptive_mask.
>>
>> --snip--
>>
>>> I hope you can fix this.
>>
>> Thanks to your analysis, it was easy. Routine
>> rtl92c_dm_refresh_rate_adaptive_mask() makes a callback to
>> rtlpriv->cfg->ops->update_rate_tbl() with the lock held. For rtl8192ce,
>> it is OK, but rtl8192cu kmallocs a data buffer. The irony is that
>> rtl8192cu does not use the ieee80211_sta struct, which is what is being
>> locked. The quick fix is to release the lock in the affected routine and
>> reacquire it before the exit. The real fix will be to redefine the
>> callback routines. That will be done for 3.4, but we need to get 3.2 and
>> 3.3 fixed first.
>>
>> Could you please test this patch?
>
> Thanks for the patch. It improves the situation but I found a second trace
> during association:
>
> BUG: sleeping function called from invalid context at mm/slub.c:935
> in_atomic(): 1, irqs_disabled(): 0, pid: 11580, name: kworker/u:2
> [<c000e158>] (unwind_backtrace+0x0/0x12c) from [<c02db684>] (dump_stack+0x20/0x24)
> [<c02db684>] (dump_stack+0x20/0x24) from [<c0017238>] (__might_sleep+0x104/0x124)
> [<c0017238>] (__might_sleep+0x104/0x124) from [<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0)
> [<c00b123c>] (kmem_cache_alloc_trace+0x4c/0x1f0) from [<bf459398>] (_usb_read_sync+0x40/0xd0 [rtlwifi])
> [<bf459398>] (_usb_read_sync+0x40/0xd0 [rtlwifi]) from [<bf45944c>] (_usb_read32_sync+0x24/0x28 [rtlwifi])
> [<bf45944c>] (_usb_read32_sync+0x24/0x28 [rtlwifi]) from [<bf486c04>] (rtl92cu_update_hal_rate_table+0x1e8/0x24c [rtl8192cu])
> [<bf486c04>] (rtl92cu_update_hal_rate_table+0x1e8/0x24c [rtl8192cu]) from [<bf451830>] (rtl_op_sta_add+0x114/0x13c [rtlwifi])
> [<bf451830>] (rtl_op_sta_add+0x114/0x13c [rtlwifi]) from [<bf0bd4d0>] (sta_info_finish_insert+0x80/0x204 [mac80211])
> [<bf0bd4d0>] (sta_info_finish_insert+0x80/0x204 [mac80211]) from [<bf0bdb18>] (sta_info_insert_non_ibss+0x108/0x140 [mac80211])
> [<bf0bdb18>] (sta_info_insert_non_ibss+0x108/0x140 [mac80211]) from [<bf0bdfc0>] (sta_info_reinsert+0x4c/0x78 [mac80211])
> [<bf0bdfc0>] (sta_info_reinsert+0x4c/0x78 [mac80211]) from [<bf0c7d4c>] (ieee80211_assoc_success+0x474/0x7c0 [mac80211])
> [<bf0c7d4c>] (ieee80211_assoc_success+0x474/0x7c0 [mac80211]) from [<bf0c81c0>] (ieee80211_assoc_done+0x128/0x1e4 [mac80211])
> [<bf0c81c0>] (ieee80211_assoc_done+0x128/0x1e4 [mac80211]) from [<bf0c98a8>] (ieee80211_work_work+0x2c0/0x11c0 [mac80211])
> [<bf0c98a8>] (ieee80211_work_work+0x2c0/0x11c0 [mac80211]) from [<c00398c4>] (process_one_work+0x294/0x474)
> [<c00398c4>] (process_one_work+0x294/0x474) from [<c003c0a8>] (worker_thread+0x214/0x344)
> [<c003c0a8>] (worker_thread+0x214/0x344) from [<c0040330>] (kthread+0x94/0x9c)
> [<c0040330>] (kthread+0x94/0x9c) from [<c0009eec>] (kernel_thread_exit+0x0/0x8)
>
> Here it looks like the lock is held in the mac80211 layer ouside of the
> lowlevel driver. I tried GFP_ATOMIC in _usb_read_sync() but this does not
> work because on one hand usb_control_msg() calls non-atomic kmalloc() itself
> and we also wait for completion of the usb transfer.
>
> Note: I'm away until end of next week so I can't test any further patches
> during that time.

Thanks for the bug report. This one will be harder to fix.

If possible, I would like one favor before you leave. These traces should be a
part of a report at bugzilla.kernel.org. Is this a regression? I have assumed so.

Larry