2017-05-22 10:57:13

by Johannes Berg

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
> Hi,
>
> I encountered this splat with 4.12-RC2.

Ugh, yeah, I should've seen that in the review.

Arend, please take a look at this. cfg80211_sched_scan_results() cannot
sleep, so you can't rtnl_lock() in there. Looks like you can just rely
on RCU though?

johannes


2017-05-23 07:22:36

by Johannes Berg

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

On Tue, 2017-05-23 at 09:19 +0200, Arend Van Spriel wrote:
>
> > WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
>
> Thought about something like this after sending the email. So there
> are two call sites. One for scheduled scan results notification and
> one in scheduled scan stop scenario. So for the latter it is not
> needed to use the rcu_read_lock() as it should have RTNL lock hence
> the two checks above?

Right. The latter can't even really use rcu_read_lock() since it also
wants to modify the list, and that's not sufficient protection for
modifying.

Thanks!

johannes

2017-05-22 12:10:05

by Arend van Spriel

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

On 5/22/2017 12:57 PM, Johannes Berg wrote:
> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>> Hi,
>>
>> I encountered this splat with 4.12-RC2.
>
> Ugh, yeah, I should've seen that in the review.
>
> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
> on RCU though?

I see. I think you are right on RCU. Don't have the code in front of me
now, but I think the lookup has an ASSERT_RTNL. Will look into it after
my monday meeting :-p

Regards,
Arend

2017-05-22 21:04:58

by Johannes Berg

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

Hi Arend,

Sorry, I forgot that the original message wasn't Cc'ed to the wireless
list, only netdev.

> +++ b/net/wireless/scan.c
> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct
> cfg80211_regi
>  {
>         struct cfg80211_sched_scan_request *pos;
>
> -       ASSERT_RTNL();
> -
> -       list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
> +       list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list,
> list) {

[snip]

This looks fine, but perhaps in the above we should have some kind of
locking assertion, e.g.

WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());

johannes

2017-05-22 21:02:38

by Arend van Spriel

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi



On 22-5-2017 14:09, Arend van Spriel wrote:
> On 5/22/2017 12:57 PM, Johannes Berg wrote:
>> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>>> Hi,
>>>
>>> I encountered this splat with 4.12-RC2.
>>
>> Ugh, yeah, I should've seen that in the review.
>>
>> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
>> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
>> on RCU though?
>
> I see. I think you are right on RCU. Don't have the code in front of me
> now, but I think the lookup has an ASSERT_RTNL. Will look into it after
> my monday meeting :-p

I realized I have a laptop lying around with intel 3160 wifi chip and
tried to reproduce the issue. Did not run into the splat running
4.12-rc1 from wireless-drivers-next repo. I did not get the email from
Sander so I don't know any details.

Here is what I changed based on the info Johannes provided. Can you
please check if this get rid of the splat and let me know.

Regards,
Arend
---
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 14d5f0c..04833bb 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct
cfg80211_regi
{
struct cfg80211_sched_scan_request *pos;

- ASSERT_RTNL();
-
- list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
+ list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
if (pos->reqid == reqid)
return pos;
}
@@ -398,13 +396,13 @@ void cfg80211_sched_scan_results(struct wiphy
*wiphy, u64
trace_cfg80211_sched_scan_results(wiphy, reqid);
/* ignore if we're not scanning */

- rtnl_lock();
+ rcu_read_lock();
request = cfg80211_find_sched_scan_req(rdev, reqid);
if (request) {
request->report_results = true;
queue_work(cfg80211_wq, &rdev->sched_scan_res_wk);
}
- rtnl_unlock();
+ rcu_read_unlock();
}
EXPORT_SYMBOL(cfg80211_sched_scan_results);

2017-05-23 18:19:26

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

On 22/05/17 23:02, Arend Van Spriel wrote:
>
>
> On 22-5-2017 14:09, Arend van Spriel wrote:
>> On 5/22/2017 12:57 PM, Johannes Berg wrote:
>>> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>>>> Hi,
>>>>
>>>> I encountered this splat with 4.12-RC2.
>>>
>>> Ugh, yeah, I should've seen that in the review.
>>>
>>> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
>>> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
>>> on RCU though?
>>
>> I see. I think you are right on RCU. Don't have the code in front of me
>> now, but I think the lookup has an ASSERT_RTNL. Will look into it after
>> my monday meeting :-p
>
> I realized I have a laptop lying around with intel 3160 wifi chip and
> tried to reproduce the issue. Did not run into the splat running
> 4.12-rc1 from wireless-drivers-next repo. I did not get the email from
> Sander so I don't know any details.
>
> Here is what I changed based on the info Johannes provided. Can you
> please check if this get rid of the splat and let me know.

Hi Arend,

I ran your patch today, so far no issues.

--
Sander


> Regards,
> Arend
> ---
> diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> index 14d5f0c..04833bb 100644
> --- a/net/wireless/scan.c
> +++ b/net/wireless/scan.c
> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct
> cfg80211_regi
> {
> struct cfg80211_sched_scan_request *pos;
>
> - ASSERT_RTNL();
> -
> - list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
> + list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
> if (pos->reqid == reqid)
> return pos;
> }
> @@ -398,13 +396,13 @@ void cfg80211_sched_scan_results(struct wiphy
> *wiphy, u64
> trace_cfg80211_sched_scan_results(wiphy, reqid);
> /* ignore if we're not scanning */
>
> - rtnl_lock();
> + rcu_read_lock();
> request = cfg80211_find_sched_scan_req(rdev, reqid);
> if (request) {
> request->report_results = true;
> queue_work(cfg80211_wq, &rdev->sched_scan_res_wk);
> }
> - rtnl_unlock();
> + rcu_read_unlock();
> }
> EXPORT_SYMBOL(cfg80211_sched_scan_results);
>
>

2017-05-23 07:19:29

by Arend van Spriel

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

On 22-5-2017 23:04, Johannes Berg wrote:
> Hi Arend,
>
> Sorry, I forgot that the original message wasn't Cc'ed to the wireless
> list, only netdev.

That explains. Not subscribed to that.

>> +++ b/net/wireless/scan.c
>> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct
>> cfg80211_regi
>> {
>> struct cfg80211_sched_scan_request *pos;
>>
>> - ASSERT_RTNL();
>> -
>> - list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
>> + list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list,
>> list) {
>
> [snip]
>
> This looks fine, but perhaps in the above we should have some kind of
> locking assertion, e.g.
>
> WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());

Thought about something like this after sending the email. So there are
two call sites. One for scheduled scan results notification and one in
scheduled scan stop scenario. So for the latter it is not needed to use
the rcu_read_lock() as it should have RTNL lock hence the two checks above?

Will create a formal patch.

Regards,
Arend

2017-05-23 07:24:55

by Arend van Spriel

[permalink] [raw]
Subject: Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi



On 23-5-2017 9:22, Johannes Berg wrote:
> On Tue, 2017-05-23 at 09:19 +0200, Arend Van Spriel wrote:
>>
>>> WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
>>
>> Thought about something like this after sending the email. So there
>> are two call sites. One for scheduled scan results notification and
>> one in scheduled scan stop scenario. So for the latter it is not
>> needed to use the rcu_read_lock() as it should have RTNL lock hence
>> the two checks above?
>
> Right. The latter can't even really use rcu_read_lock() since it also
> wants to modify the list, and that's not sufficient protection for
> modifying.

Hence the name ;-)

Regards,
Arend