2021-09-01 19:02:23

by Linus Torvalds

Subject: Re: [GIT PULL] Networking for v5.15

On Tue, Aug 31, 2021 at 1:37 PM Jakub Kicinski wrote:
>
> No conflicts at the time of writing. There were conflicts with
> char-misc but I believe Greg dropped the commits in question.

Hmm. I already merged this earlier, but didn't notice a new warning on
my desktop:

RTNL: assertion failed at net/wireless/reg.c (4025)
WARNING: CPU: 60 PID: 1720 at net/wireless/reg.c:4025
regulatory_set_wiphy_regd_sync+0x7f/0x90 [cfg80211]
Call Trace:
iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
iwl_opmode_register+0xd0/0x130 [iwlwifi]
init_module+0x23/0x1000 [iwlmvm]

and

RTNL: assertion failed at net/wireless/reg.c (3106)
WARNING: CPU: 60 PID: 1720 at net/wireless/reg.c:3106
reg_process_self_managed_hint+0x26c/0x280 [cfg80211]
Call Trace:
regulatory_set_wiphy_regd_sync+0x3a/0x90 [cfg80211]
iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
iwl_opmode_register+0xd0/0x130 [iwlwifi]
init_module+0x23/0x1000 [iwlmvm]

and

RTNL: assertion failed at net/wireless/core.c (84)
WARNING: CPU: 60 PID: 1720 at net/wireless/core.c:84
wiphy_idx_to_wiphy+0x97/0xd0 [cfg80211]
Call Trace:
nl80211_common_reg_change_event+0xf9/0x1e0 [cfg80211]
reg_process_self_managed_hint+0x23d/0x280 [cfg80211]
regulatory_set_wiphy_regd_sync+0x3a/0x90 [cfg80211]
iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
iwl_opmode_register+0xd0/0x130 [iwlwifi]
init_module+0x23/0x1000 [iwlmvm]

and

RTNL: assertion failed at net/wireless/core.c (61)
WARNING: CPU: 60 PID: 1720 at net/wireless/core.c:61
wiphy_idx_to_wiphy+0xbf/0xd0 [cfg80211]
Call Trace:
nl80211_common_reg_change_event+0xf9/0x1e0 [cfg80211]
reg_process_self_managed_hint+0x23d/0x280 [cfg80211]
regulatory_set_wiphy_regd_sync+0x3a/0x90 [cfg80211]
iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
iwl_opmode_register+0xd0/0x130 [iwlwifi]
init_module+0x23/0x1000 [iwlmvm]

They all seem to have that same issue, and it looks like the fix would
be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
into it very much.

This is on my desktop, and I actually don't _use_ the wireless on this
machine. I assume it still works despite the warnings, but they should
get fixed.

I *don't* see these warnings on my laptop where I actually use
wireless, but that one uses ath10k_pci, so it seems this is purely an
iwlwifi issue.

I can't be the only one that sees this. Hmm?

Linus


2021-09-01 20:25:12

by Jakub Kicinski

Subject: Re: [GIT PULL] Networking for v5.15

On Wed, 1 Sep 2021 12:00:57 -0700 Linus Torvalds wrote:
> On Tue, Aug 31, 2021 at 1:37 PM Jakub Kicinski wrote:
> >
> > No conflicts at the time of writing. There were conflicts with
> > char-misc but I believe Greg dropped the commits in question.
>
> Hmm. I already merged this earlier, but didn't notice a new warning on
> my desktop:

> RTNL: assertion failed at net/wireless/core.c (61)
> WARNING: CPU: 60 PID: 1720 at net/wireless/core.c:61
> wiphy_idx_to_wiphy+0xbf/0xd0 [cfg80211]
> Call Trace:
> nl80211_common_reg_change_event+0xf9/0x1e0 [cfg80211]
> reg_process_self_managed_hint+0x23d/0x280 [cfg80211]
> regulatory_set_wiphy_regd_sync+0x3a/0x90 [cfg80211]
> iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
> iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
> iwl_opmode_register+0xd0/0x130 [iwlwifi]
> init_module+0x23/0x1000 [iwlmvm]
>
> They all seem to have that same issue, and it looks like the fix would
> be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
> into it very much.
>
> This is on my desktop, and I actually don't _use_ the wireless on this
> machine. I assume it still works despite the warnings, but they should
> get fixed.
>
> I *don't* see these warnings on my laptop where I actually use
> wireless, but that one uses ath10k_pci, so it seems this is purely an
> iwlwifi issue.
>
> I can't be the only one that sees this. Hmm?

Mm. Looking thru the recent commits there is a suspicious rtnl_unlock()
in commit eb09ae93dabf ("iwlwifi: mvm: load regdomain at INIT stage").

CC Miri, Johannes

2021-09-01 20:25:12

by Johannes Berg

Subject: Re: [GIT PULL] Networking for v5.15

On Wed, 2021-09-01 at 12:41 -0700, Jakub Kicinski wrote:
>
> >
> > They all seem to have that same issue, and it looks like the fix would
> > be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
> > into it very much.
> >
> > This is on my desktop, and I actually don't _use_ the wireless on this
> > machine. I assume it still works despite the warnings, but they should
> > get fixed.
> >
> > I *don't* see these warnings on my laptop where I actually use
> > wireless, but that one uses ath10k_pci, so it seems this is purely an
> > iwlwifi issue.
> >
> > I can't be the only one that sees this. Hmm?
>
> Mm. Looking thru the recent commits there is a suspicious rtnl_unlock()
> in commit eb09ae93dabf ("iwlwifi: mvm: load regdomain at INIT stage").

Huh! That's not the version of the commit I remember - it had an
rtnl_lock() in there too (just before the mutex_lock)?! Looks like that
should really be there, not sure how/where it got lost along the way.

That unbalanced rtnl_unlock() makes no sense anyway. Wonder why it
doesn't cause more assertions/problems at that point, clearly it's
unbalanced. Pretty sure it's missing the rtnl_lock() earlier in the
function for some reason.

Luca and I will look at it tomorrow, getting late here, sorry.

johannes

2021-09-01 22:46:26

by Larry Finger

Subject: Re: [GIT PULL] Networking for v5.15

On 9/1/21 14:49, Johannes Berg wrote:
> On Wed, 2021-09-01 at 12:41 -0700, Jakub Kicinski wrote:
>>
>>>
>>> They all seem to have that same issue, and it looks like the fix would
>>> be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
>>> into it very much.
>>>
>>> This is on my desktop, and I actually don't _use_ the wireless on this
>>> machine. I assume it still works despite the warnings, but they should
>>> get fixed.
>>>
>>> I *don't* see these warnings on my laptop where I actually use
>>> wireless, but that one uses ath10k_pci, so it seems this is purely an
>>> iwlwifi issue.
>>>
>>> I can't be the only one that sees this. Hmm?
>>
>> Mm. Looking thru the recent commits there is a suspicious rtnl_unlock()
>> in commit eb09ae93dabf ("iwlwifi: mvm: load regdomain at INIT stage").
>
> Huh! That's not the version of the commit I remember - it had an
> rtnl_lock() in there too (just before the mutex_lock)?! Looks like that
> should really be there, not sure how/where it got lost along the way.
>
> That unbalanced rtnl_unlock() makes no sense anyway. Wonder why it
> doesn't cause more assertions/problems at that point, clearly it's
> unbalanced. Pretty sure it's missing the rtnl_lock() earlier in the
> function for some reason.
>
> Luca and I will look at it tomorrow, getting late here, sorry.
>
> johannes
>
I am seeing the same problem, and it does happen in lots of places. For example:

finger@2603-8090-2005-39b3-0000-0000-0000-1023:~/rtl8812au>dmesg | grep assertion\ failed
[ 6.465589] RTNL: assertion failed at net/core/rtnetlink.c (1702)
[ 6.465948] RTNL: assertion failed at net/core/devlink.c (11496)
[ 6.466263] RTNL: assertion failed at net/core/rtnetlink.c (1412)
[ 6.466500] RTNL: assertion failed at net/core/dev.c (1987)
[ 6.466708] RTNL: assertion failed at net/core/fib_rules.c (1227)
[ 6.466902] RTNL: assertion failed at net/ipv4/devinet.c (1526)
[ 6.467097] RTNL: assertion failed at net/ipv4/igmp.c (1779)
[ 6.467291] RTNL: assertion failed at net/ipv4/igmp.c (1432)

I am in the process of bisecting the problem, in case the cause lies somewhere
other than where you suspect.

Larry

2021-09-02 05:58:55

by Larry Finger

Subject: Re: [GIT PULL] Networking for v5.15

On 9/1/21 14:41, Jakub Kicinski wrote:
> On Wed, 1 Sep 2021 12:00:57 -0700 Linus Torvalds wrote:
>> On Tue, Aug 31, 2021 at 1:37 PM Jakub Kicinski wrote:
>>>
>>> No conflicts at the time of writing. There were conflicts with
>>> char-misc but I believe Greg dropped the commits in question.
>>
>> Hmm. I already merged this earlier, but didn't notice a new warning on
>> my desktop:
>
>> RTNL: assertion failed at net/wireless/core.c (61)
>> WARNING: CPU: 60 PID: 1720 at net/wireless/core.c:61
>> wiphy_idx_to_wiphy+0xbf/0xd0 [cfg80211]
>> Call Trace:
>> nl80211_common_reg_change_event+0xf9/0x1e0 [cfg80211]
>> reg_process_self_managed_hint+0x23d/0x280 [cfg80211]
>> regulatory_set_wiphy_regd_sync+0x3a/0x90 [cfg80211]
>> iwl_mvm_init_mcc+0x170/0x190 [iwlmvm]
>> iwl_op_mode_mvm_start+0x824/0xa60 [iwlmvm]
>> iwl_opmode_register+0xd0/0x130 [iwlwifi]
>> init_module+0x23/0x1000 [iwlmvm]
>>
>> They all seem to have that same issue, and it looks like the fix would
>> be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
>> into it very much.
>>
>> This is on my desktop, and I actually don't _use_ the wireless on this
>> machine. I assume it still works despite the warnings, but they should
>> get fixed.
>>
>> I *don't* see these warnings on my laptop where I actually use
>> wireless, but that one uses ath10k_pci, so it seems this is purely an
>> iwlwifi issue.
>>
>> I can't be the only one that sees this. Hmm?
>
> Mm. Looking thru the recent commits there is a suspicious rtnl_unlock()
> in commit eb09ae93dabf ("iwlwifi: mvm: load regdomain at INIT stage").
>
> CC Miri, Johannes
>

I did not get the bisection finished tonight, but commit eb09ae93dabf is not the
problem.

My bisection has identified commit 7a3f5b0de36 ("netfilter: add netfilter hooks
to SRv6 data plane") as bad, and commit 9055a2f59162 ("ixp4xx_eth: make ptp
support a platform driver") as good.

Larry

2021-09-02 07:11:54

by Johannes Berg

Subject: Re: [GIT PULL] Networking for v5.15

On Thu, 2021-09-02 at 00:55 -0500, Larry Finger wrote:
>
> I did not get the bisection finished tonight, but commit eb09ae93dabf is not the
> problem.
>
> My bisection has identified commit 7a3f5b0de36 ("netfilter: add netfilter hooks
> to SRv6 data plane") as bad, and commit 9055a2f59162 ("ixp4xx_eth: make ptp
> support a platform driver") as good.

Can you send the backtraces from the RTNL assertions you posted?
Probably easier that way anyway.

johannes

2021-09-02 09:34:45

by Luciano Coelho

Subject: Re: [GIT PULL] Networking for v5.15

On Wed, 2021-09-01 at 21:49 +0200, Johannes Berg wrote:
> On Wed, 2021-09-01 at 12:41 -0700, Jakub Kicinski wrote:
> >
> > >
> > > They all seem to have that same issue, and it looks like the fix would
> > > be to get the RTNL lock in iwl_mvm_init_mcc(), but I didn't really look
> > > into it very much.
> > >
> > > This is on my desktop, and I actually don't _use_ the wireless on this
> > > machine. I assume it still works despite the warnings, but they should
> > > get fixed.
> > >
> > > I *don't* see these warnings on my laptop where I actually use
> > > wireless, but that one uses ath10k_pci, so it seems this is purely an
> > > iwlwifi issue.
> > >
> > > I can't be the only one that sees this. Hmm?
> >
> > Mm. Looking thru the recent commits there is a suspicious rtnl_unlock()
> > in commit eb09ae93dabf ("iwlwifi: mvm: load regdomain at INIT stage").
>
> Huh! That's not the version of the commit I remember - it had an
> rtnl_lock() in there too (just before the mutex_lock)?! Looks like that
> should really be there, not sure how/where it got lost along the way.
>
> That unbalanced rtnl_unlock() makes no sense anyway. Wonder why it
> doesn't cause more assertions/problems at that point, clearly it's
> unbalanced. Pretty sure it's missing the rtnl_lock() earlier in the
> function for some reason.
>
> Luca and I will look at it tomorrow, getting late here, sorry.

Right, this was caused by rebase damage. We lost the
rtnl_lock() call when I rebased the patch on top of the tree without
iwlmei (which touches this same function).

Sorry for the trouble, I'll send the fix in a sec.

--
Cheers,
Luca.
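
As a rough illustration of the failure mode discussed in this thread, the
following userspace C sketch models a caller that reaches a lock-asserting
function without taking the lock and then unlocks anyway, next to a balanced
variant. It is not the iwlwifi code and not the actual fix: every name
prefixed with model_ is hypothetical, the RTNL mutex is reduced to a boolean,
and only the warning text is patterned on the messages quoted above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the RTNL mutex: only tracks "held or not". */
bool rtnl_held;
int rtnl_warnings;       /* failed lock assertions seen so far */
int unbalanced_unlocks;  /* unlock calls made while the lock was not held */

void model_rtnl_lock(void)
{
	rtnl_held = true;
}

void model_rtnl_unlock(void)
{
	if (!rtnl_held)
		unbalanced_unlocks++;   /* the model only counts this case */
	rtnl_held = false;
}

/* Warn (but keep going) when the lock is not held at this point. */
#define MODEL_ASSERT_RTNL()                                              \
	do {                                                             \
		if (!rtnl_held) {                                        \
			rtnl_warnings++;                                 \
			fprintf(stderr,                                  \
				"RTNL: assertion failed at %s (%d)\n",   \
				__FILE__, __LINE__);                     \
		}                                                        \
	} while (0)

/* Hypothetical callee that expects its caller to hold the lock. */
void model_set_regd_sync(void)
{
	MODEL_ASSERT_RTNL();
}

/* Shape described in the thread: the lock call is missing, the unlock is not. */
void model_init_mcc_buggy(void)
{
	model_set_regd_sync();
	model_rtnl_unlock();
}

/* Balanced shape: lock, call, unlock. */
void model_init_mcc_balanced(void)
{
	model_rtnl_lock();
	model_set_regd_sync();
	model_rtnl_unlock();
}
```

Starting from a fresh state, one call to model_init_mcc_buggy() records one
failed assertion and one unbalanced unlock, while model_init_mcc_balanced()
records neither.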