linux-next 20160420 is whining at an incredible rate - in 20 minutes of
uptime, I piled up some 41,000 hits from all over the place (cleaned up
to skip the CPU and PID so the list isn't quite so long):
% grep include/net/sock.h /var/log/messages | cut -f5- -d: | sed -e 's/PID: [0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr
13468 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0
9770 CPU: + PID: (elided) at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0
7706 CPU: + PID: (elided) at include/net/sock.h:1408 sock_owned_by_user+0x91/0xa0
2818 CPU: + PID: (elided) at include/net/sock.h:1408 udpv6_queue_rcv_skb+0x3b6/0x6d0
1981 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_write_timer+0xf2/0x110
1954 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_delack_timer+0x110/0x130
1912 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_keepalive_timer+0x136/0x2c0
882 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0
804 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_tasklet_func+0x192/0x1e0
28 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_child_process+0x17a/0x350
2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660
2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660
Seems to be from this commit, which is apparently over-stringent or
isn't handling some case correctly:
commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3
Author: Hannes Frederic Sowa <[email protected]>
Date: Fri Apr 8 15:11:27 2016 +0200
sock: tigthen lockdep checks for sock_owned_by_user
sock_owned_by_user should not be used without socket lock held. It seems
to be a common practice to check .owned before lock reclassification, so
provide a little help to abstract this check away.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
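For context, this is the shape of the check that commit introduced,
reconstructed from the diff hunks quoted later in this thread (a sketch,
not a verbatim copy of the commit):

#ifdef CONFIG_LOCKDEP
static inline bool lockdep_sock_is_held(const struct sock *csk)
{
        struct sock *sk = (struct sock *)csk;

        /* Held either as owner (sk_lock) or via the underlying spinlock. */
        return lockdep_is_held(&sk->sk_lock) ||
               lockdep_is_held(&sk->sk_lock.slock);
}
#endif

static inline bool sock_owned_by_user(const struct sock *sk)
{
#ifdef CONFIG_LOCKDEP
        WARN_ON(!lockdep_sock_is_held(sk));     /* the new, stricter check */
#endif
        return sk->sk_lock.owned;
}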
Hi,
On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> to skip the CPU and PID so the list isn't quite so long):
Thanks for the report. Can you give me some more details:
Is this an NFS socket? Do you happen to know whether this socket went
through xs_reclassify_socket at any point? We do hold the appropriate
locks at that point, but I fear that the lockdep reinitialization
confused lockdep.
Thanks,
Hannes
On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> Hi,
>
> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > to skip the CPU and PID so the list isn't quite so long):
>
> Thanks for the report. Can you give me some more details:
>
> Is this an NFS socket? Do you happen to know whether this socket went
> through xs_reclassify_socket at any point? We do hold the appropriate
> locks at that point, but I fear that the lockdep reinitialization
> confused lockdep.
It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
sure that multiple sockets were in play, given that tcp_v6_rcv and
udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
any IPv6 traffic could do it - the frequency dropped off quite a bit when I
closed firefox, which is usually a heavy network hitter on my laptop.
On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> > Hi,
> >
> > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > > to skip the CPU and PID so the list isn't quite so long):
> >
> > Thanks for the report. Can you give me some more details:
> >
> > Is this an NFS socket? Do you happen to know whether this socket went
> > through xs_reclassify_socket at any point? We do hold the appropriate
> > locks at that point, but I fear that the lockdep reinitialization
> > confused lockdep.
>
> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
> sure that multiple sockets were in play, given that tcp_v6_rcv and
> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
> closed firefox, which is usually a heavy network hitter on my laptop.
Looks like the following patch is needed, can you try it please ?
Thanks !
diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
{
struct sock *sk = (struct sock *)csk;
- return lockdep_is_held(&sk->sk_lock) ||
+ return !debug_locks ||
+ lockdep_is_held(&sk->sk_lock) ||
lockdep_is_held(&sk->sk_lock.slock);
}
#endif
On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an NFS socket? Do you happen to know whether this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point, but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
>
>
> Looks like the following patch is needed, can you try it please ?
>
> Thanks !
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
> {
> struct sock *sk = (struct sock *)csk;
>
> - return lockdep_is_held(&sk->sk_lock) ||
> + return !debug_locks ||
> + lockdep_is_held(&sk->sk_lock) ||
> lockdep_is_held(&sk->sk_lock.slock);
> }
> #endif
I would prefer to add debug_locks at the WARN_ON level, like
WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
this fixes the initial splat.
Thanks Eric!
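Side by side, the two proposals look like this (a sketch only; the
_a/_b suffixes are illustrative, not real kernel names):

/* (a) Eric: guard inside the helper, so every caller (including the
 *     rcu_dereference_protected() users) is silenced once lockdep has
 *     turned itself off. */
static inline bool lockdep_sock_is_held_a(const struct sock *csk)
{
        struct sock *sk = (struct sock *)csk;

        return !debug_locks ||
               lockdep_is_held(&sk->sk_lock) ||
               lockdep_is_held(&sk->sk_lock.slock);
}

/* (b) Hannes: leave the helper alone and guard only the warning site. */
static inline bool sock_owned_by_user_b(const struct sock *sk)
{
#ifdef CONFIG_LOCKDEP
        WARN_ON(debug_locks && !lockdep_sock_is_held(sk));
#endif
        return sk->sk_lock.owned;
}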
On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an NFS socket? Do you happen to know whether this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point, but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
>
>
> Looks like the following patch is needed, can you try it please ?
>
> Thanks !
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
> {
> struct sock *sk = (struct sock *)csk;
>
> - return lockdep_is_held(&sk->sk_lock) ||
> + return !debug_locks ||
> + lockdep_is_held(&sk->sk_lock) ||
> lockdep_is_held(&sk->sk_lock.slock);
> }
> #endif
I am a little bit lost because I cannot reproduce this bug. I thought
maybe it had something to do with single-CPU spin_locks which don't
update the lockdep_maps, but I couldn't reproduce it. If debug_locks gets
flipped you should see something in dmesg, too. Maybe you have that
handy? Was there another lockdep splat before the networking ones? Also,
the config would be helpful.
Thanks,
Hannes
From: Hannes Frederic Sowa <[email protected]>
Date: Thu, 21 Apr 2016 15:49:37 +0200
> On 21.04.2016 15:31, Eric Dumazet wrote:
>> On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>>> Hi,
>>>>
>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>>> to skip the CPU and PID so the list isn't quite so long):
>>>>
>>>> Thanks for the report. Can you give me some more details:
>>>>
>>>> Is this an NFS socket? Do you happen to know whether this socket went
>>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>>> locks at that point, but I fear that the lockdep reinitialization
>>>> confused lockdep.
>>>
>>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>> closed firefox, which is usually a heavy network hitter on my laptop.
>>
>>
>> Looks like the following patch is needed, can you try it please ?
>>
>> Thanks !
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index d997ec13a643..db8301c76d50 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>> {
>> struct sock *sk = (struct sock *)csk;
>>
>> - return lockdep_is_held(&sk->sk_lock) ||
>> + return !debug_locks ||
>> + lockdep_is_held(&sk->sk_lock) ||
>> lockdep_is_held(&sk->sk_lock.slock);
>> }
>> #endif
>
> I would prefer to add debug_locks at the WARN_ON level, like
> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> this fixes the initial splat.
Can we finish this conversation out and come up with a final patch
for this soon?
Thanks.
On 24.04.2016 20:38, David Miller wrote:
> From: Hannes Frederic Sowa <[email protected]>
> Date: Thu, 21 Apr 2016 15:49:37 +0200
>
>> On 21.04.2016 15:31, Eric Dumazet wrote:
>>> On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
>>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>>>> Hi,
>>>>>
>>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>>>> to skip the CPU and PID so the list isn't quite so long):
>>>>>
>>>>> Thanks for the report. Can you give me some more details:
>>>>>
>>>>> Is this an NFS socket? Do you happen to know whether this socket went
>>>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>>>> locks at that point, but I fear that the lockdep reinitialization
>>>>> confused lockdep.
>>>>
>>>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>>> closed firefox, which is usually a heavy network hitter on my laptop.
>>>
>>>
>>> Looks like the following patch is needed, can you try it please ?
>>>
>>> Thanks !
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index d997ec13a643..db8301c76d50 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>>> {
>>> struct sock *sk = (struct sock *)csk;
>>>
>>> - return lockdep_is_held(&sk->sk_lock) ||
>>> + return !debug_locks ||
>>> + lockdep_is_held(&sk->sk_lock) ||
>>> lockdep_is_held(&sk->sk_lock.slock);
>>> }
>>> #endif
>>
>> I would prefer to add debug_locks at the WARN_ON level, like
>> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
>> this fixes the initial splat.
>
> Can we finish this conversation out and come up with a final patch
> for this soon?
Eric's patch is worth applying anyway, but I am not sure whether it
solves the (fundamental) problem. I couldn't reproduce it with the exact
next-tag provided in the initial mail. All other reports also happened
only with linux-next, not net-next.
I hope Valdis provides his config soon, and I will then continue my
analysis.
Thanks,
Hannes
From: Hannes Frederic Sowa <[email protected]>
Date: Sun, 24 Apr 2016 20:48:24 +0200
> Eric's patch is worth applying anyway, but I am not sure whether it
> solves the (fundamental) problem. I couldn't reproduce it with the exact
> next-tag provided in the initial mail. All other reports also happened
> only with linux-next, not net-next.
Ok, Eric, please submit it formally.
Thanks!
On Sun, 2016-04-24 at 20:48 +0200, Hannes Frederic Sowa wrote:
> On 24.04.2016 20:38, David Miller wrote:
> > From: Hannes Frederic Sowa <[email protected]>
> > Date: Thu, 21 Apr 2016 15:49:37 +0200
> >
> >> On 21.04.2016 15:31, Eric Dumazet wrote:
> >>> On Thu, 2016-04-21 at 05:05 -0400, [email protected] wrote:
> >>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> >>>>> Hi,
> >>>>>
> >>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> >>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> >>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> >>>>>> to skip the CPU and PID so the list isn't quite so long):
> >>>>>
> >>>>> Thanks for the report. Can you give me some more details:
> >>>>>
> >>>>> Is this an NFS socket? Do you happen to know whether this socket went
> >>>>> through xs_reclassify_socket at any point? We do hold the appropriate
> >>>>> locks at that point, but I fear that the lockdep reinitialization
> >>>>> confused lockdep.
> >>>>
> >>>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
> >>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
> >>>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
> >>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
> >>>> closed firefox, which is usually a heavy network hitter on my laptop.
> >>>
> >>>
> >>> Looks like the following patch is needed, can you try it please ?
> >>>
> >>> Thanks !
> >>>
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index d997ec13a643..db8301c76d50 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
> >>> {
> >>> struct sock *sk = (struct sock *)csk;
> >>>
> >>> - return lockdep_is_held(&sk->sk_lock) ||
> >>> + return !debug_locks ||
> >>> + lockdep_is_held(&sk->sk_lock) ||
> >>> lockdep_is_held(&sk->sk_lock.slock);
> >>> }
> >>> #endif
> >>
> >> I would prefer to add debug_locks at the WARN_ON level, like
> >> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> >> this fixes the initial splat.
> >
> > Can we finish this conversation out and come up with a final patch
> > for this soon?
>
> Eric's patch is worth applying anyway, but I am not sure whether it
> solves the (fundamental) problem. I couldn't reproduce it with the exact
> next-tag provided in the initial mail. All other reports also happened
> only with linux-next, not net-next.
>
> I hope Valdis provides his config soon, and I will then continue my
> analysis.
It should be easy to force a lockdep splat and check whether the patch
solves the issue.
The issue here is that once lockdep has detected a problem (not
necessarily in the net/ tree, btw), your helper always 'detects' a
problem, since lockdep automatically disables itself.
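Concretely (a minimal illustration; sock_lock_violation() is a
hypothetical name, not kernel code): lockdep zeroes the global
debug_locks flag when it disables itself, new lock acquisitions stop
being recorded, and lockdep_is_held() then reports false even for locks
that really are held. Any meaningful complaint therefore has to be
gated on debug_locks:

#include <linux/debug_locks.h>  /* int debug_locks: 1 while lockdep is live */
#include <net/sock.h>

static bool sock_lock_violation(const struct sock *sk)
{
        /* A "not held" answer is only trustworthy while lockdep is on. */
        return debug_locks && !lockdep_sock_is_held(sk);
}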
On Sun, 2016-04-24 at 14:54 -0400, David Miller wrote:
> From: Hannes Frederic Sowa <[email protected]>
> Date: Sun, 24 Apr 2016 20:48:24 +0200
>
> > Eric's patch is worth applying anyway, but I am not sure whether it
> > solves the (fundamental) problem. I couldn't reproduce it with the exact
> > next-tag provided in the initial mail. All other reports also happened
> > only with linux-next, not net-next.
>
> Ok, Eric, please submit it formally.
>
> Thanks!
Yes, I will test it first, and will take Hannes's suggestion as well.
On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> >>> + return !debug_locks ||
> >>> + lockdep_is_held(&sk->sk_lock) ||
> The issue here is that once lockdep has detected a problem (not
> necessarily in the net/ tree, btw), your helper always 'detects' a
> problem, since lockdep automatically disables itself.
"D'Oh!" -- H. Simpson
I thought this patch looked suspect, but couldn't put my finger on it. The
reason I got some 41,000 of them is that I built a kernel that has
lockdep enabled, but I have an out-of-tree module that doesn't init something,
so I get this:
[ 48.898156] INFO: trying to register non-static key.
[ 48.898157] the code is fine but needs lockdep annotation.
[ 48.898157] turning off the locking correctness validator.
After which point, even with this patch, every time through it's still going to
explode.
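A hypothetical reduction of that kind of module bug (the actual
out-of-tree module is not shown in this thread): a lock in kmalloc()ed
memory is taken without spin_lock_init(), so lockdep has no key
registered for it, prints the message above, and turns the validator
off (debug_locks = 0) for the rest of the uptime.

#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {                    /* hypothetical module-private structure */
        spinlock_t lock;
};

static void buggy_init(void)
{
        struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

        if (!f)
                return;
        /* BUG: spin_lock_init(&f->lock) is missing, so the first
         * spin_lock() below makes lockdep try to register a non-static
         * key and shut itself down. */
        spin_lock(&f->lock);
        spin_unlock(&f->lock);
        kfree(f);
}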
On Sun, 2016-04-24 at 15:56 -0400, [email protected] wrote:
> On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
>
> > >>> + return !debug_locks ||
> > >>> + lockdep_is_held(&sk->sk_lock) ||
>
> > The issue here is that once lockdep has detected a problem (not
> > necessarily in the net/ tree, btw), your helper always 'detects' a
> > problem, since lockdep automatically disables itself.
>
> "D'Oh!" -- H. Simpson
>
> I thought this patch looked suspect, but couldn't put my finger on it. The
> reason I got some 41,000 of them is that I built a kernel that has
> lockdep enabled, but I have an out-of-tree module that doesn't init something,
> so I get this:
>
> [ 48.898156] INFO: trying to register non-static key.
> [ 48.898157] the code is fine but needs lockdep annotation.
> [ 48.898157] turning off the locking correctness validator.
>
> After which point, even with this patch, every time through it's still going to
> explode.
Which patch are you talking about ?
On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> On Sun, 2016-04-24 at 15:56 -0400, [email protected] wrote:
> > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> >
> > > >>> + return !debug_locks ||
> > > >>> + lockdep_is_held(&sk->sk_lock) ||
> >
> > > The issue here is that once lockdep has detected a problem (not
> > > necessarily in the net/ tree, btw), your helper always 'detects' a
> > > problem, since lockdep automatically disables itself.
> >
> > "D'Oh!" -- H. Simpson
> >
> > I thought this patch looked suspect, but couldn't put my finger on it. The
> > reason I got some 41,000 of them is that I built a kernel that has
> > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > so I get this:
> >
> > [ 48.898156] INFO: trying to register non-static key.
> > [ 48.898157] the code is fine but needs lockdep annotation.
> > [ 48.898157] turning off the locking correctness validator.
> >
> > After which point, even with this patch, every time through it's still going to
> > explode.
>
> Which patch are you talking about ?
The one that adds the !debug_locks check - once my out-of-kernel module
hits something that turns off lockdep, it's *still* going to complain on
pretty much all the same packets it complained about earlier. I thought
it looked suspicious, but you clarified why...
On Sun, 2016-04-24 at 17:13 -0400, [email protected] wrote:
> On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > On Sun, 2016-04-24 at 15:56 -0400, [email protected] wrote:
> > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > >
> > > > >>> + return !debug_locks ||
> > > > >>> + lockdep_is_held(&sk->sk_lock) ||
> > >
> > > > The issue here is that once lockdep has detected a problem (not
> > > > necessarily in the net/ tree, btw), your helper always 'detects' a
> > > > problem, since lockdep automatically disables itself.
> > >
> > > "D'Oh!" -- H. Simpson
> > >
> > > I thought this patch looked suspect, but couldn't put my finger on it. The
> > > reason I got some 41,000 of them is that I built a kernel that has
> > > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > > so I get this:
> > >
> > > [ 48.898156] INFO: trying to register non-static key.
> > > [ 48.898157] the code is fine but needs lockdep annotation.
> > > [ 48.898157] turning off the locking correctness validator.
> > >
> > > After which point, even with this patch, every time through it's still going to
> > > explode.
> >
> > Which patch are you talking about ?
>
> The one that adds the !debug_locks check - once my out-of-kernel module
> hits something that turns off lockdep, it's *still* going to complain on
> pretty much all the same packets it complained about earlier. I thought
> it looked suspicious, but you clarified why...
It does not make sense to me. If lockdep is disabled, then debug_locks
is 0.
So no complaint should come from networking.
I was about to send the following patch; please check whether it solves
the issue.
(It certainly did for me, once I forced a lockdep splat loading a buggy
module)
Thanks
From: Eric Dumazet <[email protected]>
Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user()
This test needs to be relaxed if/when lockdep disables itself.
Note that other lockdep_sock_is_held() callers are all from
rcu_dereference_protected() sections which already are disabled
if/when lockdep has been disabled.
Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
Reported-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
include/net/sock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 52448baf19d7..f492d01512ed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
static inline bool sock_owned_by_user(const struct sock *sk)
{
#ifdef CONFIG_LOCKDEP
- WARN_ON(!lockdep_sock_is_held(sk));
+ WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
#endif
return sk->sk_lock.owned;
}
On Sun, 2016-04-24 at 14:25 -0700, Eric Dumazet wrote:
> On Sun, 2016-04-24 at 17:13 -0400, [email protected] wrote:
> > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > > On Sun, 2016-04-24 at 15:56 -0400, [email protected] wrote:
> > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > > >
> > > > > >>> + return !debug_locks ||
> > > > > >>> + lockdep_is_held(&sk->sk_lock) ||
> > > >
> > > > > The issue here is that once lockdep has detected a problem (not
> > > > > necessarily in the net/ tree, btw), your helper always 'detects' a
> > > > > problem, since lockdep automatically disables itself.
> > > >
> > > > "D'Oh!" -- H. Simpson
> > > >
> > > > I thought this patch looked suspect, but couldn't put my finger on it. The
> > > > reason I got some 41,000 of them is that I built a kernel that has
> > > > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > > > so I get this:
> > > >
> > > > [ 48.898156] INFO: trying to register non-static key.
> > > > [ 48.898157] the code is fine but needs lockdep annotation.
> > > > [ 48.898157] turning off the locking correctness validator.
> > > >
> > > > After which point, even with this patch, every time through it's still going to
> > > > explode.
> > >
> > > Which patch are you talking about ?
> >
> > The one that adds the !debug_locks check - once my out-of-kernel module
> > hits something that turns off lockdep, it's *still* going to complain on
> > pretty much all the same packets it complained about earlier. I thought
> > it looked suspicious, but you clarified why...
>
> It does not make sense to me. If lockdep is disabled, then debug_locks
> is 0.
>
> So no complaint should come from networking.
>
> I was about to send the following patch; please check whether it solves
> the issue.
>
> (It certainly did for me, once I forced a lockdep splat loading a buggy
> module)
>
> Thanks
>
> From: Eric Dumazet <[email protected]>
>
> Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user()
>
> This test needs to be relaxed if/when lockdep disables itself.
>
> Note that other lockdep_sock_is_held() callers are all from
> rcu_dereference_protected() sections which already are disabled
> if/when lockdep has been disabled.
>
> Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
> Reported-by: Valdis Kletnieks <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> include/net/sock.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 52448baf19d7..f492d01512ed 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
> static inline bool sock_owned_by_user(const struct sock *sk)
> {
> #ifdef CONFIG_LOCKDEP
> - WARN_ON(!lockdep_sock_is_held(sk));
> + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
Silly me, I tested the opposite test, of course:
WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks);
> #endif
> return sk->sk_lock.owned;
> }
>
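Assembled from the hunks and the correction above, the check as finally
intended (a sketch of the end state, not a quoted commit):

static inline bool sock_owned_by_user(const struct sock *sk)
{
#ifdef CONFIG_LOCKDEP
        /* Warn only while lockdep is still live; once it has disabled
         * itself, lockdep_sock_is_held() can no longer be trusted. */
        WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks);
#endif
        return sk->sk_lock.owned;
}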
On Sun, Apr 24, 2016, at 23:25, Eric Dumazet wrote:
> #ifdef CONFIG_LOCKDEP
> - WARN_ON(!lockdep_sock_is_held(sk));
> + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
> #endif
Eric, could you resend this patch without the negation and also add my
Acked-by? (I finally got around to testing it.)
Thanks,
Hannes