bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable
if the RCU read lock is held, and neither that function nor any of its
callers hold it.
Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
that checks to see if sk->sk_callback_lock() is held and use that here
instead.
Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_checked() might suffice.
Without this, the following warning can be occasionally observed:
=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
#0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
#1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
#2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
#3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
#4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4
stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4c/0x5f
bpf_sk_reuseport_detach+0x6d/0xa4
reuseport_detach_sock+0x75/0xdd
inet_unhash+0xa5/0x1c0
tcp_set_state+0x169/0x20f
? lockdep_sock_is_held+0x3a/0x3a
? __lock_release.isra.0+0x13e/0x220
? reacquire_held_locks+0x1bb/0x1bb
? hlock_class+0x31/0x96
? mark_lock+0x9e/0x1af
__tcp_close+0x50/0x4b6
tcp_close+0x28/0x70
inet_release+0x8e/0xa7
__sock_release+0x95/0x121
sock_close+0x14/0x17
__fput+0x20f/0x36a
task_work_run+0xa3/0xcc
exit_to_user_mode_prepare+0x9c/0x14d
syscall_exit_to_user_mode+0x18/0x44
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <[email protected]>
cc: Hawkins Jiawei <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: [email protected]
---
include/net/sock.h | 25 +++++++++++++++++++++++++
kernel/bpf/reuseport_array.c | 2 +-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..d08cfe190a78 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -577,6 +577,31 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
#define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
+/**
+ * __locked_read_sk_user_data_with_flags - return the pointer
+ * only if argument flags all has been set in sk_user_data. Otherwise
+ * return NULL
+ *
+ * @sk: socket
+ * @flags: flag bits
+ *
+ * The caller must be holding sk->sk_callback_lock.
+ */
+static inline void *
+__locked_read_sk_user_data_with_flags(const struct sock *sk,
+ uintptr_t flags)
+{
+ uintptr_t sk_user_data =
+ (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+ lockdep_is_held(&sk->sk_callback_lock));
+
+ WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
+
+ if ((sk_user_data & flags) == flags)
+ return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
+ return NULL;
+}
+
/**
* __rcu_dereference_sk_user_data_with_flags - return the pointer
* only if argument flags all has been set in sk_user_data. Otherwise
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..82c61612f382 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
struct sock __rcu **socks;
write_lock_bh(&sk->sk_callback_lock);
- socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+ socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
if (socks) {
WRITE_ONCE(sk->sk_user_data, NULL);
/*
On Tue, 16 Aug 2022 at 17:34, David Howells <[email protected]> wrote:
>
> Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
> that checks to see if sk->sk_callback_lock() is held and use that here
> instead.
Hi, I wonder if we make this more geniric, for I think maybe the future
code who use __rcu_dereference_sk_user_data_with_flags() may
also meet this bug.
To be more specific, maybe we can refactor
__rcu_dereference_sk_user_data_with_flags() to
__rcu_dereference_sk_user_data_with_flags_check(), like
rcu_dereference() and rcu_dereference_check(). Maybe:
diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..cf123954eab9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -578,18 +578,27 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
#define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
/**
- * __rcu_dereference_sk_user_data_with_flags - return the pointer
- * only if argument flags all has been set in sk_user_data. Otherwise
- * return NULL
+ * __rcu_dereference_sk_user_data_with_flags_check - return the pointer
+ * only if argument flags all has been set in sk_user_data, with debug
+ * checking. Otherwise return NULL
*
- * @sk: socket
- * @flags: flag bits
+ * Do __rcu_dereference_sk_user_data_with_flags(), but check that the
+ * conditions under which the rcu dereference will take place are correct,
+ * which is a bit like rcu_dereference_check() and rcu_derefence().
+ *
+ * @sk : socket
+ * @flags : flag bits
+ * @condition : the conditions under which the rcu dereference will
+ * take place
*/
static inline void *
-__rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
- uintptr_t flags)
+__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
+ uintptr_t flags, bool condition)
{
- uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk));
+ uintptr_t sk_user_data;
+
+ sk_user_data = (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+ condition);
WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
@@ -598,6 +607,8 @@ __rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
return NULL;
}
+#define __rcu_dereference_sk_user_data_with_flags(sk, flags) \
+ __rcu_dereference_sk_user_data_with_flags_check(sk, flags, 0)
#define rcu_dereference_sk_user_data(sk) \
__rcu_dereference_sk_user_data_with_flags(sk, 0)
#define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags) \
> +/**
> + * __locked_read_sk_user_data_with_flags - return the pointer
> + * only if argument flags all has been set in sk_user_data. Otherwise
> + * return NULL
> + *
> + (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
> + lockdep_is_held(&sk->sk_callback_lock));
> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 85fa9dbfa8bf..82c61612f382 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
> struct sock __rcu **socks;
>
> write_lock_bh(&sk->sk_callback_lock);
> - socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
> + socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
> if (socks) {
> WRITE_ONCE(sk->sk_user_data, NULL);
> /*
Then, as you point out, we can pass
condition(lockdep_is_held(&sk->sk_callback_lock)) to
__rcu_dereference_sk_user_data_with_flags_check() in order to
make compiler happy as below:
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..a772610987c5 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,10 @@ void bpf_sk_reuseport_detach(struct sock *sk)
struct sock __rcu **socks;
write_lock_bh(&sk->sk_callback_lock);
- socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+ socks = __rcu_dereference_sk_user_data_with_flags_check(
+ sk, SK_USER_DATA_BPF,
+ lockdep_is_held(&sk->sk_callback_lock));
+
if (socks) {
WRITE_ONCE(sk->sk_user_data, NULL);
/*
Hawkins Jiawei <[email protected]> wrote:
> if (socks) {
> WRITE_ONCE(sk->sk_user_data, NULL);
Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
WRITE_ONCE()?
David
On Tue, 16 Aug 2022 18:34:52 +0800 Hawkins Jiawei wrote:
> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
This name is insanely long now.
Jakub Kicinski <[email protected]> wrote:
> > +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
>
> This name is insanely long now.
I know. 47 chars. Do you have something you'd prefer? Maybe
get_sk_user_data_checked()?
It's a shame C doesn't allow default arguments.
David
Hawkins Jiawei <[email protected]> wrote:
> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
> + uintptr_t flags, bool condition)
That doesn't work. RCU_LOCKDEP_WARN() relies on anything passing on a
condition down to it to be a macro so that it can vanish the 'condition'
argument without causing an undefined symbol for 'lockdep_is_held' if lockdep
is disabled:
x86_64-linux-gnu-ld: kernel/bpf/reuseport_array.o: in function `bpf_sk_reuseport_detach':
/data/fs/linux-fs/build3/../kernel/bpf/reuseport_array.c:28: undefined reference to `lockdep_is_held'
So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
or we need to go with something like the first version of my patch where I
don't pass the condition through. Do you have a preference?
David
On Tue, Aug 16, 2022 at 02:09:46PM +0100, David Howells wrote:
> Hawkins Jiawei <[email protected]> wrote:
>
> > if (socks) {
> > WRITE_ONCE(sk->sk_user_data, NULL);
>
> Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
> WRITE_ONCE()?
It is not necessary. The sk_user_data usage in reuseport_array
is protected by the sk_callback_lock alone. The code
before the commit cf8c1e967224 is fine. If the
__rcu_dereference_sk_user_data_with_flags() could be reused here as is,
an extra rcu_dereference is fine, so I did not mention it.
It seems it is not the case and new function naming is getting long,
so how about reverting the commit cf8c1e967224 and keep it as it was.
On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> or we need to go with something like the first version of my patch where I
> don't pass the condition through. Do you have a preference?
I like your version because it documents what the lock protecting this
field is.
In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
while setting this field, yet most places take the callback lock...
One the naming - maybe just drop the _with_flags() ? There's no version
of locked helper which does not take the flags. And not underscores?
On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through. Do you have a preference?
>
> I like your version because it documents what the lock protecting this
> field is.
>
> In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...
It needs to take a closer look at where the lock_sock() has already
been acquired and also need to consider the lock ordering with reuseport_lock.
It probably should work but may need a separate patch to discuss those
considerations ?
>
> One the naming - maybe just drop the _with_flags() ? There's no version
> of locked helper which does not take the flags. And not underscores?
I am also good with a shorter name.
Could a comment be added to bpf_sk_reuseport_detach() mentioning
sk_user_data access is protected by the sk_callback_lock alone (or the lock
sock in the future) while reusing __locked_read_sk_user_data() with
a rcu_dereference(). It will be easier to understand if there is
actually any rcu reader in the future code reading.
On Tue, 16 Aug 2022 17:43:19 -0700 Martin KaFai Lau wrote:
> > I like your version because it documents what the lock protecting this
> > field is.
> >
> > In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...
>
> It needs to take a closer look at where the lock_sock() has already
> been acquired and also need to consider the lock ordering with reuseport_lock.
> It probably should work but may need a separate patch to discuss those
> considerations ?
Right, the users of the field with a bit allocated protect the writes
with the callback lock, so we can hard code the check against the
callback lock for now and revisit later if needed.
On Wed, 17 Aug 2022 at 07:44, Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through. Do you have a preference?
>
> I like your version because it documents what the lock protecting this
> field is.
In my opinion, I still think we'd better refactor it to a more
geniric function, to avoid adding new functions when meeting
the same problem. However, if this is just a standalone problem,
maybe David's version shoule be better.
So maybe apply the David's version, and refactor it later when
meeting the same problem next time if needed.
On Wed, 17 Aug 2022 at 08:43, Martin KaFai Lau <[email protected]> wrote:
>
> On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> >
> > One the naming - maybe just drop the _with_flags() ? There's no version
> > of locked helper which does not take the flags. And not underscores?
> I am also good with a shorter name.
I also agree, the name is really long.
Jakub Kicinski <[email protected]> wrote:
> I like your version because it documents what the lock protecting this
> field is.
>
> In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...
So how do you want to proceed? My first version of the patch with
sock_owned_by_user()?
David
On Wed, 17 Aug 2022 21:55:49 +0100 David Howells wrote:
> Jakub Kicinski <[email protected]> wrote:
>
> > I like your version because it documents what the lock protecting this
> > field is.
> >
> > In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...
>
> So how do you want to proceed? My first version of the patch with
> sock_owned_by_user()?
Sorry about the lack of clarity. I was sort of expecting the name
to still be shortened, but what you have is probably good enough.
Applying v1, then, thanks!