2022-06-04 16:34:04

by syzbot

[permalink] [raw]
Subject: [syzbot] WARNING: refcount bug in sk_psock_get (2)

Hello,

syzbot found the following issue on:

HEAD commit: 7e062cda7d90 Merge tag 'net-next-5.19' of git://git.kernel..
git tree: net-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=13e2c083f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e2c9c27babb4d679
dashboard link: https://syzkaller.appspot.com/bug?extid=5f26f85569bd179c18ce
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16f2520bf00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13c31fcbf00000

The issue was bisected to:

commit 341adeec9adad0874f29a0a1af35638207352a39
Author: Wen Gu <[email protected]>
Date: Wed Jan 26 15:33:04 2022 +0000

net/smc: Forward wakeup to smc socket waitqueue after fallback

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=12eb9635f00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=11eb9635f00000
console output: https://syzkaller.appspot.com/x/log.txt?x=16eb9635f00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")

netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Code: 1d 58 b1 ac 09 31 ff 89 de e8 38 3d 80 fd 84 db 75 ab e8 4f 39 80 fd 48 c7 c7 e0 a3 27 8a c6 05 38 b1 ac 09 01 e8 62 c6 34 05 <0f> 0b eb 8f e8 33 39 80 fd 0f b6 1d 22 b1 ac 09 31 ff 89 de e8 03
RSP: 0018:ffffc90002fcf9d0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888022543b00 RSI: ffffffff81606db8 RDI: fffff520005f9f2c
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff816000f9 R11: 0000000000000000 R12: 1ffff920005f9f3d
R13: 0000000090965601 R14: ffff88807e9a0000 R15: ffffc90002fcfa08
FS: 00005555567fa300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd252aa4300 CR3: 000000001994e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fbd68c61969
Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdfd5f2358 EFLAGS: 00000246 ORIG_RAX: 0000000000000030
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbd68c61969
RDX: 00007fbd68c61969 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 0000000000000003 R08: bb1414ac00000000 R09: bb1414ac00000000
R10: 0000000000000028 R11: 0000000000000246 R12: 00007ffdfd5f2370
R13: 00007ffdfd5f2364 R14: 0000000000000003 R15: 0000000000000000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2022-07-09 02:57:42

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH] smc: fix refcount bug in sk_psock_get (2)

From: hawk <[email protected]>

Syzkaller reportes refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
...
Call Trace:
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>

syzbot is try to setup TLS on a SMC socket.

During SMC fallback process in connect syscall, kernel will sets the
smc->sk.sk_socket->file->private_data to smc->clcsock
in smc_switch_to_fallback(), and set smc->clcsock->sk_user_data
to origin smc in smc_fback_replace_callbacks().

When syzbot makes a setsockopt syscall, its argument sockfd
actually points to smc->clcsock, which is not a smc_sock type,
So it won't call smc_setsockopt() in setsockopt syscall,
instead it will call do_tcp_setsockopt() to setup TLS, which
bypasses the fixes 734942cc4ea6, its content is shown as below
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index be3e80b3e27f1..5eff7cccceffc 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2161,6 +2161,9 @@ static int smc_setsockopt(struct socket *sock,
> int level, int optname,
> struct smc_sock *smc;
> int val, rc;
>
> + if (level == SOL_TCP && optname == TCP_ULP)
> + return -EOPNOTSUPP;
> +
> smc = smc_sk(sk);
>
> /* generic setsockopts reaching us here always apply to the
> @@ -2185,7 +2188,6 @@ static int smc_setsockopt(struct socket *sock,
> int level, int optname,
> if (rc || smc->use_fallback)
> goto out;
> switch (optname) {
> - case TCP_ULP:
> case TCP_FASTOPEN:
> case TCP_FASTOPEN_CONNECT:
> case TCP_FASTOPEN_KEY:
> --

Later, sk_psock_get() will treat the smc->clcsock->sk_user_data
as sk_psock type, which triggers the refcnt warning.

So Just disallow this setup in do_tcp_setsockopt() is OK,
by checking whether sk_user_data points to a SMC socket.

Reported-and-tested-by: [email protected]
Signed-off-by: hawk <[email protected]>
---
net/ipv4/tcp.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9984d23a7f3e..a1e6cab2c748 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3395,10 +3395,23 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
}
case TCP_ULP: {
char name[TCP_ULP_NAME_MAX];
+ struct sock *smc_sock;

if (optlen < 1)
return -EINVAL;

+ /* SMC sk_user_data may be treated as psock,
+ * which triggers a refcnt warning.
+ */
+ rcu_read_lock();
+ smc_sock = rcu_dereference_sk_user_data(sk);
+ if (level == SOL_TCP && smc_sock &&
+ smc_sock->__sk_common.skc_family == AF_SMC) {
+ rcu_read_unlock();
+ return -EOPNOTSUPP;
+ }
+ rcu_read_unlock();
+
val = strncpy_from_sockptr(name, optval,
min_t(long, TCP_ULP_NAME_MAX - 1,
optlen));
--
2.25.1

2022-07-09 03:40:51

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Sat, 9 Jul 2022 10:46:59 +0800 Hawkins Jiawei wrote:
> Reported-and-tested-by: [email protected]
> Signed-off-by: hawk <[email protected]>
> ---
> net/ipv4/tcp.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9984d23a7f3e..a1e6cab2c748 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3395,10 +3395,23 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
> }
> case TCP_ULP: {
> char name[TCP_ULP_NAME_MAX];
> + struct sock *smc_sock;
>
> if (optlen < 1)
> return -EINVAL;
>
> + /* SMC sk_user_data may be treated as psock,
> + * which triggers a refcnt warning.
> + */
> + rcu_read_lock();
> + smc_sock = rcu_dereference_sk_user_data(sk);
> + if (level == SOL_TCP && smc_sock &&
> + smc_sock->__sk_common.skc_family == AF_SMC) {

This should prolly be under the socket lock?

Can we add a bit to SK_USER_DATA_PTRMASK and have ULP-compatible
users (sockmap) opt into ULP cooperation? Modifying TCP is backwards,
layer-wise.

> + rcu_read_unlock();
> + return -EOPNOTSUPP;
> + }
> + rcu_read_unlock();
> +

2022-07-09 08:52:58

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Sat, 9 Jul 2022 at 11:06, Jakub Kicinski <[email protected]> wrote:
> On Sat, 9 Jul 2022 10:46:59 +0800 Hawkins Jiawei wrote:
> > Reported-and-tested-by: [email protected]
> > Signed-off-by: hawk <[email protected]>
> > ---
> > net/ipv4/tcp.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 9984d23a7f3e..a1e6cab2c748 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3395,10 +3395,23 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
> > }
> > case TCP_ULP: {
> > char name[TCP_ULP_NAME_MAX];
> > + struct sock *smc_sock;
> >
> > if (optlen < 1)
> > return -EINVAL;
> >
> > + /* SMC sk_user_data may be treated as psock,
> > + * which triggers a refcnt warning.
> > + */
> > + rcu_read_lock();
> > + smc_sock = rcu_dereference_sk_user_data(sk);
> > + if (level == SOL_TCP && smc_sock &&
> > + smc_sock->__sk_common.skc_family == AF_SMC) {
>
> This should prolly be under the socket lock?
>
> Can we add a bit to SK_USER_DATA_PTRMASK and have ULP-compatible
> users (sockmap) opt into ULP cooperation? Modifying TCP is backwards,
> layer-wise.

Thanks for your suggestion, I also agree that modifying TCP directly
is not wise.

I am sorry that I can't follow you on haveing ULP-compatible
users (sockmap) opt into ULP cooperation, yet adding a bit to
SK_USER_DATA_PTRMASK seems like a good way.

I plan to add a mask bit, and check it during sk_psock_get(),
in v2 patch

>
> > + rcu_read_unlock();
> > + return -EOPNOTSUPP;
> > + }
> > + rcu_read_unlock();
> > +

2022-07-11 07:50:55

by Wen Gu

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)



On 2022/7/9 10:46 am, Hawkins Jiawei wrote:

>
> syzbot is try to setup TLS on a SMC socket.
>
> During SMC fallback process in connect syscall, kernel will sets the
> smc->sk.sk_socket->file->private_data to smc->clcsock
> in smc_switch_to_fallback(), and set smc->clcsock->sk_user_data
> to origin smc in smc_fback_replace_callbacks().

>
> Later, sk_psock_get() will treat the smc->clcsock->sk_user_data
> as sk_psock type, which triggers the refcnt warning.
>


Thanks for your analysis.

Although syzbot found this issue in SMC, seems that it is a generic
issue about sk_user_data usage? Fixing it from SK_USER_DATA_PTRMASK
as you plan should be a right way.

2022-07-12 10:18:15

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Sat, Jul 09, 2022 at 10:46:59AM +0800, Hawkins Jiawei wrote:
> From: hawk <[email protected]>

Please use your legal name like you would for signing a legal document.

regards,
dan carpenter

2022-07-13 03:18:51

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Mon, 11 Jul 2022 at 15:21, Wen Gu <[email protected]> wrote:
>Although syzbot found this issue in SMC, seems that it is a generic
>issue about sk_user_data usage? Fixing it from SK_USER_DATA_PTRMASK
>as you plan should be a right way.

Thanks for your advice. In fact, I found a more
general patch, but it seems that it has not
been merged until now.

In this bug, the problem is that smc and psock, both use
sk_user_data field to save their private data. So they
will treat field in their own way.

>> in smc_switch_to_fallback(), and set smc->clcsock->sk_user_data
>> to origin smc in smc_fback_replace_callbacks().
>>
>> Later, sk_psock_get() will treat the smc->clcsock->sk_user_data
>> as sk_psock type, which triggers the refcnt warning.

So in the patch [PATCH RFC 1/5] net: Add distinct sk_psock field,
psock private data will be moved to the sk_psock field, shown as
below
> The sk_psock facility populates the sk_user_data field with the
> address of an extra bit of metadata. User space sockets never
> populate the sk_user_data field, so this has worked out fine.
>
> However, kernel consumers such as the RPC client and server do
> populate the sk_user_data field. The sk_psock() function cannot tell
> that the content of sk_user_data does not point to psock metadata,
> so it will happily return a pointer to something else, cast to a
> struct sk_psock.
>
> Thus kernel consumers and psock currently cannot co-exist.
>
> We could educate sk_psock() to return NULL if sk_user_data does
> not point to a struct sk_psock. However, a more general solution
> that enables full co-existence psock and other uses of sk_user_data
> might be more interesting.
>
> Move the struct sk_psock address to its own pointer field so that
> the contents of the sk_user_data field is preserved.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> include/linux/skmsg.h | 2 +-
> include/net/sock.h | 4 +++-
> net/core/skmsg.c | 6 +++---
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index c5a2d6f50f25..5ef3a07c5b6c 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(
> struct sk_msg *msg, u32 start)
>
> static inline struct sk_psock *sk_psock(const struct sock *sk)
> {
> - return rcu_dereference_sk_user_data(sk);
> + return rcu_dereference(sk->sk_psock);
> }
>
> static inline void sk_psock_set_state(struct sk_psock *psock,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index c4b91fc19b9c..d2a513169527 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -327,7 +327,8 @@ struct sk_filter;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: Upper layer private data
> + * @sk_psock: socket policy data (bpf)
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> @@ -519,6 +520,7 @@ struct sock {
>
> struct socket *sk_socket;
> void *sk_user_data;
> + struct sk_psock __rcu *sk_psock;
> #ifdef CONFIG_SECURITY
> void *sk_security;
> #endif
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index cc381165ea08..2b3d01d92790 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -695,7 +695,7 @@ struct sk_psock *sk_psock_init(struct sock *sk,
> int node)
>
> write_lock_bh(&sk->sk_callback_lock);
>
> - if (sk->sk_user_data) {
> + if (sk->sk_psock) {
> psock = ERR_PTR(-EBUSY);
> goto out;
> }
> @@ -726,7 +726,7 @@ struct sk_psock *sk_psock_init(struct sock *sk,
> int node)
> sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
> refcount_set(&psock->refcnt, 1);
>
> - rcu_assign_sk_user_data_nocopy(sk, psock);
> + rcu_assign_pointer(sk->sk_psock, psock);
> sock_hold(sk);
>
> out:
> @@ -825,7 +825,7 @@ void sk_psock_drop(struct sock *sk,
> struct sk_psock *psock)
> {
> write_lock_bh(&sk->sk_callback_lock);
> sk_psock_restore_proto(sk, psock);
> - rcu_assign_sk_user_data(sk, NULL);
> + rcu_assign_pointer(sk->sk_psock, NULL);
> if (psock->progs.stream_parser)
> sk_psock_stop_strp(sk, psock);
> else if (psock->progs.stream_verdict || psock->progs.skb_verdict)

I have tested this patch and the reproducer did not trigger any issue.

In Patchwork website, this patch fails the checks on
netdev/cc_maintainers. If this patch fails for some other reasons,
I will still fix this bug from SK_USER_DATA_PTRMASK,
as a temporary solution.

2022-07-13 04:05:57

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Wed, 13 Jul 2022 11:10:05 +0800 Hawkins Jiawei wrote:
> In Patchwork website, this patch fails the checks on
> netdev/cc_maintainers. If this patch fails for some other reasons,
> I will still fix this bug from SK_USER_DATA_PTRMASK,
> as a temporary solution.

That check just runs scripts/get_maintainer.pl so make sure you CC
folks pointed out by that script and you should be fine.

2022-07-13 04:06:13

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Tue, 12 Jul 2022 at 17:48, Dan Carpenter <[email protected]> wrote:
>
> On Sat, Jul 09, 2022 at 10:46:59AM +0800, Hawkins Jiawei wrote:
> > From: hawk <[email protected]>
>
> Please use your legal name like you would for signing a legal document.

Thanks, I will pay attention to it in the future.

>
> regards,
> dan carpenter
>

2022-07-13 04:21:12

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH] smc: fix refcount bug in sk_psock_get (2)

On Wed, 13 Jul 2022 at 11:33, Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 13 Jul 2022 11:10:05 +0800 Hawkins Jiawei wrote:
> > In Patchwork website, this patch fails the checks on
> > netdev/cc_maintainers. If this patch fails for some other reasons,
> > I will still fix this bug from SK_USER_DATA_PTRMASK,
> > as a temporary solution.
>
> That check just runs scripts/get_maintainer.pl so make sure you CC
> folks pointed out by that script and you should be fine.

Thanks for your reply, yet I am not the patch's author, I
found this patch during my bug analysis.

I will reply the relative email to remind the patch's author.

2022-07-30 09:19:07

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH v2] net/smc: fix refcount bug in sk_psock_get (2)

Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
...
Call Trace:
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>

During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().

Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as sk_psock type, triggering the refcnt warning.

So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.

This patch solves it by using another bit(defined as
SK_USER_DATA_NOTPSOCK) in PTRMASK, to mark whether
sk_user_data points to a sk_psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
Reported-and-tested-by: [email protected]
Suggested-by: Jakub Kicinski <[email protected]>
Acked-by: Wen Gu <[email protected]>
Signed-off-by: Hawkins Jiawei <[email protected]>
---
v1 -> v2:
- add bit in PTRMASK to patch the bug

include/linux/skmsg.h | 2 +-
include/net/sock.h | 27 +++++++++++++++++++++++++--
net/smc/af_smc.c | 6 ++++--
net/smc/smc.h | 2 +-
4 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..81bfa1a33623 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start)

static inline struct sk_psock *sk_psock(const struct sock *sk)
{
- return rcu_dereference_sk_user_data(sk);
+ return rcu_dereference_sk_user_data_psock(sk);
}

static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index 9fa54762e077..316c0313b2bf 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -549,10 +549,17 @@ enum sk_pacing {
* when cloning the socket. For instance, it can point to a reference
* counted object. sk_user_data bottom bit is set if pointer must not
* be copied.
+ *
+ * SK_USER_DATA_NOCOPY - test if pointer must not copied
+ * SK_USER_DATA_BPF - managed by BPF
+ * SK_USER_DATA_NOTPSOCK - test if pointer points to psock
*/
#define SK_USER_DATA_NOCOPY 1UL
-#define SK_USER_DATA_BPF 2UL /* Managed by BPF */
-#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF)
+#define SK_USER_DATA_BPF 2UL
+#define SK_USER_DATA_NOTPSOCK 4UL
+#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\
+ SK_USER_DATA_NOTPSOCK)
+

/**
* sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied
@@ -584,6 +591,22 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
__tmp | SK_USER_DATA_NOCOPY); \
})

+/**
+ * rcu_dereference_sk_user_data_psock - return psock if sk_user_data points
+ * to the psock
+ * @sk: socket
+ */
+static inline
+struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)
+{
+ uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
+
+ if (__tmp & SK_USER_DATA_NOTPSOCK)
+ return NULL;
+ return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
+}
+
+
static inline
struct net *sock_net(const struct sock *sk)
{
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 433bb5a7df31..d0feccf824c8 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -812,7 +812,8 @@ static void smc_fback_replace_callbacks(struct smc_sock *smc)
struct sock *clcsk = smc->clcsock->sk;

write_lock_bh(&clcsk->sk_callback_lock);
- clcsk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+ clcsk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY |
+ SK_USER_DATA_NOTPSOCK);

smc_clcsock_replace_cb(&clcsk->sk_state_change, smc_fback_state_change,
&smc->clcsk_state_change);
@@ -2470,7 +2471,8 @@ static int smc_listen(struct socket *sock, int backlog)
*/
write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
smc->clcsock->sk->sk_user_data =
- (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+ (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY |
+ SK_USER_DATA_NOTPSOCK);
smc_clcsock_replace_cb(&smc->clcsock->sk->sk_data_ready,
smc_clcsock_data_ready, &smc->clcsk_data_ready);
write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 5ed765ea0c73..c24d0469d267 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -299,7 +299,7 @@ static inline void smc_init_saved_callbacks(struct smc_sock *smc)
static inline struct smc_sock *smc_clcsock_user_data(const struct sock *clcsk)
{
return (struct smc_sock *)
- ((uintptr_t)clcsk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+ ((uintptr_t)clcsk->sk_user_data & SK_USER_DATA_PTRMASK);
}

/* save target_cb in saved_cb, and replace target_cb with new_cb */
--
2.25.1


2022-08-01 09:18:44

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH v2] net/smc: fix refcount bug in sk_psock_get (2)

Hi,

On Sat, Jul 30, 2022 at 04:56 PM +08, Hawkins Jiawei wrote:
> Syzkaller reports refcount bug as follows:
> ------------[ cut here ]------------
> refcount_t: saturated; leaking memory.
> WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
> Modules linked in:
> CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
> ...
> Call Trace:
> <TASK>
> __refcount_add_not_zero include/linux/refcount.h:163 [inline]
> __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
> refcount_inc_not_zero include/linux/refcount.h:245 [inline]
> sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
> tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
> tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
> tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
> tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
> tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
> sk_backlog_rcv include/net/sock.h:1061 [inline]
> __release_sock+0x134/0x3b0 net/core/sock.c:2849
> release_sock+0x54/0x1b0 net/core/sock.c:3404
> inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
> __sys_shutdown_sock net/socket.c:2331 [inline]
> __sys_shutdown_sock net/socket.c:2325 [inline]
> __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
> __do_sys_shutdown net/socket.c:2351 [inline]
> __se_sys_shutdown net/socket.c:2349 [inline]
> __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> </TASK>
>
> During SMC fallback process in connect syscall, kernel will
> replaces TCP with SMC. In order to forward wakeup
> smc socket waitqueue after fallback, kernel will sets
> clcsk->sk_user_data to origin smc socket in
> smc_fback_replace_callbacks().
>
> Later, in shutdown syscall, kernel will calls
> sk_psock_get(), which treats the clcsk->sk_user_data
> as sk_psock type, triggering the refcnt warning.
>
> So, the root cause is that smc and psock, both will use
> sk_user_data field. So they will mismatch this field
> easily.
>
> This patch solves it by using another bit(defined as
> SK_USER_DATA_NOTPSOCK) in PTRMASK, to mark whether
> sk_user_data points to a sk_psock object or not.
> This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
> ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
>
> Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
> Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
> Reported-and-tested-by: [email protected]
> Suggested-by: Jakub Kicinski <[email protected]>
> Acked-by: Wen Gu <[email protected]>
> Signed-off-by: Hawkins Jiawei <[email protected]>
> ---

Since using psock is not the common case, I'm wondering if it makes more
sense to have an inverse flag - SK_USER_DATA_PSOCK. Flag would be set by
the psock code on assignment to sk_user_data.

This way we would also avoid some confusion. With the change below, the
SK_USER_DATA_NOTPSOCK is not *always* set when sk_user_data holds a
non-psock pointer. Only when SMC sets it.

If we go with the current approach, the rest of sites, execpt for psock,
that assign to sk_user_data should be updated to set
SK_USER_DATA_NOTPSOCK as well, IMO.

That is why I'd do it the other way.

[...]

2022-08-02 15:11:18

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v2] net/smc: fix refcount bug in sk_psock_get (2)

Thanks for your suggestion!

On Mon, 1 Aug 2022 at 17:16, Jakub Sitnicki <[email protected]> wrote:
> This way we would also avoid some confusion. With the change below, the
> SK_USER_DATA_NOTPSOCK is not *always* set when sk_user_data holds a
> non-psock pointer. Only when SMC sets it.
>
> If we go with the current approach, the rest of sites, execpt for psock,
> that assign to sk_user_data should be updated to set
> SK_USER_DATA_NOTPSOCK as well, IMO.
Yes, as you point out, in this patch, this flag's name should be
*SK_USER_DATA_NEEDCHECK_NOTPSOCK*, which is more clearly.

To be more specific, we don't need to set this flag for
every sk_user_data who holds non-psock pointer. Only set the flag for
the site that has been reported involved with the type-mismatch bug
like this bug.
> > During SMC fallback process in connect syscall, kernel will
> > replaces TCP with SMC. In order to forward wakeup
> > smc socket waitqueue after fallback, kernel will sets
> > clcsk->sk_user_data to origin smc socket in
> > smc_fback_replace_callbacks().
> >
> > Later, in shutdown syscall, kernel will calls
> > sk_psock_get(), which treats the clcsk->sk_user_data
> > as sk_psock type, triggering the refcnt warning.

For other sites, this patch is actually transparent to them, because
the *SK_USER_DATA_NEEDCHECK_NOTPSOCK* flag is always unset. So this
patch won't affect them, which won't introduce any extra
potential bugs.
> +/**
> + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data points
> + * to the psock
> + * @sk: socket
> + */
> +static inline
> +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)
> +{
> + uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
> +
> + if (__tmp & SK_USER_DATA_NOTPSOCK)
> + return NULL;
> + return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
> +}

>
> Hi,
> Since using psock is not the common case, I'm wondering if it makes more
> sense to have an inverse flag - SK_USER_DATA_PSOCK. Flag would be set by
> the psock code on assignment to sk_user_data.
However, your suggestion seems more elegant. For my patch, as you point
out, when anyone reports a new type-mismatch bug, the relative assign to
sk_user_data should be updated to set *SK_USER_DATA_NEEDCHECK_NOTPSOCK*
flag.

For your suggestion, you seems avoid above situation. What's more, as I
use git grep to search, there seems no direct access to sk_user_data,
all via a small amount macros and wrapper functions. So we can keep
transparent by only update those macros and wrapper functions, which
also won't introduce any extra potential bugs.

I will patch as you suggest in v3 patch.

2022-08-03 08:43:16

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH v3] net/smc: fix refcount bug in sk_psock_get (2)

Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>

During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().

Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as psock type, triggering the refcnt warning.

So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.

This patch solves it by using another bit(defined as
SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
sk_user_data points to a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

Reported-and-tested-by: [email protected]
Suggested-by: Jakub Kicinski <[email protected]>
Acked-by: Wen Gu <[email protected]>
Signed-off-by: Hawkins Jiawei <[email protected]>
---
v2 -> v3:
- use SK_USER_DATA_PSOCK instead of SK_USER_DATA_NOTPSOCK
to patch the bug
- refactor the code on assigning to sk_user_data field
in psock part
- refactor the code on getting and setting the flag
with sk_user_data field

v1 -> v2:
- add bit in PTRMASK to patch the bug

include/linux/skmsg.h | 2 +-
include/net/sock.h | 58 +++++++++++++++++++++++++++++++------------
net/core/skmsg.c | 3 ++-
3 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..81bfa1a33623 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start)

static inline struct sk_psock *sk_psock(const struct sock *sk)
{
- return rcu_dereference_sk_user_data(sk);
+ return rcu_dereference_sk_user_data_psock(sk);
}

static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index 9fa54762e077..d010910d5879 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -545,14 +545,24 @@ enum sk_pacing {
SK_PACING_FQ = 2,
};

-/* Pointer stored in sk_user_data might not be suitable for copying
- * when cloning the socket. For instance, it can point to a reference
- * counted object. sk_user_data bottom bit is set if pointer must not
- * be copied.
+/* flag bits in sk_user_data
+ *
+ * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might
+ * not be suitable for copying when cloning the socket.
+ * For instance, it can point to a reference counted object.
+ * sk_user_data bottom bit is set if pointer must not be copied.
+ *
+ * SK_USER_DATA_BPF - Managed by BPF
+ *
+ * SK_USER_DATA_PSOCK - Mark whether pointer stored in sk_user_data points
+ * to psock type. This bit should be set when sk_user_data is
+ * assigned to a psock object.
*/
#define SK_USER_DATA_NOCOPY 1UL
-#define SK_USER_DATA_BPF 2UL /* Managed by BPF */
-#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF)
+#define SK_USER_DATA_BPF 2UL
+#define SK_USER_DATA_PSOCK 4UL
+#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\
+ SK_USER_DATA_PSOCK)

/**
* sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied
@@ -570,19 +580,35 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
void *__tmp = rcu_dereference(__sk_user_data((sk))); \ (void *)((uintptr_t)__tmp & SK_USER_DATA_PTRMASK); \
})
-#define rcu_assign_sk_user_data(sk, ptr) \
+#define rcu_assign_sk_user_data_with_flags(sk, ptr, flags) \
({ \
- uintptr_t __tmp = (uintptr_t)(ptr); \
- WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \
- rcu_assign_pointer(__sk_user_data((sk)), __tmp); \
-})
-#define rcu_assign_sk_user_data_nocopy(sk, ptr) \
-({ \
- uintptr_t __tmp = (uintptr_t)(ptr); \
- WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \
+ uintptr_t __tmp1 = (uintptr_t)(ptr), \
+ __tmp2 = (uintptr_t)(flags); \
+ WARN_ON_ONCE(__tmp1 & ~SK_USER_DATA_PTRMASK); \
+ WARN_ON_ONCE(__tmp2 & SK_USER_DATA_PTRMASK); \
rcu_assign_pointer(__sk_user_data((sk)), \
- __tmp | SK_USER_DATA_NOCOPY); \
+ __tmp1 | __tmp2); \
})
+#define rcu_assign_sk_user_data(sk, ptr) \
+ rcu_assign_sk_user_data_with_flags(sk, ptr, 0)
+
+/**
+ * rcu_dereference_sk_user_data_psock - return psock if sk_user_data
+ * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise
+ * return NULL
+ *
+ * @sk: socket
+ */
+static inline
+struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)
+{
+ uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
+
+ if (__tmp & SK_USER_DATA_PSOCK)
+ return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
+
+ return NULL;
+}

static inline
struct net *sock_net(const struct sock *sk)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b0fcd0200e84..d174897dbb4b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -735,7 +735,8 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
refcount_set(&psock->refcnt, 1);

- rcu_assign_sk_user_data_nocopy(sk, psock);
+ rcu_assign_sk_user_data_with_flags(sk, psock, SK_USER_DATA_NOCOPY |
+ SK_USER_DATA_PSOCK);
sock_hold(sk);

out:
--
2.25.1


2022-08-03 11:46:30

by Wen Gu

[permalink] [raw]
Subject: Re: [PATCH v3] net/smc: fix refcount bug in sk_psock_get (2)



On 2022/8/3 16:03, Hawkins Jiawei wrote:
> Syzkaller reports refcount bug as follows:
> ------------[ cut here ]------------
> refcount_t: saturated; leaking memory.
> WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
> Modules linked in:
> CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
> <TASK>
> __refcount_add_not_zero include/linux/refcount.h:163 [inline]
> __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
> refcount_inc_not_zero include/linux/refcount.h:245 [inline]
> sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
> tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
> tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
> tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
> tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
> tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
> sk_backlog_rcv include/net/sock.h:1061 [inline]
> __release_sock+0x134/0x3b0 net/core/sock.c:2849
> release_sock+0x54/0x1b0 net/core/sock.c:3404
> inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
> __sys_shutdown_sock net/socket.c:2331 [inline]
> __sys_shutdown_sock net/socket.c:2325 [inline]
> __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
> __do_sys_shutdown net/socket.c:2351 [inline]
> __se_sys_shutdown net/socket.c:2349 [inline]
> __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> </TASK>
>
> During SMC fallback process in connect syscall, kernel will
> replaces TCP with SMC. In order to forward wakeup
> smc socket waitqueue after fallback, kernel will sets
> clcsk->sk_user_data to origin smc socket in
> smc_fback_replace_callbacks().
>
> Later, in shutdown syscall, kernel will calls
> sk_psock_get(), which treats the clcsk->sk_user_data
> as psock type, triggering the refcnt warning.
>
> So, the root cause is that smc and psock, both will use
> sk_user_data field. So they will mismatch this field
> easily.
>
> This patch solves it by using another bit(defined as
> SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
> sk_user_data points to a psock object or not.
> This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
> ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
>
> Reported-and-tested-by: [email protected]
> Suggested-by: Jakub Kicinski <[email protected]>
> Acked-by: Wen Gu <[email protected]>
> Signed-off-by: Hawkins Jiawei <[email protected]>
> ---
> v2 -> v3:
> - use SK_USER_DATA_PSOCK instead of SK_USER_DATA_NOTPSOCK
> to patch the bug
> - refactor the code on assigning to sk_user_data field
> in psock part
> - refactor the code on getting and setting the flag
> with sk_user_data field
>
> v1 -> v2:
> - add bit in PTRMASK to patch the bug
>
> include/linux/skmsg.h | 2 +-
> include/net/sock.h | 58 +++++++++++++++++++++++++++++++------------
> net/core/skmsg.c | 3 ++-
> 3 files changed, 45 insertions(+), 18 deletions(-)
>

Hi Hawkins,

Since the fix v3 doesn't involved smc codes any more, I wonder if it's still
appropriate to use 'net/smc:' in subject?

Cheers,
Wen Gu

2022-08-03 12:25:48

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH v3] net/smc: fix refcount bug in sk_psock_get (2)

On Wed, 3 Aug 2022 at 19:27, Wen Gu <[email protected]> wrote:
> Hi Hawkins,
>
> Since the fix v3 doesn't involved smc codes any more, I wonder if it's still
> appropriate to use 'net/smc:' in subject?
>
> Cheers,
> Wen Gu
Thank you for reminding me, I will send the v4 patch with new subject.

2022-08-03 12:58:26

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>

During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().

Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as psock type, triggering the refcnt warning.

So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.

This patch solves it by using another bit(defined as
SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
sk_user_data points to a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

Reported-and-tested-by: [email protected]
Suggested-by: Jakub Kicinski <[email protected]>
Acked-by: Wen Gu <[email protected]>
Signed-off-by: Hawkins Jiawei <[email protected]>
---
v3 -> v4:
- change new subject
- fix diff content, which has been edit accidentally

v2 -> v3:
- use SK_USER_DATA_PSOCK instead of SK_USER_DATA_NOTPSOCK
to patch the bug
- refactor the code on assigning to sk_user_data field
in psock part
- refactor the code on getting and setting the flag
with sk_user_data field

v1 -> v2:
- add bit in PTRMASK to patch the bug

include/linux/skmsg.h | 2 +-
include/net/sock.h | 58 +++++++++++++++++++++++++++++++------------
net/core/skmsg.c | 3 ++-
3 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..81bfa1a33623 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start)

static inline struct sk_psock *sk_psock(const struct sock *sk)
{
- return rcu_dereference_sk_user_data(sk);
+ return rcu_dereference_sk_user_data_psock(sk);
}

static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index 9fa54762e077..d010910d5879 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -545,14 +545,24 @@ enum sk_pacing {
SK_PACING_FQ = 2,
};

-/* Pointer stored in sk_user_data might not be suitable for copying
- * when cloning the socket. For instance, it can point to a reference
- * counted object. sk_user_data bottom bit is set if pointer must not
- * be copied.
+/* flag bits in sk_user_data
+ *
+ * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might
+ * not be suitable for copying when cloning the socket.
+ * For instance, it can point to a reference counted object.
+ * sk_user_data bottom bit is set if pointer must not be copied.
+ *
+ * SK_USER_DATA_BPF - Managed by BPF
+ *
+ * SK_USER_DATA_PSOCK - Mark whether pointer stored in sk_user_data points
+ * to psock type. This bit should be set when sk_user_data is
+ * assigned to a psock object.
*/
#define SK_USER_DATA_NOCOPY 1UL
-#define SK_USER_DATA_BPF 2UL /* Managed by BPF */
-#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF)
+#define SK_USER_DATA_BPF 2UL
+#define SK_USER_DATA_PSOCK 4UL
+#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\
+ SK_USER_DATA_PSOCK)

/**
* sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied
@@ -570,19 +580,35 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
void *__tmp = rcu_dereference(__sk_user_data((sk))); \
(void *)((uintptr_t)__tmp & SK_USER_DATA_PTRMASK); \
})
-#define rcu_assign_sk_user_data(sk, ptr) \
+#define rcu_assign_sk_user_data_with_flags(sk, ptr, flags) \
({ \
- uintptr_t __tmp = (uintptr_t)(ptr); \
- WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \
- rcu_assign_pointer(__sk_user_data((sk)), __tmp); \
-})
-#define rcu_assign_sk_user_data_nocopy(sk, ptr) \
-({ \
- uintptr_t __tmp = (uintptr_t)(ptr); \
- WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \
+ uintptr_t __tmp1 = (uintptr_t)(ptr), \
+ __tmp2 = (uintptr_t)(flags); \
+ WARN_ON_ONCE(__tmp1 & ~SK_USER_DATA_PTRMASK); \
+ WARN_ON_ONCE(__tmp2 & SK_USER_DATA_PTRMASK); \
rcu_assign_pointer(__sk_user_data((sk)), \
- __tmp | SK_USER_DATA_NOCOPY); \
+ __tmp1 | __tmp2); \
})
+#define rcu_assign_sk_user_data(sk, ptr) \
+ rcu_assign_sk_user_data_with_flags(sk, ptr, 0)
+
+/**
+ * rcu_dereference_sk_user_data_psock - return psock if sk_user_data
+ * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise
+ * return NULL
+ *
+ * @sk: socket
+ */
+static inline
+struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)
+{
+ uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
+
+ if (__tmp & SK_USER_DATA_PSOCK)
+ return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
+
+ return NULL;
+}

static inline
struct net *sock_net(const struct sock *sk)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b0fcd0200e84..d174897dbb4b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -735,7 +735,8 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
refcount_set(&psock->refcnt, 1);

- rcu_assign_sk_user_data_nocopy(sk, psock);
+ rcu_assign_sk_user_data_with_flags(sk, psock, SK_USER_DATA_NOCOPY |
+ SK_USER_DATA_PSOCK);
sock_hold(sk);

out:
--
2.25.1


2022-08-03 15:41:52

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

On Wed, Aug 03, 2022 at 08:14:13AM -0700, Jakub Kicinski wrote:
> On Wed, 3 Aug 2022 20:41:22 +0800 Hawkins Jiawei wrote:
> > -/* Pointer stored in sk_user_data might not be suitable for copying
> > - * when cloning the socket. For instance, it can point to a reference
> > - * counted object. sk_user_data bottom bit is set if pointer must not
> > - * be copied.
> > +/* flag bits in sk_user_data
> > + *
> > + * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might
> > + * not be suitable for copying when cloning the socket.
> > + * For instance, it can point to a reference counted object.
> > + * sk_user_data bottom bit is set if pointer must not be copied.
> > + *
> > + * SK_USER_DATA_BPF - Managed by BPF
>
> I'd use this opportunity to add more info here, BPF is too general.
> Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT?
SGTM. Thanks.

2022-08-03 15:51:41

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

On Wed, 3 Aug 2022 20:41:22 +0800 Hawkins Jiawei wrote:
> -/* Pointer stored in sk_user_data might not be suitable for copying
> - * when cloning the socket. For instance, it can point to a reference
> - * counted object. sk_user_data bottom bit is set if pointer must not
> - * be copied.
> +/* flag bits in sk_user_data
> + *
> + * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might
> + * not be suitable for copying when cloning the socket.
> + * For instance, it can point to a reference counted object.
> + * sk_user_data bottom bit is set if pointer must not be copied.
> + *
> + * SK_USER_DATA_BPF - Managed by BPF

I'd use this opportunity to add more info here, BPF is too general.
Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT?

> + * SK_USER_DATA_PSOCK - Mark whether pointer stored in sk_user_data points
> + * to psock type. This bit should be set when sk_user_data is
> + * assigned to a psock object.

> +/**
> + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data
> + * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise
> + * return NULL
> + *
> + * @sk: socket
> + */
> +static inline
> +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)

nit: the return type more commonly goes on the same line as "static
inline"

> +{
> + uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
> +
> + if (__tmp & SK_USER_DATA_PSOCK)
> + return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
> +
> + return NULL;
> +}

As a follow up we can probably generalize this into
__rcu_dereference_sk_user_data_cond(sk, bit)

and make the psock just call that:

static inline struct sk_psock *
rcu_dereference_sk_user_data_psock(const struct sock *sk)
{
return __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_PSOCK);
}

then reuseport can also benefit, maybe:

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index e2618fb5870e..ad5c447a690c 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -21,14 +21,11 @@ static struct reuseport_array *reuseport_array(struct bpf_map *map)
/* The caller must hold the reuseport_lock */
void bpf_sk_reuseport_detach(struct sock *sk)
{
- uintptr_t sk_user_data;
+ struct sock __rcu **socks;

write_lock_bh(&sk->sk_callback_lock);
- sk_user_data = (uintptr_t)sk->sk_user_data;
- if (sk_user_data & SK_USER_DATA_BPF) {
- struct sock __rcu **socks;
-
- socks = (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
+ socks = __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_BPF);
+ if (socks) {
WRITE_ONCE(sk->sk_user_data, NULL);
/*
* Do not move this NULL assignment outside of


But that must be a separate patch, not part of this fix.

2022-08-04 03:17:20

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

On Wed, 3 Aug 2022 at 23:37, Martin KaFai Lau <[email protected]> wrote:
> > > + * SK_USER_DATA_BPF - Managed by BPF
> >
> > I'd use this opportunity to add more info here, BPF is too general.
> > Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT?
> SGTM. Thanks.
OK. It seems that this flag is introduced from
c9a368f1c0fb ("bpf: net: Avoid incorrect bpf_sk_reuseport_detach call").
I will search for more detailed description in this commit.

> >
> > > +/**
> > > + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data
> > > + * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise
> > > + * return NULL
> > > + *
> > > + * @sk: socket
> > > + */
> > > +static inline
> > > +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk)
> >
> > nit: the return type more commonly goes on the same line as "static
> > inline"
Ok. I will correct it.

> >
> > > +{
> > > +     uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk)));
> > > +
> > > +     if (__tmp & SK_USER_DATA_PSOCK)
> > > +             return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK);
> > > +
> > > +     return NULL;
> > > +}
> >
> > As a follow up we can probably generalize this into
> >  __rcu_dereference_sk_user_data_cond(sk, bit)
> >
> > and make the psock just call that:
> >
> > static inline struct sk_psock *
> > rcu_dereference_sk_user_data_psock(const struct sock *sk)
> > {
> >         return __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_PSOCK);
> > }
Yes. I will refactor it in this way.

> >
> > diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> > index e2618fb5870e..ad5c447a690c 100644
> > --- a/kernel/bpf/reuseport_array.c
> > +++ b/kernel/bpf/reuseport_array.c
> > @@ -21,14 +21,11 @@ static struct reuseport_array *reuseport_array(struct bpf_map *map)
> >  /* The caller must hold the reuseport_lock */
> >  void bpf_sk_reuseport_detach(struct sock *sk)
> >  {
> > -       uintptr_t sk_user_data;
> > +       struct sock __rcu **socks;
> >
> >         write_lock_bh(&sk->sk_callback_lock);
> > -       sk_user_data = (uintptr_t)sk->sk_user_data;
> > -       if (sk_user_data & SK_USER_DATA_BPF) {
> > -               struct sock __rcu **socks;
> > -
> > -               socks = (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
> > +       socks = __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_BPF);
> > +       if (socks) {
> >                 WRITE_ONCE(sk->sk_user_data, NULL);
> >                 /*
> >                  * Do not move this NULL assignment outside of
> >
> >
> > But that must be a separate patch, not part of this fix.
I wonder if it is proper to gather these together in a patchset, for
they are all about flags in sk_user_data, maybe:

[PATCH v5 0/2] net: enhancement to flags in sk_user_data field
- introduce the patchset

[PATCH v5 1/2] net: clean up code for flags in sk_user_data field
- refactor the things in include/linux/skmsg.h and
include/net/sock.h
- refactor the flags's usage by other code, such as
net/core/skmsg.c and kernel/bpf/reuseport_array.c

[PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2)
- add SK_USER_DATA_PSOCK flag in sk_user_data field

2022-08-04 15:42:01

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

On Thu, 4 Aug 2022 11:05:15 +0800 Hawkins Jiawei wrote:
> I wonder if it is proper to gather these together in a patchset, for
> they are all about flags in sk_user_data, maybe:
>
> [PATCH v5 0/2] net: enhancement to flags in sk_user_data field
> - introduce the patchset
>
> [PATCH v5 1/2] net: clean up code for flags in sk_user_data field
> - refactor the things in include/linux/skmsg.h and
> include/net/sock.h
> - refactor the flags's usage by other code, such as
> net/core/skmsg.c and kernel/bpf/reuseport_array.c
>
> [PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2)
> - add SK_USER_DATA_PSOCK flag in sk_user_data field

Usually the fix comes first, because it needs to be backported to
stable, and we don't want to have to pull extra commits into stable
and risk regressions in code which was not directly involved in the bug.

2022-08-05 06:33:15

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v4] net: fix refcount bug in sk_psock_get (2)

On Thu, 4 Aug 2022 at 23:29, Jakub Kicinski <[email protected]> wrote:
>
> On Thu, 4 Aug 2022 11:05:15 +0800 Hawkins Jiawei wrote:
> > I wonder if it is proper to gather these together in a patchset, for
> > they are all about flags in sk_user_data, maybe:
> >
> > [PATCH v5 0/2] net: enhancement to flags in sk_user_data field
> > - introduce the patchset
> >
> > [PATCH v5 1/2] net: clean up code for flags in sk_user_data field
> > - refactor the things in include/linux/skmsg.h and
> > include/net/sock.h
> > - refactor the flags's usage by other code, such as
> > net/core/skmsg.c and kernel/bpf/reuseport_array.c
> >
> > [PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2)
> > - add SK_USER_DATA_PSOCK flag in sk_user_data field
>
> Usually the fix comes first, because it needs to be backported to
> stable, and we don't want to have to pull extra commits into stable
> and risk regressions in code which was not directly involved in the bug.
Ok, I got it. Thanks for the explanation.