2022-09-27 17:02:59

by Kuniyuki Iwashima

Subject: [PATCH v1 net 0/5] tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM.

This series fixes memory leaks and data races that occur in the same
scenario: one thread converts an IPv6 socket into IPv4 with
IPV6_ADDRFORM while another thread accesses the socket concurrently.

Note that patches 1 and 5 conflict with the following commits in net-next, respectively:

* 24426654ed3a ("bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf")
* 34704ef024ae ("bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument")


Kuniyuki Iwashima (5):
tcp/udp: Fix memory leak in ipv6_renew_options().
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
tcp/udp: Call inet6_destroy_sock() in IPv4 sk_prot->destroy().
ipv6: Fix data races around sk->sk_prot.
tcp: Fix data races around icsk->icsk_af_ops.

net/core/sock.c | 6 ++++--
net/ipv4/af_inet.c | 23 ++++++++++++++++-------
net/ipv4/tcp.c | 10 ++++++----
net/ipv4/tcp_ipv4.c | 5 +++++
net/ipv4/udp.c | 6 ++++++
net/ipv6/ipv6_sockglue.c | 29 ++++++++++++++---------------
net/ipv6/tcp_ipv6.c | 1 -
7 files changed, 51 insertions(+), 29 deletions(-)

--
2.30.2


2022-09-27 17:07:03

by Kuniyuki Iwashima

Subject: [PATCH v1 net 3/5] tcp/udp: Call inet6_destroy_sock() in IPv4 sk_prot->destroy().

Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6:
Add lockless sendmsg() support") added a lockless memory allocation path,
which could cause a memory leak:

setsockopt(IPV6_ADDRFORM)                 sendmsg()
+-----------------------+                 +-------+
- do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
  - lock_sock(sk)                           ^._ called via udpv6_prot
  - WRITE_ONCE(sk->sk_prot, &tcp_prot)          before WRITE_ONCE()
  - inet6_destroy_sock()
  - release_sock(sk)                        - ip6_make_skb(sk, ...)
                                              ^._ lockless fast path for
                                                  the non-corking case

                                              - __ip6_append_data(sk, ...)
                                                - ipv6_local_rxpmtu(sk, ...)
                                                  - xchg(&np->rxpmtu, skb)
                                                    ^._ rxpmtu is never freed.

                                            - lock_sock(sk)

For now, rxpmtu is the only such case, but let's call inet6_destroy_sock()
in both the TCP and UDP v4 destroy functions so that future changes are
not missed.

We can consolidate TCP/UDP v4/v6 destroy functions, but such changes
are too invasive to backport to stable. So, they can be posted as a
follow-up later for net-next.

Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
Cc: Vladislav Yasevich <[email protected]>
---
net/ipv4/tcp_ipv4.c | 5 +++++
net/ipv4/udp.c | 6 ++++++
net/ipv6/tcp_ipv6.c | 1 -
3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5b019ba2b9d2..035b6c52a243 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2263,6 +2263,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_saved_syn_free(tp);

 	sk_sockets_allocated_dec(sk);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_prot_creator == &tcpv6_prot)
+		inet6_destroy_sock(sk);
+#endif
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 560d9eadeaa5..cdf131c0a819 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -115,6 +115,7 @@
 #include <net/udp_tunnel.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
+#include <net/transp_v6.h>
 #endif

 struct udp_table udp_table __read_mostly;
@@ -2666,6 +2667,11 @@ void udp_destroy_sock(struct sock *sk)
 		if (up->encap_enabled)
 			static_branch_dec(&udp_encap_needed_key);
 	}
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_prot_creator == &udpv6_prot)
+		inet6_destroy_sock(sk);
+#endif
 }

 /*
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e54eee80ce5f..1ff6a92f7774 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1945,7 +1945,6 @@ static int tcp_v6_init_sock(struct sock *sk)
 static void tcp_v6_destroy_sock(struct sock *sk)
 {
 	tcp_v4_destroy_sock(sk);
-	inet6_destroy_sock(sk);
 }

 #ifdef CONFIG_PROC_FS
--
2.30.2
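
The leak described in this patch can be modeled in userspace. The sketch below is a deliberately simplified, single-threaded replay of the interleaving from the diagram, using C11 atomics in place of the kernel's xchg(). All names (`model_sock`, `model_lockless_store()`, `model_leaked()`) are hypothetical and exist only for this illustration; the model only assumes that the lockless path swaps a new buffer into the slot and frees the old one, and that the destroy step swaps in NULL and frees whatever was there.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the IPv6 part of a socket; only the rxpmtu slot is modeled. */
struct model_sock {
	_Atomic(void *) rxpmtu;
};

static int live_allocs;	/* number of buffers currently allocated */

static void *model_alloc(void)
{
	live_allocs++;
	return malloc(16);
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Models the cleanup step: detach the buffer and free it. */
static void model_destroy_v6(struct model_sock *sk)
{
	model_free(atomic_exchange(&sk->rxpmtu, (void *)NULL));
}

/* Models the lockless sendmsg() path: install a new buffer, free the old. */
static void model_lockless_store(struct model_sock *sk)
{
	model_free(atomic_exchange(&sk->rxpmtu, model_alloc()));
}

/*
 * Replays the interleaving and returns how many buffers are still
 * allocated when the socket is finally destroyed.
 */
int model_leaked(int cleanup_in_v4_destroy)
{
	struct model_sock sk;
	int leaked;

	atomic_init(&sk.rxpmtu, (void *)NULL);
	live_allocs = 0;

	model_lockless_store(&sk);	/* normal IPv6 operation */
	model_destroy_v6(&sk);		/* IPV6_ADDRFORM cleans up under the lock */
	model_lockless_store(&sk);	/* late sendmsg() that started on the v6 path */

	if (cleanup_in_v4_destroy)
		model_destroy_v6(&sk);	/* the extra call this patch proposes */

	leaked = live_allocs;
	model_destroy_v6(&sk);		/* tidy up the model itself */
	return leaked;
}
```

With `cleanup_in_v4_destroy` set to 0 the model reports one outstanding buffer, which corresponds to the rxpmtu skb that is never freed; with it set to 1 the count drops to zero. The real race is timing-dependent, so this replay only illustrates the ordering, not how often it occurs.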

2022-09-27 17:19:14

by Eric Dumazet

Subject: Re: [PATCH v1 net 3/5] tcp/udp: Call inet6_destroy_sock() in IPv4 sk_prot->destroy().

On Tue, Sep 27, 2022 at 9:13 AM Kuniyuki Iwashima <[email protected]> wrote:
>
> Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
> able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6:
> Add lockless sendmsg() support") added a lockless memory allocation path,
> which could cause a memory leak:
>
> setsockopt(IPV6_ADDRFORM) sendmsg()
> +-----------------------+ +-------+
> - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...)
> - lock_sock(sk) ^._ called via udpv6_prot
> - WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE()
> - inet6_destroy_sock()
> - release_sock(sk) - ip6_make_skb(sk, ...)
> ^._ lockless fast path for
> the non-corking case
>
> - __ip6_append_data(sk, ...)
> - ipv6_local_rxpmtu(sk, ...)
> - xchg(&np->rxpmtu, skb)
> ^._ rxpmtu is never freed.
>
> - lock_sock(sk)
>
> For now, rxpmtu is only the case, but let's call inet6_destroy_sock()
> in both TCP/UDP v4 destroy functions not to miss the future change.
>
> We can consolidate TCP/UDP v4/v6 destroy functions, but such changes
> are too invasive to backport to stable. So, they can be posted as a
> follow-up later for net-next.
>
> Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support")
> Signed-off-by: Kuniyuki Iwashima <[email protected]>
> ---
> Cc: Vladislav Yasevich <[email protected]>
> ---
> net/ipv4/tcp_ipv4.c | 5 +++++
> net/ipv4/udp.c | 6 ++++++
> net/ipv6/tcp_ipv6.c | 1 -
> 3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 5b019ba2b9d2..035b6c52a243 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2263,6 +2263,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
> tcp_saved_syn_free(tp);
>
> sk_sockets_allocated_dec(sk);
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (sk->sk_prot_creator == &tcpv6_prot)
> + inet6_destroy_sock(sk);
> +#endif
> }

This is ugly, and it will not build with CONFIG_IPV6=m, right?


> EXPORT_SYMBOL(tcp_v4_destroy_sock);
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 560d9eadeaa5..cdf131c0a819 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -115,6 +115,7 @@
> #include <net/udp_tunnel.h>
> #if IS_ENABLED(CONFIG_IPV6)
> #include <net/ipv6_stubs.h>
> +#include <net/transp_v6.h>
> #endif
>
> struct udp_table udp_table __read_mostly;
> @@ -2666,6 +2667,11 @@ void udp_destroy_sock(struct sock *sk)
> if (up->encap_enabled)
> static_branch_dec(&udp_encap_needed_key);
> }
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (sk->sk_prot_creator == &udpv6_prot)
> + inet6_destroy_sock(sk);
> +#endif
> }
>
> /*
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index e54eee80ce5f..1ff6a92f7774 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1945,7 +1945,6 @@ static int tcp_v6_init_sock(struct sock *sk)
> static void tcp_v6_destroy_sock(struct sock *sk)
> {
> tcp_v4_destroy_sock(sk);
> - inet6_destroy_sock(sk);
> }
>
> #ifdef CONFIG_PROC_FS
> --
> 2.30.2
>

2022-09-27 17:22:55

by Kuniyuki Iwashima

Subject: [PATCH v1 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

IPV6_ADDRFORM changes icsk->icsk_af_ops under lock_sock(), but
tcp_(get|set)sockopt() read it locklessly. To avoid load/store
tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads
and write.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/tcp.c | 10 ++++++----
net/ipv6/ipv6_sockglue.c | 3 ++-
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e373dde1f46f..c86dd0ccef5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3795,8 +3795,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 	const struct inet_connection_sock *icsk = inet_csk(sk);

 	if (level != SOL_TCP)
-		return icsk->icsk_af_ops->setsockopt(sk, level, optname,
-						     optval, optlen);
+		/* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
+		return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
+								optval, optlen);
 	return do_tcp_setsockopt(sk, level, optname, optval, optlen);
 }
 EXPORT_SYMBOL(tcp_setsockopt);
@@ -4394,8 +4395,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
 	struct inet_connection_sock *icsk = inet_csk(sk);

 	if (level != SOL_TCP)
-		return icsk->icsk_af_ops->getsockopt(sk, level, optname,
-						     optval, optlen);
+		/* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
+		return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
+								optval, optlen);
 	return do_tcp_getsockopt(sk, level, optname, optval, optlen);
 }
 EXPORT_SYMBOL(tcp_getsockopt);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a89db5872dc3..726d95859898 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -479,7 +479,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,

/* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
WRITE_ONCE(sk->sk_prot, &tcp_prot);
- icsk->icsk_af_ops = &ipv4_specific;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
sk->sk_socket->ops = &inet_stream_ops;
sk->sk_family = PF_INET;
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
--
2.30.2
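
The READ_ONCE()/WRITE_ONCE() pairing used in this patch can be sketched in userspace as follows. The two macros below are simplified approximations built on a volatile cast and the GNU `__typeof__` extension; the kernel's real definitions are more elaborate. The structure and function names (`model_icsk`, `model_af_ops`, `model_getsockopt()`, `model_addrform()`) are hypothetical stand-ins for this illustration only.

```c
#include <sys/socket.h>

/* Simplified approximations of the kernel macros (assumption: GNU C). */
#define MODEL_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Stand-in for an address-family ops table with a single callback. */
struct model_af_ops {
	int (*family)(void);
};

static int model_family_v6(void) { return AF_INET6; }
static int model_family_v4(void) { return AF_INET; }

static const struct model_af_ops model_ipv6_specific = { model_family_v6 };
static const struct model_af_ops model_ipv4_specific = { model_family_v4 };

/* Stand-in for the connection socket that holds the ops pointer. */
struct model_icsk {
	const struct model_af_ops *af_ops;
};

void model_init_v6(struct model_icsk *icsk)
{
	icsk->af_ops = &model_ipv6_specific;
}

/*
 * Reader side: load the pointer exactly once into a local, then call
 * through that snapshot, so a concurrent writer cannot change the table
 * between the load and the call.
 */
int model_getsockopt(struct model_icsk *icsk)
{
	const struct model_af_ops *ops = MODEL_READ_ONCE(icsk->af_ops);

	return ops->family();
}

/* Writer side: publish the new table with a single store. */
void model_addrform(struct model_icsk *icsk)
{
	MODEL_WRITE_ONCE(icsk->af_ops, &model_ipv4_specific);
}
```

A caller would initialize the structure with `model_init_v6()`, after which `model_getsockopt()` dispatches to the IPv6 table until `model_addrform()` switches it. The annotations do not serialize the two sides; they only ensure that each access is a single, untorn load or store, which is the property the patch description asks for.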

2022-09-27 17:24:10

by Eric Dumazet

Subject: Re: [PATCH v1 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

On Tue, Sep 27, 2022 at 9:48 AM Kuniyuki Iwashima <[email protected]> wrote:
>
> From: Eric Dumazet <[email protected]>
> Date: Tue, 27 Sep 2022 09:39:37 -0700
> > On Tue, Sep 27, 2022 at 9:33 AM Kuniyuki Iwashima <[email protected]> wrote:
> > >
> > > IPV6_ADDRFORM changes icsk->icsk_af_ops under lock_sock(), but
> > > tcp_(get|set)sockopt() read it locklessly. To avoid load/store
> > > tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads
> > > and write.
> >
> > I am pretty sure I have released a syzkaller bug recently with this issue.
> > Have you seen this?
> > If yes, please include the appropriate syzbot tag.
>
> No, I haven't.
> Could you provide the URL?
> I'm happy to include the syzbot tag and KCSAN report in the changelog.
>
>

The report was released 10 days ago, but apparently the syzbot queue
is so full these days that it is still throttled.

==================================================================
BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect

write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
==================================================================

> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > > ---
> > > net/ipv4/tcp.c | 10 ++++++----
> > > net/ipv6/ipv6_sockglue.c | 3 ++-
> > > 2 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index e373dde1f46f..c86dd0ccef5b 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3795,8 +3795,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > > const struct inet_connection_sock *icsk = inet_csk(sk);
> > >
> > > if (level != SOL_TCP)
> > > - return icsk->icsk_af_ops->setsockopt(sk, level, optname,
> > > - optval, optlen);
> > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > + return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
> > > + optval, optlen);
> > > return do_tcp_setsockopt(sk, level, optname, optval, optlen);
> > > }
> > > EXPORT_SYMBOL(tcp_setsockopt);
> > > @@ -4394,8 +4395,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
> > > struct inet_connection_sock *icsk = inet_csk(sk);
> > >
> > > if (level != SOL_TCP)
> > > - return icsk->icsk_af_ops->getsockopt(sk, level, optname,
> > > - optval, optlen);
> > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > + return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
> > > + optval, optlen);
> > > return do_tcp_getsockopt(sk, level, optname, optval, optlen);
> > > }
> > > EXPORT_SYMBOL(tcp_getsockopt);
> > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > index a89db5872dc3..726d95859898 100644
> > > --- a/net/ipv6/ipv6_sockglue.c
> > > +++ b/net/ipv6/ipv6_sockglue.c
> > > @@ -479,7 +479,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> > >
> > > /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
> > > WRITE_ONCE(sk->sk_prot, &tcp_prot);
> > > - icsk->icsk_af_ops = &ipv4_specific;
> > > + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
> > > + WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
> > > sk->sk_socket->ops = &inet_stream_ops;
> > > sk->sk_family = PF_INET;
> > > tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
> > > --
> > > 2.30.2
> > >

2022-09-27 18:17:33

by Eric Dumazet

Subject: Re: [PATCH v1 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

On Tue, Sep 27, 2022 at 10:49 AM Kuniyuki Iwashima <[email protected]> wrote:
>
> From: Kuniyuki Iwashima <[email protected]>
> Date: Tue, 27 Sep 2022 09:48:24 -0700
> > From: Eric Dumazet <[email protected]>
> > Date: Tue, 27 Sep 2022 09:39:37 -0700
> > > On Tue, Sep 27, 2022 at 9:33 AM Kuniyuki Iwashima <[email protected]> wrote:
> > > >
> > > > IPV6_ADDRFORM changes icsk->icsk_af_ops under lock_sock(), but
> > > > tcp_(get|set)sockopt() read it locklessly. To avoid load/store
> > > > tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads
> > > > and write.
> > >
> > > I am pretty sure I have released a syzkaller bug recently with this issue.
> > > Have you seen this?
> > > If yes, please include the appropriate syzbot tag.
>
> Are you mentioning this commit ?
>

No, this is a new syzbot report, with a different stack trace.

> 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")
>
> Then, yes, I'll add syzbot tags to patch 4 and 5.
>
>
> >
> > No, I haven't.
> > Could you provide the URL?
> > I'm happy to include the syzbot tag and KCSAN report in the changelog.
> >
> >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > > > ---
> > > > net/ipv4/tcp.c | 10 ++++++----
> > > > net/ipv6/ipv6_sockglue.c | 3 ++-
> > > > 2 files changed, 8 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > index e373dde1f46f..c86dd0ccef5b 100644
> > > > --- a/net/ipv4/tcp.c
> > > > +++ b/net/ipv4/tcp.c
> > > > @@ -3795,8 +3795,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > > > const struct inet_connection_sock *icsk = inet_csk(sk);
> > > >
> > > > if (level != SOL_TCP)
> > > > - return icsk->icsk_af_ops->setsockopt(sk, level, optname,
> > > > - optval, optlen);
> > > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > > + return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
> > > > + optval, optlen);
> > > > return do_tcp_setsockopt(sk, level, optname, optval, optlen);
> > > > }
> > > > EXPORT_SYMBOL(tcp_setsockopt);
> > > > @@ -4394,8 +4395,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
> > > > struct inet_connection_sock *icsk = inet_csk(sk);
> > > >
> > > > if (level != SOL_TCP)
> > > > - return icsk->icsk_af_ops->getsockopt(sk, level, optname,
> > > > - optval, optlen);
> > > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > > + return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
> > > > + optval, optlen);
> > > > return do_tcp_getsockopt(sk, level, optname, optval, optlen);
> > > > }
> > > > EXPORT_SYMBOL(tcp_getsockopt);
> > > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > > index a89db5872dc3..726d95859898 100644
> > > > --- a/net/ipv6/ipv6_sockglue.c
> > > > +++ b/net/ipv6/ipv6_sockglue.c
> > > > @@ -479,7 +479,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> > > >
> > > > /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
> > > > WRITE_ONCE(sk->sk_prot, &tcp_prot);
> > > > - icsk->icsk_af_ops = &ipv4_specific;
> > > > + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
> > > > + WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
> > > > sk->sk_socket->ops = &inet_stream_ops;
> > > > sk->sk_family = PF_INET;
> > > > tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
> > > > --
> > > > 2.30.2
> > > >

2022-09-27 18:18:29

by Kuniyuki Iwashima

Subject: Re: [PATCH v1 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

From: Kuniyuki Iwashima <[email protected]>
Date: Tue, 27 Sep 2022 09:48:24 -0700
> From: Eric Dumazet <[email protected]>
> Date: Tue, 27 Sep 2022 09:39:37 -0700
> > On Tue, Sep 27, 2022 at 9:33 AM Kuniyuki Iwashima <[email protected]> wrote:
> > >
> > > IPV6_ADDRFORM changes icsk->icsk_af_ops under lock_sock(), but
> > > tcp_(get|set)sockopt() read it locklessly. To avoid load/store
> > > tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads
> > > and write.
> >
> > I am pretty sure I have released a syzkaller bug recently with this issue.
> > Have you seen this?
> > If yes, please include the appropriate syzbot tag.

Are you referring to this commit?

086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")

If so, yes, I'll add syzbot tags to patches 4 and 5.


>
> No, I haven't.
> Could you provide the URL?
> I'm happy to include the syzbot tag and KCSAN report in the changelog.
>
>
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > > ---
> > > net/ipv4/tcp.c | 10 ++++++----
> > > net/ipv6/ipv6_sockglue.c | 3 ++-
> > > 2 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index e373dde1f46f..c86dd0ccef5b 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3795,8 +3795,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > > const struct inet_connection_sock *icsk = inet_csk(sk);
> > >
> > > if (level != SOL_TCP)
> > > - return icsk->icsk_af_ops->setsockopt(sk, level, optname,
> > > - optval, optlen);
> > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > + return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
> > > + optval, optlen);
> > > return do_tcp_setsockopt(sk, level, optname, optval, optlen);
> > > }
> > > EXPORT_SYMBOL(tcp_setsockopt);
> > > @@ -4394,8 +4395,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
> > > struct inet_connection_sock *icsk = inet_csk(sk);
> > >
> > > if (level != SOL_TCP)
> > > - return icsk->icsk_af_ops->getsockopt(sk, level, optname,
> > > - optval, optlen);
> > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > + return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
> > > + optval, optlen);
> > > return do_tcp_getsockopt(sk, level, optname, optval, optlen);
> > > }
> > > EXPORT_SYMBOL(tcp_getsockopt);
> > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > index a89db5872dc3..726d95859898 100644
> > > --- a/net/ipv6/ipv6_sockglue.c
> > > +++ b/net/ipv6/ipv6_sockglue.c
> > > @@ -479,7 +479,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> > >
> > > /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
> > > WRITE_ONCE(sk->sk_prot, &tcp_prot);
> > > - icsk->icsk_af_ops = &ipv4_specific;
> > > + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
> > > + WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
> > > sk->sk_socket->ops = &inet_stream_ops;
> > > sk->sk_family = PF_INET;
> > > tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
> > > --
> > > 2.30.2
> > >

2022-09-27 18:21:19

by Kuniyuki Iwashima

Subject: [PATCH v1 net 1/5] tcp/udp: Fix memory leak in ipv6_renew_options().

syzbot reported a memory leak [0] related to IPV6_ADDRFORM.

The scenario is that while one thread is converting an IPv6 socket into
IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
allocates memory to inet6_sk(sk)->XXX after conversion.

Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
which inet6_destroy_sock() should have cleaned up.

setsockopt(IPV6_ADDRFORM)                 setsockopt(IPV6_DSTOPTS)
+-----------------------+                 +----------------------+
- do_ipv6_setsockopt(sk, ...)
  - lock_sock(sk)                         - do_ipv6_setsockopt(sk, ...)
  - WRITE_ONCE(sk->sk_prot, &tcp_prot)      ^._ called via tcpv6_prot
  - xchg(&np->opt, NULL)                        before WRITE_ONCE()
  - txopt_put(opt)
  - release_sock(sk)
                                            - lock_sock(sk)
                                            - ipv6_set_opt_hdr(sk, ...)
                                              - ipv6_update_options(sk, opt)
                                                - xchg(&inet6_sk(sk)->opt, opt)
                                                  ^._ opt is never freed.

                                            - release_sock(sk)

Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
acquiring the lock.

This issue has existed between IPV6_ADDRFORM and IPV6_PKTOPTIONS since
the initial commit.

[0]:
BUG: memory leak
unreferenced object 0xffff888009ab9f80 (size 96):
comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s)
hex dump (first 32 bytes):
01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00 ....H...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline]
[<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566
[<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318
[<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline]
[<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668
[<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021
[<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789
[<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252
[<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline]
[<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline]
[<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260
[<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
Note that this syzbot instance is running on our EC2, so we don't have a URL or hash.
Also, there seems to be no similar report on the public syzkaller dashboard.
Thus, the Reported-by address might be inappropriate.

Please let me know if we should keep it empty or use another email address.
---
net/ipv6/ipv6_sockglue.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index e0dcc7a193df..b61066ac8648 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -419,6 +419,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		rtnl_lock();
 	lock_sock(sk);

+	/* Another thread has converted the socket into IPv4 with
+	 * IPV6_ADDRFORM concurrently.
+	 */
+	if (unlikely(sk->sk_family != AF_INET6))
+		goto unlock;
+
 	switch (optname) {

 	case IPV6_ADDRFORM:
@@ -994,6 +1000,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 	}

+unlock:
 	release_sock(sk);
 	if (needs_rtnl)
 		rtnl_unlock();
--
2.30.2
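
The fix in this patch follows a common pattern: a thread that entered a handler based on state it observed before taking the lock re-validates that state once the lock is held. Below is a minimal userspace sketch of the pattern using a pthread mutex. The names (`model_sock`, `model_set_dstopts()`, `model_addrform()`) are hypothetical, the 96-byte allocation merely mirrors the object size in the leak report, and the -ENOPROTOOPT return value for the bail-out path is an assumption made for this illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Stand-in for a socket: a lock, the address family, and one option buffer. */
struct model_sock {
	pthread_mutex_t lock;
	int family;
	void *opt;
};

void model_sock_init(struct model_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	sk->family = AF_INET6;
	sk->opt = NULL;
}

/* Models IPV6_ADDRFORM: free the IPv6 state and switch the family. */
void model_addrform(struct model_sock *sk)
{
	pthread_mutex_lock(&sk->lock);
	free(sk->opt);
	sk->opt = NULL;
	sk->family = AF_INET;
	pthread_mutex_unlock(&sk->lock);
}

/*
 * Models an IPv6 option setter that was dispatched before the conversion.
 * The family is re-checked after the lock is taken, so no IPv6 state is
 * allocated on a socket that has already become IPv4.
 */
int model_set_dstopts(struct model_sock *sk)
{
	int ret = -ENOPROTOOPT;	/* assumed error code for the sketch */
	void *opt;

	pthread_mutex_lock(&sk->lock);

	if (sk->family != AF_INET6)
		goto unlock;

	opt = malloc(96);
	if (!opt) {
		ret = -ENOMEM;
		goto unlock;
	}

	free(sk->opt);
	sk->opt = opt;
	ret = 0;
unlock:
	pthread_mutex_unlock(&sk->lock);
	return ret;
}
```

Calling `model_set_dstopts()` on a freshly initialized socket succeeds, while the same call after `model_addrform()` returns an error and leaves `opt` untouched, so nothing is left behind for the IPv4 destroy path to miss.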

2022-09-27 18:29:39

by Kuniyuki Iwashima

Subject: Re: [PATCH v1 net 3/5] tcp/udp: Call inet6_destroy_sock() in IPv4 sk_prot->destroy().

From: Eric Dumazet <[email protected]>
Date: Tue, 27 Sep 2022 09:50:04 -0700
> On Tue, Sep 27, 2022 at 9:13 AM Kuniyuki Iwashima <[email protected]> wrote:
> >
> > Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
> > able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
> > IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6:
> > Add lockless sendmsg() support") added a lockless memory allocation path,
> > which could cause a memory leak:
> >
> > setsockopt(IPV6_ADDRFORM) sendmsg()
> > +-----------------------+ +-------+
> > - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...)
> > - lock_sock(sk) ^._ called via udpv6_prot
> > - WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE()
> > - inet6_destroy_sock()
> > - release_sock(sk) - ip6_make_skb(sk, ...)
> > ^._ lockless fast path for
> > the non-corking case
> >
> > - __ip6_append_data(sk, ...)
> > - ipv6_local_rxpmtu(sk, ...)
> > - xchg(&np->rxpmtu, skb)
> > ^._ rxpmtu is never freed.
> >
> > - lock_sock(sk)
> >
> > For now, rxpmtu is only the case, but let's call inet6_destroy_sock()
> > in both TCP/UDP v4 destroy functions not to miss the future change.
> >
> > We can consolidate TCP/UDP v4/v6 destroy functions, but such changes
> > are too invasive to backport to stable. So, they can be posted as a
> > follow-up later for net-next.
> >
> > Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support")
> > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > ---
> > Cc: Vladislav Yasevich <[email protected]>
> > ---
> > net/ipv4/tcp_ipv4.c | 5 +++++
> > net/ipv4/udp.c | 6 ++++++
> > net/ipv6/tcp_ipv6.c | 1 -
> > 3 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 5b019ba2b9d2..035b6c52a243 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -2263,6 +2263,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
> > tcp_saved_syn_free(tp);
> >
> > sk_sockets_allocated_dec(sk);
> > +
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + if (sk->sk_prot_creator == &tcpv6_prot)
> > + inet6_destroy_sock(sk);
> > +#endif
> > }
>
> This is ugly, and will not compile with CONFIG_IPV6=m, right ?

Ah, exactly...

ld: net/ipv4/tcp_ipv4.o: in function `tcp_v4_destroy_sock':
/mnt/ec2-user/kernel/214_tcp_ipv6_renew_options_memleak/net/ipv4/tcp_ipv4.c:2290: undefined reference to `tcpv6_prot'
ld: /mnt/ec2-user/kernel/214_tcp_ipv6_renew_options_memleak/net/ipv4/tcp_ipv4.c:2291: undefined reference to `inet6_destroy_sock'
ld: net/ipv4/udp.o: in function `udp_destroy_sock':
/mnt/ec2-user/kernel/214_tcp_ipv6_renew_options_memleak/net/ipv4/udp.c:2660: undefined reference to `udpv6_prot'
ld: /mnt/ec2-user/kernel/214_tcp_ipv6_renew_options_memleak/net/ipv4/udp.c:2661: undefined reference to `inet6_destroy_sock'

So, do we have to move these 4 under net/ipv4/ with #ifdef CONFIG_IPV6?


> > EXPORT_SYMBOL(tcp_v4_destroy_sock);
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 560d9eadeaa5..cdf131c0a819 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -115,6 +115,7 @@
> > #include <net/udp_tunnel.h>
> > #if IS_ENABLED(CONFIG_IPV6)
> > #include <net/ipv6_stubs.h>
> > +#include <net/transp_v6.h>
> > #endif
> >
> > struct udp_table udp_table __read_mostly;
> > @@ -2666,6 +2667,11 @@ void udp_destroy_sock(struct sock *sk)
> > if (up->encap_enabled)
> > static_branch_dec(&udp_encap_needed_key);
> > }
> > +
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + if (sk->sk_prot_creator == &udpv6_prot)
> > + inet6_destroy_sock(sk);
> > +#endif
> > }
> >
> > /*
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index e54eee80ce5f..1ff6a92f7774 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1945,7 +1945,6 @@ static int tcp_v6_init_sock(struct sock *sk)
> > static void tcp_v6_destroy_sock(struct sock *sk)
> > {
> > tcp_v4_destroy_sock(sk);
> > - inet6_destroy_sock(sk);
> > }
> >
> > #ifdef CONFIG_PROC_FS
> > --
> > 2.30.2
> >
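
The linker errors above follow from how IS_ENABLED() treats modular options. The sketch below re-creates the preprocessor technique behind the kernel's IS_BUILTIN(), IS_MODULE(), and IS_ENABLED() macros in userspace, with a `KC_` prefix to make clear that these are stand-ins and not the kernel's own definitions. It assumes that a CONFIG_IPV6=m build defines CONFIG_IPV6_MODULE instead of CONFIG_IPV6, and that the compiler accepts an empty variadic argument list (GCC and Clang do).

```c
/* Expands to "0," so that a defined-as-1 option shifts the argument list. */
#define KC_ARG_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND(ignored, val, ...) val

/* 1 if x is defined to 1, otherwise 0. */
#define KC_IS_DEFINED(x)		KC_IS_DEFINED_1(x)
#define KC_IS_DEFINED_1(val)		KC_IS_DEFINED_2(KC_ARG_PLACEHOLDER_##val)
#define KC_IS_DEFINED_2(arg1_or_junk)	KC_TAKE_SECOND(arg1_or_junk 1, 0)

/* Logical OR of two values that are each the literal 0 or 1. */
#define KC_OR(x, y)			KC_OR_1(x, y)
#define KC_OR_1(x, y)			KC_OR_2(KC_ARG_PLACEHOLDER_##x, y)
#define KC_OR_2(arg1_or_junk, y)	KC_TAKE_SECOND(arg1_or_junk 1, y)

#define KC_IS_BUILTIN(option)	KC_IS_DEFINED(option)
#define KC_IS_MODULE(option)	KC_IS_DEFINED(option##_MODULE)
#define KC_IS_ENABLED(option)	KC_OR(KC_IS_BUILTIN(option), KC_IS_MODULE(option))

/* Assumed configuration for this sketch: IPv6 built as a module (=m). */
#define CONFIG_IPV6_MODULE 1

/* True for both =y and =m, so the guarded code is compiled in this build. */
int ipv6_is_enabled(void)
{
	return KC_IS_ENABLED(CONFIG_IPV6);
}

/* True only for =y, i.e. when the IPv6 symbols are part of the core image. */
int ipv6_is_builtin(void)
{
	return KC_IS_BUILTIN(CONFIG_IPV6);
}
```

In this configuration the "enabled" check is true while the "built-in" check is false. That matches the failure reported above: the guarded block in net/ipv4/ is compiled into the core kernel image, but tcpv6_prot, udpv6_prot, and inet6_destroy_sock() live in the IPv6 module, so the final link cannot resolve them.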

2022-09-27 18:47:42

by Kuniyuki Iwashima

Subject: Re: [PATCH v1 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

From: Eric Dumazet <[email protected]>
Date: Tue, 27 Sep 2022 09:55:03 -0700
> On Tue, Sep 27, 2022 at 9:48 AM Kuniyuki Iwashima <[email protected]> wrote:
> >
> > From: Eric Dumazet <[email protected]>
> > Date: Tue, 27 Sep 2022 09:39:37 -0700
> > > On Tue, Sep 27, 2022 at 9:33 AM Kuniyuki Iwashima <[email protected]> wrote:
> > > >
> > > > IPV6_ADDRFORM changes icsk->icsk_af_ops under lock_sock(), but
> > > > tcp_(get|set)sockopt() read it locklessly. To avoid load/store
> > > > tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads
> > > > and write.
> > >
> > > I am pretty sure I have released a syzkaller bug recently with this issue.
> > > Have you seen this?
> > > If yes, please include the appropriate syzbot tag.
> >
> > No, I haven't.
> > Could you provide the URL?
> > I'm happy to include the syzbot tag and KCSAN report in the changelog.
> >
> >
>
> Report has been released 10 days ago, but apparently the syzbot queue
> is so full these days that the report is still throttled.

Thank you!
I'll add this in v2.


>
> ==================================================================
> BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
>
> write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
> tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
> __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
> inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
> __sys_connect_file net/socket.c:1976 [inline]
> __sys_connect+0x197/0x1b0 net/socket.c:1993
> __do_sys_connect net/socket.c:2003 [inline]
> __se_sys_connect net/socket.c:2000 [inline]
> __x64_sys_connect+0x3d/0x50 net/socket.c:2000
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
> tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
> sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
> __sys_setsockopt+0x212/0x2b0 net/socket.c:2252
> __do_sys_setsockopt net/socket.c:2263 [inline]
> __se_sys_setsockopt net/socket.c:2260 [inline]
> __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
> 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
>
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 08/26/2022
> ==================================================================
>
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > > > ---
> > > > net/ipv4/tcp.c | 10 ++++++----
> > > > net/ipv6/ipv6_sockglue.c | 3 ++-
> > > > 2 files changed, 8 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > index e373dde1f46f..c86dd0ccef5b 100644
> > > > --- a/net/ipv4/tcp.c
> > > > +++ b/net/ipv4/tcp.c
> > > > @@ -3795,8 +3795,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > > > const struct inet_connection_sock *icsk = inet_csk(sk);
> > > >
> > > > if (level != SOL_TCP)
> > > > - return icsk->icsk_af_ops->setsockopt(sk, level, optname,
> > > > - optval, optlen);
> > > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > > + return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
> > > > + optval, optlen);
> > > > return do_tcp_setsockopt(sk, level, optname, optval, optlen);
> > > > }
> > > > EXPORT_SYMBOL(tcp_setsockopt);
> > > > @@ -4394,8 +4395,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
> > > > struct inet_connection_sock *icsk = inet_csk(sk);
> > > >
> > > > if (level != SOL_TCP)
> > > > - return icsk->icsk_af_ops->getsockopt(sk, level, optname,
> > > > - optval, optlen);
> > > > + /* IPV6_ADDRFORM can change icsk->icsk_af_ops under us. */
> > > > + return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
> > > > + optval, optlen);
> > > > return do_tcp_getsockopt(sk, level, optname, optval, optlen);
> > > > }
> > > > EXPORT_SYMBOL(tcp_getsockopt);
> > > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > > index a89db5872dc3..726d95859898 100644
> > > > --- a/net/ipv6/ipv6_sockglue.c
> > > > +++ b/net/ipv6/ipv6_sockglue.c
> > > > @@ -479,7 +479,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> > > >
> > > > /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
> > > > WRITE_ONCE(sk->sk_prot, &tcp_prot);
> > > > - icsk->icsk_af_ops = &ipv4_specific;
> > > > + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
> > > > + WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
> > > > sk->sk_socket->ops = &inet_stream_ops;
> > > > sk->sk_family = PF_INET;
> > > > tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
> > > > --
> > > > 2.30.2