2022-10-06 19:28:28

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v5 net 0/5] tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM.

This series fixes some memory leaks and data races caused in the
same scenario where one thread converts an IPv6 socket into IPv4
with IPV6_ADDRFORM and another accesses the socket concurrently.


Changes:
v5:
* Patch 1 & 5
* Rebase and resolve conflicts with 24426654ed3a and 34704ef024ae

* Patch 3:
* Remove 'inline' from udplite_sk_init()
* Remove EXPORT_SYMBOL_GPL(inet6_sock_destruct)

v4: https://lore.kernel.org/netdev/[email protected]/
* Patch 3:
* Change UDPv6 Lite's sk->sk_prot->init() and sk->destruct() as well.
* Move udplite_sk_init() from udplite.h to udplite.c.

v3 (Resend): https://lore.kernel.org/netdev/[email protected]/
* CC blamed commits' EHOSTUNREACH authors to make patchwork happy

v3: https://lore.kernel.org/netdev/[email protected]/
* Patch 2:
* Add comment for np->rxopt.all = 0
* Add inet6_cleanup_sock()
* Patch 3:
* Call inet6_cleanup_sock() instead of inet6_destroy_sock()

v2: https://lore.kernel.org/netdev/[email protected]/
* Patch 3:
* Move inet6_destroy_sock() from sk_prot->destroy()
to sk->sk_destruct() and fix CONFIG_IPV6=m build failure
* Patch 5:
* Add WRITE_ONCE()s in tcp_v6_connect()
* Add Reported-by tags and KCSAN log in changelog

v1: https://lore.kernel.org/netdev/[email protected]/


Kuniyuki Iwashima (5):
tcp/udp: Fix memory leak in ipv6_renew_options().
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
ipv6: Fix data races around sk->sk_prot.
tcp: Fix data races around icsk->icsk_af_ops.

include/net/ipv6.h | 2 ++
include/net/udp.h | 2 +-
include/net/udplite.h | 8 --------
net/core/sock.c | 6 ++++--
net/ipv4/af_inet.c | 23 ++++++++++++++++-------
net/ipv4/tcp.c | 10 ++++++----
net/ipv4/udp.c | 9 ++++++---
net/ipv4/udplite.c | 8 ++++++++
net/ipv6/af_inet6.c | 14 +++++++++++++-
net/ipv6/ipv6_sockglue.c | 34 +++++++++++++++++++---------------
net/ipv6/tcp_ipv6.c | 6 ++++--
net/ipv6/udp.c | 15 ++++++++++++++-
net/ipv6/udp_impl.h | 1 +
net/ipv6/udplite.c | 9 ++++++++-
14 files changed, 102 insertions(+), 45 deletions(-)

--
2.30.2


2022-10-06 19:31:40

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v5 net 4/5] ipv6: Fix data races around sk->sk_prot.

Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")
fixed some data-races around sk->sk_prot but it was not enough.

Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
load tearing.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/core/sock.c | 6 ++++--
net/ipv4/af_inet.c | 23 ++++++++++++++++-------
net/ipv6/ipv6_sockglue.c | 4 ++--
3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index eeb6cbac6f49..a3ba0358c77c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3610,7 +3610,8 @@ int sock_common_getsockopt(struct socket *sock, int level, int optname,
{
struct sock *sk = sock->sk;

- return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ return READ_ONCE(sk->sk_prot)->getsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(sock_common_getsockopt);

@@ -3636,7 +3637,8 @@ int sock_common_setsockopt(struct socket *sock, int level, int optname,
{
struct sock *sk = sock->sk;

- return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ return READ_ONCE(sk->sk_prot)->setsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(sock_common_setsockopt);

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e2c219382345..3dd02396517d 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -558,22 +558,27 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
int addr_len, int flags)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
int err;

if (addr_len < sizeof(uaddr->sa_family))
return -EINVAL;
+
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+
if (uaddr->sa_family == AF_UNSPEC)
- return sk->sk_prot->disconnect(sk, flags);
+ return prot->disconnect(sk, flags);

if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
- err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
+ err = prot->pre_connect(sk, uaddr, addr_len);
if (err)
return err;
}

if (data_race(!inet_sk(sk)->inet_num) && inet_autobind(sk))
return -EAGAIN;
- return sk->sk_prot->connect(sk, uaddr, addr_len);
+ return prot->connect(sk, uaddr, addr_len);
}
EXPORT_SYMBOL(inet_dgram_connect);

@@ -734,10 +739,11 @@ EXPORT_SYMBOL(inet_stream_connect);
int inet_accept(struct socket *sock, struct socket *newsock, int flags,
bool kern)
{
- struct sock *sk1 = sock->sk;
+ struct sock *sk1 = sock->sk, *sk2;
int err = -EINVAL;
- struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err, kern);

+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ sk2 = READ_ONCE(sk1->sk_prot)->accept(sk1, flags, &err, kern);
if (!sk2)
goto do_err;

@@ -825,12 +831,15 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;

if (unlikely(inet_send_prepare(sk)))
return -EAGAIN;

- if (sk->sk_prot->sendpage)
- return sk->sk_prot->sendpage(sk, page, offset, size, flags);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ if (prot->sendpage)
+ return prot->sendpage(sk, page, offset, size, flags);
return sock_no_sendpage(sock, page, offset, size, flags);
}
EXPORT_SYMBOL(inet_sendpage);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a20edae868fd..d7207a546aec 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -477,7 +477,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, &tcp_prot, 1);

- /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
WRITE_ONCE(sk->sk_prot, &tcp_prot);
icsk->icsk_af_ops = &ipv4_specific;
sk->sk_socket->ops = &inet_stream_ops;
@@ -492,7 +492,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, prot, 1);

- /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ /* Paired with READ_ONCE(sk->sk_prot) in inet6_dgram_ops */
WRITE_ONCE(sk->sk_prot, prot);
sk->sk_socket->ops = &inet_dgram_ops;
sk->sk_family = PF_INET;
--
2.30.2

2022-10-06 19:31:42

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v5 net 2/5] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).

Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot
to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is
changed to udp_prot and ->destroy() never cleans it up, resulting in
a memory leak.

This is due to the discrepancy between inet6_destroy_sock() and
IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
to remove the difference.

However, this is not enough for now because rxpmtu can be changed
without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless
sendmsg() support"). We will fix this case in the following patch.

Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
in the future.

Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
Cc: Brian Haley <[email protected]>
---
include/net/ipv6.h | 1 +
net/ipv6/af_inet6.c | 6 ++++++
net/ipv6/ipv6_sockglue.c | 20 ++++++++------------
3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d664ba5812d8..335a49ecd8a0 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1182,6 +1182,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port,
void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info);
void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);

+void inet6_cleanup_sock(struct sock *sk);
int inet6_release(struct socket *sock);
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d40b7d60e00e..ded827944fa6 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -510,6 +510,12 @@ void inet6_destroy_sock(struct sock *sk)
}
EXPORT_SYMBOL_GPL(inet6_destroy_sock);

+void inet6_cleanup_sock(struct sock *sk)
+{
+ inet6_destroy_sock(sk);
+}
+EXPORT_SYMBOL_GPL(inet6_cleanup_sock);
+
/*
* This does both peername and sockname.
*/
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 408345fc4c5c..a20edae868fd 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -431,9 +431,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
if (optlen < sizeof(int))
goto e_inval;
if (val == PF_INET) {
- struct ipv6_txoptions *opt;
- struct sk_buff *pktopt;
-
if (sk->sk_type == SOCK_RAW)
break;

@@ -464,7 +461,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
break;
}

- fl6_free_socklist(sk);
__ipv6_sock_mc_close(sk);
__ipv6_sock_ac_close(sk);

@@ -501,14 +497,14 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
sk->sk_socket->ops = &inet_dgram_ops;
sk->sk_family = PF_INET;
}
- opt = xchg((__force struct ipv6_txoptions **)&np->opt,
- NULL);
- if (opt) {
- atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
- txopt_put(opt);
- }
- pktopt = xchg(&np->pktoptions, NULL);
- kfree_skb(pktopt);
+
+ /* Disable all options not to allocate memory anymore,
+ * but there is still a race. See the lockless path
+ * in udpv6_sendmsg() and ipv6_local_rxpmtu().
+ */
+ np->rxopt.all = 0;
+
+ inet6_cleanup_sock(sk);

/*
* ... and add it to the refcnt debug socks count
--
2.30.2

2022-10-06 19:36:15

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v5 net 3/5] tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().

Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6:
Add lockless sendmsg() support") added a lockless memory allocation path,
which could cause a memory leak:

setsockopt(IPV6_ADDRFORM) sendmsg()
+-----------------------+ +-------+
- do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...)
- sockopt_lock_sock(sk) ^._ called via udpv6_prot
- lock_sock(sk) before WRITE_ONCE()
- WRITE_ONCE(sk->sk_prot, &tcp_prot)
- inet6_destroy_sock() - if (!corkreq)
- sockopt_release_sock(sk) - ip6_make_skb(sk, ...)
- release_sock(sk) ^._ lockless fast path for
the non-corking case

- __ip6_append_data(sk, ...)
- ipv6_local_rxpmtu(sk, ...)
- xchg(&np->rxpmtu, skb)
^._ rxpmtu is never freed.

- goto out_no_dst;

- lock_sock(sk)

For now, rxpmtu is only the case, but not to miss the future change
and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix
memleak in ipv6_renew_options()."), let's set a new function to IPv6
sk->sk_destruct() and call inet6_cleanup_sock() there. Since the
conversion does not change sk->sk_destruct(), we can guarantee that
we can clean up IPv6 resources finally.

We can now remove all inet6_destroy_sock() calls from IPv6 protocol
specific ->destroy() functions, but such changes are invasive to
backport. So they can be posted as a follow-up later for net-next.

Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
Cc: Vladislav Yasevich <[email protected]>
---
include/net/ipv6.h | 1 +
include/net/udp.h | 2 +-
include/net/udplite.h | 8 --------
net/ipv4/udp.c | 9 ++++++---
net/ipv4/udplite.c | 8 ++++++++
net/ipv6/af_inet6.c | 8 +++++++-
net/ipv6/udp.c | 15 ++++++++++++++-
net/ipv6/udp_impl.h | 1 +
net/ipv6/udplite.c | 9 ++++++++-
9 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 335a49ecd8a0..37943ba3a73c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1183,6 +1183,7 @@ void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info);
void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);

void inet6_cleanup_sock(struct sock *sk);
+void inet6_sock_destruct(struct sock *sk);
int inet6_release(struct socket *sock);
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..fee053bcd17c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -247,7 +247,7 @@ static inline bool udp_sk_bound_dev_eq(struct net *net, int bound_dev_if,
}

/* net/ipv4/udp.c */
-void udp_destruct_sock(struct sock *sk);
+void udp_destruct_common(struct sock *sk);
void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len);
int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb);
void udp_skb_destructor(struct sock *sk, struct sk_buff *skb);
diff --git a/include/net/udplite.h b/include/net/udplite.h
index 0143b373602e..299c14ce2bb9 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -25,14 +25,6 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset,
return copy_from_iter_full(to, len, &msg->msg_iter) ? 0 : -EFAULT;
}

-/* Designate sk as UDP-Lite socket */
-static inline int udplite_sk_init(struct sock *sk)
-{
- udp_init_sock(sk);
- udp_sk(sk)->pcflag = UDPLITE_BIT;
- return 0;
-}
-
/*
* Checksumming routines
*/
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d63118ce5900..8126f67d18b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1598,7 +1598,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
}
EXPORT_SYMBOL_GPL(__udp_enqueue_schedule_skb);

-void udp_destruct_sock(struct sock *sk)
+void udp_destruct_common(struct sock *sk)
{
/* reclaim completely the forward allocated memory */
struct udp_sock *up = udp_sk(sk);
@@ -1611,10 +1611,14 @@ void udp_destruct_sock(struct sock *sk)
kfree_skb(skb);
}
udp_rmem_release(sk, total, 0, true);
+}
+EXPORT_SYMBOL_GPL(udp_destruct_common);

+static void udp_destruct_sock(struct sock *sk)
+{
+ udp_destruct_common(sk);
inet_sock_destruct(sk);
}
-EXPORT_SYMBOL_GPL(udp_destruct_sock);

int udp_init_sock(struct sock *sk)
{
@@ -1622,7 +1626,6 @@ int udp_init_sock(struct sock *sk)
sk->sk_destruct = udp_destruct_sock;
return 0;
}
-EXPORT_SYMBOL_GPL(udp_init_sock);

void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
{
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 6e08a76ae1e7..e0c9cc39b81e 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -17,6 +17,14 @@
struct udp_table udplite_table __read_mostly;
EXPORT_SYMBOL(udplite_table);

+/* Designate sk as UDP-Lite socket */
+static int udplite_sk_init(struct sock *sk)
+{
+ udp_init_sock(sk);
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
static int udplite_rcv(struct sk_buff *skb)
{
return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index ded827944fa6..024191004982 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -109,6 +109,12 @@ static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk)
return (struct ipv6_pinfo *)(((u8 *)sk) + offset);
}

+void inet6_sock_destruct(struct sock *sk)
+{
+ inet6_cleanup_sock(sk);
+ inet_sock_destruct(sk);
+}
+
static int inet6_create(struct net *net, struct socket *sock, int protocol,
int kern)
{
@@ -201,7 +207,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
inet->hdrincl = 1;
}

- sk->sk_destruct = inet_sock_destruct;
+ sk->sk_destruct = inet6_sock_destruct;
sk->sk_family = PF_INET6;
sk->sk_protocol = protocol;

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 91e795bb9ade..8d09f0ea5b8c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -56,6 +56,19 @@
#include <trace/events/skb.h>
#include "udp_impl.h"

+static void udpv6_destruct_sock(struct sock *sk)
+{
+ udp_destruct_common(sk);
+ inet6_sock_destruct(sk);
+}
+
+int udpv6_init_sock(struct sock *sk)
+{
+ skb_queue_head_init(&udp_sk(sk)->reader_queue);
+ sk->sk_destruct = udpv6_destruct_sock;
+ return 0;
+}
+
static u32 udp6_ehashfn(const struct net *net,
const struct in6_addr *laddr,
const u16 lport,
@@ -1733,7 +1746,7 @@ struct proto udpv6_prot = {
.connect = ip6_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
- .init = udp_init_sock,
+ .init = udpv6_init_sock,
.destroy = udpv6_destroy_sock,
.setsockopt = udpv6_setsockopt,
.getsockopt = udpv6_getsockopt,
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 4251e49d32a0..0590f566379d 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -12,6 +12,7 @@ int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int);
int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
__be32, struct udp_table *);

+int udpv6_init_sock(struct sock *sk);
int udp_v6_get_port(struct sock *sk, unsigned short snum);
void udp_v6_rehash(struct sock *sk);

diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index b70725856259..67eaf3ca14ce 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -12,6 +12,13 @@
#include <linux/proc_fs.h>
#include "udp_impl.h"

+static int udplitev6_sk_init(struct sock *sk)
+{
+ udpv6_init_sock(sk);
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
static int udplitev6_rcv(struct sk_buff *skb)
{
return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
@@ -38,7 +45,7 @@ struct proto udplitev6_prot = {
.connect = ip6_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
- .init = udplite_sk_init,
+ .init = udplitev6_sk_init,
.destroy = udpv6_destroy_sock,
.setsockopt = udpv6_setsockopt,
.getsockopt = udpv6_getsockopt,
--
2.30.2

2022-10-06 20:08:16

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v5 net 5/5] tcp: Fix data races around icsk->icsk_af_ops.

setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To
avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
for the reads and writes.

Thanks to Eric Dumazet for providing the syzbot report:

BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect

write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0

Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/26/2022

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/tcp.c | 10 ++++++----
net/ipv6/ipv6_sockglue.c | 3 ++-
net/ipv6/tcp_ipv6.c | 6 ++++--
3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0c51abeee172..f8232811a5be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3796,8 +3796,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
const struct inet_connection_sock *icsk = inet_csk(sk);

if (level != SOL_TCP)
- return icsk->icsk_af_ops->setsockopt(sk, level, optname,
- optval, optlen);
+ /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */
+ return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
+ optval, optlen);
return do_tcp_setsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(tcp_setsockopt);
@@ -4396,8 +4397,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
struct inet_connection_sock *icsk = inet_csk(sk);

if (level != SOL_TCP)
- return icsk->icsk_af_ops->getsockopt(sk, level, optname,
- optval, optlen);
+ /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */
+ return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
+ optval, optlen);
return do_tcp_getsockopt(sk, level, optname, USER_SOCKPTR(optval),
USER_SOCKPTR(optlen));
}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d7207a546aec..532f4478c884 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -479,7 +479,8 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,

/* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
WRITE_ONCE(sk->sk_prot, &tcp_prot);
- icsk->icsk_af_ops = &ipv4_specific;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
sk->sk_socket->ops = &inet_stream_ops;
sk->sk_family = PF_INET;
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index a8adda623da1..2a3f9296df1e 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -238,7 +238,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
sin.sin_port = usin->sin6_port;
sin.sin_addr.s_addr = usin->sin6_addr.s6_addr32[3];

- icsk->icsk_af_ops = &ipv6_mapped;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_mapped);
if (sk_is_mptcp(sk))
mptcpv6_handle_mapped(sk, true);
sk->sk_backlog_rcv = tcp_v4_do_rcv;
@@ -250,7 +251,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,

if (err) {
icsk->icsk_ext_hdr_len = exthdrlen;
- icsk->icsk_af_ops = &ipv6_specific;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_specific);
if (sk_is_mptcp(sk))
mptcpv6_handle_mapped(sk, false);
sk->sk_backlog_rcv = tcp_v6_do_rcv;
--
2.30.2

2022-10-11 10:30:45

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v5 net 2/5] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).

On Thu, 2022-10-06 at 11:53 -0700, Kuniyuki Iwashima wrote:
> Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot
> to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
> socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is
> changed to udp_prot and ->destroy() never cleans it up, resulting in
> a memory leak.
>
> This is due to the discrepancy between inet6_destroy_sock() and
> IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
> to remove the difference.
>
> However, this is not enough for now because rxpmtu can be changed
> without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless
> sendmsg() support"). We will fix this case in the following patch.
>
> Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
> remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
> in the future.
>
> Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support")
> Signed-off-by: Kuniyuki Iwashima <[email protected]>
> ---
> Cc: Brian Haley <[email protected]>
> ---
> include/net/ipv6.h | 1 +
> net/ipv6/af_inet6.c | 6 ++++++
> net/ipv6/ipv6_sockglue.c | 20 ++++++++------------
> 3 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index d664ba5812d8..335a49ecd8a0 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -1182,6 +1182,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port,
> void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info);
> void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
>
> +void inet6_cleanup_sock(struct sock *sk);
> int inet6_release(struct socket *sock);
> int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
> int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index d40b7d60e00e..ded827944fa6 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -510,6 +510,12 @@ void inet6_destroy_sock(struct sock *sk)
> }
> EXPORT_SYMBOL_GPL(inet6_destroy_sock);
>
> +void inet6_cleanup_sock(struct sock *sk)
> +{
> + inet6_destroy_sock(sk);
> +}
> +EXPORT_SYMBOL_GPL(inet6_cleanup_sock);
> +
> /*
> * This does both peername and sockname.
> */
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 408345fc4c5c..a20edae868fd 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -431,9 +431,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> if (optlen < sizeof(int))
> goto e_inval;
> if (val == PF_INET) {
> - struct ipv6_txoptions *opt;
> - struct sk_buff *pktopt;
> -
> if (sk->sk_type == SOCK_RAW)
> break;
>
> @@ -464,7 +461,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> break;
> }
>
> - fl6_free_socklist(sk);
> __ipv6_sock_mc_close(sk);
> __ipv6_sock_ac_close(sk);
>
> @@ -501,14 +497,14 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> sk->sk_socket->ops = &inet_dgram_ops;
> sk->sk_family = PF_INET;
> }
> - opt = xchg((__force struct ipv6_txoptions **)&np->opt,
> - NULL);
> - if (opt) {
> - atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
> - txopt_put(opt);
> - }
> - pktopt = xchg(&np->pktoptions, NULL);
> - kfree_skb(pktopt);
> +
> + /* Disable all options not to allocate memory anymore,
> + * but there is still a race. See the lockless path
> + * in udpv6_sendmsg() and ipv6_local_rxpmtu().
> + */
> + np->rxopt.all = 0;
> +
> + inet6_cleanup_sock(sk);

I think there still a pending point raised from Eric here.

"""
Once the v6 socket has been transformed to IPv4 one,
inet6_sock_destruct() is not going to be called.
"""

AFAICS the series is safe Kuniyuki noted:

"""
inet6_sock_destruct() is set to sk->sk_destruct(), which is not changed
by the transformation and will be called from __sk_destruct().
"""
[with the next patch]

@Eric are you fine with the above?

Thanks!

Paolo


>
> /*
> * ... and add it to the refcnt debug socks count

2022-10-13 02:24:08

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH v5 net 0/5] tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM.

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <[email protected]>:

On Thu, 6 Oct 2022 11:53:44 -0700 you wrote:
> This series fixes some memory leaks and data races caused in the
> same scenario where one thread converts an IPv6 socket into IPv4
> with IPV6_ADDRFORM and another accesses the socket concurrently.
>
>
> Changes:
> v5:
> * Patch 1 & 5
> * Rebase and resolve conflicts with 24426654ed3a and 34704ef024ae
>
> [...]

Here is the summary with links:
- [v5,net,1/5] tcp/udp: Fix memory leak in ipv6_renew_options().
https://git.kernel.org/netdev/net/c/3c52c6bb831f
- [v5,net,2/5] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
https://git.kernel.org/netdev/net/c/21985f43376c
- [v5,net,3/5] tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
https://git.kernel.org/netdev/net/c/d38afeec26ed
- [v5,net,4/5] ipv6: Fix data races around sk->sk_prot.
https://git.kernel.org/netdev/net/c/364f997b5cfe
- [v5,net,5/5] tcp: Fix data races around icsk->icsk_af_ops.
https://git.kernel.org/netdev/net/c/f49cd2f4d617

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html