Changes from v3:
- Used atomic_try_cmpxchg() as suggested by Peter Zijlstra
- Renamed static_key_fast_inc() => static_key_fast_inc_not_negative()
(addressing Peter Zijlstra's review)
- Based on linux-tip/master
- tcp_md5_key_copy() now does net_warn_ratelimited()
(addressing Peter Zijlstra's review)
tcp_md5_do_add() does not as it returns -EUSERS from setsockopt()
syscall back to the userspace
- Corrected WARN_ON_ONCE(!static_key_fast_inc(key))
(Spotted by Jason Baron)
- Moved declaration of static_key_fast_inc_not_negative() and its
EXPORT_SYMBOL_GPL() to the patch 3 that uses it,
"net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction"
(addressing Peter Zijlstra's review)
- Added patch 4 that destroys the newly created request socket
if md5 info allocation or static_key increment was unsuccessful.
Instead of proceeding to add a socket without TCP-MD5 keys.
- Added patch 5 that separates helper tcp_time_wait_init()
and converts BUG_ON() to WARN_ON_ONCE().
Changes from v2:
- Prevent key->enabled from turning negative by overflow from
static_key_slow_inc() or static_key_fast_inc()
(addressing Peter Zijlstra's review)
- Added checks if static_branch_inc() and static_key_fast_int()
were successful to TCP-MD5 code.
Changes from v1:
- Add static_key_fast_inc() helper rather than open-coded atomic_inc()
(as suggested by Eric Dumazet)
Version 3:
https://lore.kernel.org/all/[email protected]/T/#u
Version 2:
https://lore.kernel.org/all/[email protected]/T/#u
Version 1:
https://lore.kernel.org/all/[email protected]/T/#u
The static key introduced by commit 6015c71e656b ("tcp: md5: add
tcp_md5_needed jump label") is a fast-path optimization aimed at
avoiding a cache line miss.
Once an MD5 key is introduced in the system the static key is enabled
and never disabled. Address this by disabling the static key when
the last tcp_md5sig_info in system is destroyed.
Previously it was submitted as a part of TCP-AO patches set [1].
Now in attempt to split 36 patches submission, I send this independently.
Cc: Ard Biesheuvel <[email protected]>
Cc: Bob Gilligan <[email protected]>
Cc: David Ahern <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Francesco Ruggeri <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Salam Noureddine <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
[1]: https://lore.kernel.org/all/[email protected]/T/#u
Thanks,
Dmitry
Dmitry Safonov (5):
jump_label: Prevent key->enabled int overflow
net/tcp: Separate tcp_md5sig_info allocation into
tcp_md5sig_info_add()
net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
net/tcp: Do cleanup on tcp_md5_key_copy() failure
net/tcp: Separate initialization of twsk
include/linux/jump_label.h | 21 +++++++--
include/net/tcp.h | 10 ++--
kernel/jump_label.c | 55 +++++++++++++++++-----
net/ipv4/tcp.c | 5 +-
net/ipv4/tcp_ipv4.c | 94 +++++++++++++++++++++++++++++---------
net/ipv4/tcp_minisocks.c | 61 ++++++++++++++++---------
net/ipv4/tcp_output.c | 4 +-
net/ipv6/tcp_ipv6.c | 21 ++++-----
8 files changed, 191 insertions(+), 80 deletions(-)
base-commit: 094226ad94f471a9f19e8f8e7140a09c2625abaa
--
2.38.1
If the kernel was short on (atomic) memory and failed to allocate it -
don't proceed to creation of request socket. Otherwise the socket would
be unsigned and userspace likely doesn't expect that the TCP is not
MD5-signed anymore.
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/ipv4/tcp_ipv4.c | 9 ++-------
net/ipv6/tcp_ipv6.c | 15 ++++++++-------
2 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4bdb6e1ecaf3..deabf3309865 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1628,13 +1628,8 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
addr = (union tcp_md5_addr *)&newinet->inet_daddr;
key = tcp_md5_do_lookup(sk, l3index, addr, AF_INET);
if (key) {
- /*
- * We're using one, so create a matching key
- * on the newsk structure. If we fail to get
- * memory, then we end up not copying the key
- * across. Shucks.
- */
- tcp_md5_key_copy(newsk, addr, AF_INET, 32, l3index, key);
+ if (tcp_md5_key_copy(newsk, addr, AF_INET, 32, l3index, key))
+ goto put_and_exit;
sk_gso_disable(newsk);
}
#endif
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 3e3bdc120fc8..64788cfbefc7 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1376,13 +1376,14 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
/* Copy over the MD5 key from the original socket */
key = tcp_v6_md5_do_lookup(sk, &newsk->sk_v6_daddr, l3index);
if (key) {
- /* We're using one, so create a matching key
- * on the newsk structure. If we fail to get
- * memory, then we end up not copying the key
- * across. Shucks.
- */
- tcp_md5_key_copy(newsk, (union tcp_md5_addr *)&newsk->sk_v6_daddr,
- AF_INET6, 128, l3index, key);
+ const union tcp_md5_addr *addr;
+
+ addr = (union tcp_md5_addr *)&newsk->sk_v6_daddr;
+ if (tcp_md5_key_copy(newsk, addr, AF_INET6, 128, l3index, key)) {
+ inet_csk_prepare_forced_close(newsk);
+ tcp_done(newsk);
+ goto out;
+ }
}
#endif
--
2.38.1
1. With CONFIG_JUMP_LABEL=n static_key_slow_inc() doesn't have any
protection against key->enabled refcounter overflow.
2. With CONFIG_JUMP_LABEL=y static_key_slow_inc_cpuslocked()
still may turn the refcounter negative as (v + 1) may overflow.
key->enabled is indeed a ref-counter as it's documented in multiple
places: top comment in jump_label.h, Documentation/staging/static-keys.rst,
etc.
As -1 is reserved for static key that's in process of being enabled,
functions would break with negative key->enabled refcount:
- for CONFIG_JUMP_LABEL=n negative return of static_key_count()
breaks static_key_false(), static_key_true()
- the ref counter may become 0 from negative side by too many
static_key_slow_inc() calls and lead to use-after-free issues.
These flaws result in that some users have to introduce an additional
mutex and prevent the reference counter from overflowing themselves,
see bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
Prevent the reference counter overflow by checking if (v + 1) > 0.
Change functions API to return whether the increment was successful.
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/linux/jump_label.h | 19 +++++++++++---
kernel/jump_label.c | 54 +++++++++++++++++++++++++++++---------
2 files changed, 57 insertions(+), 16 deletions(-)
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 570831ca9951..c0a02d4c2ea2 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -224,9 +224,9 @@ extern bool arch_jump_label_transform_queue(struct jump_entry *entry,
enum jump_label_type type);
extern void arch_jump_label_transform_apply(void);
extern int jump_label_text_reserved(void *start, void *end);
-extern void static_key_slow_inc(struct static_key *key);
+extern bool static_key_slow_inc(struct static_key *key);
extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_inc_cpuslocked(struct static_key *key);
+extern bool static_key_slow_inc_cpuslocked(struct static_key *key);
extern void static_key_slow_dec_cpuslocked(struct static_key *key);
extern int static_key_count(struct static_key *key);
extern void static_key_enable(struct static_key *key);
@@ -278,10 +278,21 @@ static __always_inline bool static_key_true(struct static_key *key)
return false;
}
-static inline void static_key_slow_inc(struct static_key *key)
+static inline bool static_key_slow_inc(struct static_key *key)
{
+ int v;
+
STATIC_KEY_CHECK_USE(key);
- atomic_inc(&key->enabled);
+ /*
+ * Prevent key->enabled getting negative to follow the same semantics
+ * as for CONFIG_JUMP_LABEL=y, see kernel/jump_label.c comment.
+ */
+ v = atomic_read(&key->enabled);
+ do {
+ if (v < 0 || (v + 1) < 0)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
+ return true;
}
static inline void static_key_slow_dec(struct static_key *key)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4d6c6f5f60db..677a6674c130 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -113,9 +113,38 @@ int static_key_count(struct static_key *key)
}
EXPORT_SYMBOL_GPL(static_key_count);
-void static_key_slow_inc_cpuslocked(struct static_key *key)
+/***
+ * static_key_fast_inc_not_negative - adds a user for a static key
+ * @key: static key that must be already enabled
+ *
+ * The caller must make sure that the static key can't get disabled while
+ * in this function. It doesn't patch jump labels, only adds a user to
+ * an already enabled static key.
+ *
+ * Returns true if the increment was done.
+ */
+static bool static_key_fast_inc_not_negative(struct static_key *key)
{
+ int v;
+
STATIC_KEY_CHECK_USE(key);
+ /*
+ * Negative key->enabled has a special meaning: it sends
+ * static_key_slow_inc() down the slow path, and it is non-zero
+ * so it counts as "enabled" in jump_label_update(). Note that
+ * atomic_inc_unless_negative() checks >= 0, so roll our own.
+ */
+ v = atomic_read(&key->enabled);
+ do {
+ if (v <= 0 || (v + 1) < 0)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
+
+ return true;
+}
+
+bool static_key_slow_inc_cpuslocked(struct static_key *key)
+{
lockdep_assert_cpus_held();
/*
@@ -124,15 +153,9 @@ void static_key_slow_inc_cpuslocked(struct static_key *key)
* jump_label_update() process. At the same time, however,
* the jump_label_update() call below wants to see
* static_key_enabled(&key) for jumps to be updated properly.
- *
- * So give a special meaning to negative key->enabled: it sends
- * static_key_slow_inc() down the slow path, and it is non-zero
- * so it counts as "enabled" in jump_label_update(). Note that
- * atomic_inc_unless_negative() checks >= 0, so roll our own.
*/
- for (int v = atomic_read(&key->enabled); v > 0; )
- if (likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)))
- return;
+ if (static_key_fast_inc_not_negative(key))
+ return true;
jump_label_lock();
if (atomic_read(&key->enabled) == 0) {
@@ -144,16 +167,23 @@ void static_key_slow_inc_cpuslocked(struct static_key *key)
*/
atomic_set_release(&key->enabled, 1);
} else {
- atomic_inc(&key->enabled);
+ if (WARN_ON_ONCE(!static_key_fast_inc_not_negative(key))) {
+ jump_label_unlock();
+ return false;
+ }
}
jump_label_unlock();
+ return true;
}
-void static_key_slow_inc(struct static_key *key)
+bool static_key_slow_inc(struct static_key *key)
{
+ bool ret;
+
cpus_read_lock();
- static_key_slow_inc_cpuslocked(key);
+ ret = static_key_slow_inc_cpuslocked(key);
cpus_read_unlock();
+ return ret;
}
EXPORT_SYMBOL_GPL(static_key_slow_inc);
--
2.38.1
To do that, separate two scenarios:
- where it's the first MD5 key on the system, which means that enabling
of the static key may need to sleep;
- copying of an existing key from a listening socket to the request
socket upon receiving a signed TCP segment, where static key was
already enabled (when the key was added to the listening socket).
Now the life-time of the static branch for TCP-MD5 is until:
- last tcp_md5sig_info is destroyed
- last socket in time-wait state with MD5 key is closed.
Which means that after all sockets with TCP-MD5 keys are gone, the
system gets back the performance of disabled md5-key static branch.
While at here, provide static_key_fast_inc() helper that does ref
counter increment in atomic fashion (without grabbing cpus_read_lock()
on CONFIG_JUMP_LABEL=y). This is needed to add a new user for
a static_key when the caller controls the lifetime of another user.
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/linux/jump_label.h | 4 ++-
include/net/tcp.h | 10 ++++--
kernel/jump_label.c | 3 +-
net/ipv4/tcp.c | 5 +--
net/ipv4/tcp_ipv4.c | 69 +++++++++++++++++++++++++++++++-------
net/ipv4/tcp_minisocks.c | 16 ++++++---
net/ipv4/tcp_output.c | 4 +--
net/ipv6/tcp_ipv6.c | 10 +++---
8 files changed, 87 insertions(+), 34 deletions(-)
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index c0a02d4c2ea2..f3fc5081cae6 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -225,6 +225,7 @@ extern bool arch_jump_label_transform_queue(struct jump_entry *entry,
extern void arch_jump_label_transform_apply(void);
extern int jump_label_text_reserved(void *start, void *end);
extern bool static_key_slow_inc(struct static_key *key);
+extern bool static_key_fast_inc_not_negative(struct static_key *key);
extern void static_key_slow_dec(struct static_key *key);
extern bool static_key_slow_inc_cpuslocked(struct static_key *key);
extern void static_key_slow_dec_cpuslocked(struct static_key *key);
@@ -278,7 +279,7 @@ static __always_inline bool static_key_true(struct static_key *key)
return false;
}
-static inline bool static_key_slow_inc(struct static_key *key)
+static inline bool static_key_fast_inc_not_negative(struct static_key *key)
{
int v;
@@ -294,6 +295,7 @@ static inline bool static_key_slow_inc(struct static_key *key)
} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
return true;
}
+#define static_key_slow_inc(key) static_key_fast_inc_not_negative(key)
static inline void static_key_slow_dec(struct static_key *key)
{
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 14d45661a84d..a0cdf013782a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1675,7 +1675,11 @@ int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key,
const struct sock *sk, const struct sk_buff *skb);
int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
int family, u8 prefixlen, int l3index, u8 flags,
- const u8 *newkey, u8 newkeylen, gfp_t gfp);
+ const u8 *newkey, u8 newkeylen);
+int tcp_md5_key_copy(struct sock *sk, const union tcp_md5_addr *addr,
+ int family, u8 prefixlen, int l3index,
+ struct tcp_md5sig_key *key);
+
int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr,
int family, u8 prefixlen, int l3index, u8 flags);
struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
@@ -1683,7 +1687,7 @@ struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
#ifdef CONFIG_TCP_MD5SIG
#include <linux/jump_label.h>
-extern struct static_key_false tcp_md5_needed;
+extern struct static_key_false_deferred tcp_md5_needed;
struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, int l3index,
const union tcp_md5_addr *addr,
int family);
@@ -1691,7 +1695,7 @@ static inline struct tcp_md5sig_key *
tcp_md5_do_lookup(const struct sock *sk, int l3index,
const union tcp_md5_addr *addr, int family)
{
- if (!static_branch_unlikely(&tcp_md5_needed))
+ if (!static_branch_unlikely(&tcp_md5_needed.key))
return NULL;
return __tcp_md5_do_lookup(sk, l3index, addr, family);
}
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 677a6674c130..32c785f5d2b1 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -123,7 +123,7 @@ EXPORT_SYMBOL_GPL(static_key_count);
*
* Returns true if the increment was done.
*/
-static bool static_key_fast_inc_not_negative(struct static_key *key)
+bool static_key_fast_inc_not_negative(struct static_key *key)
{
int v;
@@ -142,6 +142,7 @@ static bool static_key_fast_inc_not_negative(struct static_key *key)
return true;
}
+EXPORT_SYMBOL_GPL(static_key_fast_inc_not_negative);
bool static_key_slow_inc_cpuslocked(struct static_key *key)
{
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 54836a6b81d6..07a73c9b49da 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4460,11 +4460,8 @@ bool tcp_alloc_md5sig_pool(void)
if (unlikely(!READ_ONCE(tcp_md5sig_pool_populated))) {
mutex_lock(&tcp_md5sig_mutex);
- if (!tcp_md5sig_pool_populated) {
+ if (!tcp_md5sig_pool_populated)
__tcp_alloc_md5sig_pool();
- if (tcp_md5sig_pool_populated)
- static_branch_inc(&tcp_md5_needed);
- }
mutex_unlock(&tcp_md5sig_mutex);
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fae80b1a1796..4bdb6e1ecaf3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1064,7 +1064,7 @@ static void tcp_v4_reqsk_destructor(struct request_sock *req)
* We need to maintain these in the sk structure.
*/
-DEFINE_STATIC_KEY_FALSE(tcp_md5_needed);
+DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_md5_needed, HZ);
EXPORT_SYMBOL(tcp_md5_needed);
static bool better_md5_match(struct tcp_md5sig_key *old, struct tcp_md5sig_key *new)
@@ -1177,9 +1177,6 @@ static int tcp_md5sig_info_add(struct sock *sk, gfp_t gfp)
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_md5sig_info *md5sig;
- if (rcu_dereference_protected(tp->md5sig_info, lockdep_sock_is_held(sk)))
- return 0;
-
md5sig = kmalloc(sizeof(*md5sig), gfp);
if (!md5sig)
return -ENOMEM;
@@ -1191,9 +1188,9 @@ static int tcp_md5sig_info_add(struct sock *sk, gfp_t gfp)
}
/* This can be called on a newly created socket, from other files */
-int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
- int family, u8 prefixlen, int l3index, u8 flags,
- const u8 *newkey, u8 newkeylen, gfp_t gfp)
+static int __tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
+ int family, u8 prefixlen, int l3index, u8 flags,
+ const u8 *newkey, u8 newkeylen, gfp_t gfp)
{
/* Add Key to the list */
struct tcp_md5sig_key *key;
@@ -1220,9 +1217,6 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
return 0;
}
- if (tcp_md5sig_info_add(sk, gfp))
- return -ENOMEM;
-
md5sig = rcu_dereference_protected(tp->md5sig_info,
lockdep_sock_is_held(sk));
@@ -1246,8 +1240,57 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
hlist_add_head_rcu(&key->node, &md5sig->head);
return 0;
}
+
+int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
+ int family, u8 prefixlen, int l3index, u8 flags,
+ const u8 *newkey, u8 newkeylen)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (!rcu_dereference_protected(tp->md5sig_info, lockdep_sock_is_held(sk))) {
+ if (tcp_md5sig_info_add(sk, GFP_KERNEL))
+ return -ENOMEM;
+
+ if (!static_branch_inc(&tcp_md5_needed.key)) {
+ struct tcp_md5sig_info *md5sig = tp->md5sig_info;
+
+ rcu_assign_pointer(tp->md5sig_info, NULL);
+ kfree_rcu(md5sig);
+ return -EUSERS;
+ }
+ }
+
+ return __tcp_md5_do_add(sk, addr, family, prefixlen, l3index, flags,
+ newkey, newkeylen, GFP_KERNEL);
+}
EXPORT_SYMBOL(tcp_md5_do_add);
+int tcp_md5_key_copy(struct sock *sk, const union tcp_md5_addr *addr,
+ int family, u8 prefixlen, int l3index,
+ struct tcp_md5sig_key *key)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (!rcu_dereference_protected(tp->md5sig_info, lockdep_sock_is_held(sk))) {
+ if (tcp_md5sig_info_add(sk, sk_gfp_mask(sk, GFP_ATOMIC)))
+ return -ENOMEM;
+
+ if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key)) {
+ struct tcp_md5sig_info *md5sig = tp->md5sig_info;
+
+ net_warn_ratelimited("Too many TCP-MD5 keys in the system\n");
+ rcu_assign_pointer(tp->md5sig_info, NULL);
+ kfree_rcu(md5sig);
+ return -EUSERS;
+ }
+ }
+
+ return __tcp_md5_do_add(sk, addr, family, prefixlen, l3index,
+ key->flags, key->key, key->keylen,
+ sk_gfp_mask(sk, GFP_ATOMIC));
+}
+EXPORT_SYMBOL(tcp_md5_key_copy);
+
int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr, int family,
u8 prefixlen, int l3index, u8 flags)
{
@@ -1334,7 +1377,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, int optname,
return -EINVAL;
return tcp_md5_do_add(sk, addr, AF_INET, prefixlen, l3index, flags,
- cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
+ cmd.tcpm_key, cmd.tcpm_keylen);
}
static int tcp_v4_md5_hash_headers(struct tcp_md5sig_pool *hp,
@@ -1591,8 +1634,7 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
* memory, then we end up not copying the key
* across. Shucks.
*/
- tcp_md5_do_add(newsk, addr, AF_INET, 32, l3index, key->flags,
- key->key, key->keylen, GFP_ATOMIC);
+ tcp_md5_key_copy(newsk, addr, AF_INET, 32, l3index, key);
sk_gso_disable(newsk);
}
#endif
@@ -2284,6 +2326,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
tcp_clear_md5_list(sk);
kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu);
tp->md5sig_info = NULL;
+ static_branch_slow_dec_deferred(&tcp_md5_needed);
}
#endif
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index c375f603a16c..50f91c10eb7b 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -291,13 +291,19 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
*/
do {
tcptw->tw_md5_key = NULL;
- if (static_branch_unlikely(&tcp_md5_needed)) {
+ if (static_branch_unlikely(&tcp_md5_needed.key)) {
struct tcp_md5sig_key *key;
key = tp->af_specific->md5_lookup(sk, sk);
if (key) {
tcptw->tw_md5_key = kmemdup(key, sizeof(*key), GFP_ATOMIC);
- BUG_ON(tcptw->tw_md5_key && !tcp_alloc_md5sig_pool());
+ if (!tcptw->tw_md5_key)
+ break;
+ BUG_ON(!tcp_alloc_md5sig_pool());
+ if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key)) {
+ kfree(tcptw->tw_md5_key);
+ tcptw->tw_md5_key = NULL;
+ }
}
}
} while (0);
@@ -337,11 +343,13 @@ EXPORT_SYMBOL(tcp_time_wait);
void tcp_twsk_destructor(struct sock *sk)
{
#ifdef CONFIG_TCP_MD5SIG
- if (static_branch_unlikely(&tcp_md5_needed)) {
+ if (static_branch_unlikely(&tcp_md5_needed.key)) {
struct tcp_timewait_sock *twsk = tcp_twsk(sk);
- if (twsk->tw_md5_key)
+ if (twsk->tw_md5_key) {
kfree_rcu(twsk->tw_md5_key, rcu);
+ static_branch_slow_dec_deferred(&tcp_md5_needed);
+ }
}
#endif
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c69f4d966024..86e71c8c76bc 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -766,7 +766,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
*md5 = NULL;
#ifdef CONFIG_TCP_MD5SIG
- if (static_branch_unlikely(&tcp_md5_needed) &&
+ if (static_branch_unlikely(&tcp_md5_needed.key) &&
rcu_access_pointer(tp->md5sig_info)) {
*md5 = tp->af_specific->md5_lookup(sk, sk);
if (*md5) {
@@ -922,7 +922,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
*md5 = NULL;
#ifdef CONFIG_TCP_MD5SIG
- if (static_branch_unlikely(&tcp_md5_needed) &&
+ if (static_branch_unlikely(&tcp_md5_needed.key) &&
rcu_access_pointer(tp->md5sig_info)) {
*md5 = tp->af_specific->md5_lookup(sk, sk);
if (*md5) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2a3f9296df1e..3e3bdc120fc8 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -677,12 +677,11 @@ static int tcp_v6_parse_md5_keys(struct sock *sk, int optname,
if (ipv6_addr_v4mapped(&sin6->sin6_addr))
return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3],
AF_INET, prefixlen, l3index, flags,
- cmd.tcpm_key, cmd.tcpm_keylen,
- GFP_KERNEL);
+ cmd.tcpm_key, cmd.tcpm_keylen);
return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin6->sin6_addr,
AF_INET6, prefixlen, l3index, flags,
- cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
+ cmd.tcpm_key, cmd.tcpm_keylen);
}
static int tcp_v6_md5_hash_headers(struct tcp_md5sig_pool *hp,
@@ -1382,9 +1381,8 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
* memory, then we end up not copying the key
* across. Shucks.
*/
- tcp_md5_do_add(newsk, (union tcp_md5_addr *)&newsk->sk_v6_daddr,
- AF_INET6, 128, l3index, key->flags, key->key, key->keylen,
- sk_gfp_mask(sk, GFP_ATOMIC));
+ tcp_md5_key_copy(newsk, (union tcp_md5_addr *)&newsk->sk_v6_daddr,
+ AF_INET6, 128, l3index, key);
}
#endif
--
2.38.1
Convert BUG_ON() to WARN_ON_ONCE() and warn as well for unlikely
static key int overflow error-path.
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/ipv4/tcp_minisocks.c | 61 +++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 26 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 50f91c10eb7b..1cfafad9ba29 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -240,6 +240,40 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
}
EXPORT_SYMBOL(tcp_timewait_state_process);
+static void tcp_time_wait_init(struct sock *sk, struct tcp_timewait_sock *tcptw)
+{
+#ifdef CONFIG_TCP_MD5SIG
+ const struct tcp_sock *tp = tcp_sk(sk);
+ struct tcp_md5sig_key *key;
+
+ /*
+ * The timewait bucket does not have the key DB from the
+ * sock structure. We just make a quick copy of the
+ * md5 key being used (if indeed we are using one)
+ * so the timewait ack generating code has the key.
+ */
+ tcptw->tw_md5_key = NULL;
+ if (!static_branch_unlikely(&tcp_md5_needed.key))
+ return;
+
+ key = tp->af_specific->md5_lookup(sk, sk);
+ if (key) {
+ tcptw->tw_md5_key = kmemdup(key, sizeof(*key), GFP_ATOMIC);
+ if (!tcptw->tw_md5_key)
+ return;
+ if (!tcp_alloc_md5sig_pool())
+ goto out_free;
+ if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key))
+ goto out_free;
+ }
+ return;
+out_free:
+ WARN_ON_ONCE(1);
+ kfree(tcptw->tw_md5_key);
+ tcptw->tw_md5_key = NULL;
+#endif
+}
+
/*
* Move a socket to time-wait or dead fin-wait-2 state.
*/
@@ -282,32 +316,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
}
#endif
-#ifdef CONFIG_TCP_MD5SIG
- /*
- * The timewait bucket does not have the key DB from the
- * sock structure. We just make a quick copy of the
- * md5 key being used (if indeed we are using one)
- * so the timewait ack generating code has the key.
- */
- do {
- tcptw->tw_md5_key = NULL;
- if (static_branch_unlikely(&tcp_md5_needed.key)) {
- struct tcp_md5sig_key *key;
-
- key = tp->af_specific->md5_lookup(sk, sk);
- if (key) {
- tcptw->tw_md5_key = kmemdup(key, sizeof(*key), GFP_ATOMIC);
- if (!tcptw->tw_md5_key)
- break;
- BUG_ON(!tcp_alloc_md5sig_pool());
- if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key)) {
- kfree(tcptw->tw_md5_key);
- tcptw->tw_md5_key = NULL;
- }
- }
- }
- } while (0);
-#endif
+ tcp_time_wait_init(sk, tcptw);
/* Get the TIME_WAIT timeout firing. */
if (timeo < rto)
--
2.38.1
On Tue, 15 Nov 2022 21:19:03 +0000 Dmitry Safonov wrote:
> + if (!rcu_dereference_protected(tp->md5sig_info, lockdep_sock_is_held(sk))) {
> + if (tcp_md5sig_info_add(sk, sk_gfp_mask(sk, GFP_ATOMIC)))
> + return -ENOMEM;
> +
> + if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key)) {
> + struct tcp_md5sig_info *md5sig = tp->md5sig_info;
I don't think sparse will be able to deduce that ->md5sig_info access
is safe here, so could you wrap it up as well? Maybe it wouldn't be
the worst move to provide a sk_rcu_dereference() or rcu_dereference_sk()
or some such wrapper.
More importantly tho - was the merging part for this patches discussed?
They don't apply to net-next.
On 11/19/22 03:18, Jakub Kicinski wrote:
> On Tue, 15 Nov 2022 21:19:03 +0000 Dmitry Safonov wrote:
>> + if (!rcu_dereference_protected(tp->md5sig_info, lockdep_sock_is_held(sk))) {
>> + if (tcp_md5sig_info_add(sk, sk_gfp_mask(sk, GFP_ATOMIC)))
>> + return -ENOMEM;
>> +
>> + if (!static_key_fast_inc_not_negative(&tcp_md5_needed.key.key)) {
>> + struct tcp_md5sig_info *md5sig = tp->md5sig_info;
>
> I don't think sparse will be able to deduce that ->md5sig_info access
> is safe here, so could you wrap it up as well?
Sure, that sounds logical to do.
> Maybe it wouldn't be
> the worst move to provide a sk_rcu_dereference() or rcu_dereference_sk()
> or some such wrapper.
>
> More importantly tho - was the merging part for this patches discussed?
> They don't apply to net-next.
They apply over linux-next as there's a change [1] in
linux-tip/locking/core on which the patches set based.
Could the way forward be through linux-tip tree, or that might create
net conflicts?
I'll send v5 with the trivial change to rcu_dereference_protected()
mentioned above.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=locking/core&id=d0c006402e7941558e5283ae434e2847c7999378
Thanks,
Dmitry
On Mon, 21 Nov 2022 20:31:38 +0000 Dmitry Safonov wrote:
> > Maybe it wouldn't be
> > the worst move to provide a sk_rcu_dereference() or rcu_dereference_sk()
> > or some such wrapper.
> >
> > More importantly tho - was the merging part for this patches discussed?
> > They don't apply to net-next.
>
> They apply over linux-next as there's a change [1] in
> linux-tip/locking/core on which the patches set based.
>
> Could the way forward be through linux-tip tree, or that might create
> net conflicts?
Dunno from memory, too much happens in these files :S
Could you cherry-pick [1] onto net-next and see if
git am --no-3way patches/*
goes thru cleanly? If so no objections for the patches to go via tip,
we're close enough to the merge window.
> I'll send v5 with the trivial change to rcu_dereference_protected()
> mentioned above.
On 11/21/22 20:41, Jakub Kicinski wrote:
> On Mon, 21 Nov 2022 20:31:38 +0000 Dmitry Safonov wrote:
>>> Maybe it wouldn't be
>>> the worst move to provide a sk_rcu_dereference() or rcu_dereference_sk()
>>> or some such wrapper.
>>>
>>> More importantly tho - was the merging part for this patches discussed?
>>> They don't apply to net-next.
>>
>> They apply over linux-next as there's a change [1] in
>> linux-tip/locking/core on which the patches set based.
>>
>> Could the way forward be through linux-tip tree, or that might create
>> net conflicts?
>
> Dunno from memory, too much happens in these files :S
>
> Could you cherry-pick [1] onto net-next and see if
>
> git am --no-3way patches/*
>
> goes thru cleanly? If so no objections for the patches to go via tip,
> we're close enough to the merge window.
That did go cleanly for me on today's net-next/main.
>> I'll send v5 with the trivial change to rcu_dereference_protected()
>> mentioned above.
Thanks,
Dmitry
On Mon, Nov 21, 2022 at 7:53 PM Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 21 Nov 2022 20:56:18 +0000 Dmitry Safonov wrote:
> > > Dunno from memory, too much happens in these files :S
> > >
> > > Could you cherry-pick [1] onto net-next and see if
> > >
> > > git am --no-3way patches/*
> > >
> > > goes thru cleanly? If so no objections for the patches to go via tip,
> > > we're close enough to the merge window.
> >
> > That did go cleanly for me on today's net-next/main.
>
> Great, feel free to slap my:
>
> Acked-by: Jakub Kicinski <[email protected]>
>
> on v5. (But we'll still want a proper review from Eric.)
I'll try my best.
On Mon, 21 Nov 2022 20:56:18 +0000 Dmitry Safonov wrote:
> > Dunno from memory, too much happens in these files :S
> >
> > Could you cherry-pick [1] onto net-next and see if
> >
> > git am --no-3way patches/*
> >
> > goes thru cleanly? If so no objections for the patches to go via tip,
> > we're close enough to the merge window.
>
> That did go cleanly for me on today's net-next/main.
Great, feel free to slap my:
Acked-by: Jakub Kicinski <[email protected]>
on v5. (But we'll still want a proper review from Eric.)