2023-11-29 16:57:48

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 0/7] TCP-AO fixes

Hi,

Changes from v3:
- Don't restrict adding any keys on TCP-AO connection in VRF, but only
the ones that don't match l3index (David)

Changes from v2:
- rwlocks are problematic in net code (Paolo)
Changed the SNE code to avoid spin/rw locks on RX/TX fastpath by
double-accounting SEQ numbers for TCP-AO enabled connections.

Changes from v1:
- Use tcp_can_repair_sock() helper to limit TCP_AO_REPAIR (Eric)
- Instead of hook to listen() syscall, allow removing current/rnext keys
on TCP_LISTEN (addressing Eric's objection)
- Add sne_lock to protect snd_sne/rcv_sne
- Don't move used_tcp_ao in struct tcp_request_sock (Eric)

I've been working on TCP-AO key-rotation selftests and as a result
exercised some corner-cases that are not usually met in production.

Here are a bunch of semi-related fixes:
- Documentation typo (reported by Markus Elfring)
- Proper alignment for TCP-AO option in TCP header that has MAC length
of non 4 bytes (now a selftest with randomized maclen/algorithm/etc
passes)
- 3 uAPI restricting patches that disallow more things to userspace in
order to prevent it shooting itself in any parts of the body
- SNEs READ_ONCE()/WRITE_ONCE() that went missing by my human factor
- Avoid storing MAC length from SYN header as SYN-ACK will use
rnext_key.maclen (drops an extra check that fails on new selftests)

Please, consider applying/pulling.

The following changes since commit 3b47bc037bd44f142ac09848e8d3ecccc726be99:

Merge tag 'pinctrl-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl (2023-11-29 06:45:22 -0800)

are available in the Git repository at:

[email protected]:0x7f454c46/linux.git tcp-ao-post-merge-v4

for you to fetch changes up to adb1a6e8d2034c1e17b6a84a512c71aa4c41c1d2:

net/tcp: Don't store TCP-AO maclen on reqsk (2023-11-29 16:14:14 +0000)

----------------------------------------------------------------

Thanks,
Dmitry

Cc: David Ahern <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Francesco Ruggeri <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Salam Noureddine <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected]

Dmitry Safonov (7):
Documentation/tcp: Fix an obvious typo
net/tcp: Consistently align TCP-AO option in the header
net/tcp: Limit TCP_AO_REPAIR to non-listen sockets
net/tcp: Allow removing current/rnext TCP-AO keys on TCP_LISTEN
sockets
net/tcp: Don't add key with non-matching VRF on connected sockets
net/tcp: Store SNEs + SEQs on ao_info
net/tcp: Don't store TCP-AO maclen on reqsk

Documentation/networking/tcp_ao.rst | 2 +-
include/linux/tcp.h | 8 +--
include/net/tcp_ao.h | 31 +++++++++--
net/ipv4/tcp.c | 13 ++++-
net/ipv4/tcp_ao.c | 80 ++++++++++++++++++-----------
net/ipv4/tcp_fastopen.c | 2 +
net/ipv4/tcp_input.c | 26 ++++++----
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_minisocks.c | 2 +-
net/ipv4/tcp_output.c | 16 +++---
net/ipv6/tcp_ipv6.c | 2 +-
11 files changed, 122 insertions(+), 64 deletions(-)


base-commit: 3b47bc037bd44f142ac09848e8d3ecccc726be99
--
2.43.0


2023-11-29 16:58:02

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 5/7] net/tcp: Don't add key with non-matching VRF on connected sockets

If the connection was established, don't allow adding TCP-AO keys that
don't match the peer. Currently, there are checks for ip-address
matching, but L3 index check is missing. Add it to restrict userspace
shooting itself somewhere.

Yet, nothing restricts the CAP_NET_RAW user from trying to shoot
themselves by performing setsockopt(SO_BINDTODEVICE) or
setsockopt(SO_BINDTOIFINDEX) over an established TCP-AO connection.
So, this is just "minimum effort" to potentially save someone's
debugging time, rather than a full restriction on doing weird things.

Fixes: 248411b8cb89 ("net/tcp: Wire up l3index to TCP-AO")
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/ipv4/tcp_ao.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index bf41be6d4721..465c871786aa 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -1608,6 +1608,15 @@ static int tcp_ao_add_cmd(struct sock *sk, unsigned short int family,
if (!dev || !l3index)
return -EINVAL;

+ if (!bound_dev_if || bound_dev_if != cmd.ifindex) {
+ /* tcp_ao_established_key() doesn't expect having
+ * non peer-matching key on an established TCP-AO
+ * connection.
+ */
+ if (!((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)))
+ return -EINVAL;
+ }
+
/* It's still possible to bind after adding keys or even
* re-bind to a different dev (with CAP_NET_RAW).
* So, no reason to return error here, rather try to be
--
2.43.0

2023-11-29 16:58:03

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 4/7] net/tcp: Allow removing current/rnext TCP-AO keys on TCP_LISTEN sockets

TCP_LISTEN sockets are not connected to any peer, so having
current_key/rnext_key doesn't make sense.

The userspace may falter over this issue by setting current or rnext
TCP-AO key before listen() syscall. setsockopt(TCP_AO_DEL_KEY) doesn't
allow removing a key that is in use (in accordance to RFC 5925), so
it might be inconvenient to have keys that can be destroyed only with
listener socket.

Fixes: 4954f17ddefc ("net/tcp: Introduce TCP_AO setsockopt()s")
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/ipv4/tcp_ao.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index c8be1d526eac..bf41be6d4721 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -1818,8 +1818,16 @@ static int tcp_ao_del_cmd(struct sock *sk, unsigned short int family,
if (!new_rnext)
return -ENOENT;
}
- if (cmd.del_async && sk->sk_state != TCP_LISTEN)
- return -EINVAL;
+ if (sk->sk_state == TCP_LISTEN) {
+ /* Cleaning up possible "stale" current/rnext keys state,
+ * that may have preserved from TCP_CLOSE, before sys_listen()
+ */
+ ao_info->current_key = NULL;
+ ao_info->rnext_key = NULL;
+ } else {
+ if (cmd.del_async)
+ return -EINVAL;
+ }

if (family == AF_INET) {
struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.addr;
--
2.43.0

2023-11-29 16:58:14

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 3/7] net/tcp: Limit TCP_AO_REPAIR to non-listen sockets

Listen socket is not an established TCP connection, so
setsockopt(TCP_AO_REPAIR) doesn't have any impact.

Restrict this uAPI for listen sockets.

Fixes: faadfaba5e01 ("net/tcp: Add TCP_AO_REPAIR")
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/ipv4/tcp.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53bcc17c91e4..b1fe4eb01829 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3594,6 +3594,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
break;

case TCP_AO_REPAIR:
+ if (!tcp_can_repair_sock(sk)) {
+ err = -EPERM;
+ break;
+ }
err = tcp_ao_set_repair(sk, optval, optlen);
break;
#ifdef CONFIG_TCP_AO
@@ -4293,6 +4297,8 @@ int do_tcp_getsockopt(struct sock *sk, int level,
}
#endif
case TCP_AO_REPAIR:
+ if (!tcp_can_repair_sock(sk))
+ return -EPERM;
return tcp_ao_get_repair(sk, optval, optlen);
case TCP_AO_GET_KEYS:
case TCP_AO_INFO: {
--
2.43.0

2023-11-29 16:58:23

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 7/7] net/tcp: Don't store TCP-AO maclen on reqsk

This extra check doesn't work for a handshake when SYN segment has
(current_key.maclen != rnext_key.maclen). It could be amended to
preserve rnext_key.maclen instead of current_key.maclen, but that
requires a lookup on listen socket.

Originally, this extra maclen check was introduced just because it was
cheap. Drop it and convert tcp_request_sock::maclen into boolean
tcp_request_sock::used_tcp_ao.

Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets")
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/linux/tcp.h | 8 ++------
net/ipv4/tcp_ao.c | 4 ++--
net/ipv4/tcp_input.c | 5 +++--
net/ipv4/tcp_output.c | 9 +++------
4 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 68f3d315d2e1..b646b574b060 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -169,7 +169,7 @@ struct tcp_request_sock {
#ifdef CONFIG_TCP_AO
u8 ao_keyid;
u8 ao_rcv_next;
- u8 maclen;
+ bool used_tcp_ao;
#endif
};

@@ -180,14 +180,10 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)

static inline bool tcp_rsk_used_ao(const struct request_sock *req)
{
- /* The real length of MAC is saved in the request socket,
- * signing anything with zero-length makes no sense, so here is
- * a little hack..
- */
#ifndef CONFIG_TCP_AO
return false;
#else
- return tcp_rsk(req)->maclen != 0;
+ return tcp_rsk(req)->used_tcp_ao;
#endif
}

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index 25fbb1e0a0ad..dbfea165ff44 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -846,7 +846,7 @@ void tcp_ao_syncookie(struct sock *sk, const struct sk_buff *skb,
const struct tcp_ao_hdr *aoh;
struct tcp_ao_key *key;

- treq->maclen = 0;
+ treq->used_tcp_ao = false;

if (tcp_parse_auth_options(th, NULL, &aoh) || !aoh)
return;
@@ -858,7 +858,7 @@ void tcp_ao_syncookie(struct sock *sk, const struct sk_buff *skb,

treq->ao_rcv_next = aoh->keyid;
treq->ao_keyid = aoh->rnext_keyid;
- treq->maclen = tcp_ao_maclen(key);
+ treq->used_tcp_ao = true;
}

static enum skb_drop_reason
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0a58447c33b1..9bcbde89ab5c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7187,11 +7187,12 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
if (tcp_parse_auth_options(tcp_hdr(skb), NULL, &aoh))
goto drop_and_release; /* Invalid TCP options */
if (aoh) {
- tcp_rsk(req)->maclen = aoh->length - sizeof(struct tcp_ao_hdr);
+ tcp_rsk(req)->used_tcp_ao = true;
tcp_rsk(req)->ao_rcv_next = aoh->keyid;
tcp_rsk(req)->ao_keyid = aoh->rnext_keyid;
+
} else {
- tcp_rsk(req)->maclen = 0;
+ tcp_rsk(req)->used_tcp_ao = false;
}
#endif
tcp_rsk(req)->snt_isn = isn;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3ddd057fb6f7..335ab90afe65 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3720,7 +3720,6 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
if (tcp_rsk_used_ao(req)) {
#ifdef CONFIG_TCP_AO
struct tcp_ao_key *ao_key = NULL;
- u8 maclen = tcp_rsk(req)->maclen;
u8 keyid = tcp_rsk(req)->ao_keyid;

ao_key = tcp_sk(sk)->af_specific->ao_lookup(sk, req_to_sk(req),
@@ -3730,13 +3729,11 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
* for another peer-matching key, but the peer has requested
* ao_keyid (RFC5925 RNextKeyID), so let's keep it simple here.
*/
- if (unlikely(!ao_key || tcp_ao_maclen(ao_key) != maclen)) {
- u8 key_maclen = ao_key ? tcp_ao_maclen(ao_key) : 0;
-
+ if (unlikely(!ao_key)) {
rcu_read_unlock();
kfree_skb(skb);
- net_warn_ratelimited("TCP-AO: the keyid %u with maclen %u|%u from SYN packet is not present - not sending SYNACK\n",
- keyid, maclen, key_maclen);
+ net_warn_ratelimited("TCP-AO: the keyid %u from SYN packet is not present - not sending SYNACK\n",
+ keyid);
return NULL;
}
key.ao_key = ao_key;
--
2.43.0

2023-11-29 16:58:35

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

RFC 5925 (6.2):
> TCP-AO emulates a 64-bit sequence number space by inferring when to
> increment the high-order 32-bit portion (the SNE) based on
> transitions in the low-order portion (the TCP sequence number).

snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
Unfortunately, reading two 4-bytes pointers can't be performed
atomically (without synchronization).

In order to avoid locks on TCP fastpath, let's just double-account for
SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.

Fixes: 64382c71a557 ("net/tcp: Add TCP-AO SNE support")
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/net/tcp_ao.h | 25 +++++++++++++++++---
net/ipv4/tcp.c | 7 ++++--
net/ipv4/tcp_ao.c | 51 ++++++++++++++++++++++-------------------
net/ipv4/tcp_fastopen.c | 2 ++
net/ipv4/tcp_input.c | 21 ++++++++++-------
net/ipv4/tcp_output.c | 1 +
6 files changed, 71 insertions(+), 36 deletions(-)

diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
index 647781080613..b8ef25d4b632 100644
--- a/include/net/tcp_ao.h
+++ b/include/net/tcp_ao.h
@@ -121,8 +121,8 @@ struct tcp_ao_info {
* - for time-wait sockets the basis is tw_rcv_nxt/tw_snd_nxt.
* tw_snd_nxt is not expected to change, while tw_rcv_nxt may.
*/
- u32 snd_sne;
- u32 rcv_sne;
+ u64 snd_sne;
+ u64 rcv_sne;
refcount_t refcnt; /* Protects twsk destruction */
struct rcu_head rcu;
};
@@ -212,7 +212,6 @@ enum skb_drop_reason tcp_inbound_ao_hash(struct sock *sk,
const struct sk_buff *skb, unsigned short int family,
const struct request_sock *req, int l3index,
const struct tcp_ao_hdr *aoh);
-u32 tcp_ao_compute_sne(u32 next_sne, u32 next_seq, u32 seq);
struct tcp_ao_key *tcp_ao_do_lookup(const struct sock *sk, int l3index,
const union tcp_ao_addr *addr,
int family, int sndid, int rcvid);
@@ -353,6 +352,26 @@ static inline int tcp_ao_set_repair(struct sock *sk,
}
#endif

+static inline void tcp_ao_sne_set(struct tcp_sock *tp, bool send, u64 sne)
+{
+#ifdef CONFIG_TCP_AO
+ struct tcp_ao_info *ao;
+
+ if (!static_branch_unlikely(&tcp_ao_needed.key))
+ return;
+
+ ao = rcu_dereference_protected(tp->ao_info,
+ lockdep_sock_is_held((struct sock *)tp));
+ if (!ao)
+ return;
+
+ if (send)
+ WRITE_ONCE(ao->snd_sne, sne);
+ else
+ WRITE_ONCE(ao->rcv_sne, sne);
+#endif
+}
+
#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
int tcp_do_parse_auth_options(const struct tcphdr *th,
const u8 **md5_hash, const u8 **ao_hash);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b1fe4eb01829..431c10917d27 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3545,16 +3545,19 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
if (sk->sk_state != TCP_CLOSE) {
err = -EPERM;
} else if (tp->repair_queue == TCP_SEND_QUEUE) {
- if (!tcp_rtx_queue_empty(sk))
+ if (!tcp_rtx_queue_empty(sk)) {
err = -EPERM;
- else
+ } else {
WRITE_ONCE(tp->write_seq, val);
+ tcp_ao_sne_set(tp, true, val);
+ }
} else if (tp->repair_queue == TCP_RECV_QUEUE) {
if (tp->rcv_nxt != tp->copied_seq) {
err = -EPERM;
} else {
WRITE_ONCE(tp->rcv_nxt, val);
WRITE_ONCE(tp->copied_seq, val);
+ tcp_ao_sne_set(tp, false, val);
}
} else {
err = -EINVAL;
diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index 465c871786aa..25fbb1e0a0ad 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -472,9 +472,10 @@ static int tcp_ao_hash_pseudoheader(unsigned short int family,
return -EAFNOSUPPORT;
}

-u32 tcp_ao_compute_sne(u32 next_sne, u32 next_seq, u32 seq)
+static u32 tcp_ao_compute_sne(u64 seq_sne, u32 seq)
{
- u32 sne = next_sne;
+ u32 next_seq = (u32)(seq_sne & 0xffffffff);
+ u32 sne = seq_sne >> 32;

if (before(seq, next_seq)) {
if (seq > next_seq)
@@ -483,7 +484,6 @@ u32 tcp_ao_compute_sne(u32 next_sne, u32 next_seq, u32 seq)
if (seq < next_seq)
sne++;
}
-
return sne;
}

@@ -731,7 +731,7 @@ int tcp_ao_prepare_reset(const struct sock *sk, struct sk_buff *skb,

sisn = htonl(tcp_rsk(req)->rcv_isn);
disn = htonl(tcp_rsk(req)->snt_isn);
- *sne = tcp_ao_compute_sne(0, tcp_rsk(req)->snt_isn, seq);
+ *sne = tcp_ao_compute_sne(tcp_rsk(req)->snt_isn, seq);
} else {
sisn = th->seq;
disn = 0;
@@ -763,14 +763,11 @@ int tcp_ao_prepare_reset(const struct sock *sk, struct sk_buff *skb,
*keyid = (*key)->rcvid;
} else {
struct tcp_ao_key *rnext_key;
- u32 snd_basis;

if (sk->sk_state == TCP_TIME_WAIT) {
ao_info = rcu_dereference(tcp_twsk(sk)->ao_info);
- snd_basis = tcp_twsk(sk)->tw_snd_nxt;
} else {
ao_info = rcu_dereference(tcp_sk(sk)->ao_info);
- snd_basis = tcp_sk(sk)->snd_una;
}
if (!ao_info)
return -ENOENT;
@@ -781,8 +778,7 @@ int tcp_ao_prepare_reset(const struct sock *sk, struct sk_buff *skb,
*traffic_key = snd_other_key(*key);
rnext_key = READ_ONCE(ao_info->rnext_key);
*keyid = rnext_key->rcvid;
- *sne = tcp_ao_compute_sne(READ_ONCE(ao_info->snd_sne),
- snd_basis, seq);
+ *sne = tcp_ao_compute_sne(READ_ONCE(ao_info->snd_sne), seq);
}
return 0;
}
@@ -816,8 +812,7 @@ int tcp_ao_transmit_skb(struct sock *sk, struct sk_buff *skb,
tp->af_specific->ao_calc_key_sk(key, traffic_key,
sk, ao->lisn, disn, true);
}
- sne = tcp_ao_compute_sne(READ_ONCE(ao->snd_sne), READ_ONCE(tp->snd_una),
- ntohl(th->seq));
+ sne = tcp_ao_compute_sne(READ_ONCE(ao->snd_sne), ntohl(th->seq));
tp->af_specific->calc_ao_hash(hash_location, key, sk, skb, traffic_key,
hash_location - (u8 *)th, sne);
kfree(tkey_buf);
@@ -938,8 +933,8 @@ tcp_inbound_ao_hash(struct sock *sk, const struct sk_buff *skb,

/* Fast-path */
if (likely((1 << sk->sk_state) & TCP_AO_ESTABLISHED)) {
- enum skb_drop_reason err;
struct tcp_ao_key *current_key;
+ enum skb_drop_reason err;

/* Check if this socket's rnext_key matches the keyid in the
* packet. If not we lookup the key based on the keyid
@@ -956,8 +951,7 @@ tcp_inbound_ao_hash(struct sock *sk, const struct sk_buff *skb,
if (unlikely(th->syn && !th->ack))
goto verify_hash;

- sne = tcp_ao_compute_sne(info->rcv_sne, tcp_sk(sk)->rcv_nxt,
- ntohl(th->seq));
+ sne = tcp_ao_compute_sne(READ_ONCE(info->rcv_sne), ntohl(th->seq));
/* Established socket, traffic key are cached */
traffic_key = rcv_other_key(key);
err = tcp_ao_verify_hash(sk, skb, family, info, aoh, key,
@@ -992,7 +986,7 @@ tcp_inbound_ao_hash(struct sock *sk, const struct sk_buff *skb,
if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV)) {
/* Make the initial syn the likely case here */
if (unlikely(req)) {
- sne = tcp_ao_compute_sne(0, tcp_rsk(req)->rcv_isn,
+ sne = tcp_ao_compute_sne(tcp_rsk(req)->rcv_isn,
ntohl(th->seq));
sisn = htonl(tcp_rsk(req)->rcv_isn);
disn = htonl(tcp_rsk(req)->snt_isn);
@@ -1000,8 +994,7 @@ tcp_inbound_ao_hash(struct sock *sk, const struct sk_buff *skb,
/* Possible syncookie packet */
sisn = htonl(ntohl(th->seq) - 1);
disn = htonl(ntohl(th->ack_seq) - 1);
- sne = tcp_ao_compute_sne(0, ntohl(sisn),
- ntohl(th->seq));
+ sne = tcp_ao_compute_sne(ntohl(sisn), ntohl(th->seq));
} else if (unlikely(!th->syn)) {
/* no way to figure out initial sisn/disn - drop */
return SKB_DROP_REASON_TCP_FLAGS;
@@ -1103,7 +1096,8 @@ void tcp_ao_connect_init(struct sock *sk)
tp->tcp_header_len += tcp_ao_len_aligned(key);

ao_info->lisn = htonl(tp->write_seq);
- ao_info->snd_sne = 0;
+ ao_info->snd_sne = htonl(tp->write_seq);
+ ao_info->rcv_sne = 0;
} else {
/* Can't happen: tcp_connect() verifies that there's
* at least one tcp-ao key that matches the remote peer.
@@ -1139,7 +1133,7 @@ void tcp_ao_finish_connect(struct sock *sk, struct sk_buff *skb)
return;

WRITE_ONCE(ao->risn, tcp_hdr(skb)->seq);
- ao->rcv_sne = 0;
+ WRITE_ONCE(ao->rcv_sne, ntohl(tcp_hdr(skb)->seq));

hlist_for_each_entry_rcu(key, &ao->head, node)
tcp_ao_cache_traffic_keys(sk, ao, key);
@@ -1169,6 +1163,8 @@ int tcp_ao_copy_all_matching(const struct sock *sk, struct sock *newsk,
return -ENOMEM;
new_ao->lisn = htonl(tcp_rsk(req)->snt_isn);
new_ao->risn = htonl(tcp_rsk(req)->rcv_isn);
+ new_ao->snd_sne = tcp_rsk(req)->snt_isn;
+ new_ao->rcv_sne = tcp_rsk(req)->rcv_isn;
new_ao->ao_required = ao->ao_required;
new_ao->accept_icmps = ao->accept_icmps;

@@ -1700,6 +1696,8 @@ static int tcp_ao_add_cmd(struct sock *sk, unsigned short int family,
goto err_free_sock;
}
sk_gso_disable(sk);
+ WRITE_ONCE(ao_info->snd_sne, tcp_sk(sk)->snd_una);
+ WRITE_ONCE(ao_info->rcv_sne, tcp_sk(sk)->rcv_nxt);
rcu_assign_pointer(tcp_sk(sk)->ao_info, ao_info);
}

@@ -2340,6 +2338,7 @@ int tcp_ao_set_repair(struct sock *sk, sockptr_t optval, unsigned int optlen)
struct tcp_ao_repair cmd;
struct tcp_ao_key *key;
struct tcp_ao_info *ao;
+ u64 sne;
int err;

if (optlen < sizeof(cmd))
@@ -2360,8 +2359,14 @@ int tcp_ao_set_repair(struct sock *sk, sockptr_t optval, unsigned int optlen)

WRITE_ONCE(ao->lisn, cmd.snt_isn);
WRITE_ONCE(ao->risn, cmd.rcv_isn);
- WRITE_ONCE(ao->snd_sne, cmd.snd_sne);
- WRITE_ONCE(ao->rcv_sne, cmd.rcv_sne);
+
+ sne = READ_ONCE(ao->snd_sne) & 0xffffffff;
+ sne += (u64)cmd.snd_sne << 32;
+ WRITE_ONCE(ao->snd_sne, sne);
+
+ sne = READ_ONCE(ao->rcv_sne) & 0xffffffff;
+ sne += (u64)cmd.rcv_sne << 32;
+ WRITE_ONCE(ao->rcv_sne, sne);

hlist_for_each_entry_rcu(key, &ao->head, node)
tcp_ao_cache_traffic_keys(sk, ao, key);
@@ -2394,8 +2399,8 @@ int tcp_ao_get_repair(struct sock *sk, sockptr_t optval, sockptr_t optlen)

opt.snt_isn = ao->lisn;
opt.rcv_isn = ao->risn;
- opt.snd_sne = READ_ONCE(ao->snd_sne);
- opt.rcv_sne = READ_ONCE(ao->rcv_sne);
+ opt.snd_sne = READ_ONCE(ao->snd_sne) >> 32;
+ opt.rcv_sne = READ_ONCE(ao->rcv_sne) >> 32;
rcu_read_unlock();

if (copy_to_sockptr(optval, &opt, min_t(int, len, sizeof(opt))))
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 8ed54e7334a9..d28d0df300d3 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -194,6 +194,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN;

tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+ tcp_ao_sne_set(tp, false, TCP_SKB_CB(skb)->end_seq);
__skb_queue_tail(&sk->sk_receive_queue, skb);
tp->syn_data_acked = 1;

@@ -282,6 +283,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
tcp_init_transfer(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, skb);

tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
+ tcp_ao_sne_set(tp, false, TCP_SKB_CB(skb)->seq + 1);

tcp_fastopen_add_skb(child, skb);

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bcb55d98004c..0a58447c33b1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3572,7 +3572,7 @@ static inline bool tcp_may_update_window(const struct tcp_sock *tp,
(ack_seq == tp->snd_wl1 && (nwin > tp->snd_wnd || !nwin));
}

-static void tcp_snd_sne_update(struct tcp_sock *tp, u32 ack)
+static void tcp_ao_snd_sne_update(struct tcp_sock *tp, u32 delta)
{
#ifdef CONFIG_TCP_AO
struct tcp_ao_info *ao;
@@ -3582,8 +3582,9 @@ static void tcp_snd_sne_update(struct tcp_sock *tp, u32 ack)

ao = rcu_dereference_protected(tp->ao_info,
lockdep_sock_is_held((struct sock *)tp));
- if (ao && ack < tp->snd_una)
- ao->snd_sne++;
+ if (!ao)
+ return;
+ WRITE_ONCE(ao->snd_sne, ao->snd_sne + delta);
#endif
}

@@ -3594,11 +3595,11 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)

sock_owned_by_me((struct sock *)tp);
tp->bytes_acked += delta;
- tcp_snd_sne_update(tp, ack);
+ tcp_ao_snd_sne_update(tp, delta);
tp->snd_una = ack;
}

-static void tcp_rcv_sne_update(struct tcp_sock *tp, u32 seq)
+static void tcp_ao_rcv_sne_update(struct tcp_sock *tp, u32 delta)
{
#ifdef CONFIG_TCP_AO
struct tcp_ao_info *ao;
@@ -3608,8 +3609,9 @@ static void tcp_rcv_sne_update(struct tcp_sock *tp, u32 seq)

ao = rcu_dereference_protected(tp->ao_info,
lockdep_sock_is_held((struct sock *)tp));
- if (ao && seq < tp->rcv_nxt)
- ao->rcv_sne++;
+ if (!ao)
+ return;
+ WRITE_ONCE(ao->rcv_sne, ao->rcv_sne + delta);
#endif
}

@@ -3620,7 +3622,7 @@ static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)

sock_owned_by_me((struct sock *)tp);
tp->bytes_received += delta;
- tcp_rcv_sne_update(tp, seq);
+ tcp_ao_rcv_sne_update(tp, delta);
WRITE_ONCE(tp->rcv_nxt, seq);
}

@@ -6400,6 +6402,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
* move to established.
*/
WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
+ tcp_ao_sne_set(tp, false, TCP_SKB_CB(skb)->seq + 1);
tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;

/* RFC1323: The window in SYN & SYN/ACK segments is
@@ -6510,6 +6513,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
}

WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
+ tcp_ao_sne_set(tp, false, TCP_SKB_CB(skb)->seq + 1);
WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;

@@ -6722,6 +6726,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
if (sk->sk_socket)
sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT);

+ tcp_ao_sne_set(tp, true, TCP_SKB_CB(skb)->ack_seq);
tp->snd_una = TCP_SKB_CB(skb)->ack_seq;
tp->snd_wnd = ntohs(th->window) << tp->rx_opt.snd_wscale;
tcp_init_wl(tp, TCP_SKB_CB(skb)->seq);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 93eef1dbbc55..3ddd057fb6f7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3882,6 +3882,7 @@ static void tcp_connect_init(struct sock *sk)
tp->snd_wnd = 0;
tcp_init_wl(tp, 0);
tcp_write_queue_purge(sk);
+ tcp_ao_sne_set(tp, true, tp->write_seq);
tp->snd_una = tp->write_seq;
tp->snd_sml = tp->write_seq;
tp->snd_up = tp->write_seq;
--
2.43.0

2023-11-29 17:39:12

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] net/tcp: Limit TCP_AO_REPAIR to non-listen sockets

On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>
> Listen socket is not an established TCP connection, so
> setsockopt(TCP_AO_REPAIR) doesn't have any impact.
>
> Restrict this uAPI for listen sockets.
>
> Fixes: faadfaba5e01 ("net/tcp: Add TCP_AO_REPAIR")
> Signed-off-by: Dmitry Safonov <[email protected]>

Reviewed-by: Eric Dumazet <[email protected]>

2023-11-29 17:53:59

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] net/tcp: Allow removing current/rnext TCP-AO keys on TCP_LISTEN sockets

On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>
> TCP_LISTEN sockets are not connected to any peer, so having
> current_key/rnext_key doesn't make sense.

I do not understand this patch.

This seems that the clearing should happen at disconnect time, from
tcp_disconnect() ?

Why forcing user to set a socket option to clear these fields ?

>
> The userspace may falter over this issue by setting current or rnext
> TCP-AO key before listen() syscall. setsockopt(TCP_AO_DEL_KEY) doesn't
> allow removing a key that is in use (in accordance to RFC 5925), so
> it might be inconvenient to have keys that can be destroyed only with
> listener socket.
>
> Fixes: 4954f17ddefc ("net/tcp: Introduce TCP_AO setsockopt()s")
> Signed-off-by: Dmitry Safonov <[email protected]>
> ---
> net/ipv4/tcp_ao.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
> index c8be1d526eac..bf41be6d4721 100644
> --- a/net/ipv4/tcp_ao.c
> +++ b/net/ipv4/tcp_ao.c
> @@ -1818,8 +1818,16 @@ static int tcp_ao_del_cmd(struct sock *sk, unsigned short int family,
> if (!new_rnext)
> return -ENOENT;
> }
> - if (cmd.del_async && sk->sk_state != TCP_LISTEN)
> - return -EINVAL;
> + if (sk->sk_state == TCP_LISTEN) {
> + /* Cleaning up possible "stale" current/rnext keys state,
> + * that may have preserved from TCP_CLOSE, before sys_listen()
> + */
> + ao_info->current_key = NULL;
> + ao_info->rnext_key = NULL;
> + } else {
> + if (cmd.del_async)
> + return -EINVAL;
> + }
>
> if (family == AF_INET) {
> struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.addr;
> --
> 2.43.0
>

2023-11-29 17:59:47

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] net/tcp: Don't add key with non-matching VRF on connected sockets

On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>
> If the connection was established, don't allow adding TCP-AO keys that
> don't match the peer. Currently, there are checks for ip-address
> matching, but L3 index check is missing. Add it to restrict userspace
> shooting itself somewhere.
>
> Yet, nothing restricts the CAP_NET_RAW user from trying to shoot
> themselves by performing setsockopt(SO_BINDTODEVICE) or
> setsockopt(SO_BINDTOIFINDEX) over an established TCP-AO connection.
> So, this is just "minimum effort" to potentially save someone's
> debugging time, rather than a full restriction on doing weird things.
>
> Fixes: 248411b8cb89 ("net/tcp: Wire up l3index to TCP-AO")
> Signed-off-by: Dmitry Safonov <[email protected]>

Reviewed-by: Eric Dumazet <[email protected]>

2023-11-29 18:09:39

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>
> RFC 5925 (6.2):
> > TCP-AO emulates a 64-bit sequence number space by inferring when to
> > increment the high-order 32-bit portion (the SNE) based on
> > transitions in the low-order portion (the TCP sequence number).
>
> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
> Unfortunately, reading two 4-bytes pointers can't be performed
> atomically (without synchronization).
>
> In order to avoid locks on TCP fastpath, let's just double-account for
> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
>

This will not work on 32bit kernels ?

Unless ao->snd_sne and ao->rcv_sneare only read/written under the
socket lock (and in this case no READ_ONCE()/WRITE_ONCE() should be
necessary)

2023-11-29 18:11:39

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] net/tcp: Allow removing current/rnext TCP-AO keys on TCP_LISTEN sockets

On 11/29/23 17:53, Eric Dumazet wrote:
> On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>>
>> TCP_LISTEN sockets are not connected to any peer, so having
>> current_key/rnext_key doesn't make sense.
>
> I do not understand this patch.
>
> This seems that the clearing should happen at disconnect time, from
> tcp_disconnect() ?

Yeah, probably the patch description could have been better.

The key here is that TCP_CLOSE may have current/rnext keys: they will be
the ones that are used on connect() for 3way handshake. While for
TCP_LISTEN it doesn't make any sense to have them (as they otherwise
should be per-peer ip/netmask).

So, initially I thought of cleaning them up on listen() syscall [1].
But obviously the result was a bit gross.

So, I decided to just let userspace delete any keys on TCP_LISTEN by
cleaning current/rnext pointers before the checks that don't allow
removing current/rnext keys as they are in use by connection.

For TCP_CLOSE it's a lesser deal:
- the socket may just be closed
- alternatively, the user may add a new key and set it as current/rnext
and then remove the old key (as it won't be current/rnext anymore).

I also should note that currently it's not possible to set/change
current/rnext key on TCP_LISTEN, this situation is only a theoretical
issue that may be encountered by userspace if it sets those keys by any
random reason before listen():

static bool tcp_ao_can_set_current_rnext(struct sock *sk)
{
/* There aren't current/rnext keys on TCP_LISTEN sockets */
if (sk->sk_state == TCP_LISTEN)
return false;
return true;
}

> Why forcing user to set a socket option to clear these fields ?

It's just before the checks that disallow removing keys in use:

static int tcp_ao_delete_key(struct sock *sk, struct tcp_ao_info *ao_info,
bool del_async, struct tcp_ao_key *key,
struct tcp_ao_key *new_current,
struct tcp_ao_key *new_rnext)
{
...
if (unlikely(READ_ONCE(ao_info->current_key) == key ||
READ_ONCE(ao_info->rnext_key) == key)) {
err = -EBUSY;
goto add_key;
}


>> The userspace may falter over this issue by setting current or rnext
>> TCP-AO key before listen() syscall. setsockopt(TCP_AO_DEL_KEY) doesn't
>> allow removing a key that is in use (in accordance to RFC 5925), so
>> it might be inconvenient to have keys that can be destroyed only with
>> listener socket.
>>
>> Fixes: 4954f17ddefc ("net/tcp: Introduce TCP_AO setsockopt()s")
>> Signed-off-by: Dmitry Safonov <[email protected]>
>> ---
>> net/ipv4/tcp_ao.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
>> index c8be1d526eac..bf41be6d4721 100644
>> --- a/net/ipv4/tcp_ao.c
>> +++ b/net/ipv4/tcp_ao.c
>> @@ -1818,8 +1818,16 @@ static int tcp_ao_del_cmd(struct sock *sk, unsigned short int family,
>> if (!new_rnext)
>> return -ENOENT;
>> }
>> - if (cmd.del_async && sk->sk_state != TCP_LISTEN)
>> - return -EINVAL;
>> + if (sk->sk_state == TCP_LISTEN) {
>> + /* Cleaning up possible "stale" current/rnext keys state,
>> + * that may have preserved from TCP_CLOSE, before sys_listen()
>> + */
>> + ao_info->current_key = NULL;
>> + ao_info->rnext_key = NULL;
>> + } else {
>> + if (cmd.del_async)
>> + return -EINVAL;
>> + }
>>
>> if (family == AF_INET) {
>> struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.addr;
>> --
>> 2.43.0
>>

[1]
https://lore.kernel.org/all/CANn89i+xvBQY5HLXNkjW0o9R4SX1hqRisJnr54ZqwuOpEJdHeA@mail.gmail.com/T/#mfd4461bf1dabf6e4b994d85f5191b6cefce337cd

Thanks,
Dmitry

2023-11-29 18:12:07

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] net/tcp: Don't store TCP-AO maclen on reqsk

On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>
> This extra check doesn't work for a handshake when SYN segment has
> (current_key.maclen != rnext_key.maclen). It could be amended to
> preserve rnext_key.maclen instead of current_key.maclen, but that
> requires a lookup on listen socket.
>
> Originally, this extra maclen check was introduced just because it was
> cheap. Drop it and convert tcp_request_sock::maclen into boolean
> tcp_request_sock::used_tcp_ao.
>
> Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets")
> Signed-off-by: Dmitry Safonov <[email protected]>
>

Reviewed-by: Eric Dumazet <[email protected]>

2023-11-29 18:14:33

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On 11/29/23 18:09, Eric Dumazet wrote:
> On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>>
>> RFC 5925 (6.2):
>>> TCP-AO emulates a 64-bit sequence number space by inferring when to
>>> increment the high-order 32-bit portion (the SNE) based on
>>> transitions in the low-order portion (the TCP sequence number).
>>
>> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
>> Unfortunately, reading two 4-bytes pointers can't be performed
>> atomically (without synchronization).
>>
>> In order to avoid locks on TCP fastpath, let's just double-account for
>> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
>>
>
> This will not work on 32bit kernels ?

Yeah, unsure if there's someone who wants to run BGP on 32bit box, so at
this moment it's already limited:

config TCP_AO
bool "TCP: Authentication Option (RFC5925)"
select CRYPTO
select TCP_SIGPOOL
depends on 64BIT && IPV6 != m # seq-number extension needs WRITE_ONCE(u64)

Probably, if there will be a person who is interested in this, it can
get a spinlock for !CONFIG_64BIT.

> Unless ao->snd_sne and ao->rcv_sneare only read/written under the
> socket lock (and in this case no READ_ONCE()/WRITE_ONCE() should be
> necessary)

Thanks,
Dmitry

2023-11-29 18:34:37

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On Wed, Nov 29, 2023 at 7:14 PM Dmitry Safonov <[email protected]> wrote:
>
> On 11/29/23 18:09, Eric Dumazet wrote:
> > On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
> >>
> >> RFC 5925 (6.2):
> >>> TCP-AO emulates a 64-bit sequence number space by inferring when to
> >>> increment the high-order 32-bit portion (the SNE) based on
> >>> transitions in the low-order portion (the TCP sequence number).
> >>
> >> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
> >> Unfortunately, reading two 4-bytes pointers can't be performed
> >> atomically (without synchronization).
> >>
> >> In order to avoid locks on TCP fastpath, let's just double-account for
> >> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
> >>
> >
> > This will not work on 32bit kernels ?
>
> Yeah, unsure if there's someone who wants to run BGP on 32bit box, so at
> this moment it's already limited:
>
> config TCP_AO
> bool "TCP: Authentication Option (RFC5925)"
> select CRYPTO
> select TCP_SIGPOOL
> depends on 64BIT && IPV6 != m # seq-number extension needs WRITE_ONCE(u64)
>

Oh well, this seems quite strange to have such a limitation.

> Probably, if there will be a person who is interested in this, it can
> get a spinlock for !CONFIG_64BIT.


>
> > Unless ao->snd_sne and ao->rcv_sneare only read/written under the
> > socket lock (and in this case no READ_ONCE()/WRITE_ONCE() should be
> > necessary)
>

You have not commented on where these are read without the socket lock held ?

tcp_ao_get_repair() can lock the socket.

In TW state, I guess these values can not be changed ?

I think you can remove all these READ_ONCE()/WRITE_ONCE() which are not needed,
or please add a comment if they really are.

Then, you might be able to remove the 64BIT dependency ...

2023-11-29 19:58:31

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On 11/29/23 18:34, Eric Dumazet wrote:
> On Wed, Nov 29, 2023 at 7:14 PM Dmitry Safonov <[email protected]> wrote:
>>
>> On 11/29/23 18:09, Eric Dumazet wrote:
>>> On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
>>>>
>>>> RFC 5925 (6.2):
>>>>> TCP-AO emulates a 64-bit sequence number space by inferring when to
>>>>> increment the high-order 32-bit portion (the SNE) based on
>>>>> transitions in the low-order portion (the TCP sequence number).
>>>>
>>>> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
>>>> Unfortunately, reading two 4-bytes pointers can't be performed
>>>> atomically (without synchronization).
>>>>
>>>> In order to avoid locks on TCP fastpath, let's just double-account for
>>>> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
>>>>
>>>
>>> This will not work on 32bit kernels ?
>>
>> Yeah, unsure if there's someone who wants to run BGP on 32bit box, so at
>> this moment it's already limited:
>>
>> config TCP_AO
>> bool "TCP: Authentication Option (RFC5925)"
>> select CRYPTO
>> select TCP_SIGPOOL
>> depends on 64BIT && IPV6 != m # seq-number extension needs WRITE_ONCE(u64)
>>
>
> Oh well, this seems quite strange to have such a limitation.

I guess so. On the other side, it seems that there aren't many
non-hobbyist 32bit platforms: ia32 compatible layer will even be limited
with a boot parameter/compile option. Maybe I'm not aware of, but it
seems that arm64/ppc64/risc-v/x86_64 are the ones everyone interested in
these days.

>
>> Probably, if there will be a person who is interested in this, it can
>> get a spinlock for !CONFIG_64BIT.
>
>
>>
>>> Unless ao->snd_sne and ao->rcv_sneare only read/written under the
>>> socket lock (and in this case no READ_ONCE()/WRITE_ONCE() should be
>>> necessary)
>>
>
> You have not commented on where these are read without the socket lock held ?

Sorry for missing this, the SNEs are used with this helper
tcp_ao_compute_sne(), so these places are (in square brackets AFAICS,
there is a chance that I miss something obvious from your message):

- tcp_v4_send_reset() => tcp_ao_prepare_reset() [rcu_read_lock()]
- __tcp_transmit_skb() => tcp_ao_transmit_skb() [TX softirq]
- tcp_v4_rcv() => tcp_inbound_ao_hash() [RX softirq]


> tcp_ao_get_repair() can lock the socket.

It can, sure.

> In TW state, I guess these values can not be changed ?

Currently, they are considered constant on TW. The incoming segments are
not verified on twsk (so no need for SNEs). And from ACK side not
expecting SEQ roll-over (tcp_ao_compute_sne() is not called) - this may
change, but not quite critical it seems.

If we go with this patch in question, I'll have to update this:
: key.sne = READ_ONCE(ao_info->snd_sne);
(didn't adjust it for higher-bytes shift)

> I think you can remove all these READ_ONCE()/WRITE_ONCE() which are not needed,
> or please add a comment if they really are.

Not sure if I answered above..

> Then, you might be able to remove the 64BIT dependency ...

At this moment I fail to imagine anyone running BGP + TCP-AO on 32bit
kernel. I may be wrong, for sure.

Thanks,
Dmitry

2023-11-29 21:01:54

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On Wed, Nov 29, 2023 at 8:58 PM Dmitry Safonov <[email protected]> wrote:
>
> On 11/29/23 18:34, Eric Dumazet wrote:
> > On Wed, Nov 29, 2023 at 7:14 PM Dmitry Safonov <[email protected]> wrote:
> >>
> >> On 11/29/23 18:09, Eric Dumazet wrote:
> >>> On Wed, Nov 29, 2023 at 5:57 PM Dmitry Safonov <[email protected]> wrote:
> >>>>
> >>>> RFC 5925 (6.2):
> >>>>> TCP-AO emulates a 64-bit sequence number space by inferring when to
> >>>>> increment the high-order 32-bit portion (the SNE) based on
> >>>>> transitions in the low-order portion (the TCP sequence number).
> >>>>
> >>>> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
> >>>> Unfortunately, reading two 4-bytes pointers can't be performed
> >>>> atomically (without synchronization).
> >>>>
> >>>> In order to avoid locks on TCP fastpath, let's just double-account for
> >>>> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
> >>>>
> >>>
> >>> This will not work on 32bit kernels ?
> >>
> >> Yeah, unsure if there's someone who wants to run BGP on 32bit box, so at
> >> this moment it's already limited:
> >>
> >> config TCP_AO
> >> bool "TCP: Authentication Option (RFC5925)"
> >> select CRYPTO
> >> select TCP_SIGPOOL
> >> depends on 64BIT && IPV6 != m # seq-number extension needs WRITE_ONCE(u64)
> >>
> >
> > Oh well, this seems quite strange to have such a limitation.
>
> I guess so. On the other side, it seems that there aren't many
> non-hobbyist 32bit platforms: ia32 compatible layer will even be limited
> with a boot parameter/compile option. Maybe I'm not aware of, but it
> seems that arm64/ppc64/risc-v/x86_64 are the ones everyone interested in
> these days.
>
> >
> >> Probably, if there will be a person who is interested in this, it can
> >> get a spinlock for !CONFIG_64BIT.
> >
> >
> >>
> >>> Unless ao->snd_sne and ao->rcv_sneare only read/written under the
> >>> socket lock (and in this case no READ_ONCE()/WRITE_ONCE() should be
> >>> necessary)
> >>
> >
> > You have not commented on where these are read without the socket lock held ?
>
> Sorry for missing this, the SNEs are used with this helper
> tcp_ao_compute_sne(), so these places are (in square brackets AFAICS,
> there is a chance that I miss something obvious from your message):
>
> - tcp_v4_send_reset() => tcp_ao_prepare_reset() [rcu_read_lock()]
> - __tcp_transmit_skb() => tcp_ao_transmit_skb() [TX softirq]
> - tcp_v4_rcv() => tcp_inbound_ao_hash() [RX softirq]

All these should/must have the socket lock held !

Or reading tcp_sk(sk)->rcv_nxt would be racy anyway (note the lack of
READ_ONCE() on it)

I think you need more work to make sure this is done correctly.

ie tcp_inbound_hash() should be called from tcp_v4_do_rcv() after the
bh_lock_sock_nested() and sock_owned_by_user() checks.



>
>
> > tcp_ao_get_repair() can lock the socket.
>
> It can, sure.
>
> > In TW state, I guess these values can not be changed ?
>
> Currently, they are considered constant on TW. The incoming segments are
> not verified on twsk (so no need for SNEs). And from ACK side not
> expecting SEQ roll-over (tcp_ao_compute_sne() is not called) - this may
> change, but not quite critical it seems.
>
> If we go with this patch in question, I'll have to update this:
> : key.sne = READ_ONCE(ao_info->snd_sne);
> (didn't adjust it for higher-bytes shift)
>
> > I think you can remove all these READ_ONCE()/WRITE_ONCE() which are not needed,
> > or please add a comment if they really are.
>
> Not sure if I answered above..
>
> > Then, you might be able to remove the 64BIT dependency ...
>
> At this moment I fail to imagine anyone running BGP + TCP-AO on 32bit
> kernel. I may be wrong, for sure.

I fail to see anyone using TCP-AO today. (up to linux-6.6), regardless
of the architecture.

Would that be a reason for not supporting it in the future ?

32bit or 64bit should not be in the picture, especially if done for
the wrong reasons.

2023-11-29 22:12:38

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On 11/29/23 21:01, Eric Dumazet wrote:
> On Wed, Nov 29, 2023 at 8:58 PM Dmitry Safonov <[email protected]> wrote:
>> On 11/29/23 18:34, Eric Dumazet wrote:
[..]
>>> You have not commented on where these are read without the socket lock held ?
>>
>> Sorry for missing this, the SNEs are used with this helper
>> tcp_ao_compute_sne(), so these places are (in square brackets AFAICS,
>> there is a chance that I miss something obvious from your message):
>>
>> - tcp_v4_send_reset() => tcp_ao_prepare_reset() [rcu_read_lock()]
>> - __tcp_transmit_skb() => tcp_ao_transmit_skb() [TX softirq]
>> - tcp_v4_rcv() => tcp_inbound_ao_hash() [RX softirq]
>
> All these should/must have the socket lock held !
>
> Or reading tcp_sk(sk)->rcv_nxt would be racy anyway (note the lack of
> READ_ONCE() on it)

For fairness, post this patch rcv_next is not read anymore (SNEs are
updated in parallel).


> I think you need more work to make sure this is done correctly.

Sure.

> ie tcp_inbound_hash() should be called from tcp_v4_do_rcv() after the
> bh_lock_sock_nested() and sock_owned_by_user() checks.

But than my concern would be that any incoming segment will cause
contention for the time of signature verification. That potentially may
create DoS.

If this patch is ugly enough to be not acceptable, would
bh_lock_sock_nested() around reading SNEs + rcv_nxt/snd_una sound better?

Let me add some information, that is lacking in patch message, but may
be critical to avoid misunderstanding:

Note that the code doesn't need precise SEQ numbers, but it needs a
consistent SNE+SEQ pair to detect the moment of SEQ number rolling over.
So, that tcp_ao_compute_sne() will be able to use decremented SNE for a
delayed/retransmitted segment and to use incremented SNE for a new
segment post-rollover. So, technically, it just needs a correct SNE.
Which is computed based on what was "cached" SEQ for that "cached" SNE
and what is the SEQ from the skb.

As tcp window size is smaller than 2 GB, the valid segment to be
verified or signed won't be far away from this consistent number, that
is to be used by tcp_ao_compute_sne().

Technically, if the SNE+SEQ "cached" pair is inconsistent (which
unlikely but may happen _prior_ to this patch): i.e. SNE from
pre-rollover and SEQ is post-rollover, tcp_ao_compute_sne() will
incorrectly increment/decrement the SNE that is used for
signing/verification of the TCP segment. In result the segment will fail
verification and will be retransmitted again.
As it's unlikely race that may happen on SEQ rollover (once in 4GB) and
TCP-AO connection won't break, but survives after the retransmission, I
don't think it was noticed on testing.

Thanks,
Dmitry

2023-12-02 17:17:01

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On Wed, Nov 29, 2023 at 04:57:20PM +0000, Dmitry Safonov wrote:
> RFC 5925 (6.2):
> > TCP-AO emulates a 64-bit sequence number space by inferring when to
> > increment the high-order 32-bit portion (the SNE) based on
> > transitions in the low-order portion (the TCP sequence number).
>
> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
> Unfortunately, reading two 4-bytes pointers can't be performed
> atomically (without synchronization).
>
> In order to avoid locks on TCP fastpath, let's just double-account for
> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
>
> Fixes: 64382c71a557 ("net/tcp: Add TCP-AO SNE support")
> Signed-off-by: Dmitry Safonov <[email protected]>

...

> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
> index 647781080613..b8ef25d4b632 100644
> --- a/include/net/tcp_ao.h
> +++ b/include/net/tcp_ao.h
> @@ -121,8 +121,8 @@ struct tcp_ao_info {
> * - for time-wait sockets the basis is tw_rcv_nxt/tw_snd_nxt.
> * tw_snd_nxt is not expected to change, while tw_rcv_nxt may.
> */
> - u32 snd_sne;
> - u32 rcv_sne;
> + u64 snd_sne;
> + u64 rcv_sne;
> refcount_t refcnt; /* Protects twsk destruction */
> struct rcu_head rcu;
> };

Hi Dmitry,

In tcp_ao.c:tcp_ao_connect_init() there is a local
variable:

struct tcp_ao_info *ao_info;

And the following assignment occurs:

ao_info->snd_sne = htonl(tp->write_seq);

Is this still correct in light of the change of the type of snd_sne?

Flagged by Sparse.

...

2023-12-04 17:08:37

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

Hi Simon,

On 12/2/23 17:16, Simon Horman wrote:
> On Wed, Nov 29, 2023 at 04:57:20PM +0000, Dmitry Safonov wrote:
>> RFC 5925 (6.2):
>>> TCP-AO emulates a 64-bit sequence number space by inferring when to
>>> increment the high-order 32-bit portion (the SNE) based on
>>> transitions in the low-order portion (the TCP sequence number).
>>
>> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
>> Unfortunately, reading two 4-bytes pointers can't be performed
>> atomically (without synchronization).
>>
>> In order to avoid locks on TCP fastpath, let's just double-account for
>> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
>>
>> Fixes: 64382c71a557 ("net/tcp: Add TCP-AO SNE support")
>> Signed-off-by: Dmitry Safonov <[email protected]>
>
> ...
>
>> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
>> index 647781080613..b8ef25d4b632 100644
>> --- a/include/net/tcp_ao.h
>> +++ b/include/net/tcp_ao.h
>> @@ -121,8 +121,8 @@ struct tcp_ao_info {
>> * - for time-wait sockets the basis is tw_rcv_nxt/tw_snd_nxt.
>> * tw_snd_nxt is not expected to change, while tw_rcv_nxt may.
>> */
>> - u32 snd_sne;
>> - u32 rcv_sne;
>> + u64 snd_sne;
>> + u64 rcv_sne;
>> refcount_t refcnt; /* Protects twsk destruction */
>> struct rcu_head rcu;
>> };
>
> Hi Dmitry,
>
> In tcp_ao.c:tcp_ao_connect_init() there is a local
> variable:
>
> struct tcp_ao_info *ao_info;
>
> And the following assignment occurs:
>
> ao_info->snd_sne = htonl(tp->write_seq);
>
> Is this still correct in light of the change of the type of snd_sne?

Thanks for the report.
Yes, it's correct as lower 4-bytes are initialized as initial SEQ.
I'll add a cast for it if I'll go with v5 for this patch.

>
> Flagged by Sparse.
>

Thanks,
Dmitry

2023-12-07 10:52:56

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] net/tcp: Store SNEs + SEQs on ao_info

On Mon, Dec 04, 2023 at 05:08:20PM +0000, Dmitry Safonov wrote:
> Hi Simon,
>
> On 12/2/23 17:16, Simon Horman wrote:
> > On Wed, Nov 29, 2023 at 04:57:20PM +0000, Dmitry Safonov wrote:
> >> RFC 5925 (6.2):
> >>> TCP-AO emulates a 64-bit sequence number space by inferring when to
> >>> increment the high-order 32-bit portion (the SNE) based on
> >>> transitions in the low-order portion (the TCP sequence number).
> >>
> >> snd_sne and rcv_sne are the upper 4 bytes of extended SEQ number.
> >> Unfortunately, reading two 4-bytes pointers can't be performed
> >> atomically (without synchronization).
> >>
> >> In order to avoid locks on TCP fastpath, let's just double-account for
> >> SEQ changes: snd_una/rcv_nxt will be lower 4 bytes of snd_sne/rcv_sne.
> >>
> >> Fixes: 64382c71a557 ("net/tcp: Add TCP-AO SNE support")
> >> Signed-off-by: Dmitry Safonov <[email protected]>
> >
> > ...
> >
> >> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
> >> index 647781080613..b8ef25d4b632 100644
> >> --- a/include/net/tcp_ao.h
> >> +++ b/include/net/tcp_ao.h
> >> @@ -121,8 +121,8 @@ struct tcp_ao_info {
> >> * - for time-wait sockets the basis is tw_rcv_nxt/tw_snd_nxt.
> >> * tw_snd_nxt is not expected to change, while tw_rcv_nxt may.
> >> */
> >> - u32 snd_sne;
> >> - u32 rcv_sne;
> >> + u64 snd_sne;
> >> + u64 rcv_sne;
> >> refcount_t refcnt; /* Protects twsk destruction */
> >> struct rcu_head rcu;
> >> };
> >
> > Hi Dmitry,
> >
> > In tcp_ao.c:tcp_ao_connect_init() there is a local
> > variable:
> >
> > struct tcp_ao_info *ao_info;
> >
> > And the following assignment occurs:
> >
> > ao_info->snd_sne = htonl(tp->write_seq);
> >
> > Is this still correct in light of the change of the type of snd_sne?
>
> Thanks for the report.
> Yes, it's correct as lower 4-bytes are initialized as initial SEQ.
> I'll add a cast for it if I'll go with v5 for this patch.

Thanks Dmitry,

I think that would address my concern.