From: Menglong Dong <[email protected]>
For now, the skb drop reasons have not fully be supported by TCP
protocol on the code path of TCP connection state change. The function
call chain is a little complex, which makes it hard to get the reason
that why skb is dropped.
However, I have a idea now: store the drop reason in the tcp_skb_cb,
which means that we need to add a 'drop_reason' field to the struct
tcp_skb_cb. Luckily, this struct still has 4 bytes spare space for this
purpose.
In this way, we need only to initialize to 'TCP_SKB_CB(skb)->drop_reason'
to SKB_DROP_REASON_NOT_SPECIFIED in tcp_v4_rcv()/tcp_v6_rcv(). When the
skb needs to be dropped, the value of this field should be the drop
reason or SKB_DROP_REASON_NOT_SPECIFIED. Meanwhile, the value also can be
SKB_NOT_DROPPED_YET. On such case, try_kfree_skb(), which we add in the
1th patch, should be called.
Hi, Eric, do you like it? In this way, we almost don't need to change the
exist code, and won't mess the code up.
In this series, the skb drop reasons are added following functions:
tcp_rcv_synsent_state_process
tcp_timewait_state_process
tcp_conn_request
tcp_rcv_state_process
And following new drop reasons are added:
SKB_DROP_REASON_TCP_PAWSACTIVEREJECTED
SKB_DROP_REASON_TIMEWAIT
SKB_DROP_REASON_LISTENOVERFLOWS
SKB_DROP_REASON_TCP_REQQFULLDROP
SKB_DROP_REASON_TCP_ABORTONDATA
SKB_DROP_REASON_TCP_ABORTONLINGER
SKB_DROP_REASON_LSM
Menglong Dong (9):
net: skb: introduce try_kfree_skb()
net: tcp: add 'drop_reason' field to struct tcp_skb_cb
net: tcp: use the drop reasons stored in tcp_skb_cb
net: tcp: store drop reasons in tcp_rcv_synsent_state_process()
net: tcp: store drop reasons in tcp_timewait_state_process()
net: tcp: store drop reasons in tcp_conn_request()
net: tcp: store drop reasons in tcp_rcv_state_process()
net: tcp: store drop reasons in route_req
net: tcp: use LINUX_MIB_TCPABORTONLINGER in tcp_rcv_state_process()
include/linux/skbuff.h | 9 +++++++++
include/net/dropreason.h | 43 ++++++++++++++++++++++++++++++++++++++++
include/net/tcp.h | 3 +++
net/ipv4/tcp_input.c | 29 ++++++++++++++++++++++-----
net/ipv4/tcp_ipv4.c | 26 ++++++++++++++++++++----
net/ipv4/tcp_minisocks.c | 15 ++++++++++++--
net/ipv6/tcp_ipv6.c | 31 +++++++++++++++++++++++------
7 files changed, 139 insertions(+), 17 deletions(-)
--
2.37.2
From: Menglong Dong <[email protected]>
Store the skb drop reasons to tcp_skb_cb for tcp_conn_request(). When
the skb should be freed normally, 'TCP_SKB_CB(skb)->drop_reason' will be
set to SKB_NOT_DROPPED_YET, which means consume_skb() will be called for
the skb.
Now, we can replace the consume_skb() with try_kfree_skb() in
tcp_rcv_state_process() if the skb needs to be dropped.
The new drop reasons 'LISTENOVERFLOWS' and 'TCP_REQQFULLDROP' are added,
which are used for 'accept queue' and 'request queue' full.
Signed-off-by: Menglong Dong <[email protected]>
--
---
include/net/dropreason.h | 12 ++++++++++++
net/ipv4/tcp_input.c | 11 +++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index cbfd88493ef2..633a05c95026 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -70,6 +70,8 @@
FN(PKT_TOO_BIG) \
FN(TCP_PAWSACTIVEREJECTED) \
FN(TIMEWAIT) \
+ FN(LISTENOVERFLOWS) \
+ FN(TCP_REQQFULLDROP) \
FNe(MAX)
/**
@@ -312,6 +314,16 @@ enum skb_drop_reason {
* 'SYN' packet
*/
SKB_DROP_REASON_TIMEWAIT,
+ /**
+ * @SKB_DROP_REASON_LISTENOVERFLOWS: accept queue of the listen
+ * socket is full, corresponding to LINUX_MIB_LISTENOVERFLOWS
+ */
+ SKB_DROP_REASON_LISTENOVERFLOWS,
+ /**
+ * @SKB_DROP_REASON_TCP_REQQFULLDROP: request queue of the listen
+ * socket is full, corresponding to LINUX_MIB_TCPREQQFULLDROP
+ */
+ SKB_DROP_REASON_TCP_REQQFULLDROP,
/**
* @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be
* used as a real 'reason'
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c0e5c4a29a4e..ad088e228b1e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6482,7 +6482,9 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
if (!acceptable)
return 1;
- consume_skb(skb);
+
+ reason = TCP_SKB_CB(skb)->drop_reason;
+ try_kfree_skb(skb, reason);
return 0;
}
SKB_DR_SET(reason, TCP_FLAGS);
@@ -6928,12 +6930,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
*/
if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
- if (!want_cookie)
+ if (!want_cookie) {
+ TCP_SKB_DR(skb, TCP_REQQFULLDROP);
goto drop;
+ }
}
if (sk_acceptq_is_full(sk)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+ TCP_SKB_DR(skb, LISTENOVERFLOWS);
goto drop;
}
@@ -6991,6 +6996,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
*/
pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
rsk_ops->family);
+ TCP_SKB_DR(skb, TCP_REQQFULLDROP);
goto drop_and_release;
}
@@ -7005,6 +7011,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
inet_rsk(req)->ecn_ok = 0;
}
+ TCP_SKB_CB(skb)->drop_reason = SKB_NOT_DROPPED_YET;
tcp_rsk(req)->snt_isn = isn;
tcp_rsk(req)->txhash = net_tx_rndhash();
tcp_rsk(req)->syn_tos = TCP_SKB_CB(skb)->ip_dsfield;
--
2.37.2
From: Menglong Dong <[email protected]>
Add skb drop reasons to tcp_v4_route_req() and tcp_v6_route_req().
And the new reason SKB_DROP_REASON_LSM is added, which is used when
skb is dropped by LSM.
Signed-off-by: Menglong Dong <[email protected]>
---
include/net/dropreason.h | 5 +++++
net/ipv4/tcp_ipv4.c | 11 +++++++++--
net/ipv6/tcp_ipv6.c | 11 +++++++++--
3 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 364811bce63f..a5de00d02213 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -74,6 +74,7 @@
FN(TCP_REQQFULLDROP) \
FN(TCP_ABORTONDATA) \
FN(TCP_ABORTONLINGER) \
+ FN(LSM) \
FNe(MAX)
/**
@@ -336,6 +337,10 @@ enum skb_drop_reason {
* LINUX_MIB_TCPABORTONLINGER
*/
SKB_DROP_REASON_TCP_ABORTONLINGER,
+ /**
+ * @SKB_DROP_REASON_LSM: dropped by LSM
+ */
+ SKB_DROP_REASON_LSM,
/**
* @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be
* used as a real 'reason'
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a85bc7483c5a..8fdea8e6207f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1447,12 +1447,19 @@ static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
struct flowi *fl,
struct request_sock *req)
{
+ struct dst_entry *dst;
+
tcp_v4_init_req(req, sk, skb);
- if (security_inet_conn_request(sk, skb, req))
+ if (security_inet_conn_request(sk, skb, req)) {
+ TCP_SKB_DR(skb, LSM);
return NULL;
+ }
- return inet_csk_route_req(sk, &fl->u.ip4, req);
+ dst = inet_csk_route_req(sk, &fl->u.ip4, req);
+ if (!dst)
+ TCP_SKB_DR(skb, IP_OUTNOROUTES);
+ return dst;
}
struct request_sock_ops tcp_request_sock_ops __read_mostly = {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2c2048832714..44c4aa2789d6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -823,12 +823,19 @@ static struct dst_entry *tcp_v6_route_req(const struct sock *sk,
struct flowi *fl,
struct request_sock *req)
{
+ struct dst_entry *dst;
+
tcp_v6_init_req(req, sk, skb);
- if (security_inet_conn_request(sk, skb, req))
+ if (security_inet_conn_request(sk, skb, req)) {
+ TCP_SKB_DR(skb, LSM);
return NULL;
+ }
- return inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP);
+ dst = inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP);
+ if (!dst)
+ TCP_SKB_DR(skb, IP_OUTNOROUTES);
+ return dst;
}
struct request_sock_ops tcp6_request_sock_ops __read_mostly = {
--
2.37.2
From: Menglong Dong <[email protected]>
Store the drop reason to the tcp_skb_cb in tcp_timewait_state_process()
when the skb is going to be dropped. The new drop reason 'TIMEWAIT' is
added.
Signed-off-by: Menglong Dong <[email protected]>
---
include/net/dropreason.h | 7 +++++++
net/ipv4/tcp_minisocks.c | 15 +++++++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 0f0edcd5f95f..cbfd88493ef2 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -69,6 +69,7 @@
FN(IP_INNOROUTES) \
FN(PKT_TOO_BIG) \
FN(TCP_PAWSACTIVEREJECTED) \
+ FN(TIMEWAIT) \
FNe(MAX)
/**
@@ -305,6 +306,12 @@ enum skb_drop_reason {
* LINUX_MIB_PAWSACTIVEREJECTED
*/
SKB_DROP_REASON_TCP_PAWSACTIVEREJECTED,
+ /**
+ * @SKB_DROP_REASON_TIMEWAIT: socket is in time-wait state and all
+ * packet that received will be treated as 'drop', except a good
+ * 'SYN' packet
+ */
+ SKB_DROP_REASON_TIMEWAIT,
/**
* @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be
* used as a real 'reason'
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index c375f603a16c..e1963394dc4a 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -113,11 +113,16 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
return tcp_timewait_check_oow_rate_limit(
tw, skb, LINUX_MIB_TCPACKSKIPPEDFINWAIT2);
- if (th->rst)
+ if (th->rst) {
+ TCP_SKB_DR(skb, TCP_RESET);
goto kill;
+ }
- if (th->syn && !before(TCP_SKB_CB(skb)->seq, tcptw->tw_rcv_nxt))
+ if (th->syn && !before(TCP_SKB_CB(skb)->seq,
+ tcptw->tw_rcv_nxt)) {
+ TCP_SKB_DR(skb, TCP_FLAGS);
return TCP_TW_RST;
+ }
/* Dup ACK? */
if (!th->ack ||
@@ -143,6 +148,9 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
}
inet_twsk_reschedule(tw, TCP_TIMEWAIT_LEN);
+
+ /* skb should be free normally on this case. */
+ TCP_SKB_CB(skb)->drop_reason = SKB_NOT_DROPPED_YET;
return TCP_TW_ACK;
}
@@ -174,6 +182,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
* protocol bug yet.
*/
if (!READ_ONCE(twsk_net(tw)->ipv4.sysctl_tcp_rfc1337)) {
+ TCP_SKB_DR(skb, TCP_RESET);
kill:
inet_twsk_deschedule_put(tw);
return TCP_TW_SUCCESS;
@@ -232,9 +241,11 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
if (paws_reject || th->ack)
inet_twsk_reschedule(tw, TCP_TIMEWAIT_LEN);
+ TCP_SKB_DR(skb, TIMEWAIT);
return tcp_timewait_check_oow_rate_limit(
tw, skb, LINUX_MIB_TCPACKSKIPPEDTIMEWAIT);
}
+ TCP_SKB_DR(skb, TCP_RESET);
inet_twsk_put(tw);
return TCP_TW_SUCCESS;
}
--
2.37.2
From: Menglong Dong <[email protected]>
The skb drop reasons for the 'reset' code path in
tcp_rcv_synsent_state_process() is not handled yet. Now, we can store the
drop reason to tcp_skb_cb for such case.
The new reason 'TCP_PAWSACTIVEREJECTED' is added, which is corresponding
to LINUX_MIB_PAWSACTIVEREJECTED.
Signed-off-by: Menglong Dong <[email protected]>
---
include/net/dropreason.h | 7 +++++++
net/ipv4/tcp_input.c | 5 +++++
2 files changed, 12 insertions(+)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index c1cbcdbaf149..0f0edcd5f95f 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -68,6 +68,7 @@
FN(IP_INADDRERRORS) \
FN(IP_INNOROUTES) \
FN(PKT_TOO_BIG) \
+ FN(TCP_PAWSACTIVEREJECTED) \
FNe(MAX)
/**
@@ -298,6 +299,12 @@ enum skb_drop_reason {
* MTU)
*/
SKB_DROP_REASON_PKT_TOO_BIG,
+ /**
+ * @SKB_DROP_REASON_TCP_PAWSACTIVEREJECTED: PAWS check failed for
+ * active TCP connection, corresponding to
+ * LINUX_MIB_PAWSACTIVEREJECTED
+ */
+ SKB_DROP_REASON_TCP_PAWSACTIVEREJECTED,
/**
* @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be
* used as a real 'reason'
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0640453fce54..c0e5c4a29a4e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6195,6 +6195,10 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
inet_csk_reset_xmit_timer(sk,
ICSK_TIME_RETRANS,
TCP_TIMEOUT_MIN, TCP_RTO_MAX);
+ if (after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
+ TCP_SKB_DR(skb, TCP_ACK_UNSENT_DATA);
+ else
+ TCP_SKB_DR(skb, TCP_TOO_OLD_ACK);
goto reset_and_undo;
}
@@ -6203,6 +6207,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
tcp_time_stamp(tp))) {
NET_INC_STATS(sock_net(sk),
LINUX_MIB_PAWSACTIVEREJECTED);
+ TCP_SKB_DR(skb, TCP_PAWSACTIVEREJECTED);
goto reset_and_undo;
}
--
2.37.2
From: Menglong Dong <[email protected]>
Because of the long call chain on processing skb in TCP protocol, it's
hard to pass the drop reason to the code where the skb is freed.
Therefore, we can store the drop reason to skb->cb, and pass it to
kfree_skb_reason(). I'm lucky, the struct tcp_skb_cb still has 4 bytes
spare space for this purpose.
Signed-off-by: Menglong Dong <[email protected]>
---
include/net/tcp.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 14d45661a84d..0b6e39ca83b4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -888,6 +888,7 @@ struct tcp_skb_cb {
has_rxtstamp:1, /* SKB has a RX timestamp */
unused:5;
__u32 ack_seq; /* Sequence number ACK'd */
+ enum skb_drop_reason drop_reason; /* Why skb is dropped */
union {
struct {
#define TCPCB_DELIVERED_CE_MASK ((1U<<20) - 1)
@@ -912,6 +913,8 @@ struct tcp_skb_cb {
};
#define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0]))
+#define TCP_SKB_DR(__skb, reason) \
+ (TCP_SKB_CB(__skb)->drop_reason = SKB_DROP_REASON_##reason)
extern const struct inet_connection_sock_af_ops ipv4_specific;
--
2.37.2
From: Menglong Dong <[email protected]>
In order to simply the code, introduce try_kfree_skb(), which allow
SKB_NOT_DROPPED_YET to be passed. When the reason is SKB_NOT_DROPPED_YET,
consume_skb() will be called to free the skb normally. Otherwise,
kfree_skb_reason() will be called.
Signed-off-by: Menglong Dong <[email protected]>
---
include/linux/skbuff.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 59c9fd55699d..f722accc054e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1236,6 +1236,15 @@ static inline void consume_skb(struct sk_buff *skb)
}
#endif
+static inline void try_kfree_skb(struct sk_buff *skb,
+ enum skb_drop_reason reason)
+{
+ if (reason != SKB_NOT_DROPPED_YET)
+ kfree_skb_reason(skb, reason);
+ else
+ consume_skb(skb);
+}
+
void __consume_stateless_skb(struct sk_buff *skb);
void __kfree_skb(struct sk_buff *skb);
extern struct kmem_cache *skbuff_head_cache;
--
2.37.2
From: Menglong Dong <[email protected]>
Store the drop reasons for skb to the tcp_skb_cb in
tcp_rcv_state_process() when it returns non-zero, which means that
the skb need to be dropped.
The new drop reasons 'TCP_ABORTONDATA' and 'TCP_ABORTONLINGER' are
added.
Signed-off-by: Menglong Dong <[email protected]>
---
include/net/dropreason.h | 12 ++++++++++++
net/ipv4/tcp_input.c | 11 +++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 633a05c95026..364811bce63f 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -72,6 +72,8 @@
FN(TIMEWAIT) \
FN(LISTENOVERFLOWS) \
FN(TCP_REQQFULLDROP) \
+ FN(TCP_ABORTONDATA) \
+ FN(TCP_ABORTONLINGER) \
FNe(MAX)
/**
@@ -324,6 +326,16 @@ enum skb_drop_reason {
* socket is full, corresponding to LINUX_MIB_TCPREQQFULLDROP
*/
SKB_DROP_REASON_TCP_REQQFULLDROP,
+ /**
+ * @SKB_DROP_REASON_TCP_ABORTONDATA: corresponding to
+ * LINUX_MIB_TCPABORTONDATA
+ */
+ SKB_DROP_REASON_TCP_ABORTONDATA,
+ /**
+ * @SKB_DROP_REASON_TCP_ABORTONLINGER: corresponding to
+ * LINUX_MIB_TCPABORTONLINGER
+ */
+ SKB_DROP_REASON_TCP_ABORTONLINGER,
/**
* @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be
* used as a real 'reason'
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ad088e228b1e..e08842f999f8 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6459,8 +6459,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
goto discard;
case TCP_LISTEN:
- if (th->ack)
+ if (th->ack) {
+ TCP_SKB_DR(skb, TCP_FLAGS);
return 1;
+ }
if (th->rst) {
SKB_DR_SET(reason, TCP_RESET);
@@ -6533,8 +6535,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
FLAG_NO_CHALLENGE_ACK) > 0;
if (!acceptable) {
- if (sk->sk_state == TCP_SYN_RECV)
+ if (sk->sk_state == TCP_SYN_RECV) {
+ TCP_SKB_DR(skb, TCP_FLAGS);
return 1; /* send one RST */
+ }
tcp_send_challenge_ack(sk);
SKB_DR_SET(reason, TCP_OLD_ACK);
goto discard;
@@ -6605,6 +6609,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
if (tp->linger2 < 0) {
tcp_done(sk);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
+ TCP_SKB_DR(skb, TCP_ABORTONLINGER);
return 1;
}
if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
@@ -6614,6 +6619,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
tcp_fastopen_active_disable(sk);
tcp_done(sk);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
+ TCP_SKB_DR(skb, TCP_ABORTONDATA);
return 1;
}
@@ -6678,6 +6684,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
+ TCP_SKB_DR(skb, TCP_ABORTONDATA);
tcp_reset(sk, skb);
return 1;
}
--
2.37.2
From: Menglong Dong <[email protected]>
The drop reasons for skb can be stored in tcp_skb_cb in some function
when it needs to be dropped. The following functions will do it in the
latter commits:
tcp_rcv_state_process
tcp_conn_request
tcp_rcv_state_process
tcp_timewait_state_process
tcp_rcv_synsent_state_process
Now, we initialize the drop_reason in tcp_skb_cb to
SKB_DROP_REASON_NOT_SPECIFIED. try_kfree_skb() should be used if any code
path makes the drop_reason to SKB_NOT_DROPPED_YET. Don't try to set it
to SKB_NOT_DROPPED_YET if the skb has any posibility to be dropped later.
Signed-off-by: Menglong Dong <[email protected]>
---
net/ipv4/tcp_ipv4.c | 15 +++++++++++++--
net/ipv6/tcp_ipv6.c | 20 ++++++++++++++++----
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 87d440f47a70..a85bc7483c5a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1693,6 +1693,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
goto discard;
if (nsk != sk) {
if (tcp_child_process(sk, nsk, skb)) {
+ reason = TCP_SKB_CB(skb)->drop_reason;
rsk = nsk;
goto reset;
}
@@ -1702,6 +1703,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
sock_rps_save_rxhash(sk, skb);
if (tcp_rcv_state_process(sk, skb)) {
+ reason = TCP_SKB_CB(skb)->drop_reason;
rsk = sk;
goto reset;
}
@@ -1945,6 +1947,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
int ret;
drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ TCP_SKB_DR(skb, NOT_SPECIFIED);
if (skb->pkt_type != PACKET_HOST)
goto discard_it;
@@ -2050,6 +2053,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
reqsk_put(req);
tcp_v4_restore_cb(skb);
} else if (tcp_child_process(sk, nsk, skb)) {
+ drop_reason = TCP_SKB_CB(skb)->drop_reason;
tcp_v4_send_reset(nsk, skb);
goto discard_and_relse;
} else {
@@ -2136,6 +2140,11 @@ int tcp_v4_rcv(struct sk_buff *skb)
kfree_skb_reason(skb, drop_reason);
return 0;
+free_it:
+ drop_reason = TCP_SKB_CB(skb)->drop_reason;
+ try_kfree_skb(skb, drop_reason);
+ return 0;
+
discard_and_relse:
sk_drops_add(sk, skb);
if (refcounted)
@@ -2171,6 +2180,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
refcounted = false;
goto process;
}
+ /* TCP_FLAGS or NO_SOCKET? */
+ TCP_SKB_DR(skb, TCP_FLAGS);
}
/* to ACK */
fallthrough;
@@ -2180,10 +2191,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
case TCP_TW_RST:
tcp_v4_send_reset(sk, skb);
inet_twsk_deschedule_put(inet_twsk(sk));
- goto discard_it;
+ goto free_it;
case TCP_TW_SUCCESS:;
}
- goto discard_it;
+ goto free_it;
}
static struct timewait_sock_ops tcp_timewait_sock_ops = {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index f676be14e6b6..2c2048832714 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1515,8 +1515,10 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
goto discard;
if (nsk != sk) {
- if (tcp_child_process(sk, nsk, skb))
+ if (tcp_child_process(sk, nsk, skb)) {
+ reason = TCP_SKB_CB(skb)->drop_reason;
goto reset;
+ }
if (opt_skb)
__kfree_skb(opt_skb);
return 0;
@@ -1524,8 +1526,10 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb);
- if (tcp_rcv_state_process(sk, skb))
+ if (tcp_rcv_state_process(sk, skb)) {
+ reason = TCP_SKB_CB(skb)->drop_reason;
goto reset;
+ }
if (opt_skb)
goto ipv6_pktoptions;
return 0;
@@ -1615,6 +1619,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
struct net *net = dev_net(skb->dev);
drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ TCP_SKB_DR(skb, NOT_SPECIFIED);
if (skb->pkt_type != PACKET_HOST)
goto discard_it;
@@ -1711,6 +1716,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
reqsk_put(req);
tcp_v6_restore_cb(skb);
} else if (tcp_child_process(sk, nsk, skb)) {
+ drop_reason = TCP_SKB_CB(skb)->drop_reason;
tcp_v6_send_reset(nsk, skb);
goto discard_and_relse;
} else {
@@ -1792,6 +1798,11 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
kfree_skb_reason(skb, drop_reason);
return 0;
+free_it:
+ drop_reason = TCP_SKB_CB(skb)->drop_reason;
+ try_kfree_skb(skb, drop_reason);
+ return 0;
+
discard_and_relse:
sk_drops_add(sk, skb);
if (refcounted)
@@ -1832,6 +1843,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
refcounted = false;
goto process;
}
+ TCP_SKB_DR(skb, TCP_FLAGS);
}
/* to ACK */
fallthrough;
@@ -1841,11 +1853,11 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
case TCP_TW_RST:
tcp_v6_send_reset(sk, skb);
inet_twsk_deschedule_put(inet_twsk(sk));
- goto discard_it;
+ goto free_it;
case TCP_TW_SUCCESS:
;
}
- goto discard_it;
+ goto free_it;
}
void tcp_v6_early_demux(struct sk_buff *skb)
--
2.37.2
From: Menglong Dong <[email protected]>
The statistics for 'tp->linger2 < 0' in tcp_rcv_state_process() seems
more accurate to be LINUX_MIB_TCPABORTONLINGER.
Signed-off-by: Menglong Dong <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e08842f999f8..e8623cea1633 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6608,7 +6608,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
if (tp->linger2 < 0) {
tcp_done(sk);
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
+ NET_INC_STATS(sock_net(sk), SKB_DROP_REASON_TCP_ABORTONLINGER);
TCP_SKB_DR(skb, TCP_ABORTONLINGER);
return 1;
}
--
2.37.2
On Sat, Oct 29, 2022 at 6:11 AM <[email protected]> wrote:
>
> From: Menglong Dong <[email protected]>
>
> Because of the long call chain on processing skb in TCP protocol, it's
> hard to pass the drop reason to the code where the skb is freed.
>
> Therefore, we can store the drop reason to skb->cb, and pass it to
> kfree_skb_reason(). I'm lucky, the struct tcp_skb_cb still has 4 bytes
> spare space for this purpose.
No, we have needs for this space for future use.
skb->cb[] purpose is to store semi-permanent info, for skbs that stay
in a queue.
Here, you need a state stored only in the context of the current context.
Stack variables are better.
On Sat, Oct 29, 2022 at 6:11 AM <[email protected]> wrote:
>
> From: Menglong Dong <[email protected]>
>
> In order to simply the code, introduce try_kfree_skb(), which allow
> SKB_NOT_DROPPED_YET to be passed. When the reason is SKB_NOT_DROPPED_YET,
> consume_skb() will be called to free the skb normally. Otherwise,
> kfree_skb_reason() will be called.
>
> Signed-off-by: Menglong Dong <[email protected]>
> ---
> include/linux/skbuff.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 59c9fd55699d..f722accc054e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1236,6 +1236,15 @@ static inline void consume_skb(struct sk_buff *skb)
> }
> #endif
>
> +static inline void try_kfree_skb(struct sk_buff *skb,
> + enum skb_drop_reason reason)
> +{
> + if (reason != SKB_NOT_DROPPED_YET)
> + kfree_skb_reason(skb, reason);
> + else
> + consume_skb(skb);
> +}
> +
My proposal looks better IMO
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
On Sat, Oct 29, 2022 at 11:30 PM Eric Dumazet <[email protected]> wrote:
>
> On Sat, Oct 29, 2022 at 6:11 AM <[email protected]> wrote:
> >
> > From: Menglong Dong <[email protected]>
> >
> > In order to simply the code, introduce try_kfree_skb(), which allow
> > SKB_NOT_DROPPED_YET to be passed. When the reason is SKB_NOT_DROPPED_YET,
> > consume_skb() will be called to free the skb normally. Otherwise,
> > kfree_skb_reason() will be called.
> >
> > Signed-off-by: Menglong Dong <[email protected]>
> > ---
> > include/linux/skbuff.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 59c9fd55699d..f722accc054e 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1236,6 +1236,15 @@ static inline void consume_skb(struct sk_buff *skb)
> > }
> > #endif
> >
> > +static inline void try_kfree_skb(struct sk_buff *skb,
> > + enum skb_drop_reason reason)
> > +{
> > + if (reason != SKB_NOT_DROPPED_YET)
> > + kfree_skb_reason(skb, reason);
> > + else
> > + consume_skb(skb);
> > +}
> > +
>
> My proposal looks better IMO
>
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Hnn...yeah, your idea seems fine, which wont
affect the existing code.
Hello,
On Sat, Oct 29, 2022 at 11:23 PM Eric Dumazet <[email protected]> wrote:
>
> On Sat, Oct 29, 2022 at 6:11 AM <[email protected]> wrote:
> >
> > From: Menglong Dong <[email protected]>
> >
> > Because of the long call chain on processing skb in TCP protocol, it's
> > hard to pass the drop reason to the code where the skb is freed.
> >
> > Therefore, we can store the drop reason to skb->cb, and pass it to
> > kfree_skb_reason(). I'm lucky, the struct tcp_skb_cb still has 4 bytes
> > spare space for this purpose.
>
> No, we have needs for this space for future use.
>
> skb->cb[] purpose is to store semi-permanent info, for skbs that stay
> in a queue.
>
May I use it for a while? Or, can I make it a union with
some field, such as 'header'? As the 'drop_reason' field will
be set only if it is going to be freed.
> Here, you need a state stored only in the context of the current context.
> Stack variables are better.
It's hard to get the drop reason through stack variables,
especially some functions in TCP protocol, such as:
tcp_rcv_synsent_state_process
tcp_timewait_state_process
tcp_conn_request
tcp_rcv_state_process
And it will mess the code a little up, just like what
you comment in this series:
https://lore.kernel.org/netdev/[email protected]/
I hope to complete this part, or I think I can't move
ahead :/
Thanks!
Menglong Dong