2024-06-11 20:11:37

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 0/7] net: pass receive socket to drop tracepoint

We set up our production packet drop monitoring around the kfree_skb
tracepoint. While this tracepoint is extremely valuable for diagnosing
critical problems, it also has some limitation with drops on the local
receive path: this tracepoint can only inspect the dropped skb itself,
but such skb might not carry enough information to:

1. determine in which netns/container this skb gets dropped
2. determine by which socket/service this skb oughts to be received

The 1st issue is because skb->dev is the only member field with valid
netns reference. But skb->dev can get cleared or reused. For example,
tcp_v4_rcv will clear skb->dev and in later processing it might be reused
for OFO tree.

The 2nd issue is because there is no reference on an skb that reliably
points to a receiving socket. skb->sk usually points to the local
sending socket, and it only points to a receive socket briefly after
early demux stage, yet the socket can get stolen later. For certain drop
reason like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc, it
is hard to infer which receiving socket is impacted. This cannot be
overcome by simply looking at the packet header, because of
complications like sk lookup programs. In the past, single purpose
tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full,
etc are added as needed to provide more visibility. This could be
handled in a more generic way.

In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in
replacement for kfree_skb_reason at various local input path. It accepts
an extra receiving socket argument. Both issues above can be resolved
via this new argument.

V3->V4: adjusted the TP_STRUCT field order to align better, suggested by
Steven Rostedt.

V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks;
Added a few missing report tags from test bots (also noticed by Dan
Carpenter and Simon Horman).

V1->V2: instead of using skb->cb, directly add the needed argument to
trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet
suggested.

V3: https://lore.kernel.org/netdev/[email protected]/
V2: https://lore.kernel.org/linux-kernel/[email protected]/
V1: https://lore.kernel.org/netdev/[email protected]/

Yan Zhai (7):
net: add rx_sk to trace_kfree_skb
net: introduce sk_skb_reason_drop function
ping: use sk_skb_reason_drop to free rx packets
net: raw: use sk_skb_reason_drop to free rx packets
tcp: use sk_skb_reason_drop to free rx packets
udp: use sk_skb_reason_drop to free rx packets
af_packet: use sk_skb_reason_drop to free rx packets

include/linux/skbuff.h | 10 ++++++++--
include/trace/events/skb.h | 11 +++++++----
net/core/dev.c | 2 +-
net/core/drop_monitor.c | 9 ++++++---
net/core/skbuff.c | 22 ++++++++++++----------
net/ipv4/ping.c | 2 +-
net/ipv4/raw.c | 4 ++--
net/ipv4/syncookies.c | 2 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_ipv4.c | 6 +++---
net/ipv4/udp.c | 10 +++++-----
net/ipv6/raw.c | 8 ++++----
net/ipv6/syncookies.c | 2 +-
net/ipv6/tcp_ipv6.c | 6 +++---
net/ipv6/udp.c | 10 +++++-----
net/packet/af_packet.c | 10 +++++-----
16 files changed, 65 insertions(+), 51 deletions(-)

--
2.30.2




2024-06-11 20:11:56

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb

skb does not include enough information to find out receiving
sockets/services and netns/containers on packet drops. In theory
skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
stack for OOO packet lookup. Similarly, skb->sk often identifies a local
sender, and tells nothing about a receiver.

Allow passing an extra receiving socket to the tracepoint to improve
the visibility on receiving drops.

Signed-off-by: Yan Zhai <[email protected]>
---
v3->v4: adjusted the TP_STRUCT field order to be consistent
v2->v3: fixed drop_monitor function prototype
---
include/trace/events/skb.h | 11 +++++++----
net/core/dev.c | 2 +-
net/core/drop_monitor.c | 9 ++++++---
net/core/skbuff.c | 2 +-
4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..3e9ea1cca6f2 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
TRACE_EVENT(kfree_skb,

TP_PROTO(struct sk_buff *skb, void *location,
- enum skb_drop_reason reason),
+ enum skb_drop_reason reason, struct sock *rx_sk),

- TP_ARGS(skb, location, reason),
+ TP_ARGS(skb, location, reason, rx_sk),

TP_STRUCT__entry(
__field(void *, skbaddr)
__field(void *, location)
+ __field(void *, rx_skaddr)
__field(unsigned short, protocol)
__field(enum skb_drop_reason, reason)
),
@@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
TP_fast_assign(
__entry->skbaddr = skb;
__entry->location = location;
+ __entry->rx_skaddr = rx_sk;
__entry->protocol = ntohs(skb->protocol);
__entry->reason = reason;
),

- TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
- __entry->skbaddr, __entry->protocol, __entry->location,
+ TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
+ __entry->skbaddr, __entry->rx_skaddr, __entry->protocol,
+ __entry->location,
__print_symbolic(__entry->reason,
DEFINE_DROP_REASON(FN, FNe)))
);
diff --git a/net/core/dev.c b/net/core/dev.c
index 85fe8138f3e4..7844227ecbfd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5233,7 +5233,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
trace_consume_skb(skb, net_tx_action);
else
trace_kfree_skb(skb, net_tx_action,
- get_kfree_skb_cb(skb)->reason);
+ get_kfree_skb_cb(skb)->reason, NULL);

if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
__kfree_skb(skb);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 430ed18f8584..2e0ae3328232 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -109,7 +109,8 @@ static u32 net_dm_queue_len = 1000;
struct net_dm_alert_ops {
void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
void *location,
- enum skb_drop_reason reason);
+ enum skb_drop_reason reason,
+ struct sock *rx_sk);
void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
int work, int budget);
void (*work_item_func)(struct work_struct *work);
@@ -264,7 +265,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location)

static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
void *location,
- enum skb_drop_reason reason)
+ enum skb_drop_reason reason,
+ struct sock *rx_sk)
{
trace_drop_common(skb, location);
}
@@ -491,7 +493,8 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
struct sk_buff *skb,
void *location,
- enum skb_drop_reason reason)
+ enum skb_drop_reason reason,
+ struct sock *rx_sk)
{
ktime_t tstamp = ktime_get_real();
struct per_cpu_dm_data *data;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466999a7515e..2854afdd713f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1203,7 +1203,7 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
if (reason == SKB_CONSUMED)
trace_consume_skb(skb, __builtin_return_address(0));
else
- trace_kfree_skb(skb, __builtin_return_address(0), reason);
+ trace_kfree_skb(skb, __builtin_return_address(0), reason, NULL);
return true;
}

--
2.30.2



2024-06-11 20:12:16

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 2/7] net: introduce sk_skb_reason_drop function

Long used destructors kfree_skb and kfree_skb_reason do not pass
receiving socket to packet drop tracepoints trace_kfree_skb.
This makes it hard to track packet drops of a certain netns (container)
or a socket (user application).

The naming of these destructors are also not consistent with most sk/skb
operating functions, i.e. functions named "sk_xxx" or "skb_xxx".
Introduce a new functions sk_skb_reason_drop as drop-in replacement for
kfree_skb_reason on local receiving path. Callers can now pass receiving
sockets to the tracepoints.

kfree_skb and kfree_skb_reason are still usable but they are now just
inline helpers that call sk_skb_reason_drop.

Note it is not feasible to do the same to consume_skb. Packets not
dropped can flow through multiple receive handlers, and have multiple
receiving sockets. Leave it untouched for now.

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Yan Zhai <[email protected]>
---
v1->v2: changes function names to be more consistent with common sk/skb
operations
---
include/linux/skbuff.h | 10 ++++++++--
net/core/skbuff.c | 22 ++++++++++++----------
2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe7d8dbef77e..c479a2515a62 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1251,8 +1251,14 @@ static inline bool skb_data_unref(const struct sk_buff *skb,
return true;
}

-void __fix_address
-kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);
+void __fix_address sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
+ enum skb_drop_reason reason);
+
+static inline void
+kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+{
+ sk_skb_reason_drop(NULL, skb, reason);
+}

/**
* kfree_skb - free an sk_buff with 'NOT_SPECIFIED' reason
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2854afdd713f..9def11fe42c4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1190,7 +1190,8 @@ void __kfree_skb(struct sk_buff *skb)
EXPORT_SYMBOL(__kfree_skb);

static __always_inline
-bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+bool __sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
+ enum skb_drop_reason reason)
{
if (unlikely(!skb_unref(skb)))
return false;
@@ -1203,26 +1204,27 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
if (reason == SKB_CONSUMED)
trace_consume_skb(skb, __builtin_return_address(0));
else
- trace_kfree_skb(skb, __builtin_return_address(0), reason, NULL);
+ trace_kfree_skb(skb, __builtin_return_address(0), reason, sk);
return true;
}

/**
- * kfree_skb_reason - free an sk_buff with special reason
+ * sk_skb_reason_drop - free an sk_buff with special reason
+ * @sk: the socket to receive @skb, or NULL if not applicable
* @skb: buffer to free
* @reason: reason why this skb is dropped
*
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero. Meanwhile, pass the drop reason to 'kfree_skb'
- * tracepoint.
+ * Drop a reference to the buffer and free it if the usage count has hit
+ * zero. Meanwhile, pass the receiving socket and drop reason to
+ * 'kfree_skb' tracepoint.
*/
void __fix_address
-kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason reason)
{
- if (__kfree_skb_reason(skb, reason))
+ if (__sk_skb_reason_drop(sk, skb, reason))
__kfree_skb(skb);
}
-EXPORT_SYMBOL(kfree_skb_reason);
+EXPORT_SYMBOL(sk_skb_reason_drop);

#define KFREE_SKB_BULK_SIZE 16

@@ -1261,7 +1263,7 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
while (segs) {
struct sk_buff *next = segs->next;

- if (__kfree_skb_reason(segs, reason)) {
+ if (__sk_skb_reason_drop(NULL, segs, reason)) {
skb_poison_list(segs);
kfree_skb_add_bulk(segs, &sa, reason);
}
--
2.30.2



2024-06-11 20:12:54

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 4/7] net: raw: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <[email protected]>
---
net/ipv4/raw.c | 4 ++--
net/ipv6/raw.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1a0953650356..474dfd263c8b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -301,7 +301,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)

ipv4_pktinfo_prepare(sk, skb, true);
if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}

@@ -312,7 +312,7 @@ int raw_rcv(struct sock *sk, struct sk_buff *skb)
{
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY);
+ sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_XFRM_POLICY);
return NET_RX_DROP;
}
nf_reset_ct(skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index f838366e8256..608fa9d05b55 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -362,14 +362,14 @@ static inline int rawv6_rcv_skb(struct sock *sk, struct sk_buff *skb)
if ((raw6_sk(sk)->checksum || rcu_access_pointer(sk->sk_filter)) &&
skb_checksum_complete(skb)) {
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM);
+ sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_SKB_CSUM);
return NET_RX_DROP;
}

/* Charge it to the socket. */
skb_dst_drop(skb);
if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}

@@ -390,7 +390,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)

if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) {
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY);
+ sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_XFRM_POLICY);
return NET_RX_DROP;
}
nf_reset_ct(skb);
@@ -415,7 +415,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
if (inet_test_bit(HDRINCL, sk)) {
if (skb_checksum_complete(skb)) {
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM);
+ sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_SKB_CSUM);
return NET_RX_DROP;
}
}
--
2.30.2



2024-06-11 20:13:11

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 3/7] ping: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <[email protected]>
---
net/ipv4/ping.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 823306487a82..619ddc087957 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -946,7 +946,7 @@ static enum skb_drop_reason __ping_queue_rcv_skb(struct sock *sk,
pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n",
inet_sk(sk), inet_sk(sk)->inet_num, skb);
if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
pr_debug("ping_queue_rcv_skb -> failed\n");
return reason;
}
--
2.30.2



2024-06-11 20:13:14

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 5/7] tcp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Yan Zhai <[email protected]>
---
v2->v3: added missing report tags
---
net/ipv4/syncookies.c | 2 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_ipv4.c | 6 +++---
net/ipv6/syncookies.c | 2 +-
net/ipv6/tcp_ipv6.c | 6 +++---
5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index b61d36810fe3..1948d15f1f28 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -496,6 +496,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
out_free:
reqsk_free(req);
out_drop:
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return NULL;
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5aadf64e554d..bedb079de1f0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4859,7 +4859,7 @@ static void tcp_drop_reason(struct sock *sk, struct sk_buff *skb,
enum skb_drop_reason reason)
{
sk_drops_add(sk, skb);
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
}

/* This one checks to see if we can put data from the
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 041c7eda9abe..f7a046bc4b27 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1939,7 +1939,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
reset:
tcp_v4_send_reset(rsk, skb, sk_rst_convert_drop_reason(reason));
discard:
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
/* Be careful here. If this function gets more complicated and
* gcc suffers from register pressure on the x86, sk (in %ebx)
* might be destroyed here. This current version compiles correctly,
@@ -2176,8 +2176,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
int dif = inet_iif(skb);
const struct iphdr *iph;
const struct tcphdr *th;
+ struct sock *sk = NULL;
bool refcounted;
- struct sock *sk;
int ret;
u32 isn;

@@ -2376,7 +2376,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
discard_it:
SKB_DR_OR(drop_reason, NOT_SPECIFIED);
/* Discard frame. */
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;

discard_and_relse:
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index bfad1e89b6a6..9d83eadd308b 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -275,6 +275,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
out_free:
reqsk_free(req);
out_drop:
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return NULL;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1ac7502e1bf5..93967accc35d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1678,7 +1678,7 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
discard:
if (opt_skb)
__kfree_skb(opt_skb);
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return 0;
csum_err:
reason = SKB_DROP_REASON_TCP_CSUM;
@@ -1751,8 +1751,8 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
int dif = inet6_iif(skb);
const struct tcphdr *th;
const struct ipv6hdr *hdr;
+ struct sock *sk = NULL;
bool refcounted;
- struct sock *sk;
int ret;
u32 isn;
struct net *net = dev_net(skb->dev);
@@ -1944,7 +1944,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)

discard_it:
SKB_DR_OR(drop_reason, NOT_SPECIFIED);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;

discard_and_relse:
--
2.30.2



2024-06-11 20:13:41

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 6/7] udp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Yan Zhai <[email protected]>
---
v2->v3: added missing report tags
---
net/ipv4/udp.c | 10 +++++-----
net/ipv6/udp.c | 10 +++++-----
2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 189c9113fe9a..ecafb1695999 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2074,7 +2074,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
}
UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
trace_udp_fail_queue_rcv_skb(rc, sk, skb);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}

@@ -2196,7 +2196,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
drop:
__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}

@@ -2383,7 +2383,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
int proto)
{
- struct sock *sk;
+ struct sock *sk = NULL;
struct udphdr *uh;
unsigned short ulen;
struct rtable *rt = skb_rtable(skb);
@@ -2460,7 +2460,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
* Hmm. We got an UDP packet to a port to which we
* don't wanna listen. Ignore it.
*/
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;

short_packet:
@@ -2485,7 +2485,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
drop:
__UDP_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
}

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c81a07ac0463..b56f0b9f4307 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -673,7 +673,7 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
}
UDP6_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
trace_udp_fail_queue_rcv_skb(rc, sk, skb);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}

@@ -776,7 +776,7 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
drop:
__UDP6_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
atomic_inc(&sk->sk_drops);
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return -1;
}

@@ -940,8 +940,8 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
const struct in6_addr *saddr, *daddr;
struct net *net = dev_net(skb->dev);
+ struct sock *sk = NULL;
struct udphdr *uh;
- struct sock *sk;
bool refcounted;
u32 ulen = 0;

@@ -1033,7 +1033,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP6_INC_STATS(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0);

- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return 0;

short_packet:
@@ -1054,7 +1054,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP6_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
discard:
__UDP6_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
- kfree_skb_reason(skb, reason);
+ sk_skb_reason_drop(sk, skb, reason);
return 0;
}

--
2.30.2



2024-06-11 20:14:00

by Yan Zhai

[permalink] [raw]
Subject: [PATCH v4 net-next 7/7] af_packet: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Yan Zhai <[email protected]>
---
v2->v3: fixed uninitialized sk, added missing report tags.
---
net/packet/af_packet.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fce390887591..42d29b8a84fc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2121,7 +2121,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
enum skb_drop_reason drop_reason = SKB_CONSUMED;
- struct sock *sk;
+ struct sock *sk = NULL;
struct sockaddr_ll *sll;
struct packet_sock *po;
u8 *skb_head = skb->data;
@@ -2226,7 +2226,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
skb->len = skb_len;
}
drop:
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;
}

@@ -2234,7 +2234,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
enum skb_drop_reason drop_reason = SKB_CONSUMED;
- struct sock *sk;
+ struct sock *sk = NULL;
struct packet_sock *po;
struct sockaddr_ll *sll;
union tpacket_uhdr h;
@@ -2494,7 +2494,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
skb->len = skb_len;
}
drop:
- kfree_skb_reason(skb, drop_reason);
+ sk_skb_reason_drop(sk, skb, drop_reason);
return 0;

drop_n_account:
@@ -2503,7 +2503,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
drop_reason = SKB_DROP_REASON_PACKET_SOCK_ERROR;

sk->sk_data_ready(sk);
- kfree_skb_reason(copy_skb, drop_reason);
+ sk_skb_reason_drop(sk, copy_skb, drop_reason);
goto drop_n_restore;
}

--
2.30.2



2024-06-12 08:05:12

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb



On 11/06/2024 22.11, Yan Zhai wrote:
> skb does not include enough information to find out receiving
> sockets/services and netns/containers on packet drops. In theory
> skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> sender, and tells nothing about a receiver.
>
> Allow passing an extra receiving socket to the tracepoint to improve
> the visibility on receiving drops.
>
> Signed-off-by: Yan Zhai<[email protected]>
> ---
> v3->v4: adjusted the TP_STRUCT field order to be consistent
> v2->v3: fixed drop_monitor function prototype
> ---
> include/trace/events/skb.h | 11 +++++++----
> net/core/dev.c | 2 +-
> net/core/drop_monitor.c | 9 ++++++---
> net/core/skbuff.c | 2 +-
> 4 files changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 07e0715628ec..3e9ea1cca6f2 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
> TRACE_EVENT(kfree_skb,
>
> TP_PROTO(struct sk_buff *skb, void *location,
> - enum skb_drop_reason reason),
> + enum skb_drop_reason reason, struct sock *rx_sk),
>
> - TP_ARGS(skb, location, reason),
> + TP_ARGS(skb, location, reason, rx_sk),
>
> TP_STRUCT__entry(
> __field(void *, skbaddr)
> __field(void *, location)
> + __field(void *, rx_skaddr)

Is there any reason for appending the "addr" part to "rx_sk" ?
It makes it harder to read this is the sk (socket).

AFAICR the skbaddr naming is a legacy thing.

> __field(unsigned short, protocol)
> __field(enum skb_drop_reason, reason)
> ),
> @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
> TP_fast_assign(
> __entry->skbaddr = skb;
> __entry->location = location;
> + __entry->rx_skaddr = rx_sk;
> __entry->protocol = ntohs(skb->protocol);
> __entry->reason = reason;
> ),
>
> - TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> - __entry->skbaddr, __entry->protocol, __entry->location,
> + TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
^^^^^^^^^
I find it hard to visually tell skbaddr and rx_skaddr apart.
And especially noticing the "skb" vs "sk" part of the string.


> + __entry->skbaddr, __entry->rx_skaddr, __entry->protocol,
> + __entry->location,
> __print_symbolic(__entry->reason,
> DEFINE_DROP_REASON(FN, FNe)))
> );


--Jesper

2024-06-14 10:16:02

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb

On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote:
>
> On 11/06/2024 22.11, Yan Zhai wrote:
> > skb does not include enough information to find out receiving
> > sockets/services and netns/containers on packet drops. In theory
> > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> > stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> > sender, and tells nothing about a receiver.
> >
> > Allow passing an extra receiving socket to the tracepoint to improve
> > the visibility on receiving drops.
> >
> > Signed-off-by: Yan Zhai<[email protected]>
> > ---
> > v3->v4: adjusted the TP_STRUCT field order to be consistent
> > v2->v3: fixed drop_monitor function prototype
> > ---
> > include/trace/events/skb.h | 11 +++++++----
> > net/core/dev.c | 2 +-
> > net/core/drop_monitor.c | 9 ++++++---
> > net/core/skbuff.c | 2 +-
> > 4 files changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > index 07e0715628ec..3e9ea1cca6f2 100644
> > --- a/include/trace/events/skb.h
> > +++ b/include/trace/events/skb.h
> > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
> > TRACE_EVENT(kfree_skb,
> >
> > TP_PROTO(struct sk_buff *skb, void *location,
> > - enum skb_drop_reason reason),
> > + enum skb_drop_reason reason, struct sock *rx_sk),
> >
> > - TP_ARGS(skb, location, reason),
> > + TP_ARGS(skb, location, reason, rx_sk),
> >
> > TP_STRUCT__entry(
> > __field(void *, skbaddr)
> > __field(void *, location)
> > + __field(void *, rx_skaddr)
>
> Is there any reason for appending the "addr" part to "rx_sk" ?
> It makes it harder to read this is the sk (socket).
>
> AFAICR the skbaddr naming is a legacy thing.

I'm double-minded about the above: I can see your point, but on the
flip side the 'addr' suffix is consistently used in net-related
tracepoints.
>
> > __field(unsigned short, protocol)
> > __field(enum skb_drop_reason, reason)
> > ),
> > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
> > TP_fast_assign(
> > __entry->skbaddr = skb;
> > __entry->location = location;
> > + __entry->rx_skaddr = rx_sk;
> > __entry->protocol = ntohs(skb->protocol);
> > __entry->reason = reason;
> > ),
> >
> > - TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> > - __entry->skbaddr, __entry->protocol, __entry->location,
> > + TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
> ^^^^^^^^^
> I find it hard to visually tell skbaddr and rx_skaddr apart.
> And especially noticing the "skb" vs "sk" part of the string.

I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the
other net tracepoints and use 'skaddr' (which will very likely will
increase Jesper concerns, but I personally have no problem with such
format) or prefer readability with something alike 'rx_sk' or (even
better) 'sk'.

Thanks,

Paolo


2024-06-14 19:41:47

by Yan Zhai

[permalink] [raw]
Subject: Re: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb

On Fri, Jun 14, 2024 at 5:15 AM Paolo Abeni <[email protected]> wrote:
>
> On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote:
> >
> > On 11/06/2024 22.11, Yan Zhai wrote:
> > > skb does not include enough information to find out receiving
> > > sockets/services and netns/containers on packet drops. In theory
> > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> > > sender, and tells nothing about a receiver.
> > >
> > > Allow passing an extra receiving socket to the tracepoint to improve
> > > the visibility on receiving drops.
> > >
> > > Signed-off-by: Yan Zhai<[email protected]>
> > > ---
> > > v3->v4: adjusted the TP_STRUCT field order to be consistent
> > > v2->v3: fixed drop_monitor function prototype
> > > ---
> > > include/trace/events/skb.h | 11 +++++++----
> > > net/core/dev.c | 2 +-
> > > net/core/drop_monitor.c | 9 ++++++---
> > > net/core/skbuff.c | 2 +-
> > > 4 files changed, 15 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > > index 07e0715628ec..3e9ea1cca6f2 100644
> > > --- a/include/trace/events/skb.h
> > > +++ b/include/trace/events/skb.h
> > > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
> > > TRACE_EVENT(kfree_skb,
> > >
> > > TP_PROTO(struct sk_buff *skb, void *location,
> > > - enum skb_drop_reason reason),
> > > + enum skb_drop_reason reason, struct sock *rx_sk),
> > >
> > > - TP_ARGS(skb, location, reason),
> > > + TP_ARGS(skb, location, reason, rx_sk),
> > >
> > > TP_STRUCT__entry(
> > > __field(void *, skbaddr)
> > > __field(void *, location)
> > > + __field(void *, rx_skaddr)
> >
> > Is there any reason for appending the "addr" part to "rx_sk" ?
> > It makes it harder to read this is the sk (socket).
> >
> > AFAICR the skbaddr naming is a legacy thing.
>
> I'm double-minded about the above: I can see your point, but on the
> flip side the 'addr' suffix is consistently used in net-related
> tracepoints.
> >
> > > __field(unsigned short, protocol)
> > > __field(enum skb_drop_reason, reason)
> > > ),
> > > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
> > > TP_fast_assign(
> > > __entry->skbaddr = skb;
> > > __entry->location = location;
> > > + __entry->rx_skaddr = rx_sk;
> > > __entry->protocol = ntohs(skb->protocol);
> > > __entry->reason = reason;
> > > ),
> > >
> > > - TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> > > - __entry->skbaddr, __entry->protocol, __entry->location,
> > > + TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
> > ^^^^^^^^^
> > I find it hard to visually tell skbaddr and rx_skaddr apart.
> > And especially noticing the "skb" vs "sk" part of the string.
>
> I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the
> other net tracepoints and use 'skaddr' (which will very likely will
> increase Jesper concerns, but I personally have no problem with such
> format) or prefer readability with something alike 'rx_sk' or (even
> better) 'sk'.
>

Jesper explained to me in a private message that "addr" makes more
sense when there was no BPF, since likely nothing would dereference
the pointer anymore at that time, so it's purely an address. But it is
no longer the case now. Also, in later patches of this change, I am
already breaking the "convention" by replacing kfree_skb with
sk_skb_reason_drop, so how about breaking it once more, and just
calling it "rx_sk". I want to keep the "rx_" to emphasize this is a
receiving socket. Let me send an amended version early next week and
see if more thoughts come.

thanks
Yan


> Thanks,
>
> Paolo
>