2021-12-30 09:32:59

by Menglong Dong

[permalink] [raw]
Subject: [PATCH v2 net-next 0/3] net: skb: introduce kfree_skb_with_reason() and use it for tcp and udp

From: Menglong Dong <[email protected]>

In this series patch, the interface kfree_skb_with_reason() is
introduced(), which is used to collect skb drop reason, and pass
it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
able to monitor abnormal skb with detail reason.

In fact, this series patches are out of the intelligence of David
and Steve, I'm just a truck man :/

Previous discussion is here:

https://lore.kernel.org/netdev/[email protected]/
https://lore.kernel.org/netdev/[email protected]/

In the first patch, kfree_skb_with_reason() is introduced and
the 'reason' field is added to 'kfree_skb' tracepoint. In the
second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
used in __udp4_lib_rcv().

Changes since v1:
- rename some drop reason, as David suggested
- add the third patch


Menglong Dong (3):
net: skb: introduce kfree_skb_with_reason()
net: skb: use kfree_skb_with_reason() in tcp_v4_rcv()
net: skb: use kfree_skb_with_reason() in __udp4_lib_rcv()

include/linux/skbuff.h | 18 +++++++++++++++++
include/trace/events/skb.h | 41 +++++++++++++++++++++++++++++++-------
net/core/dev.c | 3 ++-
net/core/drop_monitor.c | 10 +++++++---
net/core/skbuff.c | 22 +++++++++++++++++++-
net/ipv4/tcp_ipv4.c | 14 ++++++++++---
net/ipv4/udp.c | 10 ++++++++--
7 files changed, 101 insertions(+), 17 deletions(-)

--
2.27.0



2021-12-30 09:33:08

by Menglong Dong

[permalink] [raw]
Subject: [PATCH v2 net-next 1/3] net: skb: introduce kfree_skb_with_reason()

From: Menglong Dong <[email protected]>

Introduce the interface kfree_skb_with_reason(), which is used to pass
the reason why the skb is dropped to 'kfree_skb' tracepoint.

Add the 'reason' field to 'trace_kfree_skb', therefor user can get
more detail information about abnormal skb with 'drop_monitor' or
eBPF.

Signed-off-by: Menglong Dong <[email protected]>
---
include/linux/skbuff.h | 13 +++++++++++++
include/trace/events/skb.h | 36 +++++++++++++++++++++++++++++-------
net/core/dev.c | 3 ++-
net/core/drop_monitor.c | 10 +++++++---
net/core/skbuff.c | 22 +++++++++++++++++++++-
5 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index aa9d42724e20..3620b3ff2154 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -305,6 +305,17 @@ struct sk_buff_head {

struct sk_buff;

+/* The reason of skb drop, which is used in kfree_skb_with_reason().
+ * en...maybe they should be splited by group?
+ *
+ * Each item here should also be in 'TRACE_SKB_DROP_REASON', which is
+ * used to translate the reason to string.
+ */
+enum skb_drop_reason {
+ SKB_DROP_REASON_NOT_SPECIFIED,
+ SKB_DROP_REASON_MAX,
+};
+
/* To allow 64K frame to be packed as single skb without frag_list we
* require 64K/PAGE_SIZE pages plus 1 additional page to allow for
* buffers which do not start on a page boundary.
@@ -1087,6 +1098,8 @@ static inline bool skb_unref(struct sk_buff *skb)

void skb_release_head_state(struct sk_buff *skb);
void kfree_skb(struct sk_buff *skb);
+void kfree_skb_with_reason(struct sk_buff *skb,
+ enum skb_drop_reason reason);
void kfree_skb_list(struct sk_buff *segs);
void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
void skb_tx_error(struct sk_buff *skb);
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 9e92f22eb086..cab1c08a30cd 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -9,29 +9,51 @@
#include <linux/netdevice.h>
#include <linux/tracepoint.h>

+#define TRACE_SKB_DROP_REASON \
+ EM(SKB_DROP_REASON_NOT_SPECIFIED, NOT_SPECIFIED) \
+ EMe(SKB_DROP_REASON_MAX, HAHA_MAX)
+
+#undef EM
+#undef EMe
+
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+TRACE_SKB_DROP_REASON
+
+#undef EM
+#undef EMe
+#define EM(a, b) { a, #b },
+#define EMe(a, b) { a, #b }
+
+
/*
* Tracepoint for free an sk_buff:
*/
TRACE_EVENT(kfree_skb,

- TP_PROTO(struct sk_buff *skb, void *location),
+ TP_PROTO(struct sk_buff *skb, void *location,
+ enum skb_drop_reason reason),

- TP_ARGS(skb, location),
+ TP_ARGS(skb, location, reason),

TP_STRUCT__entry(
- __field( void *, skbaddr )
- __field( void *, location )
- __field( unsigned short, protocol )
+ __field(void *, skbaddr)
+ __field(void *, location)
+ __field(unsigned short, protocol)
+ __field(enum skb_drop_reason, reason)
),

TP_fast_assign(
__entry->skbaddr = skb;
__entry->location = location;
__entry->protocol = ntohs(skb->protocol);
+ __entry->reason = reason;
),

- TP_printk("skbaddr=%p protocol=%u location=%p",
- __entry->skbaddr, __entry->protocol, __entry->location)
+ TP_printk("skbaddr=%p protocol=%u location=%p reason: %s",
+ __entry->skbaddr, __entry->protocol, __entry->location,
+ __print_symbolic(__entry->reason, TRACE_SKB_DROP_REASON))
);

TRACE_EVENT(consume_skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 644b9c8be3a8..9464dbf9e3d6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4899,7 +4899,8 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED))
trace_consume_skb(skb);
else
- trace_kfree_skb(skb, net_tx_action);
+ trace_kfree_skb(skb, net_tx_action,
+ SKB_DROP_REASON_NOT_SPECIFIED);

if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
__kfree_skb(skb);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 3d0ab2eec916..7b288a121a41 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -110,7 +110,8 @@ static u32 net_dm_queue_len = 1000;

struct net_dm_alert_ops {
void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
- void *location);
+ void *location,
+ enum skb_drop_reason reason);
void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
int work, int budget);
void (*work_item_func)(struct work_struct *work);
@@ -262,7 +263,9 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
spin_unlock_irqrestore(&data->lock, flags);
}

-static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb, void *location)
+static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
+ void *location,
+ enum skb_drop_reason reason)
{
trace_drop_common(skb, location);
}
@@ -490,7 +493,8 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = {

static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
struct sk_buff *skb,
- void *location)
+ void *location,
+ enum skb_drop_reason reason)
{
ktime_t tstamp = ktime_get_real();
struct per_cpu_dm_data *data;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 275f7b8416fe..570dc022a8a1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -770,11 +770,31 @@ void kfree_skb(struct sk_buff *skb)
if (!skb_unref(skb))
return;

- trace_kfree_skb(skb, __builtin_return_address(0));
+ trace_kfree_skb(skb, __builtin_return_address(0),
+ SKB_DROP_REASON_NOT_SPECIFIED);
__kfree_skb(skb);
}
EXPORT_SYMBOL(kfree_skb);

+/**
+ * kfree_skb_with_reason - free an sk_buff with reason
+ * @skb: buffer to free
+ * @reason: reason why this skb is dropped
+ *
+ * The same as kfree_skb() except that this function will pass
+ * the drop reason to 'kfree_skb' tracepoint.
+ */
+void kfree_skb_with_reason(struct sk_buff *skb,
+ enum skb_drop_reason reason)
+{
+ if (!skb_unref(skb))
+ return;
+
+ trace_kfree_skb(skb, __builtin_return_address(0), reason);
+ __kfree_skb(skb);
+}
+EXPORT_SYMBOL(kfree_skb_with_reason);
+
void kfree_skb_list(struct sk_buff *segs)
{
while (segs) {
--
2.30.2


2021-12-30 09:33:14

by Menglong Dong

[permalink] [raw]
Subject: [PATCH v2 net-next 2/3] net: skb: use kfree_skb_with_reason() in tcp_v4_rcv()

From: Menglong Dong <[email protected]>

Replace kfree_skb() with kfree_skb_with_reason() in tcp_v4_rcv().
Following drop reason are added:

SKB_DROP_REASON_NO_SOCKET
SKB_DROP_REASON_PKT_TOO_SMALL
SKB_DROP_REASON_TCP_CSUM
SKB_DROP_REASON_TCP_FILTER

After this patch, 'kfree_skb' event will print message like this:

$ TASK-PID CPU# ||||| TIMESTAMP FUNCTION
$ | | | ||||| | |
<idle>-0 [000] ..s1. 36.113438: kfree_skb: skbaddr=(____ptrval____) protocol=2048 location=(____ptrval____) reason: NO_SOCKET

The reason of skb drop is printed too.

Signed-off-by: Menglong Dong <[email protected]>
---
v2:
- rename some reason name as David suggested
- add the new reason: SKB_DROP_REASON_TCP_FILTER
---
include/linux/skbuff.h | 4 ++++
include/trace/events/skb.h | 4 ++++
net/ipv4/tcp_ipv4.c | 14 +++++++++++---
3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3620b3ff2154..43cb3b75b5af 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -313,6 +313,10 @@ struct sk_buff;
*/
enum skb_drop_reason {
SKB_DROP_REASON_NOT_SPECIFIED,
+ SKB_DROP_REASON_NO_SOCKET,
+ SKB_DROP_REASON_PKT_TOO_SMALL,
+ SKB_DROP_REASON_TCP_CSUM,
+ SKB_DROP_REASON_TCP_FILTER,
SKB_DROP_REASON_MAX,
};

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index cab1c08a30cd..c6f4ecf6781e 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -11,6 +11,10 @@

#define TRACE_SKB_DROP_REASON \
EM(SKB_DROP_REASON_NOT_SPECIFIED, NOT_SPECIFIED) \
+ EM(SKB_DROP_REASON_NO_SOCKET, NO_SOCKET) \
+ EM(SKB_DROP_REASON_PKT_TOO_SMALL, PKT_TOO_SMALL) \
+ EM(SKB_DROP_REASON_TCP_CSUM, TCP_CSUM) \
+ EM(SKB_DROP_REASON_TCP_FILTER, TCP_FILTER) \
EMe(SKB_DROP_REASON_MAX, HAHA_MAX)

#undef EM
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ac10e4cdd8d0..61832949fc93 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1971,8 +1971,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
const struct tcphdr *th;
bool refcounted;
struct sock *sk;
+ int drop_reason;
int ret;

+ drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
if (skb->pkt_type != PACKET_HOST)
goto discard_it;

@@ -1984,8 +1986,10 @@ int tcp_v4_rcv(struct sk_buff *skb)

th = (const struct tcphdr *)skb->data;

- if (unlikely(th->doff < sizeof(struct tcphdr) / 4))
+ if (unlikely(th->doff < sizeof(struct tcphdr) / 4)) {
+ drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
goto bad_packet;
+ }
if (!pskb_may_pull(skb, th->doff * 4))
goto discard_it;

@@ -2090,8 +2094,10 @@ int tcp_v4_rcv(struct sk_buff *skb)

nf_reset_ct(skb);

- if (tcp_filter(sk, skb))
+ if (tcp_filter(sk, skb)) {
+ drop_reason = SKB_DROP_REASON_TCP_FILTER;
goto discard_and_relse;
+ }
th = (const struct tcphdr *)skb->data;
iph = ip_hdr(skb);
tcp_v4_fill_cb(skb, iph, th);
@@ -2124,6 +2130,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
return ret;

no_tcp_socket:
+ drop_reason = SKB_DROP_REASON_NO_SOCKET;
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
goto discard_it;

@@ -2131,6 +2138,7 @@ int tcp_v4_rcv(struct sk_buff *skb)

if (tcp_checksum_complete(skb)) {
csum_error:
+ drop_reason = SKB_DROP_REASON_TCP_CSUM;
trace_tcp_bad_csum(skb);
__TCP_INC_STATS(net, TCP_MIB_CSUMERRORS);
bad_packet:
@@ -2141,7 +2149,7 @@ int tcp_v4_rcv(struct sk_buff *skb)

discard_it:
/* Discard frame. */
- kfree_skb(skb);
+ kfree_skb_with_reason(skb, drop_reason);
return 0;

discard_and_relse:
--
2.30.2


2021-12-30 09:33:21

by Menglong Dong

[permalink] [raw]
Subject: [PATCH net-next 3/3] net: skb: use kfree_skb_with_reason() in __udp4_lib_rcv()

From: Menglong Dong <[email protected]>

Replace kfree_skb() with kfree_skb_with_reason() in __udp4_lib_rcv.
New drop reason 'SKB_DROP_REASON_UDP_CSUM' is added for udp csum
error.

Signed-off-by: Menglong Dong <[email protected]>
---
include/linux/skbuff.h | 1 +
include/trace/events/skb.h | 1 +
net/ipv4/udp.c | 10 ++++++++--
3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 43cb3b75b5af..f0c6949fd19c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -317,6 +317,7 @@ enum skb_drop_reason {
SKB_DROP_REASON_PKT_TOO_SMALL,
SKB_DROP_REASON_TCP_CSUM,
SKB_DROP_REASON_TCP_FILTER,
+ SKB_DROP_REASON_UDP_CSUM,
SKB_DROP_REASON_MAX,
};

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index c6f4ecf6781e..f616547dddc6 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -15,6 +15,7 @@
EM(SKB_DROP_REASON_PKT_TOO_SMALL, PKT_TOO_SMALL) \
EM(SKB_DROP_REASON_TCP_CSUM, TCP_CSUM) \
EM(SKB_DROP_REASON_TCP_FILTER, TCP_FILTER) \
+ EM(SKB_DROP_REASON_UDP_CSUM, UDP_CSUM) \
EMe(SKB_DROP_REASON_MAX, HAHA_MAX)

#undef EM
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f376c777e8fc..463a5adcaacf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2410,6 +2410,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__be32 saddr, daddr;
struct net *net = dev_net(skb->dev);
bool refcounted;
+ int drop_reason;
+
+ drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;

/*
* Validate the packet.
@@ -2465,6 +2468,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (udp_lib_checksum_complete(skb))
goto csum_error;

+ drop_reason = SKB_DROP_REASON_NO_SOCKET;
__UDP_INC_STATS(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

@@ -2472,10 +2476,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
* Hmm. We got an UDP packet to a port to which we
* don't wanna listen. Ignore it.
*/
- kfree_skb(skb);
+ kfree_skb_with_reason(skb, drop_reason);
return 0;

short_packet:
+ drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
net_dbg_ratelimited("UDP%s: short packet: From %pI4:%u %d/%d to %pI4:%u\n",
proto == IPPROTO_UDPLITE ? "Lite" : "",
&saddr, ntohs(uh->source),
@@ -2488,6 +2493,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
* RFC1122: OK. Discards the bad packet silently (as far as
* the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
+ drop_reason = SKB_DROP_REASON_UDP_CSUM;
net_dbg_ratelimited("UDP%s: bad checksum. From %pI4:%u to %pI4:%u ulen %d\n",
proto == IPPROTO_UDPLITE ? "Lite" : "",
&saddr, ntohs(uh->source), &daddr, ntohs(uh->dest),
@@ -2495,7 +2501,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
__UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
drop:
__UDP_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
- kfree_skb(skb);
+ kfree_skb_with_reason(skb, drop_reason);
return 0;
}

--
2.30.2


2021-12-31 01:26:25

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 1/3] net: skb: introduce kfree_skb_with_reason()

On Thu, 30 Dec 2021 17:32:38 +0800 [email protected] wrote:
> From: Menglong Dong <[email protected]>
>
> Introduce the interface kfree_skb_with_reason(), which is used to pass
> the reason why the skb is dropped to 'kfree_skb' tracepoint.
>
> Add the 'reason' field to 'trace_kfree_skb', therefor user can get
> more detail information about abnormal skb with 'drop_monitor' or
> eBPF.

> void skb_release_head_state(struct sk_buff *skb);
> void kfree_skb(struct sk_buff *skb);

Should this be turned into a static inline calling
kfree_skb_with_reason() now? BTW you should drop the
'_with'.

> +void kfree_skb_with_reason(struct sk_buff *skb,
> + enum skb_drop_reason reason);

continuation line is unaligned, please try checkpatch

> void kfree_skb_list(struct sk_buff *segs);
> void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
> void skb_tx_error(struct sk_buff *skb);
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 9e92f22eb086..cab1c08a30cd 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -9,29 +9,51 @@
> #include <linux/netdevice.h>
> #include <linux/tracepoint.h>
>
> +#define TRACE_SKB_DROP_REASON \
> + EM(SKB_DROP_REASON_NOT_SPECIFIED, NOT_SPECIFIED) \
> + EMe(SKB_DROP_REASON_MAX, HAHA_MAX)

HAHA_MAX ?

> +
> +#undef EM
> +#undef EMe
> +
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +TRACE_SKB_DROP_REASON
> +
> +#undef EM
> +#undef EMe
> +#define EM(a, b) { a, #b },
> +#define EMe(a, b) { a, #b }
> +
> +

double new line

> /*
> * Tracepoint for free an sk_buff:
> */
> TRACE_EVENT(kfree_skb,
>
> - TP_PROTO(struct sk_buff *skb, void *location),
> + TP_PROTO(struct sk_buff *skb, void *location,
> + enum skb_drop_reason reason),
>
> - TP_ARGS(skb, location),
> + TP_ARGS(skb, location, reason),
>
> TP_STRUCT__entry(
> - __field( void *, skbaddr )
> - __field( void *, location )
> - __field( unsigned short, protocol )
> + __field(void *, skbaddr)
> + __field(void *, location)
> + __field(unsigned short, protocol)
> + __field(enum skb_drop_reason, reason)
> ),
>
> TP_fast_assign(
> __entry->skbaddr = skb;
> __entry->location = location;
> __entry->protocol = ntohs(skb->protocol);
> + __entry->reason = reason;
> ),
>
> - TP_printk("skbaddr=%p protocol=%u location=%p",
> - __entry->skbaddr, __entry->protocol, __entry->location)
> + TP_printk("skbaddr=%p protocol=%u location=%p reason: %s",
> + __entry->skbaddr, __entry->protocol, __entry->location,
> + __print_symbolic(__entry->reason, TRACE_SKB_DROP_REASON))
> );
>
> TRACE_EVENT(consume_skb,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 644b9c8be3a8..9464dbf9e3d6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4899,7 +4899,8 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
> if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED))
> trace_consume_skb(skb);
> else
> - trace_kfree_skb(skb, net_tx_action);
> + trace_kfree_skb(skb, net_tx_action,
> + SKB_DROP_REASON_NOT_SPECIFIED);
>
> if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
> __kfree_skb(skb);
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index 3d0ab2eec916..7b288a121a41 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -110,7 +110,8 @@ static u32 net_dm_queue_len = 1000;
>
> struct net_dm_alert_ops {
> void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
> - void *location);
> + void *location,
> + enum skb_drop_reason reason);
> void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
> int work, int budget);
> void (*work_item_func)(struct work_struct *work);
> @@ -262,7 +263,9 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
> spin_unlock_irqrestore(&data->lock, flags);
> }
>
> -static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb, void *location)
> +static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
> + void *location,
> + enum skb_drop_reason reason)
> {
> trace_drop_common(skb, location);
> }
> @@ -490,7 +493,8 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
>
> static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
> struct sk_buff *skb,
> - void *location)
> + void *location,
> + enum skb_drop_reason reason)
> {
> ktime_t tstamp = ktime_get_real();
> struct per_cpu_dm_data *data;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 275f7b8416fe..570dc022a8a1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -770,11 +770,31 @@ void kfree_skb(struct sk_buff *skb)
> if (!skb_unref(skb))
> return;
>
> - trace_kfree_skb(skb, __builtin_return_address(0));
> + trace_kfree_skb(skb, __builtin_return_address(0),
> + SKB_DROP_REASON_NOT_SPECIFIED);
> __kfree_skb(skb);
> }
> EXPORT_SYMBOL(kfree_skb);
>
> +/**
> + * kfree_skb_with_reason - free an sk_buff with reason
> + * @skb: buffer to free
> + * @reason: reason why this skb is dropped
> + *
> + * The same as kfree_skb() except that this function will pass
> + * the drop reason to 'kfree_skb' tracepoint.
> + */
> +void kfree_skb_with_reason(struct sk_buff *skb,
> + enum skb_drop_reason reason)
> +{
> + if (!skb_unref(skb))
> + return;
> +
> + trace_kfree_skb(skb, __builtin_return_address(0), reason);
> + __kfree_skb(skb);
> +}
> +EXPORT_SYMBOL(kfree_skb_with_reason);
> +
> void kfree_skb_list(struct sk_buff *segs)
> {
> while (segs) {


2021-12-31 06:38:46

by Menglong Dong

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 1/3] net: skb: introduce kfree_skb_with_reason()

On Fri, Dec 31, 2021 at 9:26 AM Jakub Kicinski <[email protected]> wrote:
>
> On Thu, 30 Dec 2021 17:32:38 +0800 [email protected] wrote:
> > From: Menglong Dong <[email protected]>
> >
> > Introduce the interface kfree_skb_with_reason(), which is used to pass
> > the reason why the skb is dropped to 'kfree_skb' tracepoint.
> >
> > Add the 'reason' field to 'trace_kfree_skb', therefor user can get
> > more detail information about abnormal skb with 'drop_monitor' or
> > eBPF.
>
> > void skb_release_head_state(struct sk_buff *skb);
> > void kfree_skb(struct sk_buff *skb);
>
> Should this be turned into a static inline calling
> kfree_skb_with_reason() now? BTW you should drop the
> '_with'.
>

I thought about it before, but I'm a little afraid that some users may trace
kfree_skb() with kprobe, making it inline may not be friendly to them?

> > +void kfree_skb_with_reason(struct sk_buff *skb,
> > + enum skb_drop_reason reason);
>
> continuation line is unaligned, please try checkpatch
>
> > void kfree_skb_list(struct sk_buff *segs);
> > void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
> > void skb_tx_error(struct sk_buff *skb);
> > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > index 9e92f22eb086..cab1c08a30cd 100644
> > --- a/include/trace/events/skb.h
> > +++ b/include/trace/events/skb.h
> > @@ -9,29 +9,51 @@
> > #include <linux/netdevice.h>
> > #include <linux/tracepoint.h>
> >
> > +#define TRACE_SKB_DROP_REASON \
> > + EM(SKB_DROP_REASON_NOT_SPECIFIED, NOT_SPECIFIED) \
> > + EMe(SKB_DROP_REASON_MAX, HAHA_MAX)
>
> HAHA_MAX ?

Enn......WOW_MAX? Just kidding, I'll make it 'MAX' (or remove this line,
as it won't be used).

>
> > +
> > +#undef EM
> > +#undef EMe
> > +
> > +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> > +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> > +
> > +TRACE_SKB_DROP_REASON
> > +
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b) { a, #b },
> > +#define EMe(a, b) { a, #b }
> > +
> > +
>
> double new line

Get it!

Thanks~
Menglong Dong

>
> > /*
> > * Tracepoint for free an sk_buff:
> > */
> > TRACE_EVENT(kfree_skb,
> >
> > - TP_PROTO(struct sk_buff *skb, void *location),
> > + TP_PROTO(struct sk_buff *skb, void *location,
> > + enum skb_drop_reason reason),
> >
> > - TP_ARGS(skb, location),
> > + TP_ARGS(skb, location, reason),
> >
> > TP_STRUCT__entry(
> > - __field( void *, skbaddr )
> > - __field( void *, location )
> > - __field( unsigned short, protocol )
> > + __field(void *, skbaddr)
> > + __field(void *, location)
> > + __field(unsigned short, protocol)
> > + __field(enum skb_drop_reason, reason)
> > ),
> >
> > TP_fast_assign(
> > __entry->skbaddr = skb;
> > __entry->location = location;
> > __entry->protocol = ntohs(skb->protocol);
> > + __entry->reason = reason;
> > ),
> >
> > - TP_printk("skbaddr=%p protocol=%u location=%p",
> > - __entry->skbaddr, __entry->protocol, __entry->location)
> > + TP_printk("skbaddr=%p protocol=%u location=%p reason: %s",
> > + __entry->skbaddr, __entry->protocol, __entry->location,
> > + __print_symbolic(__entry->reason, TRACE_SKB_DROP_REASON))
> > );
> >
> > TRACE_EVENT(consume_skb,
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 644b9c8be3a8..9464dbf9e3d6 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4899,7 +4899,8 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
> > if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED))
> > trace_consume_skb(skb);
> > else
> > - trace_kfree_skb(skb, net_tx_action);
> > + trace_kfree_skb(skb, net_tx_action,
> > + SKB_DROP_REASON_NOT_SPECIFIED);
> >
> > if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
> > __kfree_skb(skb);
> > diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> > index 3d0ab2eec916..7b288a121a41 100644
> > --- a/net/core/drop_monitor.c
> > +++ b/net/core/drop_monitor.c
> > @@ -110,7 +110,8 @@ static u32 net_dm_queue_len = 1000;
> >
> > struct net_dm_alert_ops {
> > void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
> > - void *location);
> > + void *location,
> > + enum skb_drop_reason reason);
> > void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
> > int work, int budget);
> > void (*work_item_func)(struct work_struct *work);
> > @@ -262,7 +263,9 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
> > spin_unlock_irqrestore(&data->lock, flags);
> > }
> >
> > -static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb, void *location)
> > +static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
> > + void *location,
> > + enum skb_drop_reason reason)
> > {
> > trace_drop_common(skb, location);
> > }
> > @@ -490,7 +493,8 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
> >
> > static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
> > struct sk_buff *skb,
> > - void *location)
> > + void *location,
> > + enum skb_drop_reason reason)
> > {
> > ktime_t tstamp = ktime_get_real();
> > struct per_cpu_dm_data *data;
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 275f7b8416fe..570dc022a8a1 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -770,11 +770,31 @@ void kfree_skb(struct sk_buff *skb)
> > if (!skb_unref(skb))
> > return;
> >
> > - trace_kfree_skb(skb, __builtin_return_address(0));
> > + trace_kfree_skb(skb, __builtin_return_address(0),
> > + SKB_DROP_REASON_NOT_SPECIFIED);
> > __kfree_skb(skb);
> > }
> > EXPORT_SYMBOL(kfree_skb);
> >
> > +/**
> > + * kfree_skb_with_reason - free an sk_buff with reason
> > + * @skb: buffer to free
> > + * @reason: reason why this skb is dropped
> > + *
> > + * The same as kfree_skb() except that this function will pass
> > + * the drop reason to 'kfree_skb' tracepoint.
> > + */
> > +void kfree_skb_with_reason(struct sk_buff *skb,
> > + enum skb_drop_reason reason)
> > +{
> > + if (!skb_unref(skb))
> > + return;
> > +
> > + trace_kfree_skb(skb, __builtin_return_address(0), reason);
> > + __kfree_skb(skb);
> > +}
> > +EXPORT_SYMBOL(kfree_skb_with_reason);
> > +
> > void kfree_skb_list(struct sk_buff *segs)
> > {
> > while (segs) {
>

2022-01-01 02:22:30

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 1/3] net: skb: introduce kfree_skb_with_reason()

On Fri, 31 Dec 2021 14:35:31 +0800 Menglong Dong wrote:
> > > void skb_release_head_state(struct sk_buff *skb);
> > > void kfree_skb(struct sk_buff *skb);
> >
> > Should this be turned into a static inline calling
> > kfree_skb_with_reason() now? BTW you should drop the
> > '_with'.
> >
>
> I thought about it before, but I'm a little afraid that some users may trace
> kfree_skb() with kprobe, making it inline may not be friendly to them?

Hm, there is a bpf sample which does that, but that's probably
not commonly used given there is a tracepoint. If someone is
using a kprobe they can switch to kprobing kfree_skb*reason().

2022-01-04 01:48:02

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 0/3] net: skb: introduce kfree_skb_with_reason() and use it for tcp and udp

On Thu, Dec 30, 2021 at 05:32:37PM +0800, [email protected] wrote:
> From: Menglong Dong <[email protected]>
>
> In this series patch, the interface kfree_skb_with_reason() is
> introduced(), which is used to collect skb drop reason, and pass
> it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
> able to monitor abnormal skb with detail reason.
>

We already something close, __dev_kfree_skb_any(). Can't we unify
all of these?


> In fact, this series patches are out of the intelligence of David
> and Steve, I'm just a truck man :/
>

I think there was another discussion before yours, which I got involved
as well.

> Previous discussion is here:
>
> https://lore.kernel.org/netdev/[email protected]/
> https://lore.kernel.org/netdev/[email protected]/
>
> In the first patch, kfree_skb_with_reason() is introduced and
> the 'reason' field is added to 'kfree_skb' tracepoint. In the
> second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
> in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
> used in __udp4_lib_rcv().
>

I don't follow all the discussions here, but IIRC it would be nice
if we can provide the SNMP stat code (for instance, TCP_MIB_CSUMERRORS) to
user-space, because those stats are already exposed to user-space, so
you don't have to invent new ones.

Thanks.

2022-01-04 02:01:39

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 0/3] net: skb: introduce kfree_skb_with_reason() and use it for tcp and udp

On 1/3/22 6:47 PM, Cong Wang wrote:
> On Thu, Dec 30, 2021 at 05:32:37PM +0800, [email protected] wrote:
>> From: Menglong Dong <[email protected]>
>>
>> In this series patch, the interface kfree_skb_with_reason() is
>> introduced(), which is used to collect skb drop reason, and pass
>> it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
>> able to monitor abnormal skb with detail reason.
>>
>
> We already something close, __dev_kfree_skb_any(). Can't we unify
> all of these?

Specifically?

The 'reason' passed around by those is either SKB_REASON_CONSUMED or
SKB_REASON_DROPPED and is used to call kfree_skb vs consume_skb. i.e.,
this is unrelated to this patch set and goal.

>
>
>> In fact, this series patches are out of the intelligence of David
>> and Steve, I'm just a truck man :/
>>
>
> I think there was another discussion before yours, which I got involved
> as well.
>
>> Previous discussion is here:
>>
>> https://lore.kernel.org/netdev/[email protected]/
>> https://lore.kernel.org/netdev/[email protected]/
>>
>> In the first patch, kfree_skb_with_reason() is introduced and
>> the 'reason' field is added to 'kfree_skb' tracepoint. In the
>> second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
>> in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
>> used in __udp4_lib_rcv().
>>
>
> I don't follow all the discussions here, but IIRC it would be nice
> if we can provide the SNMP stat code (for instance, TCP_MIB_CSUMERRORS) to
> user-space, because those stats are already exposed to user-space, so
> you don't have to invent new ones.

Those SNMP macros are not unique and can not be fed into a generic
kfree_skb_reason function.


2022-01-04 03:09:52

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 0/3] net: skb: introduce kfree_skb_with_reason() and use it for tcp and udp

On Mon, Jan 03, 2022 at 07:01:30PM -0700, David Ahern wrote:
> On 1/3/22 6:47 PM, Cong Wang wrote:
> > On Thu, Dec 30, 2021 at 05:32:37PM +0800, [email protected] wrote:
> >> From: Menglong Dong <[email protected]>
> >>
> >> In this series patch, the interface kfree_skb_with_reason() is
> >> introduced(), which is used to collect skb drop reason, and pass
> >> it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
> >> able to monitor abnormal skb with detail reason.
> >>
> >
> > We already something close, __dev_kfree_skb_any(). Can't we unify
> > all of these?
>
> Specifically?
>
> The 'reason' passed around by those is either SKB_REASON_CONSUMED or
> SKB_REASON_DROPPED and is used to call kfree_skb vs consume_skb. i.e.,
> this is unrelated to this patch set and goal.

What prevents you extending it?

>
> >
> >
> >> In fact, this series patches are out of the intelligence of David
> >> and Steve, I'm just a truck man :/
> >>
> >
> > I think there was another discussion before yours, which I got involved
> > as well.
> >
> >> Previous discussion is here:
> >>
> >> https://lore.kernel.org/netdev/[email protected]/
> >> https://lore.kernel.org/netdev/[email protected]/
> >>
> >> In the first patch, kfree_skb_with_reason() is introduced and
> >> the 'reason' field is added to 'kfree_skb' tracepoint. In the
> >> second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
> >> in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
> >> used in __udp4_lib_rcv().
> >>
> >
> > I don't follow all the discussions here, but IIRC it would be nice
> > if we can provide the SNMP stat code (for instance, TCP_MIB_CSUMERRORS) to
> > user-space, because those stats are already exposed to user-space, so
> > you don't have to invent new ones.
>
> Those SNMP macros are not unique and can not be fed into a generic
> kfree_skb_reason function.

Sure, you also have the skb itself, particularly skb protocol, with
these combined, it should be unique.

Thanks.

2022-01-04 03:35:41

by Menglong Dong

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 0/3] net: skb: introduce kfree_skb_with_reason() and use it for tcp and udp

On Tue, Jan 4, 2022 at 11:09 AM Cong Wang <[email protected]> wrote:
>
> On Mon, Jan 03, 2022 at 07:01:30PM -0700, David Ahern wrote:
> > On 1/3/22 6:47 PM, Cong Wang wrote:
> > > On Thu, Dec 30, 2021 at 05:32:37PM +0800, [email protected] wrote:
> > >> From: Menglong Dong <[email protected]>
> > >>
> > >> In this series patch, the interface kfree_skb_with_reason() is
> > >> introduced(), which is used to collect skb drop reason, and pass
> > >> it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
> > >> able to monitor abnormal skb with detail reason.
> > >>
> > >
> > > We already something close, __dev_kfree_skb_any(). Can't we unify
> > > all of these?
> >
> > Specifically?
> >
> > The 'reason' passed around by those is either SKB_REASON_CONSUMED or
> > SKB_REASON_DROPPED and is used to call kfree_skb vs consume_skb. i.e.,
> > this is unrelated to this patch set and goal.
>
> What prevents you extending it?
>

I think extending kfree_skb() with kfree_skb_reason() is more reasonable,
considering the goal of kfree_skb() and __dev_kfree_skb_any().

> >
> > >
> > >
> > >> In fact, this series patches are out of the intelligence of David
> > >> and Steve, I'm just a truck man :/
> > >>
> > >
> > > I think there was another discussion before yours, which I got involved
> > > as well.
> > >
> > >> Previous discussion is here:
> > >>
> > >> https://lore.kernel.org/netdev/[email protected]/
> > >> https://lore.kernel.org/netdev/[email protected]/
> > >>
> > >> In the first patch, kfree_skb_with_reason() is introduced and
> > >> the 'reason' field is added to 'kfree_skb' tracepoint. In the
> > >> second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
> > >> in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
> > >> used in __udp4_lib_rcv().
> > >>
> > >
> > > I don't follow all the discussions here, but IIRC it would be nice
> > > if we can provide the SNMP stat code (for instance, TCP_MIB_CSUMERRORS) to
> > > user-space, because those stats are already exposed to user-space, so
> > > you don't have to invent new ones.
> >
> > Those SNMP macros are not unique and can not be fed into a generic
> > kfree_skb_reason function.
>
> Sure, you also have the skb itself, particularly skb protocol, with
> these combined, it should be unique.

I thought about it before, but it's hard to use the reason in SNMP
directly. First,
the stats of SNMP are grouped, and the same skb protocol can use stats in
different groups, which makes it hard to be unique. Second, SNMP is used to
do statistics, not only drop statistics, which is a little different
from the goal here.
Third, it's not flexible enough to extend the new drop reason.

Thanks!
Menglong Dong

>
> Thanks.