2022-02-20 14:56:42

by Dongli Zhang

[permalink] [raw]
Subject: [PATCH v2 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb

The commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()") has
introduced the kfree_skb_reason() to help track the reason.

The tun and tap are commonly used as virtio-net/vhost-net backend. This is to
use kfree_skb_reason() to trace the dropped skb for those two drivers.

Changed since v1:

I have renamed many of the reasons since v1. I make them as generic as possible
so that they can be re-used by core networking and drivers.

- SKB_DROP_REASON_SKB_CSUM

This is used whenever there is checksum error with sk_buff.

- SKB_DROP_REASON_SKB_COPY_DATA

The kernel may (zero) copy the data to or from sk_buff, e.g.,
zerocopy_sg_from_iter(), skb_copy_datagram_from_iter() and
skb_orphan_frags_rx(). This reason is for the copy related error.

- SKB_DROP_REASON_SKB_GSO_SEG

Any error reported when GSO processing the sk_buff. It is frequent to process
sk_buff gso data and we introduce a new reason to handle that.

- SKB_DROP_REASON_SKB_PULL
- SKB_DROP_REASON_SKB_TRIM

It is frequent to pull to sk_buff data or trim the sk_buff data.

- SKB_DROP_REASON_DEV_HDR

Any driver may report error if there is any error in the metadata on the DMA
ring buffer.

- SKB_DROP_REASON_DEV_READY

The device is not ready/online or initialized to receive data.

- SKB_DROP_REASON_DEV_FILTER

David Ahern suggested SKB_DROP_REASON_TAP_FILTER. I changed from 'TAP' to 'DEV'
to make it more generic.

- SKB_DROP_REASON_FULL_RING

Suggested by Eric Dumazet.

- SKB_DROP_REASON_BPF_FILTER

Dropped by ebpf filter


This is the output for TUN device.

# cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0 [020] b.s3. 2317.035738: kfree_skb: skbaddr=00000000feed21ec protocol=4 location=000000003e0b7d3d reason: FULL_RING
arping-7078 [028] b..1. 2317.065261: kfree_skb: skbaddr=000000003409d3c3 protocol=2054 location=000000003e0b7d3d reason: FULL_RING
vhost-5409-5422 [001] b..1. 2317.818647: kfree_skb: skbaddr=0000000004b8c03a protocol=2054 location=000000003e0b7d3d reason: FULL_RING
arping-7078 [028] b..1. 2318.065336: kfree_skb: skbaddr=000000003409d3c3 protocol=2054 location=000000003e0b7d3d reason: FULL_RING
vhost-5409-5422 [001] b..1. 2318.845315: kfree_skb: skbaddr=0000000004b8c03a protocol=2054 location=000000003e0b7d3d reason: FULL_RING

This is the output for TAP device.

# cat /sys/kernel/debug/tracing/trace_pipe
kworker/19:1-243 [019] ..s1. 922.639393: kfree_skb: skbaddr=0000000045a155b0 protocol=2054 location=00000000f88b4996 reason: FULL_RING
kworker/19:1-243 [019] ..s1. 923.663398: kfree_skb: skbaddr=00000000ff4e054b protocol=2054 location=00000000f88b4996 reason: FULL_RING
kworker/19:1-243 [019] ..s1. 924.687351: kfree_skb: skbaddr=00000000ff4e054b protocol=2054 location=00000000f88b4996 reason: FULL_RING
kworker/19:1-243 [019] ..s1. 925.711294: kfree_skb: skbaddr=00000000fb75a48b protocol=2054 location=00000000f88b4996 reason: FULL_RING



drivers/net/tap.c | 30 ++++++++++++++++++++++--------
drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++--------
include/linux/skbuff.h | 16 ++++++++++++++++
include/trace/events/skb.h | 10 ++++++++++


Please let me know if there is any suggestion on the definition of reasons.

Thank you very much!

Dongli Zhang



2022-02-21 04:35:46

by Dongli Zhang

[permalink] [raw]
Subject: [PATCH v2 1/3] net: tap: track dropped skb via kfree_skb_reason()

The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
the interface to forward the skb from TAP to vhost-net/virtio-net.

However, there are many "goto drop" in the TAP driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.

The below reasons are introduced:

- SKB_DROP_REASON_SKB_CSUM
- SKB_DROP_REASON_SKB_COPY_DATA
- SKB_DROP_REASON_SKB_GSO_SEG
- SKB_DROP_REASON_DEV_HDR
- SKB_DROP_REASON_FULL_RING

Cc: Joao Martins <[email protected]>
Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
---
drivers/net/tap.c | 30 ++++++++++++++++++++++--------
include/linux/skbuff.h | 9 +++++++++
include/trace/events/skb.h | 5 +++++
3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8e3a28ba6b28..ab3592506ef8 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -322,6 +322,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
struct tap_dev *tap;
struct tap_queue *q;
netdev_features_t features = TAP_FEATURES;
+ int drop_reason;

tap = tap_dev_get_rcu(dev);
if (!tap)
@@ -343,12 +344,16 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
struct sk_buff *segs = __skb_gso_segment(skb, features, false);
struct sk_buff *next;

- if (IS_ERR(segs))
+ if (IS_ERR(segs)) {
+ drop_reason = SKB_DROP_REASON_SKB_GSO_SEG;
goto drop;
+ }

if (!segs) {
- if (ptr_ring_produce(&q->ring, skb))
+ if (ptr_ring_produce(&q->ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }
goto wake_up;
}

@@ -369,10 +374,14 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
*/
if (skb->ip_summed == CHECKSUM_PARTIAL &&
!(features & NETIF_F_CSUM_MASK) &&
- skb_checksum_help(skb))
+ skb_checksum_help(skb)) {
+ drop_reason = SKB_DROP_REASON_SKB_CSUM;
goto drop;
- if (ptr_ring_produce(&q->ring, skb))
+ }
+ if (ptr_ring_produce(&q->ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }
}

wake_up:
@@ -383,7 +392,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
/* Count errors/drops only here, thus don't care about args. */
if (tap->count_rx_dropped)
tap->count_rx_dropped(tap);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
return RX_HANDLER_CONSUMED;
}
EXPORT_SYMBOL_GPL(tap_handle_frame);
@@ -632,6 +641,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
int depth;
bool zerocopy = false;
size_t linear;
+ int drop_reason;

if (q->flags & IFF_VNET_HDR) {
vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
@@ -696,8 +706,10 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
else
err = skb_copy_datagram_from_iter(skb, 0, from, len);

- if (err)
+ if (err) {
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
goto err_kfree;
+ }

skb_set_network_header(skb, ETH_HLEN);
skb_reset_mac_header(skb);
@@ -706,8 +718,10 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
if (vnet_hdr_len) {
err = virtio_net_hdr_to_skb(skb, &vnet_hdr,
tap_is_little_endian(q));
- if (err)
+ if (err) {
+ drop_reason = SKB_DROP_REASON_DEV_HDR;
goto err_kfree;
+ }
}

skb_probe_transport_header(skb);
@@ -738,7 +752,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
return total_len;

err_kfree:
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);

err:
rcu_read_lock();
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5adbf6b51e8..218f7ba753e7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -346,6 +346,15 @@ enum skb_drop_reason {
* udp packet drop out of
* udp_memory_allocated.
*/
+ SKB_DROP_REASON_SKB_CSUM, /* sk_buff checksum error */
+ SKB_DROP_REASON_SKB_COPY_DATA, /* failed to copy data from or to
+ * sk_buff
+ */
+ SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */
+ SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
+ * device driver specific header
+ */
+ SKB_DROP_REASON_FULL_RING, /* ring buffer is full */
SKB_DROP_REASON_MAX,
};

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index cfcfd26399f7..842020d532f2 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -27,6 +27,11 @@
EM(SKB_DROP_REASON_IP_NOPROTO, IP_NOPROTO) \
EM(SKB_DROP_REASON_SOCKET_RCVBUFF, SOCKET_RCVBUFF) \
EM(SKB_DROP_REASON_PROTO_MEM, PROTO_MEM) \
+ EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM) \
+ EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA) \
+ EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG) \
+ EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR) \
+ EM(SKB_DROP_REASON_FULL_RING, FULL_RING) \
EMe(SKB_DROP_REASON_MAX, MAX)

#undef EM
--
2.17.1

2022-02-21 09:07:36

by Dongli Zhang

[permalink] [raw]
Subject: [PATCH v2 3/3] net: tun: track dropped skb via kfree_skb_reason()

The TUN can be used as vhost-net backend. E.g, the tun_net_xmit() is the
interface to forward the skb from TUN to vhost-net/virtio-net.

However, there are many "goto drop" in the TUN driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.

The below reasons are introduced:

- SKB_DROP_REASON_SKB_PULL
- SKB_DROP_REASON_SKB_TRIM
- SKB_DROP_REASON_DEV_READY
- SKB_DROP_REASON_DEV_FILTER
- SKB_DROP_REASON_BPF_FILTER

Cc: Joao Martins <[email protected]>
Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
---
drivers/net/tun.c | 37 ++++++++++++++++++++++++++++---------
include/linux/skbuff.h | 7 +++++++
include/trace/events/skb.h | 5 +++++
3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index aa27268edc5f..ab47a66deb7f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *queue;
struct tun_file *tfile;
int len = skb->len;
+ int drop_reason;

rcu_read_lock();
tfile = rcu_dereference(tun->tfiles[txq]);

/* Drop packet if interface is not attached */
- if (!tfile)
+ if (!tfile) {
+ drop_reason = SKB_DROP_REASON_DEV_READY;
goto drop;
+ }

if (!rcu_dereference(tun->steering_prog))
tun_automq_xmit(tun, skb);
@@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
/* Drop if the filter does not like it.
* This is a noop if the filter is disabled.
* Filter can be enabled only for the TAP devices. */
- if (!check_filter(&tun->txflt, skb))
+ if (!check_filter(&tun->txflt, skb)) {
+ drop_reason = SKB_DROP_REASON_DEV_FILTER;
goto drop;
+ }

if (tfile->socket.sk->sk_filter &&
- sk_filter(tfile->socket.sk, skb))
+ sk_filter(tfile->socket.sk, skb)) {
+ drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
goto drop;
+ }

len = run_ebpf_filter(tun, skb, len);
- if (len == 0)
+ if (len == 0) {
+ drop_reason = SKB_DROP_REASON_BPF_FILTER;
goto drop;
+ }

- if (pskb_trim(skb, len))
+ if (pskb_trim(skb, len)) {
+ drop_reason = SKB_DROP_REASON_SKB_TRIM;
goto drop;
+ }

- if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
+ if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC))) {
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
goto drop;
+ }

skb_tx_timestamp(skb);

@@ -1104,8 +1117,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)

nf_reset_ct(skb);

- if (ptr_ring_produce(&tfile->tx_ring, skb))
+ if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }

/* NETIF_F_LLTX requires to do our own update of trans_start */
queue = netdev_get_tx_queue(dev, txq);
@@ -1122,7 +1137,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
drop:
atomic_long_inc(&dev->tx_dropped);
skb_tx_error(skb);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
rcu_read_unlock();
return NET_XMIT_DROP;
}
@@ -1720,6 +1735,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
u32 rxhash = 0;
int skb_xdp = 1;
bool frags = tun_napi_frags_enabled(tfile);
+ int drop_reason;

if (!(tun->flags & IFF_NO_PI)) {
if (len < sizeof(pi))
@@ -1822,10 +1838,11 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
err = skb_copy_datagram_from_iter(skb, 0, from, len);

if (err) {
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
err = -EFAULT;
drop:
atomic_long_inc(&tun->dev->rx_dropped);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
if (frags) {
tfile->napi.skb = NULL;
mutex_unlock(&tfile->napi_mutex);
@@ -1872,6 +1889,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
case IFF_TAP:
if (frags && !pskb_may_pull(skb, ETH_HLEN)) {
err = -ENOMEM;
+ drop_reason = SKB_DROP_REASON_SKB_PULL;
goto drop;
}
skb->protocol = eth_type_trans(skb, tun->dev);
@@ -1925,6 +1943,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
if (unlikely(!(tun->dev->flags & IFF_UP))) {
err = -EIO;
rcu_read_unlock();
+ drop_reason = SKB_DROP_REASON_DEV_READY;
goto drop;
}

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 218f7ba753e7..9370778b428d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -351,10 +351,17 @@ enum skb_drop_reason {
* sk_buff
*/
SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */
+ SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */
+ SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */
SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
* device driver specific header
*/
+ SKB_DROP_REASON_DEV_READY, /* device is not ready */
+ SKB_DROP_REASON_DEV_FILTER, /* dropped by device driver
+ * specific filter
+ */
SKB_DROP_REASON_FULL_RING, /* ring buffer is full */
+ SKB_DROP_REASON_BPF_FILTER, /* dropped by ebpf filter */
SKB_DROP_REASON_MAX,
};

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 842020d532f2..62704851062c 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -30,8 +30,13 @@
EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM) \
EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA) \
EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG) \
+ EM(SKB_DROP_REASON_SKB_PULL, SKB_PULL) \
+ EM(SKB_DROP_REASON_SKB_TRIM, SKB_TRIM) \
EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR) \
+ EM(SKB_DROP_REASON_DEV_READY, DEV_READY) \
+ EM(SKB_DROP_REASON_DEV_FILTER, DEV_FILTER) \
EM(SKB_DROP_REASON_FULL_RING, FULL_RING) \
+ EM(SKB_DROP_REASON_BPF_FILTER, BPF_FILTER) \
EMe(SKB_DROP_REASON_MAX, MAX)

#undef EM
--
2.17.1

2022-02-21 09:31:40

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] net: tap: track dropped skb via kfree_skb_reason()

On 2/19/22 12:12 PM, Dongli Zhang wrote:
> The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
> the interface to forward the skb from TAP to vhost-net/virtio-net.
>
> However, there are many "goto drop" in the TAP driver. Therefore, the
> kfree_skb_reason() is involved at each "goto drop" to help userspace
> ftrace/ebpf to track the reason for the loss of packets.
>
> The below reasons are introduced:
>
> - SKB_DROP_REASON_SKB_CSUM
> - SKB_DROP_REASON_SKB_COPY_DATA
> - SKB_DROP_REASON_SKB_GSO_SEG
> - SKB_DROP_REASON_DEV_HDR
> - SKB_DROP_REASON_FULL_RING
>
> Cc: Joao Martins <[email protected]>
> Cc: Joe Jin <[email protected]>
> Signed-off-by: Dongli Zhang <[email protected]>
> ---
> drivers/net/tap.c | 30 ++++++++++++++++++++++--------
> include/linux/skbuff.h | 9 +++++++++
> include/trace/events/skb.h | 5 +++++
> 3 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index 8e3a28ba6b28..ab3592506ef8 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -322,6 +322,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
> struct tap_dev *tap;
> struct tap_queue *q;
> netdev_features_t features = TAP_FEATURES;
> + int drop_reason;

enum skb_drop_reason drop_reason;

>
> tap = tap_dev_get_rcu(dev);
> if (!tap)
> @@ -343,12 +344,16 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
> struct sk_buff *segs = __skb_gso_segment(skb, features, false);
> struct sk_buff *next;
>
> - if (IS_ERR(segs))
> + if (IS_ERR(segs)) {
> + drop_reason = SKB_DROP_REASON_SKB_GSO_SEG;
> goto drop;
> + }
>
> if (!segs) {
> - if (ptr_ring_produce(&q->ring, skb))
> + if (ptr_ring_produce(&q->ring, skb)) {
> + drop_reason = SKB_DROP_REASON_FULL_RING;
> goto drop;
> + }
> goto wake_up;
> }

you missed the walk of segs and calling ptr_ring_produce.

>