Received: by 2002:a05:6500:2018:b0:1fb:9675:f89d with SMTP id t24csp79056lqh; Thu, 30 May 2024 14:49:36 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXW4mJYZ78c8UJoB7WIE9V2hZSFekR1luzSG6enyoiIEYz1N3c7PV20Vd0L06dRGX30ePXg7n3OSS6LI7hAIZbd6O55yAx77v0fRTHVgw== X-Google-Smtp-Source: AGHT+IEyQKLQNbvvYFOe3tuzMitU8oRuYBANFrTyi6mSHKJznx1ZCu0IBwebHaKGZq1DU6ckE/c0 X-Received: by 2002:a17:90b:83:b0:2bd:f751:e184 with SMTP id 98e67ed59e1d1-2c1dc4b2c1bmr41126a91.0.1717105775956; Thu, 30 May 2024 14:49:35 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717105775; cv=pass; d=google.com; s=arc-20160816; b=WJhqnluqm3eMzeo95VLOQAOe13Px+0JQ7khxvQLMV9dZOWzOtrYVVPbzM724tvbb6N 3tItuiNjEVvzftGPZQ9rwcFgksm2VjOriVtlRMTl4SXq22cHi2PqfF6kMVJlMkpsbqbT QKJ3vkdLM6SWQzL+7/Yfk8vlwdqisQlZzyNDKUAq6MNh2+Zjkm+PFCgYn9JPto/58U94 yElJxHGB5SlHeowyLNxA2HB+ttfDRvoQ7sksH+hxXHtn5x0O0FxOMHkfaBnuale5jkND UYNLge0aeVsd9mOq58/3l+gzAOJGIF11+dZxbfeT2ZWCo4Z14XWQuXfRgSxdQvhX0MCU cPCg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=iU0kmb1Q8d2eQhrqkYXLZ8UzXfCX7B5qnp9jHajfNfU=; fh=B8WRR+1mj1zqJBx8Z8rpO0YRdxnl5DwdO1zOPDoMDsY=; b=LMXvd5URFvO41IzOulXGqyV9tPIy5eK6MNwCp0/kVykZV3daYQiSe9gRGvcCzZcFnj O7/VvRHAfikmpk1R+Yvstl30O2aOopZSsJz2FnTz0Uh34cDOdMx2/o122KPX7NPxWqDO T8+s4pXGwTen+sG+W3oRNHTKq6wh9U90C+BQAHAKGPYf2e9LaYUnZZF4Uh8FAi3M6P17 PKBI2cneb1q0jimJiOwvAt5vYyOT57DdmE8wcn/RblhqBu+y3hakkqrPM+FeKjnOPnlZ k0hrAGsMJ04j0pR2YRtQ5Sz1FZd26fQAaekR9+23pdTj88Urr2mEXzQ7UQoNkCbO4AeA /Twg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google09082023 header.b="JGNDl/Iz"; arc=pass (i=1 spf=pass spfdomain=cloudflare.com dkim=pass dkdomain=cloudflare.com dmarc=pass fromdomain=cloudflare.com); spf=pass (google.com: domain of linux-kernel+bounces-195949-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-195949-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id 98e67ed59e1d1-2c1c2852019si350538a91.185.2024.05.30.14.49.35 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 14:49:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-195949-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google09082023 header.b="JGNDl/Iz"; arc=pass (i=1 spf=pass spfdomain=cloudflare.com dkim=pass dkdomain=cloudflare.com dmarc=pass fromdomain=cloudflare.com); spf=pass (google.com: domain of linux-kernel+bounces-195949-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-195949-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 7234CB23D2C for ; Thu, 30 May 2024 21:47:29 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 89C76184135; Thu, 30 May 2024 21:47:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b="JGNDl/Iz" Received: from mail-oi1-f171.google.com (mail-oi1-f171.google.com [209.85.167.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2327182D3C for ; Thu, 30 May 2024 21:46:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.171 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717105621; cv=none; b=Okt15J/LFcCtSc4L4WaMV2qVWxQqcTlk5s7t2S0RIPtbgOKPDYMMC9rA4GzWmuInRRN3050iW7Pdi74oDzJhv91NxhmvW1PktsuVdSuYy3mABETMBM5ncmNUxnw13Q5fArz2ln4DaRxmDjd2XmqtS9u+FmMZ/PgnUwF8W11/ImY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717105621; c=relaxed/simple; bh=W2h/LnRyHWTHgTveKmP08DBvMQDEDLRwZs6glWTSns0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=DRRicDsqVckx2DKCIW4nsLnIKPDZHVlrYvJkWdWsx6KqTdU4HaaJQb19BDhEXfKm2OhFZHJhzsxDt30KH+DAVcASutSzNh/ggfBlgjDnkUaM0eFciBsTLSHYzbhXXhpC476lgQ6oCY3+BH90InuztkGGix0KCRHh8gPb01PPWJA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cloudflare.com; spf=pass smtp.mailfrom=cloudflare.com; dkim=pass (2048-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b=JGNDl/Iz; arc=none smtp.client-ip=209.85.167.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cloudflare.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cloudflare.com Received: by mail-oi1-f171.google.com with SMTP id 5614622812f47-3d1bc6e5f01so773207b6e.0 for ; Thu, 30 May 2024 14:46:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google09082023; t=1717105619; x=1717710419; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=iU0kmb1Q8d2eQhrqkYXLZ8UzXfCX7B5qnp9jHajfNfU=; b=JGNDl/IzYCqu2Mrrfd9qRTedLOgFLIvopB1c16Vy89CmFUNhrMWZxKhpF5EQ7SHdid Wp8VUs5BE5Acep/CtY+740C74H+yvWvZHaWEypm2JTOFIz0pYAeWr8xvc+n5k8+rE8xP FvQQ2qi1766/uy3wlobRWq9ehBzwaB2C/ogKkgUbwmdwzEXeruDN5ge/6HXU5cZFutK3 BsfntBDRMIerX+t4fJiEMa0k15GVUxBwBsKNBTCZcRck9n8fIprf2LaW61iSVGiTn2uq EBjriRF14HAKNESp1x4AjUKEWDJNpcmhMPcka8uHX/2eUsmA+OvEo3N/7F1FW1KQhXy7 KL/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717105619; x=1717710419; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=iU0kmb1Q8d2eQhrqkYXLZ8UzXfCX7B5qnp9jHajfNfU=; b=ZU5PKYwmQGsHV1GCJ8a51MjgQdcJbvcCJuXUI+Tky24Gk+tJeVR18Pre5fjNt//cG1 a3kFTFy67816zgSeWnE5m3u3znGlUtgES64heYfM/T/keZ1F16rcajOcdIdtXV3GF2pS b8YdV6y7qLfBVXjioekVU+kwAdoAF/BD6xYcTUIKWvGhGkNhln7Fb2yS+A0miIJl73s/ 71782OMgVLghEgsmKVNzlJqRTSUPAyvXGlPg0lIVc1slILIlR1AYXBTi8Yu7DvifbFGj ggOpukwmx7FgY6x/3+T5OQAQQkBCU6qnyaQpIe60NZBx1RTCYX4gVOj7odPSecr1gZs8 hytQ== X-Forwarded-Encrypted: i=1; AJvYcCUbUE1ewFy0SpbvIWhfPMV0juF+pip3xG6ISnLewZ03W1/dyd9vQOcXzf8eQgWR4r0nS56g2vj5To0RRcmwlgoIoC84snOrRgNrAnoO X-Gm-Message-State: AOJu0Ywzst7nweVfasrqaTzIVez/Iu6gdIdaSqJQU7YjmXgzKf6ABAdB hx3aKkoWPIUSzKlhUUAj7Ne7BQ98J5JOA9UG+jr6+05IZhE6MOt+zMF+F5exeb8= X-Received: by 2002:aca:1119:0:b0:3c9:966e:32ea with SMTP id 5614622812f47-3d1e34786demr29137b6e.2.1717105618801; Thu, 30 May 2024 14:46:58 -0700 (PDT) Received: from debian.debian ([2a09:bac5:7a49:f9b::18e:1c]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ae4a73e4absm1963286d6.6.2024.05.30.14.46.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 14:46:58 -0700 (PDT) Date: Thu, 30 May 2024 14:46:56 -0700 From: Yan Zhai To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , David Ahern , Abhishek Chauhan , Mina Almasry , Florian Westphal , Alexander Lobakin , David Howells , Jiri Pirko , Daniel Borkmann , Sebastian Andrzej Siewior , Lorenzo Bianconi , Pavel Begunkov , linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Jesper Dangaard Brouer Subject: [RFC net-next 1/6] net: add kfree_skb_for_sk function Message-ID: <9be3733eee16bb81a7e8e2e57ebcc008f95cae08.1717105215.git.yan@cloudflare.com> References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Implement a new kfree_skb_for_sk to replace kfree_skb_reason on a few local receive path. The function accepts an extra receiving socket argument, which will be set in skb->cb for kfree_skb/consume_skb tracepoint consumption. With this extra bit of information, it will be easier to attribute dropped packets to netns/containers and sockets/services for performance and error monitoring purpose. Signed-off-by: Yan Zhai --- include/linux/skbuff.h | 48 ++++++++++++++++++++++++++++++++++++++++-- net/core/dev.c | 21 +++++++----------- net/core/skbuff.c | 29 +++++++++++++------------ 3 files changed, 70 insertions(+), 28 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index fe7d8dbef77e..66f5b06798f2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1251,8 +1251,52 @@ static inline bool skb_data_unref(const struct sk_buff *skb, return true; } -void __fix_address -kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason); +/* + * Some protocol will clear or reuse skb->dev field for other purposes. For + * example, TCP stack would reuse the pointer for out of order packet handling. + * This caused some problem for drop monitoring on kfree_skb tracepoint, since + * no other fields of an skb provides netns information. In addition, it is + * also complicated to recover receive socket information for dropped packets, + * because the socket lookup can be an sk-lookup BPF program. + * + * This can be addressed by just passing the rx socket to the tracepoint, + * because it also has valid netns binding. + */ +struct kfree_skb_cb { + enum skb_drop_reason reason; /* used only by dev_kfree_skb_irq */ + struct sock *rx_sk; +}; + +#define KFREE_SKB_CB(skb) ((struct kfree_skb_cb *)(skb)->cb) + +/* Save cb->rx_sk before calling kfree_skb/consume_skb tracepoint, and restore + * after the tracepoint. This is necessary because some skb destructor might + * rely on values in skb->cb, e.g. unix_destruct_scm. + */ +#define _call_trace_kfree_skb(action, skb, sk, ...) do { \ + if (trace_##action##_skb_enabled()) { \ + struct kfree_skb_cb saved; \ + saved.rx_sk = KFREE_SKB_CB(skb)->rx_sk; \ + KFREE_SKB_CB(skb)->rx_sk = sk; \ + trace_##action##_skb((skb), ## __VA_ARGS__); \ + KFREE_SKB_CB(skb)->rx_sk = saved.rx_sk; \ + } \ +} while (0) + +#define call_trace_kfree_skb(skb, sk, ...) \ + _call_trace_kfree_skb(kfree, skb, sk, ## __VA_ARGS__) + +#define call_trace_consume_skb(skb, sk, ...) \ + _call_trace_kfree_skb(consume, skb, sk, ## __VA_ARGS__) + +void __fix_address kfree_skb_for_sk(struct sk_buff *skb, struct sock *rx_sk, + enum skb_drop_reason reason); + +static inline void +kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) +{ + kfree_skb_for_sk(skb, NULL, reason); +} /** * kfree_skb - free an sk_buff with 'NOT_SPECIFIED' reason diff --git a/net/core/dev.c b/net/core/dev.c index 85fe8138f3e4..17516f26be92 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3135,15 +3135,6 @@ void __netif_schedule(struct Qdisc *q) } EXPORT_SYMBOL(__netif_schedule); -struct dev_kfree_skb_cb { - enum skb_drop_reason reason; -}; - -static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb) -{ - return (struct dev_kfree_skb_cb *)skb->cb; -} - void netif_schedule_queue(struct netdev_queue *txq) { rcu_read_lock(); @@ -3182,7 +3173,11 @@ void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason) } else if (likely(!refcount_dec_and_test(&skb->users))) { return; } - get_kfree_skb_cb(skb)->reason = reason; + + /* There is no need to save the old cb since we are the only user. */ + KFREE_SKB_CB(skb)->reason = reason; + KFREE_SKB_CB(skb)->rx_sk = NULL; + local_irq_save(flags); skb->next = __this_cpu_read(softnet_data.completion_queue); __this_cpu_write(softnet_data.completion_queue, skb); @@ -5229,17 +5224,17 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) clist = clist->next; WARN_ON(refcount_read(&skb->users)); - if (likely(get_kfree_skb_cb(skb)->reason == SKB_CONSUMED)) + if (likely(KFREE_SKB_CB(skb)->reason == SKB_CONSUMED)) trace_consume_skb(skb, net_tx_action); else trace_kfree_skb(skb, net_tx_action, - get_kfree_skb_cb(skb)->reason); + KFREE_SKB_CB(skb)->reason); if (skb->fclone != SKB_FCLONE_UNAVAILABLE) __kfree_skb(skb); else __napi_kfree_skb(skb, - get_kfree_skb_cb(skb)->reason); + KFREE_SKB_CB(skb)->reason); } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 466999a7515e..5ce6996512a1 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1190,7 +1190,8 @@ void __kfree_skb(struct sk_buff *skb) EXPORT_SYMBOL(__kfree_skb); static __always_inline -bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) +bool __kfree_skb_reason(struct sk_buff *skb, struct sock *rx_sk, + enum skb_drop_reason reason) { if (unlikely(!skb_unref(skb))) return false; @@ -1201,28 +1202,30 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) SKB_DROP_REASON_SUBSYS_NUM); if (reason == SKB_CONSUMED) - trace_consume_skb(skb, __builtin_return_address(0)); + call_trace_consume_skb(skb, rx_sk, __builtin_return_address(0)); else - trace_kfree_skb(skb, __builtin_return_address(0), reason); + call_trace_kfree_skb(skb, rx_sk, __builtin_return_address(0), reason); + return true; } /** - * kfree_skb_reason - free an sk_buff with special reason + * kfree_skb_for_sk - free an sk_buff with special reason and receiving socket * @skb: buffer to free + * @rx_sk: the socket to receive the buffer, or NULL if not applicable * @reason: reason why this skb is dropped * * Drop a reference to the buffer and free it if the usage count has - * hit zero. Meanwhile, pass the drop reason to 'kfree_skb' + * hit zero. Meanwhile, pass the drop reason and rx socket to 'kfree_skb' * tracepoint. */ -void __fix_address -kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) +void __fix_address kfree_skb_for_sk(struct sk_buff *skb, struct sock *rx_sk, + enum skb_drop_reason reason) { - if (__kfree_skb_reason(skb, reason)) + if (__kfree_skb_reason(skb, rx_sk, reason)) __kfree_skb(skb); } -EXPORT_SYMBOL(kfree_skb_reason); +EXPORT_SYMBOL(kfree_skb_for_sk); #define KFREE_SKB_BULK_SIZE 16 @@ -1261,7 +1264,7 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) while (segs) { struct sk_buff *next = segs->next; - if (__kfree_skb_reason(segs, reason)) { + if (__kfree_skb_reason(segs, NULL, reason)) { skb_poison_list(segs); kfree_skb_add_bulk(segs, &sa, reason); } @@ -1405,7 +1408,7 @@ void consume_skb(struct sk_buff *skb) if (!skb_unref(skb)) return; - trace_consume_skb(skb, __builtin_return_address(0)); + call_trace_consume_skb(skb, NULL, __builtin_return_address(0)); __kfree_skb(skb); } EXPORT_SYMBOL(consume_skb); @@ -1420,7 +1423,7 @@ EXPORT_SYMBOL(consume_skb); */ void __consume_stateless_skb(struct sk_buff *skb) { - trace_consume_skb(skb, __builtin_return_address(0)); + call_trace_consume_skb(skb, NULL, __builtin_return_address(0)); skb_release_data(skb, SKB_CONSUMED); kfree_skbmem(skb); } @@ -1478,7 +1481,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget) return; /* if reaching here SKB is ready to free */ - trace_consume_skb(skb, __builtin_return_address(0)); + call_trace_consume_skb(skb, NULL, __builtin_return_address(0)); /* if SKB is a clone, don't handle this case */ if (skb->fclone != SKB_FCLONE_UNAVAILABLE) { -- 2.30.2