From: Yunhui Cui
To: rostedt@goodmis.org, mhiramat@kernel.org, davem@davemloft.net,
    edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
    kuniyu@amazon.com, xiyou.wangcong@gmail.com,
    duanxiongchun@bytedance.com, cuiyunhui@bytedance.com,
    linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    netdev@vger.kernel.org, dust.li@linux.alibaba.com
Subject: [PATCH v5] sock: add tracepoint for send recv length
Date: Tue, 10 Jan 2023 17:13:56 +0800
Message-Id: <20230110091356.1524-1-cuiyunhui@bytedance.com>
X-Mailer: git-send-email 2.37.3.windows.1
X-Mailing-List: linux-kernel@vger.kernel.org

Add 2 tracepoints to monitor the tcp/udp traffic per process and
per cgroup.

There are two existing solutions for monitoring the tcp/udp traffic of
each process. The first is netatop
(https://www.atoptool.nl/netatop.php). The second is kprobe/kretprobe.

The netatop solution registers hook functions at the hook points
provided by the netfilter framework. These hook functions may run in
soft interrupt context and cannot directly obtain the pid, so some data
structures are added to bind packets to processes, for example
struct taskinfobucket, struct taskinfo ... Every time a process sends
or receives packets it has to go through multiple hashmaps, resulting
in low performance, and the tcp/udp traffic statistics can be
inaccurate (for example, when multiple threads share sockets).

We can obtain the information with kretprobe, but as we know, kprobe
gets the result by trapping into an exception, which costs performance
compared to a tracepoint.

We compared the performance of tracepoints with the above two methods,
and the results are as follows:

ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html

without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)

netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)

kretprobe:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)

tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests)

It can be seen that the tracepoint adds the least overhead of the three
methods.

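As a rough reading of the ab results above, the relative slowdown of each
method against the untraced run can be worked out from the mean "Time per
request" values. The sketch below is only an illustration: the macro names,
overhead_percent() and print_overheads() are hypothetical helpers, not part
of the patch, and the only inputs are the four mean values quoted above.

```c
#include <stdio.h>

/* Mean "Time per request" in ms, copied from the ab results above. */
#define BASELINE_MS   39.660	/* without trace */
#define NETATOP_MS    50.717
#define KRETPROBE_MS  43.168
#define TRACEPOINT_MS 41.004

/* Hypothetical helper: slowdown of traced_ms over baseline_ms, in percent. */
double overhead_percent(double baseline_ms, double traced_ms)
{
	return (traced_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Hypothetical helper: print one line per tracing method. */
void print_overheads(void)
{
	printf("netatop:    %.1f%%\n",
	       overhead_percent(BASELINE_MS, NETATOP_MS));
	printf("kretprobe:  %.1f%%\n",
	       overhead_percent(BASELINE_MS, KRETPROBE_MS));
	printf("tracepoint: %.1f%%\n",
	       overhead_percent(BASELINE_MS, TRACEPOINT_MS));
}
```

Called from a small main(), print_overheads() gives roughly 27.9% for
netatop, 8.8% for kretprobe and 3.4% for the tracepoint. These figures come
from a single ab run each, so they are indicative rather than precise.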
Signed-off-by: Yunhui Cui
Signed-off-by: Xiongchun Duan
---
 include/trace/events/sock.h | 44 +++++++++++++++++++++++++++++++++++++
 net/socket.c                | 36 ++++++++++++++++++++++++++----
 2 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 777ee6cbe933..2c380cb110a3 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -263,6 +263,50 @@ TRACE_EVENT(inet_sk_error_report,
 		  __entry->error)
 );
 
+/*
+ * sock send/recv msg length
+ */
+DECLARE_EVENT_CLASS(sock_msg_length,
+
+	TP_PROTO(struct sock *sk, int ret, int flags),
+
+	TP_ARGS(sk, ret, flags),
+
+	TP_STRUCT__entry(
+		__field(void *, sk)
+		__field(__u16, family)
+		__field(__u16, protocol)
+		__field(int, length)
+		__field(int, error)
+		__field(int, flags)
+	),
+
+	TP_fast_assign(
+		__entry->sk = sk;
+		__entry->family = sk->sk_family;
+		__entry->protocol = sk->sk_protocol;
+		__entry->length = ret > 0 ? ret : 0;
+		__entry->error = ret < 0 ? ret : 0;
+		__entry->flags = flags;
+	),
+
+	TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x",
+		  __entry->sk, show_family_name(__entry->family),
+		  show_inet_protocol_name(__entry->protocol),
+		  __entry->length, __entry->error, __entry->flags)
+);
+
+DEFINE_EVENT(sock_msg_length, sock_send_length,
+	TP_PROTO(struct sock *sk, int ret, int flags),
+
+	TP_ARGS(sk, ret, flags)
+);
+
+DEFINE_EVENT(sock_msg_length, sock_recv_length,
+	TP_PROTO(struct sock *sk, int ret, int flags),
+
+	TP_ARGS(sk, ret, flags)
+);
 #endif /* _TRACE_SOCK_H */
 
 /* This part must be outside protection */
diff --git a/net/socket.c b/net/socket.c
index 888cd618a968..6180d0ad47f9 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -106,6 +106,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 unsigned int sysctl_net_busy_read __read_mostly;
@@ -709,12 +710,22 @@ INDIRECT_CALLABLE_DECLARE(int inet_sendmsg(struct socket *, struct msghdr *,
 					   size_t));
 INDIRECT_CALLABLE_DECLARE(int inet6_sendmsg(struct socket *, struct msghdr *,
 					    size_t));
+
+static noinline void call_trace_sock_send_length(struct sock *sk, int ret,
+						 int flags)
+{
+	trace_sock_send_length(sk, ret, 0);
+}
+
 static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
 {
 	int ret = INDIRECT_CALL_INET(sock->ops->sendmsg, inet6_sendmsg,
 				     inet_sendmsg, sock, msg,
 				     msg_data_left(msg));
 	BUG_ON(ret == -EIOCBQUEUED);
+
+	if (trace_sock_send_length_enabled())
+		call_trace_sock_send_length(sock->sk, ret, 0);
 	return ret;
 }
 
@@ -989,12 +1000,24 @@ INDIRECT_CALLABLE_DECLARE(int inet_recvmsg(struct socket *, struct msghdr *,
 					   size_t, int));
 INDIRECT_CALLABLE_DECLARE(int inet6_recvmsg(struct socket *, struct msghdr *,
 					    size_t, int));
+
+static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags)
+{
+	trace_sock_recv_length(sk, !(flags & MSG_PEEK) ? ret :
+			       (ret < 0 ? ret : 0), flags);
+}
+
 static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
 				     int flags)
 {
-	return INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg,
-				  inet_recvmsg, sock, msg, msg_data_left(msg),
-				  flags);
+	int ret = INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg,
+				     inet_recvmsg, sock, msg,
+				     msg_data_left(msg), flags);
+
+	if (trace_sock_recv_length_enabled())
+		call_trace_sock_recv_length(sock->sk, !(flags & MSG_PEEK) ?
+					    ret : (ret < 0 ? ret : 0), flags);
+	return ret;
 }
 
 /**
@@ -1044,6 +1067,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
 {
 	struct socket *sock;
 	int flags;
+	int ret;
 
 	sock = file->private_data;
 
@@ -1051,7 +1075,11 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
 	/* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
 	flags |= more;
 
-	return kernel_sendpage(sock, page, offset, size, flags);
+	ret = kernel_sendpage(sock, page, offset, size, flags);
+
+	if (trace_sock_send_length_enabled())
+		call_trace_sock_send_length(sock->sk, ret, 0);
+	return ret;
 }
 
 static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
-- 
2.20.1
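
To make the event fields in the patch above concrete, here is a minimal
user-space sketch of how the return value of a send/recv call ends up in the
"length" and "error" fields, and how a MSG_PEEK receive is reported. It
copies the two conditional expressions from the patch. The struct
msg_length_fields type and the split_ret() and recv_ret_for_trace() names
are hypothetical stand-ins for illustration only, and this is not kernel
code.

```c
#include <sys/socket.h>	/* MSG_PEEK */

/* Hypothetical stand-in for the two fields the event class derives from ret. */
struct msg_length_fields {
	int length;	/* bytes transferred, 0 when the call failed */
	int error;	/* negative errno, 0 when the call succeeded */
};

/* Same split as TP_fast_assign(): ret > 0 is a length, ret < 0 is an error. */
struct msg_length_fields split_ret(int ret)
{
	struct msg_length_fields f;

	f.length = ret > 0 ? ret : 0;
	f.error = ret < 0 ? ret : 0;
	return f;
}

/*
 * Same expression as the recv path: with MSG_PEEK set, a successful read is
 * reported as 0 bytes, while a negative error code is passed through.
 */
int recv_ret_for_trace(int ret, int flags)
{
	return !(flags & MSG_PEEK) ? ret : (ret < 0 ? ret : 0);
}
```

With these helpers, a successful 1448-byte send yields length = 1448 and
error = 0, a failed call returning -11 yields length = 0 and error = -11,
and a 512-byte MSG_PEEK receive is reported with length = 0, so peeked data
is not counted a second time when it is later read for real.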