From: Johannes Lundberg
To: linux-kernel@vger.kernel.org
Cc: Johannes Lundberg, "David S. Miller", Jakub Kicinski, Eric Dumazet,
    Hideaki YOSHIFUJI, David Ahern, Paolo Abeni, Florian Westphal,
    Alexander Aring, Tonghao Zhang, Yangbo Lu, Thomas Gleixner,
    netdev@vger.kernel.org
Subject: [PATCH] fs: eventpoll: add empty event
Date: Mon, 27 Sep 2021 13:29:17 -0700
Message-Id: <20210927202923.7360-1-jlundberg@llnw.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The EPOLLEMPTY event will trigger when the TCP write buffer becomes
empty, i.e., when all outgoing data have been ACKed.

The need for this functionality comes from a business requirement
of measuring with higher precision how much time is spent
transmitting data to a client.

For reference, similar functionality was previously added to FreeBSD
as the kqueue event EVFILT_EMPTY.

Signed-off-by: Johannes Lundberg
---
 include/net/sock.h             | 11 +++++++++++
 include/uapi/linux/eventpoll.h |  1 +
 net/core/sock.c                |  5 +++++
 net/core/stream.c              | 14 ++++++++++++++
 net/ipv4/tcp.c                 |  5 +++++
 5 files changed, 36 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index c005c3c750e8..9047a9e225a9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -516,6 +516,7 @@ struct sock {
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk);
 	void			(*sk_write_space)(struct sock *sk);
+	void			(*sk_empty)(struct sock *sk);
 	void			(*sk_error_report)(struct sock *sk);
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
@@ -965,6 +966,7 @@ static inline void sk_wmem_queued_add(struct sock *sk, int val)
 	WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val);
 }
 
+void sk_stream_empty(struct sock *sk);
 void sk_stream_write_space(struct sock *sk);
 
 /* OOB backlog add */
@@ -1288,6 +1290,11 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
 
 INDIRECT_CALLABLE_DECLARE(bool tcp_stream_memory_free(const struct sock *sk, int wake));
 
+static inline bool sk_stream_is_empty(const struct sock *sk)
+{
+	return (sk->sk_wmem_queued == 0);
+}
+
 static inline bool __sk_stream_memory_free(const struct sock *sk, int wake)
 {
 	if (READ_ONCE(sk->sk_wmem_queued) >= READ_ONCE(sk->sk_sndbuf))
@@ -1559,6 +1566,10 @@ DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key);
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 {
 	sk_wmem_queued_add(sk, -skb->truesize);
+
+	if (sk_stream_is_empty(sk))
+		sk->sk_empty(sk);
+
 	sk_mem_uncharge(sk, skb->truesize);
 	if (static_branch_unlikely(&tcp_tx_skb_cache_key) &&
 	    !sk->sk_tx_skb_cache && !skb_cloned(skb)) {
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..aab9f1f624d0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -39,6 +39,7 @@
 #define EPOLLWRNORM	(__force __poll_t)0x00000100
 #define EPOLLWRBAND	(__force __poll_t)0x00000200
 #define EPOLLMSG	(__force __poll_t)0x00000400
+#define EPOLLEMPTY	(__force __poll_t)0x00000800
 #define EPOLLRDHUP	(__force __poll_t)0x00002000
 
 /* Set exclusive wakeup mode for the target file descriptor */
diff --git a/net/core/sock.c b/net/core/sock.c
index 512e629f9780..f917791d8149 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3062,6 +3062,10 @@ static void sock_def_write_space(struct sock *sk)
 	rcu_read_unlock();
 }
 
+static void sock_def_empty(struct sock *sk)
+{
+}
+
 static void sock_def_destruct(struct sock *sk)
 {
 }
@@ -3136,6 +3140,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_state_change	=	sock_def_wakeup;
 	sk->sk_data_ready	=	sock_def_readable;
 	sk->sk_write_space	=	sock_def_write_space;
+	sk->sk_empty		=	sock_def_empty;
 	sk->sk_error_report	=	sock_def_error_report;
 	sk->sk_destruct		=	sock_def_destruct;
 
diff --git a/net/core/stream.c b/net/core/stream.c
index 4f1d4aa5fb38..c7e4135542a2 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -21,6 +21,20 @@
 #include
 #include
 
+void sk_stream_empty(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+	struct socket_wq *wq;
+
+	if (sk_stream_is_empty(sk) && sock) {
+		rcu_read_lock();
+		wq = rcu_dereference(sk->sk_wq);
+		if (skwq_has_sleeper(wq))
+			wake_up_interruptible_poll(&wq->wait, EPOLLEMPTY);
+		rcu_read_unlock();
+	}
+}
+
 /**
  * sk_stream_write_space - stream socket write_space callback.
  * @sk: socket
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e8b48df73c85..550bae79af06 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -453,6 +453,8 @@ void tcp_init_sock(struct sock *sk)
 	tp->tsoffset = 0;
 	tp->rack.reo_wnd_steps = 1;
 
+	sk->sk_empty = sk_stream_empty;
+
 	sk->sk_write_space = sk_stream_write_space;
 	sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
 
@@ -561,6 +563,9 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 		    tp->urg_data)
 			target++;
 
+		if (sk_stream_is_empty(sk))
+			mask |= EPOLLEMPTY;
+
 		if (tcp_stream_is_readable(sk, target))
 			mask |= EPOLLIN | EPOLLRDNORM;
 
-- 
2.17.1