From: Andrey Ryabinin
To: "David S.
 Miller"
Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
    linux-kernel@vger.kernel.org, netdev@vger.kernel.org, Andrey Ryabinin
Subject: [PATCH 1/4] net/skbuff: don't waste memory reserves
Date: Thu, 18 Apr 2019 21:05:21 +0300
Message-Id: <20190418180524.23489-1-aryabinin@virtuozzo.com>

In some workloads we have noticed packets being dropped by
sk_filter_trim_cap() because the 'skb' was allocated from pfmemalloc
reserves:

	/*
	 * If the skb was allocated from pfmemalloc reserves, only
	 * allow SOCK_MEMALLOC sockets to use it as this socket is
	 * helping free memory
	 */
	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
		return -ENOMEM;
	}

Memalloc sockets are used for things like swap over NBD or NFS, and
only memalloc sockets can process memalloc skbs. Since we don't have
any memalloc sockets in our setups, we shouldn't have memalloc skbs
either. It simply doesn't make any sense to waste memory reserves on
an skb which will be dropped anyway.

It appears that __dev_alloc_pages() unconditionally uses
__GFP_MEMALLOC, so unless the caller added __GFP_NOMEMALLOC,
__dev_alloc_pages() may dive into memory reserves. Later build_skb()
or __skb_fill_page_desc() sets skb->pfmemalloc = 1, so this skb is
always dropped by sk_filter_trim_cap().

Instead of wasting memory reserves, we simply shouldn't use them when
there are no memalloc sockets in the system. Do this by adding
__GFP_MEMALLOC only when such a socket is present in the system.

Fixes: 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb")
Signed-off-by: Andrey Ryabinin
---
 include/linux/skbuff.h | 17 ++++++++++++++++-
 include/net/sock.h     | 15 ---------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a06275a618f0..676e54f84de4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2784,6 +2784,19 @@ void napi_consume_skb(struct sk_buff *skb, int budget);
 void __kfree_skb_flush(void);
 void __kfree_skb_defer(struct sk_buff *skb);
 
+#ifdef CONFIG_NET
+DECLARE_STATIC_KEY_FALSE(memalloc_socks_key);
+static inline int sk_memalloc_socks(void)
+{
+	return static_branch_unlikely(&memalloc_socks_key);
+}
+#else
+static inline int sk_memalloc_socks(void)
+{
+	return 0;
+}
+#endif
+
 /**
  * __dev_alloc_pages - allocate page for network Rx
  * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
@@ -2804,7 +2817,9 @@ static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
 	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
 	 *     code in gfp_to_alloc_flags that should be enforcing this.
 	 */
-	gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
+	gfp_mask |= __GFP_COMP;
+	if (sk_memalloc_socks())
+		gfp_mask |= __GFP_MEMALLOC;
 
 	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
 }
diff --git a/include/net/sock.h b/include/net/sock.h
index bdd77bbce7d8..5b2138d47bd8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -838,21 +838,6 @@ static inline bool sock_flag(const struct sock *sk, enum sock_flags flag)
 	return test_bit(flag, &sk->sk_flags);
 }
 
-#ifdef CONFIG_NET
-DECLARE_STATIC_KEY_FALSE(memalloc_socks_key);
-static inline int sk_memalloc_socks(void)
-{
-	return static_branch_unlikely(&memalloc_socks_key);
-}
-#else
-
-static inline int sk_memalloc_socks(void)
-{
-	return 0;
-}
-
-#endif
-
 static inline gfp_t sk_gfp_mask(const struct sock *sk, gfp_t gfp_mask)
 {
 	return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC);
-- 
2.21.0
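
For readers who want to poke at the logic outside the kernel, the gating
described in the commit message can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: every model_*
name and both flag values are hypothetical stand-ins invented for the
illustration, a plain bool replaces the memalloc_socks_key static key,
and __GFP_NOMEMALLOC is not modelled at all.

```c
/*
 * Userspace model only. The model_* names and the flag values are
 * illustrative stand-ins, not the kernel's definitions.
 */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

#define MODEL_GFP_COMP     0x1u	/* stand-in for __GFP_COMP */
#define MODEL_GFP_MEMALLOC 0x2u	/* stand-in for __GFP_MEMALLOC */

/* Stand-in for memalloc_socks_key: true while a memalloc socket exists. */
static bool model_memalloc_socks;

/* Behaviour before the patch: the reserve flag is always added. */
static model_gfp_t model_rx_gfp_old(model_gfp_t gfp_mask)
{
	return gfp_mask | MODEL_GFP_COMP | MODEL_GFP_MEMALLOC;
}

/* Behaviour after the patch: the reserve flag is added only on demand. */
static model_gfp_t model_rx_gfp_new(model_gfp_t gfp_mask)
{
	gfp_mask |= MODEL_GFP_COMP;
	if (model_memalloc_socks)
		gfp_mask |= MODEL_GFP_MEMALLOC;
	return gfp_mask;
}

/* True if an allocation with this mask is allowed to dip into reserves. */
static bool model_may_use_reserves(model_gfp_t gfp_mask)
{
	return (gfp_mask & MODEL_GFP_MEMALLOC) != 0;
}

/* Walks through both states of the flag; returns 0 if all checks hold. */
static int model_selfcheck(void)
{
	model_memalloc_socks = false;
	assert(model_may_use_reserves(model_rx_gfp_old(0)));
	assert(!model_may_use_reserves(model_rx_gfp_new(0)));

	model_memalloc_socks = true;
	assert(model_may_use_reserves(model_rx_gfp_new(0)));
	return 0;
}
```

Calling model_selfcheck() from a main() exercises both cases: with no
memalloc socket, the new helper never asks for reserves, so no skb that
would later be dropped for being pfmemalloc is allocated from them.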