From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Eric Dumazet, Paolo Abeni, Greg Thelen, Alexander Duyck, "Michael S. Tsirkin", Jakub Kicinski
Subject: [PATCH 5.4 27/33] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Fri, 22 Jan 2021 15:12:43 +0100
Message-Id: <20210122135734.669951214@linuxfoundation.org>
X-Mailer: git-send-email 2.30.0
In-Reply-To: <20210122135733.565501039@linuxfoundation.org>
References: <20210122135733.565501039@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Eric Dumazet

[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet
Cc: Paolo Abeni
Cc: Greg Thelen
Reviewed-by: Alexander Duyck
Acked-by: Michael S. Tsirkin
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
---
 net/core/skbuff.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -496,13 +496,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -510,6 +514,7 @@ struct sk_buff *__napi_alloc_skb(struct
 			goto skb_success;
 	}
 
+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
 