From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jakub Kicinski, Dave Jones, "David S. Miller", Sasha Levin, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.13 168/189] net: ip: avoid OOM kills with large UDP sends over loopback
Date: Tue, 6 Jul 2021 07:13:48 -0400
Message-Id: <20210706111409.2058071-168-sashal@kernel.org>

From: Jakub Kicinski

[ Upstream commit 6d123b81ac615072a8525c13c6c41b695270a15d ]

Dave observed a number of machines hitting OOM on the UDP send path.
The workload seems to be sending large UDP packets over loopback.
Since loopback has an MTU of 64k, the kernel will try to allocate an skb
with up to 64k of head space. This has a good chance of failing under
memory pressure. What's worse, if the message length is <32k, the
allocation may trigger the OOM killer.

This is entirely avoidable: we can use an skb with page frags.

af_unix solves a similar problem by limiting the head length to
SKB_MAX_ALLOC. This seems like a good and simple approach. It means that
UDP messages > 16kB will now use fragments if the underlying device
supports SG. If extra allocator pressure causes regressions in real
workloads, we can switch to trying the large allocation first and
falling back.

v4: pre-calculate all the additions to alloclen so we can be sure it
won't go over order-2

Reported-by: Dave Jones
Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
---
 net/ipv4/ip_output.c  | 32 ++++++++++++++++++--------------
 net/ipv6/ip6_output.c | 32 +++++++++++++++++---------------
 2 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c3efc7d658f6..8d8a8da3ae7e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1054,7 +1054,7 @@ static int __ip_append_data(struct sock *sk,
 			unsigned int datalen;
 			unsigned int fraglen;
 			unsigned int fraggap;
-			unsigned int alloclen;
+			unsigned int alloclen, alloc_extra;
 			unsigned int pagedlen;
 			struct sk_buff *skb_prev;
 alloc_new_skb:
@@ -1074,35 +1074,39 @@ static int __ip_append_data(struct sock *sk,
 			fraglen = datalen + fragheaderlen;
 			pagedlen = 0;
 
+			alloc_extra = hh_len + 15;
+			alloc_extra += exthdrlen;
+
+			/* The last fragment gets additional space at tail.
+			 * Note, with MSG_MORE we overallocate on fragments,
+			 * because we have no idea what fragment will be
+			 * the last.
+			 */
+			if (datalen == length + fraggap)
+				alloc_extra += rt->dst.trailer_len;
+
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
 			}
 
-			alloclen += exthdrlen;
-
-			/* The last fragment gets additional space at tail.
-			 * Note, with MSG_MORE we overallocate on fragments,
-			 * because we have no idea what fragment will be
-			 * the last.
-			 */
-			if (datalen == length + fraggap)
-				alloclen += rt->dst.trailer_len;
+			alloclen += alloc_extra;
 
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len + 15,
+				skb = sock_alloc_send_skb(sk, alloclen,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
 				    2 * sk->sk_sndbuf)
-					skb = alloc_skb(alloclen + hh_len + 15,
+					skb = alloc_skb(alloclen,
 							sk->sk_allocation);
 				if (unlikely(!skb))
 					err = -ENOBUFS;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..497974b4372a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1555,7 +1555,7 @@ static int __ip6_append_data(struct sock *sk,
 			unsigned int datalen;
 			unsigned int fraglen;
 			unsigned int fraggap;
-			unsigned int alloclen;
+			unsigned int alloclen, alloc_extra;
 			unsigned int pagedlen;
 alloc_new_skb:
 			/* There's no room in the current skb */
@@ -1582,17 +1582,28 @@ static int __ip6_append_data(struct sock *sk,
 			fraglen = datalen + fragheaderlen;
 			pagedlen = 0;
 
+			alloc_extra = hh_len;
+			alloc_extra += dst_exthdrlen;
+			alloc_extra += rt->dst.trailer_len;
+
+			/* We just reserve space for fragment header.
+			 * Note: this may be overallocation if the message
+			 * (without MSG_MORE) fits into the MTU.
+			 */
+			alloc_extra += sizeof(struct frag_hdr);
+
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
 			}
-
-			alloclen += dst_exthdrlen;
+			alloclen += alloc_extra;
 
 			if (datalen != length + fraggap) {
 				/*
@@ -1602,30 +1613,21 @@ static int __ip6_append_data(struct sock *sk,
 				datalen += rt->dst.trailer_len;
 			}
 
-			alloclen += rt->dst.trailer_len;
 			fraglen = datalen + fragheaderlen;
 
-			/*
-			 * We just reserve space for fragment header.
-			 * Note: this may be overallocation if the message
-			 * (without MSG_MORE) fits into the MTU.
-			 */
-			alloclen += sizeof(struct frag_hdr);
-
 			copy = datalen - transhdrlen - fraggap - pagedlen;
 			if (copy < 0) {
 				err = -EINVAL;
 				goto error;
 			}
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len,
+				skb = sock_alloc_send_skb(sk, alloclen,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
 				    2 * sk->sk_sndbuf)
-					skb = alloc_skb(alloclen + hh_len,
+					skb = alloc_skb(alloclen,
 							sk->sk_allocation);
 				if (unlikely(!skb))
 					err = -ENOBUFS;
-- 
2.30.2