From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Dave Jones, Jakub Kicinski, "David S. Miller", Sasha Levin
Subject: [PATCH 5.4 062/122] net: ip: avoid OOM kills with large UDP sends over loopback
Date: Thu, 15 Jul 2021 20:38:29 +0200
Message-Id: <20210715182505.909309984@linuxfoundation.org>
In-Reply-To: <20210715182448.393443551@linuxfoundation.org>
References: <20210715182448.393443551@linuxfoundation.org>

From: Jakub Kicinski

[ Upstream commit 6d123b81ac615072a8525c13c6c41b695270a15d ]

Dave observed number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has MTU of 64k kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse if
the message length is <32k the allocation may trigger an
OOM killer.

This is entirely avoidable, we can use an skb with page frags.

af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if underlying device supports SG, if extra
allocator pressure causes regressions in real workloads
we can switch to trying the large allocation first and
falling back.

v4: pre-calculate all the additions to alloclen so
    we can be sure it won't go over order-2

Reported-by: Dave Jones
Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
---
 net/ipv4/ip_output.c  | 32 ++++++++++++++++++--------------
 net/ipv6/ip6_output.c | 32 +++++++++++++++++---------------
 2 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7a394479dd56..f52bc9c22e5b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1048,7 +1048,7 @@ static int __ip_append_data(struct sock *sk,
 			unsigned int datalen;
 			unsigned int fraglen;
 			unsigned int fraggap;
-			unsigned int alloclen;
+			unsigned int alloclen, alloc_extra;
 			unsigned int pagedlen;
 			struct sk_buff *skb_prev;
 alloc_new_skb:
@@ -1068,35 +1068,39 @@ alloc_new_skb:
 			fraglen = datalen + fragheaderlen;
 			pagedlen = 0;
 
+			alloc_extra = hh_len + 15;
+			alloc_extra += exthdrlen;
+
+			/* The last fragment gets additional space at tail.
+			 * Note, with MSG_MORE we overallocate on fragments,
+			 * because we have no idea what fragment will be
+			 * the last.
+			 */
+			if (datalen == length + fraggap)
+				alloc_extra += rt->dst.trailer_len;
+
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
 			}
 
-			alloclen += exthdrlen;
-
-			/* The last fragment gets additional space at tail.
-			 * Note, with MSG_MORE we overallocate on fragments,
-			 * because we have no idea what fragment will be
-			 * the last.
-			 */
-			if (datalen == length + fraggap)
-				alloclen += rt->dst.trailer_len;
+			alloclen += alloc_extra;
 
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len + 15,
+				skb = sock_alloc_send_skb(sk, alloclen,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
 				    2 * sk->sk_sndbuf)
-					skb = alloc_skb(alloclen + hh_len + 15,
+					skb = alloc_skb(alloclen,
 							sk->sk_allocation);
 				if (unlikely(!skb))
 					err = -ENOBUFS;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7a80c42fcce2..4dcbb1ccab25 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1484,7 +1484,7 @@ emsgsize:
 			unsigned int datalen;
 			unsigned int fraglen;
 			unsigned int fraggap;
-			unsigned int alloclen;
+			unsigned int alloclen, alloc_extra;
 			unsigned int pagedlen;
 alloc_new_skb:
 			/* There's no room in the current skb */
@@ -1511,17 +1511,28 @@ alloc_new_skb:
 			fraglen = datalen + fragheaderlen;
 			pagedlen = 0;
 
+			alloc_extra = hh_len;
+			alloc_extra += dst_exthdrlen;
+			alloc_extra += rt->dst.trailer_len;
+
+			/* We just reserve space for fragment header.
+			 * Note: this may be overallocation if the message
+			 * (without MSG_MORE) fits into the MTU.
+			 */
+			alloc_extra += sizeof(struct frag_hdr);
+
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
 			}
-
-			alloclen += dst_exthdrlen;
+			alloclen += alloc_extra;
 
 			if (datalen != length + fraggap) {
 				/*
@@ -1531,30 +1542,21 @@ alloc_new_skb:
 				datalen += rt->dst.trailer_len;
 			}
 
-			alloclen += rt->dst.trailer_len;
 			fraglen = datalen + fragheaderlen;
 
-			/*
-			 * We just reserve space for fragment header.
-			 * Note: this may be overallocation if the message
-			 * (without MSG_MORE) fits into the MTU.
-			 */
-			alloclen += sizeof(struct frag_hdr);
-
 			copy = datalen - transhdrlen - fraggap - pagedlen;
 			if (copy < 0) {
 				err = -EINVAL;
 				goto error;
 			}
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len,
+				skb = sock_alloc_send_skb(sk, alloclen,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
 				    2 * sk->sk_sndbuf)
-					skb = alloc_skb(alloclen + hh_len,
+					skb = alloc_skb(alloclen,
 							sk->sk_allocation);
 				if (unlikely(!skb))
 					err = -ENOBUFS;
-- 
2.30.2
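
For readers following the size decision outside the kernel tree, the
branch structure the patch introduces can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: the names
plan_alloc, struct alloc_plan, SKETCH_SKB_MAX_ALLOC and SKETCH_MAX_HEADER
are hypothetical, and the two constants are rough stand-ins (the real
SKB_MAX_ALLOC is somewhat below 16 KiB and MAX_HEADER depends on the
kernel configuration). It does not allocate anything; it only shows how
the head length and the paged length are chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in values, NOT taken from kernel headers. */
#define SKETCH_SKB_MAX_ALLOC 16384u /* approximates SKB_MAX_ALLOC */
#define SKETCH_MAX_HEADER    128u   /* approximates MAX_HEADER */

/* Hypothetical result type: how a fragment is split between head and frags. */
struct alloc_plan {
	unsigned int alloclen; /* bytes requested for the linear head */
	unsigned int pagedlen; /* bytes left to be carried in page frags */
};

/*
 * Mirrors the if/else chain from the patch: alloc_extra is computed up
 * front, so the comparison against the cap sees the full head size.
 */
static struct alloc_plan plan_alloc(unsigned int fraglen,
				    unsigned int alloc_extra,
				    unsigned int mtu,
				    bool msg_more, bool paged, bool dev_sg)
{
	struct alloc_plan p = { 0, 0 };

	assert(mtu > 0);

	if (msg_more && !dev_sg) {
		/* No scatter-gather and more data coming: reserve a full MTU. */
		p.alloclen = mtu;
	} else if (!paged &&
		   (fraglen + alloc_extra < SKETCH_SKB_MAX_ALLOC || !dev_sg)) {
		/* Small enough, or the device cannot do SG: all in the head. */
		p.alloclen = fraglen;
	} else {
		/* Large send on an SG device: short head, rest in page frags. */
		p.alloclen = fraglen < SKETCH_MAX_HEADER ? fraglen
							 : SKETCH_MAX_HEADER;
		p.pagedlen = fraglen - p.alloclen;
	}
	p.alloclen += alloc_extra;
	return p;
}
```

With these assumed constants, a 60000-byte fragment with 47 extra bytes
on an SG-capable device plans a 175-byte head and 59872 paged bytes,
while the same fragment on a device without SG keeps the full 60047
bytes in the head, matching the fallback the patch preserves.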