Subject: Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
To: Eric Dumazet, "David S. 
Miller"
Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
References: <20190418180524.23489-1-aryabinin@virtuozzo.com> <791f4f23-d931-4ac8-4e60-3ffe46c4ece2@gmail.com>
From: Andrey Ryabinin
Date: Fri, 19 Apr 2019 16:17:58 +0300
In-Reply-To: <791f4f23-d931-4ac8-4e60-3ffe46c4ece2@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/18/19 9:55 PM, Eric Dumazet wrote:
>
>
> On 04/18/2019 11:05 AM, Andrey Ryabinin wrote:
>> In some workloads we have noticed packets being dropped by
>> sk_filter_trim_cap() because the 'skb' was allocated from pfmemalloc
>> reserves:
>>
>>         /*
>>          * If the skb was allocated from pfmemalloc reserves, only
>>          * allow SOCK_MEMALLOC sockets to use it as this socket is
>>          * helping free memory
>>          */
>>         if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
>>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
>>                 return -ENOMEM;
>>         }
>>
>> Memalloc sockets are used for stuff like swap over NBD or NFS
>> and only memalloc sockets can process memalloc skbs. Since we
>> don't have any memalloc sockets in our setups we shouldn't have
>> memalloc skbs either. It simply doesn't make any sense to waste
>> memory reserves on skb which will be dropped anyway.
>>
>> It appears that __dev_alloc_pages() unconditionally uses
>> __GFP_MEMALLOC, so unless caller added __GFP_NOMEMALLOC, the
>> __dev_alloc_pages() may dive into memory reserves.
>> Later build_skb() or __skb_fill_page_desc() sets skb->pfmemalloc = 1
>> so this skb always dropped by sk_filter_trim_cap().
>>
>> Instead of wasting memory reserves we simply shouldn't use them in the
>> case of absence memalloc sockets in the system. Do this by adding
>> the __GFP_MEMALLOC only when such socket is present in the system.
>>
>> Fixes: 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb")
>> Signed-off-by: Andrey Ryabinin
>> ---
>>  include/linux/skbuff.h | 17 ++++++++++++++++-
>>  include/net/sock.h     | 15 ---------------
>>  2 files changed, 16 insertions(+), 16 deletions(-)
>>
>
> Hi Andrey
>
> Are you targeting net or net-next tree ?
>

I think it's up to Dave to decide where the patches should go. They apply
cleanly on both trees. The last two patches are just minor cleanups, so they
are definitely net-next material.

> AFAIK, drivers allocate skbs way before a frame is actually received,
> (at RX ring buffer initialization or refill)
>
> So sk_memalloc_socks() might be false at that time, but true later.
>

I don't see why that would be a problem. If a refill fails because we didn't
have access to the reserves, then there is going to be another refill attempt,
right? And the next refill attempt will have access to the reserves if a
memalloc socket has been created by then.
We can't predict the future, so until a memalloc socket appears we must assume
that those RX ring buffers won't be used to reclaim memory (and that is
actually true in 99% of cases).