From: Johannes Weiner
To: Andrew Morton
Cc: Shakeel Butt, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, netdev@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH] mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
Date: Tue, 22 Oct 2019 19:37:08 -0400
Message-Id: <20191022233708.365764-1-hannes@cmpxchg.org>

While upgrading from 4.16 to 5.2, we noticed these allocation errors
in the log of the new kernel:

[ 8642.253395] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[ 8642.269170]   cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
[ 8642.293009]   node 0: slabs: 5, objs: 170, free: 0

    slab_out_of_memory+1
    ___slab_alloc+969
    __slab_alloc+14
    kmem_cache_alloc+346
    inet_twsk_alloc+60
    tcp_time_wait+46
    tcp_fin+206
    tcp_data_queue+2034
    tcp_rcv_state_process+784
    tcp_v6_do_rcv+405
    __release_sock+118
    tcp_close+385
    inet_release+46
    __sock_release+55
    sock_close+17
    __fput+170
    task_work_run+127
    exit_to_usermode_loop+191
    do_syscall_64+212
    entry_SYSCALL_64_after_hwframe+68

accompanied by an increase in machines going completely radio silent
under memory pressure.

One thing that changed since 4.16 is e699e2c6a654 ("net, mm: account
sock objects to kmemcg"), which made these slab caches subject to
cgroup memory accounting and control.

The problem with that is that cgroups, unlike the page allocator, do
not maintain dedicated atomic reserves. As a cgroup's usage hovers at
its limit, atomic allocations - such as done during network rx - can
fail consistently for extended periods of time. The kernel is not able
to operate under these conditions.

We don't want to revert the culprit patch, because it indeed tracks a
potentially substantial amount of memory used by a cgroup.

We also don't want to implement dedicated atomic reserves for cgroups.
There is no point in keeping a fixed margin of unused bytes in the
cgroup's memory budget to accommodate a consumer that is impossible to
predict - we'd be wasting memory and get into configuration headaches,
not unlike what we have going with min_free_kbytes. We do this for
physical mem because we have to, but cgroups are an accounting game.

Instead, account these privileged allocations to the cgroup, but let
them bypass the configured limit if they have to. This way, we get the
benefits of accounting the consumed memory and have it exert pressure
on the rest of the cgroup, but like with the page allocator, we shift
the burden of reclaiming on behalf of atomic allocations onto the
regular allocations that can block.

Cc: stable@kernel.org # 4.18+
Fixes: e699e2c6a654 ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner
---
 mm/memcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8090b4c99ac7..c7e3e758c165 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2528,6 +2528,15 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 	}
 
+	/*
+	 * Memcg doesn't have a dedicated reserve for atomic
+	 * allocations. But like the global atomic pool, we need to
+	 * put the burden of reclaim on regular allocation requests
+	 * and let these go through as privileged allocations.
+	 */
+	if (gfp_mask & __GFP_ATOMIC)
+		goto force;
+
 	/*
 	 * Unlike in global OOM situations, memcg is not in a physical
 	 * memory shortage. Allow dying and OOM-killed tasks to
-- 
2.23.0
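
For readers who want to see the policy in isolation, here is a rough
user-space sketch of the behavior described in the changelog: atomic
charges are accounted but allowed to overshoot the limit, and the next
charge that can block reclaims the overage. This is not kernel code.
struct toy_memcg, TOY_ATOMIC and toy_try_charge() are hypothetical names
made up for illustration, and the "reclaimable" counter is an assumed
stand-in for whatever real reclaim would free.

```c
#include <assert.h>

/* Hypothetical stand-in for an atomic gfp flag; not the kernel's value. */
#define TOY_ATOMIC 0x1u

/* Toy model of a cgroup's page counter (illustration only). */
struct toy_memcg {
	long usage;       /* pages currently charged */
	long limit;       /* configured limit, in pages */
	long reclaimable; /* pages a blocking caller could reclaim */
};

/*
 * Toy charge path. Returns 0 on success, -1 if the charge cannot be
 * satisfied. Atomic callers are always charged, even past the limit.
 * Blocking callers reclaim enough to bring usage back under the limit,
 * which includes any overage left behind by earlier atomic charges.
 */
static int toy_try_charge(struct toy_memcg *cg, unsigned int flags,
			  long nr_pages)
{
	long excess;

	assert(nr_pages > 0);

	/* Fast path: the charge fits under the limit. */
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}

	/* Privileged path: account the memory but bypass the limit. */
	if (flags & TOY_ATOMIC) {
		cg->usage += nr_pages;
		return 0;
	}

	/* Blocking path: reclaim the excess or fail the charge. */
	excess = cg->usage + nr_pages - cg->limit;
	if (excess > cg->reclaimable)
		return -1;

	cg->reclaimable -= excess;
	cg->usage -= excess;
	cg->usage += nr_pages;
	return 0;
}
```

In this model, a 4-page atomic charge against a group sitting at its
limit succeeds and leaves usage 4 pages over. The following 1-page
blocking charge then has to reclaim 5 pages, not 1, which is the "shift
the burden onto allocations that can block" effect in miniature.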