Date: Tue, 3 Apr 2018 17:55:09 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, David Rientjes, "Kirill A. Shutemov", Vlastimil Babka,
	linux-mm@kvack.org, LKML
Subject: Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges
Message-ID: <20180403155509.GD5501@dhcp22.suse.cz>
References: <20180321205928.22240-1-mhocko@kernel.org>
 <20180403145853.GB21411@cmpxchg.org>
In-Reply-To: <20180403145853.GB21411@cmpxchg.org>

On Tue 03-04-18 10:58:53, Johannes Weiner wrote:
> On Wed, Mar 21, 2018 at 09:59:28PM +0100, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > David has noticed that a THP memcg charge can trigger the OOM killer
> > since 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged
> > and madvised allocations"). We previously used an explicit
> > __GFP_NORETRY, which ruled out the OOM killer automagically.
> >
> > The memcg charge path should be semantically compliant with the
> > allocation path, which means that if we do not trigger the OOM
> > killer for costly orders in the allocation path, we should not do so
> > in the memcg charge path either. Otherwise we are forcing callers to
> > distinguish the two and use different gfp masks, which is both
> > non-intuitive and bug prone, not to mention a maintenance burden.
> >
> > Teach mem_cgroup_oom to bail out on costly order requests to fix the
> > THP issue as well as any other costly, OOM-eligible allocations to
> > be added in the future.
> >
> > Fixes: 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
> > Reported-by: David Rientjes
> > Signed-off-by: Michal Hocko
> 
> I also prefer this fix over having separate OOM behaviors between the
> allocator and memcg (the difference is user-visible, not just a matter
> of the technical ability to satisfy the allocation).
> 
> Acked-by: Johannes Weiner

I will repost the patch with the currently merged THP-specific handling
reverted (see below). While 9d3c3354bb85 might have been an appropriate
quick fix, we shouldn't keep it long term for 4.17+, IMHO. Does your ack
apply to that patch as well?
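To make the effect of the new check concrete, here is a minimal
user-space sketch (not the kernel code itself; it assumes the usual
values PAGE_ALLOC_COSTLY_ORDER == 3 and order 9 for a 2MB THP built
from 4KB base pages):

#include <stdbool.h>
#include <stdio.h>

/* Assumed constants; in the kernel they come from mmzone.h and
 * huge_mm.h respectively. */
#define PAGE_ALLOC_COSTLY_ORDER	3
#define HPAGE_PMD_ORDER		9	/* 2MB / 4KB = 512 pages = order 9 */

/* Models the guard added to mem_cgroup_oom() by the patch below:
 * if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
 *	return;
 */
static bool charge_may_invoke_oom(bool memcg_may_oom, int order)
{
	return memcg_may_oom && order <= PAGE_ALLOC_COSTLY_ORDER;
}

int main(void)
{
	/* A regular order-0 charge is still OOM-eligible... */
	printf("order-0 charge may oom: %d\n",
	       charge_may_invoke_oom(true, 0));
	/* ...while a THP charge now just fails and the caller falls
	 * back, see the THP_FAULT_FALLBACK hunk below. */
	printf("order-9 charge may oom: %d\n",
	       charge_may_invoke_oom(true, HPAGE_PMD_ORDER));
	return 0;
}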
Shutemov" , Vlastimil Babka , linux-mm@kvack.org, LKML Subject: Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges Message-ID: <20180403155509.GD5501@dhcp22.suse.cz> References: <20180321205928.22240-1-mhocko@kernel.org> <20180403145853.GB21411@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180403145853.GB21411@cmpxchg.org> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 03-04-18 10:58:53, Johannes Weiner wrote: > On Wed, Mar 21, 2018 at 09:59:28PM +0100, Michal Hocko wrote: > > From: Michal Hocko > > > > David has noticed that THP memcg charge can trigger the oom killer > > since 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and > > madvised allocations"). We have used an explicit __GFP_NORETRY > > previously which ruled the OOM killer automagically. > > > > Memcg charge path should be semantically compliant with the allocation > > path and that means that if we do not trigger the OOM killer for costly > > orders which should do the same in the memcg charge path as well. > > Otherwise we are forcing callers to distinguish the two and use > > different gfp masks which is both non-intuitive and bug prone. Not to > > mention the maintenance burden. > > > > Teach mem_cgroup_oom to bail out on costly order requests to fix the THP > > issue as well as any other costly OOM eligible allocations to be added > > in future. > > > > Fixes: 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations") > > Reported-by: David Rientjes > > Signed-off-by: Michal Hocko > > I also prefer this fix over having separate OOM behaviors (which is > user-visible, and not just about technical ability to satisfy the > allocation) between the allocator and memcg. > > Acked-by: Johannes Weiner I will repost the patch with the currently merged THP specific handling reverted (see below). While 9d3c3354bb85 might have been an appropriate quick fix, we shouldn't keep it longterm for 4.17+ IMHO. Does you ack apply to that patch as well? --- From 3e1b127dd6107a0e51654e90d9faef7f196f5a10 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 21 Mar 2018 10:10:37 +0100 Subject: [PATCH] memcg, thp: do not invoke oom killer on thp charges David has noticed that THP memcg charge can trigger the oom killer since 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations"). We have used an explicit __GFP_NORETRY previously which ruled the OOM killer automagically. Memcg charge path should be semantically compliant with the allocation path and that means that if we do not trigger the OOM killer for costly orders which should do the same in the memcg charge path as well. Otherwise we are forcing callers to distinguish the two and use different gfp masks which is both non-intuitive and bug prone. As soon as we get a costly high order kmalloc user we even do not have any means to tell the memcg specific gfp mask to prevent from OOM because the charging is deep within guts of the slab allocator. The unexpected memcg OOM on THP has already been fixed upstream by 9d3c3354bb85 ("mm, thp: do not cause memcg oom for thp") but this is one-off fix rather than a generic solution. Teach mem_cgroup_oom to bail out on costly order requests to fix the THP issue as well as any other costly OOM eligible allocations to be added in future. 
Fixes: 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
Reported-by: David Rientjes
Signed-off-by: Michal Hocko
---
 mm/huge_memory.c | 5 ++---
 mm/khugepaged.c  | 8 ++------
 mm/memcontrol.c  | 2 +-
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2297dd9cc7c3..0cc62405de9c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,8 +555,7 @@ static int __do_huge_pmd_anonymous_page(struct vm_fault *vmf, struct page *page,
 
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
-	if (mem_cgroup_try_charge(page, vma->vm_mm, gfp | __GFP_NORETRY, &memcg,
-				  true)) {
+	if (mem_cgroup_try_charge(page, vma->vm_mm, gfp, &memcg, true)) {
 		put_page(page);
 		count_vm_event(THP_FAULT_FALLBACK);
 		return VM_FAULT_FALLBACK;
@@ -1317,7 +1316,7 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	}
 
 	if (unlikely(mem_cgroup_try_charge(new_page, vma->vm_mm,
-					huge_gfp | __GFP_NORETRY, &memcg, true))) {
+					huge_gfp, &memcg, true))) {
 		put_page(new_page);
 		split_huge_pmd(vma, vmf->pmd, vmf->address);
 		if (page)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 214e614b62b0..b7e2268dfc9a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -960,9 +960,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out_nolock;
 	}
 
-	/* Do not oom kill for khugepaged charges */
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp | __GFP_NORETRY,
-					   &memcg, true))) {
+	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
@@ -1321,9 +1319,7 @@ static void collapse_shmem(struct mm_struct *mm,
 		goto out;
 	}
 
-	/* Do not oom kill for khugepaged charges */
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp | __GFP_NORETRY,
-					   &memcg, true))) {
+	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1a917b5b7b7..08accbcd1a18 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
 
 static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 {
-	if (!current->memcg_may_oom)
+	if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
 		return;
 	/*
 	 * We are in the middle of the charge context here, so we
-- 
2.16.3

-- 
Michal Hocko
SUSE Labs