Date: Thu, 22 Mar 2018 01:26:13 -0700 (PDT)
From: David Rientjes
To: Michal Hocko
cc: Andrew Morton, Johannes Weiner, "Kirill A. Shutemov", Vlastimil Babka,
    linux-mm@kvack.org, LKML
Subject: Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges
In-Reply-To: <20180321214104.GT23100@dhcp22.suse.cz>
References: <20180321205928.22240-1-mhocko@kernel.org>
    <20180321214104.GT23100@dhcp22.suse.cz>

On Wed, 21 Mar 2018, Michal Hocko wrote:

> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index d1a917b5b7b7..08accbcd1a18 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
> > >
> > >  static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> > >  {
> > > -	if (!current->memcg_may_oom)
> > > +	if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> > >  		return;
> > >  	/*
> > >  	 * We are in the middle of the charge context here, so we
> >
> > What bug reports have you received about order-4 and higher order non thp
> > charges that this fixes?
>
> We do not have any costly _OOM killable_ allocations but THP AFAIR. Or
> am I missing any?
>

So now you're making a generalized policy change to the memcg charge path
to fix what is obviously only thp and caused by removing the
__GFP_NORETRY from thp allocations in commit 2516035499b9?  I don't know
what orders people enforce for slub_min_order.  I assume that people who
don't want to cause a memcg oom kill are using __GFP_NORETRY because
that's how it has always worked.  The fact that the page allocator got
more sophisticated logic for the various thp fault and defrag policies
doesn't change that.

You're implementing the exact same behavior that commit 2516035499b9 was
trying to avoid; it's trying to avoid special-casing thp in general
logic.  order > PAGE_ALLOC_COSTLY_ORDER is a terrible heuristic to
identify thp allocations.

> > PAGE_ALLOC_COSTLY_ORDER is a heuristic used by the page allocator because
> > it cannot free high-order contiguous memory.  Memcg just needs to reclaim
> > a number of pages.  Two order-3 charges can cause a memcg oom kill but now
> > an order-4 charge cannot.  It's an unfair bias against high-order charges
> > that are not explicitly using __GFP_NORETRY.
>
> PAGE_ALLOC_COSTLY_ORDER is documented and people know what to expect
> from such a request. Diverging from that behavior just comes as a
> surprise. There is no reason for that and as the above outlines it is
> error prone.
>

You're diverging from it because the memcg charge path has never had this
heuristic.  I'm somewhat stunned this has to be repeated:
PAGE_ALLOC_COSTLY_ORDER is about the ability to allocate _contiguous_
memory, it's not about the _amount_ of memory.
Changing the memcg charge path to factor order into oom kill decisions is
new, and should be proposed as a follow-up patch to my bug fix to describe
what else is being impacted by your patch and what is fixed by it.  Yours
is a heuristic change, mine is a bug fix.

Look, commit 2516035499b9 pulled off __GFP_NORETRY for GFP_TRANSHUGE and
forgot to fix it up for memcg charging.  I'm setting the bit again to
prevent the oom kill.  It's what should be merged for rc7.  I can't make a
stable case for it because the stable rules want it to impact more than
one user and I haven't seen other bug reports.  It can be backported if
others are affected to meet the rules.

Your change is broken and I wouldn't push it to Linus for rc7 if my life
depended on it.  What is the response when someone complains that they
start getting a ton of MEMCG_OOM notifications for every thp fallback?
They will, because yours is a broken implementation.

I'm trying to fix the problem introduced by commit 2516035499b9 wrt how
memcg charges treat high order non-__GFP_NORETRY allocations, and fix it
directly with something that is obviously right.  I'm specifically not
trying to change heuristics as a bug fix.  Please feel free to send a
follow-up patch for 4.17 that lays out why memcg doesn't want to oom kill
for order-4 and above (why does memcg fail for 64KB charges when the
caller specifically left off __GFP_NORETRY again?) as a policy change and
why that is helpful.

Respectfully, allow the bugfix to fix what was obviously left off from
commit 2516035499b9.  I don't have time to write 100 emails on it, but
Andrew can be assured if he chooses to send it for rc7 that my code (1) is
actually tested, (2) has users that depend on it, and (3) won't cause
undesired side-effects like yours wrt oom notifications.
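
For readers following the order arithmetic in this thread, the comparison
between two order-3 charges and one order-4 charge can be sketched as
below.  This is an illustration only, not kernel code: it assumes 4KB
pages and a costly order of 3, and the names SKETCH_PAGE_SIZE,
SKETCH_COSTLY_ORDER, pages_for_order() and order_check_allows_oom() are
hypothetical stand-ins rather than anything in mm/memcontrol.c.

```c
#include <stdbool.h>

/* Assumptions for illustration: 4KB base pages, costly order of 3. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_COSTLY_ORDER	3

/* Number of base pages a charge of the given order accounts for. */
static unsigned long pages_for_order(unsigned int order)
{
	return 1UL << order;
}

/*
 * Hypothetical model of an order-based check like the one in the quoted
 * diff: charges above the costly order never reach the oom killer.
 */
static bool order_check_allows_oom(unsigned int order)
{
	return order <= SKETCH_COSTLY_ORDER;
}
```

Under these assumptions, two order-3 charges account for 16 pages, the
same as one order-4 charge (64KB), yet order_check_allows_oom() returns
true for order 3 and false for order 4.  The check keys on contiguity
rather than on the amount of memory being charged.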