Date: Tue, 11 Sep 2018 14:11:41 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, Johannes Weiner, Vladimir Davydov
Subject: Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation
Message-ID: <20180911121141.GS10951@dhcp22.suse.cz>
In-Reply-To: <20180910215622.4428-1-guro@fb.com>

On Mon 10-09-18 14:56:22, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be easily raised.
>
> Under some memory pressure it can happen easily because of a
> concurrent allocation. Let's look at try_charge(). Even if we were
> able to reclaim enough memory, this check can fail due to a race
> with another allocation:
>
> 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> 		goto retry;
>
> For regular pages the following condition will save us from triggering
> the OOM:
>
> 	if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> 		goto retry;
>
> But for high-order allocation this condition will intentionally fail.
> The reason behind is that we'll likely fall to regular pages anyway,
> so it's ok and even preferred to return ENOMEM.
>
> In this case the idea of raising the MEMCG_OOM event looks dubious.

Why is this a problem though? IIRC this event was deliberately placed
outside of the oom path because we wanted to count allocation failures
and this is also documented that way

	  oom
		The number of time the cgroup's memory usage was
		reached the limit and allocation was about to fail.

		Depending on context result could be invocation of OOM
		killer and retrying allocation or failing a

One could argue that we do not apply the same logic to GFP_NOWAIT
requests but in general I would like to see a good reason to change
the behavior and if it is really the right thing to do then we need to
update the documentation as well.

> Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after
> allocation order check, so that the event won't be raised for high
> order allocations.
>
> Signed-off-by: Roman Gushchin
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fcec9b39e2a3..103ca3c31c04 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
>  	if (order > PAGE_ALLOC_COSTLY_ORDER)
>  		return OOM_SKIPPED;
>
> +	memcg_memory_event(memcg, MEMCG_OOM);
> +
>  	/*
>  	 * We are in the middle of the charge context here, so we
>  	 * don't want to block when potentially sitting on a callstack
> @@ -2250,8 +2252,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	if (fatal_signal_pending(current))
>  		goto force;
>
> -	memcg_memory_event(mem_over_limit, MEMCG_OOM);
> -
>  	/*
>  	 * keep retrying as long as the memcg oom killer is able to make
>  	 * a forward progress or bypass the charge if the oom killer
> --
> 2.17.1

-- 
Michal Hocko
SUSE Labs