Date: Wed, 20 Jun 2018 17:31:48 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: linux-mm@kvack.org, Greg Thelen, Shakeel Butt, Andrew Morton, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] memcg, oom: move out_of_memory back to the charge path
Message-ID: <20180620153148.GO13685@dhcp22.suse.cz>
References: <20180620103736.13880-1-mhocko@kernel.org> <20180620151812.GA2441@cmpxchg.org>
In-Reply-To: <20180620151812.GA2441@cmpxchg.org>
On Wed 20-06-18 11:18:12, Johannes Weiner wrote:
> On Wed, Jun 20, 2018 at 12:37:36PM +0200, Michal Hocko wrote:
[...]
> > -static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> > +enum oom_status {
> > +	OOM_SUCCESS,
> > +	OOM_FAILED,
> > +	OOM_ASYNC,
> > +	OOM_SKIPPED
> 
> Either SUCCESS & FAILURE, or SUCCEEDED & FAILED ;)

Sure, I will go with the latter.

> We're not distinguishing ASYNC and SKIPPED anywhere below, but I
> cannot think of a good name to communicate them both without this
> function making assumptions about the charge function's behavior.
> So it's a bit weird, but probably the best way to go.

Yeah, that was what I was fighting with. My original proposal, which
simply returned ENOMEM in the failure case, got away with a simple
bool, but once we have these different states and failure behaviors I
think it is better to communicate that and let the caller do whatever
it finds reasonable.

> > +static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> >  {
> > -	if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> > -		return;
> > +	if (order > PAGE_ALLOC_COSTLY_ORDER)
> > +		return OOM_SKIPPED;
> >  
> >  	/*
> >  	 * We are in the middle of the charge context here, so we
> >  	 * don't want to block when potentially sitting on a callstack
> >  	 * that holds all kinds of filesystem and mm locks.
> >  	 *
> > -	 * Also, the caller may handle a failed allocation gracefully
> > -	 * (like optional page cache readahead) and so an OOM killer
> > -	 * invocation might not even be necessary.
> > +	 * cgroup1 allows disabling the OOM killer and waiting for outside
> > +	 * handling until the charge can succeed; remember the context and put
> > +	 * the task to sleep at the end of the page fault when all locks are
> > +	 * released.
> > +	 *
> > +	 * On the other hand, in-kernel OOM killer allows for an async victim
> > +	 * memory reclaim (oom_reaper) and that means that we are not solely
> > +	 * relying on the oom victim to make a forward progress and we can
> > +	 * invoke the oom killer here.
> >  	 *
> > -	 * That's why we don't do anything here except remember the
> > -	 * OOM context and then deal with it at the end of the page
> > -	 * fault when the stack is unwound, the locks are released,
> > -	 * and when we know whether the fault was overall successful.
> > +	 * Please note that mem_cgroup_out_of_memory might fail to find a
> > +	 * victim and then we have to rely on mem_cgroup_oom_synchronize,
> > +	 * otherwise we would fall back to the global oom killer in
> > +	 * pagefault_out_of_memory.
> >  	 */
> > +	if (!memcg->oom_kill_disable) {
> > +		if (mem_cgroup_out_of_memory(memcg, mask, order))
> > +			return OOM_SUCCESS;
> > +
> > +		WARN(!current->memcg_may_oom,
> > +			"Memory cgroup charge failed because of no reclaimable memory! "
> > +			"This looks like a misconfiguration or a kernel bug.");
> > +		return OOM_FAILED;
> > +	}
> > +
> > +	if (!current->memcg_may_oom)
> > +		return OOM_SKIPPED;
> 
> memcg_may_oom was introduced to distinguish between userspace faults
> that can OOM and contexts that return -ENOMEM. Now we're using it
> slightly differently and it's confusing.
> 
> 1) Why warn for kernel allocations, but not userspace ones? This
> should have a comment at least.

I am not sure I understand. We do warn for all allocation types when
mem_cgroup_out_of_memory fails, as long as we are not in the legacy
oom_disabled case.
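Coming back to the state discussion above, a minimal compilable
userspace model of how a charge path might dispatch on the four
states is sketched below. To be clear, try_charge_model,
fake_mem_cgroup_oom and the retry budget are made-up names for
illustration here, not the actual try_charge code from the patch:

	/*
	 * Illustrative userspace model only, not the kernel code
	 * under review; every name here is a stand-in.
	 */
	#include <stdio.h>

	enum oom_status {
		OOM_SUCCESS,	/* a victim was killed, retrying makes sense */
		OOM_FAILED,	/* no victim found, the charge has to fail */
		OOM_ASYNC,	/* oom handling deferred to end of page fault */
		OOM_SKIPPED	/* oom not attempted (e.g. costly order) */
	};

	/* stand-in for mem_cgroup_oom(); pretend the kill always succeeds */
	static enum oom_status fake_mem_cgroup_oom(void)
	{
		return OOM_SUCCESS;
	}

	static int try_charge_model(void)
	{
		int retries = 5;	/* made-up retry budget */
	retry:
		/* ...reclaim attempts would go here... */
		switch (fake_mem_cgroup_oom()) {
		case OOM_SUCCESS:
			if (retries--)
				goto retry;	/* a kill freed memory, try again */
			return -1;		/* give up eventually: -ENOMEM */
		case OOM_FAILED:
			return -1;		/* nothing killable: -ENOMEM */
		case OOM_ASYNC:
		case OOM_SKIPPED:
			/* not distinguished by the caller, as noted above */
			return -1;
		}
		return 0;
	}

	int main(void)
	{
		printf("charge: %d\n", try_charge_model());
		return 0;
	}

The point is only that the caller, not mem_cgroup_oom itself, decides
what each state means for the charge.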
> 2) We invoke the OOM killer when !memcg_may_oom. We want to OOM kill
> in either case, but only set up the mem_cgroup_oom_synchronize() for
> userspace faults. So the code makes sense, but a better name would be
> in order -- current->in_user_fault?

in_user_fault is definitely better than memcg_may_oom.

> >  	css_get(&memcg->css);
> >  	current->memcg_in_oom = memcg;
> >  	current->memcg_oom_gfp_mask = mask;
> >  	current->memcg_oom_order = order;
> > +
> > +	return OOM_ASYNC;
> 
> In terms of code flow, it would be much clearer to handle the
> memcg->oom_kill_disable case first, as a special case with early
> return, and make the OOM invocation the main code of this function,
> given its name.

This?

	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return OOM_SKIPPED;

	/*
	 * We are in the middle of the charge context here, so we
	 * don't want to block when potentially sitting on a callstack
	 * that holds all kinds of filesystem and mm locks.
	 *
	 * cgroup1 allows disabling the OOM killer and waiting for outside
	 * handling until the charge can succeed; remember the context and put
	 * the task to sleep at the end of the page fault when all locks are
	 * released.
	 *
	 * On the other hand, in-kernel OOM killer allows for an async victim
	 * memory reclaim (oom_reaper) and that means that we are not solely
	 * relying on the oom victim to make a forward progress and we can
	 * invoke the oom killer here.
	 *
	 * Please note that mem_cgroup_out_of_memory might fail to find a
	 * victim and then we have to rely on mem_cgroup_oom_synchronize,
	 * otherwise we would fall back to the global oom killer in
	 * pagefault_out_of_memory.
	 */
	if (memcg->oom_kill_disable) {
		if (!current->memcg_may_oom)
			return OOM_SKIPPED;
		css_get(&memcg->css);
		current->memcg_in_oom = memcg;
		current->memcg_oom_gfp_mask = mask;
		current->memcg_oom_order = order;

		return OOM_ASYNC;
	}

	if (mem_cgroup_out_of_memory(memcg, mask, order))
		return OOM_SUCCESS;

	WARN(!current->memcg_may_oom,
		"Memory cgroup charge failed because of no reclaimable memory! "
		"This looks like a misconfiguration or a kernel bug.");
	return OOM_FAILED;

Thanks!
-- 
Michal Hocko
SUSE Labs
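For anyone who wants to eyeball the resulting state mapping of the
restructured flow outside the kernel, here is a standalone sketch.
The types (model_memcg, model_task), MODEL_COSTLY_ORDER and the
kill_ok flag are simplified stand-ins invented for this illustration,
not the kernel function itself:

	#include <stdbool.h>
	#include <stdio.h>

	enum oom_status { OOM_SUCCESS, OOM_FAILED, OOM_ASYNC, OOM_SKIPPED };

	/* stand-ins for the mem_cgroup and task_struct fields used above */
	struct model_memcg { bool oom_kill_disable; };
	struct model_task { bool memcg_may_oom; };

	#define MODEL_COSTLY_ORDER 3	/* PAGE_ALLOC_COSTLY_ORDER stand-in */

	static enum oom_status model_mem_cgroup_oom(struct model_memcg *memcg,
						    struct model_task *tsk,
						    int order, bool kill_ok)
	{
		if (order > MODEL_COSTLY_ORDER)
			return OOM_SKIPPED;

		/* cgroup1 oom_control: defer handling to the end of the fault */
		if (memcg->oom_kill_disable) {
			if (!tsk->memcg_may_oom)
				return OOM_SKIPPED;
			/* the real code records memcg/gfp/order on the task here */
			return OOM_ASYNC;
		}

		/* kill_ok stands in for mem_cgroup_out_of_memory()'s result */
		return kill_ok ? OOM_SUCCESS : OOM_FAILED;
	}

	int main(void)
	{
		struct model_memcg cg = { .oom_kill_disable = false };
		struct model_task tsk = { .memcg_may_oom = true };

		printf("costly order -> %d (OOM_SKIPPED)\n",
		       model_mem_cgroup_oom(&cg, &tsk, 4, true));
		printf("kill works   -> %d (OOM_SUCCESS)\n",
		       model_mem_cgroup_oom(&cg, &tsk, 0, true));
		printf("kill fails   -> %d (OOM_FAILED)\n",
		       model_mem_cgroup_oom(&cg, &tsk, 0, false));
		cg.oom_kill_disable = true;
		printf("oom disabled -> %d (OOM_ASYNC)\n",
		       model_mem_cgroup_oom(&cg, &tsk, 0, true));
		return 0;
	}

The early return for the oom_kill_disable case keeps the OOM
invocation as the main body of the function, which is exactly the
readability benefit the restructuring is after.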