Date: Tue, 21 Aug 2018 13:20:55 -0400
From: Johannes Weiner
To: Michal Hocko
Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa,
    Dmitry Vyukov, linux-mm@kvack.org, LKML
Subject: Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task
Message-ID: <20180821172055.GA23516@cmpxchg.org>
References: <20180808064414.GA27972@dhcp22.suse.cz>
    <20180808071301.12478-1-mhocko@kernel.org>
    <20180808071301.12478-3-mhocko@kernel.org>
    <20180808144515.GA9276@cmpxchg.org>
    <20180808161737.GQ27972@dhcp22.suse.cz>
    <20180821140612.GD16611@dhcp22.suse.cz>
In-Reply-To: <20180821140612.GD16611@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

I sent them in a separate thread. Thanks.

On Tue, Aug 21, 2018 at 04:06:12PM +0200, Michal Hocko wrote:
> Do you plan to repost these two? They are quite deep in the email thread
> so they can easily fall through cracks.
>
> On Wed 08-08-18 18:17:37, Michal Hocko wrote:
> > On Wed 08-08-18 10:45:15, Johannes Weiner wrote:
> [...]
> > > From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner
> > > Date: Wed, 8 Aug 2018 09:59:40 -0400
> > > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible
> > >  victim left
> > >
> > > When the memcg OOM killer runs out of killable tasks, it currently
> > > prints a WARN with no further OOM context. This has caused some user
> > > confusion.
> > >
> > > Warnings indicate a kernel problem. In a reported case, however, the
> > > situation was triggered by a non-sensical memcg configuration (hard
> > > limit set to 0). But without any VM context this wasn't obvious from
> > > the report, and it took some back and forth on the mailing list to
> > > identify what is actually a trivial issue.
> > >
> > > Handle this OOM condition like we handle it in the global OOM killer:
> > > dump the full OOM context and tell the user we ran out of tasks.
> > >
> > > This way the user can identify misconfigurations easily by themselves
> > > and rectify the problem - without having to go through the hassle of
> > > running into an obscure but unsettling warning, finding the
> > > appropriate kernel mailing list and waiting for a kernel developer to
> > > remote-analyze that the memcg configuration caused this.
> > >
> > > If users cannot make sense of why the OOM killer was triggered or why
> > > it failed, they will still report it to the mailing list, we know that
> > > from experience. So in case there is an actual kernel bug causing
> > > this, kernel developers will very likely hear about it.
> > >
> > > Signed-off-by: Johannes Weiner
> >
> > Yes this works as well. We would get a dump even for the race we have
> > seen but I do not think this is something to lose sleep over. And if it
> > triggers too often to be disturbing we can add
> > tsk_is_oom_victim(current) check there.
> >
> > Acked-by: Michal Hocko
> >
> > > ---
> > >  mm/memcontrol.c |  2 --
> > >  mm/oom_kill.c   | 13 ++++++++++---
> > >  2 files changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 4e3c1315b1de..29d9d1a69b36 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
> > >  	if (mem_cgroup_out_of_memory(memcg, mask, order))
> > >  		return OOM_SUCCESS;
> > >
> > > -	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> > > -		"This looks like a misconfiguration or a kernel bug.");
> > >  	return OOM_FAILED;
> > >  }
> > >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 0e10b864e074..07ae222d7830 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc)
> > >  	}
> > >
> > >  	select_bad_process(oc);
> > > -	/* Found nothing?!?! Either we hang forever, or we panic. */
> > > -	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> > > +	/* Found nothing?!?! */
> > > +	if (!oc->chosen) {
> > >  		dump_header(oc, NULL);
> > > -		panic("Out of memory and no killable processes...\n");
> > > +		pr_warn("Out of memory and no killable processes...\n");
> > > +		/*
> > > +		 * If we got here due to an actual allocation at the
> > > +		 * system level, we cannot survive this and will enter
> > > +		 * an endless loop in the allocator. Bail out now.
> > > +		 */
> > > +		if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
> > > +			panic("System is deadlocked on memory\n");
> > >  	}
> > >  	if (oc->chosen && oc->chosen != (void *)-1UL)
> > >  		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
> > > --
> > > 2.18.0
> > >
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
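
For anyone skimming the quoted diff: the change to the "no victim found"
branch of out_of_memory() can be summarized as a small userspace model.
This is only a sketch of the control flow, not kernel code. The struct and
function names (no_victim_outcome, before_patch, after_patch) are
hypothetical, and the two booleans stand in for is_sysrq_oom(oc) and
is_memcg_oom(oc).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel API. It describes what
 * out_of_memory() does when select_bad_process() found no task
 * (oc->chosen == NULL).
 */
struct no_victim_outcome {
	bool report;	/* dump_header() was called for OOM context */
	bool panic;	/* panic() was called */
};

/*
 * Before the patch: only a global allocation OOM reached the branch,
 * and it dumped the header and panicked. A memcg OOM skipped it and
 * later hit a bare WARN() in mem_cgroup_oom(), which this model does
 * not count as an OOM report.
 */
struct no_victim_outcome before_patch(bool sysrq_oom, bool memcg_oom)
{
	bool global_alloc = !sysrq_oom && !memcg_oom;
	struct no_victim_outcome out = {
		.report = global_alloc,
		.panic = global_alloc,
	};

	return out;
}

/*
 * After the patch: every no-victim case dumps the header and prints
 * the pr_warn() message; only a global allocation OOM still panics.
 */
struct no_victim_outcome after_patch(bool sysrq_oom, bool memcg_oom)
{
	struct no_victim_outcome out = {
		.report = true,
		.panic = !sysrq_oom && !memcg_oom,
	};

	return out;
}
```

Calling after_patch(false, true) models the reported memcg case (hard limit
set to 0): the result has report set and panic clear, where before_patch()
with the same arguments reports nothing from this branch.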