Date: Tue, 21 Aug 2018 16:06:12 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa,
	Dmitry Vyukov, linux-mm@kvack.org, LKML
Subject: Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task
Message-ID: <20180821140612.GD16611@dhcp22.suse.cz>
References: <20180808064414.GA27972@dhcp22.suse.cz>
	<20180808071301.12478-1-mhocko@kernel.org>
	<20180808071301.12478-3-mhocko@kernel.org>
	<20180808144515.GA9276@cmpxchg.org>
	<20180808161737.GQ27972@dhcp22.suse.cz>
In-Reply-To: <20180808161737.GQ27972@dhcp22.suse.cz>

Do you plan to repost these two?
They are quite deep in the email thread so they can easily fall through
the cracks.

On Wed 08-08-18 18:17:37, Michal Hocko wrote:
> On Wed 08-08-18 10:45:15, Johannes Weiner wrote:
[...]
> > >From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner
> > Date: Wed, 8 Aug 2018 09:59:40 -0400
> > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible
> >  victim left
> > 
> > When the memcg OOM killer runs out of killable tasks, it currently
> > prints a WARN with no further OOM context. This has caused some user
> > confusion.
> > 
> > Warnings indicate a kernel problem. In a reported case, however, the
> > situation was triggered by a non-sensical memcg configuration (hard
> > limit set to 0). But without any VM context this wasn't obvious from
> > the report, and it took some back and forth on the mailing list to
> > identify what is actually a trivial issue.
> > 
> > Handle this OOM condition like we handle it in the global OOM killer:
> > dump the full OOM context and tell the user we ran out of tasks.
> > 
> > This way the user can identify misconfigurations easily by themselves
> > and rectify the problem - without having to go through the hassle of
> > running into an obscure but unsettling warning, finding the
> > appropriate kernel mailing list and waiting for a kernel developer to
> > remote-analyze that the memcg configuration caused this.
> > 
> > If users cannot make sense of why the OOM killer was triggered or why
> > it failed, they will still report it to the mailing list, we know that
> > from experience. So in case there is an actual kernel bug causing
> > this, kernel developers will very likely hear about it.
> > 
> > Signed-off-by: Johannes Weiner
> 
> Yes this works as well. We would get a dump even for the race we have
> seen but I do not think this is something to lose sleep over. And if it
> triggers too often to be disturbing we can add
> tsk_is_oom_victim(current) check there.
> 
> Acked-by: Michal Hocko
> 
> > ---
> >  mm/memcontrol.c |  2 --
> >  mm/oom_kill.c   | 13 ++++++++++---
> >  2 files changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4e3c1315b1de..29d9d1a69b36 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
> >  	if (mem_cgroup_out_of_memory(memcg, mask, order))
> >  		return OOM_SUCCESS;
> >  
> > -	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> > -		"This looks like a misconfiguration or a kernel bug.");
> >  	return OOM_FAILED;
> >  }
> >  
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 0e10b864e074..07ae222d7830 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc)
> >  	}
> >  
> >  	select_bad_process(oc);
> > -	/* Found nothing?!?! Either we hang forever, or we panic. */
> > -	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> > +	/* Found nothing?!?! */
> > +	if (!oc->chosen) {
> >  		dump_header(oc, NULL);
> > -		panic("Out of memory and no killable processes...\n");
> > +		pr_warn("Out of memory and no killable processes...\n");
> > +		/*
> > +		 * If we got here due to an actual allocation at the
> > +		 * system level, we cannot survive this and will enter
> > +		 * an endless loop in the allocator. Bail out now.
> > +		 */
> > +		if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
> > +			panic("System is deadlocked on memory\n");
> >  	}
> >  	if (oc->chosen && oc->chosen != (void *)-1UL)
> >  		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
> > -- 
> > 2.18.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs