Date: Tue, 7 Aug 2018 16:02:47 -0400
From: Johannes Weiner
To: Michal Hocko
Cc: Andrew Morton, Vladimir Davydov, linux-mm@kvack.org, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML, Michal Hocko
Subject: Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task
Message-ID: <20180807200247.GA4251@cmpxchg.org>
References: <20180807072553.14941-1-mhocko@kernel.org>
In-Reply-To: <20180807072553.14941-1-mhocko@kernel.org>

On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote:
> From: Michal Hocko
>
> "memcg, oom: move out_of_memory back to the charge path" has added a
> warning triggered when the oom killer cannot find any eligible task
> and so there is no way to reclaim the oom memcg under its hard limit.
> Further charges for such a memcg are forced and therefore the hard limit
> isolation is weakened.
>
> The current warning is however too eager to trigger even when we are not
> really hitting the above condition. Syzbot[1] and Greg Thelen have noticed
> that we can hit this condition even when there is still oom victim
> pending. E.g. the following race is possible:
>
> memcg has two tasks taskA, taskB.
>
> CPU1 (taskA)                    CPU2                    CPU3 (taskB)
> try_charge
>   mem_cgroup_out_of_memory                              try_charge
>       select_bad_process(taskB)
>       oom_kill_process          oom_reap_task
>                                 # No real memory reaped
>                                                           mem_cgroup_out_of_memory
>                                 # set taskB -> MMF_OOM_SKIP
>   # retry charge
>   mem_cgroup_out_of_memory
>     oom_lock                                              oom_lock
>     select_bad_process(self)
>     oom_kill_process(self)
>     oom_unlock
>                                                           # no eligible task
>
> In fact syzbot test triggered this situation by placing multiple tasks
> into a memcg with hard limit set to 0. So no task really had any memory
> charged to the memcg
>
> : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> : Tasks state (memory values in pages):
> : [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> : [   6569]     0  6562     9427        1          53248        0             0 syz-executor0
> : [   6576]     0  6576     9426        0          61440        0             0 syz-executor6
> : [   6578]     0  6578     9426      534          61440        0             0 syz-executor4
> : [   6579]     0  6579     9426        0          57344        0             0 syz-executor5
> : [   6582]     0  6582     9426        0          61440        0             0 syz-executor7
> : [   6584]     0  6584     9426        0          57344        0             0 syz-executor1
>
> so in principle there is indeed nothing reclaimable in this memcg and
> this looks like a misconfiguration. On the other hand we can clearly
> kill all those tasks so it is a bit early to warn and scare users. Do
> that by checking that the current is the oom victim and bypass the
> warning then. The victim is allowed to force charge and terminate to
> release its temporal charge along the way.
>
> [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com
> Fixes: "memcg, oom: move out_of_memory back to the charge path"
> Noticed-by: Greg Thelen
> Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com
> Signed-off-by: Michal Hocko
> ---
>  mm/memcontrol.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..1b6eed1bc404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
>                  return OOM_ASYNC;
>          }
>
> -        if (mem_cgroup_out_of_memory(memcg, mask, order))
> +        if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> +                        tsk_is_oom_victim(current))
>                  return OOM_SUCCESS;
>
>          WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "

This is really ugly. :(

If that check is only there to suppress the warning when the limit is
0, this should really be a separate branch around the warning, with a
fat comment that this is a ridiculous cornercase, and not look like it
is an essential part of the memcg reclaim/oom process.

Personally, I really don't get the point of this message. What is the
user to do with this information? What are we to do with it if people
report it? It conveys zero information on what the problem could be,
because it asserts a really vague high-level thing. Shouldn't such
debugging happen inside the OOM killer?

What are the conceivable scenarios in which this triggers other than
obvious misconfigs? What would we lose by just deleting it?
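
For illustration only, the "separate branch around the warning" could
look roughly like the sketch below. It is a standalone user-space
model, not kernel code and not a proposed patch. struct charge_ctx,
model_mem_cgroup_oom() and warnings_emitted are hypothetical names; the
two booleans stand in for the results of mem_cgroup_out_of_memory() and
tsk_is_oom_victim(current). Returning OOM_FAILED for the victim is an
assumption, since nothing above says what that branch should return.

```c
#include <stdbool.h>
#include <stdio.h>

/* Local model of the status codes; only the names mirror the kernel. */
enum oom_status { OOM_SUCCESS, OOM_FAILED };

/* Hypothetical inputs replacing real kernel state. */
struct charge_ctx {
	bool killed_something;	/* models mem_cgroup_out_of_memory() */
	bool current_is_victim;	/* models tsk_is_oom_victim(current) */
};

/* Counts warnings so the behaviour can be checked without parsing output. */
static int warnings_emitted;

static enum oom_status model_mem_cgroup_oom(const struct charge_ctx *ctx)
{
	/* The OOM decision itself stays a single, unmodified check. */
	if (ctx->killed_something)
		return OOM_SUCCESS;

	/*
	 * Corner case: with a hard limit of 0 the current task may already
	 * be an OOM victim while no other task is eligible. It is about to
	 * exit and release its charge, so warning here would only be noise.
	 * This branch exists purely to silence the warning.
	 */
	if (!ctx->current_is_victim) {
		warnings_emitted++;
		fprintf(stderr, "memcg charge failed: no reclaimable memory\n");
	}

	/* Assumption: the caller force-charges on failure in both cases. */
	return OOM_FAILED;
}
```

The point of this shape is that the victim check only guards the
warning and no longer reads as part of the condition that decides
whether the OOM kill succeeded.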