Date: Mon, 6 Aug 2018 19:30:00 +0200
From: Michal Hocko
To: Dmitry Vyukov
Cc: syzbot, cgroups@vger.kernel.org, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, Dmitry Torokhov
Subject: Re: WARNING in try_charge
Message-ID: <20180806173000.GA10003@dhcp22.suse.cz>

On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote:
> > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote:
> > [...]
> >> >> A much
> >> >> friendlier for user way to say this would be print a message at the
> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> >> trace (does not give any useful info for user). And return EINVAL if
> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> >> WARNING each occurrence of which we do want to be reported to kernel
> >> >> developers.
> >> >
> >> > But this is not applicable here. Your misconfiguration is quite obvious
> >> > because you simply set the hard limit to 0. This is not the only
> >> > situation when this can happen. There is no clear point to tell, you are
> >> > doing this wrong. If it was we would do it at that point obviously.
> >>
> >> But, isn't there a point were hard limit is set to 0? I would expect
> >> there is a something like cgroup file write handler with a value of 0
> >> or something.
> >
> > Yeah, but this is only one instance of the problem. Other is that the
> > memcg is not reclaimable for any other reasons. And we do not know what
> > those might be
> >
> >>
> >> > If you have a strong reason to believe that this is an abuse of WARN I
> >> > am all happy to change that. But I haven't heard any yet, to be honest.
> >>
> >> WARN must not be used for anything that is not kernel bugs. If this is
> >> not kernel bug, WARN must not be used here.
> >
> > This is rather strong wording without any backing arguments. I strongly
> > doubt 90% of existing WARN* match this expectation. WARN* has
> > traditionally been a way to tell that something suspicious is going on.
> > Those situation are mostly likely not fatal but it is good to know they
> > are happening.
> >
> > Sure there is that panic_on_warn thingy which you seem to be using and I
> > suspect it is a reason why you are so careful about warnings in general
> > but my experience tells me that this configuration is barely usable
> > except for testing (which is your case).
> >
> > But as I've said, I do not insist on WARN here. All I care about is to
> > warn user that something might go south and this may be either due to
> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
>
> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
> also a kernel bug on top of that and it's actually a kernel bug that
> provokes the warning?

As I've tried to say already, I cannot tell for sure. It is the killed
oom victim which triggered the warning, and that shouldn't really happen.
Considering this doesn't reproduce with either the current linux-next or
Linus' tree, and the oom code has changed since the version you tested,
I would suspect there was something wrong with the memcg oom code. But
maybe the test just doesn't reproduce reliably.

> If it's a kernel bug, then I propose to stop arguing about
> configuration and concentrate on the bug.
> If it's just the misconfiguration that triggers the warning, then can
> we separate the 2 causes of the warning (user misconfiguration and
> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
> a line to console if necessary)? Or if the limit=0 is somehow not
> possible/desirable to detect right away, check limit=0 at the point of
> the warning and don't want?

No, we simply cannot. There are numerous situations in which this can
trigger. Say you set the hard limit to N and then try to fault in a shmem
file with size >= N. No oom killer will help to reclaim memory. Or say
you migrate all the tasks away from the memcg and then somebody triggers
the memcg OOM in that group. There is simply nobody to kill. See the point?
There is simply no direct connection between the configuration and the
actual problem. Too many things might happen between those two points.

Let me repeat. We do warn because we want to hear if this happens. WARN
tends to be a good way to get that attention. If you strongly believe
this is an abuse, I won't mind seeing a patch to turn it into something
different.
--
Michal Hocko
SUSE Labs