In-Reply-To: <20180806173000.GA10003@dhcp22.suse.cz>
References: <0000000000005e979605729c1564@google.com>
 <20180806091552.GE19540@dhcp22.suse.cz>
 <20180806094827.GH19540@dhcp22.suse.cz>
 <20180806110224.GI19540@dhcp22.suse.cz>
 <20180806142124.GP19540@dhcp22.suse.cz>
 <20180806173000.GA10003@dhcp22.suse.cz>
From: Dmitry Vyukov
Date: Mon, 6 Aug 2018 19:53:39 +0200
Subject: Re: WARNING in try_charge
To: Michal Hocko
Cc: syzbot, cgroups@vger.kernel.org, Johannes Weiner, LKML, Linux-MM,
 syzkaller-bugs, Vladimir Davydov, Dmitry Torokhov

On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko wrote:
>> >> >> A much
>> >> >> friendlier for user way to say this would be print a message at the
>> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> >> trace (does not give any useful info for user). And return EINVAL if
>> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> >> developers.
>> >> >
>> >> > But this is not applicable here.
>> >> > Your misconfiguration is quite obvious
>> >> > because you simply set the hard limit to 0. This is not the only
>> >> > situation when this can happen. There is no clear point to tell, you are
>> >> > doing this wrong. If it was we would do it at that point obviously.
>> >>
>> >> But, isn't there a point where hard limit is set to 0? I would expect
>> >> there is something like a cgroup file write handler with a value of 0
>> >> or something.
>> >
>> > Yeah, but this is only one instance of the problem. Other is that the
>> > memcg is not reclaimable for any other reasons. And we do not know what
>> > those might be
>> >
>> >>
>> >> > If you have a strong reason to believe that this is an abuse of WARN I
>> >> > am all happy to change that. But I haven't heard any yet, to be honest.
>> >>
>> >> WARN must not be used for anything that is not a kernel bug. If this is
>> >> not a kernel bug, WARN must not be used here.
>> >
>> > This is rather strong wording without any backing arguments. I strongly
>> > doubt 90% of existing WARN* match this expectation. WARN* has
>> > traditionally been a way to tell that something suspicious is going on.
>> > Those situations are most likely not fatal but it is good to know they
>> > are happening.
>> >
>> > Sure there is that panic_on_warn thingy which you seem to be using and I
>> > suspect it is a reason why you are so careful about warnings in general
>> > but my experience tells me that this configuration is barely usable
>> > except for testing (which is your case).
>> >
>> > But as I've said, I do not insist on WARN here. All I care about is to
>> > warn user that something might go south and this may be either due to
>> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
>>
>> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
>> also a kernel bug on top of that and it's actually a kernel bug that
>> provokes the warning?
>
> As I've tried to tell already. I cannot tell for sure.
> It is the killed
> oom victim which triggered the warning and that shouldn't really
> happen. Considering this doesn't reproduce with the current linux next
> nor linus tree and the oom code has changed since the version you have
> tested then I would suspect there was something wrong with the memcg oom
> code. But maybe the test doesn't really reproduce reliably.
>
>> If it's a kernel bug, then I propose to stop arguing about
>> configuration and concentrate on the bug.
>> If it's just the misconfiguration that triggers the warning, then can
>> we separate the 2 causes of the warning (user misconfiguration and
>> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
>> a line to console if necessary)? Or if the limit=0 is somehow not
>> possible/desirable to detect right away, check limit=0 at the point of
>> the warning and don't warn?
>
> No we simply cannot. There are numerous situations when this can trigger.
> Say you set the hard limit to N and then try to fault in shmem file with
> the size >= N. No oom killer will help to reclaim memory. Or say you
> migrate all the tasks away from the memcg and then somebody triggers the
> memcg OOM in that group. There is simply nobody to kill. See the point?
> There is simply no direct connection between the configuration and
> actual problem. Too many things might happen between those two points.
> Let me repeat. We do warn because we want to hear if this happens. WARN
> tends to be a good way to get that attention. If you strongly believe
> this is an abuse I won't mind seeing a patch to turn it into something
> different.

I don't believe it is an abuse; I don't know this code well.

Let's assume the misconfiguration is a red herring for now then.