Date: Mon, 29 Jan 2018 14:16:23 -0800 (PST)
From: David Rientjes
To: Andrew Morton
cc: Roman Gushchin, Michal Hocko, Vladimir Davydov, Johannes Weiner,
    Tetsuo Handa, Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer
    mount option with tunable
In-Reply-To: <20180126161735.b999356fbe96c0acd33aaa66@linux-foundation.org>
References: <20180125160016.30e019e546125bb13b5b6b4f@linux-foundation.org>
    <20180126143950.719912507bd993d92188877f@linux-foundation.org>
    <20180126161735.b999356fbe96c0acd33aaa66@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 26 Jan 2018, Andrew Morton wrote:

> > > > -ECONFUSED. We want to have a mount option that has the sole purpose of
> > > > doing echo cgroup > /mnt/cgroup/memory.oom_policy?
> > >
> > > Approximately.
> > > Let me put it another way: can we modify your patchset
> > > so that the mount option remains, and continues to have a sufficiently
> > > same effect? For backward compatibility.
> > >
> >
> > The mount option would exist solely to set the oom policy of the root mem
> > cgroup, it would lose its effect of mandating that policy for any subtree
> > since it would become configurable by the user if delegated.
>
> Why can't we propagate the mount option into the subtrees?
>
> If the user then alters that behaviour with new added-by-David tunables
> then fine, that's still backward compatible.
>

It's not. If you look for the "groupoom" mount option, it will mean one of
two different things: the entire hierarchy is locked into a single
per-cgroup usage comparison (Roman's original patchset), or the entire
hierarchy had an initial oom policy set which could have subsequently
changed (my extension). With memory.oom_policy you need to query what the
effective policy is; checking for "groupoom" is entirely irrelevant, since
it was only the initial setting. Thus, if memory.oom_policy is going to be
merged in the future, it necessarily obsoletes the mount option. Its
meaning would depend on the kernel version.

I'm struggling to see the benefit of simply not reviewing patches that
build off the original and merging a patchset early. What are we gaining?

> > Let me put it another way: if the cgroup aware oom killer is merged for
> > 4.16 without this patchset, userspace can reasonably infer the oom policy
> > from checking how cgroups were mounted. If my followup patchset were
> > merged for 4.17, that's invalid and it becomes dependent on kernel
> > version: it could have the "groupoom" mount option but configured through
> > the root mem cgroup's memory.oom_policy to not be cgroup aware at all.
>
> That concern seems unreasonable to me. Is an application *really*
> going to peek at the mount options to figure out what its present oom
> policy is? Well, maybe.
> But that's a pretty dopey thing to do and I
> wouldn't lose much sleep over breaking any such application in the very
> unlikely case that such a thing was developed in that two-month window.
>

It's not dopey, it's the only way that any userspace can determine what
process is going to be oom killed! Without my extension, that policy
dictates how the cgroup hierarchy is configured; there's no other way to
prefer or bias processes. How can a userspace cgroup manager possibly
construct a cgroup v2 hierarchy with expected oom kill behavior if it is
not peeking at the mount option? My concern is that if extended with my
patchset the mount option itself becomes obsolete and then peeking at it is
irrelevant to the runtime behavior!

> > > There's nothing wrong with that! As long as we don't break existing
> > > setups while evolving the feature. How do we do that?
> >
> > We'd break the setups that actually configure their cgroups and processes
> > to abide by the current implementation since we'd need to discount
> > oom_score_adj from the root mem cgroup usage to fix it.
>
> Am having trouble understanding that. Expand, please?
>
> Can we address this (and other such) issues in the (interim)
> documentation?
>

This point isn't a documentation issue at all, this is the fact that
oom_score_adj is only effective for the root mem cgroup. Even if the user
is fully aware of the implementation, it does not change the fact that he
or she will construct their cgroup hierarchy and attach processes to it to
abide by the behavior. That is the breakage that I am concerned about.

An example: you have a log scraper that is running with
/proc/pid/oom_score_adj == 999. It's best effort, it can be killed, we'll
retry the next time if the system has memory available. This is partly why
oom_adj and oom_score_adj exist and are used on production systems.
If you attach that process to an unlimited mem cgroup dedicated to system daemons purely for the rich stats that mem cgroup provides, this breaks the oom_score_adj setting solely because it's attached to the cgroup. On system-wide oom, it is no longer the killed process merely because it is attached to an unlimited child cgroup. This is not the only such example: this occurs for any process attached to a cgroup.
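
To make the log scraper example concrete, here is a deliberately simplified
toy model of the two selection policies. It is only a sketch under stated
assumptions, not the kernel's algorithm: the names Proc, badness,
per_process_victim and cgroup_aware_victim, the machine size, and the
process list are all hypothetical. It assumes that per-process selection
adds oom_score_adj as a fraction of total memory, and that cgroup-aware
selection compares non-root cgroups purely by summed usage while processes
in the root mem cgroup are still scored individually.

```python
from collections import defaultdict
from dataclasses import dataclass

TOTAL_PAGES = 1_000_000  # hypothetical machine size, in pages


@dataclass
class Proc:
    name: str
    rss_pages: int
    oom_score_adj: int  # -1000..1000, as in /proc/<pid>/oom_score_adj
    cgroup: str         # "/" stands for the root mem cgroup


def badness(p):
    # Simplified per-process score: usage plus oom_score_adj scaled
    # to thousandths of total memory.
    return p.rss_pages + p.oom_score_adj * TOTAL_PAGES // 1000


def per_process_victim(procs):
    # Traditional policy in this toy: the highest-scoring process is chosen.
    return max(procs, key=badness).name


def cgroup_aware_victim(procs):
    # Assumed cgroup-aware policy: non-root cgroups compete on summed usage
    # only (oom_score_adj ignored); root-cgroup processes compete one by one.
    usage = defaultdict(int)
    candidates = {}
    for p in procs:
        if p.cgroup == "/":
            candidates["proc:" + p.name] = badness(p)
        else:
            usage[p.cgroup] += p.rss_pages
    for cg, pages in usage.items():
        candidates["cgroup:" + cg] = pages
    return max(candidates, key=candidates.get)


# Hypothetical workload: the best-effort log scraper sits in a child cgroup.
PROCS = [
    Proc("log_scraper", 10_000, 999, "/system"),
    Proc("sshd", 2_000, 0, "/system"),
    Proc("database", 400_000, 0, "/workload"),
]

print(per_process_victim(PROCS))
print(cgroup_aware_victim(PROCS))
```

In this toy, per-process selection picks log_scraper because of its
oom_score_adj, while the cgroup-aware comparison picks the /workload cgroup
and the scraper's setting has no effect. Moving the scraper back to "/"
makes it the victim again.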