Date: Tue, 5 Jun 2018 13:47:29 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: linux-mm@vger.kernel.org, Vladimir Davydov, Johannes Weiner,
	Tetsuo Handa, David Rientjes, Andrew Morton, Tejun Heo,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v13 0/7] cgroup-aware OOM killer
Message-ID: <20180605114729.GB19202@dhcp22.suse.cz>
References: <20171130152824.1591-1-guro@fb.com>
In-Reply-To: <20171130152824.1591-1-guro@fb.com>
User-Agent: Mutt/1.9.5 (2018-04-13)

It seems that this is still in limbo, mostly because of David's concerns.
So let me reiterate them and provide my POV once more (and for the last
time) just to help Andrew make a decision:

1) Comparing the root memcg with tail memcgs during OOM killer selection
is not fair because we are comparing tasks with memcgs.

This is true, but I do not think it matters much for the workloads which
are going to use the feature. Why? Because the main consumers of the new
feature seem to be containers, which really need some fairness when
comparing _workloads_ rather than processes. Those are unlikely to keep
any significant memory consumers in the root memcg; that would mostly be
common infrastructure.

Is this fixable? Yes, we would need to account in the root memcg. Why
are we not doing that now? Because it has some non-negligible
performance overhead. Are there other ways? Yes, we can approximate the
root memcg's memory consumption, but I would rather wait for somebody
seeing this as a real problem than add hacks now without a strong
reason.

2) The oom killer can be evaded by attaching processes to child cgroups,
which basically means that a task can split up the workload into smaller
memcgs to hide its real memory consumption.

Again true, but not really anything new. Processes can already fork and
split up their memory consumption, and doing so doesn't even require any
special privileges, unlike creating a sub-memcg. Is this fixable? Yes:
for untrusted workloads, group oom evaluation can be set up at the
delegation layer, so all subgroups would be considered together.

3) Userspace has zero control over oom kill selection in leaf mem
cgroups.

Again true, but this is something that needs a careful evaluation so we
do not end up in the fiasco we have seen with oom_score*. The current
users demanding this feature can live without any prioritization, so
blocking the whole feature on this seems unreasonable.

4) Future extensibility to be backward compatible.

David is wrong here IMHO.
Any prioritization or oom selection policy controls added in the future
are orthogonal to the oom_group concept added by this patchset. Allowing
a memcg to be an oom entity is something that we really want long term.
The global CGRP_GROUP_OOM is the most restrictive semantic, and
softening it will be possible by adding a new knob to tell whether a
memcg/hierarchy is a workload or a set of tasks.

-- 
Michal Hocko
SUSE Labs
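[For readers following the thread: the evasion concern in point 2, and
the group-oom remedy described there, can be illustrated with a small
model. This is a simplified sketch, not the kernel's implementation:
the Memcg class, the select_victim helper, and the usage numbers are
all invented for illustration. It only demonstrates the idea that
evaluating a delegated subtree as one unit makes splitting a workload
into child memcgs ineffective as a hiding strategy.]

```python
# Toy model of cgroup-aware OOM victim selection (NOT the kernel code).
# A subtree whose delegation point has group-oom evaluation enabled is
# treated as a single indivisible unit whose usage is the sum of all
# descendants; otherwise children are compared individually.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Memcg:
    name: str
    task_usage: int = 0                 # memory charged directly in this memcg
    oom_group: bool = False             # evaluate this subtree as one unit
    children: List["Memcg"] = field(default_factory=list)

    def total_usage(self) -> int:
        # hierarchical footprint: own usage plus all descendants
        return self.task_usage + sum(c.total_usage() for c in self.children)

def select_victim(root: Memcg) -> Optional[Memcg]:
    """Pick the OOM entity with the largest aggregate usage.

    Subtrees with oom_group set are compared by their summed usage;
    otherwise we descend and compare their children individually.
    """
    best: Optional[Memcg] = None
    stack = list(root.children)
    while stack:
        cg = stack.pop()
        if cg.oom_group or not cg.children:
            if best is None or cg.total_usage() > best.total_usage():
                best = cg
        else:
            stack.extend(cg.children)
    return best

# A workload that splits itself into many small child memcgs...
evader = Memcg("evader", oom_group=True,
               children=[Memcg(f"shard{i}", task_usage=100) for i in range(10)])
honest = Memcg("honest", task_usage=600)
root = Memcg("/", children=[evader, honest])

# ...is still selected, because group-oom evaluation at the delegation
# layer sums the whole subtree (10 * 100 > 600).
assert select_victim(root).name == "evader"

# Without group-oom evaluation, each 100-unit shard is compared on its
# own and the split workload successfully hides behind the honest one.
evader.oom_group = False
assert select_victim(root).name == "honest"
```

The point of the sketch is the delegation-layer fix from point 2: an
admin, not the untrusted workload, decides where the indivisible
evaluation boundary sits, so sub-memcg creation below that boundary
cannot reduce the workload's apparent footprint.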