Date: Wed, 9 May 2018 15:38:05 -0700
From: Andrew Morton
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Tejun Heo
Subject: Re: [PATCH v3 2/2] mm: ignore memory.min of abandoned memory cgroups
Message-Id: <20180509153805.2a940eac8c858398fb0f4b0c@linux-foundation.org>
In-Reply-To: <20180509180734.GA4856@castle.DHCP.thefacebook.com>
References: <20180503114358.7952-1-guro@fb.com> <20180503114358.7952-2-guro@fb.com> <20180503173835.GA28437@cmpxchg.org> <20180509180734.GA4856@castle.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

>
> Memory controller implements the memory.low best-effort memory
> protection mechanism, which works perfectly in many cases and allows
> protecting working sets of important workloads from
> sudden reclaim.
>
> But its semantics has a significant limitation: it works
> only as long as there is a supply of reclaimable memory.
> This makes it pretty useless against any sort of slow memory
> leaks or memory usage increases. This is especially true
> for swapless systems. If swap is enabled, memory soft protection
> effectively postpones problems, allowing a leaking application
> to fill all swap area, which makes no sense.
> The only effective way to guarantee the memory protection
> in this case is to invoke the OOM killer.
>
> It's possible to handle this case in userspace by reacting
> on MEMCG_LOW events; but there is still a place for a fail-safe
> in-kernel mechanism to provide stronger guarantees.
>
> This patch introduces the memory.min interface for cgroup v2
> memory controller. It works very similarly to memory.low
> (sharing the same hierarchical behavior), except that it's
> not disabled if there is no more reclaimable memory in the system.
>
> If cgroup is not populated, its memory.min is ignored,
> because otherwise even the OOM killer wouldn't be able
> to reclaim the protected memory, and the system can stall.
>
> ...
>
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1002,6 +1002,29 @@ PAGE_SIZE multiple when read back.
>  	The total amount of memory currently being used by the cgroup
>  	and its descendants.
>  
> +  memory.min
> +	A read-write single value file which exists on non-root
> +	cgroups. The default is "0".
> +
> +	Hard memory protection. If the memory usage of a cgroup
> +	is within its effective min boundary, the cgroup's memory
> +	won't be reclaimed under any conditions. If there is no
> +	unprotected reclaimable memory available, OOM killer
> +	is invoked.
> +
> +	Effective low boundary is limited by memory.min values of
> +	all ancestor cgroups. If there is memory.min overcommitment
> +	(child cgroup or cgroups are requiring more protected memory
> +	than parent will allow), then each child cgroup will get
> +	the part of parent's protection proportional to its
> +	actual memory usage below memory.min.
> +
> +	Putting more memory than generally available under this
> +	protection is discouraged and may lead to constant OOMs.
> +
> +	If a memory cgroup is not populated with processes,
> +	its memory.min is ignored.

This is a copy-paste-edit of the memory.low description. Could we
please carefully check that it all remains accurate? Should "Effective
low boundary" be "Effective min boundary"? Does overcommit still apply
to .min? etcetera.