Date: Thu, 5 Apr 2018 15:36:40 -0400
From: Johannes Weiner
To: Roman Gushchin
Cc: linux-mm@kvack.org, Andrew Morton, Michal Hocko, Vladimir Davydov,
    Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/4] mm: memory.low hierarchical behavior
Message-ID: <20180405193640.GB27918@cmpxchg.org>
References:
    <20180405185921.4942-1-guro@fb.com>
    <20180405185921.4942-2-guro@fb.com>
In-Reply-To: <20180405185921.4942-2-guro@fb.com>

On Thu, Apr 05, 2018 at 07:59:19PM +0100, Roman Gushchin wrote:
> This patch aims to address an issue in the current memory.low semantics,
> which makes it hard to use in a hierarchy where some leaf memory
> cgroups are more valuable than others.
>
> For example, there are memcgs A, A/B, A/C, A/D and A/E:
>
>   A      A/memory.low = 2G, A/memory.current = 6G
>  //\\
> BC  DE   B/memory.low = 3G  B/memory.current = 2G
>          C/memory.low = 1G  C/memory.current = 2G
>          D/memory.low = 0   D/memory.current = 2G
>          E/memory.low = 10G E/memory.current = 0
>
> If we apply memory pressure, B, C and D are reclaimed at
> the same pace while A's usage exceeds 2G.
> This is obviously wrong, as B's usage is fully below B's memory.low,
> and C has 1G of protection as well.
> Also, A is pushed down to a size below its 2G memory.low,
> which is also wrong.
>
> A simple bash script (provided below) can be used to reproduce
> the problem. Current results are:
> A: 1430097920
> A/B: 711929856
> A/C: 717426688
> A/D: 741376
> A/E: 0
>
> To address the issue, a concept of effective memory.low is introduced.
> Effective memory.low is always equal to or less than the original
> memory.low. When there is no memory.low overcommitment (and also for
> top-level cgroups), these two values are equal.
> Otherwise, it is a share of the parent's effective memory.low,
> proportional to the cgroup's memory.low usage divided by the total
> memory.low usage of the cgroup and its siblings (by memory.low usage
> I mean the size of actually protected memory: memory.current if
> memory.current < memory.low, 0 otherwise).
> It's necessary to track the actual usage, because otherwise an empty
> cgroup with memory.low set (A/E in my example) would affect the actual
> memory distribution, which makes no sense. To avoid traversing
> the cgroup tree twice, the page_counter code is reused.
>
> Effective memory.low can be calculated in the reclaim path,
> as we conveniently traverse the cgroup tree from top to bottom and
> check memory.low on each level. So it's a perfect place to calculate
> effective memory.low and save it for use by child cgroups.
>
> This also eliminates the need to traverse the cgroup tree from bottom
> to top each time to check that the parent's guarantee is not exceeded.
>
> Setting/resetting effective memory.low is intentionally racy, but
> that's fine and shouldn't lead to any significant differences in
> the actual memory distribution.
>
> With this patch applied, the results match expectations:
> A: 2147930112
> A/B: 1428721664
> A/C: 718393344
> A/D: 815104
> A/E: 0
>
> Test script:
> #!/bin/bash
>
> CGPATH="/sys/fs/cgroup"
>
> # Sparse files; reading them populates the page cache
> truncate /file1 --size 2G
> truncate /file2 --size 2G
> truncate /file3 --size 2G
> truncate /file4 --size 50G
>
> # Create A and enable the memory controller for its children B..E
> mkdir "${CGPATH}/A"
> echo "+memory" > "${CGPATH}/A/cgroup.subtree_control"
> mkdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
>
> echo 2G > "${CGPATH}/A/memory.low"
> echo 3G > "${CGPATH}/A/B/memory.low"
> echo 1G > "${CGPATH}/A/C/memory.low"
> echo 0 > "${CGPATH}/A/D/memory.low"
> echo 10G > "${CGPATH}/A/E/memory.low"
>
> # Read file1..file3 from inside B, C and D so the page cache is charged there,
> # then read a large file from the root cgroup to create memory pressure
> echo $$ > "${CGPATH}/A/B/cgroup.procs" && vmtouch -qt /file1
> echo $$ > "${CGPATH}/A/C/cgroup.procs" && vmtouch -qt /file2
> echo $$ > "${CGPATH}/A/D/cgroup.procs" && vmtouch -qt /file3
> echo $$ > "${CGPATH}/cgroup.procs" && vmtouch -qt /file4
>
> echo "A: $(cat "${CGPATH}/A/memory.current")"
> echo "A/B: $(cat "${CGPATH}/A/B/memory.current")"
> echo "A/C: $(cat "${CGPATH}/A/C/memory.current")"
> echo "A/D: $(cat "${CGPATH}/A/D/memory.current")"
> echo "A/E: $(cat "${CGPATH}/A/E/memory.current")"
>
> # Tear down
> rmdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
> rmdir "${CGPATH}/A"
> rm /file1 /file2 /file3 /file4
>
> Signed-off-by: Roman Gushchin
> Cc: Andrew Morton
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
> Cc: Tejun Heo
> Cc: kernel-team@fb.com
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Johannes Weiner
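
To make the proportional rule in the changelog concrete, here is a minimal userspace sketch in C. It models only the arithmetic as described above: a cgroup's memory.low usage is its usage if usage < memory.low, else 0, and its effective low is the parent's effective low scaled by the cgroup's share of the total low usage among the parent's children, capped at its own memory.low. It is not the kernel implementation; it ignores the racy caching and the page_counter propagation, and `struct cg`, `low_usage()` and `distribute_elow()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one memory cgroup (not a kernel structure). */
struct cg {
	unsigned long long low;   /* configured memory.low */
	unsigned long long usage; /* memory.current */
	unsigned long long elow;  /* effective memory.low, filled in below */
};

/*
 * "memory.low usage" as defined in the changelog: the amount of
 * actually protected memory, i.e. usage if usage < low, else 0.
 */
static unsigned long long low_usage(const struct cg *c)
{
	return c->usage < c->low ? c->usage : 0;
}

/*
 * Split the parent's effective memory.low among its n children in
 * proportion to their low usage, never exceeding a child's own
 * memory.low. Meant to run top-down: the parent's elow must already
 * be known (for a top-level cgroup it is simply its memory.low).
 */
void distribute_elow(unsigned long long parent_elow,
		     struct cg *children, size_t n)
{
	unsigned long long total = 0;
	size_t i;

	assert(n == 0 || children != NULL);

	/* Total low usage of the cgroup and all of its siblings. */
	for (i = 0; i < n; i++)
		total += low_usage(&children[i]);

	for (i = 0; i < n; i++) {
		unsigned long long share = 0;

		/* A cgroup with zero low usage (e.g. an empty one) gets no share. */
		if (total)
			share = parent_elow * low_usage(&children[i]) / total;

		/* Effective low never exceeds the configured memory.low. */
		children[i].elow = share < children[i].low ? share
							   : children[i].low;
	}
}
```

Under this model, calling `distribute_elow(2, ...)` (A's memory.low, in GiB) with the usages from the diagram gives B an effective low of 2 and C, D and E zero: E's 10G memory.low does not dilute B's share because E is empty, and C contributes nothing at that instant because its usage is not below its memory.low.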