From: Greg Thelen
To: guro@fb.com, Johannes Weiner, Andrew Morton, Michal Hocko
Cc: Vladimir Davydov, Tejun Heo, Cgroups, kernel-team@fb.com, Linux MM,
 LKML, Greg Thelen
Subject: [RFC PATCH 0/2] memory.low,min reclaim
Date: Sun, 22 Apr 2018 13:26:10 -0700
Message-Id: <20180422202612.127760-1-gthelen@google.com>
In-Reply-To: <20180320223353.5673-1-guro@fb.com>
References: <20180320223353.5673-1-guro@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Roman's previously posted memory.low,min patches add a per-memcg
effective low limit to detect overcommitment of parental limits.  But if
we flip low,min reclaim to bail if usage<{low,min} at any level, then we
don't need an effective low limit, which makes the code simpler.  When
parent limits are overcommitted, memory.min will oom kill, which is more
drastic but makes memory.low a simpler concept.  If memcg a/b wants oom
kill before reclaim, then give it to them.  It seems a bit strange for
a/b/memory.low's behaviour to depend on a/c/memory.low (i.e. a/b.low is
strong unless a/b.low+a/c.low exceed a.low).

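To make the rule concrete, here is a minimal userspace sketch of it.  It
is not the kernel code from the patches.  It assumes a directory tree
that mimics cgroup v2, with one memory.current and one memory.low (or
memory.min) file per cgroup holding a byte count or "max".  The names
is_protected and mk, and the temporary demo tree, are hypothetical and
exist only for illustration.  The strict "usage < limit" comparison is
taken from the description above.

```shell
#!/bin/bash
# Hypothetical model of "skip reclaim if usage < limit at any level".
# Returns 0 (protected) if the cgroup, or any ancestor below $root, has
# memory.current < memory.$lim.  Returns 1 (reclaimable) otherwise.
is_protected() {
  local root="$1" path="$2" lim="$3" cur prot
  while [[ -n "$path" && "$path" != "." ]]; do
    cur=$(cat "${root}/${path}/memory.current")
    prot=$(cat "${root}/${path}/memory.${lim}")
    # A limit of "max" is above any possible usage.
    if [[ "$prot" == "max" ]]; then
      return 0
    fi
    if (( cur < prot )); then
      return 0
    fi
    # Move one level up; dirname yields "." once the top is passed.
    path=$(dirname "$path")
  done
  return 1
}

# Build a mock tree (plain files, no real cgroups) using a few of the
# sizes from the test hierarchy in this mail.
G=$((1 << 30))
demo=$(mktemp -d)
mk() {  # mk <cgroup> <current bytes> <low bytes>
  mkdir -p "${demo}/$1"
  echo "$2" > "${demo}/$1/memory.current"
  echo "$3" > "${demo}/$1/memory.low"
}
mk A   $((6 * G)) $((2 * G))
mk A/b $((2 * G)) $((3 * G))
mk A/C $((2 * G)) $((1 * G))
mk i   $((4 * G)) $((5 * G))
mk i/K $((2 * G)) $((1 * G))

for cg in A/b A/C i/K; do
  if is_protected "$demo" "$cg" low; then
    echo "$cg: skip reclaim"
  else
    echo "$cg: reclaimable"
  fi
done
rm -rf "$demo"
```

With these sizes A/b is skipped because it is under its own limit, i/K is
skipped only because its parent i is under i/memory.low, and A/C stays
reclaimable because neither A/C nor A is under its limit.
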
Previous patches:
- mm: rename page_counter's count/limit into usage/max
- mm: memory.low hierarchical behavior
- mm: treat memory.low value inclusive
- mm/docs: describe memory.low refinements
- mm: introduce memory.min
- mm: move the high field from struct mem_cgroup to page_counter
8 files changed, 405 insertions(+), 141 deletions(-)

I think there might be a simpler way (albeit it doesn't yet include
Documentation):
- memcg: fix memory.low
- memcg: add memory.min
3 files changed, 75 insertions(+), 6 deletions(-)

The idea of this alternate approach is for memory.low,min to avoid
reclaim if any portion of the under-consideration memcg's ancestry is
under its respective limit.

Also, in either approach, I suspect we should mention the interaction
with NUMA constrained allocations (cpuset.mems, mempolicy, mbind).  For
example, if a NUMA agnostic memcg with a large memory.min happens to
gobble up all of node N1's memory, and a future task really wants node
N1 memory (via mempolicy), then we oom kill rather than reclaiming or
migrating memory.
Ideas:
a) oom kill the NUMA constrained allocator; that's what we've been doing
   at Google.  I can provide a patch if helpful.  I admit that it has
   shortcomings.
b) oom kill a memcg with memory.low protection if its TBD priority is
   lower than that of the allocating task.  Priority is a TBD concept.
c) consider migrating NUMA agnostic memory.low memory as a
   lighter-weight alternative to oom kill.

I extended Roman's nifty reclaim test:
#!/bin/bash
#
# Uppercase cgroups can tolerate some reclaim (current > low).
# Lowercase cgroups are intolerant of reclaim (current < low).
#
#      A       A/memory.low = 2G, A/memory.current = 6G
#    // \\
#   bC   De    b/memory.low = 3G  b/memory.current = 2G
#              C/memory.low = 1G  C/memory.current = 2G
#              D/memory.low = 0   D/memory.current = 2G
#              e/memory.low = 10G e/memory.current = 0
#
#      F       F/memory.low = 2G, F/memory.current = 4G
#     / \
#    g   H     g/memory.low = 3G  g/memory.current = 2G
#              H/memory.low = 1G  H/memory.current = 2G
#
#      i       i/memory.low = 5G, i/memory.current = 4G
#     / \
#    j   K     j/memory.low = 3G  j/memory.current = 2G
#              K/memory.low = 1G  K/memory.current = 2G
#
#      L       L/memory.low = 2G, L/memory.current = 4G, L/memory.max = 4G
#     / \
#    m   N     m/memory.low = 3G  m/memory.current = 2G
#              N/memory.low = 1G  N/memory.current = 2G
#
# setting memory.min: warmup => after global pressure
# A    : 6306372k => 3154336k
# A/b  : 2102184k => 2101928k
# A/C  : 2101936k => 1048352k
# A/D  : 2102252k => 4056k
# A/e  : 0k       => 0k
# F    : 4204420k => 3150272k
# F/g  : 2102188k => 2101912k
# F/H  : 2102232k => 1048360k
# i    : 4204652k => 4203884k
# i/j  : 2102324k => 2101940k
# i/K  : 2102328k => 2101944k
# L    : 4189976k => 3147824k
# L/m  : 2101980k => 2101956k
# L/N  : 2087996k => 1045868k
#
# setting memory.min: warmup => after L/m antagonist
# A    : 6306076k => 6305988k
# A/b  : 2102152k => 2102128k
# A/C  : 2101948k => 2101916k
# A/D  : 2101976k => 2101944k
# A/e  : 0k       => 0k
# F    : 4204156k => 4203832k
# F/g  : 2102220k => 2101920k
# F/H  : 2101936k => 2101912k
# i    : 4204204k => 4203852k
# i/j  : 2102256k => 2101936k
# i/K  : 2101948k => 2101916k
# L    : 4190012k => 3886352k
# L/m  : 2101996k => 2837856k
# L/N  : 2088016k => 1048496k
#
# setting memory.low: warmup => after global pressure
# A    : 6306220k => 3154864k
# A/b  : 2101964k => 2101940k
# A/C  : 2102204k => 1047040k
# A/D  : 2102052k => 5884k
# A/e  : 0k       => 0k
# F    : 4204192k => 3147888k
# F/g  : 2101948k => 2101916k
# F/H  : 2102244k => 1045972k
# i    : 4204480k => 4204056k
# i/j  : 2102008k => 2101976k
# i/K  : 2102464k => 2102080k
# L    : 4190028k => 3150192k
# L/m  : 2102004k => 2101980k
# L/N  : 2088024k => 1048212k
#
# setting memory.low: warmup => after L/m antagonist
# A    : 6306360k => 6305960k
# A/b  : 2101988k => 2101924k
# A/C  : 2102192k => 2101916k
# A/D  : 2102180k => 2102120k
# A/e  : 0k       => 0k
# F    : 4203964k => 4203908k
# F/g  : 2102016k => 2101992k
# F/H  : 2101948k => 2101916k
# i    : 4204408k => 4203988k
# i/j  : 2101984k => 2101936k
# i/K  : 2102424k => 2102052k
# L    : 4189960k => 3886296k
# L/m  : 2101968k => 2838704k
# L/N  : 2087992k => 1047592k

set -o errexit
set -o nounset
set -o pipefail

LIM="$1"; shift
ANTAGONIST="$1"; shift
CGPATH=/tmp/cgroup

vmtouch2() {
  rm -f "$2"
  (echo $BASHPID > "${CGPATH}/$1/cgroup.procs" && exec /tmp/mmap --loop 1 --file "$2" "$3")
}

vmtouch() {
  # twice to ensure slab caches are warmed up and all objs are charged to cgroup.
  vmtouch2 "$1" "$2" "$3"
  vmtouch2 "$1" "$2" "$3"
}

dump() {
  for i in A A/b A/C A/D A/e F F/g F/H i i/j i/K L L/m L/N; do
    printf "%-5s: %sk\n" $i $(($(cat "${CGPATH}/${i}/memory.current")/1024))
  done
}

rm -f /file_?
if [[ -e "${CGPATH}/A" ]]; then
  rmdir ${CGPATH}/?/? ${CGPATH}/?
fi
echo "+memory" > "${CGPATH}/cgroup.subtree_control"
mkdir "${CGPATH}/A" "${CGPATH}/F" "${CGPATH}/i" "${CGPATH}/L"
echo "+memory" > "${CGPATH}/A/cgroup.subtree_control"
echo "+memory" > "${CGPATH}/F/cgroup.subtree_control"
echo "+memory" > "${CGPATH}/i/cgroup.subtree_control"
echo "+memory" > "${CGPATH}/L/cgroup.subtree_control"
mkdir "${CGPATH}/A/b" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/e"
mkdir "${CGPATH}/F/g" "${CGPATH}/F/H"
mkdir "${CGPATH}/i/j" "${CGPATH}/i/K"
mkdir "${CGPATH}/L/m" "${CGPATH}/L/N"

echo 2G > "${CGPATH}/A/memory.${LIM}"
echo 3G > "${CGPATH}/A/b/memory.${LIM}"
echo 1G > "${CGPATH}/A/C/memory.${LIM}"
echo 0 > "${CGPATH}/A/D/memory.${LIM}"
echo 10G > "${CGPATH}/A/e/memory.${LIM}"

echo 2G > "${CGPATH}/F/memory.${LIM}"
echo 3G > "${CGPATH}/F/g/memory.${LIM}"
echo 1G > "${CGPATH}/F/H/memory.${LIM}"

echo 5G > "${CGPATH}/i/memory.${LIM}"
echo 3G > "${CGPATH}/i/j/memory.${LIM}"
echo 1G > "${CGPATH}/i/K/memory.${LIM}"

echo 2G > "${CGPATH}/L/memory.${LIM}"
echo 4G > "${CGPATH}/L/memory.max"
echo 3G > "${CGPATH}/L/m/memory.${LIM}"
echo 1G > "${CGPATH}/L/N/memory.${LIM}"

vmtouch A/b /file_b 2G
vmtouch A/C /file_C 2G
vmtouch A/D /file_D 2G

vmtouch F/g /file_g 2G
vmtouch F/H /file_H 2G

vmtouch i/j /file_j 2G
vmtouch i/K /file_K 2G

vmtouch L/m /file_m 2G
vmtouch L/N /file_N 2G

vmtouch2 "$ANTAGONIST" /file_ant 150G
echo
echo "after $ANTAGONIST antagonist"
dump

rmdir "${CGPATH}/A/b" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/e"
rmdir "${CGPATH}/F/g" "${CGPATH}/F/H"
rmdir "${CGPATH}/i/j" "${CGPATH}/i/K"
rmdir "${CGPATH}/L/m" "${CGPATH}/L/N"
rmdir "${CGPATH}/A" "${CGPATH}/F" "${CGPATH}/i" "${CGPATH}/L"
rm /file_ant /file_b /file_C /file_D /file_g /file_H /file_j /file_K

Greg Thelen (2):
  memcg: fix memory.low
  memcg: add memory.min

 include/linux/memcontrol.h |  8 +++++
 mm/memcontrol.c            | 70 ++++++++++++++++++++++++++++++++++----
 mm/vmscan.c                |  3 ++
 3 files changed, 75 insertions(+), 6 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog