Received: by 2002:a05:6a10:1d13:0:0:0:0 with SMTP id pp19csp754023pxb; Wed, 18 Aug 2021 13:21:57 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzJE/oG2NBmkxEHSL1/d6QhaD6nEXaN73Cr8nmtNp/qh1KH1LOhNJd9q4yblnlEeI8RcNcf X-Received: by 2002:a02:2307:: with SMTP id u7mr9808525jau.28.1629318117228; Wed, 18 Aug 2021 13:21:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1629318117; cv=none; d=google.com; s=arc-20160816; b=GiiDbGkU5wLIgwCxaRLARmrGuQTwH2SpSOPwm3jqgf8nVNtnfhoGg8TXYOFk9b6KlC A0wGIbFLDR0shv+OU7Ez15HP2Ah8/qybuF02CyHnk20QVoHZDab7FXPB8txxlyqD0ixj CpU/MCE5vy/dBJiz5ASQd/GUXK3Q+w4liBQSlYjNUu2a7vDk572+9lCqh38fN9BQp45f 1TKjnmf3pOCB3rWsMjVTZCxnM0DU1C3lc7kkv7a0N1oQLhFoKYczxQD7R49fqWWDjm7T POvU++HbWVfvpHQYL2dAkB8wXhABzdQqTH+akyaRc94UT85VBPu5AJQoYdfxPx8bRTu5 +C+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=nHfCCbHIXfL8S0LAJ+5b/okpSRYN0l8zL9APPGHUHjM=; b=N38BYNkWJs2nbQu89pueiuC35MGxRKgw3W4rfjqVVoRhLaMyka9THiujSba2lxopub MruA7wI8wIksxYvKPq6rjy3MM1HvTo4vg9WA3j7q8Pj1vogZcJQvNWXUTKGr2+16DmvK D0xWxCJr/zU7bBizY1k9dgJdJ+/yBtY2aANEqNwduA4heTWG+QbsLOHWbZnYMR/0dRtM O5oyWki1Scegon7yZt0Qub3XvMZeM5ygqyKgt+m3jsoCSoN2xhfjl0A3QgLmDmtphqOl PFRxsyNGtsSEZyQF1uss8SFFPjlygu4GttYXEob88SS/8DtzGGpC1U8/hispeAuBZQQb Ataw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chrisdown.name header.s=google header.b=JNxUjQJf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chrisdown.name Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n6si901018ilk.152.2021.08.18.13.21.39; Wed, 18 Aug 2021 13:21:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@chrisdown.name header.s=google header.b=JNxUjQJf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chrisdown.name Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233639AbhHRUSw (ORCPT + 99 others); Wed, 18 Aug 2021 16:18:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233509AbhHRUSt (ORCPT ); Wed, 18 Aug 2021 16:18:49 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C04B8C061764 for ; Wed, 18 Aug 2021 13:18:14 -0700 (PDT) Received: by mail-wm1-x32d.google.com with SMTP id a201-20020a1c7fd2000000b002e6d33447f9so4157988wmd.0 for ; Wed, 18 Aug 2021 13:18:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chrisdown.name; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=nHfCCbHIXfL8S0LAJ+5b/okpSRYN0l8zL9APPGHUHjM=; b=JNxUjQJfF8Rfgw0pIVdmau+KQ+tvGbR5jwxsNL+C4ZDxsjgE+iui13LWXrzEM2oi4e JExJW4m3jOmYpeYqh5Jm5a8nfWjtKnJpIJqhmxMEVzERO+5UnEh+APgQwU6jjDW2aU50 dWaLO+Kt1o238ME+il7oVvZb/cGa/r219D06M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=nHfCCbHIXfL8S0LAJ+5b/okpSRYN0l8zL9APPGHUHjM=; b=b/KjcCz2uODvl34/Cjdmc3ivrNNwz/NmJI13APUKwNu/oxg8xJLUDkPf/QT42Og+i8 rHSWLapF/7BcdtrxYgIbmQmB01RhBTLteNq5vZN6sw343r8HpzMVkTcwhj2uPc4kwYMP MxSWP1pfkfn82yQw5TlZPGGzTp8XoponPbbfhHdk0XsV6bNJeKR3jU0pKL3+O+RS8qWW AqZKgReY4RvOazE81Jom/bvOd1eDyYEQaDmeiFM2T83Wg8X+bd2uWsrV8s9Cc9mgxRN2 E+Mnj0VTrlZDIGx2cY/L3Wwh1AhXC4kTy5Vqfs3dbTpzi8wYm8Zc9cpNxum+HgSylnbc DwVA== X-Gm-Message-State: AOAM532Af2kM1ZPn8JLD0DATjcWWS0RMYyYEIhQMudjHooFI/+fHEzfa TS4m7PrcT9VHnpM6gRqx7m54ig== X-Received: by 2002:a05:600c:290:: with SMTP id 16mr10314299wmk.187.1629317893281; Wed, 18 Aug 2021 13:18:13 -0700 (PDT) Received: from localhost (82-69-42-66.dsl.in-addr.zen.co.uk. [82.69.42.66]) by smtp.gmail.com with ESMTPSA id a18sm5639913wmg.43.2021.08.18.13.18.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Aug 2021 13:18:12 -0700 (PDT) Date: Wed, 18 Aug 2021 21:18:10 +0100 From: Chris Down To: Johannes Weiner Cc: Andrew Morton , Leon Yang , Roman Gushchin , Michal Hocko , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Message-ID: References: <20210817180506.220056-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20210817180506.220056-1-hannes@cmpxchg.org> User-Agent: Mutt/2.1.1 (e2a89abc) (2021-07-12) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Johannes Weiner writes: >We've noticed occasional OOM killing when memory.low settings are in >effect for cgroups. This is unexpected and undesirable as memory.low >is supposed to express non-OOMing memory priorities between cgroups. > >The reason for this is proportional memory.low reclaim. When cgroups >are below their memory.low threshold, reclaim passes them over in the >first round, and then retries if it couldn't find pages anywhere else. >But when cgroups are slighly above their memory.low setting, page scan >force is scaled down and diminished in proportion to the overage, to >the point where it can cause reclaim to fail as well - only in that >case we currently don't retry, and instead trigger OOM. > >To fix this, hook proportional reclaim into the same retry logic we >have in place for when cgroups are skipped entirely. This way if >reclaim fails and some cgroups were scanned with dimished pressure, >we'll try another full-force cycle before giving up and OOMing. > >Reported-by: Leon Yang >Signed-off-by: Johannes Weiner Thanks for tracking this down! Agreed that this looks like a good stable candidate. Acked-by: Chris Down >--- > include/linux/memcontrol.h | 29 +++++++++++++++-------------- > mm/vmscan.c | 27 +++++++++++++++++++-------- > 2 files changed, 34 insertions(+), 22 deletions(-) > >diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >index bfe5c486f4ad..24797929d8a1 100644 >--- a/include/linux/memcontrol.h >+++ b/include/linux/memcontrol.h >@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void) > return !cgroup_subsys_enabled(memory_cgrp_subsys); > } > >-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root, >- struct mem_cgroup *memcg, >- bool in_low_reclaim) >+static inline void mem_cgroup_protection(struct mem_cgroup *root, >+ struct mem_cgroup *memcg, >+ unsigned long *min, >+ unsigned long *low) > { >+ *min = *low = 0; >+ > if (mem_cgroup_disabled()) >- return 0; >+ return; > > /* > * There is no reclaim protection applied to a targeted reclaim. >@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root, > * > */ > if (root == memcg) >- return 0; >- >- if (in_low_reclaim) >- return READ_ONCE(memcg->memory.emin); >+ return; > >- return max(READ_ONCE(memcg->memory.emin), >- READ_ONCE(memcg->memory.elow)); >+ *min = READ_ONCE(memcg->memory.emin); >+ *low = READ_ONCE(memcg->memory.elow); > } > > void mem_cgroup_calculate_protection(struct mem_cgroup *root, >@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, > { > } > >-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root, >- struct mem_cgroup *memcg, >- bool in_low_reclaim) >+static inline void mem_cgroup_protection(struct mem_cgroup *root, >+ struct mem_cgroup *memcg, >+ unsigned long *min, >+ unsigned long *low) > { >- return 0; >+ *min = *low = 0; > } > > static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root, >diff --git a/mm/vmscan.c b/mm/vmscan.c >index 4620df62f0ff..701106e1829c 100644 >--- a/mm/vmscan.c >+++ b/mm/vmscan.c >@@ -100,9 +100,12 @@ struct scan_control { > unsigned int may_swap:1; > > /* >- * Cgroups are not reclaimed below their configured memory.low, >- * unless we threaten to OOM. If any cgroups are skipped due to >- * memory.low and nothing was reclaimed, go back for memory.low. >+ * Cgroup memory below memory.low is protected as long as we >+ * don't threaten to OOM. If any cgroup is reclaimed at >+ * reduced force or passed over entirely due to its memory.low >+ * setting (memcg_low_skipped), and nothing is reclaimed as a >+ * result, then go back back for one more cycle that reclaims >+ * the protected memory (memcg_low_reclaim) to avert OOM. > */ > unsigned int memcg_low_reclaim:1; > unsigned int memcg_low_skipped:1; >@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > for_each_evictable_lru(lru) { > int file = is_file_lru(lru); > unsigned long lruvec_size; >+ unsigned long low, min; > unsigned long scan; >- unsigned long protection; > > lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx); >- protection = mem_cgroup_protection(sc->target_mem_cgroup, >- memcg, >- sc->memcg_low_reclaim); >+ mem_cgroup_protection(sc->target_mem_cgroup, memcg, >+ &min, &low); > >- if (protection) { >+ if (min || low) { > /* > * Scale a cgroup's reclaim pressure by proportioning > * its current usage to its memory.low or memory.min >@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > * hard protection. > */ > unsigned long cgroup_size = mem_cgroup_size(memcg); >+ unsigned long protection; >+ >+ /* memory.low scaling, make sure we retry before OOM */ >+ if (!sc->memcg_low_reclaim && low > min) { >+ protection = low; >+ sc->memcg_low_skipped = 1; >+ } else { >+ protection = min; >+ } > > /* Avoid TOCTOU with earlier protection check */ > cgroup_size = max(cgroup_size, protection); >-- >2.32.0 >