Date: Thu, 12 Mar 2020 18:03:04 +0000
From: Chris Down
To: Andrew Morton
Cc: Johannes Weiner, Tejun Heo, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high
Message-ID: <8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name>
In-Reply-To: <80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name>
References: <80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name>

Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.

This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge. This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.
Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner
Signed-off-by: Chris Down
Cc: Andrew Morton
Cc: Tejun Heo
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # 5.4.x
---
 mm/memcontrol.c | 93 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 35 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a70206e516fe..46d649241a21 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2297,28 +2297,41 @@ static void high_work_func(struct work_struct *work)
 #define MEMCG_DELAY_SCALING_SHIFT 14
 
 /*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
  */
-void mem_cgroup_handle_over_high(void)
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+					  unsigned int nr_pages)
 {
-	unsigned long usage, high, clamped_high;
-	unsigned long pflags;
-	unsigned long penalty_jiffies, overage;
-	unsigned int nr_pages = current->memcg_nr_pages_over_high;
-	struct mem_cgroup *memcg;
+	unsigned long penalty_jiffies;
+	u64 max_overage = 0;
 
-	if (likely(!nr_pages))
-		return;
+	do {
+		unsigned long usage, high;
+		u64 overage;
 
-	memcg = get_mem_cgroup_from_mm(current->mm);
-	reclaim_high(memcg, nr_pages, GFP_KERNEL);
-	current->memcg_nr_pages_over_high = 0;
+		usage = page_counter_read(&memcg->memory);
+		high = READ_ONCE(memcg->high);
+
+		/*
+		 * Prevent division by 0 in overage calculation by acting as if
+		 * it was a threshold of 1 page
+		 */
+		high = max(high, 1UL);
+
+		overage = usage - high;
+		overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+		overage = div64_u64(overage, high);
+
+		if (overage > max_overage)
+			max_overage = overage;
+	} while ((memcg = parent_mem_cgroup(memcg)) &&
+		 !mem_cgroup_is_root(memcg));
+
+	if (!max_overage)
+		return 0;
 
 	/*
-	 * memory.high is breached and reclaim is unable to keep up. Throttle
-	 * allocators proactively to slow down excessive growth.
-	 *
 	 * We use overage compared to memory.high to calculate the number of
 	 * jiffies to sleep (penalty_jiffies). Ideally this value should be
 	 * fairly lenient on small overages, and increasingly harsh when the
@@ -2326,24 +2339,9 @@ void mem_cgroup_handle_over_high(void)
 	 * its crazy behaviour, so we exponentially increase the delay based on
 	 * overage amount.
 	 */
-
-	usage = page_counter_read(&memcg->memory);
-	high = READ_ONCE(memcg->high);
-
-	if (usage <= high)
-		goto out;
-
-	/*
-	 * Prevent division by 0 in overage calculation by acting as if it was a
-	 * threshold of 1 page
-	 */
-	clamped_high = max(high, 1UL);
-
-	overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
-			    clamped_high);
-
-	penalty_jiffies = ((u64)overage * overage * HZ)
-		>> (MEMCG_DELAY_PRECISION_SHIFT + MEMCG_DELAY_SCALING_SHIFT);
+	penalty_jiffies = max_overage * max_overage * HZ;
+	penalty_jiffies >>= MEMCG_DELAY_PRECISION_SHIFT;
+	penalty_jiffies >>= MEMCG_DELAY_SCALING_SHIFT;
 
 	/*
 	 * Factor in the task's own contribution to the overage, such that four
@@ -2360,7 +2358,32 @@ void mem_cgroup_handle_over_high(void)
 	 * application moving forwards and also permit diagnostics, albeit
 	 * extremely slowly.
 	 */
-	penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+	return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+}
+
+/*
+ * Scheduled by try_charge() to be executed from the userland return path
+ * and reclaims memory over the high limit.
+ */
+void mem_cgroup_handle_over_high(void)
+{
+	unsigned long penalty_jiffies;
+	unsigned long pflags;
+	unsigned int nr_pages = current->memcg_nr_pages_over_high;
+	struct mem_cgroup *memcg;
+
+	if (likely(!nr_pages))
+		return;
+
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	reclaim_high(memcg, nr_pages, GFP_KERNEL);
+	current->memcg_nr_pages_over_high = 0;
+
+	/*
+	 * memory.high is breached and reclaim is unable to keep up. Throttle
+	 * allocators proactively to slow down excessive growth.
+	 */
+	penalty_jiffies = calculate_high_delay(memcg, nr_pages);
 
 	/*
 	 * Don't sleep if the amount of jiffies this memcg owes us is so low
-- 
2.25.1
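[Editor's note, not part of the patch email: the ancestral-overage penalty calculation above can be sketched in userspace Python to see its behaviour. The shift and cap constants are assumptions standing in for the kernel's values — only MEMCG_DELAY_SCALING_SHIFT (14) actually appears in the diff — and HZ is taken as 1000 for illustration.]

```python
# Illustrative sketch of calculate_high_delay() from the patch above.
# Constant values here are assumptions for demonstration; only
# MEMCG_DELAY_SCALING_SHIFT (14) is confirmed by the diff itself.
MEMCG_DELAY_PRECISION_SHIFT = 20   # assumed fixed-point precision
MEMCG_DELAY_SCALING_SHIFT = 14     # from the diff
HZ = 1000                          # assumed jiffies per second
MAX_HIGH_DELAY_JIFFIES = 2 * HZ    # assumed cap (MEMCG_MAX_HIGH_DELAY_JIFFIES)

def calculate_high_delay(ancestors):
    """ancestors: list of (usage_pages, high_pages) pairs, from the charged
    cgroup upwards, mirroring the parent_mem_cgroup() walk in the patch.
    Returns the penalty in jiffies based on the *largest* ancestral overage."""
    max_overage = 0
    for usage, high in ancestors:
        high = max(high, 1)              # avoid division by zero (1-page floor)
        overage = max(usage - high, 0)   # pages over the limit, clamped here
        # Fixed-point ratio of overage to the limit, as in the kernel code.
        overage = (overage << MEMCG_DELAY_PRECISION_SHIFT) // high
        max_overage = max(max_overage, overage)

    if not max_overage:
        return 0

    # Quadratic penalty: lenient on small overages, harsh on large ones.
    penalty = max_overage * max_overage * HZ
    penalty >>= MEMCG_DELAY_PRECISION_SHIFT
    penalty >>= MEMCG_DELAY_SCALING_SHIFT
    return min(penalty, MAX_HIGH_DELAY_JIFFIES)
```

The key property the patch introduces is visible here: a cgroup exactly at its own limit but inside an over-limit ancestor is still penalised, because the maximum is taken across the whole ancestry rather than only the leaf.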