Subject: Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on usage excess
From: Tim Chen
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, Dave Hansen, Ying Huang, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
References: <06f1f92f1f7d4e57c4e20c97f435252c16c60a27.1613584277.git.tim.c.chen@linux.intel.com>
Date: Thu, 25 Feb 2021 14:25:47 -0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/22/21 9:41 AM, Tim Chen wrote:
> 
> 
> On 2/22/21 12:40 AM, Michal Hocko wrote:
>> On Fri 19-02-21 10:59:05, Tim Chen wrote:
> occurrence.
>>>>
>>>> Soft limit is evaluated every THRESHOLDS_EVENTS_TARGET * SOFTLIMIT_EVENTS_TARGET.
>>>> If all events correspond with a newly charged memory and the last event
>>>> was just about the soft limit boundary then we should be bound by 128k
>>>> pages (512M and much more if these were huge pages) which is a lot!
>>>> I hadn't realized this was that much. Now I see the problem. This would
>>>> be useful information for the changelog.
>>>>
>>>> Your fix is focusing on the over-the-limit boundary which will solve the
>>>> problem but wouldn't that lead to updates happening too often in a
>>>> pathological situation when a memcg would get reclaimed immediately?
>>>
>>> Not really immediately. The memcg that has the most soft limit excess will
>>> be chosen for page reclaim, which is the way it should be.
>>> It is less likely that a memcg that just exceeded
>>> the soft limit becomes the worst offender immediately.
>>
>> Well this all depends on when the soft limit reclaim triggers. In
>> other words, how often you see the global memory reclaim. If we have a
>> memcg with a sufficient excess then this will work mostly fine. I was more
>> worried about a case when you have memcgs just slightly over the limit
>> and the global memory pressure is a regular event. You can easily end up
>> bouncing memcgs off and on the tree in a rapid fashion.
>>
> 
> If you are concerned about such a case, we can add an excess threshold,
> say 4 MB (or 1024 4K pages), before we trigger a forced update. You think
> that will cover this concern?
> 

Michal,

How about modifying this patch with a threshold? Like the following?

Tim

---

From 5a78ab56e2e654290cacab2f5a1631e1da1d90d2 Mon Sep 17 00:00:00 2001
From: Tim Chen
Date: Wed, 3 Feb 2021 14:08:48 -0800
Subject: [PATCH] mm: Force update of mem cgroup soft limit tree on usage
 excess

To rate limit updates to the mem cgroup soft limit tree, we only perform
updates every SOFTLIMIT_EVENTS_TARGET (defined as 1024) memory events.
However, this sampling based update may miss a critical update: i.e. when
the mem cgroup first exceeded its limit but it was not on the soft limit
tree. It should be on the tree at that point so it could be subjected to
soft limit page reclaim. If the mem cgroup had few memory events compared
with other mem cgroups, we may not update it and place it on the mem cgroup
soft limit tree for many memory events. And this mem cgroup excess usage
could creep up, and the mem cgroup could be hidden from the soft limit page
reclaim for a long time.

Fix this issue by forcing an update to the mem cgroup soft limit tree if a
mem cgroup has exceeded its memory soft limit but it is not on the mem
cgroup soft limit tree.
---
 mm/memcontrol.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a51bf90732cb..e0f6948f8ea5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -104,6 +104,7 @@ static bool do_memsw_account(void)
 
 #define THRESHOLDS_EVENTS_TARGET 128
 #define SOFTLIMIT_EVENTS_TARGET 1024
+#define SOFTLIMIT_EXCESS_THRESHOLD 1024
 
 /*
  * Cgroups above their limits are maintained in a RB-Tree, independent of
@@ -985,15 +986,29 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
  */
 static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 {
+	struct mem_cgroup_per_node *mz;
+	bool force_update = false;
+
+	mz = mem_cgroup_nodeinfo(memcg, page_to_nid(page));
+	/*
+	 * mem_cgroup_update_tree may not be called for a memcg exceeding
+	 * soft limit due to the sampling nature of update. Don't allow
+	 * a memcg to be left out of the tree if it has too much usage
+	 * excess.
+	 */
+	if (mz && !mz->on_tree &&
+	    soft_limit_excess(mz->memcg) > SOFTLIMIT_EXCESS_THRESHOLD)
+		force_update = true;
+
 	/* threshold event is triggered in finer grain than soft limit */
-	if (unlikely(mem_cgroup_event_ratelimit(memcg,
+	if (unlikely((force_update) || mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
 		bool do_softlimit;
 
 		do_softlimit = mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_SOFTLIMIT);
 		mem_cgroup_threshold(memcg);
-		if (unlikely(do_softlimit))
+		if (unlikely((force_update) || do_softlimit))
 			mem_cgroup_update_tree(memcg, page);
 	}
 }
-- 
2.20.1