From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Shakeel Butt, Daniel Dao,
    Ivan Babrou, Michal Hocko, Roman Gushchin, Johannes Weiner,
    Michal Koutný, Frank Hofmann, Andrew Morton, Linus Torvalds
Subject: [PATCH 5.15 079/124] memcg: sync flush only if periodic flush is delayed
Date: Tue, 26 Apr 2022 10:21:20 +0200
Message-Id: <20220426081749.546421968@linuxfoundation.org>
In-Reply-To: <20220426081747.286685339@linuxfoundation.org>
References: <20220426081747.286685339@linuxfoundation.org>
X-Mailer: git-send-email 2.36.0
User-Agent: quilt/0.66

From: Shakeel Butt

commit 9b3016154c913b2e7ec5ae5c9a42eb9e732d86aa upstream.

Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, there are workloads which genuinely do more
stat updates than the batch value within a short amount of time. Since
the rstat flush can happen in performance-critical codepaths like page
faults, such workloads can suffer greatly.
This patch fixes the regression by making rstat flushing conditional in
the performance-critical codepaths. More specifically, the kernel now
relies on the async periodic rstat flusher to flush the stats, and only
if the periodic flusher is delayed by more than twice its normal time
window does the kernel allow rstat flushing from the performance-critical
codepaths.

Now the question: what are the side-effects of this change? The worst
that can happen is that the refault codepath will see lruvec stats up to
4 seconds old and may cause false (or missed) activations of the
refaulted page, which may under- or over-estimate the workingset size.
That is not very concerning, as the kernel can already miss or do false
activations.

There are two more codepaths whose flushing behavior is not changed by
this patch and which we may need to revisit in the future. One is the
writeback stats used by dirty throttling, and the second is the
deactivation heuristic in reclaim. For now we keep an eye on them; if
there are reports of regressions due to these codepaths, we will
reevaluate then.
Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1]
Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt
Reported-by: Daniel Dao
Tested-by: Ivan Babrou
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Koutný
Cc: Frank Hofmann
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/memcontrol.h |    5 +++++
 mm/memcontrol.c            |   12 +++++++++++-
 mm/workingset.c            |    2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1002,6 +1002,7 @@ static inline unsigned long lruvec_page_
 }

 void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_delayed(void);

 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1422,6 +1423,10 @@ static inline void mem_cgroup_flush_stat
 {
 }

+static inline void mem_cgroup_flush_stats_delayed(void)
+{
+}
+
 static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 					    enum node_stat_item idx, int val)
 {
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -650,6 +650,9 @@ static DECLARE_DEFERRABLE_WORK(stats_flu
 static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+static u64 flush_next_time;
+
+#define FLUSH_TIME (2UL*HZ)

 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
@@ -671,6 +674,7 @@ static void __mem_cgroup_flush_stats(voi
 	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
 		return;

+	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
 	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
 	spin_unlock_irqrestore(&stats_flush_lock, flag);
@@ -682,10 +686,16 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }

+void mem_cgroup_flush_stats_delayed(void)
+{
+	if (time_after64(jiffies_64, flush_next_time))
+		mem_cgroup_flush_stats();
+}
+
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	__mem_cgroup_flush_stats();
-	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
 }

 /**
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -352,7 +352,7 @@ void workingset_refault(struct page *pag

 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);

-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_delayed();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if