Date: Fri, 25 Feb 2022 16:24:12 -0800
Message-Id: <20220226002412.113819-1-shakeelb@google.com>
Subject: [PATCH] memcg: async flush memcg stats from perf sensitive codepaths
From: Shakeel Butt <shakeelb@google.com>
To: Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin
Cc: Ivan Babrou, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt, Daniel Dao,
    stable@vger.kernel.org

Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems there are workloads which genuinely
do more stat updates than the batch value within a short amount of time.
Since the rstat flush can happen in performance critical codepaths like
page faults, such workloads can suffer greatly.

The easiest fix for now is to have the performance critical codepaths
trigger the rstat flush asynchronously. This patch converts the refault
codepath to use the async rstat flush. In addition, it preemptively
converts mem_cgroup_wb_stats and shrink_node to also use the async rstat
flush, as they may see similar performance regressions.

Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1]
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Reported-by: Daniel Dao
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 10 +++++++++-
 mm/vmscan.c                |  2 +-
 mm/workingset.c            |  2 +-
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ef4b445392a9..bfdd48be60ff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -998,6 +998,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 }
 
 void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_async(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c695608c521c..4338e8d779b2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -690,6 +690,14 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }
 
+void mem_cgroup_flush_stats_async(void)
+{
+	if (atomic_read(&stats_flush_threshold) > num_online_cpus()) {
+		atomic_set(&stats_flush_threshold, 0);
+		mod_delayed_work(system_unbound_wq, &stats_flush_dwork, 0);
+	}
+}
+
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	__mem_cgroup_flush_stats();
@@ -4522,7 +4530,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_async();
 
 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6f77e3e6d59..b6c6b165c1ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3188,7 +3188,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_async();
 
 	memset(&sc->nr, 0, sizeof(sc->nr));
 
diff --git a/mm/workingset.c b/mm/workingset.c
index b717eae4e0dd..a4f2b1aa5bcc 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -355,7 +355,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_async();
 
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
-- 
2.35.1.574.g5d30c73bfb-goog
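
For readers less familiar with the kernel side, below is a minimal
userspace sketch of the threshold-gated flush scheme this patch builds
on. It is an illustrative model under stated assumptions, not the
kernel implementation: a detached pthread stands in for the
system_unbound_wq delayed work item, and the NR_CPUS / MEMCG_BATCH
constants are made-up stand-ins for num_online_cpus() and the kernel's
per-CPU charge batch.

/* Build: cc -pthread sketch.c -o sketch */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NR_CPUS		4	/* stand-in for num_online_cpus() */
#define MEMCG_BATCH	64	/* stand-in for the per-CPU charge batch */

static atomic_int stats_flush_threshold;
static _Thread_local int stats_updates;	/* models the per-CPU counter */

/*
 * Models the update side: only every MEMCG_BATCH-th update bumps the
 * shared counter, so a flush is triggered roughly once per
 * nr_cpus * MEMCG_BATCH raw stat updates.
 */
static void stat_updated(void)
{
	if (!(++stats_updates % MEMCG_BATCH))
		atomic_fetch_add(&stats_flush_threshold, 1);
}

static void do_flush(void)		/* models __mem_cgroup_flush_stats() */
{
	printf("flushing rstat\n");
}

static void *flush_worker(void *arg)	/* models the delayed work item */
{
	(void)arg;
	do_flush();
	return NULL;
}

/* Synchronous flavor: the caller itself pays for the flush. */
static void flush_stats(void)
{
	if (atomic_load(&stats_flush_threshold) > NR_CPUS) {
		atomic_store(&stats_flush_threshold, 0);
		do_flush();
	}
}

/*
 * Async flavor (what the patch adds): reset the counter and kick a
 * worker, so a page-fault-like hot path never waits for the flush.
 */
static void flush_stats_async(void)
{
	if (atomic_load(&stats_flush_threshold) > NR_CPUS) {
		pthread_t t;

		atomic_store(&stats_flush_threshold, 0);
		pthread_create(&t, NULL, flush_worker, NULL);
		pthread_detach(t);
	}
}

int main(void)
{
	int i;

	/* A burst of stat updates, then a slow-path (synchronous) flush. */
	for (i = 0; i < (NR_CPUS + 1) * MEMCG_BATCH; i++)
		stat_updated();
	flush_stats();

	/* The same burst, but the hot path returns immediately. */
	for (i = 0; i < (NR_CPUS + 1) * MEMCG_BATCH; i++)
		stat_updated();
	flush_stats_async();

	sleep(1);	/* give the detached worker time to run */
	return 0;
}

The point mirrored here is that the hot path only tests and resets an
atomic counter and queues work; the expensive stats aggregation runs
later in worker context, which is why the refault path stops paying
for the flush.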