From: Shakeel Butt
Date: Fri, 25 Feb 2022 17:42:57 -0800
Subject: Re: [PATCH] memcg: async flush memcg stats from perf sensitive codepaths
To: Andrew Morton
Cc: Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Ivan Babrou, Cgroups, Linux MM, LKML, Daniel Dao, stable
References: <20220226002412.113819-1-shakeelb@google.com>
    <20220225165842.561d3a475310aeab86a2d653@linux-foundation.org>
In-Reply-To: <20220225165842.561d3a475310aeab86a2d653@linux-foundation.org>

On Fri, Feb 25, 2022 at 4:58 PM Andrew Morton wrote:
>
> On Fri, 25 Feb 2022 16:24:12 -0800 Shakeel Butt wrote:
>
> > Daniel Dao has reported [1] a regression on workloads that may trigger
> > a lot of refaults (anon and file). The underlying issue is that flushing
> > rstat is expensive. Although rstat flushes are batched with (nr_cpus *
> > MEMCG_BATCH) stat updates, it seems like there are workloads which
> > genuinely do more stat updates than the batch value within a short
> > amount of time. Since the rstat flush can happen in performance-critical
> > codepaths like page faults, such workloads can suffer greatly.
> >
> > The easiest fix for now is for performance-critical codepaths to trigger
> > the rstat flush asynchronously. This patch converts the refault codepath
> > to use async rstat flush. In addition, this patch has preemptively
> > converted mem_cgroup_wb_stats and shrink_node to also use the async
> > rstat flush as they may also see similar performance regressions.
>
> Gee we do this trick a lot and gee I don't like it :(
>
> a) if we're doing too much work then we're doing too much work.
>    Punting that work over to a different CPU or thread doesn't alter
>    that - it in fact adds more work.
>

Please note that we already have the async worker running every 2
seconds. Normally no consumer needs to flush the stats, but if producers
make too many stat updates in a short amount of time, one of the
consumers has to pay the price of the flush. We have two types of
consumers: performance-critical (e.g. refault) and
non-performance-critical (e.g. memory.stat or numa_stat readers). I think
we can let the performance-critical consumers skip the synchronous flush
and leave that work to the async worker.

> b) there's an assumption here that the flusher is able to keep up
>    with the producer. What happens if that isn't the case? Do we
>    simply wind up the deferred items until the system goes oom?
>

Without a consumer, nothing bad can happen even if the flusher is slow
(or has too much work) or many producers are updating too many stats.
With a couple of consumers, in the current kernel, one of them may have
to pay the cost of a sync flush. With this patch, we will have two types
of consumers: first, those that are ok with paying the price of a sync
flush to get accurate stats, and second, those that are ok with
out-of-sync stats whose staleness is bounded by 2 seconds (yes, assuming
the flusher runs every 2 seconds). BTW there is no concern of the system
going into oom due to reading slightly older stats.

> What happens if there's a producer running on each CPU? Can the
> flushers keep up?
>
> Pathologically, what happens if the producer is running
> task_is_realtime() on a single-CPU system? Or if there's a
> task_is_realtime() producer running on every CPU? The flusher never
> gets to run and we're dead?
>

I think it would take a mix of (stat) producers and (stat) consumers
hogging the CPU forever, and no, we will not be dead. At worst the
consumers might make some wrong decisions because they consume older
stats. One can argue that since one consumer is the reclaim code, some
reclaim heuristic can get messed up due to older stats. Yes, that can
happen.

>
> An obvious fix is to limit the permissible amount of windup (to what?)
> and at some point, do the flushing synchronously anyway.
>

That is what we are currently doing. The limit is nr_cpus * MEMCG_BATCH.

> Or we just don't do any of this at all and put up with the cost of the
> current code. I mean, this "fix" is kind of fake anyway, isn't it?
> Pushing the 4-10ms delay onto a different CPU will just disrupt
> something else which wanted to run on that CPU. The overall effect is
> to hide the impact from one particular testcase, but is the benefit
> really a real one?
>

Yes, the right fix would be to optimize the flushing code (but that
would require more work/time).
However, I still think that letting performance-critical code paths skip
the sync flush would be good in general. So, if the current patch is not
to your liking, we can remove mem_cgroup_flush_stats() from
workingset_refault().
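
The two consumer types discussed in this thread can be illustrated with
a small user-space sketch in Python. This is a toy model under stated
assumptions, not kernel code: the class StatsModel, its method names,
and the value chosen for MEMCG_BATCH are all hypothetical and exist only
for illustration. It models producers accumulating per-CPU updates, a
consumer that flushes inline once more than nr_cpus * MEMCG_BATCH
updates are pending, a performance-critical consumer that never flushes
inline, and a periodic worker standing in for the 2-second flusher.

```python
# Toy user-space model of batched stat flushing. This is NOT kernel code:
# StatsModel and its methods are hypothetical names used for illustration.

MEMCG_BATCH = 32  # assumed illustrative value; the thread only names the constant


class StatsModel:
    def __init__(self, nr_cpus):
        self.nr_cpus = nr_cpus
        self.pending = [0] * nr_cpus   # per-CPU updates not yet flushed
        self.flushed = 0               # total visible to consumers
        self.sync_flushes = 0          # flushes paid for inline by a consumer
        self.worker_flushes = 0        # flushes done by the periodic worker

    @property
    def threshold(self):
        # Windup limit mentioned in the thread: nr_cpus * MEMCG_BATCH.
        return self.nr_cpus * MEMCG_BATCH

    def update(self, cpu, count=1):
        # Producer: record stat updates on one CPU without flushing.
        self.pending[cpu] += count

    def _flush(self):
        # Fold all per-CPU pending updates into the visible total.
        self.flushed += sum(self.pending)
        self.pending = [0] * self.nr_cpus

    def read_sync(self):
        # Non-critical consumer: flush inline once the limit is exceeded.
        if sum(self.pending) > self.threshold:
            self._flush()
            self.sync_flushes += 1
        return self.flushed

    def read_async(self):
        # Performance-critical consumer: never flush inline and accept a
        # value that may lag until the periodic worker next runs.
        return self.flushed

    def periodic_flush(self):
        # Stands in for the worker that runs every 2 seconds.
        self._flush()
        self.worker_flushes += 1
```

In this model, with nr_cpus=2 and 100 pending updates, read_async()
still returns the previously flushed total at no cost, while
read_sync() pays for one flush and returns the up-to-date total. The
staleness seen by read_async() lasts only until the next
periodic_flush() call.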