Date: Thu, 18 Apr 2024 08:49:42 -0700
From: Shakeel Butt
To: Jesper Dangaard Brouer
Cc: Yosry Ahmed, tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com,
	cgroups@vger.kernel.org, longman@redhat.com, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
	Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior, mhocko@kernel.org
Subject: Re: [PATCH v1 3/3] cgroup/rstat: introduce ratelimited rstat flushing
Message-ID: <4o4qxf3tcos5rl7h2noldeg3knqkgc2ph36tv2cceourbsxgas@xicxkcacme7v>
References: <171328983017.3930751.9484082608778623495.stgit@firesoul>
 <171328990014.3930751.10674097155895405137.stgit@firesoul>
 <72e4a55e-a246-4e28-9d2e-d4f1ef5637c2@kernel.org>
In-Reply-To: <72e4a55e-a246-4e28-9d2e-d4f1ef5637c2@kernel.org>

On Thu, Apr 18, 2024 at 01:00:30PM +0200, Jesper Dangaard Brouer wrote:
>
>
> On 18/04/2024 04.21, Yosry Ahmed wrote:
> > On Tue, Apr 16, 2024 at 10:51 AM Jesper Dangaard Brouer wrote:
> > >
> > > This patch aims to reduce userspace-triggered pressure on the global
> > > cgroup_rstat_lock by introducing a mechanism to limit how often reading
> > > stat files causes cgroup rstat flushing.
> > >
> > > In the memory cgroup subsystem, memcg_vmstats_needs_flush() combined
> > > with mem_cgroup_flush_stats_ratelimited() already limits pressure on
> > > the global lock (cgroup_rstat_lock). As a result, reading memory-related
> > > stat files (such as memory.stat, memory.numa_stat, zswap.current) is
> > > already a less userspace-triggerable issue.
> > >
> > > However, other userspace users of cgroup_rstat_flush(), such as when
> > > reading io.stat (blk-cgroup.c) and cpu.stat, lack a similar system to
> > > limit pressure on the global lock. Furthermore, userspace can easily
> > > trigger this issue by reading those stat files.
> > >
> > > Typically, normal userspace stats tools (e.g., cadvisor, nomad, systemd)
> > > spawn threads that read io.stat, cpu.stat, and memory.stat (even from
> > > the same cgroup) without realizing that on the kernel side, they share
> > > the same global lock. This rate limiting also helps prevent malicious
> > > userspace applications from harming the kernel by reading these stat
> > > files in a tight loop.
> > >
> > > To address this, the patch introduces cgroup_rstat_flush_ratelimited(),
> > > similar to memcg's mem_cgroup_flush_stats_ratelimited().
> > >
> > > Flushing occurs per cgroup (even though the lock remains global); a
> > > variable named rstat_flush_last_time is introduced to track when a
> > > given cgroup was last flushed. This variable, which contains the
> > > jiffies of the flush, shares properties and a cache line with
> > > rstat_flush_next and is updated simultaneously.
> > >
> > > For cpu.stat, we need to acquire the lock (via cgroup_rstat_flush_hold)
> > > because other data is read under the lock, but we skip the expensive
> > > flushing if it occurred recently.
> > >
> > > Regarding io.stat, there is an opportunity outside the lock to skip the
> > > flush, but inside the lock, we must recheck to handle races.
> > >
> > > Signed-off-by: Jesper Dangaard Brouer
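
Just to make sure I am reading the mechanism right, it boils down to
something like the sketch below. The names cgroup_rstat_flush_ratelimited
and rstat_flush_last_time come from the description above; the interval
constant and everything else are placeholders of mine, not the actual
patch code:

	/* Rough sketch only, not the code from this patch. */
	#define RSTAT_FLUSH_INTERVAL	msecs_to_jiffies(50)	/* placeholder period */

	static bool cgroup_rstat_flushed_recently(struct cgroup *cgrp)
	{
		/* rstat_flush_last_time holds the jiffies of the last flush */
		return time_before(jiffies,
				   READ_ONCE(cgrp->rstat_flush_last_time) +
				   RSTAT_FLUSH_INTERVAL);
	}

	void cgroup_rstat_flush_ratelimited(struct cgroup *cgrp)
	{
		/* Cheap check outside the global lock (the io.stat case) */
		if (cgroup_rstat_flushed_recently(cgrp))
			return;

		/*
		 * Takes cgroup_rstat_lock; the recent-flush check has to be
		 * repeated under the lock to handle races, and
		 * rstat_flush_last_time updated when a flush actually runs.
		 */
		cgroup_rstat_flush(cgrp);
	}
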
> >
> > As I mentioned in another thread, I really don't like time-based
> > rate-limiting [1]. Would it be possible to generalize the
> > magnitude-based rate-limiting instead? Have something like
> > memcg_vmstats_needs_flush() in the core rstat code?
> >
>
> I've taken a closer look at memcg_vmstats_needs_flush(). And I'm
> concerned about the overhead of maintaining the stats that are used as
> the filter.
>
>  static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
>  {
>  	return atomic64_read(&vmstats->stats_updates) >
>  		MEMCG_CHARGE_BATCH * num_online_cpus();
>  }
>
> I looked at `vmstats->stats_updates` to see how often it is getting
> updated. It is updated in memcg_rstat_updated(), which gets inlined into
> a number of functions (__mod_memcg_state, __mod_memcg_lruvec_state,
> __count_memcg_events) and which also calls cgroup_rstat_updated().
> Counting invocations per sec (via funccount):
>
> 10:28:09
> FUNC                                    COUNT
> __mod_memcg_state                      377553
> __count_memcg_events                   393078
> __mod_memcg_lruvec_state              1229673
> cgroup_rstat_updated                  2632389
>

Is it possible for you to also measure the frequency of the unique
callstacks calling these functions? In addition, the frequency of each
stat item update would be awesome.

>
> I'm surprised to see how many times per sec this is getting invoked.
> Originating from memcg_rstat_updated() = 2,000,304 times per sec.
> (On a 128 CPU core machine with 39% idle CPU-load.)
> Maintaining these stats seems excessive...
>
> Then how often does the filter lower pressure on the lock:
>
>  MEMCG_CHARGE_BATCH(64) * 128 CPUs = 8192
>  2000304/(64*128) = 244 times per sec (every ~4 ms)
>  (assuming memcg_rstat_updated val=1)
>

It seems like we have opportunities to improve the stat update side, and
we definitely need to improve the stat flush side. One issue from the
memcg side is that the kernel has to do a lot of work, so we should be
reducing that.
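
For the flush side, if we do want to generalize the magnitude-based
filter into the core rstat code as Yosry suggests, I imagine it would
look roughly like the sketch below. This is purely illustrative: the
field names (pending_updates, stat_updates) and the batch constant are
made up, and the per-cpu batching just mirrors what memcg_rstat_updated()
already does today:

	/* Illustrative sketch of a generic magnitude-based flush filter. */
	#define CGROUP_STAT_UPDATE_BATCH	64	/* placeholder, like MEMCG_CHARGE_BATCH */

	void cgroup_rstat_updated_count(struct cgroup *cgrp, int cpu,
					unsigned int count)
	{
		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);

		/* Existing hook that links the cgroup into the per-cpu updated tree */
		cgroup_rstat_updated(cgrp, cpu);

		/*
		 * Batch per-cpu counts so we do not hit a global atomic on
		 * every single stat update.
		 */
		rstatc->pending_updates += count;
		if (rstatc->pending_updates >= CGROUP_STAT_UPDATE_BATCH) {
			atomic64_add(rstatc->pending_updates, &cgrp->stat_updates);
			rstatc->pending_updates = 0;
		}
	}

	static bool cgroup_rstat_needs_flush(struct cgroup *cgrp)
	{
		/* stat_updates would be reset when a flush actually happens */
		return atomic64_read(&cgrp->stat_updates) >
		       CGROUP_STAT_UPDATE_BATCH * num_online_cpus();
	}

Your numbers above suggest the update-side cost of maintaining such a
counter is the part we need to be careful about.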