From: Yosry Ahmed
Date: Mon, 4 Dec 2023 12:12:25 -0800
Subject: Re: [mm-unstable v4 5/5] mm: memcg: restore subtree stats flushing
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
    kernel-team@cloudflare.com, Wei Xu, Greg Thelen, Domenico Cerasuolo,
    linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org
References: <20231129032154.3710765-1-yosryahmed@google.com>
    <20231129032154.3710765-6-yosryahmed@google.com>
    <20231202083129.3pmds2cddy765szr@google.com>
In-Reply-To: <20231202083129.3pmds2cddy765szr@google.com>

On Sat, Dec 2, 2023 at 12:31 AM Shakeel Butt wrote:
>
> On Wed, Nov 29, 2023 at 03:21:53AM +0000, Yosry Ahmed wrote:
> [...]
> > +void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
> >  {
> > -	if (memcg_should_flush_stats(root_mem_cgroup))
> > -		do_flush_stats();
> > +	static DEFINE_MUTEX(memcg_stats_flush_mutex);
> > +
> > +	if (mem_cgroup_disabled())
> > +		return;
> > +
> > +	if (!memcg)
> > +		memcg = root_mem_cgroup;
> > +
> > +	if (memcg_should_flush_stats(memcg)) {
> > +		mutex_lock(&memcg_stats_flush_mutex);
>
> What's the point of this mutex now? What is it providing? I understand
> we can not try_lock here due to targeted flushing. Why not just let the
> global rstat serialize the flushes? Actually this mutex can cause
> latency hiccups as the mutex owner can get resched during flush and then
> no one can flush for a potentially long time.

I was hoping this was clear from the commit message and code comments,
but apparently I was wrong, sorry. Let me give more context.

In previous versions and/or series, the mutex was only used with
flushes from userspace to guard in-kernel flushers against high
contention from userspace.
Later on, I kept the mutex for all memcg flushers for the following
reasons:

(a) Allow waiters to sleep:
Unlike other flushers, the memcg flushing path can see a lot of
concurrency. The mutex avoids having a lot of CPUs spinning (e.g.
concurrent reclaimers) by allowing waiters to sleep.

(b) Check the threshold under lock but before calling
cgroup_rstat_flush():
The calls to cgroup_rstat_flush() are not very cheap even if there's
nothing to flush, as we still need to iterate all CPUs. If flushers
contend directly on the rstat lock, overlapping flushes will
unnecessarily do the percpu iteration once they hold the lock. With
the mutex, they will check the threshold again once they hold the
mutex.

(c) Protect non-memcg flushers from contention from memcg flushers.
This is not as strong an argument as protecting in-kernel flushers
from userspace flushers.

There have been discussions before about changing the rstat lock itself
to be a mutex, which would resolve (a), but there are concerns about
priority inversions if a low priority task holds the mutex and gets
preempted, as well as the amount of time the rstat lock holder keeps
the lock for:
https://lore.kernel.org/lkml/ZO48h7c9qwQxEPPA@slm.duckdns.org/

I agree about possible hiccups due to the inner lock being dropped
while the mutex is held. Running a synthetic test with high concurrency
between reclaimers (in-kernel flushers) and stats readers shows no
material performance difference with or without the mutex. Maybe
things cancel out, or don't really matter in practice.

I would prefer to keep the current code as I think (a) and (b) could
cause problems in the future, and the current form of the code (with
the mutex) has already seen mileage with production workloads.
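
To make (a) and (b) a bit more concrete, here is a rough userspace
sketch of the pattern using pthreads. It is not the kernel code: every
name in it (pending_updates, FLUSH_THRESHOLD, expensive_flush(),
run_demo(), and so on) is made up for illustration, and
expensive_flush() only stands in for the percpu walk that
cgroup_rstat_flush() does. The idea it tries to show is that waiters
sleep on the mutex and then recheck the threshold before paying for
the flush.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; none of these are kernel names. */
#define FLUSH_THRESHOLD 64
#define MAX_THREADS 64

static atomic_long pending_updates;	/* updates accumulated since the last flush */
static long expensive_flushes;		/* times the costly path ran; guarded by flush_mutex */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cheap check: is there enough pending work to justify a flush? */
static int should_flush(void)
{
	return atomic_load(&pending_updates) > FLUSH_THRESHOLD;
}

/* Stand-in for the costly walk over every CPU. Caller holds flush_mutex. */
static void expensive_flush(void)
{
	expensive_flushes++;
	atomic_store(&pending_updates, 0);
}

static void flush_stats(void)
{
	/* Skip the lock entirely when there is nothing worth flushing. */
	if (!should_flush())
		return;

	/* (a) Waiters sleep here instead of spinning. */
	pthread_mutex_lock(&flush_mutex);

	/* (b) Recheck: a flusher that held the mutex before us may be done. */
	if (should_flush())
		expensive_flush();

	pthread_mutex_unlock(&flush_mutex);
}

static void *flusher(void *arg)
{
	(void)arg;
	flush_stats();
	return NULL;
}

/*
 * Prime the counter just past the threshold, start nthreads concurrent
 * flushers, and return how many of them ran the expensive path.
 */
long run_demo(int nthreads)
{
	pthread_t threads[MAX_THREADS];
	int i;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);

	expensive_flushes = 0;
	atomic_store(&pending_updates, FLUSH_THRESHOLD + 1);

	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&threads[i], NULL, flusher, NULL);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(threads[i], NULL);

	return expensive_flushes;
}
```

In this sketch, since nothing adds new updates while the flushers run,
run_demo(8) should end up doing the stand-in flush once regardless of
how the threads interleave. Without the second should_flush() check,
every thread that queued up on the mutex would redo the walk, which is
roughly the overlapping-flush cost described in (b).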