Received: by 2002:a05:7412:b10a:b0:f3:1519:9f41 with SMTP id az10csp3055963rdb; Mon, 4 Dec 2023 15:47:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IGhKsyxqJU0UK+gIXjjamyIIXsBGuUkUWjFWzrDw0oL2Um0YWDjHWvv8RHUIyHvFZz+QzmS X-Received: by 2002:a9d:6381:0:b0:6d9:ae04:d5ea with SMTP id w1-20020a9d6381000000b006d9ae04d5eamr1128085otk.52.1701733661919; Mon, 04 Dec 2023 15:47:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701733661; cv=none; d=google.com; s=arc-20160816; b=f83x88+oiarqgsY4OMl0C/W2aye1a7wiMmG5m1+Yee3Y+KV3vx7dS4cQifBgyXzb6h QFNRxsK5t8c0UaMVM3Ws6D50tweU2skEasSckV9nVBrTKmmFRYO5SeQ96gGUXGeQULUu OzoLESVSJU0Gykjmf6reRCuM8Xw17VruRJxerowk/Y6+wptI3HxUn8qdxUcOPm0mOICs ccOJRUTgWs6SXMGBGpqYCu4A1RZFNfTx/aSbu2zb+xzKGVPhckp+tlnxZl5tJr6scbpl 2KVhQagIUHNYNG2fQjr0y6KYFIgPM4MkAOed078Mkc7KIY5Ut1c6T5mVnkz8tEKnLPQt uUmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=rtEY70Swapvm1FcCGsKX1rWlu1dJk5FF5Xvr3KubAN8=; fh=GrqiXUzyb/e8utXAFxt3314M2zdQMUBpEq8ldNaSl5g=; b=UxFvAVa6TPArAW/+MHyvEFJHnN5DMRduDBspvR6Y/n47osT2xOjfcqCjLSPaqJjJMK Ax+cXBzU+/hzrjqN/l8K0OPN37xF2ctSRkivaq10u4t1AuY/yPWUNLO/psnJqjW8pLUM YJx/twZ1nQWOE+ruFVzktbvO/RAHdOgw1+buQcmnchr33659u4Wa8QALis5PhJGDTKGS XpCbPtIlOVY0Z4HUp+3ke1vwyu6ipQi+hs/b9vEeeBH6jvN1Ii1aXUEYMeEq362LAYJ9 SC24pzqALAHWeGLJyLrzLBmF7xA9uTlZCRxtflSgEniyRUbmKr3FiTI+vz1r6zh9FwJQ QyCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=TjCqGhX5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from morse.vger.email (morse.vger.email. [23.128.96.31]) by mx.google.com with ESMTPS id j15-20020a63594f000000b005c67029a131si3726036pgm.220.2023.12.04.15.47.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Dec 2023 15:47:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) client-ip=23.128.96.31; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=TjCqGhX5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 44EAD8043AF3; Mon, 4 Dec 2023 15:47:39 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231555AbjLDXrX (ORCPT + 99 others); Mon, 4 Dec 2023 18:47:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35122 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234577AbjLDXrW (ORCPT ); Mon, 4 Dec 2023 18:47:22 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0918D4D for ; Mon, 4 Dec 2023 15:46:44 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id ffacd0b85a97d-332fd78fa9dso4049062f8f.3 for ; Mon, 04 Dec 2023 15:46:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1701733603; x=1702338403; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rtEY70Swapvm1FcCGsKX1rWlu1dJk5FF5Xvr3KubAN8=; b=TjCqGhX5NancIvUQCaA4I61jwwjcTFOIE1UqYz0pwpEWB2bzdyeuqm/eu5iZqi3SWA CZfj3qNpDct0ARYDJjIlKmbJxXagXYBcXCD7VWxhgw8jRvOmNwnw/i5v6Jedd2vc25tC oekt4Ba80Rn39QEtnJBNAr+iDmUcR0kUxJN0Fd5yjVg8OqDOw3XP7pU7mavgdXvKdNjd P4yocS3PApu0fjEZEsTkIxjm8g/ySu8ijlPnbTVxDmHZZdLZ6RxdvghLP2i4ne23zbmk 4AAThz7WQsZPvAUoMZV/86J2jQGGheKWC3KDzoLve2RMNXY64j0UCz7ayc3RbZdAvg8t Ystg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701733603; x=1702338403; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rtEY70Swapvm1FcCGsKX1rWlu1dJk5FF5Xvr3KubAN8=; b=sbH53rgw5Im9Qqhn+d+8Id5yGwgAE5iinRQDowLmvx+AaJM4VxX3yAp4tp0BDKJGSZ PdCv55EQXnkwztcsaNu9/RtaG5Oy2vVaO/gcUWUEWrgiKpHLGarpZ3OiZNCCl/t4xJQ6 9pL2djMNFLAcX5YLG+Ce9MYyXUD+sIuRY3WfSI4aDUh0FmXi3538381cC7zI5lP5PbcB WPBDbMVktFNBVr6k9afkeSuKNuKaMREltyJw9SqbAOdFBmNfTbX8x4ADCiowU59sWzSn YftanRqOLAWle5BFCsYkhpHUN9M6bYbENBVlRa3VvncYGzBxA1DTQ10rhuQoBKTbwR4C z5lA== X-Gm-Message-State: AOJu0Yw/6hc/dVAJPq83zzNhunTNX9GwY+P5F/mp4pWpiCG1UUl51kCB dmSysHubDGz0UyK8QcMfLAVFNgY/OjfJlXyuwuwcyg== X-Received: by 2002:a5d:522b:0:b0:333:43bd:5c67 with SMTP id i11-20020a5d522b000000b0033343bd5c67mr884514wra.170.1701733602840; Mon, 04 Dec 2023 15:46:42 -0800 (PST) MIME-Version: 1.0 References: <20231129032154.3710765-1-yosryahmed@google.com> <20231129032154.3710765-6-yosryahmed@google.com> <20231202083129.3pmds2cddy765szr@google.com> In-Reply-To: From: Wei Xu Date: Mon, 4 Dec 2023 15:46:31 -0800 Message-ID: Subject: Re: [mm-unstable v4 5/5] mm: memcg: restore subtree stats flushing To: Shakeel Butt Cc: Yosry Ahmed , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Ivan Babrou , Tejun Heo , =?UTF-8?Q?Michal_Koutn=C3=BD?= , Waiman Long , kernel-team@cloudflare.com, Greg Thelen , Domenico Cerasuolo , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Mon, 04 Dec 2023 15:47:39 -0800 (PST) On Mon, Dec 4, 2023 at 3:31=E2=80=AFPM Shakeel Butt w= rote: > > On Mon, Dec 4, 2023 at 1:38=E2=80=AFPM Yosry Ahmed wrote: > > > > On Mon, Dec 4, 2023 at 12:12=E2=80=AFPM Yosry Ahmed wrote: > > > > > > On Sat, Dec 2, 2023 at 12:31=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > On Wed, Nov 29, 2023 at 03:21:53AM +0000, Yosry Ahmed wrote: > > > > [...] > > > > > +void mem_cgroup_flush_stats(struct mem_cgroup *memcg) > > > > > { > > > > > - if (memcg_should_flush_stats(root_mem_cgroup)) > > > > > - do_flush_stats(); > > > > > + static DEFINE_MUTEX(memcg_stats_flush_mutex); > > > > > + > > > > > + if (mem_cgroup_disabled()) > > > > > + return; > > > > > + > > > > > + if (!memcg) > > > > > + memcg =3D root_mem_cgroup; > > > > > + > > > > > + if (memcg_should_flush_stats(memcg)) { > > > > > + mutex_lock(&memcg_stats_flush_mutex); > > > > > > > > What's the point of this mutex now? What is it providing? I underst= and > > > > we can not try_lock here due to targeted flushing. Why not just let= the > > > > global rstat serialize the flushes? Actually this mutex can cause > > > > latency hiccups as the mutex owner can get resched during flush and= then > > > > no one can flush for a potentially long time. > > > > > > I was hoping this was clear from the commit message and code comments= , > > > but apparently I was wrong, sorry. Let me give more context. > > > > > > In previous versions and/or series, the mutex was only used with > > > flushes from userspace to guard in-kernel flushers against high > > > contention from userspace. Later on, I kept the mutex for all memcg > > > flushers for the following reasons: > > > > > > (a) Allow waiters to sleep: > > > Unlike other flushers, the memcg flushing path can see a lot of > > > concurrency. The mutex avoids having a lot of CPUs spinning (e.g. > > > concurrent reclaimers) by allowing waiters to sleep. > > > > > > (b) Check the threshold under lock but before calling cgroup_rstat_fl= ush(): > > > The calls to cgroup_rstat_flush() are not very cheap even if there's > > > nothing to flush, as we still need to iterate all CPUs. If flushers > > > contend directly on the rstat lock, overlapping flushes will > > > unnecessarily do the percpu iteration once they hold the lock. With > > > the mutex, they will check the threshold again once they hold the > > > mutex. > > > > > > (c) Protect non-memcg flushers from contention from memcg flushers. > > > This is not as strong of an argument as protecting in-kernel flushers > > > from userspace flushers. > > > > > > There has been discussions before about changing the rstat lock itsel= f > > > to be a mutex, which would resolve (a), but there are concerns about > > > priority inversions if a low priority task holds the mutex and gets > > > preempted, as well as the amount of time the rstat lock holder keeps > > > the lock for: > > > https://lore.kernel.org/lkml/ZO48h7c9qwQxEPPA@slm.duckdns.org/ > > > > > > I agree about possible hiccups due to the inner lock being dropped > > > while the mutex is held. Running a synthetic test with high > > > concurrency between reclaimers (in-kernel flushers) and stats readers > > > show no material performance difference with or without the mutex. > > > Maybe things cancel out, or don't really matter in practice. > > > > > > I would prefer to keep the current code as I think (a) and (b) could > > > cause problems in the future, and the current form of the code (with > > > the mutex) has already seen mileage with production workloads. > > > > Correction: The priority inversion is possible on the memcg side due > > to the mutex in this patch. Also, for point (a), the spinners will > > eventually sleep once they hold the lock and hit the first CPU > > boundary -- because of the lock dropping and cond_resched(). So > > eventually, all spinners should be able to sleep, although it will be > > a while until they do. With the mutex, they all sleep from the > > beginning. Point (b) still holds though. > > > > I am slightly inclined to keep the mutex but I can send a small fixlet > > to remove it if others think otherwise. > > > > Shakeel, Wei, any preferences? > > My preference is to avoid the issue we know we see in production alot > i.e. priority inversion. > > In future if you see issues with spinning then you can come up with > the lockless flush mechanism at that time. Given that the synthetic high concurrency test doesn't show material performance difference between the mutex and non-mutex versions, I agree that the mutex can be taken out from this patch set (one less global mutex to worry about).