Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1959935rwl; Fri, 24 Mar 2023 19:20:37 -0700 (PDT) X-Google-Smtp-Source: AKy350YSFtulg5sN5NntIEBjGs9ofPXyt0ARs7Xy61PBGxUWXN2KMwSqDOiFztIQfdmhEwQPmS83 X-Received: by 2002:aa7:9603:0:b0:627:e342:7f0e with SMTP id q3-20020aa79603000000b00627e3427f0emr4930994pfg.30.1679710837249; Fri, 24 Mar 2023 19:20:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679710837; cv=none; d=google.com; s=arc-20160816; b=ELD4md0I1Nqzab9SRNesWlAX1sqWmCFUf4qTeL64zedWS2xi6TLz/8wjyNOaAOnUbz 6FwO8+quu4L12P8xDIdw2mej94jrmuAF/9iklD/QON6phteJ+ZMeDMlT+JMbV1T2ExDx UR7Wd/rQcUhVyNoYrbwqKdvCr7XmHEY4O8HRLl9nprRgJf8wLPrRiItvH+rhwmN3wJNZ RDa88LX6wotYUoLxAk9OpQ5ccm1eh13mpq4ubG4PfuI0XdQ/xinilsSgq6ayZdkg11Ji 3xJmRDpBYeNr12O4gtWfXuNDxIHd2grMbNzQpsSdUyhJGnaUyW36k13j9qTip8xXiO5o kZzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=Uw0vvRX6fB45yziv3c2fRfTS3bTqjYau48XTnRaZKUQ=; b=o8oQNy9pcWJlEJ6WR2jv2mMwKzg1iLtvc42Y6XP7DQGC6qXJoJ413B2V7wqaClFUFq vGe6k8ja9RvdVIEOQ6SPjP84MYtIAVghpRlvN2MHyuBBQPS8ycjKG8hAK1HJ1L+3el1K FrHSi63iBpVl+wm53HlGZRlWzdj3ZSNmAol08/zR6ePvJTW/zHqJOFYgAfEQ542r7OKp F8qx1EHgZo7/pqwiX3hV0+hlmPgBgcB6Qm81KH9ypfC/WmUd/F0jVcRQW4YrBX8kYV6J tZ+tZjp06nIH2YoeQzg72lmvxK3xIRRhBfMhB1+m6Eb7tClNmb4qMz3kO5kSJNo8t3Xn s4sA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=O7Jst4Ab; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 68-20020a621947000000b0062ae6345c73si2877347pfz.398.2023.03.24.19.20.25; Fri, 24 Mar 2023 19:20:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=O7Jst4Ab; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231877AbjCYCSf (ORCPT + 99 others); Fri, 24 Mar 2023 22:18:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231277AbjCYCSe (ORCPT ); Fri, 24 Mar 2023 22:18:34 -0400 Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BC3C212E for ; Fri, 24 Mar 2023 19:18:32 -0700 (PDT) Received: by mail-ed1-x52b.google.com with SMTP id eh3so14628634edb.11 for ; Fri, 24 Mar 2023 19:18:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679710711; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Uw0vvRX6fB45yziv3c2fRfTS3bTqjYau48XTnRaZKUQ=; b=O7Jst4Abb03/2SPXMx5C68RHpw3r3kFMKH7fZnFXh0aJIp2ZBgoM9SwaZL0lI/9pGh nOnlLtm4EcFl3i8d0j46aVqMGUTdlYBqj5yZXFsOZY9Wt0HiaQAMqbcFmRL6lcVCWdbx TnkXpXZS53IaAukR3UsCxNyfv1DwvSSVUN2GtB8yMaLijLAA2wiQ4TtPwRaAWmO3RNMB ssGwMxA6FnWB+KLIdXXQrz42GNgvAWzmC9y1auxEEhX5ynPFv8qM/jW7VouPr1t8bz9W o/YBrB3nWIDIOZaWdVAxep7Nkgx0jPcDgraTEVMPnZzXibYcCq16TyKvx4l4lRaML3zU Wd7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679710711; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Uw0vvRX6fB45yziv3c2fRfTS3bTqjYau48XTnRaZKUQ=; b=Fw9mEWdzdtMq0r+0wL7ZRpg6kqXJVR1b8Ht6zjuANUqiLtzWNHLWbGoThnkYkOxaAf tp5H55EAvEbXC6JXCFtgnxN4AfvQy1M+rHtGWHjo8WvjfFYZjBQam5d/l6EE9ZWSJOCC MRum/UFugicjNlPJw5lu/xCnidlgsPZ0FOVmIpi+bv8Jxybq8BOSpBO2p49Qfbk+skB1 Zg9hFgqULq5JppUlY3u0lAY6ZXw2u0qPQ2J25sSwgTy3iJc7c+rUlhM//KO9/qjtP6K9 193S/vSoeYrfpb+O2O0uh/HxjsbMQDgWGMFrLnf5UBxEn4JnEMZ4ZSAuZyWh6+NSW0kA 35DQ== X-Gm-Message-State: AAQBX9ePLdvskP+FPh95a0qrOSnW/tRReYMWAYIH7LrRw0cRFUEKlw2d Jql8sx4C5opUtxTn7ERm/CoanJblQz4ZczAuF/Io+Q== X-Received: by 2002:a17:906:eec7:b0:93e:186f:ea0d with SMTP id wu7-20020a170906eec700b0093e186fea0dmr2057027ejb.15.1679710710778; Fri, 24 Mar 2023 19:18:30 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Fri, 24 Mar 2023 19:17:54 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Tejun Heo Cc: Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 24, 2023 at 6:54=E2=80=AFPM Tejun Heo wrote: > > Hello, > > On Fri, Mar 24, 2023 at 12:22:09AM -0700, Yosry Ahmed wrote: > > I think a problem with this approach is that we risk having to contend > > for the global lock at every CPU boundary in atomic contexts. Right > > now we contend for the global lock once, and once we have it we go > > through all CPUs to flush, only having to contend with updates taking > > the percpu locks at this point. If we unconditionally release & > > reacquire the global lock at every CPU boundary then we may contend > > for it much more frequently with concurrent flushers. > > > > On the memory controller side, concurrent flushers are already held > > back to avoid a thundering herd problem on the global rstat lock, but > > flushers from outside the memory controller can still compete together > > or with a flusher from the memory controller. In this case, we risk > > contending the global lock more and concurrent flushers taking a > > longer period of time, which may end up causing multi-CPU stalls > > anyway, right? Also, if we keep _irq when spinning for the lock, then > > concurrent flushers still need to spin with irq disabled -- another > > problem that this series tries to fix. > > > > This is particularly a problem for flushers in atomic contexts. There > > is a flusher in mem_cgroup_wb_stats() that flushes while holding > > another spinlock, and a flusher in mem_cgroup_usage() that flushes > > with irqs disabled. If flushing takes a longer period of time due to > > repeated lock contention, it affects such atomic context negatively. > > > > I am not sure how all of this matters in practice, it depends heavily > > on the workloads and the configuration like you mentioned. I am just > > pointing out the potential disadvantages of reacquiring the lock at > > every CPU boundary in atomic contexts. > > So, I'm not too convinced by the arguments for a couple reasons: > > * It's not very difficult to create conditions where a contented non-irq > protected spinlock is held unnecessarily long due to heavy IRQ irq load= on > the holding CPU. This can easily extend the amount of time the lock is > held by multiple times if not orders of magnitude. That is likely a > significantly worse problem than the contention on the lock cacheline > which will lead to a lot more gradual degradation. I agree that can be a problem, it depends on the specific workload and configuration. The continuous lock contention at each CPU boundary causes a regression (see my reply to Waiman), but I am not sure if it's worse than the scenario you are describing. > > * If concurrent flushing is an actual problem, we need and can implement = a > better solution. There's quite a bit of maneuvering room here given tha= t > the flushing operations are mostly idempotent in close time proximity a= nd > there's no real atomicity requirement across different segments of > flushing operations. Concurrent flushing can be a problem for some workloads, especially in the MM code we flush in the reclaim and refault paths. This is currently mitigated by only allowing one flusher at a time from the memcg side, but contention can still happen with flushing when a cgroup is being freed or other flushers in other subsystems. I tried allowing concurrent flushing by completely removing the global rstat lock, and only depending on the percpu locks for synchronization. For this to be correct the global stat counters need to be atomic, this introduced a slow down for flushing in general. I also noticed heavier lock contention on the percpu locks, since all flushers try to acquire all locks in the same order. I even tried implementing a simple retry scheme where we try to acquire the percpu lock, and if we fail we queue the current cpu and move to the next one. This ended up causing a little bit of slowness as well. Together with the slowness introduced by atomic operations it seemed like a significant regression in the simple flushing path. Don't get me wrong, I am all for modifying the current approach, I just want to make sure we are making the correct decision for *most* workloads. Keep in mind that this series aims to minimize the number of flushers from atomic contexts as well, and for non-atomic flushers we allow giving up the lock at CPU boundaries anyway. The current approach only keeps the lock held throughout for atomic flushers. Any ideas here are welcome! > > Thanks. > > -- > tejun