From: Yosry Ahmed
Date: Fri, 24 Mar 2023 00:22:09 -0700
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock
To: Tejun Heo
Cc: Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Vasily Averin,
 cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org
References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com>
charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 23, 2023 at 6:39=E2=80=AFPM Tejun Heo wrote: > > Hello, > > On Thu, Mar 23, 2023 at 04:00:31AM +0000, Yosry Ahmed wrote: > > Currently, when sleeping is not allowed during rstat flushing, we hold > > the global rstat lock with interrupts disabled throughout the entire > > flush operation. Flushing in an O(# cgroups * # cpus) operation, and > > having interrupts disabled throughout is dangerous. > > > > For some contexts, we may not want to sleep, but can be interrupted > > (e.g. while holding a spinlock or RCU read lock). As such, do not > > disable interrupts throughout rstat flushing, only when holding the > > percpu lock. This breaks down the O(# cgroups * # cpus) duration with > > interrupts disabled to a series of O(# cgroups) durations. > > > > Furthermore, if a cpu spinning waiting for the global rstat lock, it > > doesn't need to spin with interrupts disabled anymore. > > I'm generally not a fan of big spin locks w/o irq protection. They too of= ten > become a source of unpredictable latency spikes. As you said, the global > rstat lock can be held for quite a while. Removing _irq makes irq latency > better on the CPU but on the other hand it makes a lot more likely that t= he > lock is gonna be held even longer, possibly significantly so depending on > the configuration and workload which will in turn stall other CPUs waitin= g > for the lock. Sure, irqs are being serviced quicker but if the cost is mo= re > and longer !irq context multi-cpu stalls, what's the point? > > I don't think there's anything which requires the global lock to be held > throughout the entire flushing sequence and irq needs to be disabled when > grabbing the percpu lock anyway, so why not just release the global lock = on > CPU boundaries instead? We don't really lose anything significant that wa= y. > The durations of irq disabled sections are still about the same as in the > currently proposed solution at O(# cgroups) and we avoid the risk of hold= ing > the global lock for too long unexpectedly from getting hit repeatedly by > irqs while holding the global lock. Thanks for taking a look! I think a problem with this approach is that we risk having to contend for the global lock at every CPU boundary in atomic contexts. Right now we contend for the global lock once, and once we have it we go through all CPUs to flush, only having to contend with updates taking the percpu locks at this point. If we unconditionally release & reacquire the global lock at every CPU boundary then we may contend for it much more frequently with concurrent flushers. On the memory controller side, concurrent flushers are already held back to avoid a thundering herd problem on the global rstat lock, but flushers from outside the memory controller can still compete together or with a flusher from the memory controller. In this case, we risk contending the global lock more and concurrent flushers taking a longer period of time, which may end up causing multi-CPU stalls anyway, right? 
Also, if we keep _irq when spinning for the lock, then concurrent
flushers still need to spin with irqs disabled -- another problem that
this series tries to fix. This is particularly a problem for flushers
in atomic contexts. There is a flusher in mem_cgroup_wb_stats() that
flushes while holding another spinlock, and a flusher in
mem_cgroup_usage() that flushes with irqs disabled. If flushing takes
longer due to repeated lock contention, it affects such atomic contexts
negatively (a sketch of what I mean is at the end of this mail).

I am not sure how much all of this matters in practice; it depends
heavily on the workloads and the configuration, like you mentioned. I
am just pointing out the potential disadvantages of reacquiring the
lock at every CPU boundary in atomic contexts.

> Thanks.
>
> --
> tejun
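For reference, the atomic-context flusher shape mentioned above, as a
sketch (some_lock here is a hypothetical stand-in for whatever spinlock
such a caller actually holds):

	spin_lock(&some_lock);    /* sleeping is not allowed past this */
	mem_cgroup_flush_stats(); /* ends up spinning on the global
				   * rstat lock; if the _irq variant is
				   * kept, that whole spin runs with
				   * irqs disabled */
	spin_unlock(&some_lock);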