Received: by 2002:ac8:156:0:b0:3e0:cd10:60c8 with SMTP id f22csp2348247qtg; Wed, 22 Mar 2023 22:18:21 -0700 (PDT) X-Google-Smtp-Source: AK7set9UDr7blknV2L62cGtdoRAf3HmrFY2lbvttR35+jy/nmohOZIGrnOCY8nMvJ5/LKg9B/qz6 X-Received: by 2002:a17:906:bcce:b0:933:89a1:57e6 with SMTP id lw14-20020a170906bcce00b0093389a157e6mr9132439ejb.26.1679548701495; Wed, 22 Mar 2023 22:18:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679548701; cv=none; d=google.com; s=arc-20160816; b=JqKoWKIRoMNJ/EQ7jo0JVzBhKZteEtBB/aP97qwBlLLohkr+LypiXEPtJ7uyeiso3j Gt9ySeLWvmuMmRlMap8DwftoKDo7JQcFVW8zVyu4GT1vV79o5PSfu1PpJNktGzx2WG2F bOCwRtesuFI5nhEtZyaIStGEIreVoSlAf4u7nb5yEkZBlpsGrJBnqfEOWWuniByyzDb1 0Nwu0v5qtJayCcJsCokUJ06itUVsljXyKOCvt11uEm6J1th0/gonCnlCLn1OzKXvxE1V JqRJE1f86nRG07k/as3+p7kxYebOqFN7yNBgd8BGoFvcs4EX8rCDQS0+0V4nhHgQG2BZ NJQw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=T/9km9EMoVkZZ1k0VU4HMRG4jUMcldOMeC6Snv7SyzhekW6HFPLnDIjjkFPbbKXNi5 84GPYaYnx2EkglFA+6dDDv70gVMTUJruJjfXsUcW1OTiF7h0jc1X/YXAZH9RdUs/zLVK jRYupqjASLKwlO8gUdH2yR7iO9bmNoh9xcXov+npn96FnWoB37XJEzQKm6eDHdunLDdR wiwlsCnv84TbI+78jAEXeJLpB5eJov9P8ikSQKmO3UAJzQH5MKG96rpALVEN2t8ZgBSR Z3Qyw54gvXn/lTedqK0SVK3jttOmc1QZWQXEaSVMerig7egl71BiT35zLIijL2/JpYXE iATA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=KjytUVYD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id sc6-20020a1709078a0600b0093cd63ce6f8si921202ejc.523.2023.03.22.22.17.57; Wed, 22 Mar 2023 22:18:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=KjytUVYD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229997AbjCWFPv (ORCPT + 99 others); Thu, 23 Mar 2023 01:15:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229702AbjCWFPs (ORCPT ); Thu, 23 Mar 2023 01:15:48 -0400 Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC2AA1F5C4 for ; Wed, 22 Mar 2023 22:15:43 -0700 (PDT) Received: by mail-ed1-x52d.google.com with SMTP id ew6so18732472edb.7 for ; Wed, 22 Mar 2023 22:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679548542; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=KjytUVYDmXDoW1kHgbDCbvCLYfxK+63b9aN2H7qBDtgtsxUI5OrZu53Lwl+GZnw0aJ 01ArWTb9RVmWC495n9DxCFBjmF0qY/lqyeeIAVSiAn/aUnYfvmYR3QRULUpzpOBma5HJ 1+ytefxZiq0W1QKk5BPSDY7oYrkC4lDmQhAqbKsHhetsX6cK5fJcoZErFKGhWuRXhxyI p2BcFq4WozSFWYwoA28WxVcjRBVpqMyGmKeVHUGeck+Via4SQbj3gRnZsH2+wmdGGYqJ m3bky+4aHeeGT/A27qicrKyfuYVue8fxY0dpOhHe+LAcLJLzrkBD+xp1/xz5qYKXjWiw GJsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679548542; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=giyc6KIog5qe0TzRiY9Fhf8THbqMqieF31A7RDK1ON96ZMA0ZieHMK464jHp3P70Yz EKNY2+lKcqdqqSdbrRXW/q2VJwbzQRg8gO7J+1trZtHkQL31RR/y4FFNxtyUsfll6Ydk 8+F/CaENruS+MbD61JtznMUeeo3p/dHmtmGKgxxRalz1aQWOtpcNliBKAl0vlSOb4DkE kbi9Y+8nC8lRGRfcXF92KcgawnEeZ/vs4c3ZMkr//NIuvhjgwr14s3cE4f28+JM270LI bd5YCHkbbj+GUlndj0aJRn0VdCpmFN05NMumHEcOLRa1Ki0yeS8+MuN4QuYcIXbnVpS4 6NzA== X-Gm-Message-State: AO0yUKVGcB1/6YmxmtZoyVHWrO7tr2gL6I8XIOctvVepOITaN3rdmNA7 1mDPyB/WI420NJuYsnF6FD0rT+YgS6gDFeOnXKkSXA== X-Received: by 2002:a17:906:aac9:b0:927:912:6baf with SMTP id kt9-20020a170906aac900b0092709126bafmr4297724ejb.15.1679548542233; Wed, 22 Mar 2023 22:15:42 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Wed, 22 Mar 2023 22:15:05 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Shakeel Butt Cc: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 22, 2023 at 9:29=E2=80=AFPM Shakeel Butt = wrote: > > On Wed, Mar 22, 2023 at 9:00=E2=80=AFPM Yosry Ahmed wrote: > > > > Currently, when sleeping is not allowed during rstat flushing, we hold > > the global rstat lock with interrupts disabled throughout the entire > > flush operation. Flushing in an O(# cgroups * # cpus) operation, and > > having interrupts disabled throughout is dangerous. > > > > For some contexts, we may not want to sleep, but can be interrupted > > (e.g. while holding a spinlock or RCU read lock). As such, do not > > disable interrupts throughout rstat flushing, only when holding the > > percpu lock. This breaks down the O(# cgroups * # cpus) duration with > > interrupts disabled to a series of O(# cgroups) durations. > > > > Furthermore, if a cpu spinning waiting for the global rstat lock, it > > doesn't need to spin with interrupts disabled anymore. > > > > If the caller of rstat flushing needs interrupts to be disabled, it's u= p > > to them to decide that, and it should be fine to hold the global rstat > > lock with interrupts disabled. There is currently a single context that > > may invoke rstat flushing with interrupts disabled, the > > mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from > > mem_cgroup_threshold(). > > > > To make it safe to hold the global rstat lock with interrupts enabled, > > make sure we only flush from in_task() contexts. The side effect of tha= t > > we read stale stats in interrupt context, but this should be okay, as > > flushing in interrupt context is dangerous anyway as it is an expensive > > operation, so reading stale stats is safer. > > > > Signed-off-by: Yosry Ahmed > > Couple of questions: > > 1. What exactly is cgroup_rstat_lock protecting? Can we just remove it > altogether? I believe it protects the global state variables that we flush into. For example, for memcg, it protects mem_cgroup->vmstats. I tried removing the lock and allowing concurrent flushing on different cpus, by changing mem_cgroup->vmstats to use atomics instead, but that turned out to be a little expensive. Also, cgroup_rstat_lock is already contended by different flushers (mitigated by stats_flush_lock on the memcg side). If we remove it, concurrent flushers contend on every single percpu lock instead, which also seems to be expensive. > 2. Are we really calling rstat flush in irq context? I think it is possible through the charge/uncharge path: memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage(). I added the protection against flushing in an interrupt context for future callers as well, as it may cause a deadlock if we don't disable interrupts when acquiring cgroup_rstat_lock. > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usage() is only > done for root memcg. Why is mem_cgroup_threshold() interested in root > memcg usage? Why not ignore root memcg in mem_cgroup_threshold() ? I am not sure, but the code looks like event notifications may be set up on root memcg, which is why we need to check thresholds. Even if mem_cgroup_threshold() does not flush memcg stats, the purpose of this patch is to make sure the rstat flushing code itself is not disabling interrupts; which it currently does for any unsleepable context, even if it is interruptible.