Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp7941041rwl; Thu, 23 Mar 2023 10:28:36 -0700 (PDT) X-Google-Smtp-Source: AK7set9sHt//cwU1SB/PQ7QRNOP+Y27iJaHfzZBOTVfBqB2nSxey6qnSJRajgiNG+BC96sS3YmcY X-Received: by 2002:a17:903:32c4:b0:1a1:ab40:b0f5 with SMTP id i4-20020a17090332c400b001a1ab40b0f5mr8447340plr.36.1679592516117; Thu, 23 Mar 2023 10:28:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679592516; cv=none; d=google.com; s=arc-20160816; b=fj9e0CchxxCKStC18TMkbPeSfMofG5dwUsgSnkjLjhhuJf7miH3ElmuBuC4B7SOjFb yqCLeDpKVCZ+X6LMf+gGJxsMubY2d4hYja58d70ZfisdohYCdO7F6DEUt57gtb6lRmxW 1LY/lj26M8n71WodHMJjPebaewTqTAIPM+mu0tiieQMziHOPlIWCkTD8qulPdXt9Xz18 g6cK1fIBNiICmtY+pkXR4p8SE3bCX9oyWSevlUdVRy4P5f7LhG2fylAiMJD43IXdoAyn bMLG3gQwhscKGfjbjRWDyrBNayrYXCvNIMvGmu3Y5tMma/2ah9D4QCuuZQyS0G0FXTGd D/ww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=z/h2YRuqxkbq2WVy9rSukgPkz7MffMXHVCaEQ8JsfSI=; b=MKCnGlHlOCPcW7GBgPuCl4vIJoLSUmODDaMWiV5Sa/fFfHYWkxklaDq1O7Yk6pn/HT mYPZCLEEdEDhA/qcEEdjaF69QtY7WL7XtCgT0Aexavg1kArNu37wL/UwCveiorqF8DMK 1hPY0La47Bu/SrO7v6aE3+5K0+e0J7sbcdrAi4Ii59G9JPCNmaNizrfhvQ/JWN20XtyZ OfM/bSb9VBKHs6VOrP/6jS+7DGuPGXj5+iZt1rO45GLSahUru46zuAL/J3cd00V76zDt o7i2SBsXV5OM/e097dmGseLBnbpvOcBe9GTpmFrRxgvZ+trNy53Dxw6Jsn6DYEOh0z5U SXzw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=0b1sytAc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 23-20020a631957000000b0050bfc1d1b6dsi18269372pgz.132.2023.03.23.10.28.23; Thu, 23 Mar 2023 10:28:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=0b1sytAc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232067AbjCWR1i (ORCPT + 99 others); Thu, 23 Mar 2023 13:27:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232063AbjCWR1g (ORCPT ); Thu, 23 Mar 2023 13:27:36 -0400 Received: from mail-qv1-xf2f.google.com (mail-qv1-xf2f.google.com [IPv6:2607:f8b0:4864:20::f2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5472A166CF for ; Thu, 23 Mar 2023 10:27:34 -0700 (PDT) Received: by mail-qv1-xf2f.google.com with SMTP id oe8so1595203qvb.6 for ; Thu, 23 Mar 2023 10:27:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20210112.gappssmtp.com; s=20210112; t=1679592453; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=z/h2YRuqxkbq2WVy9rSukgPkz7MffMXHVCaEQ8JsfSI=; b=0b1sytAcRxnizGxvw+p+lkYnbDhWMhQxrXGPJPm5sHacyMrIEdNLQJWaWguMuI6iRT gIob4qZ61pz7VHAvob8ln+eERCrG8SD695iKZx3xXdVhxJf2AK68WpjLNHTbfR1+mL/h W7T8K7/p+oIhtsGd2HUyWvFwIy6iY8DRoXXxq5zUNE26HtDLdvCeI65VKxv5w/6Cz/WF fEPo60DJZK/R6TZ+NZeMzWnHn7sJXOyNF2qPFslVrcEcKc1n5cSCHU1BEimYAOU8AJaR A2BbTJLzfny7Nv4w5tfJ4xLxwqSSQOmSgthksXW6BG1tzizPpqODzSDd5+8QtbLdZvtY p2Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679592453; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=z/h2YRuqxkbq2WVy9rSukgPkz7MffMXHVCaEQ8JsfSI=; b=cry8afJRuej1/uuJpN5yLP4Ruu2zpK1zT85mCiE81EeI1J4LpeV+NrwH9/hmnbGwV6 Djm3kQCpv8wc4ePRAZHjglb/0RF8MII5md2mtPKGIwoWZLKOAdZsIQHDOKSKsBJoAYdC Tx6nmbQjmzS2FFPYtBnDBJ+yZcs1ZILbQs7IZcvVscXCxYIqEShFZMlSyTncvtaA+oWL RSCCQHwVLpMHE3e9IwOBVD/eH4g7PRsPg7/2E3mZIUsakgPjRwcpzOCqbDDiPql9U+5R CmGsSDh2QtlzXKPopPQw0l6zzITIAGeoXyaXQMkx7IaJvY6iBziSFfcnS8LMjLZzEfXa +ozQ== X-Gm-Message-State: AO0yUKVoIExKRKx9Bf0HjpWRPSlSg/VS1BHLFNW3bD4XAGWE/jPOA91d 5d8ahqxUFALZqZss8EzpElmQOg== X-Received: by 2002:a05:6214:5010:b0:56b:fb30:49c6 with SMTP id jo16-20020a056214501000b0056bfb3049c6mr12890918qvb.50.1679592453212; Thu, 23 Mar 2023 10:27:33 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:62db]) by smtp.gmail.com with ESMTPSA id 123-20020a370581000000b007456df35859sm9126833qkf.74.2023.03.23.10.27.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Mar 2023 10:27:32 -0700 (PDT) Date: Thu, 23 Mar 2023 13:27:32 -0400 From: Johannes Weiner To: Yosry Ahmed Cc: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Subject: Re: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts Message-ID: <20230323172732.GE739026@cmpxchg.org> References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-5-yosryahmed@google.com> <20230323155613.GC739026@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=0.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 23, 2023 at 09:01:12AM -0700, Yosry Ahmed wrote: > On Thu, Mar 23, 2023 at 8:56 AM Johannes Weiner wrote: > > > > On Thu, Mar 23, 2023 at 04:00:34AM +0000, Yosry Ahmed wrote: > > > @@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void) > > > return; > > > > > > flush_next_time = jiffies_64 + 2*FLUSH_TIME; > > > - cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false); > > > + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep); > > > > How is it safe to call this with may_sleep=true when it's holding the > > stats_flush_lock? > > stats_flush_lock is always called with trylock, it is only used today > so that we can skip flushing if another cpu is already doing a flush > (which is not 100% correct as they may have not finished flushing yet, > but that's orthogonal here). So I think it should be safe to sleep as > no one can be blocked waiting for this spinlock. I see. It still cannot sleep while the lock is held, though, because preemption is disabled. Make sure you have all lock debugging on while testing this. > Perhaps it would be better semantically to replace the spinlock with > an atomic test and set, instead of having a lock that can only be used > with trylock? It could be helpful to clarify what stats_flush_lock is protecting first. Keep in mind that locks should protect data, not code paths. Right now it's doing multiple things: 1. It protects updates to stats_flush_threshold 2. It protects updates to flush_next_time 3. It serializes calls to cgroup_rstat_flush() based on those ratelimits However, 1. stats_flush_threshold is already an atomic 2. flush_next_time is not atomic. The writer is locked, but the reader is lockless. If the reader races with a flush, you could see this: if (time_after(jiffies, flush_next_time)) spin_trylock() flush_next_time = now + delay flush() spin_unlock() spin_trylock() flush_next_time = now + delay flush() spin_unlock() which means we already can get flushes at a higher frequency than FLUSH_TIME during races. But it isn't really a problem. The reader could also see garbled partial updates, so it needs at least READ_ONCE and WRITE_ONCE protection. 3. Serializing cgroup_rstat_flush() calls against the ratelimit factors is currently broken because of the race in 2. But the race is actually harmless, all we might get is the occasional earlier flush. If there is no delta, the flush won't do much. And if there is, the flush is justified. In summary, it seems to me the lock can be ditched altogether. All the code needs is READ_ONCE/WRITE_ONCE around flush_next_time.