Date: Mon, 9 Oct 2023 11:15:46 +1100
From: Dave Chinner <david@fromorbit.com>
To: Hugh Dickins
Cc: Andrew Morton, Tim Chen, Dave Chinner, "Darrick J. Wong",
	Christian Brauner, Carlos Maiolino, Chuck Lever, Jan Kara,
	Matthew Wilcox, Johannes Weiner, Axel Rasmussen,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount)
References: <2451f678-38b3-46c7-82fe-8eaf4d50a3a6@google.com>
In-Reply-To: <2451f678-38b3-46c7-82fe-8eaf4d50a3a6@google.com>

On Thu, Oct 05, 2023 at 10:35:33PM -0700, Hugh Dickins wrote:
> On Thu, 5 Oct 2023, Dave Chinner wrote:
> > On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote:
> > > Percpu counter's compare and add are separate functions: without locking
> > > around them (which would defeat their purpose), it has been possible to
> > > overflow the intended limit. Imagine all the other CPUs fallocating
> > > tmpfs huge pages to the limit, in between this CPU's compare and its add.
> > >
> > > I have not seen reports of that happening; but tmpfs's recent addition
> > > of dquot_alloc_block_nodirty() in between the compare and the add makes
> > > it even more likely, and I'd be uncomfortable to leave it unfixed.
> > >
> > > Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
> > >
> > > I believe this implementation is correct, and slightly more efficient
> > > than the combination of compare and add (taking the lock once rather
> > > than twice when nearing full - the last 128MiB of a tmpfs volume on a
> > > machine with 128 CPUs and 4KiB pages); but it does beg for a better
> > > design - when nearing full, there is no new batching, but the costly
> > > percpu counter sum across CPUs still has to be done, while locked.
> > >
> > > Follow __percpu_counter_sum()'s example, including cpu_dying_mask as
> > > well as cpu_online_mask: but shouldn't __percpu_counter_compare() and
> > > __percpu_counter_limited_add() then be adding a num_dying_cpus() to
> > > num_online_cpus(), when they calculate the maximum which could be held
> > > across CPUs? But the times when it matters would be vanishingly rare.
> > >
> > > Signed-off-by: Hugh Dickins
> > > Cc: Tim Chen
> > > Cc: Dave Chinner
> > > Cc: Darrick J. Wong
> > > ---
> > > Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7,
> > > which are just internal to shmem, and do not affect this patch (which
> > > applies to v6.6-rc and linux-next as is): but want to run this by you.
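(As an aside, a minimal sketch of the race the quoted description is
talking about, next to the primitive the patch proposes. The charge
helpers below are made up for illustration; only the existing
percpu_counter_compare()/percpu_counter_add() calls and the proposed
percpu_counter_limited_add() are real or proposed APIs.)

#include <linux/percpu_counter.h>

/*
 * Racy pattern: compare and add are two separate operations on the
 * counter, so other CPUs can add in the window between them and push
 * the counter past 'max'.
 */
static bool charge_racy(struct percpu_counter *used, s64 max, s64 pages)
{
	if (percpu_counter_compare(used, max - pages) > 0)
		return false;		/* already looks full */
	/* <-- other CPUs may add here, overshooting 'max' */
	percpu_counter_add(used, pages);
	return true;
}

/* With the proposed primitive: one operation, no window to overshoot. */
static bool charge_limited(struct percpu_counter *used, s64 max, s64 pages)
{
	return percpu_counter_limited_add(used, max, pages);
}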
> >
> > Hmmmm. IIUC, this only works for addition that approaches the limit
> > from below?
>
> That's certainly how I was thinking about it, and what I need for tmpfs.
> Precisely what its limitations (haha) are, I'll have to take care to
> spell out.
>
> (IIRC - it's a while since I wrote it - it can be used for subtraction,
> but goes the very slow way when it could go the fast way - uncompared
> percpu_counter_sub() much better for that. You might be proposing that
> a tweak could adjust it to going the fast way when coming down from the
> "limit", but going the slow way as it approaches 0 - that would be neat,
> but I've not yet looked into whether it's feasibly done.)
>
> > So if we are approaching the limit from above (i.e. add of a
> > negative amount, limit is zero) then this code doesn't work the same
> > as the open-coded compare+add operation would?
>
> To it and to me, a limit of 0 means nothing positive can be added
> (and it immediately returns false for that case); and adding anything
> negative would be an error since the positive would not have been allowed.
>
> Would a negative limit have any use?

I don't have any use for it, but the XFS case is decrementing free
space to determine if ENOSPC has been hit. It's the opposite
implementation to shmem, which increments used space to determine if
ENOSPC is hit.

> It's definitely not allowing all the possibilities that you could arrange
> with a separate compare and add; whether it's ruling out some useful
> possibilities to which it can easily be generalized, I'm not sure.
>
> Well worth a look - but it'll be easier for me to break it than get
> it right, so I might just stick to adding some comments.
>
> I might find that actually I prefer your way round: getting slower
> as approaching 0, without any need for specifying a limit?? That the
> tmpfs case pushed it in this direction, when it's better reversed? Or
> that might be an embarrassing delusion which I'll regret having mentioned.

I think there are cases both for approaching an upper limit from
below and for approaching a lower limit from above. Both are the same
"compare and add" algorithm, just with minor logic differences...

> > Hence I think this looks like an "add if result is less than"
> > operation, which is distinct from the "add if result is greater
> > than" operation that we use this same pattern for in XFS and ext4.
> > Perhaps a better name is in order?
>
> The name still seems good to me, but a comment above it on its
> assumptions/limitations well worth adding.
>
> I didn't find a percpu_counter_compare() in ext4, and haven't got

Go search for EXT4_FREECLUSTERS_WATERMARK....

> far yet with understanding the XFS ones: tomorrow...

XFS detects being near ENOSPC and changes the batch update size so
that when near ENOSPC the percpu counter always aggregates to the
global sum on every modification. i.e. it becomes more accurate (but
slower) near the ENOSPC threshold.

Then if the result of the subtraction ends up being less than zero,
it takes a lock (i.e. goes even slower!), undoes the subtraction
that took it below zero, and determines if it can dip into the
reserve pool or ENOSPC should be reported.

Some of that could be optimised, but we need that external "lock
and undo" mechanism to manage the reserve pool space atomically
at ENOSPC...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
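(Appended for illustration: a rough sketch of the free-space,
decrement-towards-zero variant described above, with a tighter batch
near ENOSPC and the locked undo path. All function and variable names
here are invented for the example and are not the actual XFS code;
only the percpu_counter_*() helpers, the spinlock primitives and
percpu_counter_batch are real kernel APIs, and the near-ENOSPC
threshold is a placeholder guess.)

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/percpu_counter.h>
#include <linux/spinlock.h>

/*
 * Placeholder test: a real filesystem would compare the counter against
 * a threshold sized to the maximum per-cpu error of the counter.
 */
static bool near_enospc(struct percpu_counter *free, s64 nblocks)
{
	s64 threshold = nblocks +
			2 * num_online_cpus() * percpu_counter_batch;

	return percpu_counter_read(free) < threshold;
}

static int reserve_blocks(struct percpu_counter *free, spinlock_t *lock,
			  s64 *reserve_pool, s64 nblocks)
{
	/* Near ENOSPC, use a batch of 1 so the global sum stays exact. */
	s32 batch = near_enospc(free, nblocks) ? 1 : percpu_counter_batch;

	percpu_counter_add_batch(free, -nblocks, batch);
	if (percpu_counter_compare(free, 0) >= 0)
		return 0;			/* fast path: still non-negative */

	/* Went below zero: lock, undo the subtraction, try the reserve pool. */
	spin_lock(lock);
	percpu_counter_add(free, nblocks);
	if (*reserve_pool >= nblocks) {
		*reserve_pool -= nblocks;
		spin_unlock(lock);
		return 0;
	}
	spin_unlock(lock);
	return -ENOSPC;
}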