Date: Fri, 29 Sep 2023 20:42:45 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Tim Chen, Dave Chinner, "Darrick J. Wong", Christian Brauner,
    Carlos Maiolino, Chuck Lever, Jan Kara, Matthew Wilcox,
    Johannes Weiner, Axel Rasmussen, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount)

Percpu counter's compare and add are separate functions: without locking
around them (which would defeat their purpose), it has been possible to
overflow the intended limit.  Imagine all the other CPUs fallocating tmpfs
huge pages to the limit, in between this CPU's compare and its add.

I have not seen reports of that happening; but tmpfs's recent addition of
dquot_alloc_block_nodirty() in between the compare and the add makes it
even more likely, and I'd be uncomfortable to leave it unfixed.

Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.

I believe this implementation is correct, and slightly more efficient than
the combination of compare and add (taking the lock once rather than twice
when nearing full - the last 128MiB of a tmpfs volume on a machine with
128 CPUs and 4KiB pages); but it does beg for a better design - when
nearing full, there is no new batching, but the costly percpu counter sum
across CPUs still has to be done, while locked.

Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well
as cpu_online_mask: but shouldn't __percpu_counter_compare() and
__percpu_counter_limited_add() then be adding a num_dying_cpus() to
num_online_cpus(), when they calculate the maximum which could be held
across CPUs?  But the times when it matters would be vanishingly rare.

Signed-off-by: Hugh Dickins
Cc: Tim Chen
Cc: Dave Chinner
Cc: Darrick J. Wong
---
Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7,
which are just internal to shmem, and do not affect this patch (which
applies to v6.6-rc and linux-next as is): but want to run this by you.
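As a reviewer aid only, a minimal sketch of the calling-pattern difference
(the counter and helper names below are simplified placeholders, not taken
from the patch; the real caller is shmem_inode_acct_blocks() in the
mm/shmem.c hunk at the end of this mail):

#include <linux/percpu_counter.h>

static struct percpu_counter used_blocks;	/* stand-in for sbinfo->used_blocks */

/*
 * Old pattern: compare and add are separate calls, so other CPUs can
 * add in the window between them and push the total past max_blocks.
 */
static bool acct_blocks_racy(s64 max_blocks, long pages)
{
	if (percpu_counter_compare(&used_blocks, max_blocks - pages) > 0)
		return false;
	/* window: concurrent adds on other CPUs may land here */
	percpu_counter_add(&used_blocks, pages);
	return true;
}

/* New pattern: one call both checks the limit and commits the amount. */
static bool acct_blocks_limited(s64 max_blocks, long pages)
{
	return percpu_counter_limited_add(&used_blocks, max_blocks, pages);
}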
 include/linux/percpu_counter.h | 23 +++++++++++++++
 lib/percpu_counter.c           | 53 ++++++++++++++++++++++++++++++++++
 mm/shmem.c                     | 10 +++----
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index d01351b1526f..8cb7c071bd5c 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -57,6 +57,8 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+bool __percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit,
+				  s64 amount, s32 batch);
 void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
@@ -69,6 +71,13 @@ static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	return __percpu_counter_limited_add(fbc, limit, amount,
+					    percpu_counter_batch);
+}
+
 /*
  * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
  * are accumulated in local per cpu counter and not in fbc->count until
@@ -185,6 +194,20 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	local_irq_restore(flags);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	unsigned long flags;
+	s64 count;
+
+	local_irq_save(flags);
+	count = fbc->count + amount;
+	if (count <= limit)
+		fbc->count = count;
+	local_irq_restore(flags);
+	return count <= limit;
+}
+
 /* non-SMP percpu_counter_add_local is the same with percpu_counter_add */
 static inline void
 percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 9073430dc865..58a3392f471b 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -278,6 +278,59 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
 }
 EXPORT_SYMBOL(__percpu_counter_compare);
 
+/*
+ * Compare counter, and add amount if the total is within limit.
+ * Return true if amount was added, false if it would exceed limit.
+ */
+bool __percpu_counter_limited_add(struct percpu_counter *fbc,
+				  s64 limit, s64 amount, s32 batch)
+{
+	s64 count;
+	s64 unknown;
+	unsigned long flags;
+	bool good;
+
+	if (amount > limit)
+		return false;
+
+	local_irq_save(flags);
+	unknown = batch * num_online_cpus();
+	count = __this_cpu_read(*fbc->counters);
+
+	/* Skip taking the lock when safe */
+	if (abs(count + amount) <= batch &&
+	    fbc->count + unknown <= limit) {
+		this_cpu_add(*fbc->counters, amount);
+		local_irq_restore(flags);
+		return true;
+	}
+
+	raw_spin_lock(&fbc->lock);
+	count = fbc->count + amount;
+
+	/* Skip percpu_counter_sum() when safe */
+	if (count + unknown > limit) {
+		s32 *pcount;
+		int cpu;
+
+		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
+			pcount = per_cpu_ptr(fbc->counters, cpu);
+			count += *pcount;
+		}
+	}
+
+	good = count <= limit;
+	if (good) {
+		count = __this_cpu_read(*fbc->counters);
+		fbc->count += count + amount;
+		__this_cpu_sub(*fbc->counters, count);
+	}
+
+	raw_spin_unlock(&fbc->lock);
+	local_irq_restore(flags);
+	return good;
+}
+
 static int __init percpu_counter_startup(void)
 {
 	int ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 4f4ab26bc58a..7cb72c747954 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -217,15 +217,15 @@ static int shmem_inode_acct_blocks(struct inode *inode, long pages)
 
 	might_sleep();	/* when quotas */
 	if (sbinfo->max_blocks) {
-		if (percpu_counter_compare(&sbinfo->used_blocks,
-					   sbinfo->max_blocks - pages) > 0)
+		if (!percpu_counter_limited_add(&sbinfo->used_blocks,
+						sbinfo->max_blocks, pages))
 			goto unacct;
 
 		err = dquot_alloc_block_nodirty(inode, pages);
-		if (err)
+		if (err) {
+			percpu_counter_sub(&sbinfo->used_blocks, pages);
 			goto unacct;
-
-		percpu_counter_add(&sbinfo->used_blocks, pages);
+		}
 	} else {
 		err = dquot_alloc_block_nodirty(inode, pages);
 		if (err)
-- 
2.35.3
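A back-of-envelope check of the "last 128MiB" figure in the changelog,
assuming the default batch computed by compute_batch_value(), i.e.
max(32, 2 * num_online_cpus()) - an assumption about the configuration,
not something this patch changes:

	128 CPUs:  batch   = max(32, 2 * 128)          = 256
	           unknown = batch * num_online_cpus()
	                   = 256 * 128                 = 32768 pages
	           32768 pages * 4KiB/page             = 128MiB

So once fbc->count comes within roughly 128MiB of the limit, the lockless
fast path in __percpu_counter_limited_add() can no longer prove the add is
safe, and callers fall through to the locked path (and, close enough to
the limit, to the full sum across CPUs).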