Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp5888072rwr; Mon, 1 May 2023 12:25:15 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5NC21AaBpFPJ3SjR2Fsn901D0DrITUHyK/r8FbLY2PGBlAc8XO1j5czNNvAvnFfhs9C5/3 X-Received: by 2002:a17:902:dac6:b0:1ab:5b0:6f16 with SMTP id q6-20020a170902dac600b001ab05b06f16mr1147541plx.43.1682969114835; Mon, 01 May 2023 12:25:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682969114; cv=none; d=google.com; s=arc-20160816; b=r2gQYrqBwMPp0s1pT9dnVPtJjVFp+LdImGyvfriWV8Bgk1SdwOwiscc/Ec52DKCrHY n/HNXorsLWhpc+4TSUTfGtv9Y5GfzT2yBgD9LderTd6VaesZSMROsfo5lYHN0xGx23Pn yu0QvS7bW4o/Vh2pYlvEbo74uscic2VvdxmDWDUb8emzDsa/DRAsa1qStOhVfwExZBaE AsMXCsU5yTZYVl+5WnXEUNFhS5Scyl9R+RHVZ1+TNEDz5rNSQJ2wyDkjGy3/DbHMdkQ7 PAGKECyeJGl8G82ohFphGSymB9+9kwTp3v4uE0yx9Q11NIy0y6ylX7ggw3eFXUSNlbgf 6NEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=5UxSQNJpFaq+7dWUx985o9MhLzvQlphTHJz1Ql9KvxU=; b=LWxeljx59HorejEbyaFmGBKTrEhhZ4ghCTjXRr2aPrC8uamXl4XAhjaWeMotK6OWA7 wwWqJBev47NGGS3Bb6oTmjpWSD0h7BlJtk15AvDK9bRrxDTBtd4A+X9K1W3JwfFV7Bxu liQ9ygavMOJJL/ff6Ou5m1iqIinlAy6YG/+B+dDF6iRGWqFxEPP5D4o0f9vbg79vgaTm UJtewnYPn+LK4LJDGWAmlBbNQIpW4fh1+AV4G4Ty2KslNyU+kWs4/QU+THwy4dE25qGZ CnTI9k2uYLErJEsVz9QnNTUqTcWs4vWyirFaGq4EFqudBV5AxzwPMXzbq3goPS5wh/eS v0hg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=m24iecrt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g6-20020a170902740600b001ab007aff67si1428420pll.32.2023.05.01.12.25.03; Mon, 01 May 2023 12:25:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=m24iecrt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232740AbjEATSc (ORCPT + 99 others); Mon, 1 May 2023 15:18:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231229AbjEATSa (ORCPT ); Mon, 1 May 2023 15:18:30 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7611510FF; Mon, 1 May 2023 12:18:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: Content-Type:In-Reply-To:From:References:Cc:To:Subject:MIME-Version:Date: Message-ID:Sender:Reply-To:Content-ID:Content-Description; bh=5UxSQNJpFaq+7dWUx985o9MhLzvQlphTHJz1Ql9KvxU=; b=m24iecrtD682Y0avrqU5+JQdF+ nWZoQBcK2H5OgTG/EBb90D7VsXtt6xEHViWN4TIiZkBrt8C9XLDM/gMEZOQyQwaVeKhc8yKRzSnI1 EzeIrWxb7TA+gz3k3Q34aDVhSZw8lRJrmIe+wJm7NL0+XOlWQ4vqwT7t6/7ueqwnmAdOpDQn6KWwT VkCLthfvJ4YhigT9Gq6pdWYrayOPmOxFENBfusjCeVkthG2cVdNBj1dKAMOInehBlzAmXZWW/R+3X q7ZxLkhhzsT169AYNc/Co9NhEJ8a6ViTju2j3HAEdDlsm0enI6uAvqqUbcKqpNak+lVph4GI/UUHe 1KK6klSw==; Received: from [2601:1c2:980:9ec0::2764] by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1ptZ1u-00GeuY-36; Mon, 01 May 2023 19:17:47 +0000 Message-ID: <91a360d3-c876-0a57-5cb1-e3a5f419080d@infradead.org> Date: Mon, 1 May 2023 12:17:42 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: [PATCH 07/40] Lazy percpu counters Content-Language: en-US To: Suren Baghdasaryan , akpm@linux-foundation.org Cc: kent.overstreet@linux.dev, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, roman.gushchin@linux.dev, mgorman@suse.de, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, corbet@lwn.net, void@manifault.com, peterz@infradead.org, juri.lelli@redhat.com, ldufour@linux.ibm.com, catalin.marinas@arm.com, will@kernel.org, arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com, david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org, masahiroy@kernel.org, nathan@kernel.org, dennis@kernel.org, tj@kernel.org, muchun.song@linux.dev, rppt@kernel.org, paulmck@kernel.org, pasha.tatashin@soleen.com, yosryahmed@google.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, andreyknvl@gmail.com, keescook@chromium.org, ndesaulniers@google.com, gregkh@linuxfoundation.org, ebiggers@google.com, ytcoode@gmail.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com, vschneid@redhat.com, cl@linux.com, penberg@kernel.org, iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com, elver@google.com, dvyukov@google.com, shakeelb@google.com, songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com, minchan@google.com, kaleshsingh@google.com, kernel-team@android.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-modules@vger.kernel.org, kasan-dev@googlegroups.com, cgroups@vger.kernel.org References: <20230501165450.15352-1-surenb@google.com> <20230501165450.15352-8-surenb@google.com> From: Randy Dunlap In-Reply-To: <20230501165450.15352-8-surenb@google.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi-- On 5/1/23 09:54, Suren Baghdasaryan wrote: > From: Kent Overstreet > > This patch adds lib/lazy-percpu-counter.c, which implements counters > that start out as atomics, but lazily switch to percpu mode if the > update rate crosses some threshold (arbitrarily set at 256 per second). > from submitting-patches.rst: Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour. > Signed-off-by: Kent Overstreet > Signed-off-by: Suren Baghdasaryan > --- > include/linux/lazy-percpu-counter.h | 102 ++++++++++++++++++++++ > lib/Kconfig | 3 + > lib/Makefile | 2 + > lib/lazy-percpu-counter.c | 127 ++++++++++++++++++++++++++++ > 4 files changed, 234 insertions(+) > create mode 100644 include/linux/lazy-percpu-counter.h > create mode 100644 lib/lazy-percpu-counter.c > > diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h > new file mode 100644 > index 000000000000..45ca9e2ce58b > --- /dev/null > +++ b/include/linux/lazy-percpu-counter.h > @@ -0,0 +1,102 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Lazy percpu counters: > + * (C) 2022 Kent Overstreet > + * > + * Lazy percpu counters start out in atomic mode, then switch to percpu mode if > + * the update rate crosses some threshold. > + * > + * This means we don't have to decide between low memory overhead atomic > + * counters and higher performance percpu counters - we can have our cake and > + * eat it, too! > + * > + * Internally we use an atomic64_t, where the low bit indicates whether we're in > + * percpu mode, and the high 8 bits are a secondary counter that's incremented > + * when the counter is modified - meaning 55 bits of precision are available for > + * the counter itself. > + */ > + > +#ifndef _LINUX_LAZY_PERCPU_COUNTER_H > +#define _LINUX_LAZY_PERCPU_COUNTER_H > + > +#include > +#include > + > +struct lazy_percpu_counter { > + atomic64_t v; > + unsigned long last_wrap; > +}; > + > +void lazy_percpu_counter_exit(struct lazy_percpu_counter *c); > +void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i); > +void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i); > +s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c); > + > +/* > + * We use the high bits of the atomic counter for a secondary counter, which is > + * incremented every time the counter is touched. When the secondary counter > + * wraps, we check the time the counter last wrapped, and if it was recent > + * enough that means the update frequency has crossed our threshold and we > + * switch to percpu mode: > + */ > +#define COUNTER_MOD_BITS 8 > +#define COUNTER_MOD_MASK ~(~0ULL >> COUNTER_MOD_BITS) > +#define COUNTER_MOD_BITS_START (64 - COUNTER_MOD_BITS) > + > +/* > + * We use the low bit of the counter to indicate whether we're in atomic mode > + * (low bit clear), or percpu mode (low bit set, counter is a pointer to actual > + * percpu counters: > + */ > +#define COUNTER_IS_PCPU_BIT 1 > + > +static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v) > +{ > + if (!(v & COUNTER_IS_PCPU_BIT)) > + return NULL; > + > + v ^= COUNTER_IS_PCPU_BIT; > + return (u64 __percpu *)(unsigned long)v; > +} > + > +/** > + * lazy_percpu_counter_add: Add a value to a lazy_percpu_counter For kernel-doc, the function name should be followed by '-', not ':'. (many places) > + * > + * @c: counter to modify > + * @i: value to add > + */ > +static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i) > +{ > + u64 v = atomic64_read(&c->v); > + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v); > + > + if (likely(pcpu_v)) > + this_cpu_add(*pcpu_v, i); > + else > + lazy_percpu_counter_add_slowpath(c, i); > +} > + > +/** > + * lazy_percpu_counter_add_noupgrade: Add a value to a lazy_percpu_counter, > + * without upgrading to percpu mode > + * > + * @c: counter to modify > + * @i: value to add > + */ > +static inline void lazy_percpu_counter_add_noupgrade(struct lazy_percpu_counter *c, s64 i) > +{ > + u64 v = atomic64_read(&c->v); > + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v); > + > + if (likely(pcpu_v)) > + this_cpu_add(*pcpu_v, i); > + else > + lazy_percpu_counter_add_slowpath_noupgrade(c, i); > +} > + > +static inline void lazy_percpu_counter_sub(struct lazy_percpu_counter *c, s64 i) > +{ > + lazy_percpu_counter_add(c, -i); > +} > + > +#endif /* _LINUX_LAZY_PERCPU_COUNTER_H */ > diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c > new file mode 100644 > index 000000000000..4f4e32c2dc09 > --- /dev/null > +++ b/lib/lazy-percpu-counter.c > @@ -0,0 +1,127 @@ > +// SPDX-License-Identifier: GPL-2.0-only > + > +#include > +#include > +#include > +#include > +#include > + > +static inline s64 lazy_percpu_counter_atomic_val(s64 v) > +{ > + /* Ensure output is sign extended properly: */ > + return (v << COUNTER_MOD_BITS) >> > + (COUNTER_MOD_BITS + COUNTER_IS_PCPU_BIT); > +} > + ... > + > +/** > + * lazy_percpu_counter_exit: Free resources associated with a > + * lazy_percpu_counter Same kernel-doc comment. > + * > + * @c: counter to exit > + */ > +void lazy_percpu_counter_exit(struct lazy_percpu_counter *c) > +{ > + free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v))); > +} > +EXPORT_SYMBOL_GPL(lazy_percpu_counter_exit); > + > +/** > + * lazy_percpu_counter_read: Read current value of a lazy_percpu_counter > + * > + * @c: counter to read > + */ > +s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c) > +{ > + s64 v = atomic64_read(&c->v); > + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v); > + > + if (pcpu_v) { > + int cpu; > + > + v = 0; > + for_each_possible_cpu(cpu) > + v += *per_cpu_ptr(pcpu_v, cpu); > + } else { > + v = lazy_percpu_counter_atomic_val(v); > + } > + > + return v; > +} > +EXPORT_SYMBOL_GPL(lazy_percpu_counter_read); > + > +void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i) > +{ > + u64 atomic_i; > + u64 old, v = atomic64_read(&c->v); > + u64 __percpu *pcpu_v; > + > + atomic_i = i << COUNTER_IS_PCPU_BIT; > + atomic_i &= ~COUNTER_MOD_MASK; > + atomic_i |= 1ULL << COUNTER_MOD_BITS_START; > + > + do { > + pcpu_v = lazy_percpu_counter_is_pcpu(v); > + if (pcpu_v) { > + this_cpu_add(*pcpu_v, i); > + return; > + } > + > + old = v; > + } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old); > + > + if (unlikely(!(v & COUNTER_MOD_MASK))) { > + unsigned long now = jiffies; > + > + if (c->last_wrap && > + unlikely(time_after(c->last_wrap + HZ, now))) > + lazy_percpu_counter_switch_to_pcpu(c); > + else > + c->last_wrap = now; > + } > +} > +EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath); > + > +void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i) > +{ > + u64 atomic_i; > + u64 old, v = atomic64_read(&c->v); > + u64 __percpu *pcpu_v; > + > + atomic_i = i << COUNTER_IS_PCPU_BIT; > + atomic_i &= ~COUNTER_MOD_MASK; > + > + do { > + pcpu_v = lazy_percpu_counter_is_pcpu(v); > + if (pcpu_v) { > + this_cpu_add(*pcpu_v, i); > + return; > + } > + > + old = v; > + } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old); > +} > +EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath_noupgrade); These last 2 exported functions could use some comments, preferably in kernel-doc format. Thanks. -- ~Randy