2021-11-28 03:59:19

by Yury Norov

Subject: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

In many cases people use bitmap_weight()-based functions like this:

	if (num_present_cpus() > 1)
		do_something();

This may take a considerable amount of time on machines with many CPUs,
because num_present_cpus() traverses every word of the underlying cpumask
unconditionally.

We can significantly improve on this in many real cases if we stop
traversing the mask as soon as the count of present CPUs exceeds 1:

	if (num_present_cpus_gt(1))
		do_something();

To implement this idea, the series adds bitmap_weight_{eq,gt,le}
functions together with corresponding wrappers in cpumask and nodemask.
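
As an illustration only (this is not code from the series), the early-stop
idea can be sketched in userspace C. The names popcount_ul(), weight_full()
and weight_gt_early() are hypothetical, the bitmap is assumed to consist of
whole words, and a portable popcount loop stands in for the kernel's
hweight_long():

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Portable popcount; a stand-in for the kernel's hweight_long(). */
static unsigned int popcount_ul(unsigned long w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* Full traversal: always reads every word of the map. */
static unsigned int weight_full(const unsigned long *map, size_t nwords)
{
	unsigned int w = 0;
	size_t k;

	for (k = 0; k < nwords; k++)
		w += popcount_ul(map[k]);
	return w;
}

/*
 * Early-exit variant: returns true as soon as the running count
 * exceeds num. *visited reports how many words were actually read.
 */
static bool weight_gt_early(const unsigned long *map, size_t nwords,
			    unsigned int num, size_t *visited)
{
	unsigned int w = 0;
	size_t k;

	assert(map && visited);

	for (k = 0; k < nwords; k++) {
		w += popcount_ul(map[k]);
		if (w > num) {
			*visited = k + 1;
			return true;
		}
	}

	*visited = nwords;
	return false;
}
```

With a densely populated mask, weight_gt_early(map, nwords, 1, &visited)
can answer after reading a single word, while weight_full() reads all of
them before the caller can compare the result.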

Yury Norov (9):
lib/bitmap: add bitmap_weight_{eq,gt,le}
lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()
all: replace bitmap_weight() with bitmap_{empty,full,eq,gt,le}
tools: sync bitmap_weight() usage with the kernel
lib/cpumask: add cpumask_weight_{eq,gt,le}
lib/nodemask: add nodemask_weight_{eq,gt,le}
lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}
lib/nodemask: add num_node_state_eq()
MAINTAINERS: add cpumask and nodemask files to BITMAP_API

MAINTAINERS | 4 ++
arch/alpha/kernel/process.c | 2 +-
arch/arc/kernel/smp.c | 2 +-
arch/arm/kernel/machine_kexec.c | 2 +-
arch/arm/mach-exynos/exynos.c | 2 +-
arch/arm/mm/cache-b15-rac.c | 2 +-
arch/arm64/kernel/smp.c | 2 +-
arch/arm64/mm/context.c | 2 +-
arch/csky/mm/asid.c | 2 +-
arch/csky/mm/context.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/ia64/mm/tlb.c | 8 +--
arch/mips/cavium-octeon/octeon-irq.c | 4 +-
arch/mips/kernel/crash.c | 2 +-
arch/mips/kernel/i8253.c | 2 +-
arch/mips/kernel/perf_event_mipsxx.c | 4 +-
arch/mips/kernel/rtlx-cmp.c | 2 +-
arch/mips/kernel/smp.c | 4 +-
arch/mips/kernel/vpe-cmp.c | 2 +-
.../loongson2ef/common/cs5536/cs5536_mfgpt.c | 2 +-
arch/mips/mm/context.c | 2 +-
arch/mips/mm/tlbex.c | 2 +-
arch/nds32/kernel/perf_event_cpu.c | 4 +-
arch/nios2/kernel/cpuinfo.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 4 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/pseries/hotplug-cpu.c | 4 +-
arch/powerpc/sysdev/mpic.c | 2 +-
arch/powerpc/xmon/xmon.c | 10 +--
arch/riscv/kvm/vmid.c | 2 +-
arch/s390/kernel/perf_cpum_cf.c | 2 +-
arch/sparc/kernel/mdesc.c | 6 +-
arch/x86/events/amd/core.c | 2 +-
arch/x86/kernel/alternative.c | 8 +--
arch/x86/kernel/apic/apic.c | 4 +-
arch/x86/kernel/apic/apic_flat_64.c | 2 +-
arch/x86/kernel/apic/probe_32.c | 2 +-
arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 +++---
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kernel/i8253.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/kvmclock.c | 2 +-
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/kernel/tsc.c | 2 +-
arch/x86/kvm/hyperv.c | 8 +--
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
arch/x86/xen/smp_pv.c | 2 +-
arch/x86/xen/spinlock.c | 2 +-
drivers/acpi/numa/srat.c | 2 +-
drivers/clk/samsung/clk-exynos4.c | 2 +-
drivers/clocksource/ingenic-timer.c | 3 +-
drivers/cpufreq/pcc-cpufreq.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
drivers/crypto/ccp/ccp-dev-v5.c | 5 +-
drivers/dma/mv_xor.c | 5 +-
drivers/firmware/psci/psci_checker.c | 2 +-
drivers/gpu/drm/i810/i810_drv.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
drivers/hv/channel_mgmt.c | 4 +-
drivers/iio/adc/mxs-lradc-adc.c | 3 +-
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 +-
drivers/iio/industrialio-buffer.c | 2 +-
drivers/iio/industrialio-trigger.c | 2 +-
drivers/infiniband/hw/hfi1/affinity.c | 13 ++--
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
drivers/infiniband/sw/siw/siw_main.c | 3 +-
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
drivers/irqchip/irq-gic.c | 2 +-
drivers/memstick/core/ms_block.c | 4 +-
drivers/net/caif/caif_virtio.c | 2 +-
drivers/net/dsa/b53/b53_common.c | 2 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
.../cavium/liquidio/cn23xx_vf_device.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 +-
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +-
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +-
.../marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
.../marvell/octeontx2/nic/otx2_flows.c | 8 +--
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +--
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_dev.c | 3 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 +-
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
drivers/net/wireless/ath/ath9k/hw.c | 2 +-
drivers/net/wireless/marvell/mwifiex/main.c | 4 +-
drivers/net/wireless/st/cw1200/queue.c | 3 +-
drivers/nvdimm/region.c | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 6 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/thunderx2_pmu.c | 3 +-
drivers/perf/xgene_pmu.c | 2 +-
.../intel/speed_select_if/isst_if_common.c | 6 +-
drivers/pwm/pwm-pca9685.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/soc/bcm/brcmstb/biuctrl.c | 2 +-
drivers/soc/fsl/dpio/dpio-service.c | 4 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
drivers/spi/spi-dw-bt1.c | 2 +-
drivers/staging/media/tegra-video/vi.c | 2 +-
drivers/thermal/intel/intel_powerclamp.c | 10 ++-
drivers/virt/acrn/hsm.c | 2 +-
fs/ocfs2/cluster/heartbeat.c | 14 ++---
fs/xfs/xfs_sysfs.c | 2 +-
include/linux/bitmap.h | 45 ++++++++++---
include/linux/cpumask.h | 55 ++++++++++++++++
include/linux/kdb.h | 2 +-
include/linux/nodemask.h | 29 +++++++++
kernel/debug/kdb/kdb_bt.c | 2 +-
kernel/irq/affinity.c | 2 +-
kernel/padata.c | 2 +-
kernel/printk/printk.c | 2 +-
kernel/rcu/tree_nocb.h | 4 +-
kernel/rcu/tree_plugin.h | 2 +-
kernel/reboot.c | 4 +-
kernel/sched/core.c | 10 +--
kernel/sched/topology.c | 4 +-
kernel/time/clockevents.c | 4 +-
kernel/time/clocksource.c | 2 +-
lib/bitmap.c | 63 +++++++++++++++++++
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 2 +-
mm/percpu.c | 6 +-
mm/slab.c | 2 +-
mm/vmstat.c | 4 +-
tools/include/linux/bitmap.h | 42 ++++++++++---
tools/lib/bitmap.c | 60 ++++++++++++++++++
tools/perf/builtin-c2c.c | 4 +-
tools/perf/util/pmu.c | 2 +-
142 files changed, 490 insertions(+), 251 deletions(-)

--
2.25.1



2021-11-28 03:59:30

by Yury Norov

Subject: [PATCH 1/9] lib/bitmap: add bitmap_weight_{eq,gt,le}

Many kernel users call bitmap_weight() to compare the result against
some number or expression:
	if (bitmap_weight(...) > 1)
		do_something();

It works OK, but may be significantly improved for large bitmaps: if the
set bits counted in the first few words already exceed the given number,
we can stop counting and return immediately.

The same idea works in the other direction: if the number of set bits
counted so far is so small that the total would stay below the required
number even if all remaining bits of the bitmap were set, we can return
early.

This patch adds new bitmap_weight_{eq,gt,le} functions to allow this
optimization, and the following patches apply them where appropriate.
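
A minimal userspace sketch of both early returns, for illustration only.
It follows the shape of __bitmap_weight_le() in the diff below, but
weight_lt_sketch(), popcount_ul() and WORD_BITS are hypothetical names, and
the tail mask is written out by hand instead of using
BITMAP_LAST_WORD_MASK():

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Portable popcount; a stand-in for the kernel's hweight_long(). */
static unsigned int popcount_ul(unsigned long w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* True if fewer than num of the first nbits bits of map are set. */
static bool weight_lt_sketch(const unsigned long *map, unsigned int nbits,
			     unsigned int num)
{
	unsigned int full = nbits / WORD_BITS, tail = nbits % WORD_BITS;
	unsigned int k, w = 0;

	assert(map);

	for (k = 0; k < full; k++) {
		/* Upper bound: bits counted so far plus every unread bit. */
		if (w + (nbits - k * WORD_BITS) < num)
			return true;

		w += popcount_ul(map[k]);

		/* Lower bound: the count can only grow from here. */
		if (w >= num)
			return false;
	}

	/* Count only the valid low bits of a partial last word. */
	if (tail)
		w += popcount_ul(map[k] & (~0UL >> (WORD_BITS - tail)));

	return w < num;
}
```

The first check fires when num is larger than anything the bitmap could
still reach; the second fires when enough bits have already been seen.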

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitmap.h | 33 ++++++++++++++++++++++
lib/bitmap.c | 63 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 7dba0847510c..996041f771c8 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -51,6 +51,9 @@ struct device;
* bitmap_empty(src, nbits) Are all bits zero in *src?
* bitmap_full(src, nbits) Are all bits set in *src?
* bitmap_weight(src, nbits) Hamming Weight: number set bits
+ * bitmap_weight_eq(src, nbits, num) Hamming Weight is equal to num
+ * bitmap_weight_gt(src, nbits, num) Hamming Weight is greater than num
+ * bitmap_weight_le(src, nbits, num) Hamming Weight is less than num
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
@@ -162,6 +165,9 @@ int __bitmap_intersects(const unsigned long *bitmap1,
int __bitmap_subset(const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int nbits);
int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);
+bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
+bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
+bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
void __bitmap_set(unsigned long *map, unsigned int start, int len);
void __bitmap_clear(unsigned long *map, unsigned int start, int len);

@@ -403,6 +409,33 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int
return __bitmap_weight(src, nbits);
}

+static __always_inline bool bitmap_weight_eq(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
+{
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) == num;
+
+ return __bitmap_weight_eq(src, nbits, num);
+}
+
+static __always_inline bool bitmap_weight_gt(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
+{
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) > num;
+
+ return __bitmap_weight_gt(src, nbits, num);
+}
+
+static __always_inline bool bitmap_weight_le(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
+{
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) < num;
+
+ return __bitmap_weight_le(src, nbits, num);
+}
+
static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
unsigned int nbits)
{
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 926408883456..72e7ab2d7bdd 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -348,6 +348,69 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
}
EXPORT_SYMBOL(__bitmap_weight);

+bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ return false;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ return false;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w == num;
+}
+EXPORT_SYMBOL(__bitmap_weight_eq);
+
+bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG <= num)
+ return false;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ return true;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w > num;
+}
+EXPORT_SYMBOL(__bitmap_weight_gt);
+
+bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ return true;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w >= num)
+ return false;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w < num;
+}
+EXPORT_SYMBOL(__bitmap_weight_le);
+
void __bitmap_set(unsigned long *map, unsigned int start, int len)
{
unsigned long *p = map + BIT_WORD(start);
--
2.25.1


2021-11-28 03:59:42

by Yury Norov

Subject: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

Now that we have bitmap_weight_eq(), switch bitmap_full() and
bitmap_empty() to use it.
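
For illustration, a userspace sketch of how "empty" and "full" reduce to an
equality test on the weight. The names weight_eq_sketch(), empty_sketch()
and full_sketch() are hypothetical; the loop follows the shape of
__bitmap_weight_eq() from patch 1. The sketch makes no claim about how this
compares in speed with the find_first_bit()-based versions being removed:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Portable popcount; a stand-in for the kernel's hweight_long(). */
static unsigned int popcount_ul(unsigned long w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* True if exactly num of the first nbits bits of map are set. */
static bool weight_eq_sketch(const unsigned long *map, unsigned int nbits,
			     unsigned int num)
{
	unsigned int full = nbits / WORD_BITS, tail = nbits % WORD_BITS;
	unsigned int k, w = 0;

	assert(map);

	for (k = 0; k < full; k++) {
		/* Even if all unread bits are set, num is out of reach. */
		if (w + (nbits - k * WORD_BITS) < num)
			return false;

		w += popcount_ul(map[k]);

		/* Already more than num bits set. */
		if (w > num)
			return false;
	}

	if (tail)
		w += popcount_ul(map[k] & (~0UL >> (WORD_BITS - tail)));

	return w == num;
}

/* Empty: exactly zero bits set. */
static bool empty_sketch(const unsigned long *map, unsigned int nbits)
{
	return weight_eq_sketch(map, nbits, 0);
}

/* Full: exactly nbits bits set. */
static bool full_sketch(const unsigned long *map, unsigned int nbits)
{
	return weight_eq_sketch(map, nbits, nbits);
}
```

In this sketch empty_sketch() stops at the first non-zero word, and
full_sketch() stops at the first word that contains a clear bit, because a
single missing bit puts nbits out of reach.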

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitmap.h | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 996041f771c8..2d951e4dc814 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1,
return __bitmap_subset(src1, src2, nbits);
}

-static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
-{
- if (small_const_nbits(nbits))
- return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
-
- return find_first_bit(src, nbits) == nbits;
-}
-
-static inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
-{
- if (small_const_nbits(nbits))
- return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits));
-
- return find_first_zero_bit(src, nbits) == nbits;
-}
-
static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
@@ -436,6 +420,16 @@ static __always_inline bool bitmap_weight_le(const unsigned long *src,
return __bitmap_weight_le(src, nbits, num);
}

+static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits)
+{
+ return bitmap_weight_eq(src, nbits, 0);
+}
+
+static __always_inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
+{
+ return bitmap_weight_eq(src, nbits, nbits);
+}
+
static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
unsigned int nbits)
{
--
2.25.1


2021-11-28 03:59:53

by Yury Norov

Subject: [PATCH 3/9] all: replace bitmap_weight() with bitmap_{empty,full,eq,gt,le}

bitmap_weight() counts all set bits in the bitmap unconditionally.
However, in some cases we only need to traverse part of the bitmap,
namely when we only need to check whether the number of set bits is
greater than, less than, or equal to some number.

This patch replaces bitmap_weight() with one of
bitmap_{empty,full,eq,gt,le}, as appropriate.

In some places driver code has been optimized further, where it's
trivial.
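
For illustration only, a userspace sketch of two rewrite patterns used in
this patch, checked against a reference count. All names here are
hypothetical, and the bitmaps are walked bit by bit for brevity. Note the
assumption in over_limit_new(): because the threshold is unsigned, the
rewritten form matches the old one only while the subtraction does not
wrap:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Read bit i of map as 0 or 1. */
static unsigned int bit_at(const unsigned long *map, unsigned int i)
{
	return (unsigned int)((map[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL);
}

/* Reference: count every set bit among the first nbits bits. */
static unsigned int weight_ref(const unsigned long *map, unsigned int nbits)
{
	unsigned int i, w = 0;

	for (i = 0; i < nbits; i++)
		w += bit_at(map, i);
	return w;
}

/* Early-exit check: are more than num of the first nbits bits set? */
static bool weight_gt_sketch(const unsigned long *map, unsigned int nbits,
			     unsigned int num)
{
	unsigned int i, w = 0;

	for (i = 0; i < nbits; i++) {
		w += bit_at(map, i);
		if (w > num)
			return true;
	}
	return false;
}

/* Old form: weight + reserved > limit. */
static bool over_limit_old(const unsigned long *map, unsigned int nbits,
			   unsigned int reserved, unsigned int limit)
{
	return weight_ref(map, nbits) + reserved > limit;
}

/* New form: weight > limit - reserved; assumes reserved <= limit. */
static bool over_limit_new(const unsigned long *map, unsigned int nbits,
			   unsigned int reserved, unsigned int limit)
{
	assert(reserved <= limit);
	return weight_gt_sketch(map, nbits, limit - reserved);
}
```

The loop-condition rewrite follows the same rule: "i < weight" holds
exactly when weight_gt_sketch(map, nbits, i) is true.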

Signed-off-by: Yury Norov <[email protected]>
---
arch/nds32/kernel/perf_event_cpu.c | 4 +---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
arch/x86/kvm/hyperv.c | 8 ++++----
drivers/crypto/ccp/ccp-dev-v5.c | 5 +----
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
drivers/iio/adc/mxs-lradc-adc.c | 3 +--
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 ++--
drivers/iio/industrialio-buffer.c | 2 +-
drivers/iio/industrialio-trigger.c | 2 +-
drivers/memstick/core/ms_block.c | 4 ++--
drivers/net/dsa/b53/b53_common.c | 2 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
.../ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
.../ethernet/marvell/octeontx2/nic/otx2_flows.c | 8 ++++----
.../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++-------
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_dev.c | 3 +--
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 ++--
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 4 ++--
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/thunderx2_pmu.c | 3 +--
drivers/perf/xgene_pmu.c | 2 +-
drivers/pwm/pwm-pca9685.c | 2 +-
drivers/staging/media/tegra-video/vi.c | 2 +-
drivers/thermal/intel/intel_powerclamp.c | 10 ++++------
fs/ocfs2/cluster/heartbeat.c | 14 +++++++-------
33 files changed, 57 insertions(+), 75 deletions(-)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a78a879e7ef1..05a1cd258356 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -695,10 +695,8 @@ static void nds32_pmu_enable(struct pmu *pmu)
{
struct nds32_pmu *nds32_pmu = to_nds32_pmu(pmu);
struct pmu_hw_events *hw_events = nds32_pmu->get_hw_events();
- int enabled = bitmap_weight(hw_events->used_mask,
- nds32_pmu->num_events);

- if (enabled)
+ if (!bitmap_empty(hw_events->used_mask, nds32_pmu->num_events))
nds32_pmu->start(nds32_pmu);
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b57b3db9a6a7..94e7e6b420e4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2749,10 +2749,10 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
cfg->new_ctrl = cbm_ensure_valid(cfg->new_ctrl, r);
/*
* Assign the u32 CBM to an unsigned long to ensure that
- * bitmap_weight() does not access out-of-bound memory.
+ * bitmap_weight_le() does not access out-of-bound memory.
*/
tmp_cbm = cfg->new_ctrl;
- if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < r->cache.min_cbm_bits) {
+ if (bitmap_weight_le(&tmp_cbm, r->cache.cbm_len, r->cache.min_cbm_bits)) {
rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id);
return -ENOSPC;
}
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 5e19e6e4c2ce..8b72c896e0f1 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -90,7 +90,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
{
struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
- int auto_eoi_old, auto_eoi_new;
+ bool auto_eoi_old, auto_eoi_new;

if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;
@@ -100,16 +100,16 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
else
__clear_bit(vector, synic->vec_bitmap);

- auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_old = !bitmap_empty(synic->auto_eoi_bitmap, 256);

if (synic_has_vector_auto_eoi(synic, vector))
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);

- auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_new = !bitmap_empty(synic->auto_eoi_bitmap, 256);

- if (!!auto_eoi_old == !!auto_eoi_new)
+ if (auto_eoi_old == auto_eoi_new)
return;

down_write(&vcpu->kvm->arch.apicv_update_lock);
diff --git a/drivers/crypto/ccp/ccp-dev-v5.c b/drivers/crypto/ccp/ccp-dev-v5.c
index 7b73332d6aa1..f569e8b99851 100644
--- a/drivers/crypto/ccp/ccp-dev-v5.c
+++ b/drivers/crypto/ccp/ccp-dev-v5.c
@@ -611,7 +611,6 @@ static int ccp_find_and_assign_lsb_to_q(struct ccp_device *ccp,
{
DECLARE_BITMAP(qlsb, MAX_LSB_CNT);
int bitno;
- int qlsb_wgt;
int i;

/* For each queue:
@@ -627,9 +626,7 @@ static int ccp_find_and_assign_lsb_to_q(struct ccp_device *ccp,
for (i = 0; i < ccp->cmd_q_count; i++) {
struct ccp_cmd_queue *cmd_q = &ccp->cmd_q[i];

- qlsb_wgt = bitmap_weight(cmd_q->lsbmask, MAX_LSB_CNT);
-
- if (qlsb_wgt == lsb_cnt) {
+ if (bitmap_weight_eq(cmd_q->lsbmask, MAX_LSB_CNT, lsb_cnt)) {
bitmap_copy(qlsb, cmd_q->lsbmask, MAX_LSB_CNT);

bitno = find_first_bit(qlsb, MAX_LSB_CNT);
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
index d7fa2c49e741..56a3063545ec 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
@@ -68,7 +68,7 @@ static int smp_request_block(struct mdp5_smp *smp,
uint8_t reserved;

/* we shouldn't be requesting blocks for an in-use client: */
- WARN_ON(bitmap_weight(cs, cnt) > 0);
+ WARN_ON(!bitmap_empty(cs, cnt));

reserved = smp->reserved[cid];

diff --git a/drivers/iio/adc/mxs-lradc-adc.c b/drivers/iio/adc/mxs-lradc-adc.c
index bca79a93cbe4..b3ec713b9c88 100644
--- a/drivers/iio/adc/mxs-lradc-adc.c
+++ b/drivers/iio/adc/mxs-lradc-adc.c
@@ -540,7 +540,6 @@ static bool mxs_lradc_adc_validate_scan_mask(struct iio_dev *iio,
{
struct mxs_lradc_adc *adc = iio_priv(iio);
struct mxs_lradc *lradc = adc->lradc;
- const int map_chans = bitmap_weight(mask, LRADC_MAX_TOTAL_CHANS);
int rsvd_chans = 0;
unsigned long rsvd_mask = 0;

@@ -561,7 +560,7 @@ static bool mxs_lradc_adc_validate_scan_mask(struct iio_dev *iio,
return false;

/* Test for attempts to map more channels then available slots. */
- if (map_chans + rsvd_chans > LRADC_MAX_MAPPED_CHANS)
+ if (bitmap_weight_gt(mask, LRADC_MAX_TOTAL_CHANS, LRADC_MAX_MAPPED_CHANS - rsvd_chans))
return false;

return true;
diff --git a/drivers/iio/dummy/iio_simple_dummy_buffer.c b/drivers/iio/dummy/iio_simple_dummy_buffer.c
index 59aa60d4ca37..cd2470ddf82b 100644
--- a/drivers/iio/dummy/iio_simple_dummy_buffer.c
+++ b/drivers/iio/dummy/iio_simple_dummy_buffer.c
@@ -72,8 +72,8 @@ static irqreturn_t iio_simple_dummy_trigger_h(int irq, void *p)
int i, j;

for (i = 0, j = 0;
- i < bitmap_weight(indio_dev->active_scan_mask,
- indio_dev->masklength);
+ bitmap_weight_gt(indio_dev->active_scan_mask,
+ indio_dev->masklength, i);
i++, j++) {
j = find_next_bit(indio_dev->active_scan_mask,
indio_dev->masklength, j);
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index e180728914c0..4487ece29218 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -1804,7 +1804,7 @@ void iio_buffers_free_sysfs_and_mask(struct iio_dev *indio_dev)
bool iio_validate_scan_mask_onehot(struct iio_dev *indio_dev,
const unsigned long *mask)
{
- return bitmap_weight(mask, indio_dev->masklength) == 1;
+ return bitmap_weight_eq(mask, indio_dev->masklength, 1);
}
EXPORT_SYMBOL_GPL(iio_validate_scan_mask_onehot);

diff --git a/drivers/iio/industrialio-trigger.c b/drivers/iio/industrialio-trigger.c
index 93990ff1dfe3..d8cedae0d9da 100644
--- a/drivers/iio/industrialio-trigger.c
+++ b/drivers/iio/industrialio-trigger.c
@@ -298,7 +298,7 @@ int iio_trigger_detach_poll_func(struct iio_trigger *trig,
{
struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(pf->indio_dev);
bool no_other_users =
- bitmap_weight(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER) == 1;
+ bitmap_weight_eq(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER, 1);
int ret = 0;

if (trig->ops && trig->ops->set_trigger_state && no_other_users) {
diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 0cda6c6baefc..5cdd987e78f7 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -155,8 +155,8 @@ static int msb_validate_used_block_bitmap(struct msb_data *msb)
for (i = 0; i < msb->zone_count; i++)
total_free_blocks += msb->free_block_count[i];

- if (msb->block_count - bitmap_weight(msb->used_blocks_bitmap,
- msb->block_count) == total_free_blocks)
+ if (bitmap_weight_eq(msb->used_blocks_bitmap, msb->block_count,
+ msb->block_count - total_free_blocks))
return 0;

pr_err("BUG: free block counts don't match the bitmap");
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index af4761968733..bc0cc2d226a9 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1620,7 +1620,7 @@ static int b53_arl_read(struct b53_device *dev, u64 mac,
return 0;
}

- if (bitmap_weight(free_bins, dev->num_arl_bins) == 0)
+ if (bitmap_empty(free_bins, dev->num_arl_bins))
return -ENOSPC;

*idx = find_first_bit(free_bins, dev->num_arl_bins);
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 40933bf5a710..241696fdc6c7 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2177,13 +2177,9 @@ static int bcm_sysport_rule_set(struct bcm_sysport_priv *priv,
if (nfc->fs.ring_cookie != RX_CLS_FLOW_WAKE)
return -EOPNOTSUPP;

- /* All filters are already in use, we cannot match more rules */
- if (bitmap_weight(priv->filters, RXCHK_BRCM_TAG_MAX) ==
- RXCHK_BRCM_TAG_MAX)
- return -ENOSPC;
-
index = find_first_zero_bit(priv->filters, RXCHK_BRCM_TAG_MAX);
if (index >= RXCHK_BRCM_TAG_MAX)
+ /* All filters are already in use, we cannot match more rules */
return -ENOSPC;

/* Location is the classification ID, and index is the position
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 217ff5e9a6f1..9d3ed351d77f 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -384,8 +384,8 @@ ice_set_pfe_link(struct ice_vf *vf, struct virtchnl_pf_event *pfe,
*/
static bool ice_vf_has_no_qs_ena(struct ice_vf *vf)
{
- return (!bitmap_weight(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
- !bitmap_weight(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF));
+ return bitmap_empty(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
+ bitmap_empty(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF);
}

/**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 214a38de3f41..35297d8a488b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -246,7 +246,7 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
#endif

/* Disable VMDq flag so device will be set in VM mode */
- if (bitmap_weight(adapter->fwd_bitmask, adapter->num_rx_pools) == 1) {
+ if (bitmap_weight_eq(adapter->fwd_bitmask, adapter->num_rx_pools, 1)) {
adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
rss = min_t(int, ixgbe_max_rss_indices(adapter),
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index d85db90632d6..a55fd1d0c653 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -287,7 +287,7 @@ static int otx2_set_channels(struct net_device *dev,
if (!channel->rx_count || !channel->tx_count)
return -EINVAL;

- if (bitmap_weight(&pfvf->rq_bmap, pfvf->hw.rx_queues) > 1) {
+ if (bitmap_weight_gt(&pfvf->rq_bmap, pfvf->hw.rx_queues, 1)) {
netdev_err(dev,
"Receive queues are in use by TC police action\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 77a13fb555fb..55c899a6fcdd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -353,7 +353,7 @@ int otx2_add_macfilter(struct net_device *netdev, const u8 *mac)
{
struct otx2_nic *pf = netdev_priv(netdev);

- if (bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (!bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(netdev,
"Add %pM to CGX/RPM DMAC filters list as well\n",
@@ -436,7 +436,7 @@ int otx2_get_maxflows(struct otx2_flow_config *flow_cfg)
return 0;

if (flow_cfg->nr_flows == flow_cfg->max_flows ||
- bitmap_weight(&flow_cfg->dmacflt_bmap,
+ !bitmap_empty(&flow_cfg->dmacflt_bmap,
flow_cfg->dmacflt_max_flows))
return flow_cfg->max_flows + flow_cfg->dmacflt_max_flows;
else
@@ -1170,8 +1170,8 @@ int otx2_remove_flow(struct otx2_nic *pfvf, u32 location)
* interface mac address and configure CGX/RPM block in
* promiscuous mode
*/
- if (bitmap_weight(&flow_cfg->dmacflt_bmap,
- flow_cfg->dmacflt_max_flows) == 1)
+ if (bitmap_weight_eq(&flow_cfg->dmacflt_bmap,
+ flow_cfg->dmacflt_max_flows, 1))
otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);
} else {
err = otx2_remove_flow_msg(pfvf, flow->entry, false);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 1333edf1c361..d51623ff605c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1115,7 +1115,7 @@ static int otx2_cgx_config_loopback(struct otx2_nic *pf, bool enable)
struct msg_req *msg;
int err;

- if (enable && bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (enable && !bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(pf->netdev,
"CGX/RPM internal loopback might not work as DMAC filters are active\n");
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e10b7b04b894..766128749bd0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2803,9 +2803,8 @@ int mlx4_slave_convert_port(struct mlx4_dev *dev, int slave, int port)
{
unsigned n;
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(dev, slave);
- unsigned m = bitmap_weight(actv_ports.ports, dev->caps.num_ports);

- if (port <= 0 || port > m)
+ if (port <= 0 || bitmap_weight_le(actv_ports.ports, dev->caps.num_ports, port))
return -EINVAL;

n = find_first_bit(actv_ports.ports, dev->caps.num_ports);
@@ -3415,10 +3414,6 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

if (slave == mlx4_master_func_num(dev))
return 0;
@@ -3428,7 +3423,8 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
enabled < 0 || enabled > 1)
return -EINVAL;

- if (min_port == max_port && dev->caps.num_ports > 1) {
+ if (bitmap_weight_eq(actv_ports.ports, priv->dev.caps.num_ports, 1) &&
+ dev->caps.num_ports > 1) {
mlx4_info(dev, "SMI access disallowed for single ported VFs\n");
return -EPROTONOSUPPORT;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 9e48509ed3b2..1b38c95ba4f5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1437,8 +1437,8 @@ int mlx4_is_eq_shared(struct mlx4_dev *dev, int vector)
if (vector <= 0 || (vector >= dev->caps.num_comp_vectors + 1))
return -EINVAL;

- return !!(bitmap_weight(priv->eq_table.eq[vector].actv_ports.ports,
- dev->caps.num_ports) > 1);
+ return bitmap_weight_gt(priv->eq_table.eq[vector].actv_ports.ports,
+ dev->caps.num_ports, 1);
}
EXPORT_SYMBOL(mlx4_is_eq_shared);

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index b187c210d4d6..cfbaa7ac712f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1383,7 +1383,7 @@ static int mlx4_mf_bond(struct mlx4_dev *dev)
dev->persist->num_vfs + 1);

/* only single port vfs are allowed */
- if (bitmap_weight(slaves_port_1_2, dev->persist->num_vfs + 1) > 1) {
+ if (bitmap_weight_gt(slaves_port_1_2, dev->persist->num_vfs + 1, 1)) {
mlx4_warn(dev, "HA mode unsupported for dual ported VFs\n");
return -EINVAL;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c8757c5a812b..ec81ef9299e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1670,7 +1670,7 @@ static int mlx5e_set_fecparam(struct net_device *netdev,
int err;

bitmap_from_arr32(&fec_bitmap, &fecparam->fec, sizeof(fecparam->fec) * BITS_PER_BYTE);
- if (bitmap_weight(&fec_bitmap, ETHTOOL_FEC_LLRS_BIT + 1) > 1)
+ if (bitmap_weight_gt(&fec_bitmap, ETHTOOL_FEC_LLRS_BIT + 1, 1))
return -EOPNOTSUPP;

for (mode = 0; mode < ARRAY_SIZE(pplm_fec_2_ethtool); mode++) {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index cc4ec2bb36db..3a588decffb5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1702,8 +1702,7 @@ static u16 *qed_init_qm_get_idx_from_flags(struct qed_hwfn *p_hwfn,
struct qed_qm_info *qm_info = &p_hwfn->qm_info;

/* Can't have multiple flags set here */
- if (bitmap_weight(&pq_flags,
- sizeof(pq_flags) * BITS_PER_BYTE) > 1) {
+ if (bitmap_weight_gt(&pq_flags, sizeof(pq_flags) * BITS_PER_BYTE, 1)) {
DP_ERR(p_hwfn, "requested multiple pq flags 0x%lx\n", pq_flags);
goto err;
}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 23b668de4640..b6e2e17bac04 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -336,7 +336,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print aligned non-zero lines, if any */
for (item = 0, line = 0; line < last_line; line++, item += 8)
- if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
+ if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
DP_NOTICE(p_hwfn,
"line 0x%04x: 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
line,
@@ -350,7 +350,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print last unaligned non-zero line, if any */
if ((bmap->max_count % (64 * 8)) &&
- (bitmap_weight((unsigned long *)&pmap[item],
+ (!bitmap_empty((unsigned long *)&pmap[item],
bmap->max_count - item * 64))) {
offset = sprintf(str_last_line, "line 0x%04x: ", line);
for (; item < last_item; item++)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 071b4aeaddf2..134ecfca96a3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -76,7 +76,7 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
* We delay for a short while if an async destroy QP is still expected.
* Beyond the added delay we clear the bitmap anyway.
*/
- while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+ while (!bitmap_empty(rcid_map->bitmap, rcid_map->max_count)) {
/* If the HW device is during recovery, all resources are
* immediately reset without receiving a per-cid indication
* from HW. In this case we don't expect the cid bitmap to be
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 54aca3a62814..96e09fa40909 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1096,7 +1096,7 @@ static void cci_pmu_enable(struct pmu *pmu)
{
struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
- int enabled = bitmap_weight(hw_events->used_mask, cci_pmu->num_cntrs);
+ bool enabled = !bitmap_empty(hw_events->used_mask, cci_pmu->num_cntrs);
unsigned long flags;

if (!enabled)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 295cc7952d0e..a31b302b0ade 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -524,7 +524,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
@@ -785,7 +785,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index a738aeab5c04..358e4e284a62 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -393,7 +393,7 @@ EXPORT_SYMBOL_GPL(hisi_uncore_pmu_read);
void hisi_uncore_pmu_enable(struct pmu *pmu)
{
struct hisi_pmu *hisi_pmu = to_hisi_pmu(pmu);
- int enabled = bitmap_weight(hisi_pmu->pmu_events.used_mask,
+ bool enabled = !bitmap_empty(hisi_pmu->pmu_events.used_mask,
hisi_pmu->num_counters);

if (!enabled)
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 05378c0fd8f3..c8fee2e1e6d4 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -623,8 +623,7 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
return;

/* Start timer for first event */
- if (bitmap_weight(tx2_pmu->active_counters,
- tx2_pmu->max_counters) == 1) {
+ if (bitmap_weight_eq(tx2_pmu->active_counters, tx2_pmu->max_counters, 1)) {
hrtimer_start(&tx2_pmu->hrtimer,
ns_to_ktime(tx2_pmu->hrtimer_interval),
HRTIMER_MODE_REL_PINNED);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 2b6d476bd213..88bd100a9633 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -867,7 +867,7 @@ static void xgene_perf_pmu_enable(struct pmu *pmu)
{
struct xgene_pmu_dev *pmu_dev = to_pmu_dev(pmu);
struct xgene_pmu *xgene_pmu = pmu_dev->parent;
- int enabled = bitmap_weight(pmu_dev->cntr_assign_mask,
+ bool enabled = !bitmap_empty(pmu_dev->cntr_assign_mask,
pmu_dev->max_counters);

if (!enabled)
diff --git a/drivers/pwm/pwm-pca9685.c b/drivers/pwm/pwm-pca9685.c
index c56001a790d0..49841a5681fb 100644
--- a/drivers/pwm/pwm-pca9685.c
+++ b/drivers/pwm/pwm-pca9685.c
@@ -98,7 +98,7 @@ static bool pca9685_prescaler_can_change(struct pca9685 *pca, int channel)
if (bitmap_empty(pca->pwms_enabled, PCA9685_MAXCHAN + 1))
return true;
/* More than one PWM enabled: Change not allowed */
- if (bitmap_weight(pca->pwms_enabled, PCA9685_MAXCHAN + 1) > 1)
+ if (bitmap_weight_gt(pca->pwms_enabled, PCA9685_MAXCHAN + 1, 1))
return false;
/*
* Only one PWM enabled: Change allowed if the PWM about to
diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index 69d9787d5338..dfb7435aca83 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -436,7 +436,7 @@ static int tegra_channel_enum_format(struct file *file, void *fh,
if (!IS_ENABLED(CONFIG_VIDEO_TEGRA_TPG))
fmts_bitmap = chan->fmts_bitmap;

- if (f->index >= bitmap_weight(fmts_bitmap, MAX_FORMAT_NUM))
+ if (bitmap_weight_le(fmts_bitmap, MAX_FORMAT_NUM, f->index + 1))
return -EINVAL;

for (i = 0; i < f->index + 1; i++, index++)
diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
index 9b68489a2356..72cf8925ea3c 100644
--- a/drivers/thermal/intel/intel_powerclamp.c
+++ b/drivers/thermal/intel/intel_powerclamp.c
@@ -556,12 +556,10 @@ static void end_power_clamp(void)
* stop faster.
*/
clamping = false;
- if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
- for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping worker for cpu %d alive, destroy\n",
- i);
- stop_power_clamp_worker(i);
- }
+ for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
+ pr_debug("clamping worker for cpu %d alive, destroy\n",
+ i);
+ stop_power_clamp_worker(i);
}
}

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index a17be1618bf7..fb2de42400e5 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -874,8 +874,8 @@ static void o2hb_set_quorum_device(struct o2hb_region *reg)
* If global heartbeat active, unpin all regions if the
* region count > CUT_OFF
*/
- if (bitmap_weight(o2hb_quorum_region_bitmap,
- O2NM_MAX_REGIONS) > O2HB_PIN_CUT_OFF)
+ if (bitmap_weight_gt(o2hb_quorum_region_bitmap,
+ O2NM_MAX_REGIONS, O2HB_PIN_CUT_OFF))
o2hb_region_unpin(NULL);
unlock:
spin_unlock(&o2hb_live_lock);
@@ -1845,7 +1845,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
live_threshold = O2HB_LIVE_THRESHOLD;
if (o2hb_global_heartbeat_active()) {
spin_lock(&o2hb_live_lock);
- if (bitmap_weight(o2hb_region_bitmap, O2NM_MAX_REGIONS) == 1)
+ if (bitmap_weight_eq(o2hb_region_bitmap, O2NM_MAX_REGIONS, 1))
live_threshold <<= 1;
spin_unlock(&o2hb_live_lock);
}
@@ -2120,8 +2120,8 @@ static void o2hb_heartbeat_group_drop_item(struct config_group *group,
if (!o2hb_dependent_users)
goto unlock;

- if (bitmap_weight(o2hb_quorum_region_bitmap,
- O2NM_MAX_REGIONS) <= O2HB_PIN_CUT_OFF)
+ if (bitmap_weight_le(o2hb_quorum_region_bitmap,
+ O2NM_MAX_REGIONS, O2HB_PIN_CUT_OFF + 1))
o2hb_region_pin(NULL);

unlock:
@@ -2364,8 +2364,8 @@ static int o2hb_region_inc_user(const char *region_uuid)
if (o2hb_dependent_users > 1)
goto unlock;

- if (bitmap_weight(o2hb_quorum_region_bitmap,
- O2NM_MAX_REGIONS) <= O2HB_PIN_CUT_OFF)
+ if (bitmap_weight_le(o2hb_quorum_region_bitmap,
+ O2NM_MAX_REGIONS, O2HB_PIN_CUT_OFF + 1))
ret = o2hb_region_pin(NULL);

unlock:
--
2.25.1


2021-11-28 04:00:01

by Yury Norov

Subject: [PATCH 4/9] tools: sync bitmap_weight() usage with the kernel

bitmap_weight() counts all set bits in the bitmap unconditionally.
However, in some cases we only need to traverse part of the bitmap:
when checking whether the number of set bits is greater than, less
than, or equal to some number, we can stop as soon as the answer is
known.

This patch adds bitmap_weight_{eq,gt,le}, reimplements bitmap_{empty,full}
on top of bitmap_weight_eq(), and replaces bitmap_weight() where appropriate.
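
The early-exit idea can be sketched in standalone userspace C as
follows. This is an illustration only, not the code from the patch: the
sketch_* names are hypothetical, and __builtin_popcountl() (a GCC/Clang
builtin) stands in for hweight_long().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Mask selecting the valid low bits of the final, partially used word. */
static unsigned long sketch_last_word_mask(unsigned int nbits)
{
	unsigned int rem = nbits % SKETCH_BITS_PER_LONG;

	return rem ? (1UL << rem) - 1 : ~0UL;
}

/* Returns true if more than 'num' of the first 'nbits' bits are set. */
static bool sketch_weight_gt(const unsigned long *bitmap, unsigned int nbits,
			     unsigned int num)
{
	unsigned int k, w = 0, lim = nbits / SKETCH_BITS_PER_LONG;

	assert(bitmap != NULL);

	for (k = 0; k < lim; k++) {
		/* Even if every remaining bit were set, num cannot be exceeded. */
		if (w + (nbits - k * SKETCH_BITS_PER_LONG) <= num)
			return false;

		w += (unsigned int)__builtin_popcountl(bitmap[k]);

		/* Threshold already crossed: skip the rest of the bitmap. */
		if (w > num)
			return true;
	}

	/* Count the trailing partial word, ignoring bits beyond nbits. */
	if (nbits % SKETCH_BITS_PER_LONG)
		w += (unsigned int)__builtin_popcountl(bitmap[k] &
						       sketch_last_word_mask(nbits));

	return w > num;
}
```

For a check such as "more than one bit set", the loop in this sketch
returns as soon as a second set bit has been counted, so a bitmap whose
first word has two bits set is decided after reading a single word.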

Signed-off-by: Yury Norov <[email protected]>
---
tools/include/linux/bitmap.h | 42 +++++++++++++++++++------
tools/lib/bitmap.c | 60 ++++++++++++++++++++++++++++++++++++
tools/perf/builtin-c2c.c | 4 +--
tools/perf/util/pmu.c | 2 +-
4 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index ea97804d04d4..eb2831f7e5a7 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -12,6 +12,9 @@
unsigned long name[BITS_TO_LONGS(bits)]

int __bitmap_weight(const unsigned long *bitmap, int bits);
+bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
+bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
+bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int nbits, unsigned int num);
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits);
int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
@@ -45,27 +48,48 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits);
}

-static inline int bitmap_empty(const unsigned long *src, unsigned nbits)
+static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
+ return __bitmap_weight(src, nbits);
+}

- return find_first_bit(src, nbits) == nbits;
+static __always_inline bool bitmap_weight_eq(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
+{
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) == num;
+
+ return __bitmap_weight_eq(src, nbits, num);
}

-static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
+static __always_inline bool bitmap_weight_gt(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
{
if (small_const_nbits(nbits))
- return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits));
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) > num;

- return find_first_zero_bit(src, nbits) == nbits;
+ return __bitmap_weight_gt(src, nbits, num);
}

-static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
+static __always_inline bool bitmap_weight_le(const unsigned long *src,
+ unsigned int nbits, unsigned int num)
{
if (small_const_nbits(nbits))
- return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
- return __bitmap_weight(src, nbits);
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) < num;
+
+ return __bitmap_weight_le(src, nbits, num);
+}
+
+static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits)
+{
+ return bitmap_weight_eq(src, nbits, 0);
+}
+
+static __always_inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
+{
+ return bitmap_weight_eq(src, nbits, nbits);
}

static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index db466ef7be9d..3aaf1767d237 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -18,6 +18,66 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
return w;
}

+bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ return false;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ return false;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w == num;
+}
+
+bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG <= num)
+ return false;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ return true;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w > num;
+}
+
+bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int bits, unsigned int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ return true;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w >= num)
+ return false;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+
+ return w < num;
+}
+
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits)
{
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index b5c67ef73862..51997386fb31 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1080,7 +1080,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
bitmap_zero(set, c2c.cpus_cnt);
bitmap_and(set, c2c_he->cpuset, c2c.nodes[node], c2c.cpus_cnt);

- if (!bitmap_weight(set, c2c.cpus_cnt)) {
+ if (bitmap_empty(set, c2c.cpus_cnt)) {
if (c2c.node_info == 1) {
ret = scnprintf(hpp->buf, hpp->size, "%21s", " ");
advance_hpp(hpp, ret);
@@ -1944,7 +1944,7 @@ static int set_nodestr(struct c2c_hist_entry *c2c_he)
if (c2c_he->nodestr)
return 0;

- if (bitmap_weight(c2c_he->nodeset, c2c.nodes_cnt)) {
+ if (!bitmap_empty(c2c_he->nodeset, c2c.nodes_cnt)) {
len = bitmap_scnprintf(c2c_he->nodeset, c2c.nodes_cnt,
buf, sizeof(buf));
} else {
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6ae58406f4fc..015ee1321c7c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1314,7 +1314,7 @@ static int pmu_config_term(const char *pmu_name,
*/
if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
if (term->no_value &&
- bitmap_weight(format->bits, PERF_PMU_FORMAT_BITS) > 1) {
+ bitmap_weight_gt(format->bits, PERF_PMU_FORMAT_BITS, 1)) {
if (err) {
parse_events_error__handle(err, term->err_val,
strdup("no value assigned for term"),
--
2.25.1


2021-11-28 04:00:07

by Yury Norov

Subject: [PATCH 5/9] lib/cpumask: add cpumask_weight_{eq,gt,le}

Add cpumask_weight_{eq,gt,le} and replace cpumask_weight() with
cpumask_empty() or one of cpumask_weight_{eq,gt,le} where appropriate.
This allows the check to return early, as soon as the result of the
comparison is known.
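
The call-site rewrites in this patch follow a small set of equivalences,
sketched below with a toy single-word mask. The toy_* names are
hypothetical and the helpers count the whole word, so only the
comparison semantics are shown, not the early exit. Note that the _le
helper is modelled as a strict "less than", matching the
"return w < num" in the bitmap helpers added by this series;
__builtin_popcountl() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mask: one word, bit N set means CPU N is in the mask. */
struct toy_cpumask {
	unsigned long bits;
};

/* Full population count, the analogue of cpumask_weight(). */
static unsigned int toy_weight(const struct toy_cpumask *m)
{
	assert(m != NULL);
	return (unsigned int)__builtin_popcountl(m->bits);
}

/* weight == num */
static bool toy_weight_eq(const struct toy_cpumask *m, unsigned int num)
{
	return toy_weight(m) == num;
}

/* weight > num */
static bool toy_weight_gt(const struct toy_cpumask *m, unsigned int num)
{
	return toy_weight(m) > num;
}

/* weight < num (strict, as in the bitmap helpers of this series) */
static bool toy_weight_le(const struct toy_cpumask *m, unsigned int num)
{
	return toy_weight(m) < num;
}

/*
 * Equivalences used when converting call sites:
 *   weight >  n   ->  toy_weight_gt(m, n)
 *   weight >= n   ->  toy_weight_gt(m, n - 1)   (assumes n >= 1)
 *   weight <  n   ->  toy_weight_le(m, n)
 *   weight <= n   ->  toy_weight_le(m, n + 1)
 *   weight != n   ->  !toy_weight_eq(m, n)
 */
```

With an unsigned threshold, the "n - 1" form wraps around when n is 0,
so a conversion of "weight >= n" in this style relies on n being at
least 1.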

Signed-off-by: Yury Norov <[email protected]>
---
arch/alpha/kernel/process.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/ia64/mm/tlb.c | 2 +-
arch/mips/cavium-octeon/octeon-irq.c | 4 +--
arch/mips/kernel/crash.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 4 +--
arch/powerpc/xmon/xmon.c | 4 +--
arch/s390/kernel/perf_cpum_cf.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 +++++------
arch/x86/kernel/smpboot.c | 4 +--
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
drivers/firmware/psci/psci_checker.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/hv/channel_mgmt.c | 4 +--
drivers/infiniband/hw/hfi1/affinity.c | 13 +++++-----
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
drivers/infiniband/sw/siw/siw_main.c | 3 +--
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
include/linux/cpumask.h | 32 ++++++++++++++++++++++++
kernel/irq/affinity.c | 2 +-
kernel/padata.c | 2 +-
kernel/rcu/tree_nocb.h | 4 +--
kernel/rcu/tree_plugin.h | 2 +-
kernel/sched/core.c | 10 ++++----
kernel/sched/topology.c | 4 +--
kernel/time/clockevents.c | 2 +-
kernel/time/clocksource.c | 2 +-
mm/vmstat.c | 4 +--
35 files changed, 89 insertions(+), 59 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5f8527081da9..0d4bc60828bf 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -125,7 +125,7 @@ common_shutdown_1(void *generic_ptr)
/* Wait for the secondaries to halt. */
set_cpu_present(boot_cpuid, false);
set_cpu_possible(boot_cpuid, false);
- while (cpumask_weight(cpu_present_mask))
+ while (!cpumask_empty(cpu_present_mask))
barrier();
#endif

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 5010348fa21b..fd6301eafa9d 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -572,7 +572,7 @@ setup_arch (char **cmdline_p)
#ifdef CONFIG_ACPI_HOTPLUG_CPU
prefill_possible_map();
#endif
- per_cpu_scan_finalize((cpumask_weight(&early_cpu_possible_map) == 0 ?
+ per_cpu_scan_finalize((cpumask_empty(&early_cpu_possible_map) ?
32 : cpumask_weight(&early_cpu_possible_map)),
additional_cpus > 0 ? additional_cpus : 0);
#endif /* CONFIG_ACPI_NUMA */
diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index 135b5135cace..a5bce13ab047 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -332,7 +332,7 @@ __flush_tlb_range (struct vm_area_struct *vma, unsigned long start,

preempt_disable();
#ifdef CONFIG_SMP
- if (mm != current->active_mm || cpumask_weight(mm_cpumask(mm)) != 1) {
+ if (mm != current->active_mm || !cpumask_weight_eq(mm_cpumask(mm), 1)) {
ia64_global_tlb_purge(mm, start, end, nbits);
preempt_enable();
return;
diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 844f882096e6..914871f15fb7 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -763,7 +763,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
if (!cpumask_test_cpu(cpu, mask))
return;

- if (cpumask_weight(mask) > 1) {
+ if (cpumask_weight_gt(mask, 1)) {
/*
* It has multi CPU affinity, just remove this CPU
* from the affinity set.
@@ -795,7 +795,7 @@ static int octeon_irq_ciu_set_affinity(struct irq_data *data,
* This removes the need to do locking in the .ack/.eoi
* functions.
*/
- if (cpumask_weight(dest) != 1)
+ if (!cpumask_weight_eq(dest, 1))
return -EINVAL;

if (!enable_one)
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..4c35004754db 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -72,7 +72,7 @@ static void crash_kexec_prepare_cpus(void)
*/
pr_emerg("Sending IPI to other cpus...\n");
msecs = 10000;
- while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
+ while (cpumask_weight_le(&cpus_in_crash, ncpus) && (--msecs > 0)) {
cpu_relax();
mdelay(1);
}
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..2eae302ea26b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1615,7 +1615,7 @@ void start_secondary(void *unused)
if (has_big_cores)
sibling_mask = cpu_smallcore_mask;

- if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+ if (cpumask_weight_gt(mask, cpumask_weight(sibling_mask(cpu))))
shared_caches = true;
}

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3fa6d240bade..9e9cba7cfb85 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -158,7 +158,7 @@ static void watchdog_smp_panic(int cpu, u64 tb)
goto out;
if (cpumask_test_cpu(cpu, &wd_smp_cpus_pending))
goto out;
- if (cpumask_weight(&wd_smp_cpus_pending) == 0)
+ if (cpumask_empty(&wd_smp_cpus_pending))
goto out;

pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
@@ -346,7 +346,7 @@ static void start_watchdog(void *arg)

wd_smp_lock(&flags);
cpumask_set_cpu(cpu, &wd_cpus_enabled);
- if (cpumask_weight(&wd_cpus_enabled) == 1) {
+ if (cpumask_weight_eq(&wd_cpus_enabled, 1)) {
cpumask_set_cpu(cpu, &wd_smp_cpus_pending);
wd_smp_last_reset_tb = get_tb();
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 8b28ff9d98d1..2073be312fe9 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -469,7 +469,7 @@ static bool wait_for_other_cpus(int ncpus)

/* We wait for 2s, which is a metric "little while" */
for (timeout = 20000; timeout != 0; --timeout) {
- if (cpumask_weight(&cpus_in_xmon) >= ncpus)
+ if (cpumask_weight_gt(&cpus_in_xmon, ncpus - 1))
return true;
udelay(100);
barrier();
@@ -1338,7 +1338,7 @@ static int cpu_cmd(void)
case 'S':
case 't':
cpumask_copy(&xmon_batch_cpus, &cpus_in_xmon);
- if (cpumask_weight(&xmon_batch_cpus) <= 1) {
+ if (cpumask_weight_le(&xmon_batch_cpus, 2)) {
printf("There are no other cpus in xmon\n");
break;
}
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index ee8707abdb6a..4d217f7f5ccf 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -975,7 +975,7 @@ static int cfset_all_start(struct cfset_request *req)
return -ENOMEM;
cpumask_and(mask, &req->mask, cpu_online_mask);
on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
- if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
+ if (!cpumask_weight_eq(mask, atomic_read(&p.cpus_ack))) {
on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
rc = -EIO;
debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 94e7e6b420e4..5fa730063af2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -341,14 +341,14 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus belong to parent ctrl group */
cpumask_andnot(tmpmask, newmask, &prgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
rdt_last_cmd_puts("Can only add CPUs to mongroup that belong to parent\n");
return -EINVAL;
}

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Give any dropped cpus to parent rdtgroup */
cpumask_or(&prgrp->cpu_mask, &prgrp->cpu_mask, tmpmask);
update_closid_rmid(tmpmask, prgrp);
@@ -359,7 +359,7 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu rmid
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
if (crgrp == rdtgrp)
@@ -394,7 +394,7 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Can't drop from default group */
if (rdtgrp == &rdtgroup_default) {
rdt_last_cmd_puts("Can't drop CPUs from default group\n");
@@ -413,12 +413,12 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu closid/rmid.
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) {
if (r == rdtgrp)
continue;
cpumask_and(tmpmask1, &r->cpu_mask, tmpmask);
- if (cpumask_weight(tmpmask1))
+ if (!cpumask_empty(tmpmask1))
cpumask_rdtgrp_clear(r, tmpmask1);
}
update_closid_rmid(tmpmask, rdtgrp);
@@ -488,7 +488,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,

/* check that user didn't specify any offline cpus */
cpumask_andnot(tmpmask, newmask, cpu_online_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
ret = -EINVAL;
rdt_last_cmd_puts("Can only assign online CPUs\n");
goto unlock;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ac2909f0cab3..394071898b50 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1594,7 +1594,7 @@ static void remove_siblinginfo(int cpu)
/*/
* last thread sibling in this cpu core going down
*/
- if (cpumask_weight(topology_sibling_cpumask(cpu)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(cpu), 1))
cpu_data(sibling).booted_cores--;
}

@@ -1603,7 +1603,7 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
- if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(sibling), 1))
cpu_data(sibling).smt_active = false;
}

diff --git a/arch/x86/mm/mmio-mod.c b/arch/x86/mm/mmio-mod.c
index 933a2ebad471..c3317f0650d8 100644
--- a/arch/x86/mm/mmio-mod.c
+++ b/arch/x86/mm/mmio-mod.c
@@ -400,7 +400,7 @@ static void leave_uniprocessor(void)
int cpu;
int err;

- if (!cpumask_available(downed_cpus) || cpumask_weight(downed_cpus) == 0)
+ if (!cpumask_available(downed_cpus) || cpumask_empty(downed_cpus))
return;
pr_notice("Re-enabling CPUs...\n");
for_each_cpu(cpu, downed_cpus) {
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 1e9ff28bc2e0..ea277fc08357 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -985,7 +985,7 @@ static int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)

/* Clear global flags */
if (master) {
- if (cpumask_weight(uv_nmi_cpu_mask))
+ if (!cpumask_empty(uv_nmi_cpu_mask))
uv_nmi_cleanup_mask();
atomic_set(&uv_nmi_cpus_in_nmi, -1);
atomic_set(&uv_nmi_cpu, -1);
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 248135e5087e..60055ab190a9 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -475,7 +475,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
}

qcom_get_related_cpus(index, policy->cpus);
- if (!cpumask_weight(policy->cpus)) {
+ if (cpumask_empty(policy->cpus)) {
dev_err(dev, "Domain-%d failed to get related CPUs\n", index);
ret = -ENOENT;
goto error;
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 1e0cd4d165f0..919fa6e3f462 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -154,7 +154,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
* table and opp-shared.
*/
ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, priv->opp_shared_cpus);
- if (ret || !cpumask_weight(priv->opp_shared_cpus)) {
+ if (ret || cpumask_empty(priv->opp_shared_cpus)) {
/*
* Either opp-table is not set or no opp-shared was found.
* Use the CPU mask from SCMI to designate CPUs sharing an OPP
diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c
index 116eb465cdb4..90c9473832a9 100644
--- a/drivers/firmware/psci/psci_checker.c
+++ b/drivers/firmware/psci/psci_checker.c
@@ -90,7 +90,7 @@ static unsigned int down_and_up_cpus(const struct cpumask *cpus,
* cpu_down() checks the number of online CPUs before the TOS
* resident CPU.
*/
- if (cpumask_weight(offlined_cpus) + 1 == nb_available_cpus) {
+ if (cpumask_weight_eq(offlined_cpus, nb_available_cpus - 1)) {
if (ret != -EBUSY) {
pr_err("Unexpected return code %d while trying "
"to power down last online CPU %d\n",
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 0b488d49694c..962e8d6bf6ea 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
GEM_BUG_ON(!pmu->base.event_init);

/* Select the first online CPU as a designated reader. */
- if (!cpumask_weight(&i915_pmu_cpumask))
+ if (cpumask_empty(&i915_pmu_cpumask))
cpumask_set_cpu(cpu, &i915_pmu_cpumask);

return 0;
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2829575fd9b7..da297220230d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
}
alloced_mask = &hv_context.hv_numa_map[numa_node];

- if (cpumask_weight(alloced_mask) ==
- cpumask_weight(cpumask_of_node(numa_node))) {
+ if (cpumask_weight_eq(alloced_mask,
+ cpumask_weight(cpumask_of_node(numa_node)))) {
/*
* We have cycled through all the CPUs in the node;
* reset the alloced map.
diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 98c813ba4304..7c5ca5c5306a 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -507,7 +507,7 @@ static int _dev_comp_vect_cpu_mask_init(struct hfi1_devdata *dd,
* available CPUs divide it by the number of devices in the
* local NUMA node.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 1) {
+ if (cpumask_weight_eq(&entry->comp_vect_mask, 1)) {
possible_cpus_comp_vect = 1;
dd_dev_warn(dd,
"Number of kernel receive queues is too large for completion vector affinity to be effective\n");
@@ -593,7 +593,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
{
struct hfi1_affinity_node *entry;
const struct cpumask *local_mask;
- int curr_cpu, possible, i, ret;
+ int curr_cpu, i, ret;
bool new_entry = false;

local_mask = cpumask_of_node(dd->node);
@@ -626,10 +626,9 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
local_mask);

/* fill in the receive list */
- possible = cpumask_weight(&entry->def_intr.mask);
curr_cpu = cpumask_first(&entry->def_intr.mask);

- if (possible == 1) {
+ if (cpumask_weight_eq(&entry->def_intr.mask, 1)) {
/* only one CPU, everyone will use it */
cpumask_set_cpu(curr_cpu, &entry->rcv_intr.mask);
cpumask_set_cpu(curr_cpu, &entry->general_intr_mask);
@@ -667,7 +666,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* engines, use the same CPU cores as general/control
* context.
*/
- if (cpumask_weight(&entry->def_intr.mask) == 0)
+ if (cpumask_empty(&entry->def_intr.mask))
cpumask_copy(&entry->def_intr.mask,
&entry->general_intr_mask);
}
@@ -687,7 +686,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* vectors, use the same CPU core as the general/control
* context.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 0)
+ if (cpumask_empty(&entry->comp_vect_mask))
cpumask_copy(&entry->comp_vect_mask,
&entry->general_intr_mask);
}
@@ -1017,7 +1016,7 @@ int hfi1_get_proc_affinity(int node)
cpu = cpumask_first(proc_mask);
cpumask_set_cpu(cpu, &set->used);
goto done;
- } else if (current->nr_cpus_allowed < cpumask_weight(&set->mask)) {
+ } else if (cpumask_weight_gt(&set->mask, current->nr_cpus_allowed)) {
hfi1_cdbg(PROC, "PID %u %s affinity set to CPU set(s) %*pbl",
current->pid, current->comm,
cpumask_pr_args(proc_mask));
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index aa290928cf96..60717606fe98 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -1151,7 +1151,7 @@ static void assign_ctxt_affinity(struct file *fp, struct qib_devdata *dd)
* reserve a processor for it on the local NUMA node.
*/
if ((weight >= qib_cpulist_count) &&
- (cpumask_weight(local_mask) <= qib_cpulist_count)) {
+ (cpumask_weight_le(local_mask, qib_cpulist_count + 1))) {
for_each_cpu(local_cpu, local_mask)
if (!test_and_set_bit(local_cpu, qib_cpulist)) {
fd->rec_cpu_num = local_cpu;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index ab98b6a3ae1e..636a080b2952 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3405,7 +3405,7 @@ static void qib_setup_7322_interrupt(struct qib_devdata *dd, int clearpend)
local_mask = cpumask_of_pcibus(dd->pcidev->bus);
firstcpu = cpumask_first(local_mask);
if (firstcpu >= nr_cpu_ids ||
- cpumask_weight(local_mask) == num_online_cpus()) {
+ cpumask_weight_eq(local_mask, num_online_cpus())) {
local_mask = topology_core_cpumask(0);
firstcpu = cpumask_first(local_mask);
}
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index e5c586913d0b..5d6220137a70 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -193,8 +193,7 @@ int siw_get_tx_cpu(struct siw_device *sdev)
else
tx_cpumask = siw_cpu_info.tx_valid_cpus[node];

- num_cpus = cpumask_weight(tx_cpumask);
- if (!num_cpus) {
+ if (cpumask_empty(tx_cpumask)) {
/* no CPU on this NUMA node */
tx_cpumask = cpu_online_mask;
num_cpus = cpumask_weight(tx_cpumask);
diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index fd079215c17f..142a7431745f 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -315,7 +315,7 @@ static int __init bcm6345_l1_of_init(struct device_node *dn,
cpumask_set_cpu(idx, &intc->cpumask);
}

- if (!cpumask_weight(&intc->cpumask)) {
+ if (cpumask_empty(&intc->cpumask)) {
ret = -ENODEV;
goto out_free;
}
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index ba17a8f740a9..3c9e31078f06 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12626,7 +12626,7 @@ lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
* gone offline yet, we need >1.
*/
cpumask_and(tmp, maskp, cpu_online_mask);
- if (cpumask_weight(tmp) > 1)
+ if (cpumask_weight_gt(tmp, 1))
continue;

/* Now that we have an irq to shutdown, get the eq
diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..7ef6c624bb59 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -561,7 +561,7 @@ int qman_test_stash(void)
{
int err;

- if (cpumask_weight(cpu_online_mask) < 2) {
+ if (cpumask_weight_le(cpu_online_mask, 2)) {
pr_info("%s(): skip - only 1 CPU\n", __func__);
return 0;
}
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 64dae70d31f5..b5e50cf74785 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -575,6 +575,38 @@ static inline unsigned int cpumask_weight(const struct cpumask *srcp)
return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
}

+/**
+ * cpumask_weight_eq - Check if # of bits in *srcp is equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_eq(const struct cpumask *srcp, unsigned int num)
+{
+ return bitmap_weight_eq(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_gt - Check if # of bits in *srcp is greater than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_gt(const struct cpumask *srcp,
+ unsigned int num)
+{
+ return bitmap_weight_gt(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_le - Check if # of bits in *srcp is less than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_le(const struct cpumask *srcp,
+ unsigned int num)
+{
+ return bitmap_weight_le(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
/**
* cpumask_shift_right - *dstp = *srcp >> n
* @dstp: the cpumask result
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..18740faf0eb1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -258,7 +258,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
nodemask_t nodemsk = NODE_MASK_NONE;
struct node_vectors *node_vectors;

- if (!cpumask_weight(cpu_mask))
+ if (cpumask_empty(cpu_mask))
return 0;

nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
diff --git a/kernel/padata.c b/kernel/padata.c
index 18d3a5c699d8..e5819bb8bd1d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -181,7 +181,7 @@ int padata_do_parallel(struct padata_shell *ps,
goto out;

if (!cpumask_test_cpu(*cb_cpu, pd->cpumask.cbcpu)) {
- if (!cpumask_weight(pd->cpumask.cbcpu))
+ if (cpumask_empty(pd->cpumask.cbcpu))
goto out;

/* Select an alternate fallback CPU and notify the caller. */
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 2461fe8d0c23..82473478e222 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1135,7 +1135,7 @@ void __init rcu_init_nohz(void)
struct rcu_data *rdp;

#if defined(CONFIG_NO_HZ_FULL)
- if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
+ if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
need_rcu_nocb_mask = true;
#endif /* #if defined(CONFIG_NO_HZ_FULL) */

@@ -1319,7 +1319,7 @@ static void __init rcu_organize_nocb_kthreads(void)
*/
void rcu_bind_current_to_nocb(void)
{
- if (cpumask_available(rcu_nocb_mask) && cpumask_weight(rcu_nocb_mask))
+ if (cpumask_available(rcu_nocb_mask) && !cpumask_empty(rcu_nocb_mask))
WARN_ON(sched_setaffinity(current->pid, rcu_nocb_mask));
}
EXPORT_SYMBOL_GPL(rcu_bind_current_to_nocb);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 463735a3b657..2908495cc840 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1215,7 +1215,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm);
cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_RCU));
- if (cpumask_weight(cm) == 0)
+ if (cpumask_empty(cm))
cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_RCU));
set_cpus_allowed_ptr(t, cm);
free_cpumask_var(cm);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index beaa8be6241e..c91912c0c005 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6003,7 +6003,7 @@ static void sched_core_cpu_starting(unsigned int cpu)
WARN_ON_ONCE(rq->core != rq);

/* if we're the first, we'll be our own leader */
- if (cpumask_weight(smt_mask) == 1)
+ if (cpumask_weight_eq(smt_mask, 1))
goto unlock;

/* find the leader */
@@ -6044,7 +6044,7 @@ static void sched_core_cpu_deactivate(unsigned int cpu)
sched_core_lock(cpu, &flags);

/* if we're the last man standing, nothing to do */
- if (cpumask_weight(smt_mask) == 1) {
+ if (cpumask_weight_eq(smt_mask, 1)) {
WARN_ON_ONCE(rq->core != rq);
goto unlock;
}
@@ -8715,7 +8715,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
{
int ret = 1;

- if (!cpumask_weight(cur))
+ if (cpumask_empty(cur))
return ret;

ret = dl_cpuset_cpumask_can_shrink(cur, trial);
@@ -9054,7 +9054,7 @@ int sched_cpu_activate(unsigned int cpu)
/*
* When going up, increment the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_inc_cpuslocked(&sched_smt_present);
#endif
set_cpu_active(cpu, true);
@@ -9129,7 +9129,7 @@ int sched_cpu_deactivate(unsigned int cpu)
/*
* When going down, decrement the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_dec_cpuslocked(&sched_smt_present);

sched_core_cpu_deactivate(cpu);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d201a7052a29..79395571599f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -74,7 +74,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
break;
}

- if (!cpumask_weight(sched_group_span(group))) {
+ if (cpumask_empty(sched_group_span(group))) {
printk(KERN_CONT "\n");
printk(KERN_ERR "ERROR: empty group\n");
break;
@@ -169,7 +169,7 @@ static const unsigned int SD_DEGENERATE_GROUPS_MASK =

static int sd_degenerate(struct sched_domain *sd)
{
- if (cpumask_weight(sched_domain_span(sd)) == 1)
+ if (cpumask_weight_eq(sched_domain_span(sd), 1))
return 1;

/* Following flags need at least 2 groups */
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..32d6629a55b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -648,7 +648,7 @@ void tick_cleanup_dead_cpu(int cpu)
*/
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
- cpumask_weight(dev->cpumask) == 1 &&
+ cpumask_weight_eq(dev->cpumask, 1) &&
!tick_is_broadcast_device(dev)) {
BUG_ON(!clockevent_state_detached(dev));
list_del(&dev->list);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index bf6a087e132f..8471b9378206 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -321,7 +321,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
cpus_read_lock();
preempt_disable();
clocksource_verify_choose_cpus();
- if (cpumask_weight(&cpus_chosen) == 0) {
+ if (cpumask_empty(&cpus_chosen)) {
preempt_enable();
cpus_read_unlock();
pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d701c335628c..295642e2c24c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2032,7 +2032,7 @@ static void __init init_cpu_node_state(void)
int node;

for_each_online_node(node) {
- if (cpumask_weight(cpumask_of_node(node)) > 0)
+ if (!cpumask_empty(cpumask_of_node(node)))
node_set_state(node, N_CPU);
}
}
@@ -2059,7 +2059,7 @@ static int vmstat_cpu_dead(unsigned int cpu)

refresh_zone_stat_thresholds();
node_cpus = cpumask_of_node(node);
- if (cpumask_weight(node_cpus) > 0)
+ if (!cpumask_empty(node_cpus))
return 0;

node_clear_state(node, N_CPU);
--
2.25.1


2021-11-28 04:00:14

by Yury Norov

Subject: [PATCH 6/9] lib/nodemask: add nodemask_weight_{eq,gt,le}

Add nodes_weight_{eq,gt,le} and replace nodes_weight() where
appropriate. This lets the nodes_weight_*() helpers stop scanning the
mask as soon as the result of the comparison is known.

Signed-off-by: Yury Norov <[email protected]>
---
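Reader's note: the early return described above can be illustrated with
a minimal userspace sketch. This is not the code added by the series.
The model_* names are hypothetical, and __builtin_popcountl() assumes
GCC or Clang. The "le" variant follows the kernel-doc in this series,
which documents it as strictly "less than".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* True if more than @num of the first @nbits bits of @map are set. */
static bool model_weight_gt(const unsigned long *map, unsigned int nbits,
			    unsigned int num)
{
	unsigned int k, w = 0;

	/* Full words: stop as soon as the running count exceeds @num. */
	for (k = 0; k < nbits / MODEL_BITS_PER_LONG; k++) {
		w += (unsigned int)__builtin_popcountl(map[k]);
		if (w > num)
			return true;
	}

	/* Trailing partial word: count only its low bits. */
	if (nbits % MODEL_BITS_PER_LONG) {
		unsigned long mask = (1UL << (nbits % MODEL_BITS_PER_LONG)) - 1;

		w += (unsigned int)__builtin_popcountl(map[k] & mask);
	}
	return w > num;
}

/* True if exactly @num bits are set. */
static bool model_weight_eq(const unsigned long *map, unsigned int nbits,
			    unsigned int num)
{
	if (model_weight_gt(map, nbits, num))
		return false;
	return num == 0 || model_weight_gt(map, nbits, num - 1);
}

/* True if strictly fewer than @num bits are set. */
static bool model_weight_le(const unsigned long *map, unsigned int nbits,
			    unsigned int num)
{
	return num != 0 && !model_weight_gt(map, nbits, num - 1);
}

/* Self-check on a two-word map with four bits set. */
static void model_selfcheck(void)
{
	unsigned long map[2] = { 0x7UL, 0x1UL };
	unsigned int nbits = 2 * MODEL_BITS_PER_LONG;

	assert(model_weight_gt(map, nbits, 3));
	assert(!model_weight_gt(map, nbits, 4));
	assert(model_weight_eq(map, nbits, 4));
	assert(model_weight_le(map, nbits, 5));
	assert(!model_weight_le(map, nbits, 4));
}
```

In this model, model_weight_gt(map, nbits, 3) returns after the second
word without looking at any later ones, which is the saving the series
is after on wide masks.
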
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 ++--
drivers/acpi/numa/srat.c | 2 +-
include/linux/nodemask.h | 24 ++++++++++++++++++++++++
mm/mempolicy.c | 2 +-
5 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 058b2f36b3a6..b3ca7d23e4b0 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -154,7 +154,7 @@ int __init amd_numa_init(void)
node_set(nodeid, numa_nodes_parsed);
}

- if (!nodes_weight(numa_nodes_parsed))
+ if (nodes_empty(numa_nodes_parsed))
return -ENOENT;

/*
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 1a02b791d273..9a9305367fdd 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -123,7 +123,7 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
* Continue to fill physical nodes with fake nodes until there is no
* memory left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
@@ -270,7 +270,7 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
* Fill physical nodes with fake nodes of size until there is no memory
* left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 66a0142dc78c..c4f80d2d85bf 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -67,7 +67,7 @@ int acpi_map_pxm_to_node(int pxm)
node = pxm_to_node_map[pxm];

if (node == NUMA_NO_NODE) {
- if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
+ if (nodes_weight_gt(nodes_found_map, MAX_NUMNODES - 1))
return NUMA_NO_NODE;
node = first_unset_node(nodes_found_map);
__acpi_map_pxm_to_node(pxm, node);
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 567c3ddba2c4..3801ec5b06f4 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -38,6 +38,9 @@
* int nodes_empty(mask) Is mask empty (no bits sets)?
* int nodes_full(mask) Is mask full (all bits sets)?
* int nodes_weight(mask) Hamming weight - number of set bits
+ * bool nodes_weight_eq(mask, num) Hamming weight is equal to num
+ * bool nodes_weight_gt(mask, num) Hamming weight is greater than num
+ * bool nodes_weight_le(mask, num) Hamming weight is less than num
*
* void nodes_shift_right(dst, src, n) Shift right
* void nodes_shift_left(dst, src, n) Shift left
@@ -240,6 +243,27 @@ static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
return bitmap_weight(srcp->bits, nbits);
}

+#define nodes_weight_eq(nodemask, num) __nodes_weight_eq(&(nodemask), MAX_NUMNODES, (num))
+static inline bool __nodes_weight_eq(const nodemask_t *srcp,
+ unsigned int nbits, unsigned int num)
+{
+ return bitmap_weight_eq(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_gt(nodemask, num) __nodes_weight_gt(&(nodemask), MAX_NUMNODES, (num))
+static inline bool __nodes_weight_gt(const nodemask_t *srcp,
+ unsigned int nbits, unsigned int num)
+{
+ return bitmap_weight_gt(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_le(nodemask, num) __nodes_weight_le(&(nodemask), MAX_NUMNODES, (num))
+static inline bool __nodes_weight_le(const nodemask_t *srcp,
+ unsigned int nbits, unsigned int num)
+{
+ return bitmap_weight_le(srcp->bits, nbits, num);
+}
+
#define nodes_shift_right(dst, src, n) \
__nodes_shift_right(&(dst), &(src), (n), MAX_NUMNODES)
static inline void __nodes_shift_right(nodemask_t *dstp,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b1fcdb4d25d6..4a48ce5b86cf 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1154,7 +1154,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
* [0-7] - > [3,4,5] moves only 0,1,2,6,7.
*/

- if ((nodes_weight(*from) != nodes_weight(*to)) &&
+ if (!nodes_weight_eq(*from, nodes_weight(*to)) &&
(node_isset(s, *to)))
continue;

--
2.25.1


2021-11-28 04:00:22

by Yury Norov

Subject: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
with one of the new functions where appropriate. This lets num_*_cpus_*()
return as soon as the comparison is decided instead of counting every CPU.

Signed-off-by: Yury Norov <[email protected]>
---
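Reader's note: most conversions in this patch rewrite a comparison
against num_*_cpus() into a _gt or _le call with an adjusted bound. The
sketch below is a userspace check of those rewrite rules, not kernel
code. cpus_gt() and cpus_le() are hypothetical stand-ins that take the
CPU count directly, and "le" is modelled as strictly "less than", as
the kernel-doc in this series describes it. Bounds start at 2 so that
the "x - 2" form cannot wrap around in unsigned arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: @ncpus models the weight of the CPU mask. */
static bool cpus_gt(unsigned int ncpus, unsigned int num)
{
	return ncpus > num;
}

static bool cpus_le(unsigned int ncpus, unsigned int num)
{
	return ncpus < num;	/* strictly less than */
}

/*
 * Count the (ncpus, x) pairs for which a rewritten condition disagrees
 * with the original comparison. Zero means every rule holds.
 */
static unsigned int count_rewrite_mismatches(unsigned int limit)
{
	unsigned int n, x, bad = 0;

	for (n = 0; n <= limit; n++) {
		for (x = 2; x <= limit; x++) {
			bad += (n <= x) != cpus_le(n, x + 1);
			bad += (n < x) != cpus_le(n, x);
			bad += (n > x) != cpus_gt(n, x);
			bad += (x <= n) != cpus_gt(n, x - 1);
			bad += (x - 1 <= n) != cpus_gt(n, x - 2);
		}
	}
	return bad;
}

/* All rules agree with the original comparisons on the tested range. */
static void check_rewrites(void)
{
	assert(count_rewrite_mismatches(64) == 0);
}
```

The rules that shift the bound are the ones worth double-checking in
review: "n >= x" becomes gt(x - 1), and "n <= x" becomes le(x + 1).
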
arch/arc/kernel/smp.c | 2 +-
arch/arm/kernel/machine_kexec.c | 2 +-
arch/arm/mach-exynos/exynos.c | 2 +-
arch/arm/mm/cache-b15-rac.c | 2 +-
arch/arm64/kernel/smp.c | 2 +-
arch/arm64/mm/context.c | 2 +-
arch/csky/mm/asid.c | 2 +-
arch/csky/mm/context.c | 2 +-
arch/ia64/mm/tlb.c | 6 ++---
arch/mips/kernel/i8253.c | 2 +-
arch/mips/kernel/perf_event_mipsxx.c | 4 ++--
arch/mips/kernel/rtlx-cmp.c | 2 +-
arch/mips/kernel/smp.c | 4 ++--
arch/mips/kernel/vpe-cmp.c | 2 +-
.../loongson2ef/common/cs5536/cs5536_mfgpt.c | 2 +-
arch/mips/mm/context.c | 2 +-
arch/mips/mm/tlbex.c | 2 +-
arch/nios2/kernel/cpuinfo.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/pseries/hotplug-cpu.c | 4 ++--
arch/powerpc/sysdev/mpic.c | 2 +-
arch/powerpc/xmon/xmon.c | 6 ++---
arch/riscv/kvm/vmid.c | 2 +-
arch/sparc/kernel/mdesc.c | 6 ++---
arch/x86/events/amd/core.c | 2 +-
arch/x86/kernel/alternative.c | 8 +++----
arch/x86/kernel/apic/apic.c | 4 ++--
arch/x86/kernel/apic/apic_flat_64.c | 2 +-
arch/x86/kernel/apic/probe_32.c | 2 +-
arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +-
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kernel/i8253.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/kvmclock.c | 2 +-
arch/x86/kernel/tsc.c | 2 +-
arch/x86/xen/smp_pv.c | 2 +-
arch/x86/xen/spinlock.c | 2 +-
drivers/clk/samsung/clk-exynos4.c | 2 +-
drivers/clocksource/ingenic-timer.c | 3 +--
drivers/cpufreq/pcc-cpufreq.c | 2 +-
drivers/dma/mv_xor.c | 5 ++--
drivers/gpu/drm/i810/i810_drv.c | 2 +-
drivers/irqchip/irq-gic.c | 2 +-
drivers/net/caif/caif_virtio.c | 2 +-
.../cavium/liquidio/cn23xx_vf_device.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +-
drivers/net/wireless/ath/ath9k/hw.c | 2 +-
drivers/net/wireless/marvell/mwifiex/main.c | 4 ++--
drivers/net/wireless/st/cw1200/queue.c | 3 +--
drivers/nvdimm/region.c | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/perf/arm_pmu.c | 2 +-
.../intel/speed_select_if/isst_if_common.c | 6 ++---
drivers/soc/bcm/brcmstb/biuctrl.c | 2 +-
drivers/soc/fsl/dpio/dpio-service.c | 4 ++--
drivers/spi/spi-dw-bt1.c | 2 +-
drivers/virt/acrn/hsm.c | 2 +-
fs/xfs/xfs_sysfs.c | 2 +-
include/linux/cpumask.h | 23 +++++++++++++++++++
include/linux/kdb.h | 2 +-
kernel/debug/kdb/kdb_bt.c | 2 +-
kernel/printk/printk.c | 2 +-
kernel/reboot.c | 4 ++--
kernel/time/clockevents.c | 2 +-
mm/percpu.c | 6 ++---
mm/slab.c | 2 +-
67 files changed, 110 insertions(+), 90 deletions(-)

diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index 78e6d069b1c1..d4f2765755c9 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
* if platform didn't set the present map already, do it now
* boot cpu is set to present already by init/main.c
*/
- if (num_present_cpus() <= 1)
+ if (num_present_cpus_le(2))
init_cpu_present(cpu_possible_mask);
}

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..8875e2ee0083 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -44,7 +44,7 @@ int machine_kexec_prepare(struct kimage *image)
* and implements CPU hotplug for the current HW. If not, we won't be
* able to kexec reliably, so fail the prepare operation.
*/
- if (num_possible_cpus() > 1 && platform_can_secondary_boot() &&
+ if (num_possible_cpus_gt(1) && platform_can_secondary_boot() &&
!platform_can_cpu_hotplug())
return -EINVAL;

diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index 8b48326be9fd..ba658402ac1e 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -120,7 +120,7 @@ void exynos_set_delayed_reset_assertion(bool enable)
if (of_machine_is_compatible("samsung,exynos4")) {
unsigned int tmp, core_id;

- for (core_id = 0; core_id < num_possible_cpus(); core_id++) {
+ for (core_id = 0; num_possible_cpus_gt(core_id); core_id++) {
tmp = pmu_raw_readl(EXYNOS_ARM_CORE_OPTION(core_id));
if (enable)
tmp |= S5P_USE_DELAYED_RESET_ASSERTION;
diff --git a/arch/arm/mm/cache-b15-rac.c b/arch/arm/mm/cache-b15-rac.c
index bdc07030997b..202c3a6cf98b 100644
--- a/arch/arm/mm/cache-b15-rac.c
+++ b/arch/arm/mm/cache-b15-rac.c
@@ -296,7 +296,7 @@ static int __init b15_rac_init(void)
if (!dn)
return -ENODEV;

- if (WARN(num_possible_cpus() > 4, "RAC only supports 4 CPUs\n"))
+ if (WARN(num_possible_cpus_gt(4), "RAC only supports 4 CPUs\n"))
goto out;

b15_rac_base = of_iomap(dn, 0);
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1e6baa..bd1280e5081b 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -1099,7 +1099,7 @@ static bool have_cpu_die(void)

bool cpus_are_stuck_in_kernel(void)
{
- bool smp_spin_tables = (num_possible_cpus() > 1 && !have_cpu_die());
+ bool smp_spin_tables = (num_possible_cpus_gt(1) && !have_cpu_die());

return !!cpus_stuck_in_kernel || smp_spin_tables ||
is_protected_kvm_enabled();
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index cd72576ae2b7..702248dc105e 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -384,7 +384,7 @@ static int asids_update_limit(void)
* Expect allocation after rollover to fail if we don't have at least
* one more ASID than CPUs. ASID #0 is reserved for init_mm.
*/
- WARN_ON(num_available_asids - 1 <= num_possible_cpus());
+ WARN_ON(num_possible_cpus_gt(num_available_asids - 2));
pr_info("ASID allocator initialised with %lu entries\n",
num_available_asids);

diff --git a/arch/csky/mm/asid.c b/arch/csky/mm/asid.c
index b2e914745c1d..4dd6eb62a9e0 100644
--- a/arch/csky/mm/asid.c
+++ b/arch/csky/mm/asid.c
@@ -176,7 +176,7 @@ int asid_allocator_init(struct asid_info *info,
* Expect allocation after rollover to fail if we don't have at least
* one more ASID than CPUs. ASID #0 is always reserved.
*/
- WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
+ WARN_ON(num_possible_cpus_gt(NUM_CTXT_ASIDS(info) - 2));
atomic64_set(&info->generation, ASID_FIRST_VERSION(info));
info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
sizeof(*info->map), GFP_KERNEL);
diff --git a/arch/csky/mm/context.c b/arch/csky/mm/context.c
index 0d95bdd93846..c12312215bde 100644
--- a/arch/csky/mm/context.c
+++ b/arch/csky/mm/context.c
@@ -28,7 +28,7 @@ static void asid_flush_cpu_ctxt(void)

static int asids_init(void)
{
- BUG_ON(((1 << CONFIG_CPU_ASID_BITS) - 1) <= num_possible_cpus());
+ BUG_ON(num_possible_cpus_gt((1 << CONFIG_CPU_ASID_BITS) - 2));

if (asid_allocator_init(&asid_info, CONFIG_CPU_ASID_BITS, 1,
asid_flush_cpu_ctxt))
diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index a5bce13ab047..44f623f5dc5e 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -202,7 +202,7 @@ setup_ptcg_sem(int max_purges, int nptcg_from)
goto resetsema;
}
if (kp_override) {
- need_ptcg_sem = num_possible_cpus() > nptcg;
+ need_ptcg_sem = num_possible_cpus_gt(nptcg);
return;
}

@@ -221,7 +221,7 @@ setup_ptcg_sem(int max_purges, int nptcg_from)
}
if (palo_override) {
if (nptcg != PALO_MAX_TLB_PURGES)
- need_ptcg_sem = (num_possible_cpus() > nptcg);
+ need_ptcg_sem = num_possible_cpus_gt(nptcg);
return;
}

@@ -238,7 +238,7 @@ setup_ptcg_sem(int max_purges, int nptcg_from)
need_ptcg_sem = 0;
return;
} else
- need_ptcg_sem = (num_possible_cpus() > nptcg);
+ need_ptcg_sem = num_possible_cpus_gt(nptcg);

resetsema:
spinaphore_init(&ptcg_sem, max_purges);
diff --git a/arch/mips/kernel/i8253.c b/arch/mips/kernel/i8253.c
index ca21210e06b5..89a63538be4a 100644
--- a/arch/mips/kernel/i8253.c
+++ b/arch/mips/kernel/i8253.c
@@ -29,7 +29,7 @@ void __init setup_pit_timer(void)

static int __init init_pit_clocksource(void)
{
- if (num_possible_cpus() > 1 || /* PIT does not scale! */
+ if (num_possible_cpus_gt(1) || /* PIT does not scale! */
!clockevent_state_periodic(&i8253_clockevent))
return 0;

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 1641d274fe37..4b6458899b05 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -135,7 +135,7 @@ static DEFINE_RWLOCK(pmuint_rwlock);
/* Copied from op_model_mipsxx.c */
static unsigned int vpe_shift(void)
{
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
return 1;

return 0;
@@ -704,7 +704,7 @@ static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev)
* event_id.
*/
#ifdef CONFIG_MIPS_MT_SMP
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
return ((unsigned int)pev->range << 24) |
(pev->cntr_mask & 0xffff00) |
(pev->event_id & 0xff);
diff --git a/arch/mips/kernel/rtlx-cmp.c b/arch/mips/kernel/rtlx-cmp.c
index d26dcc4b46e7..e4bb83bc46c6 100644
--- a/arch/mips/kernel/rtlx-cmp.c
+++ b/arch/mips/kernel/rtlx-cmp.c
@@ -54,7 +54,7 @@ int __init rtlx_module_init(void)
return -ENODEV;
}

- if (num_possible_cpus() - aprp_cpu_index() < 1) {
+ if (num_possible_cpus_le(aprp_cpu_index() + 1)) {
pr_warn("No TCs reserved for AP/SP, not initializing RTLX.\n"
"Pass maxcpus=<n> argument as kernel argument\n");

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index d542fb7af3ba..6a0bbf249528 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -248,7 +248,7 @@ int mips_smp_ipi_allocate(const struct cpumask *mask)
* setup, if we're running with only a single CPU.
*/
if (!ipidomain) {
- BUG_ON(num_present_cpus() > 1);
+ BUG_ON(num_present_cpus_gt(1));
return 0;
}

@@ -314,7 +314,7 @@ int mips_smp_ipi_free(const struct cpumask *mask)

static int __init mips_smp_ipi_init(void)
{
- if (num_possible_cpus() == 1)
+ if (num_possible_cpus_eq(1))
return 0;

mips_smp_ipi_allocate(cpu_possible_mask);
diff --git a/arch/mips/kernel/vpe-cmp.c b/arch/mips/kernel/vpe-cmp.c
index e673603e11e5..c1dc00cda666 100644
--- a/arch/mips/kernel/vpe-cmp.c
+++ b/arch/mips/kernel/vpe-cmp.c
@@ -98,7 +98,7 @@ int __init vpe_module_init(void)
return -ENODEV;
}

- if (num_possible_cpus() - aprp_cpu_index() < 1) {
+ if (num_possible_cpus_le(aprp_cpu_index() + 1)) {
pr_warn("No VPEs reserved for AP/SP, not initialize VPE loader\n"
"Pass maxcpus=<n> argument as kernel argument\n");
return -ENODEV;
diff --git a/arch/mips/loongson2ef/common/cs5536/cs5536_mfgpt.c b/arch/mips/loongson2ef/common/cs5536/cs5536_mfgpt.c
index f21a540a1dd2..37166fa866c4 100644
--- a/arch/mips/loongson2ef/common/cs5536/cs5536_mfgpt.c
+++ b/arch/mips/loongson2ef/common/cs5536/cs5536_mfgpt.c
@@ -194,7 +194,7 @@ static struct clocksource clocksource_mfgpt = {

int __init init_mfgpt_clocksource(void)
{
- if (num_possible_cpus() > 1) /* MFGPT does not scale! */
+ if (num_possible_cpus_gt(1)) /* MFGPT does not scale! */
return 0;

return clocksource_register_hz(&clocksource_mfgpt, MFGPT_TICK_RATE);
diff --git a/arch/mips/mm/context.c b/arch/mips/mm/context.c
index b25564090939..bf508e38d30a 100644
--- a/arch/mips/mm/context.c
+++ b/arch/mips/mm/context.c
@@ -274,7 +274,7 @@ static int mmid_init(void)
* one more MMID than CPUs.
*/
num_mmids = asid_first_version(0);
- WARN_ON(num_mmids <= num_possible_cpus());
+ WARN_ON(num_possible_cpus_gt(num_mmids - 1));

atomic64_set(&mmid_version, asid_first_version(0));
mmid_map = kcalloc(BITS_TO_LONGS(num_mmids), sizeof(*mmid_map),
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index bede66b072a7..92dae5cfa0a4 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -363,7 +363,7 @@ static struct work_registers build_get_work_registers(u32 **p)
return r;
}

- if (num_possible_cpus() > 1) {
+ if (num_possible_cpus_gt(1)) {
/* Get smp_processor_id */
UASM_i_CPUID_MFC0(p, K0, SMP_CPUID_REG);
UASM_i_SRL_SAFE(p, K0, K0, SMP_CPUID_REGSHIFT);
diff --git a/arch/nios2/kernel/cpuinfo.c b/arch/nios2/kernel/cpuinfo.c
index 203870c4b86d..7bdc511eba60 100644
--- a/arch/nios2/kernel/cpuinfo.c
+++ b/arch/nios2/kernel/cpuinfo.c
@@ -172,7 +172,7 @@ static void *cpuinfo_start(struct seq_file *m, loff_t *pos)
{
unsigned long i = *pos;

- return i < num_possible_cpus() ? (void *) (i + 1) : NULL;
+ return num_possible_cpus_gt(i) ? (void *) (i + 1) : NULL;
}

static void *cpuinfo_next(struct seq_file *m, void *v, loff_t *pos)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 83f4a6389a28..15573310fab4 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -280,7 +280,7 @@ static int smp_85xx_kick_cpu(int nr)
int primary = nr;
#endif

- WARN_ON(nr < 0 || nr >= num_possible_cpus());
+ WARN_ON(nr < 0 || num_possible_cpus_le(nr + 1));

pr_debug("kick CPU #%d\n", nr);

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 5ab44600c8d3..b0d66de92309 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -365,7 +365,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
cpu_maps_update_begin();
break;
}
- if (cpu == num_possible_cpus()) {
+ if (num_possible_cpus_eq(cpu)) {
pr_warn("Could not find cpu to offline with physical id 0x%x\n",
thread);
}
@@ -408,7 +408,7 @@ static int dlpar_online_cpu(struct device_node *dn)

break;
}
- if (cpu == num_possible_cpus())
+ if (num_possible_cpus_eq(cpu))
printk(KERN_WARNING "Could not find cpu to online "
"with physical id 0x%x\n", thread);
}
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index 995fb2ada507..ded5007f2af9 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1440,7 +1440,7 @@ struct mpic * __init mpic_alloc(struct device_node *node,
* The MPIC driver will crash if there are more cores than we
* can initialize, so we may as well catch that problem here.
*/
- BUG_ON(num_possible_cpus() > MPIC_MAX_CPUS);
+ BUG_ON(num_possible_cpus_gt(MPIC_MAX_CPUS));

/* Map the per-CPU registers */
for_each_possible_cpu(i) {
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 2073be312fe9..938346f9af7d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2747,7 +2747,7 @@ static void dump_all_pacas(void)
{
int cpu;

- if (num_possible_cpus() == 0) {
+ if (num_possible_cpus_eq(0)) {
printf("No possible cpus, use 'dp #' to dump individual cpus\n");
return;
}
@@ -2809,7 +2809,7 @@ static void dump_all_xives(void)
{
int cpu;

- if (num_possible_cpus() == 0) {
+ if (num_possible_cpus_eq(0)) {
printf("No possible cpus, use 'dx #' to dump individual cpus\n");
return;
}
@@ -3692,7 +3692,7 @@ symbol_lookup(void)
ptr >= (void __percpu *)__per_cpu_start &&
ptr < (void __percpu *)__per_cpu_end)
{
- if (scanhex(&cpu) && cpu < num_possible_cpus()) {
+ if (scanhex(&cpu) && num_possible_cpus_gt(cpu)) {
addr = (unsigned long)per_cpu_ptr(ptr, cpu);
} else {
cpu = raw_smp_processor_id();
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index 2c6253b293bc..6e176baedf65 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -36,7 +36,7 @@ void kvm_riscv_stage2_vmid_detect(void)
__kvm_riscv_hfence_gvma_all();

/* We don't use VMID bits if they are not sufficient */
- if ((1UL << vmid_bits) < num_possible_cpus())
+ if (num_possible_cpus_gt(1UL << vmid_bits))
vmid_bits = 0;
}

diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 30f171b7b00c..b779c6607ff3 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -885,7 +885,7 @@ static void __mark_core_id(struct mdesc_handle *hp, u64 node,
{
const u64 *id = mdesc_get_property(hp, node, "id", NULL);

- if (*id < num_possible_cpus())
+ if (num_possible_cpus_gt(*id))
cpu_data(*id).core_id = core_id;
}

@@ -894,7 +894,7 @@ static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
{
const u64 *id = mdesc_get_property(hp, node, "id", NULL);

- if (*id < num_possible_cpus()) {
+ if (num_possible_cpus_gt(*id)) {
cpu_data(*id).max_cache_id = max_cache_id;

/**
@@ -986,7 +986,7 @@ static void set_sock_ids_by_socket(struct mdesc_handle *hp, u64 mp)
continue;

id = mdesc_get_property(hp, t, "id", NULL);
- if (*id < num_possible_cpus())
+ if (num_possible_cpus_gt(*id))
cpu_data(*id).sock_id = idx;
}
idx++;
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9687a8aef01c..d69ed09a85b0 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1007,7 +1007,7 @@ __init int amd_pmu_init(void)
if (ret)
return ret;

- if (num_possible_cpus() == 1) {
+ if (num_possible_cpus_eq(1)) {
/*
* No point in allocating data structures to serialize
* against other CPUs, when there is only the one CPU.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 23fb4d51a5da..55fd70fdb213 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -574,7 +574,7 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
if (!uniproc_patched)
goto unlock;

- if (num_possible_cpus() == 1)
+ if (num_possible_cpus_eq(1))
/* Don't bother remembering, we'll never have to undo it. */
goto smp_unlock;

@@ -620,7 +620,7 @@ void alternatives_enable_smp(void)
struct smp_alt_module *mod;

/* Why bother if there are no other CPUs? */
- BUG_ON(num_possible_cpus() == 1);
+ BUG_ON(num_possible_cpus_eq(1));

mutex_lock(&text_mutex);

@@ -833,14 +833,14 @@ void __init alternative_instructions(void)

#ifdef CONFIG_SMP
/* Patch to UP if other cpus not imminent. */
- if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) {
+ if (!noreplace_smp && (num_present_cpus_eq(1) || setup_max_cpus <= 1)) {
uniproc_patched = true;
alternatives_smp_module_add(NULL, "core kernel",
__smp_locks, __smp_locks_end,
_text, _etext);
}

- if (!uniproc_patched || num_possible_cpus() == 1) {
+ if (!uniproc_patched || num_possible_cpus_eq(1)) {
free_init_pages("SMP alternatives",
(unsigned long)__smp_locks,
(unsigned long)__smp_locks_end);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..9a3d0748ca86 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1020,7 +1020,7 @@ void __init setup_boot_APIC_clock(void)
if (disable_apic_timer) {
pr_info("Disabling APIC timer\n");
/* No broadcast on UP ! */
- if (num_possible_cpus() > 1) {
+ if (num_possible_cpus_gt(1)) {
lapic_clockevent.mult = 1;
setup_APIC_timer();
}
@@ -1029,7 +1029,7 @@ void __init setup_boot_APIC_clock(void)

if (calibrate_APIC_clock()) {
/* No broadcast on UP ! */
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
setup_APIC_timer();
return;
}
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 8f72b4351c9f..3dfd4c5d30dc 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -189,7 +189,7 @@ static void physflat_init_apic_ldr(void)

static int physflat_probe(void)
{
- if (apic == &apic_physflat || num_possible_cpus() > 8 ||
+ if (apic == &apic_physflat || num_possible_cpus_gt(8) ||
jailhouse_paravirt())
return 1;

diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index a61f642b1b90..b65c1572aaf5 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -138,7 +138,7 @@ void __init default_setup_apic_routing(void)
{
int version = boot_cpu_apic_version;

- if (num_possible_cpus() > 8) {
+ if (num_possible_cpus_gt(8)) {
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_INTEL:
if (!APIC_XAPIC(version)) {
diff --git a/arch/x86/kernel/cpu/mce/dev-mcelog.c b/arch/x86/kernel/cpu/mce/dev-mcelog.c
index 100fbeebdc72..34e44b0d9546 100644
--- a/arch/x86/kernel/cpu/mce/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mce/dev-mcelog.c
@@ -310,7 +310,7 @@ static ssize_t mce_chrdev_write(struct file *filp, const char __user *ubuf,
if (copy_from_user(&m, ubuf, usize))
return -EFAULT;

- if (m.extcpu >= num_possible_cpus() || !cpu_online(m.extcpu))
+ if (num_possible_cpus_le(m.extcpu + 1) || !cpu_online(m.extcpu))
return -EINVAL;

/*
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 882213df3713..e432e6248599 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -737,7 +737,7 @@ static void __init hpet_select_clockevents(void)
hc->irq = irq;
hc->mode = HPET_MODE_CLOCKEVT;

- if (++hpet_base.nr_clockevents == num_possible_cpus())
+ if (num_possible_cpus_eq(++hpet_base.nr_clockevents))
break;
}

diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 2b7999a1a50a..e6e30a7bc80f 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -57,7 +57,7 @@ static int __init init_pit_clocksource(void)
* - when HPET is enabled
* - when local APIC timer is active (PIT is switched off)
*/
- if (num_possible_cpus() > 1 || is_hpet_enabled() ||
+ if (num_possible_cpus_gt(1) || is_hpet_enabled() ||
!clockevent_state_periodic(&i8253_clockevent))
return 0;

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 59abbdad7729..375226dcf29e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1057,7 +1057,7 @@ void __init kvm_spinlock_init(void)
goto out;
}

- if (num_possible_cpus() == 1) {
+ if (num_possible_cpus_eq(1)) {
pr_info("PV spinlocks disabled, single CPU\n");
goto out;
}
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 462dd8e9b03d..12c1fb1dfd07 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -205,7 +205,7 @@ static void __init kvmclock_init_mem(void)
struct page *p;
int r;

- if (HVC_BOOT_ARRAY_SIZE >= num_possible_cpus())
+ if (num_possible_cpus_le(HVC_BOOT_ARRAY_SIZE + 1))
return;

ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2e076a459a0c..2245c9721d4a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1223,7 +1223,7 @@ int unsynchronized_tsc(void)
*/
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
/* assume multi socket systems are not synchronized: */
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
return 1;
}

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6a8f3b53ab83..b32ca28292ae 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -254,7 +254,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
cpumask_copy(xen_cpu_initialized_map, cpumask_of(0));

/* Restrict the possible_map according to max_cpus. */
- while ((num_possible_cpus() > 1) && (num_possible_cpus() > max_cpus)) {
+ while (num_possible_cpus_gt(max(1, max_cpus))) {
for (cpu = nr_cpu_ids - 1; !cpu_possible(cpu); cpu--)
continue;
set_cpu_possible(cpu, false);
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 043c73dfd2c9..58caaa9aec3e 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -125,7 +125,7 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_vcpu_stolen);
void __init xen_init_spinlocks(void)
{
/* Don't need to use pvqspinlock code if there is only 1 vCPU. */
- if (num_possible_cpus() == 1 || nopvspin)
+ if (num_possible_cpus_eq(1) || nopvspin)
xen_pvspin = false;

if (!xen_pvspin) {
diff --git a/drivers/clk/samsung/clk-exynos4.c b/drivers/clk/samsung/clk-exynos4.c
index 22009cb53428..64d7de6b885c 100644
--- a/drivers/clk/samsung/clk-exynos4.c
+++ b/drivers/clk/samsung/clk-exynos4.c
@@ -1178,7 +1178,7 @@ static void __init exynos4x12_core_down_clock(void)
PWR_CTRL1_USE_CORE1_WFE | PWR_CTRL1_USE_CORE0_WFE |
PWR_CTRL1_USE_CORE1_WFI | PWR_CTRL1_USE_CORE0_WFI);
/* On Exynos4412 enable it also on core 2 and 3 */
- if (num_possible_cpus() == 4)
+ if (num_possible_cpus_eq(4))
tmp |= PWR_CTRL1_USE_CORE3_WFE | PWR_CTRL1_USE_CORE2_WFE |
PWR_CTRL1_USE_CORE3_WFI | PWR_CTRL1_USE_CORE2_WFI;
writel_relaxed(tmp, reg_base + PWR_CTRL1);
diff --git a/drivers/clocksource/ingenic-timer.c b/drivers/clocksource/ingenic-timer.c
index 24ed0f1f089b..c4a34d26357c 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -302,8 +302,7 @@ static int __init ingenic_tcu_init(struct device_node *np)
(u32 *)&tcu->pwm_channels_mask);

/* Verify that we have at least num_possible_cpus() + 1 free channels */
- if (hweight8(tcu->pwm_channels_mask) >
- soc_info->num_channels - num_possible_cpus() + 1) {
+ if (num_possible_cpus_gt(soc_info->num_channels + 1 - hweight8(tcu->pwm_channels_mask))) {
pr_crit("%s: Invalid PWM channel mask: 0x%02lx\n", __func__,
tcu->pwm_channels_mask);
ret = -EINVAL;
diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
index 9f3fc7a073d0..8bf76eaa9e1e 100644
--- a/drivers/cpufreq/pcc-cpufreq.c
+++ b/drivers/cpufreq/pcc-cpufreq.c
@@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
return ret;
}

- if (num_present_cpus() > 4) {
+ if (num_present_cpus_gt(4)) {
pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
__func__);
diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 23b232b57518..f99177e72158 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -1293,7 +1293,7 @@ static int mv_xor_probe(struct platform_device *pdev)
struct mv_xor_device *xordev;
struct mv_xor_platform_data *pdata = dev_get_platdata(&pdev->dev);
struct resource *res;
- unsigned int max_engines, max_channels;
+ unsigned int max_channels;
int i, ret;

dev_notice(&pdev->dev, "Marvell shared XOR driver\n");
@@ -1362,7 +1362,6 @@ static int mv_xor_probe(struct platform_device *pdev)
* separate engines when possible. For dual-CPU Armada 3700
* SoC with single XOR engine allow using its both channels.
*/
- max_engines = num_present_cpus();
if (xordev->xor_type == XOR_ARMADA_37XX)
max_channels = num_present_cpus();
else
@@ -1370,7 +1369,7 @@ static int mv_xor_probe(struct platform_device *pdev)
MV_XOR_MAX_CHANNELS,
DIV_ROUND_UP(num_present_cpus(), 2));

- if (mv_xor_engine_count >= max_engines)
+ if (num_present_cpus_le(mv_xor_engine_count + 1))
return 0;

if (pdev->dev.of_node) {
diff --git a/drivers/gpu/drm/i810/i810_drv.c b/drivers/gpu/drm/i810/i810_drv.c
index 0e53a066d4db..c70745fa4166 100644
--- a/drivers/gpu/drm/i810/i810_drv.c
+++ b/drivers/gpu/drm/i810/i810_drv.c
@@ -80,7 +80,7 @@ static struct pci_driver i810_pci_driver = {

static int __init i810_init(void)
{
- if (num_possible_cpus() > 1) {
+ if (num_possible_cpus_gt(1)) {
pr_err("drm/i810 does not support SMP\n");
return -EINVAL;
}
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index b8bb46c65a97..4e319e4ba9dc 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -430,7 +430,7 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic)
break;
}

- if (!mask && num_possible_cpus() > 1)
+ if (!mask && num_possible_cpus_gt(1))
pr_crit("GIC CPU mask not found - kernel will fail to boot.\n");

return mask;
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index 91230894692d..c7aa3f6dc635 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -537,7 +537,7 @@ static netdev_tx_t cfv_netdev_tx(struct sk_buff *skb, struct net_device *netdev)
*
* Flow-on is triggered when sufficient buffers are freed
*/
- if (unlikely(cfv->vq_tx->num_free <= num_present_cpus())) {
+ if (unlikely(num_present_cpus_gt(cfv->vq_tx->num_free - 1))) {
flow_off = true;
cfv->stats.tx_full_ring++;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index fda49404968c..79d5ded30b65 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -649,7 +649,7 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
rings_per_vf);
oct->sriov_info.rings_per_vf = rings_per_vf;
} else {
- if (rings_per_vf > num_present_cpus()) {
+ if (num_present_cpus_le(rings_per_vf)) {
dev_warn(&oct->pci_dev->dev,
"PF configured rings_per_vf:%d greater than num_cpu:%d. Using rings_per_vf:%d equal to num cpus\n",
rings_per_vf,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 22a463e15678..7d97939413d2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1239,7 +1239,7 @@ static int hns_nic_init_affinity_mask(int q_num, int ring_idx,
* The cpu mask set by ring index according to the ring flag
* which indicate the ring is tx or rx.
*/
- if (q_num == num_possible_cpus()) {
+ if (num_possible_cpus_eq(q_num)) {
if (is_tx_ring(ring))
cpu = ring_idx;
else
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index a48e804c46f2..34ad59fd51d6 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3315,7 +3315,7 @@ static int mvpp2_setup_txqs(struct mvpp2_port *port)
goto err_cleanup;

/* Assign this queue to a CPU */
- if (queue < num_possible_cpus())
+ if (num_possible_cpus_gt(queue))
netif_set_xps_queue(port->dev, cpumask_of(queue), queue);
}

diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index 172081ffe477..33d3cddc6c7b 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -429,7 +429,7 @@ static void ath9k_hw_init_config(struct ath_hw *ah)
* This issue is not present on PCI-Express devices or pre-AR5416
* devices (legacy, 802.11abg).
*/
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
ah->config.serialize_regmode = SER_REG_MODE_AUTO;

if (NR_CPUS > 1 && ah->config.serialize_regmode == SER_REG_MODE_AUTO) {
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 19b996c6a260..6ce0236a3203 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1536,7 +1536,7 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
adapter->cmd_wait_q.status = 0;
adapter->scan_wait_q_woken = false;

- if ((num_possible_cpus() > 1) || adapter->iface_type == MWIFIEX_USB)
+ if (num_possible_cpus_gt(1) || adapter->iface_type == MWIFIEX_USB)
adapter->rx_work_enabled = true;

adapter->workqueue =
@@ -1691,7 +1691,7 @@ mwifiex_add_card(void *card, struct completion *fw_done,
adapter->cmd_wait_q.status = 0;
adapter->scan_wait_q_woken = false;

- if ((num_possible_cpus() > 1) || adapter->iface_type == MWIFIEX_USB)
+ if (num_possible_cpus_gt(1) || adapter->iface_type == MWIFIEX_USB)
adapter->rx_work_enabled = true;

adapter->workqueue =
diff --git a/drivers/net/wireless/st/cw1200/queue.c b/drivers/net/wireless/st/cw1200/queue.c
index 12952b1c29df..4d47a1e26d55 100644
--- a/drivers/net/wireless/st/cw1200/queue.c
+++ b/drivers/net/wireless/st/cw1200/queue.c
@@ -312,8 +312,7 @@ int cw1200_queue_put(struct cw1200_queue *queue,
* Leave extra queue slots so we don't overflow.
*/
if (queue->overfull == false &&
- queue->num_queued >=
- (queue->capacity - (num_present_cpus() - 1))) {
+ num_present_cpus_gt(queue->capacity - queue->num_queued)) {
queue->overfull = true;
__cw1200_queue_lock(queue);
mod_timer(&queue->gc, jiffies);
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index e0c34120df37..474f1ed5d9b9 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -17,7 +17,7 @@ static int nd_region_probe(struct device *dev)
struct nd_region *nd_region = to_nd_region(dev);

if (nd_region->num_lanes > num_online_cpus()
- && nd_region->num_lanes < num_possible_cpus()
+ && num_possible_cpus_gt(nd_region->num_lanes)
&& !test_and_set_bit(0, &once)) {
dev_dbg(dev, "online cpus (%d) < concurrent i/o lanes (%d) < possible cpus (%d)\n",
num_online_cpus(), nd_region->num_lanes,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ca2ee806d74b..34958f775ad8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -79,7 +79,7 @@ static int io_queue_count_set(const char *val, const struct kernel_param *kp)
int ret;

ret = kstrtouint(val, 10, &n);
- if (ret != 0 || n > num_possible_cpus())
+ if (ret != 0 || num_possible_cpus_le(n))
return -EINVAL;
return param_set_uint(val, kp);
}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a31b302b0ade..5f43a7bde55d 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -637,7 +637,7 @@ int armpmu_request_irq(int irq, int cpu)

err = irq_force_affinity(irq, cpumask_of(cpu));

- if (err && num_possible_cpus() > 1) {
+ if (err && num_possible_cpus_gt(1)) {
pr_warn("unable to set irq affinity (irq=%d, cpu=%u)\n",
irq, cpu);
goto err_out;
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
index c9a85eb2e860..c25902969475 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
@@ -297,7 +297,7 @@ static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn
int i, bus_number;

if (bus_no < 0 || bus_no > 1 || cpu < 0 || cpu >= nr_cpu_ids ||
- cpu >= num_possible_cpus())
+ num_possible_cpus_le(cpu + 1))
return NULL;

bus_number = isst_cpu_info[cpu].bus_info[bus_no];
@@ -362,7 +362,7 @@ struct pci_dev *isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn)
struct pci_dev *pci_dev;

if (bus_no < 0 || bus_no > 1 || cpu < 0 || cpu >= nr_cpu_ids ||
- cpu >= num_possible_cpus())
+ num_possible_cpus_le(cpu + 1))
return NULL;

pci_dev = isst_cpu_info[cpu].pci_dev[bus_no];
@@ -442,7 +442,7 @@ static long isst_if_proc_phyid_req(u8 *cmd_ptr, int *write_only, int resume)

cpu_map = (struct isst_if_cpu_map *)cmd_ptr;
if (cpu_map->logical_cpu >= nr_cpu_ids ||
- cpu_map->logical_cpu >= num_possible_cpus())
+ num_possible_cpus_le(cpu_map->logical_cpu + 1))
return -EINVAL;

*write_only = 0;
diff --git a/drivers/soc/bcm/brcmstb/biuctrl.c b/drivers/soc/bcm/brcmstb/biuctrl.c
index 2c975d79fe8e..6a75cbe836a4 100644
--- a/drivers/soc/bcm/brcmstb/biuctrl.c
+++ b/drivers/soc/bcm/brcmstb/biuctrl.c
@@ -181,7 +181,7 @@ static void __init a72_b53_rac_enable_all(struct device_node *np)
if (IS_ENABLED(CONFIG_CACHE_B15_RAC))
return;

- if (WARN(num_possible_cpus() > 4, "RAC only supports 4 CPUs\n"))
+ if (WARN(num_possible_cpus_gt(4), "RAC only supports 4 CPUs\n"))
return;

pref_dist = cbc_readl(RAC_CONFIG1_REG);
diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 1d2b27e3ea63..b38c519f2294 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -60,7 +60,7 @@ static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d,
if (d)
return d;

- if (cpu != DPAA2_IO_ANY_CPU && cpu >= num_possible_cpus())
+ if (cpu != DPAA2_IO_ANY_CPU && num_possible_cpus_le(cpu + 1))
return NULL;

/*
@@ -140,7 +140,7 @@ struct dpaa2_io *dpaa2_io_create(const struct dpaa2_io_desc *desc,
return NULL;

/* check if CPU is out of range (-1 means any cpu) */
- if (desc->cpu != DPAA2_IO_ANY_CPU && desc->cpu >= num_possible_cpus()) {
+ if (desc->cpu != DPAA2_IO_ANY_CPU && num_possible_cpus_le(desc->cpu + 1)) {
kfree(obj);
return NULL;
}
diff --git a/drivers/spi/spi-dw-bt1.c b/drivers/spi/spi-dw-bt1.c
index c06553416123..ab6b6a32a0d6 100644
--- a/drivers/spi/spi-dw-bt1.c
+++ b/drivers/spi/spi-dw-bt1.c
@@ -241,7 +241,7 @@ static int dw_spi_bt1_sys_init(struct platform_device *pdev,
* though, but still tends to be not fast enough at low CPU
* frequencies.
*/
- if (num_possible_cpus() > 1)
+ if (num_possible_cpus_gt(1))
dws->max_mem_freq = 10000000U;
else
dws->max_mem_freq = 20000000U;
diff --git a/drivers/virt/acrn/hsm.c b/drivers/virt/acrn/hsm.c
index 5419794fccf1..50cd69012dcf 100644
--- a/drivers/virt/acrn/hsm.c
+++ b/drivers/virt/acrn/hsm.c
@@ -431,7 +431,7 @@ static ssize_t remove_cpu_store(struct device *dev,
if (kstrtoull(buf, 0, &cpu) < 0)
return -EINVAL;

- if (cpu >= num_possible_cpus() || cpu == 0 || !cpu_is_hotpluggable(cpu))
+ if (num_possible_cpus_le(cpu + 1) || cpu == 0 || !cpu_is_hotpluggable(cpu))
return -EINVAL;

if (cpu_online(cpu))
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8608f804388f..5580d60ec962 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -211,7 +211,7 @@ pwork_threads_store(
if (ret)
return ret;

- if (val < -1 || val > num_possible_cpus())
+ if (val < -1 || num_possible_cpus_le(val))
return -EINVAL;

xfs_globals.pwork_threads = val;
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index b5e50cf74785..ea0699fa4d4c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -945,8 +945,19 @@ static inline unsigned int num_online_cpus(void)
return atomic_read(&__num_online_cpus);
}
#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
+#define num_possible_cpus_eq(n) cpumask_weight_eq(cpu_possible_mask, (n))
+#define num_possible_cpus_gt(n) cpumask_weight_gt(cpu_possible_mask, (n))
+#define num_possible_cpus_le(n) cpumask_weight_le(cpu_possible_mask, (n))
+
#define num_present_cpus() cpumask_weight(cpu_present_mask)
+#define num_present_cpus_eq(n) cpumask_weight_eq(cpu_present_mask, (n))
+#define num_present_cpus_gt(n) cpumask_weight_gt(cpu_present_mask, (n))
+#define num_present_cpus_le(n) cpumask_weight_le(cpu_present_mask, (n))
+
#define num_active_cpus() cpumask_weight(cpu_active_mask)
+#define num_active_cpus_eq(n) cpumask_weight_eq(cpu_active_mask, (n))
+#define num_active_cpus_gt(n) cpumask_weight_gt(cpu_active_mask, (n))
+#define num_active_cpus_le(n) cpumask_weight_le(cpu_active_mask, (n))

static inline bool cpu_online(unsigned int cpu)
{
@@ -976,9 +987,21 @@ static inline bool cpu_dying(unsigned int cpu)
#else

#define num_online_cpus() 1U
+
#define num_possible_cpus() 1U
+#define num_possible_cpus_eq(n) (1U == (n))
+#define num_possible_cpus_gt(n) (1U > (n))
+#define num_possible_cpus_le(n) (1U < (n))
+
#define num_present_cpus() 1U
+#define num_present_cpus_eq(n) (1U == (n))
+#define num_present_cpus_gt(n) (1U > (n))
+#define num_present_cpus_le(n) (1U < (n))
+
#define num_active_cpus() 1U
+#define num_active_cpus_eq(n) (1U == (n))
+#define num_active_cpus_gt(n) (1U > (n))
+#define num_active_cpus_le(n) (1U < (n))

static inline bool cpu_online(unsigned int cpu)
{
diff --git a/include/linux/kdb.h b/include/linux/kdb.h
index ea0f5e580fac..48269d32b038 100644
--- a/include/linux/kdb.h
+++ b/include/linux/kdb.h
@@ -191,7 +191,7 @@ static inline
int kdb_process_cpu(const struct task_struct *p)
{
unsigned int cpu = task_cpu(p);
- if (cpu > num_possible_cpus())
+ if (num_possible_cpus_le(cpu))
cpu = 0;
return cpu;
}
diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 10b454554ab0..b6435a41a537 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -108,7 +108,7 @@ kdb_bt_cpu(unsigned long cpu)
{
struct task_struct *kdb_tsk;

- if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
+ if (num_possible_cpus_le(cpu + 1) || !cpu_online(cpu)) {
kdb_printf("WARNING: no process for cpu %ld\n", cpu);
return;
}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cbc35d586afb..08e6df52eb4d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1005,7 +1005,7 @@ static void __init log_buf_add_cpu(void)
* set_cpu_possible() after setup_arch() but just in
* case lets ensure this is valid.
*/
- if (num_possible_cpus() == 1)
+ if (num_possible_cpus_eq(1))
return;

cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MAX_BUF_LEN;
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 6bcc5d6a6572..f21c2c20505d 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -639,7 +639,7 @@ static int __init reboot_setup(char *str)
if (isdigit(str[0])) {
int cpu = simple_strtoul(str, NULL, 0);

- if (cpu >= num_possible_cpus()) {
+ if (num_possible_cpus_le(cpu + 1)) {
pr_err("Ignoring the CPU number in reboot= option. "
"CPU %d exceeds possible cpu number %d\n",
cpu, num_possible_cpus());
@@ -844,7 +844,7 @@ static ssize_t cpu_store(struct kobject *kobj, struct kobj_attribute *attr,
if (rc)
return rc;

- if (cpunum >= num_possible_cpus())
+ if (num_possible_cpus_le(cpunum + 1))
return -ERANGE;

reboot_default = 0;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 32d6629a55b2..c1fdfa4084c3 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -448,7 +448,7 @@ void clockevents_register_device(struct clock_event_device *dev)
clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);

if (!dev->cpumask) {
- WARN_ON(num_possible_cpus() > 1);
+ WARN_ON(num_possible_cpus_gt(1));
dev->cpumask = cpumask_of(smp_processor_id());
}

diff --git a/mm/percpu.c b/mm/percpu.c
index 293009cc03ef..76e846b3d48e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2936,7 +2936,7 @@ static struct pcpu_alloc_info * __init __flatten pcpu_build_alloc_info(
* greater-than comparison ensures upa==1 always
* passes the following check.
*/
- if (wasted > num_possible_cpus() / 3)
+ if (num_possible_cpus_le(wasted * 3))
continue;

/* and then don't consume more memory */
@@ -3193,7 +3193,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,

/* allocate pages */
j = 0;
- for (unit = 0; unit < num_possible_cpus(); unit++) {
+ for (unit = 0; num_possible_cpus_gt(unit); unit++) {
unsigned int cpu = ai->groups[0].cpu_map[unit];
for (i = 0; i < unit_pages; i++) {
void *ptr;
@@ -3215,7 +3215,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
vm.size = num_possible_cpus() * ai->unit_size;
vm_area_register_early(&vm, PAGE_SIZE);

- for (unit = 0; unit < num_possible_cpus(); unit++) {
+ for (unit = 0; num_possible_cpus_gt(unit); unit++) {
unsigned long unit_addr =
(unsigned long)vm.addr + unit * ai->unit_size;

diff --git a/mm/slab.c b/mm/slab.c
index ca4822f6b2b6..3cf2ee629b23 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3929,7 +3929,7 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
* to a larger limit. Thus disabled by default.
*/
shared = 0;
- if (cachep->size <= PAGE_SIZE && num_possible_cpus() > 1)
+ if (cachep->size <= PAGE_SIZE && num_possible_cpus_gt(1))
shared = 8;

#if DEBUG
--
2.25.1


2021-11-28 04:00:25

by Yury Norov

Subject: [PATCH 8/9] lib/nodemask: add num_node_state_eq()

Add num_node_state_eq() and replace num_node_state() with it in
page_alloc_init().

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/nodemask.h | 5 +++++
mm/page_alloc.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 3801ec5b06f4..b68ee2a80164 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -455,6 +455,11 @@ static inline int num_node_state(enum node_states state)
return nodes_weight(node_states[state]);
}

+static inline int num_node_state_eq(enum node_states state, unsigned int num)
+{
+ return nodes_weight_eq(node_states[state], num);
+}
+
#define for_each_node_state(__node, __state) \
for_each_node_mask((__node), node_states[__state])

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 91c1105a9efe..81d111155ffb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8323,7 +8323,7 @@ void __init page_alloc_init(void)
int ret;

#ifdef CONFIG_NUMA
- if (num_node_state(N_MEMORY) == 1)
+ if (num_node_state_eq(N_MEMORY, 1))
hashdist = 0;
#endif

--
2.25.1


2021-11-28 04:00:35

by Yury Norov

Subject: [PATCH 9/9] MAINTAINERS: add cpumask and nodemask files to BITMAP_API

The cpumask and nodemask APIs are thin wrappers around the basic bitmap API,
and the corresponding files are not formally maintained. This patch adds them
to the BITMAP_API section, so that the bitmap folks can take a closer look at them.

Signed-off-by: Yury Norov <[email protected]>
---
MAINTAINERS | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00ad0cb5cb05..ceeffcd81fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3375,10 +3375,14 @@ R: Andy Shevchenko <[email protected]>
R: Rasmus Villemoes <[email protected]>
S: Maintained
F: include/linux/bitmap.h
+F: include/linux/cpumask.h
F: include/linux/find.h
+F: include/linux/nodemask.h
F: lib/bitmap.c
+F: lib/cpumask.c
F: lib/find_bit.c
F: lib/find_bit_benchmark.c
+F: lib/nodemask.c
F: lib/test_bitmap.c
F: tools/include/linux/bitmap.h
F: tools/include/linux/find.h
--
2.25.1


2021-11-28 04:39:45

by Michał Mirosław

Subject: Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote:
> Now that we have bitmap_weight_eq(), switch bitmap_full() and
> bitmap_empty() to use it.
[...]
> -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
> -{
> - if (small_const_nbits(nbits))
> - return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
> -
> - return find_first_bit(src, nbits) == nbits;
> -}
[...]
> +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits)
> +{
> + return bitmap_weight_eq(src, nbits, 0);
> +}
[...]

What's the speed difference? Have you benchmarked this?

Best Regards
Michał Mirosław

2021-11-28 04:49:53

by Michał Mirosław

Subject: Re: [PATCH 3/9] all: replace bitmap_weigth() with bitmap_{empty,full,eq,gt,le}

On Sat, Nov 27, 2021 at 07:56:58PM -0800, Yury Norov wrote:
> bitmap_weight() counts all set bits in the bitmap unconditionally.
> However, in some cases we can traverse only part of the bitmap when we
> only need to check whether the number of set bits is greater than, less
> than, or equal to some number.
>
> This patch replaces bitmap_weight() with one of
> bitmap_{empty,full,eq,gt,le}, as appropriate.
>
> In some places driver code has been optimized further, where it's
> trivial.
[...]

I think this patch needs to be split. bitmap_full/empty() conversions
can be separated (don't need the bitmap_weight_*() functions) and
not all changes are trivial. Besides, gathering and checking all the
acks needed into one patch seems problematic.

Best Regards,
Michał Mirosław

2021-11-28 04:59:02

by Michał Mirosław

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sat, Nov 27, 2021 at 07:57:02PM -0800, Yury Norov wrote:
> Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> with one of new functions where appropriate. This allows num_*_cpus_*()
> to return earlier depending on the condition.
[...]
> @@ -3193,7 +3193,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>
> /* allocate pages */
> j = 0;
> - for (unit = 0; unit < num_possible_cpus(); unit++) {
> + for (unit = 0; num_possible_cpus_gt(unit); unit++) {

This looks dubious. With the old version I could hope the compiler would call
num_possible_cpus() only once if it's marked const or pure, but the
alternative is going to count the bits every time, making this a guaranteed
O(n^2) even though the bitmap doesn't change.

Best Regards
Michał Mirosław

2021-11-28 05:12:03

by Michał Mirosław

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sun, Nov 28, 2021 at 05:56:51AM +0100, Michał Mirosław wrote:
> On Sat, Nov 27, 2021 at 07:57:02PM -0800, Yury Norov wrote:
> > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > with one of new functions where appropriate. This allows num_*_cpus_*()
> > to return earlier depending on the condition.
> [...]
> > @@ -3193,7 +3193,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
> >
> > /* allocate pages */
> > j = 0;
> > - for (unit = 0; unit < num_possible_cpus(); unit++) {
> > + for (unit = 0; num_possible_cpus_gt(unit); unit++) {
>
> This looks dubious. With the old version I could hope the compiler would call
> num_possible_cpus() only once if it's marked const or pure, but the
> alternative is going to count the bits every time, making this a guaranteed
> O(n^2) even though the bitmap doesn't change.

Hmm. This code already calls num_possible_cpus() multiple times
unnecessarily. Since the result doesn't change after early init, I would
suggest just calling it once here.

Best Regards
Michał Mirosław
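
The suggestion above, evaluating the CPU count once instead of in the loop condition, can be sketched in userspace as follows. This is only an illustration under stated assumptions: weight_sketch(), loop_recount(), loop_hoisted() and the weight_calls counter are hypothetical names standing in for num_possible_cpus() and the loops in pcpu_page_first_chunk(); none of this is code from mm/percpu.c or from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the whole mask is traversed (illustration only). */
static unsigned long weight_calls;

/* Hypothetical stand-in for num_possible_cpus(): popcount over every word. */
static unsigned int weight_sketch(const unsigned long *mask, size_t nwords)
{
	unsigned int w = 0;

	assert(mask != NULL);
	weight_calls++;
	for (size_t i = 0; i < nwords; i++)
		w += (unsigned int)__builtin_popcountl(mask[i]);
	return w;
}

/* Re-evaluates the weight in the loop condition on every iteration. */
static unsigned int loop_recount(const unsigned long *mask, size_t nwords)
{
	unsigned int unit, visited = 0;

	for (unit = 0; unit < weight_sketch(mask, nwords); unit++)
		visited++;
	return visited;
}

/* Evaluates the weight once; valid only while the mask does not change. */
static unsigned int loop_hoisted(const unsigned long *mask, size_t nwords)
{
	unsigned int nr = weight_sketch(mask, nwords);
	unsigned int unit, visited = 0;

	for (unit = 0; unit < nr; unit++)
		visited++;
	return visited;
}
```

For a mask with 10 bits set, loop_recount() traverses the mask 11 times (once per condition check), while loop_hoisted() traverses it once. Both visit the same 10 units.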

2021-11-28 06:29:37

by Yury Norov

Subject: Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

On Sun, Nov 28, 2021 at 05:37:19AM +0100, Michał Mirosław wrote:
> On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote:
> > Now that we have bitmap_weight_eq(), switch bitmap_full() and
> > bitmap_empty() to use it.
> [...]
> > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
> > -{
> > - if (small_const_nbits(nbits))
> > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
> > -
> > - return find_first_bit(src, nbits) == nbits;
> > -}
> [...]
> > +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits)
> > +{
> > + return bitmap_weight_eq(src, nbits, 0);
> > +}
> [...]
>
> What's the speed difference? Have you benchmarked this?

bitmap_weight_eq() should be faster than find_first_bit(), but the
difference is a few cycles, so I didn't bother measuring it.

The new version just looks better.
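
For readers following the comparison, here is a minimal userspace sketch of what an early-exit weight check could look like. It is an assumption-laden illustration, not the implementation from this series: weight_eq_sketch() and empty_sketch() are hypothetical names, and __builtin_popcountl() (a GCC/Clang builtin) stands in for the kernel's hweight helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for bitmap_weight_eq(): returns true if exactly
 * 'num' of the first 'nbits' bits are set, and stops scanning as soon as
 * the running count exceeds 'num'.
 */
static bool weight_eq_sketch(const unsigned long *src, size_t nbits, size_t num)
{
	size_t words = nbits / BITS_PER_LONG;
	size_t rem = nbits % BITS_PER_LONG;
	size_t count = 0;

	assert(src != NULL);
	for (size_t i = 0; i < words; i++) {
		count += (size_t)__builtin_popcountl(src[i]);
		if (count > num)	/* early exit: already too many bits */
			return false;
	}
	if (rem) {
		/* Only the low 'rem' bits of the last word belong to the map. */
		unsigned long mask = (1UL << rem) - 1;

		count += (size_t)__builtin_popcountl(src[words] & mask);
	}
	return count == num;
}

/* Emptiness check in the shape of the quoted bitmap_empty() replacement. */
static bool empty_sketch(const unsigned long *src, size_t nbits)
{
	return weight_eq_sketch(src, nbits, 0);
}
```

With num == 0 the scan stops at the first word containing a set bit, so for the emptiness check this sketch visits the same words a find_first_bit()-style scan would. It does not settle the benchmarking question raised above.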

2021-11-28 06:36:54

by Yury Norov

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

(restore CC list)

On Sun, Nov 28, 2021 at 05:56:51AM +0100, Michał Mirosław wrote:
> On Sat, Nov 27, 2021 at 07:57:02PM -0800, Yury Norov wrote:
> > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > with one of new functions where appropriate. This allows num_*_cpus_*()
> > to return earlier depending on the condition.
> [...]
> > @@ -3193,7 +3193,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
> >
> > /* allocate pages */
> > j = 0;
> > - for (unit = 0; unit < num_possible_cpus(); unit++) {
> > + for (unit = 0; num_possible_cpus_gt(unit); unit++) {
>
> This looks dubious.

Only this?

> The old version I could hope the compiler would call
> num_possible_cpus() only once if it's marked const or pure, but the
> alternative is going to count the bits every time making this a guaranteed
> O(n^2) even though the bitmap doesn't change.

num_possible_cpus() is neither const nor pure. This is O(n^2) both before and after.
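
The cost being argued about can be made concrete with a small userspace sketch. It assumes a stand-in mask_weight() that counts every bit (playing the role of num_possible_cpus()) and a call counter added purely for illustration; none of these names come from the patch. The first loop recounts on every pass, the second counts once, which is only valid when the mask cannot change during the loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Call counter so the difference is observable (illustration only). */
static unsigned long weight_calls;

/* Stand-in for a full bitmap count: visits every bit, no early exit. */
static size_t mask_weight(const unsigned long *mask, size_t nbits)
{
	size_t i, w = 0;

	assert(mask != NULL);
	weight_calls++;
	for (i = 0; i < nbits; i++)
		if (mask[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			w++;
	return w;
}

/* Condition re-evaluated on every iteration: one full count per pass. */
static size_t loop_recount(const unsigned long *mask, size_t nbits)
{
	size_t unit, done = 0;

	for (unit = 0; unit < mask_weight(mask, nbits); unit++)
		done++;
	return done;
}

/* Count once up front; valid only if the mask cannot change meanwhile. */
static size_t loop_hoisted(const unsigned long *mask, size_t nbits)
{
	size_t unit, done = 0;
	const size_t nr_units = mask_weight(mask, nbits);

	for (unit = 0; unit < nr_units; unit++)
		done++;
	return done;
}
```

With a mask of weight W, the first loop performs W + 1 full counts and the second performs exactly one.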


2021-11-28 08:03:57

by Greg Kroah-Hartman

Subject: Re: [PATCH 3/9] all: replace bitmap_weigth() with bitmap_{empty,full,eq,gt,le}

On Sat, Nov 27, 2021 at 07:56:58PM -0800, Yury Norov wrote:
> bitmap_weight() counts all set bits in the bitmap unconditionally.
> However in some cases we can traverse a part of bitmap when we
> only need to check if number of set bits is greater, less or equal
> to some number.
>
> This patch replaces bitmap_weight() with one of
> bitmap_{empty,full,eq,gt,le}, as appropriate.
>
> In some places driver code has been optimized further, where it's
> trivial.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/nds32/kernel/perf_event_cpu.c | 4 +---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
> arch/x86/kvm/hyperv.c | 8 ++++----
> drivers/crypto/ccp/ccp-dev-v5.c | 5 +----
> drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
> drivers/iio/adc/mxs-lradc-adc.c | 3 +--
> drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 ++--
> drivers/iio/industrialio-buffer.c | 2 +-
> drivers/iio/industrialio-trigger.c | 2 +-
> drivers/memstick/core/ms_block.c | 4 ++--
> drivers/net/dsa/b53/b53_common.c | 2 +-
> drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
> drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++--
> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
> .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
> .../ethernet/marvell/octeontx2/nic/otx2_flows.c | 8 ++++----
> .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
> drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++-------
> drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
> drivers/net/ethernet/qlogic/qed/qed_dev.c | 3 +--
> drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 ++--
> drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
> drivers/perf/arm-cci.c | 2 +-
> drivers/perf/arm_pmu.c | 4 ++--
> drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
> drivers/perf/thunderx2_pmu.c | 3 +--
> drivers/perf/xgene_pmu.c | 2 +-
> drivers/pwm/pwm-pca9685.c | 2 +-
> drivers/staging/media/tegra-video/vi.c | 2 +-
> drivers/thermal/intel/intel_powerclamp.c | 10 ++++------
> fs/ocfs2/cluster/heartbeat.c | 14 +++++++-------
> 33 files changed, 57 insertions(+), 75 deletions(-)

After you get the new functions added to the kernel tree, this patch
should be broken up into one-patch-per-subsystem and submitted through
the various subsystem trees.

thanks,

greg k-h

2021-11-28 11:11:03

by Nicholas Piggin

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> In many cases people use bitmap_weight()-based functions like this:
>
> if (num_present_cpus() > 1)
> do_something();
>
> This may take considerable amount of time on many-cpus machines because
> num_present_cpus() will traverse every word of underlying cpumask
> unconditionally.
>
> We can significantly improve on it for many real cases if stop traversing
> the mask as soon as we count present cpus to any number greater than 1:
>
> if (num_present_cpus_gt(1))
> do_something();
>
> To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> functions together with corresponding wrappers in cpumask and nodemask.

There would be no change to callers if you maintain counters like what
is done for num_online_cpus() today. Maybe some fixes to arch code that
does not use set_cpu_possible() etc APIs required, but AFAIKS it would
be better to fix such cases anyway.

Thanks,
Nick
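
A minimal userspace sketch of the counter idea, assuming a hypothetical struct counted_mask whose setter is the only way to change bits. The names and the fixed capacity are made up for illustration, and the sketch is not thread-safe; a kernel version would need an atomic counter or external locking.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NR_IDS 256	/* hypothetical capacity */

struct counted_mask {
	unsigned long bits[NR_IDS / BITS_PER_LONG];
	unsigned int count;	/* kept equal to the number of set bits */
};

/* Set or clear one bit and adjust the cached count only on a real change. */
static void counted_mask_set(struct counted_mask *m, size_t id, bool on)
{
	unsigned long *word, bit;

	assert(id < NR_IDS);
	word = &m->bits[id / BITS_PER_LONG];
	bit = 1UL << (id % BITS_PER_LONG);

	if (on && !(*word & bit)) {
		*word |= bit;
		m->count++;
	} else if (!on && (*word & bit)) {
		*word &= ~bit;
		m->count--;
	}
}

/* O(1) regardless of mask size: no traversal at all. */
static unsigned int counted_mask_weight(const struct counted_mask *m)
{
	return m->count;
}
```

Callers keep writing `weight > 1` as before; the cost moves to the (rare) updates, which is why every writer has to go through the setter.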

2021-11-28 17:10:49

by Joe Perches

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> with one of new functions where appropriate. This allows num_*_cpus_*()
> to return earlier depending on the condition.
[]
> diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
[]
> @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> * if platform didn't set the present map already, do it now
> * boot cpu is set to present already by init/main.c
> */
> - if (num_present_cpus() <= 1)
> + if (num_present_cpus_le(2))
> init_cpu_present(cpu_possible_mask);

? is this supposed to be 2 or 1

> diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
[]
> @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> return ret;
> }
>
> - if (num_present_cpus() > 4) {
> + if (num_present_cpus_gt(4)) {
> pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
> __func__);

It looks as if the present variants should be using the same values
so the _le test above with 1 changed to 2 looks odd.



2021-11-28 17:45:46

by Yury Norov

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > with one of new functions where appropriate. This allows num_*_cpus_*()
> > to return earlier depending on the condition.
> []
> > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> []
> > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > * if platform didn't set the present map already, do it now
> > * boot cpu is set to present already by init/main.c
> > */
> > - if (num_present_cpus() <= 1)
> > + if (num_present_cpus_le(2))
> > init_cpu_present(cpu_possible_mask);
>
> ? is this supposed to be 2 or 1

X <= 1 is the equivalent of X < 2.

> > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> []
> > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > return ret;
> > }
> >
> > - if (num_present_cpus() > 4) {
> > + if (num_present_cpus_gt(4)) {
> > pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
> > __func__);
>
> It looks as if the present variants should be using the same values
> so the _le test above with 1 changed to 2 looks odd.


2021-11-28 17:56:11

by Dennis Zhou

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

Hello,

On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > * if platform didn't set the present map already, do it now
> > > * boot cpu is set to present already by init/main.c
> > > */
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > > init_cpu_present(cpu_possible_mask);
> >
> > ? is this supposed to be 2 or 1
>
> X <= 1 is the equivalent of X < 2.
>
> > > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> > []
> > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > > return ret;
> > > }
> > >
> > > - if (num_present_cpus() > 4) {
> > > + if (num_present_cpus_gt(4)) {
> > > pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > > pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
> > > __func__);
> >
> > It looks as if the present variants should be using the same values
> > so the _le test above with 1 changed to 2 looks odd.
>

I think the confusion comes from le meaning less than rather than lt.
Given the general convention of: lt (<), le (<=), eq (=), ge (>=),
gt (>), I'd consider renaming your le to lt.

Thanks,
Dennis

2021-11-28 18:00:44

by Joe Perches

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sun, 2021-11-28 at 09:43 -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > * if platform didn't set the present map already, do it now
> > > * boot cpu is set to present already by init/main.c
> > > */
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > > init_cpu_present(cpu_possible_mask);
> >
> > ? is this supposed to be 2 or 1
>
> X <= 1 is the equivalent of X < 2.

True. The call though is _le not _lt



2021-11-28 18:02:22

by Emil Renner Berthing

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sun, 28 Nov 2021 at 18:43, Yury Norov <[email protected]> wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > * if platform didn't set the present map already, do it now
> > > * boot cpu is set to present already by init/main.c
> > > */
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > > init_cpu_present(cpu_possible_mask);
> >
> > ? is this supposed to be 2 or 1
>
> X <= 1 is the equivalent of X < 2.

Ah, then the function is confusing. Usually it's lt = less than and le
= less than or equal. Same idea for gt vs ge.

2021-11-28 18:06:03

by mirq-test

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> In many cases people use bitmap_weight()-based functions like this:
>
> if (num_present_cpus() > 1)
> do_something();
>
> This may take considerable amount of time on many-cpus machines because
> num_present_cpus() will traverse every word of underlying cpumask
> unconditionally.
>
> We can significantly improve on it for many real cases if stop traversing
> the mask as soon as we count present cpus to any number greater than 1:
>
> if (num_present_cpus_gt(1))
> do_something();
>
> To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> functions together with corresponding wrappers in cpumask and nodemask.

Having slept on it I have more structured thoughts:

First, I like substituting bitmap_empty/full where possible - I think
the change stands on its own, so could be split and sent as is.

I don't like the proposed API very much. One problem is that it hides
the comparison operator and makes call sites less readable:

bitmap_weight(...) > N

becomes:

bitmap_weight_gt(..., N)

and:
bitmap_weight(...) <= N

becomes:

bitmap_weight_lt(..., N+1)
or:
!bitmap_weight_gt(..., N)

I'd rather see something resembling memcmp() API that's known enough
to be easier to grasp. For above examples:

bitmap_weight_cmp(..., N) > 0
bitmap_weight_cmp(..., N) <= 0
...

This would also make the implementation easier in not having to
copy and paste the code three times. Could also use a simple
optimization reducing code size:

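#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/overflow.h>

/* Returns 1 if weight > cmp, 0 if weight == cmp, -1 if weight < cmp. */
int bitmap_weight_cmp(const unsigned long *bits, size_t nbits, size_t cmp)
{
	size_t i;

	/* Subtract each word's weight from the budget; underflow means '>'. */
	for (i = 0; i < nbits / BITS_PER_LONG; ++i, ++bits)
		if (check_sub_overflow(cmp, (size_t)hweight_long(*bits), &cmp))
			return 1;

	/* Partial last word: count only the bits below nbits. */
	nbits %= BITS_PER_LONG;
	if (nbits && check_sub_overflow(cmp,
			(size_t)hweight_long(*bits & BITMAP_LAST_WORD_MASK(nbits)),
			&cmp))
		return 1;

	return cmp ? -1 : 0;
}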

Best Regards
Michał Mirosław

2021-11-28 18:12:32

by Michał Mirosław

Subject: Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote:
> Now as we have bitmap_weight_eq(), switch bitmap_full() and
> bitmap_empty() to using it.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> include/linux/bitmap.h | 26 ++++++++++----------------
> 1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 996041f771c8..2d951e4dc814 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1,
> return __bitmap_subset(src1, src2, nbits);
> }
>
> -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
> -{
> - if (small_const_nbits(nbits))
> - return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
> -
> - return find_first_bit(src, nbits) == nbits;
> -}

Since this is supposed to be an optimization, I would go all the way and
replace this with the trivial implementation instead:

bool bitmap_empty(long *bits, size_t nbits)
{
for (; nbits >= BITS_PER_LONG; ++bits, nbits -= BITS_PER_LONG)
if (*bits)
return false;

if (nbits && *bits & BITMAP_LAST_WORD_MASK(nbits))
return false;

return true;
}

Best Regards
Michał Mirosław
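
A runnable userspace sketch of the word-at-a-time emptiness check suggested above. It assumes a locally defined BITS_PER_LONG and spells the last-word mask out by hand instead of using BITMAP_LAST_WORD_MASK(); the function name mask_empty() is made up so it does not clash with the kernel's bitmap_empty().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns true if none of the first nbits bits is set. */
static bool mask_empty(const unsigned long *bits, size_t nbits)
{
	assert(bits != NULL || nbits == 0);

	/* Full words: any nonzero word ends the scan immediately. */
	for (; nbits >= BITS_PER_LONG; bits++, nbits -= BITS_PER_LONG)
		if (*bits)
			return false;

	/* Partial last word: look only at the low 'nbits' bits. */
	if (nbits && (*bits & ((1UL << nbits) - 1)))
		return false;

	return true;
}
```

No popcount is needed here: a nonzero test per word is enough, which is the point of preferring this over a weight-based check.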

2021-11-28 18:50:23

by Yury Norov

Subject: Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

On Sun, Nov 28, 2021 at 12:54:00PM -0500, Dennis Zhou wrote:
> Hello,
>
> On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> > On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > > to return earlier depending on the condition.
> > > []
> > > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > > []
> > > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > > * if platform didn't set the present map already, do it now
> > > > * boot cpu is set to present already by init/main.c
> > > > */
> > > > - if (num_present_cpus() <= 1)
> > > > + if (num_present_cpus_le(2))
> > > > init_cpu_present(cpu_possible_mask);
> > >
> > > ? is this supposed to be 2 or 1
> >
> > X <= 1 is the equivalent of X < 2.
> >
> > > > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> > > []
> > > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > > > return ret;
> > > > }
> > > >
> > > > - if (num_present_cpus() > 4) {
> > > > + if (num_present_cpus_gt(4)) {
> > > > pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > > > pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
> > > > __func__);
> > >
> > > It looks as if the present variants should be using the same values
> > > so the _le test above with 1 changed to 2 looks odd.
> >
>
> I think the confusion comes from le meaning less than rather than lt.
> Given the general convention of: lt (<), le (<=), eq (=), ge (>=),
> gt (>), I'd consider renaming your le to lt.

Ok, makes sense. I'll rename in v2 and add <= and >= versions.

2021-11-28 23:38:40

by Yury Norov

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

On Sun, Nov 28, 2021 at 09:08:41PM +1000, Nicholas Piggin wrote:
> Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> > In many cases people use bitmap_weight()-based functions like this:
> >
> > if (num_present_cpus() > 1)
> > do_something();
> >
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> >
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count present cpus to any number greater than 1:
> >
> > if (num_present_cpus_gt(1))
> > do_something();
> >
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
>
> There would be no change to callers if you maintain counters like what
> is done for num_online_cpus() today. Maybe some fixes to arch code that
> does not use set_cpu_possible() etc APIs required, but AFAIKS it would
> be better to fix such cases anyway.

Thanks, Nick. I'll try to do this.

2021-11-29 06:40:48

by Yury Norov

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

On Sun, Nov 28, 2021 at 07:03:41PM +0100, [email protected] wrote:
> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions like this:
> >
> > if (num_present_cpus() > 1)
> > do_something();
> >
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> >
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count present cpus to any number greater than 1:
> >
> > if (num_present_cpus_gt(1))
> > do_something();
> >
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
>
> Having slept on it I have more structured thoughts:
>
> First, I like substituting bitmap_empty/full where possible - I think
> the change stands on its own, so could be split and sent as is.

Ok, I can do it.

> I don't like the proposed API very much. One problem is that it hides
> the comparison operator and makes call sites less readable:
>
> bitmap_weight(...) > N
>
> becomes:
>
> bitmap_weight_gt(..., N)
>
> and:
> bitmap_weight(...) <= N
>
> becomes:
>
> bitmap_weight_lt(..., N+1)
> or:
> !bitmap_weight_gt(..., N)
>
> I'd rather see something resembling memcmp() API that's known enough
> to be easier to grasp. For above examples:
>
> bitmap_weight_cmp(..., N) > 0
> bitmap_weight_cmp(..., N) <= 0
> ...

bitmap_weight_cmp() cannot be efficient. Consider this example:

bitmap_weight_lt(1000 0000 0000 0000, 1) == false
^
stop here

bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
^
stop here

I agree that '_gt' is less verbose than '>', but the advantage of
'_gt' over '>' is proportional to length of bitmap, and it means
that this API should exist.
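
The difference in stopping points can be sketched in userspace as follows. This is an illustration under stated assumptions, not the code from the series: the helpers work on whole words only, popcount_long() is a portable stand-in for the kernel's hweight_long(), and words_visited is instrumentation added to make the early exit visible.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

static size_t words_visited;	/* instrumentation for the illustration only */

/* Portable popcount; stand-in for the kernel's hweight_long(). */
static size_t popcount_long(unsigned long w)
{
	size_t n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* True if fewer than n bits are set; stops once n bits have been seen. */
static bool weight_lt(const unsigned long *bits, size_t nwords, size_t n)
{
	size_t i, w = 0;

	assert(bits != NULL);
	if (n == 0)
		return false;	/* no weight is below zero */
	for (i = 0; i < nwords; i++) {
		words_visited++;
		w += popcount_long(bits[i]);
		if (w >= n)
			return false;
	}
	return true;
}

/* Three-way compare; must keep going while weight == n is still possible. */
static int weight_cmp(const unsigned long *bits, size_t nwords, size_t n)
{
	size_t i, w = 0;

	assert(bits != NULL);
	for (i = 0; i < nwords; i++) {
		words_visited++;
		w += popcount_long(bits[i]);
		if (w > n)
			return 1;
	}
	return w == n ? 0 : -1;
}
```

For a four-word bitmap with only bit 0 set, weight_lt(..., 1) returns after one word, while weight_cmp(..., 1) has to read all four words before it can report equality. Calling weight_cmp(..., 0) instead also returns after the first word.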

> This would also make the implementation easier in not having to
> copy and paste the code three times. Could also use a simple
> optimization reducing code size:

In the next version I'll reduce code duplication like this:

bool bitmap_weight_eq(..., N);
bool bitmap_weight_ge(..., N);

#define bitmap_weight_gt(..., N) bitmap_weight_ge(..., N + 1)
#define bitmap_weight_lt(..., N) !bitmap_weight_ge(..., N)
#define bitmap_weight_le(..., N) !bitmap_weight_gt(..., N)

Thanks,
Yury
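
A userspace sketch of how the plan above might look, with a single early-exit "greater or equal" primitive and the other comparisons derived from it. The weight_* names, the local BITS_PER_LONG and the popcount helper are assumptions made for this example, not the v2 code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Portable popcount; stand-in for the kernel's hweight_long(). */
static size_t popcount_long(unsigned long w)
{
	size_t n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* True if at least n of the first nbits bits are set; exits early. */
static bool weight_ge(const unsigned long *bits, size_t nbits, size_t n)
{
	size_t i, w = 0;

	assert(bits != NULL || nbits == 0);
	if (n == 0)
		return true;

	for (i = 0; i < nbits; i += BITS_PER_LONG) {
		unsigned long word = bits[i / BITS_PER_LONG];
		size_t left = nbits - i;

		/* Mask off bits beyond nbits in the last, partial word. */
		if (left < BITS_PER_LONG)
			word &= (1UL << left) - 1;

		w += popcount_long(word);
		if (w >= n)
			return true;
	}
	return false;
}

/* Derived comparisons. Caveat: (n) + 1 wraps to 0 when n == SIZE_MAX. */
#define weight_gt(bits, nbits, n)	weight_ge(bits, nbits, (n) + 1)
#define weight_lt(bits, nbits, n)	(!weight_ge(bits, nbits, n))
#define weight_le(bits, nbits, n)	(!weight_gt(bits, nbits, n))
```

Only weight_ge() contains a loop, so there is one place to optimize and one place to get the last-word masking right.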

2021-11-29 16:36:35

by Michał Mirosław

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

Dnia 29 listopada 2021 06:38:39 UTC, Yury Norov <[email protected]> napisał/a:
>On Sun, Nov 28, 2021 at 07:03:41PM +0100, [email protected] wrote:
>> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
>> > In many cases people use bitmap_weight()-based functions like this:
>> >
>> > if (num_present_cpus() > 1)
>> > do_something();
>> >
>> > This may take considerable amount of time on many-cpus machines because
>> > num_present_cpus() will traverse every word of underlying cpumask
>> > unconditionally.
>> >
>> > We can significantly improve on it for many real cases if stop traversing
>> > the mask as soon as we count present cpus to any number greater than 1:
>> >
>> > if (num_present_cpus_gt(1))
>> > do_something();
>> >
>> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
>> > functions together with corresponding wrappers in cpumask and nodemask.
>>
>> Having slept on it I have more structured thoughts:
>>
>> First, I like substituting bitmap_empty/full where possible - I think
>> the change stands on its own, so could be split and sent as is.
>
>Ok, I can do it.
>
>> I don't like the proposed API very much. One problem is that it hides
>> the comparison operator and makes call sites less readable:
>>
>> bitmap_weight(...) > N
>>
>> becomes:
>>
>> bitmap_weight_gt(..., N)
>>
>> and:
>> bitmap_weight(...) <= N
>>
>> becomes:
>>
>> bitmap_weight_lt(..., N+1)
>> or:
>> !bitmap_weight_gt(..., N)
>>
>> I'd rather see something resembling memcmp() API that's known enough
>> to be easier to grasp. For above examples:
>>
>> bitmap_weight_cmp(..., N) > 0
>> bitmap_weight_cmp(..., N) <= 0
>> ...
>
>bitmap_weight_cmp() cannot be efficient. Consider this example:
>
>bitmap_weight_lt(1000 0000 0000 0000, 1) == false
> ^
> stop here
>
>bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
> ^
> stop here
>
>I agree that '_gt' is less verbose than '>', but the advantage of
>'_gt' over '>' is proportional to length of bitmap, and it means
>that this API should exist.

Thank you for the example. Indeed, for less-than to be efficient here you would need to replace
bitmap_weight_cmp(..., N) < 0
with
bitmap_weight_cmp(..., N-1) <= 0

It would still be more readable, I think.

Best Regards
Michał Mirosław

2021-12-02 00:31:55

by Yury Norov

Subject: Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

On Mon, Nov 29, 2021 at 04:34:07PM +0000, Michał Mirosław wrote:
> Dnia 29 listopada 2021 06:38:39 UTC, Yury Norov <[email protected]> napisał/a:
> >On Sun, Nov 28, 2021 at 07:03:41PM +0100, [email protected] wrote:
> >> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> >> > In many cases people use bitmap_weight()-based functions like this:
> >> >
> >> > if (num_present_cpus() > 1)
> >> > do_something();
> >> >
> >> > This may take considerable amount of time on many-cpus machines because
> >> > num_present_cpus() will traverse every word of underlying cpumask
> >> > unconditionally.
> >> >
> >> > We can significantly improve on it for many real cases if stop traversing
> >> > the mask as soon as we count present cpus to any number greater than 1:
> >> >
> >> > if (num_present_cpus_gt(1))
> >> > do_something();
> >> >
> >> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> >> > functions together with corresponding wrappers in cpumask and nodemask.
> >>
> >> Having slept on it I have more structured thoughts:
> >>
> >> First, I like substituting bitmap_empty/full where possible - I think
> >> the change stands on its own, so could be split and sent as is.
> >
> >Ok, I can do it.
> >
> >> I don't like the proposed API very much. One problem is that it hides
> >> the comparison operator and makes call sites less readable:
> >>
> >> bitmap_weight(...) > N
> >>
> >> becomes:
> >>
> >> bitmap_weight_gt(..., N)
> >>
> >> and:
> >> bitmap_weight(...) <= N
> >>
> >> becomes:
> >>
> >> bitmap_weight_lt(..., N+1)
> >> or:
> >> !bitmap_weight_gt(..., N)
> >>
> >> I'd rather see something resembling memcmp() API that's known enough
> >> to be easier to grasp. For above examples:
> >>
> >> bitmap_weight_cmp(..., N) > 0
> >> bitmap_weight_cmp(..., N) <= 0
> >> ...
> >
> >bitmap_weight_cmp() cannot be efficient. Consider this example:
> >
> >bitmap_weight_lt(1000 0000 0000 0000, 1) == false
> > ^
> > stop here
> >
> >bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
> > ^
> > stop here
> >
> >I agree that '_gt' is less verbose than '>', but the advantage of
> >'_gt' over '>' is proportional to length of bitmap, and it means
> >that this API should exist.
>
> Thank you for the example. Indeed, for less-than to be efficient here you would need to replace
> bitmap_weight_cmp(..., N) < 0
> with
> bitmap_weight_cmp(..., N-1) <= 0

Indeed, thanks for pointing to it.

> It would still be more readable, I think.

To be honest, I'm not sure that
bitmap_weight_cmp(..., N-1) <= 0
would be an obvious replacement for the original
bitmap_weight(...) < N
compared to
bitmap_weight_lt(..., N)

I think the best thing I can do is to add bitmap_weight_cmp() as
you suggested, and turn lt and others to be wrappers on it. This
will let people choose a better function in each case.

I also think that for v2 it would be better to drop the conversion
for short bitmaps, except for switching to bitmap_empty(), because
in that case readability wins over performance; if no objections.

Thanks,
Yury
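
One way the "wrappers on top of cmp" idea could be sketched in userspace is shown below. The cmp here walks bit by bit purely for brevity, and all names are hypothetical. The lt and ge wrappers pass n - 1 so they keep the early exit, and they special-case n == 0 to avoid unsigned wrap-around.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* memcmp()-style result: >0 if weight > n, 0 if equal, <0 if weight < n. */
static int weight_cmp(const unsigned long *bits, size_t nbits, size_t n)
{
	size_t i;

	assert(bits != NULL || nbits == 0);
	for (i = 0; i < nbits; i++) {
		if (!(bits[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
			continue;
		if (n == 0)
			return 1;	/* one bit more than the threshold: stop */
		n--;
	}
	return n ? -1 : 0;
}

static bool weight_eq(const unsigned long *b, size_t nbits, size_t n)
{
	return weight_cmp(b, nbits, n) == 0;
}

static bool weight_gt(const unsigned long *b, size_t nbits, size_t n)
{
	return weight_cmp(b, nbits, n) > 0;
}

static bool weight_le(const unsigned long *b, size_t nbits, size_t n)
{
	return weight_cmp(b, nbits, n) <= 0;
}

/* weight >= n is weight > n - 1; every weight is >= 0. */
static bool weight_ge(const unsigned long *b, size_t nbits, size_t n)
{
	return n == 0 || weight_cmp(b, nbits, n - 1) > 0;
}

/* weight < n is weight <= n - 1; no weight is < 0. */
static bool weight_lt(const unsigned long *b, size_t nbits, size_t n)
{
	return n != 0 && weight_cmp(b, nbits, n - 1) <= 0;
}
```

Call sites can then pick whichever spelling reads best, while the traversal logic lives in a single function.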

2021-12-14 19:43:27

by Yury Norov

Subject: Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

On Sun, Nov 28, 2021 at 10:10 AM Michał Mirosław
<[email protected]> wrote:
>
> On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote:
> > Now as we have bitmap_weight_eq(), switch bitmap_full() and
> > bitmap_empty() to using it.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > include/linux/bitmap.h | 26 ++++++++++----------------
> > 1 file changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> > index 996041f771c8..2d951e4dc814 100644
> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1,
> > return __bitmap_subset(src1, src2, nbits);
> > }
> >
> > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
> > -{
> > - if (small_const_nbits(nbits))
> > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
> > -
> > - return find_first_bit(src, nbits) == nbits;
> > -}
>
> Since this is supposed to be an optimization, I would go all the way and
> replace this with the trivial implementation instead:
>
> bool bitmap_empty(long *bits, size_t nbits)
> {
> for (; nbits >= BITS_PER_LONG; ++bits, nbits -= BITS_PER_LONG)
> if (*bits)
> return false;
>
> if (nbits && *bits & BITMAP_LAST_WORD_MASK(nbits))
> return false;
>
> return true;
> }

This is what current implementations basically do, based on find_first_bit().

I think that for long bitmaps the most time consuming operation is moving
data to L1, and for short bitmaps the difference between approaches is
barely measurable.

But hweight_long on each iteration can't be more effective than the current
version. So, I'll drop this patch for v2 and keep things unchanged.

2021-12-15 08:41:11

by David Laight

Subject: RE: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

From: Yury Norov
> Sent: 14 December 2021 19:43
...
>
> I think that for long bitmaps the most time consuming operation is moving
> data to L1, and for short bitmaps the difference between approaches is
> barely measurable.
>
> But hweight_long on each iteration can't be more effective than the current
> version. So, I'll drop this patch for v2 and keep things unchanged.

Actually do bitmap_full/empty() calls make any sense at all?
The result is stale since bitmaps are designed to do locked operations.
If you have a lock covering the bitmap then you should be using
something that uses non-locked accesses.
Rightly or wrongly that isn't the bitmap api.

David

2021-12-15 17:45:49

by Yury Norov

Subject: Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()

On Wed, Dec 15, 2021 at 12:41 AM David Laight <[email protected]> wrote:
>
> From: Yury Norov
> > Sent: 14 December 2021 19:43
> ...
> >
> > I think that for long bitmaps the most time consuming operation is moving
> > data to L1, and for short bitmaps the difference between approaches is
> > barely measurable.
> >
> > But hweight_long on each iteration can't be more effective than the current
> > version. So, I'll drop this patch for v2 and keep things unchanged.
>
> Actually do bitmap_full/empty() calls make any sense at all?
> The result is stale since bitmaps are designed to do locked operations.
> If you have a lock covering the bitmap then you should be using
> something that uses non-locked accesses.
> Rightly or wrongly that isn't the bitmap api.

Are you talking about __{set,clear}_bit()?
include/asm-generic/bitops/non-atomic.h
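
For readers unfamiliar with the distinction being asked about, here is a userspace sketch using C11 atomics. It only mirrors the idea of an atomic versus a plain read-modify-write; the function names are invented and this is not the kernel's implementation of set_bit() or __set_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomic read-modify-write, in the spirit of set_bit(): safe without a lock. */
static void set_bit_atomic(size_t nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_or(&addr[nr / BITS_PER_LONG],
			1UL << (nr % BITS_PER_LONG));
}

/* Plain read-modify-write, in the spirit of __set_bit(): caller serializes. */
static void set_bit_plain(size_t nr, unsigned long *addr)
{
	assert(addr != NULL);
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}
```

Either way, a separate "is it empty" or "is it full" query only reflects the bitmap at the moment it was read, which is the staleness concern raised above when writers are not excluded.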