2022-01-24 12:11:15

by Yury Norov

Subject: [PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage

In many cases, people use bitmap_weight()-based functions to compare
the result against a number or expression:

if (cpumask_weight(mask) > 1)
do_something();

This may take a considerable amount of time on machines with many CPUs,
because cpumask_weight() traverses every word of the underlying cpumask
unconditionally.

We can significantly improve on this in many real cases if we stop
traversing the mask as soon as the count exceeds 1:

if (cpumask_weight_gt(mask, 1))
do_something();

The first part of the series converts cpumask_weight() to cpumask_empty()
when the number to compare with is 0. Ditto for bitmap_weight() and
nodes_weight().

In the second part of the series, bitmap_weight_cmp() is added, together
with bitmap_weight_{eq,gt,ge,lt,le} wrappers on top of it. Corresponding
wrappers for cpumask and nodemask are added as well.
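
As a rough map of the conversions in this series (a sketch, not quoted
from any single patch):

	bitmap_weight(src, nbits) == 0  ->  bitmap_empty(src, nbits)
	bitmap_weight(src, nbits) == N  ->  bitmap_weight_eq(src, nbits, N)
	bitmap_weight(src, nbits) >  N  ->  bitmap_weight_gt(src, nbits, N)
	bitmap_weight(src, nbits) >= N  ->  bitmap_weight_ge(src, nbits, N)
	bitmap_weight(src, nbits) <  N  ->  bitmap_weight_lt(src, nbits, N)
	bitmap_weight(src, nbits) <= N  ->  bitmap_weight_le(src, nbits, N)

The same pattern applies to the cpumask_* and nodes_* helpers.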

v1: https://lkml.org/lkml/2021/11/27/339
v2: https://lkml.org/lkml/2021/12/18/241
v3:
- drop subseries for possible, present and active cpumasks. Will
submit it separately if needed;
- split patches per subsystem, as requested by Greg and Michał;
- trim the recipient list. Add drivers and arch maintainers to
corresponding patches only.

Yury Norov (54):
net/dsa: don't use bitmap_weight() in b53_arl_read()
net/ethernet: don't use bitmap_weight() in bcm_sysport_rule_set()
thermal/intel: don't use bitmap_weight() in end_power_clamp()
net: mellanox: fix open-coded for_each_set_bit()
nds32: perf: replace bitmap_weight with bitmap_empty where appropriate
x86/kvm: replace bitmap_weight with bitmap_empty where appropriate
gpu: drm: replace bitmap_weight with bitmap_empty where appropriate
net: ethernet: replace bitmap_weight with bitmap_empty for intel
net: ethernet: replace bitmap_weight with bitmap_empty for Marvell
net: ethernet: replace bitmap_weight with bitmap_empty for qlogic
perf: replace bitmap_weight with bitmap_empty where appropriate
tools/perf: replace bitmap_weight with bitmap_empty where appropriate
arch/alpha: replace cpumask_weight with cpumask_empty where
appropriate
arch/ia64: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
cpufreq: replace cpumask_weight with cpumask_empty where appropriate
gpu: drm: replace cpumask_weight with cpumask_empty where appropriate
drivers/infiniband: replace cpumask_weight with cpumask_empty where
appropriate
drivers/irqchip: replace cpumask_weight with cpumask_empty where
appropriate
kernel/irq: replace cpumask_weight with cpumask_empty where
appropriate
kernel: replace cpumask_weight with cpumask_empty in padata.c
rcu: replace cpumask_weight with cpumask_empty where appropriate
sched: replace cpumask_weight with cpumask_empty where appropriate
time: replace cpumask_weight with cpumask_empty in clocksource.c
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace nodes_weight with nodes_empty where appropriate
lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions
arch/x86: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le}
where appropriate
drivers/iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where
appropriate
drivers/memstick: replace bitmap_weight with bitmap_weight_eq where
appropriate
net: ethernet: replace bitmap_weight with bitmap_weight_eq for intel
net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt} for
OcteonTX2
net: ethernet: replace bitmap_weight with
bitmap_weight_{eq,gt,ge,lt,le} for mellanox
perf: replace bitmap_weight with bitmap_weight_eq for ThunderX2
drivers/staging: replace bitmap_weight with bitmap_weight_le for
tegra-video
lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}
arch/ia64: replace cpumask_weight with cpumask_weight_eq in mm/tlb.c
arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where
appropriate
arch/powerpc: replace cpumask_weight with cpumask_weight_{eq, ...}
where appropriate
arch/s390: replace cpumask_weight with cpumask_weight_eq where
appropriate
arch/x86: replace cpumask_weight with cpumask_weight_eq where
appropriate
firmware: psci: replace cpumask_weight with cpumask_weight_eq
drivers/hv: replace cpumask_weight with cpumask_weight_eq
infiniband: replace cpumask_weight with cpumask_weight_{eq, ...} where
appropriate
scsi: replace cpumask_weight with cpumask_weight_gt
soc: replace cpumask_weight with cpumask_weight_lt
sched: replace cpumask_weight with cpumask_weight_eq where appropriate
kernel/time: replace cpumask_weight with cpumask_weight_eq where
appropriate
lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}
acpi: replace nodes_weight with nodes_weight_ge for numa
mm: replace nodes_weight with nodes_weight_eq in mempolicy
lib/nodemask: add num_node_state_eq()
tools/bitmap: sync bitmap_weight
MAINTAINERS: add cpumask and nodemask files to BITMAP_API

MAINTAINERS | 4 +
arch/alpha/kernel/process.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/ia64/mm/tlb.c | 2 +-
arch/mips/cavium-octeon/octeon-irq.c | 4 +-
arch/mips/kernel/crash.c | 2 +-
arch/nds32/kernel/perf_event_cpu.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/xmon/xmon.c | 4 +-
arch/s390/kernel/perf_cpum_cf.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++--
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/kvm/hyperv.c | 8 +-
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
drivers/acpi/numa/srat.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
drivers/firmware/psci/psci_checker.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
drivers/hv/channel_mgmt.c | 4 +-
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 +-
drivers/iio/industrialio-trigger.c | 2 +-
drivers/infiniband/hw/hfi1/affinity.c | 13 ++-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
drivers/memstick/core/ms_block.c | 4 +-
drivers/net/dsa/b53/b53_common.c | 6 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +-
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
.../marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
.../marvell/octeontx2/nic/otx2_flows.c | 8 +-
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/cmd.c | 33 +++-----
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 +-
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 4 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/thunderx2_pmu.c | 4 +-
drivers/perf/xgene_pmu.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
drivers/staging/media/tegra-video/vi.c | 2 +-
drivers/thermal/intel/intel_powerclamp.c | 9 +--
include/linux/bitmap.h | 80 +++++++++++++++++++
include/linux/cpumask.h | 50 ++++++++++++
include/linux/nodemask.h | 40 ++++++++++
kernel/irq/affinity.c | 2 +-
kernel/padata.c | 2 +-
kernel/rcu/tree_nocb.h | 4 +-
kernel/rcu/tree_plugin.h | 2 +-
kernel/sched/core.c | 10 +--
kernel/sched/topology.c | 4 +-
kernel/time/clockevents.c | 2 +-
kernel/time/clocksource.c | 2 +-
lib/bitmap.c | 21 +++++
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 2 +-
mm/vmstat.c | 4 +-
tools/include/linux/bitmap.h | 44 ++++++++++
tools/lib/bitmap.c | 20 +++++
tools/perf/builtin-c2c.c | 4 +-
tools/perf/util/pmu.c | 2 +-
73 files changed, 374 insertions(+), 142 deletions(-)

--
2.30.2


2022-01-24 12:11:17

by Yury Norov

Subject: [PATCH 01/54] net/dsa: don't use bitmap_weight() in b53_arl_read()

Don't call bitmap_weight() if the following code can get by
without it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/dsa/b53/b53_common.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 3867f3d4545f..9a10d80125d9 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1620,12 +1620,8 @@ static int b53_arl_read(struct b53_device *dev, u64 mac,
return 0;
}

- if (bitmap_weight(free_bins, dev->num_arl_bins) == 0)
- return -ENOSPC;
-
*idx = find_first_bit(free_bins, dev->num_arl_bins);
-
- return -ENOENT;
+ return *idx >= dev->num_arl_bins ? -ENOSPC : -ENOENT;
}

static int b53_arl_op(struct b53_device *dev, int op, int port,
--
2.30.2

2022-01-24 12:11:22

by Yury Norov

Subject: [PATCH 02/54] net/ethernet: don't use bitmap_weight() in bcm_sysport_rule_set()

Don't call bitmap_weight() if the following code can get by
without it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 60dde29974bf..5284a5c961db 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2180,13 +2180,9 @@ static int bcm_sysport_rule_set(struct bcm_sysport_priv *priv,
if (nfc->fs.ring_cookie != RX_CLS_FLOW_WAKE)
return -EOPNOTSUPP;

- /* All filters are already in use, we cannot match more rules */
- if (bitmap_weight(priv->filters, RXCHK_BRCM_TAG_MAX) ==
- RXCHK_BRCM_TAG_MAX)
- return -ENOSPC;
-
index = find_first_zero_bit(priv->filters, RXCHK_BRCM_TAG_MAX);
if (index >= RXCHK_BRCM_TAG_MAX)
+ /* All filters are already in use, we cannot match more rules */
return -ENOSPC;

/* Location is the classification ID, and index is the position
--
2.30.2

2022-01-24 12:11:27

by Yury Norov

Subject: [PATCH 03/54] thermal/intel: don't use bitmap_weight() in end_power_clamp()

Don't call bitmap_weight() if the following code can get by
without it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/thermal/intel/intel_powerclamp.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
index 14256421d98c..c841ab37e7c6 100644
--- a/drivers/thermal/intel/intel_powerclamp.c
+++ b/drivers/thermal/intel/intel_powerclamp.c
@@ -556,12 +556,9 @@ static void end_power_clamp(void)
* stop faster.
*/
clamping = false;
- if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
- for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping worker for cpu %d alive, destroy\n",
- i);
- stop_power_clamp_worker(i);
- }
+ for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
+ pr_debug("clamping worker for cpu %d alive, destroy\n", i);
+ stop_power_clamp_worker(i);
}
}

--
2.30.2

2022-01-24 12:11:29

by Yury Norov

Subject: [PATCH 04/54] net: mellanox: fix open-coded for_each_set_bit()

The Mellanox mlx4 driver has an open-coded for_each_set_bit(). Fix it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e10b7b04b894..c56d2194cbfc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1994,21 +1994,16 @@ static void mlx4_allocate_port_vpps(struct mlx4_dev *dev, int port)

static int mlx4_master_activate_admin_state(struct mlx4_priv *priv, int slave)
{
- int port, err;
+ int p, port, err;
struct mlx4_vport_state *vp_admin;
struct mlx4_vport_oper_state *vp_oper;
struct mlx4_slave_state *slave_state =
&priv->mfunc.master.slave_state[slave];
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

- for (port = min_port; port <= max_port; port++) {
- if (!test_bit(port - 1, actv_ports.ports))
- continue;
+ for_each_set_bit(p, actv_ports.ports, priv->dev.caps.num_ports) {
+ port = p + 1;
priv->mfunc.master.vf_oper[slave].smi_enabled[port] =
priv->mfunc.master.vf_admin[slave].enable_smi[port];
vp_oper = &priv->mfunc.master.vf_oper[slave].vport[port];
@@ -2063,19 +2058,13 @@ static int mlx4_master_activate_admin_state(struct mlx4_priv *priv, int slave)

static void mlx4_master_deactivate_admin_state(struct mlx4_priv *priv, int slave)
{
- int port;
+ int p, port;
struct mlx4_vport_oper_state *vp_oper;
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

-
- for (port = min_port; port <= max_port; port++) {
- if (!test_bit(port - 1, actv_ports.ports))
- continue;
+ for_each_set_bit(p, actv_ports.ports, priv->dev.caps.num_ports) {
+ port = p + 1;
priv->mfunc.master.vf_oper[slave].smi_enabled[port] =
MLX4_VF_SMI_DISABLED;
vp_oper = &priv->mfunc.master.vf_oper[slave].vport[port];
--
2.30.2

2022-01-24 12:11:30

by Yury Norov

Subject: [PATCH 05/54] nds32: perf: replace bitmap_weight with bitmap_empty where appropriate

nds32_pmu_enable() calls bitmap_weight() to check if any bit of a given
bitmap is set. It's better to use bitmap_empty() in that case because
bitmap_empty() stops traversing the bitmap as soon as it finds the first
set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/nds32/kernel/perf_event_cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a78a879e7ef1..ea44e9ecb5c7 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -695,7 +695,7 @@ static void nds32_pmu_enable(struct pmu *pmu)
{
struct nds32_pmu *nds32_pmu = to_nds32_pmu(pmu);
struct pmu_hw_events *hw_events = nds32_pmu->get_hw_events();
- int enabled = bitmap_weight(hw_events->used_mask,
+ bool enabled = !bitmap_empty(hw_events->used_mask,
nds32_pmu->num_events);

if (enabled)
--
2.30.2

2022-01-24 12:11:31

by Yury Norov

Subject: [PATCH 06/54] x86/kvm: replace bitmap_weight with bitmap_empty where appropriate

In some places, kvm/hyperv.c calls bitmap_weight() to check if any bit
of a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
the first set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/kvm/hyperv.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6e38a7d22e97..2c3400dea4b3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -90,7 +90,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
{
struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
- int auto_eoi_old, auto_eoi_new;
+ bool auto_eoi_old, auto_eoi_new;

if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;
@@ -100,16 +100,16 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
else
__clear_bit(vector, synic->vec_bitmap);

- auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_old = bitmap_empty(synic->auto_eoi_bitmap, 256);

if (synic_has_vector_auto_eoi(synic, vector))
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);

- auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_new = bitmap_empty(synic->auto_eoi_bitmap, 256);

- if (!!auto_eoi_old == !!auto_eoi_new)
+ if (auto_eoi_old == auto_eoi_new)
return;

down_write(&vcpu->kvm->arch.apicv_update_lock);
--
2.30.2

2022-01-24 12:11:48

by Yury Norov

Subject: [PATCH 09/54] net: ethernet: replace bitmap_weight with bitmap_empty for Marvell

In some places, octeontx2 code calls bitmap_weight() to check if any bit
of a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
the first set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c | 4 ++--
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 77a13fb555fb..80b2d64b4136 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -353,7 +353,7 @@ int otx2_add_macfilter(struct net_device *netdev, const u8 *mac)
{
struct otx2_nic *pf = netdev_priv(netdev);

- if (bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (!bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(netdev,
"Add %pM to CGX/RPM DMAC filters list as well\n",
@@ -436,7 +436,7 @@ int otx2_get_maxflows(struct otx2_flow_config *flow_cfg)
return 0;

if (flow_cfg->nr_flows == flow_cfg->max_flows ||
- bitmap_weight(&flow_cfg->dmacflt_bmap,
+ !bitmap_empty(&flow_cfg->dmacflt_bmap,
flow_cfg->dmacflt_max_flows))
return flow_cfg->max_flows + flow_cfg->dmacflt_max_flows;
else
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 6080ebd9bd94..3d369ccc7ab9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1115,7 +1115,7 @@ static int otx2_cgx_config_loopback(struct otx2_nic *pf, bool enable)
struct msg_req *msg;
int err;

- if (enable && bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (enable && !bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(pf->netdev,
"CGX/RPM internal loopback might not work as DMAC filters are active\n");
--
2.30.2

2022-01-24 12:12:00

by Yury Norov

Subject: [PATCH 08/54] net: ethernet: replace bitmap_weight with bitmap_empty for intel

ice_vf_has_no_qs_ena() calls bitmap_weight() to check if any bit of a
given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
the first set bit, while bitmap_weight() counts all bits
unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 39b80124d282..9dd52aab68cc 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -267,8 +267,8 @@ ice_set_pfe_link(struct ice_vf *vf, struct virtchnl_pf_event *pfe,
*/
static bool ice_vf_has_no_qs_ena(struct ice_vf *vf)
{
- return (!bitmap_weight(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
- !bitmap_weight(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF));
+ return (bitmap_empty(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
+ bitmap_empty(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF));
}

/**
--
2.30.2

2022-01-24 12:12:25

by Yury Norov

Subject: [PATCH 10/54] net: ethernet: replace bitmap_weight with bitmap_empty for qlogic

qlogic/qed code calls bitmap_weight() to check if any bit of a given
bitmap is set. It's better to use bitmap_empty() in that case because
bitmap_empty() stops traversing the bitmap as soon as it finds the first
set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 ++--
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 23b668de4640..b6e2e17bac04 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -336,7 +336,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print aligned non-zero lines, if any */
for (item = 0, line = 0; line < last_line; line++, item += 8)
- if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
+ if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
DP_NOTICE(p_hwfn,
"line 0x%04x: 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
line,
@@ -350,7 +350,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print last unaligned non-zero line, if any */
if ((bmap->max_count % (64 * 8)) &&
- (bitmap_weight((unsigned long *)&pmap[item],
+ (!bitmap_empty((unsigned long *)&pmap[item],
bmap->max_count - item * 64))) {
offset = sprintf(str_last_line, "line 0x%04x: ", line);
for (; item < last_item; item++)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 071b4aeaddf2..134ecfca96a3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -76,7 +76,7 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
* We delay for a short while if an async destroy QP is still expected.
* Beyond the added delay we clear the bitmap anyway.
*/
- while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+ while (!bitmap_empty(rcid_map->bitmap, rcid_map->max_count)) {
/* If the HW device is during recovery, all resources are
* immediately reset without receiving a per-cid indication
* from HW. In this case we don't expect the cid bitmap to be
--
2.30.2

2022-01-24 12:12:43

by Yury Norov

Subject: [PATCH 11/54] perf: replace bitmap_weight with bitmap_empty where appropriate

In some places, drivers/perf code calls bitmap_weight() to check if any
bit of a given bitmap is set. It's better to use bitmap_empty() in that
case because bitmap_empty() stops traversing the bitmap as soon as it
finds the first set bit, while bitmap_weight() counts all bits
unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 4 ++--
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/xgene_pmu.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 54aca3a62814..96e09fa40909 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1096,7 +1096,7 @@ static void cci_pmu_enable(struct pmu *pmu)
{
struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
- int enabled = bitmap_weight(hw_events->used_mask, cci_pmu->num_cntrs);
+ bool enabled = !bitmap_empty(hw_events->used_mask, cci_pmu->num_cntrs);
unsigned long flags;

if (!enabled)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 295cc7952d0e..a31b302b0ade 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -524,7 +524,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
@@ -785,7 +785,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index a738aeab5c04..358e4e284a62 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -393,7 +393,7 @@ EXPORT_SYMBOL_GPL(hisi_uncore_pmu_read);
void hisi_uncore_pmu_enable(struct pmu *pmu)
{
struct hisi_pmu *hisi_pmu = to_hisi_pmu(pmu);
- int enabled = bitmap_weight(hisi_pmu->pmu_events.used_mask,
+ bool enabled = !bitmap_empty(hisi_pmu->pmu_events.used_mask,
hisi_pmu->num_counters);

if (!enabled)
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 2b6d476bd213..88bd100a9633 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -867,7 +867,7 @@ static void xgene_perf_pmu_enable(struct pmu *pmu)
{
struct xgene_pmu_dev *pmu_dev = to_pmu_dev(pmu);
struct xgene_pmu *xgene_pmu = pmu_dev->parent;
- int enabled = bitmap_weight(pmu_dev->cntr_assign_mask,
+ bool enabled = !bitmap_empty(pmu_dev->cntr_assign_mask,
pmu_dev->max_counters);

if (!enabled)
--
2.30.2

2022-01-24 12:13:00

by Yury Norov

Subject: [PATCH 07/54] gpu: drm: replace bitmap_weight with bitmap_empty where appropriate

smp_request_block() in drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c calls
bitmap_weight() to check if any bit of a given bitmap is set. It's
better to use bitmap_empty() in that case because bitmap_empty() stops
traversing the bitmap as soon as it finds the first set bit, while
bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
index d7fa2c49e741..56a3063545ec 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
@@ -68,7 +68,7 @@ static int smp_request_block(struct mdp5_smp *smp,
uint8_t reserved;

/* we shouldn't be requesting blocks for an in-use client: */
- WARN_ON(bitmap_weight(cs, cnt) > 0);
+ WARN_ON(!bitmap_empty(cs, cnt));

reserved = smp->reserved[cid];

--
2.30.2

2022-01-24 12:13:13

by Yury Norov

Subject: [PATCH 12/54] tools/perf: replace bitmap_weight with bitmap_empty where appropriate

Some code in builtin-c2c.c calls bitmap_weight() to check if any bit of
a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
the first set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
tools/perf/builtin-c2c.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 77dd4afacca4..14f787c67140 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1080,7 +1080,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
bitmap_zero(set, c2c.cpus_cnt);
bitmap_and(set, c2c_he->cpuset, c2c.nodes[node], c2c.cpus_cnt);

- if (!bitmap_weight(set, c2c.cpus_cnt)) {
+ if (bitmap_empty(set, c2c.cpus_cnt)) {
if (c2c.node_info == 1) {
ret = scnprintf(hpp->buf, hpp->size, "%21s", " ");
advance_hpp(hpp, ret);
@@ -1944,7 +1944,7 @@ static int set_nodestr(struct c2c_hist_entry *c2c_he)
if (c2c_he->nodestr)
return 0;

- if (bitmap_weight(c2c_he->nodeset, c2c.nodes_cnt)) {
+ if (!bitmap_empty(c2c_he->nodeset, c2c.nodes_cnt)) {
len = bitmap_scnprintf(c2c_he->nodeset, c2c.nodes_cnt,
buf, sizeof(buf));
} else {
--
2.30.2

2022-01-24 12:13:18

by Yury Norov

Subject: [PATCH 13/54] arch/alpha: replace cpumask_weight with cpumask_empty where appropriate

common_shutdown_1() calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/alpha/kernel/process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5f8527081da9..0d4bc60828bf 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -125,7 +125,7 @@ common_shutdown_1(void *generic_ptr)
/* Wait for the secondaries to halt. */
set_cpu_present(boot_cpuid, false);
set_cpu_possible(boot_cpuid, false);
- while (cpumask_weight(cpu_present_mask))
+ while (!cpumask_empty(cpu_present_mask))
barrier();
#endif

--
2.30.2

2022-01-24 12:13:35

by Yury Norov

Subject: [PATCH 14/54] arch/ia64: replace cpumask_weight with cpumask_empty where appropriate

setup_arch() calls cpumask_weight() to check if any bit of a given
cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/ia64/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 5010348fa21b..fd6301eafa9d 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -572,7 +572,7 @@ setup_arch (char **cmdline_p)
#ifdef CONFIG_ACPI_HOTPLUG_CPU
prefill_possible_map();
#endif
- per_cpu_scan_finalize((cpumask_weight(&early_cpu_possible_map) == 0 ?
+ per_cpu_scan_finalize((cpumask_empty(&early_cpu_possible_map) ?
32 : cpumask_weight(&early_cpu_possible_map)),
additional_cpus > 0 ? additional_cpus : 0);
#endif /* CONFIG_ACPI_NUMA */
--
2.30.2

2022-01-24 12:13:48

by Yury Norov

Subject: [PATCH 15/54] arch/x86: replace cpumask_weight with cpumask_empty where appropriate

In some cases, arch/x86 code calls cpumask_weight() to check if any bit
of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds the first set bit, while cpumask_weight() counts all
bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 +++++++-------
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b57b3db9a6a7..e23ff03290b8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -341,14 +341,14 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus belong to parent ctrl group */
cpumask_andnot(tmpmask, newmask, &prgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
rdt_last_cmd_puts("Can only add CPUs to mongroup that belong to parent\n");
return -EINVAL;
}

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Give any dropped cpus to parent rdtgroup */
cpumask_or(&prgrp->cpu_mask, &prgrp->cpu_mask, tmpmask);
update_closid_rmid(tmpmask, prgrp);
@@ -359,7 +359,7 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu rmid
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
if (crgrp == rdtgrp)
@@ -394,7 +394,7 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Can't drop from default group */
if (rdtgrp == &rdtgroup_default) {
rdt_last_cmd_puts("Can't drop CPUs from default group\n");
@@ -413,12 +413,12 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu closid/rmid.
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) {
if (r == rdtgrp)
continue;
cpumask_and(tmpmask1, &r->cpu_mask, tmpmask);
- if (cpumask_weight(tmpmask1))
+ if (!cpumask_empty(tmpmask1))
cpumask_rdtgrp_clear(r, tmpmask1);
}
update_closid_rmid(tmpmask, rdtgrp);
@@ -488,7 +488,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,

/* check that user didn't specify any offline cpus */
cpumask_andnot(tmpmask, newmask, cpu_online_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
ret = -EINVAL;
rdt_last_cmd_puts("Can only assign online CPUs\n");
goto unlock;
diff --git a/arch/x86/mm/mmio-mod.c b/arch/x86/mm/mmio-mod.c
index 933a2ebad471..c3317f0650d8 100644
--- a/arch/x86/mm/mmio-mod.c
+++ b/arch/x86/mm/mmio-mod.c
@@ -400,7 +400,7 @@ static void leave_uniprocessor(void)
int cpu;
int err;

- if (!cpumask_available(downed_cpus) || cpumask_weight(downed_cpus) == 0)
+ if (!cpumask_available(downed_cpus) || cpumask_empty(downed_cpus))
return;
pr_notice("Re-enabling CPUs...\n");
for_each_cpu(cpu, downed_cpus) {
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 1e9ff28bc2e0..ea277fc08357 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -985,7 +985,7 @@ static int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)

/* Clear global flags */
if (master) {
- if (cpumask_weight(uv_nmi_cpu_mask))
+ if (!cpumask_empty(uv_nmi_cpu_mask))
uv_nmi_cleanup_mask();
atomic_set(&uv_nmi_cpus_in_nmi, -1);
atomic_set(&uv_nmi_cpu, -1);
--
2.30.2

2022-01-24 12:14:00

by Yury Norov

Subject: [PATCH 16/54] cpufreq: replace cpumask_weight with cpumask_empty where appropriate

drivers/cpufreq code calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 05f3d7876e44..95a0c57ab5bb 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -482,7 +482,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
}

qcom_get_related_cpus(index, policy->cpus);
- if (!cpumask_weight(policy->cpus)) {
+ if (cpumask_empty(policy->cpus)) {
dev_err(dev, "Domain-%d failed to get related CPUs\n", index);
ret = -ENOENT;
goto error;
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 1e0cd4d165f0..919fa6e3f462 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -154,7 +154,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
* table and opp-shared.
*/
ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, priv->opp_shared_cpus);
- if (ret || !cpumask_weight(priv->opp_shared_cpus)) {
+ if (ret || cpumask_empty(priv->opp_shared_cpus)) {
/*
* Either opp-table is not set or no opp-shared was found.
* Use the CPU mask from SCMI to designate CPUs sharing an OPP
--
2.30.2

2022-01-24 12:14:01

by Yury Norov

Subject: [PATCH 17/54] gpu: drm: replace cpumask_weight with cpumask_empty where appropriate

i915_pmu_cpu_online() calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index ea655161793e..1894c876b31d 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
GEM_BUG_ON(!pmu->base.event_init);

/* Select the first online CPU as a designated reader. */
- if (!cpumask_weight(&i915_pmu_cpumask))
+ if (cpumask_empty(&i915_pmu_cpumask))
cpumask_set_cpu(cpu, &i915_pmu_cpumask);

return 0;
--
2.30.2

2022-01-24 12:14:08

by Yury Norov

Subject: [PATCH 18/54] drivers/infiniband: replace cpumask_weight with cpumask_empty where appropriate

drivers/infiniband/hw/hfi1/affinity.c calls cpumask_weight() to check
if any bit of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds the first set bit, while cpumask_weight() counts all
bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/infiniband/hw/hfi1/affinity.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 98c813ba4304..38eee675369a 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -667,7 +667,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* engines, use the same CPU cores as general/control
* context.
*/
- if (cpumask_weight(&entry->def_intr.mask) == 0)
+ if (cpumask_empty(&entry->def_intr.mask))
cpumask_copy(&entry->def_intr.mask,
&entry->general_intr_mask);
}
@@ -687,7 +687,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* vectors, use the same CPU core as the general/control
* context.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 0)
+ if (cpumask_empty(&entry->comp_vect_mask))
cpumask_copy(&entry->comp_vect_mask,
&entry->general_intr_mask);
}
--
2.30.2

2022-01-24 12:14:09

by Yury Norov

Subject: [PATCH 19/54] drivers/irqchip: replace cpumask_weight with cpumask_empty where appropriate

bcm6345_l1_of_init() calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index fd079215c17f..142a7431745f 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -315,7 +315,7 @@ static int __init bcm6345_l1_of_init(struct device_node *dn,
cpumask_set_cpu(idx, &intc->cpumask);
}

- if (!cpumask_weight(&intc->cpumask)) {
+ if (cpumask_empty(&intc->cpumask)) {
ret = -ENODEV;
goto out_free;
}
--
2.30.2

2022-01-24 12:14:23

by Yury Norov

Subject: [PATCH 20/54] kernel/irq: replace cpumask_weight with cpumask_empty where appropriate

__irq_build_affinity_masks() calls cpumask_weight() to check if any bit
of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds the first set bit, while cpumask_weight() counts all
bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/irq/affinity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..18740faf0eb1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -258,7 +258,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
nodemask_t nodemsk = NODE_MASK_NONE;
struct node_vectors *node_vectors;

- if (!cpumask_weight(cpu_mask))
+ if (cpumask_empty(cpu_mask))
return 0;

nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
--
2.30.2

2022-01-24 12:14:25

by Yury Norov

Subject: [PATCH 21/54] kernel: replace cpumask_weight with cpumask_empty in padata.c

padata_do_parallel() calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/padata.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 18d3a5c699d8..e5819bb8bd1d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -181,7 +181,7 @@ int padata_do_parallel(struct padata_shell *ps,
goto out;

if (!cpumask_test_cpu(*cb_cpu, pd->cpumask.cbcpu)) {
- if (!cpumask_weight(pd->cpumask.cbcpu))
+ if (cpumask_empty(pd->cpumask.cbcpu))
goto out;

/* Select an alternate fallback CPU and notify the caller. */
--
2.30.2

2022-01-24 12:14:27

by Yury Norov

Subject: [PATCH 22/54] rcu: replace cpumask_weight with cpumask_empty where appropriate

In some places, RCU code calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/rcu/tree_nocb.h | 4 ++--
kernel/rcu/tree_plugin.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index eeafb546a7a0..f83c7b1d6110 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1169,7 +1169,7 @@ void __init rcu_init_nohz(void)
struct rcu_data *rdp;

#if defined(CONFIG_NO_HZ_FULL)
- if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
+ if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
need_rcu_nocb_mask = true;
#endif /* #if defined(CONFIG_NO_HZ_FULL) */

@@ -1348,7 +1348,7 @@ static void __init rcu_organize_nocb_kthreads(void)
*/
void rcu_bind_current_to_nocb(void)
{
- if (cpumask_available(rcu_nocb_mask) && cpumask_weight(rcu_nocb_mask))
+ if (cpumask_available(rcu_nocb_mask) && !cpumask_empty(rcu_nocb_mask))
WARN_ON(sched_setaffinity(current->pid, rcu_nocb_mask));
}
EXPORT_SYMBOL_GPL(rcu_bind_current_to_nocb);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c5b45c2f68a1..0dc0c8d6717c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1215,7 +1215,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm);
cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_RCU));
- if (cpumask_weight(cm) == 0)
+ if (cpumask_empty(cm))
cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_RCU));
set_cpus_allowed_ptr(t, cm);
free_cpumask_var(cm);
--
2.30.2

2022-01-24 12:14:28

by Yury Norov

Subject: [PATCH 23/54] sched: replace cpumask_weight with cpumask_empty where appropriate

In some places, kernel/sched code calls cpumask_weight() to check if
any bit of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds the first set bit, while cpumask_weight() counts all
bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/sched/topology.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2e4ae00e52d1..918d0bdc2ea8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8707,7 +8707,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
{
int ret = 1;

- if (!cpumask_weight(cur))
+ if (cpumask_empty(cur))
return ret;

ret = dl_cpuset_cpumask_can_shrink(cur, trial);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d201a7052a29..8478e2a8cd65 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -74,7 +74,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
break;
}

- if (!cpumask_weight(sched_group_span(group))) {
+ if (cpumask_empty(sched_group_span(group))) {
printk(KERN_CONT "\n");
printk(KERN_ERR "ERROR: empty group\n");
break;
--
2.30.2

2022-01-24 12:14:30

by Yury Norov

Subject: [PATCH 24/54] time: replace cpumask_weight with cpumask_empty in clocksource.c

clocksource_verify_percpu() calls cpumask_weight() to check if any bit
of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds the first set bit, while cpumask_weight() counts all
bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/time/clocksource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 1cf73807b450..a2fecb4d8c0e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -337,7 +337,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
cpus_read_lock();
preempt_disable();
clocksource_verify_choose_cpus();
- if (cpumask_weight(&cpus_chosen) == 0) {
+ if (cpumask_empty(&cpus_chosen)) {
preempt_enable();
cpus_read_unlock();
pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
--
2.30.2

2022-01-24 12:14:34

by Yury Norov

Subject: [PATCH 25/54] mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate

mm/vmstat.c calls cpumask_weight() to check if any bit of a given
cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
mm/vmstat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4057372745d0..f56f11e3eef5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2035,7 +2035,7 @@ static void __init init_cpu_node_state(void)
int node;

for_each_online_node(node) {
- if (cpumask_weight(cpumask_of_node(node)) > 0)
+ if (!cpumask_empty(cpumask_of_node(node)))
node_set_state(node, N_CPU);
}
}
@@ -2062,7 +2062,7 @@ static int vmstat_cpu_dead(unsigned int cpu)

refresh_zone_stat_thresholds();
node_cpus = cpumask_of_node(node);
- if (cpumask_weight(node_cpus) > 0)
+ if (!cpumask_empty(node_cpus))
return 0;

node_clear_state(node, N_CPU);
--
2.30.2

2022-01-24 12:14:35

by Yury Norov

Subject: [PATCH 26/54] arch/x86: replace nodes_weight with nodes_empty where appropriate

arch/x86/mm code calls nodes_weight() to check if any bit of a given
nodemask is set. We can do it more efficiently with nodes_empty() because
nodes_empty() stops traversing the nodemask as soon as it finds the first
set bit, while nodes_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 058b2f36b3a6..b3ca7d23e4b0 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -154,7 +154,7 @@ int __init amd_numa_init(void)
node_set(nodeid, numa_nodes_parsed);
}

- if (!nodes_weight(numa_nodes_parsed))
+ if (nodes_empty(numa_nodes_parsed))
return -ENOENT;

/*
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 1a02b791d273..9a9305367fdd 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -123,7 +123,7 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
* Continue to fill physical nodes with fake nodes until there is no
* memory left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
@@ -270,7 +270,7 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
* Fill physical nodes with fake nodes of size until there is no memory
* left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
--
2.30.2

2022-01-24 12:15:08

by Yury Norov

Subject: [PATCH 28/54] arch/x86: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} where appropriate

__init_one_rdt_domain() in rdtgroup.c calls bitmap_weight() to compare
the weight of a bitmap with a given number. We can do it more efficiently
with bitmap_weight_lt() because the conditional bitmap_weight() variants
may stop traversing the bitmap earlier, as soon as the condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e23ff03290b8..9d42e592c1cf 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2752,7 +2752,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
* bitmap_weight() does not access out-of-bound memory.
*/
tmp_cbm = cfg->new_ctrl;
- if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < r->cache.min_cbm_bits) {
+ if (bitmap_weight_lt(&tmp_cbm, r->cache.cbm_len, r->cache.min_cbm_bits)) {
rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id);
return -ENOSPC;
}
--
2.30.2

2022-01-24 12:15:08

by Yury Norov

Subject: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

Many kernel users use bitmap_weight() to compare the result against
some number or expression:

if (bitmap_weight(...) > 1)
do_something();

This works, but can be significantly improved for large bitmaps: if the
first few words already contain more set bits than the given number, we
can stop counting and return immediately.

The same idea works in the other direction: if the number of set bits
counted so far would stay below the required number even if every
remaining bit were set, we can stop counting early.

This patch adds the new bitmap_weight_cmp(), as suggested by Michał
Mirosław, and a family of eq, gt, ge, lt and le wrappers on top of it to
allow this optimization. The following patches apply the new functions
where appropriate.
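
As a usage sketch (a hypothetical call site for illustration only; the
names mask, nbits and min_bits are made up):

	/* Counts every word of the bitmap, even when the answer is
	 * already clear after the first few words.
	 */
	if (bitmap_weight(mask, nbits) < min_bits)
		return -ENOSPC;

	/* Equivalent to bitmap_weight_cmp(mask, nbits, min_bits - 1) <= 0.
	 * May stop once the running count exceeds min_bits - 1, or once
	 * even setting all remaining bits could not reach it.
	 */
	if (bitmap_weight_lt(mask, nbits, min_bits))
		return -ENOSPC;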

Suggested-by: "Michał Mirosław" <[email protected]> (for bitmap_weight_cmp)
Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitmap.h | 80 ++++++++++++++++++++++++++++++++++++++++++
lib/bitmap.c | 21 +++++++++++
2 files changed, 101 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 7dba0847510c..708e57b32362 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -51,6 +51,12 @@ struct device;
* bitmap_empty(src, nbits) Are all bits zero in *src?
* bitmap_full(src, nbits) Are all bits set in *src?
* bitmap_weight(src, nbits) Hamming Weight: number set bits
+ * bitmap_weight_cmp(src, nbits) compare Hamming Weight with a number
+ * bitmap_weight_eq(src, nbits, num) Hamming Weight == num
+ * bitmap_weight_gt(src, nbits, num) Hamming Weight > num
+ * bitmap_weight_ge(src, nbits, num) Hamming Weight >= num
+ * bitmap_weight_lt(src, nbits, num) Hamming Weight < num
+ * bitmap_weight_le(src, nbits, num) Hamming Weight <= num
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
@@ -162,6 +168,7 @@ int __bitmap_intersects(const unsigned long *bitmap1,
int __bitmap_subset(const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int nbits);
int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);
+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num);
void __bitmap_set(unsigned long *map, unsigned int start, int len);
void __bitmap_clear(unsigned long *map, unsigned int start, int len);

@@ -403,6 +410,79 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int
return __bitmap_weight(src, nbits);
}

+/**
+ * bitmap_weight_cmp - compares number of set bits in @src with @num.
+ * @src: source bitmap
+ * @nbits: length of bitmap in bits
+ * @num: number to compare with
+ *
+ * As opposite to bitmap_weight() this function doesn't necessarily
+ * traverse full bitmap and may return earlier.
+ *
+ * Returns zero if weight of @src is equal to @num;
+ * negative number if weight of @src is less than @num;
+ * positive number if weight of @src is greater than @num;
+ *
+ * NOTES
+ *
+ * Because number of set bits cannot decrease while counting, when user
+ * wants to know if the number of set bits in the bitmap is less than
+ * @num, calling
+ * bitmap_weight_cmp(..., @num) < 0
+ * is potentially less effective than
+ * bitmap_weight_cmp(..., @num - 1) <= 0
+ *
+ * Consider an example:
+ * bitmap_weight_cmp(1000 0000 0000 0000, 1) < 0
+ * ^
+ * stop here
+ *
+ * bitmap_weight_cmp(1000 0000 0000 0000, 0) <= 0
+ * ^
+ * stop here
+ */
+static __always_inline
+int bitmap_weight_cmp(const unsigned long *src, unsigned int nbits, int num)
+{
+ if (num > (int)nbits || num < 0)
+ return -num;
+
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) - num;
+
+ return __bitmap_weight_cmp(src, nbits, num);
+}
+
+static __always_inline
+bool bitmap_weight_eq(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) == 0;
+}
+
+static __always_inline
+bool bitmap_weight_gt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_ge(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_lt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) <= 0;
+}
+
+static __always_inline
+bool bitmap_weight_le(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) <= 0;
+}
+
static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
unsigned int nbits)
{
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 926408883456..fb84ca70c5d9 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -348,6 +348,27 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
}
EXPORT_SYMBOL(__bitmap_weight);

+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ goto out;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ goto out;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+out:
+ return w - num;
+}
+EXPORT_SYMBOL(__bitmap_weight_cmp);
+
void __bitmap_set(unsigned long *map, unsigned int start, int len)
{
unsigned long *p = map + BIT_WORD(start);
--
2.30.2

2022-01-24 12:15:10

by Yury Norov

Subject: [PATCH 29/54] drivers/iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where appropriate

drivers/iio code calls bitmap_weight() to compare the weight of a bitmap
with a given number. We can do it more efficiently with
bitmap_weight_{eq,gt}() because the conditional variants may stop
traversing the bitmap earlier, as soon as the condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 ++--
drivers/iio/industrialio-trigger.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iio/dummy/iio_simple_dummy_buffer.c b/drivers/iio/dummy/iio_simple_dummy_buffer.c
index d81c2b2dad82..670997301e47 100644
--- a/drivers/iio/dummy/iio_simple_dummy_buffer.c
+++ b/drivers/iio/dummy/iio_simple_dummy_buffer.c
@@ -71,8 +71,8 @@ static irqreturn_t iio_simple_dummy_trigger_h(int irq, void *p)
int i, j;

for (i = 0, j = 0;
- i < bitmap_weight(indio_dev->active_scan_mask,
- indio_dev->masklength);
+ bitmap_weight_gt(indio_dev->active_scan_mask,
+ indio_dev->masklength, i);
i++, j++) {
j = find_next_bit(indio_dev->active_scan_mask,
indio_dev->masklength, j);
diff --git a/drivers/iio/industrialio-trigger.c b/drivers/iio/industrialio-trigger.c
index f504ed351b3e..98c54022fecf 100644
--- a/drivers/iio/industrialio-trigger.c
+++ b/drivers/iio/industrialio-trigger.c
@@ -331,7 +331,7 @@ int iio_trigger_detach_poll_func(struct iio_trigger *trig,
{
struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(pf->indio_dev);
bool no_other_users =
- bitmap_weight(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER) == 1;
+ bitmap_weight_eq(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER, 1);
int ret = 0;

if (trig->ops && trig->ops->set_trigger_state && no_other_users) {
--
2.30.2

2022-01-24 12:15:12

by Yury Norov

[permalink] [raw]
Subject: [PATCH 30/54] drivers/memstick: replace bitmap_weight with bitmap_weight_eq where appropriate

msb_validate_used_block_bitmap() calls bitmap_weight() to compare the
weight of bitmap with a given number. We can do it more efficiently with
bitmap_weight_eq because conditional bitmap_weight may stop traversing the
bitmap earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/memstick/core/ms_block.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 0cda6c6baefc..5cdd987e78f7 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -155,8 +155,8 @@ static int msb_validate_used_block_bitmap(struct msb_data *msb)
for (i = 0; i < msb->zone_count; i++)
total_free_blocks += msb->free_block_count[i];

- if (msb->block_count - bitmap_weight(msb->used_blocks_bitmap,
- msb->block_count) == total_free_blocks)
+ if (bitmap_weight_eq(msb->used_blocks_bitmap, msb->block_count,
+ msb->block_count - total_free_blocks))
return 0;

pr_err("BUG: free block counts don't match the bitmap");
--
2.30.2

2022-01-24 12:15:13

by Yury Norov

[permalink] [raw]
Subject: [PATCH 31/54] net: ethernet: replace bitmap_weight with bitmap_weight_eq for intel

ixgbe_disable_sriov() calls bitmap_weight() to compare the weight of bitmap
with a given number. We can do it more efficiently with bitmap_weight_eq
because conditional bitmap_weight may stop traversing the bitmap earlier,
as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 214a38de3f41..35297d8a488b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -246,7 +246,7 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
#endif

/* Disable VMDq flag so device will be set in VM mode */
- if (bitmap_weight(adapter->fwd_bitmask, adapter->num_rx_pools) == 1) {
+ if (bitmap_weight_eq(adapter->fwd_bitmask, adapter->num_rx_pools, 1)) {
adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
rss = min_t(int, ixgbe_max_rss_indices(adapter),
--
2.30.2

2022-01-24 12:15:15

by Yury Norov

[permalink] [raw]
Subject: [PATCH 32/54] net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt} for OcteonTX2

OcteonTX2 code calls bitmap_weight() to compare the weight of bitmap with
a given number. We can do it more efficiently with bitmap_weight_{eq,gt}
because conditional bitmap_weight may stop traversing the bitmap earlier,
as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index d85db90632d6..a55fd1d0c653 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -287,7 +287,7 @@ static int otx2_set_channels(struct net_device *dev,
if (!channel->rx_count || !channel->tx_count)
return -EINVAL;

- if (bitmap_weight(&pfvf->rq_bmap, pfvf->hw.rx_queues) > 1) {
+ if (bitmap_weight_gt(&pfvf->rq_bmap, pfvf->hw.rx_queues, 1)) {
netdev_err(dev,
"Receive queues are in use by TC police action\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 80b2d64b4136..55c899a6fcdd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -1170,8 +1170,8 @@ int otx2_remove_flow(struct otx2_nic *pfvf, u32 location)
* interface mac address and configure CGX/RPM block in
* promiscuous mode
*/
- if (bitmap_weight(&flow_cfg->dmacflt_bmap,
- flow_cfg->dmacflt_max_flows) == 1)
+ if (bitmap_weight_eq(&flow_cfg->dmacflt_bmap,
+ flow_cfg->dmacflt_max_flows, 1))
otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);
} else {
err = otx2_remove_flow_msg(pfvf, flow->entry, false);
--
2.30.2

2022-01-24 12:15:17

by Yury Norov

[permalink] [raw]
Subject: [PATCH 34/54] perf: replace bitmap_weight with bitmap_weight_eq for ThunderX2

tx2_uncore_event_start() calls bitmap_weight() to compare the weight
of bitmap with a given number. We can do it more efficiently with
bitmap_weight_eq because conditional bitmap_weight may stop traversing
the bitmap earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/perf/thunderx2_pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 05378c0fd8f3..ebfa66b212c7 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -623,8 +623,8 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
return;

/* Start timer for first event */
- if (bitmap_weight(tx2_pmu->active_counters,
- tx2_pmu->max_counters) == 1) {
+ if (bitmap_weight_eq(tx2_pmu->active_counters,
+ tx2_pmu->max_counters, 1)) {
hrtimer_start(&tx2_pmu->hrtimer,
ns_to_ktime(tx2_pmu->hrtimer_interval),
HRTIMER_MODE_REL_PINNED);
--
2.30.2

2022-01-24 12:15:21

by Yury Norov

[permalink] [raw]
Subject: [PATCH 33/54] net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} for mellanox

Mellanox code uses bitmap_weight() to compare the weight of bitmap with
a given number. We can do it more efficiently with bitmap_weight_{eq, ...}
because conditional bitmap_weight may stop traversing the bitmap earlier,
as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++-------
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
4 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index c56d2194cbfc..5bca0c68f00a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2792,9 +2792,8 @@ int mlx4_slave_convert_port(struct mlx4_dev *dev, int slave, int port)
{
unsigned n;
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(dev, slave);
- unsigned m = bitmap_weight(actv_ports.ports, dev->caps.num_ports);

- if (port <= 0 || port > m)
+ if (port <= 0 || bitmap_weight_lt(actv_ports.ports, dev->caps.num_ports, port))
return -EINVAL;

n = find_first_bit(actv_ports.ports, dev->caps.num_ports);
@@ -3404,10 +3403,6 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

if (slave == mlx4_master_func_num(dev))
return 0;
@@ -3417,7 +3412,8 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
enabled < 0 || enabled > 1)
return -EINVAL;

- if (min_port == max_port && dev->caps.num_ports > 1) {
+ if (dev->caps.num_ports > 1 &&
+ bitmap_weight_eq(actv_ports.ports, priv->dev.caps.num_ports, 1)) {
mlx4_info(dev, "SMI access disallowed for single ported VFs\n");
return -EPROTONOSUPPORT;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 414e390e6b48..0c09432ff389 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1435,8 +1435,8 @@ int mlx4_is_eq_shared(struct mlx4_dev *dev, int vector)
if (vector <= 0 || (vector >= dev->caps.num_comp_vectors + 1))
return -EINVAL;

- return !!(bitmap_weight(priv->eq_table.eq[vector].actv_ports.ports,
- dev->caps.num_ports) > 1);
+ return bitmap_weight_gt(priv->eq_table.eq[vector].actv_ports.ports,
+ dev->caps.num_ports, 1);
}
EXPORT_SYMBOL(mlx4_is_eq_shared);

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 42c96c9d7fb1..855aae326ccb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1300,8 +1300,8 @@ int mlx4_QUERY_DEV_CAP_wrapper(struct mlx4_dev *dev, int slave,
actv_ports = mlx4_get_active_ports(dev, slave);
first_port = find_first_bit(actv_ports.ports, dev->caps.num_ports);
for (slave_port = 0, real_port = first_port;
- real_port < first_port +
- bitmap_weight(actv_ports.ports, dev->caps.num_ports);
+ bitmap_weight_gt(actv_ports.ports, dev->caps.num_ports,
+ real_port - first_port);
++real_port, ++slave_port) {
if (flags & (MLX4_DEV_CAP_FLAG_WOL_PORT1 << real_port))
flags |= MLX4_DEV_CAP_FLAG_WOL_PORT1 << slave_port;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index b187c210d4d6..cfbaa7ac712f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1383,7 +1383,7 @@ static int mlx4_mf_bond(struct mlx4_dev *dev)
dev->persist->num_vfs + 1);

/* only single port vfs are allowed */
- if (bitmap_weight(slaves_port_1_2, dev->persist->num_vfs + 1) > 1) {
+ if (bitmap_weight_gt(slaves_port_1_2, dev->persist->num_vfs + 1, 1)) {
mlx4_warn(dev, "HA mode unsupported for dual ported VFs\n");
return -EINVAL;
}
--
2.30.2
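
The fw.c hunk above is the least mechanical conversion in this patch; a short
sketch (illustration only, the helper name is hypothetical) of the equivalence
the new loop condition relies on:

	/*
	 * Old bound: real_port < first_port + bitmap_weight(ports, nports)
	 * New test:  weight(ports) > real_port - first_port
	 * Same inequality, but evaluated with an early exit.
	 */
	static bool port_in_range(const unsigned long *ports, unsigned int nports,
				  unsigned int first_port, unsigned int real_port)
	{
		return bitmap_weight_gt(ports, nports, real_port - first_port);
	}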

2022-01-24 12:15:23

by Yury Norov

[permalink] [raw]
Subject: [PATCH 35/54] drivers/staging: replace bitmap_weight with bitmap_weight_le for tegra-video

tegra_channel_enum_format() calls bitmap_weight() to compare the weight
of bitmap with a given number. We can do it more efficiently with
bitmap_weight_le() because conditional bitmap_weight may stop traversing
the bitmap earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/staging/media/tegra-video/vi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index d1f43f465c22..4e79a80e9307 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -436,7 +436,7 @@ static int tegra_channel_enum_format(struct file *file, void *fh,
if (!IS_ENABLED(CONFIG_VIDEO_TEGRA_TPG))
fmts_bitmap = chan->fmts_bitmap;

- if (f->index >= bitmap_weight(fmts_bitmap, MAX_FORMAT_NUM))
+ if (bitmap_weight_le(fmts_bitmap, MAX_FORMAT_NUM, f->index))
return -EINVAL;

for (i = 0; i < f->index + 1; i++, index++)
--
2.30.2
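
The inverted comparison above may look odd at first glance; a sketch
(illustration only, the helper name is hypothetical) of the equivalence:

	/*
	 * f->index >= bitmap_weight(fmts_bitmap, MAX_FORMAT_NUM)
	 * is the same as
	 * bitmap_weight(fmts_bitmap, MAX_FORMAT_NUM) <= f->index,
	 * which is exactly what bitmap_weight_le() tests, with an early exit.
	 */
	static bool index_out_of_range(const unsigned long *fmts_bitmap,
				       unsigned int index)
	{
		return bitmap_weight_le(fmts_bitmap, MAX_FORMAT_NUM, index);
	}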

2022-01-24 12:15:27

by Yury Norov

[permalink] [raw]
Subject: [PATCH 37/54] arch/ia64: replace cpumask_weight with cpumask_weight_eq in mm/tlb.c

__flush_tlb_range() code calls cpumask_weight() to compare the
weight of cpumask with a given number. We can do it more efficiently with
cpumask_weight_eq because conditional cpumask_weight may stop traversing
the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/ia64/mm/tlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index 135b5135cace..a5bce13ab047 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -332,7 +332,7 @@ __flush_tlb_range (struct vm_area_struct *vma, unsigned long start,

preempt_disable();
#ifdef CONFIG_SMP
- if (mm != current->active_mm || cpumask_weight(mm_cpumask(mm)) != 1) {
+ if (mm != current->active_mm || !cpumask_weight_eq(mm_cpumask(mm), 1)) {
ia64_global_tlb_purge(mm, start, end, nbits);
preempt_enable();
return;
--
2.30.2

2022-01-24 12:15:30

by Yury Norov

[permalink] [raw]
Subject: [PATCH 38/54] arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate

MIPS code calls cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_{eq, ...} because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/mips/cavium-octeon/octeon-irq.c | 4 ++--
arch/mips/kernel/crash.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 844f882096e6..914871f15fb7 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -763,7 +763,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
if (!cpumask_test_cpu(cpu, mask))
return;

- if (cpumask_weight(mask) > 1) {
+ if (cpumask_weight_gt(mask, 1)) {
/*
* It has multi CPU affinity, just remove this CPU
* from the affinity set.
@@ -795,7 +795,7 @@ static int octeon_irq_ciu_set_affinity(struct irq_data *data,
* This removes the need to do locking in the .ack/.eoi
* functions.
*/
- if (cpumask_weight(dest) != 1)
+ if (!cpumask_weight_eq(dest, 1))
return -EINVAL;

if (!enable_one)
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..5b690d52491f 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -72,7 +72,7 @@ static void crash_kexec_prepare_cpus(void)
*/
pr_emerg("Sending IPI to other cpus...\n");
msecs = 10000;
- while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
+ while (cpumask_weight_lt(&cpus_in_crash, ncpus) && (--msecs > 0)) {
cpu_relax();
mdelay(1);
}
--
2.30.2

2022-01-24 12:15:35

by Yury Norov

[permalink] [raw]
Subject: [PATCH 39/54] arch/powerpc: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate

PowerPC code uses cpumask_weight() to compare the weight of cpumask
with a given number. We can do it more efficiently with
cpumask_weight_{eq, ...} because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/xmon/xmon.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b7fd6a72aa76..8bff748df402 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1656,7 +1656,7 @@ void start_secondary(void *unused)
if (has_big_cores)
sibling_mask = cpu_smallcore_mask;

- if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+ if (cpumask_weight_gt(mask, cpumask_weight(sibling_mask(cpu))))
shared_caches = true;
}

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..62937a077de7 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -483,7 +483,7 @@ static void start_watchdog(void *arg)

wd_smp_lock(&flags);
cpumask_set_cpu(cpu, &wd_cpus_enabled);
- if (cpumask_weight(&wd_cpus_enabled) == 1) {
+ if (cpumask_weight_eq(&wd_cpus_enabled, 1)) {
cpumask_set_cpu(cpu, &wd_smp_cpus_pending);
wd_smp_last_reset_tb = get_tb();
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b423812e94e0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -469,7 +469,7 @@ static bool wait_for_other_cpus(int ncpus)

/* We wait for 2s, which is a metric "little while" */
for (timeout = 20000; timeout != 0; --timeout) {
- if (cpumask_weight(&cpus_in_xmon) >= ncpus)
+ if (cpumask_weight_ge(&cpus_in_xmon, ncpus))
return true;
udelay(100);
barrier();
@@ -1338,7 +1338,7 @@ static int cpu_cmd(void)
case 'S':
case 't':
cpumask_copy(&xmon_batch_cpus, &cpus_in_xmon);
- if (cpumask_weight(&xmon_batch_cpus) <= 1) {
+ if (cpumask_weight_le(&xmon_batch_cpus, 1)) {
printf("There are no other cpus in xmon\n");
break;
}
--
2.30.2

2022-01-24 12:15:35

by Yury Norov

[permalink] [raw]
Subject: [PATCH 40/54] arch/s390: replace cpumask_weight with cpumask_weight_eq where appropriate

cfset_all_start() calls cpumask_weight() to compare the weight of cpumask
with a given number. We can do it more efficiently with
cpumask_weight_eq() because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/s390/kernel/perf_cpum_cf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index ee8707abdb6a..4d217f7f5ccf 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -975,7 +975,7 @@ static int cfset_all_start(struct cfset_request *req)
return -ENOMEM;
cpumask_and(mask, &req->mask, cpu_online_mask);
on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
- if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
+ if (!cpumask_weight_eq(mask, atomic_read(&p.cpus_ack))) {
on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
rc = -EIO;
debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
--
2.30.2

2022-01-24 12:15:41

by Yury Norov

[permalink] [raw]
Subject: [PATCH 36/54] lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}

In many cases people use cpumask_weight() to compare the result against
some number or expression:

if (cpumask_weight(...) > 1)
do_something();

It may be significantly improved for large cpumasks: if the first few words
count set bits to a number greater than the given one, we can stop counting
and return immediately.

The same idea works in the other direction: if we know that the number of
set bits counted so far is small enough that it would remain smaller than
the required number even if all remaining bits of the cpumask were set, we
can stop counting earlier.

This patch adds cpumask_weight_{eq, gt, ge, lt, le} helpers based on
corresponding bitmap functions. The following patches apply new functions
where appropriate.
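
As an illustration only (not part of this patch), the new helpers are meant to
be drop-in equivalents of the open-coded comparisons; a minimal sanity sketch
on a small on-stack mask (fine for an example, although on-stack cpumasks are
discouraged for large NR_CPUS):

	cpumask_t mask;

	cpumask_clear(&mask);
	cpumask_set_cpu(0, &mask);
	cpumask_set_cpu(2, &mask);		/* weight == 2 */

	WARN_ON(!cpumask_weight_eq(&mask, 2));	/* 2 == 2 */
	WARN_ON(!cpumask_weight_ge(&mask, 2));	/* 2 >= 2 */
	WARN_ON(!cpumask_weight_le(&mask, 2));	/* 2 <= 2 */
	WARN_ON(cpumask_weight_gt(&mask, 2));	/* 2 > 2 is false */
	WARN_ON(cpumask_weight_lt(&mask, 2));	/* 2 < 2 is false */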

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/cpumask.h | 50 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 64dae70d31f5..1906e3225737 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -575,6 +575,56 @@ static inline unsigned int cpumask_weight(const struct cpumask *srcp)
return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
}

+/**
+ * cpumask_weight_eq - Check if # of bits in *srcp is equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_eq(const struct cpumask *srcp, unsigned int num)
+{
+ return bitmap_weight_eq(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_gt - Check if # of bits in *srcp is greater than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_gt(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_gt(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_ge - Check if # of bits in *srcp is greater than or equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_ge(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_ge(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_lt - Check if # of bits in *srcp is less than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_lt(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_lt(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_le - Check if # of bits in *srcp is less than or equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_le(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_le(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
/**
* cpumask_shift_right - *dstp = *srcp >> n
* @dstp: the cpumask result
--
2.30.2

2022-01-24 12:15:45

by Yury Norov

[permalink] [raw]
Subject: [PATCH 41/54] arch/x86: replace cpumask_weight with cpumask_weight_eq where appropriate

smpboot code in some places calls cpumask_weight() to compare the weight
of cpumask with a given number. We can do it more efficiently with
cpumask_weight_eq() because conditional cpumask_weight may stop traversing
the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/kernel/smpboot.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..e851e9945eb5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1608,7 +1608,7 @@ static void remove_siblinginfo(int cpu)
/*/
* last thread sibling in this cpu core going down
*/
- if (cpumask_weight(topology_sibling_cpumask(cpu)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(cpu), 1))
cpu_data(sibling).booted_cores--;
}

@@ -1617,7 +1617,7 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
- if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(sibling), 1))
cpu_data(sibling).smt_active = false;
}

--
2.30.2

2022-01-24 12:15:47

by Yury Norov

[permalink] [raw]
Subject: [PATCH 42/54] firmware: psci: replace cpumask_weight with cpumask_weight_eq

down_and_up_cpus() calls cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_eq() because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/firmware/psci/psci_checker.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c
index 116eb465cdb4..90c9473832a9 100644
--- a/drivers/firmware/psci/psci_checker.c
+++ b/drivers/firmware/psci/psci_checker.c
@@ -90,7 +90,7 @@ static unsigned int down_and_up_cpus(const struct cpumask *cpus,
* cpu_down() checks the number of online CPUs before the TOS
* resident CPU.
*/
- if (cpumask_weight(offlined_cpus) + 1 == nb_available_cpus) {
+ if (cpumask_weight_eq(offlined_cpus, nb_available_cpus - 1)) {
if (ret != -EBUSY) {
pr_err("Unexpected return code %d while trying "
"to power down last online CPU %d\n",
--
2.30.2

2022-01-24 12:15:49

by Yury Norov

[permalink] [raw]
Subject: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq

init_vp_index() calls cpumask_weight() to compare the weights of cpumasks.
We can do it more efficiently with cpumask_weight_eq because conditional
cpumask_weight may stop traversing the cpumask earlier (for at least one of
the two masks), as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/hv/channel_mgmt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 60375879612f..7420a5fd47b5 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
}
alloced_mask = &hv_context.hv_numa_map[numa_node];

- if (cpumask_weight(alloced_mask) ==
- cpumask_weight(cpumask_of_node(numa_node))) {
+ if (cpumask_weight_eq(alloced_mask,
+ cpumask_weight(cpumask_of_node(numa_node)))) {
/*
* We have cycled through all the CPUs in the node;
* reset the alloced map.
--
2.30.2

2022-01-24 12:15:57

by Yury Norov

[permalink] [raw]
Subject: [PATCH 44/54] infiniband: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate

Infiniband code uses cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_{eq, ...} because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/infiniband/hw/hfi1/affinity.c | 9 ++++-----
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 38eee675369a..7c5ca5c5306a 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -507,7 +507,7 @@ static int _dev_comp_vect_cpu_mask_init(struct hfi1_devdata *dd,
* available CPUs divide it by the number of devices in the
* local NUMA node.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 1) {
+ if (cpumask_weight_eq(&entry->comp_vect_mask, 1)) {
possible_cpus_comp_vect = 1;
dd_dev_warn(dd,
"Number of kernel receive queues is too large for completion vector affinity to be effective\n");
@@ -593,7 +593,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
{
struct hfi1_affinity_node *entry;
const struct cpumask *local_mask;
- int curr_cpu, possible, i, ret;
+ int curr_cpu, i, ret;
bool new_entry = false;

local_mask = cpumask_of_node(dd->node);
@@ -626,10 +626,9 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
local_mask);

/* fill in the receive list */
- possible = cpumask_weight(&entry->def_intr.mask);
curr_cpu = cpumask_first(&entry->def_intr.mask);

- if (possible == 1) {
+ if (cpumask_weight_eq(&entry->def_intr.mask, 1)) {
/* only one CPU, everyone will use it */
cpumask_set_cpu(curr_cpu, &entry->rcv_intr.mask);
cpumask_set_cpu(curr_cpu, &entry->general_intr_mask);
@@ -1017,7 +1016,7 @@ int hfi1_get_proc_affinity(int node)
cpu = cpumask_first(proc_mask);
cpumask_set_cpu(cpu, &set->used);
goto done;
- } else if (current->nr_cpus_allowed < cpumask_weight(&set->mask)) {
+ } else if (cpumask_weight_gt(&set->mask, current->nr_cpus_allowed)) {
hfi1_cdbg(PROC, "PID %u %s affinity set to CPU set(s) %*pbl",
current->pid, current->comm,
cpumask_pr_args(proc_mask));
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index aa290928cf96..add89bc21b0a 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -1151,7 +1151,7 @@ static void assign_ctxt_affinity(struct file *fp, struct qib_devdata *dd)
* reserve a processor for it on the local NUMA node.
*/
if ((weight >= qib_cpulist_count) &&
- (cpumask_weight(local_mask) <= qib_cpulist_count)) {
+ (cpumask_weight_le(local_mask, qib_cpulist_count))) {
for_each_cpu(local_cpu, local_mask)
if (!test_and_set_bit(local_cpu, qib_cpulist)) {
fd->rec_cpu_num = local_cpu;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index ceed302cf6a0..b17f96509d2c 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3405,7 +3405,7 @@ static void qib_setup_7322_interrupt(struct qib_devdata *dd, int clearpend)
local_mask = cpumask_of_pcibus(dd->pcidev->bus);
firstcpu = cpumask_first(local_mask);
if (firstcpu >= nr_cpu_ids ||
- cpumask_weight(local_mask) == num_online_cpus()) {
+ cpumask_weight_eq(local_mask, num_online_cpus())) {
local_mask = topology_core_cpumask(0);
firstcpu = cpumask_first(local_mask);
}
--
2.30.2

2022-01-24 12:16:00

by Yury Norov

[permalink] [raw]
Subject: [PATCH 45/54] scsi: replace cpumask_weight with cpumask_weight_gt

lpfc_cpuhp_get_eq() calls cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_gt because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/scsi/lpfc/lpfc_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index a56f01f659f8..325e9004dacd 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12643,7 +12643,7 @@ lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
* gone offline yet, we need >1.
*/
cpumask_and(tmp, maskp, cpu_online_mask);
- if (cpumask_weight(tmp) > 1)
+ if (cpumask_weight_gt(tmp, 1))
continue;

/* Now that we have an irq to shutdown, get the eq
--
2.30.2

2022-01-24 12:16:04

by Yury Norov

[permalink] [raw]
Subject: [PATCH 47/54] sched: replace cpumask_weight with cpumask_weight_eq where appropriate

kernel/sched code uses cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_eq because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/sched/core.c | 8 ++++----
kernel/sched/topology.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 918d0bdc2ea8..7494f51a3608 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6006,7 +6006,7 @@ static void sched_core_cpu_starting(unsigned int cpu)
WARN_ON_ONCE(rq->core != rq);

/* if we're the first, we'll be our own leader */
- if (cpumask_weight(smt_mask) == 1)
+ if (cpumask_weight_eq(smt_mask, 1))
goto unlock;

/* find the leader */
@@ -6047,7 +6047,7 @@ static void sched_core_cpu_deactivate(unsigned int cpu)
sched_core_lock(cpu, &flags);

/* if we're the last man standing, nothing to do */
- if (cpumask_weight(smt_mask) == 1) {
+ if (cpumask_weight_eq(smt_mask, 1)) {
WARN_ON_ONCE(rq->core != rq);
goto unlock;
}
@@ -9045,7 +9045,7 @@ int sched_cpu_activate(unsigned int cpu)
/*
* When going up, increment the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_inc_cpuslocked(&sched_smt_present);
#endif
set_cpu_active(cpu, true);
@@ -9120,7 +9120,7 @@ int sched_cpu_deactivate(unsigned int cpu)
/*
* When going down, decrement the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_dec_cpuslocked(&sched_smt_present);

sched_core_cpu_deactivate(cpu);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8478e2a8cd65..79395571599f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -169,7 +169,7 @@ static const unsigned int SD_DEGENERATE_GROUPS_MASK =

static int sd_degenerate(struct sched_domain *sd)
{
- if (cpumask_weight(sched_domain_span(sd)) == 1)
+ if (cpumask_weight_eq(sched_domain_span(sd), 1))
return 1;

/* Following flags need at least 2 groups */
--
2.30.2

2022-01-24 12:16:34

by Yury Norov

[permalink] [raw]
Subject: [PATCH 46/54] soc: replace cpumask_weight with cpumask_weight_lt

qman_test_stash() calls cpumask_weight() to compare the weight of
cpumask with a given number. We can do it more efficiently with
cpumask_weight_lt because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..28b08568a349 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -561,7 +561,7 @@ int qman_test_stash(void)
{
int err;

- if (cpumask_weight(cpu_online_mask) < 2) {
+ if (cpumask_weight_lt(cpu_online_mask, 2)) {
pr_info("%s(): skip - only 1 CPU\n", __func__);
return 0;
}
--
2.30.2

2022-01-24 12:16:35

by Yury Norov

[permalink] [raw]
Subject: [PATCH 48/54] kernel/time: replace cpumask_weight with cpumask_weight_eq where appropriate

tick_cleanup_dead_cpu() calls cpumask_weight() to compare the weight
of cpumask with a given number. We can do it more efficiently with
cpumask_weight_eq() because conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/time/clockevents.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..32d6629a55b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -648,7 +648,7 @@ void tick_cleanup_dead_cpu(int cpu)
*/
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
- cpumask_weight(dev->cpumask) == 1 &&
+ cpumask_weight_eq(dev->cpumask, 1) &&
!tick_is_broadcast_device(dev)) {
BUG_ON(!clockevent_state_detached(dev));
list_del(&dev->list);
--
2.30.2

2022-01-24 12:16:50

by Yury Norov

[permalink] [raw]
Subject: [PATCH 49/54] lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}

In many cases kernel code uses nodemask_weight() to compare the result
against some number or expression:

if (nodes_weight(...) > 1)
do_something();

It may be significantly improved for large nodemasks: if the first few words
count set bits to a number greater than the given one, we can stop counting
and return immediately.

The same idea works in the other direction: if we know that the number of
set bits counted so far is small enough that it would remain smaller than
the required number even if all remaining bits of the nodemask were set, we
can stop counting earlier.

This patch adds nodes_weight_{eq, gt, ge, lt, le} helpers based on
corresponding bitmap functions. The following patches apply new functions
where appropriate.
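
A caller-side sketch (illustration only, not from this patch) of the intended
conversion pattern, applied to the example above:

	/* before: counts all set bits unconditionally */
	if (nodes_weight(node_online_map) > 1)
		do_something();

	/* after: stops counting as soon as a second node is found */
	if (nodes_weight_gt(node_online_map, 1))
		do_something();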

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/nodemask.h | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 567c3ddba2c4..197598e075e9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -38,6 +38,11 @@
* int nodes_empty(mask) Is mask empty (no bits sets)?
* int nodes_full(mask) Is mask full (all bits sets)?
* int nodes_weight(mask) Hamming weight - number of set bits
+ * bool nodes_weight_eq(src, nbits, num) Hamming Weight is equal to num
+ * bool nodes_weight_gt(src, nbits, num) Hamming Weight is greater than num
+ * bool nodes_weight_ge(src, nbits, num) Hamming Weight is greater than or equal to num
+ * bool nodes_weight_lt(src, nbits, num) Hamming Weight is less than num
+ * bool nodes_weight_le(src, nbits, num) Hamming Weight is less than or equal to num
*
* void nodes_shift_right(dst, src, n) Shift right
* void nodes_shift_left(dst, src, n) Shift left
@@ -240,6 +245,36 @@ static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
return bitmap_weight(srcp->bits, nbits);
}

+#define nodes_weight_eq(nodemask, num) __nodes_weight_eq(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_eq(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_eq(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_gt(nodemask, num) __nodes_weight_gt(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_gt(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_gt(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_ge(nodemask, num) __nodes_weight_ge(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_ge(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_ge(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_lt(nodemask, num) __nodes_weight_lt(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_lt(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_lt(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_le(nodemask, num) __nodes_weight_le(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_le(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_le(srcp->bits, nbits, num);
+}
+
#define nodes_shift_right(dst, src, n) \
__nodes_shift_right(&(dst), &(src), (n), MAX_NUMNODES)
static inline void __nodes_shift_right(nodemask_t *dstp,
--
2.30.2

2022-01-24 12:49:19

by Yury Norov

[permalink] [raw]
Subject: [PATCH 50/54] acpi: replace nodes_weight with nodes_weight_ge for numa

acpi_map_pxm_to_node() calls nodes_weight() to compare the weight
of nodemask with a given number. We can do it more efficiently with
nodes_weight_ge() because conditional nodes_weight may stop
traversing the nodemask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/acpi/numa/srat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 3b818ab186be..fe7a7996f553 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -67,7 +67,7 @@ int acpi_map_pxm_to_node(int pxm)
node = pxm_to_node_map[pxm];

if (node == NUMA_NO_NODE) {
- if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
+ if (nodes_weight_ge(nodes_found_map, MAX_NUMNODES))
return NUMA_NO_NODE;
node = first_unset_node(nodes_found_map);
__acpi_map_pxm_to_node(pxm, node);
--
2.30.2

2022-01-24 12:49:19

by Yury Norov

[permalink] [raw]
Subject: [PATCH 51/54] mm: replace nodes_weight with nodes_weight_eq in mempolicy

do_migrate_pages() calls nodes_weight() to compare the weight
of nodemask with a given number. We can do it more efficiently with
nodes_weight_eq() because conditional nodes_weight() may stop
traversing the nodemask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a86590b2507d..27817cf2f2a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1157,7 +1157,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
* [0-7] - > [3,4,5] moves only 0,1,2,6,7.
*/

- if ((nodes_weight(*from) != nodes_weight(*to)) &&
+ if (!nodes_weight_eq(*from, nodes_weight(*to)) &&
(node_isset(s, *to)))
continue;

--
2.30.2

2022-01-24 12:49:34

by Yury Norov

[permalink] [raw]
Subject: [PATCH 52/54] lib/nodemask: add num_node_state_eq()

Kernel code uses num_node_state() to compare number of nodes with a given
number. The underlying code calls bitmap_weight(), and we can do it more
efficiently with num_node_state_eq because conditional nodes_weight may
stop traversing the nodemask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/nodemask.h | 5 +++++
mm/page_alloc.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 197598e075e9..c5014dbf3cce 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -466,6 +466,11 @@ static inline int num_node_state(enum node_states state)
return nodes_weight(node_states[state]);
}

+static inline int num_node_state_eq(enum node_states state, int num)
+{
+ return nodes_weight_eq(node_states[state], num);
+}
+
#define for_each_node_state(__node, __state) \
for_each_node_mask((__node), node_states[__state])

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8dd6399bafb5..37496d764643 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8328,7 +8328,7 @@ void __init page_alloc_init(void)
int ret;

#ifdef CONFIG_NUMA
- if (num_node_state(N_MEMORY) == 1)
+ if (num_node_state_eq(N_MEMORY, 1))
hashdist = 0;
#endif

--
2.30.2

2022-01-24 12:49:39

by Yury Norov

[permalink] [raw]
Subject: [PATCH 53/54] tools/bitmap: sync bitmap_weight

Pull bitmap_weight_{cmp,eq,gt,ge,lt,le} from the mother kernel and
use them where applicable.

Signed-off-by: Yury Norov <[email protected]>
---
tools/include/linux/bitmap.h | 44 ++++++++++++++++++++++++++++++++++++
tools/lib/bitmap.c | 20 ++++++++++++++++
tools/perf/util/pmu.c | 2 +-
3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index ea97804d04d4..e8ae9a85d555 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -12,6 +12,8 @@
unsigned long name[BITS_TO_LONGS(bits)]

int __bitmap_weight(const unsigned long *bitmap, int bits);
+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits,
+ int num);
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits);
int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
@@ -68,6 +70,48 @@ static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
return __bitmap_weight(src, nbits);
}

+static __always_inline
+int bitmap_weight_cmp(const unsigned long *src, unsigned int nbits, int num)
+{
+ if (num > (int)nbits || num < 0)
+ return -num;
+
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) - num;
+
+ return __bitmap_weight_cmp(src, nbits, num);
+}
+
+static __always_inline
+bool bitmap_weight_eq(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) == 0;
+}
+
+static __always_inline
+bool bitmap_weight_gt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_ge(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_lt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) <= 0;
+}
+
+static __always_inline
+bool bitmap_weight_le(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) <= 0;
+}
+
static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index db466ef7be9d..06e58fee8523 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -18,6 +18,26 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
return w;
}

+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ goto out;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ goto out;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+out:
+ return w - num;
+}
+
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits)
{
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8dfbba15aeb8..2c26cdd7f9b0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1314,7 +1314,7 @@ static int pmu_config_term(const char *pmu_name,
*/
if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
if (term->no_value &&
- bitmap_weight(format->bits, PERF_PMU_FORMAT_BITS) > 1) {
+ bitmap_weight_gt(format->bits, PERF_PMU_FORMAT_BITS, 1)) {
if (err) {
parse_events_error__handle(err, term->err_val,
strdup("no value assigned for term"),
--
2.30.2

2022-01-24 12:49:48

by Yury Norov

[permalink] [raw]
Subject: [PATCH 54/54] MAINTAINERS: add cpumask and nodemask files to BITMAP_API

cpumask and nodemask APIs are thin wrappers around basic bitmap API, and
corresponding files are not formally maintained. This patch adds them to
the BITMAP_API section, so that bitmap folks would have a closer look at them.

Signed-off-by: Yury Norov <[email protected]>
---
MAINTAINERS | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 27730a5a6345..7a3798de61c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3412,10 +3412,14 @@ R: Andy Shevchenko <[email protected]>
R: Rasmus Villemoes <[email protected]>
S: Maintained
F: include/linux/bitmap.h
+F: include/linux/cpumask.h
F: include/linux/find.h
+F: include/linux/nodemask.h
F: lib/bitmap.c
+F: lib/cpumask.c
F: lib/find_bit.c
F: lib/find_bit_benchmark.c
+F: lib/nodemask.c
F: lib/test_bitmap.c
F: tools/include/linux/bitmap.h
F: tools/include/linux/find.h
--
2.30.2

2022-01-24 12:59:27

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH 18/54] drivers/infiniband: replace cpumask_weight with cpumask_empty where appropriate

On Sun, Jan 23, 2022 at 10:38:49AM -0800, Yury Norov wrote:
> drivers/infiniband/hw/hfi1/affinity.c code calls cpumask_weight() to check
> if any bit of a given cpumask is set. We can do it more efficiently with
> cpumask_empty() because cpumask_empty() stops traversing the cpumask as
> soon as it finds first set bit, while cpumask_weight() counts all bits
> unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/infiniband/hw/hfi1/affinity.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Except that title needs to be: "RDMA/hfi: ....", the change looks ok.

Thanks,
Reviewed-by: Leon Romanovsky <[email protected]>

2022-01-24 14:02:48

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq

On Sun, Jan 23, 2022 at 10:39:14AM -0800, Yury Norov wrote:
> init_vp_index() calls cpumask_weight() to compare the weights of cpumasks
> We can do it more efficiently with cpumask_weight_eq because conditional
> cpumask_weight may stop traversing the cpumask earlier (at least one), as
> soon as condition is met.
>
> Signed-off-by: Yury Norov <[email protected]>

Acked-by: Wei Liu <[email protected]>

2022-01-24 14:02:51

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq



> -----Original Message-----
> From: Yury Norov <[email protected]>
> Sent: Sunday, January 23, 2022 1:39 PM
> To: Yury Norov <[email protected]>; Andy Shevchenko <[email protected]>;
> Rasmus Villemoes <[email protected]>; Andrew Morton <[email protected]>;
> Michał Mirosław <[email protected]>; Greg Kroah-Hartman <[email protected]>;
> Peter Zijlstra <[email protected]>; David Laight <[email protected]>; Joe Perches
> <[email protected]>; Dennis Zhou <[email protected]>; Emil Renner Berthing <[email protected]>;
> Nicholas Piggin <[email protected]>; Matti Vaittinen <[email protected]>;
> Alexey Klimov <[email protected]>; [email protected]; KY Srinivasan
> <[email protected]>; Haiyang Zhang <[email protected]>; Stephen Hemminger
> <[email protected]>; Wei Liu <[email protected]>; Dexuan Cui <[email protected]>;
> [email protected]
> Subject: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq
>
> init_vp_index() calls cpumask_weight() to compare the weights of cpumasks
> We can do it more efficiently with cpumask_weight_eq because conditional
> cpumask_weight may stop traversing the cpumask earlier (at least one), as
> soon as condition is met.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/hv/channel_mgmt.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 60375879612f..7420a5fd47b5 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
> alloced_mask = &hv_context.hv_numa_map[numa_node];
>
> - if (cpumask_weight(alloced_mask) ==
> - cpumask_weight(cpumask_of_node(numa_node))) {
> + if (cpumask_weight_eq(alloced_mask,
> + cpumask_weight(cpumask_of_node(numa_node)))) {
> /*
> * We have cycled through all the CPUs in the node;
> * reset the alloced map.

Thanks.

Reviewed-by: Haiyang Zhang <[email protected]>

2022-01-24 15:03:37

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 22/54] rcu: replace cpumask_weight with cpumask_empty where appropriate

On Sun, Jan 23, 2022 at 10:38:53AM -0800, Yury Norov wrote:
> In some places, RCU code calls cpumask_weight() to check if any bit of a
> given cpumask is set. We can do it more efficiently with cpumask_empty()
> because cpumask_empty() stops traversing the cpumask as soon as it finds
> first set bit, while cpumask_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>

Good point! Queued and pushed, thank you!

Thanx, Paul

> ---
> kernel/rcu/tree_nocb.h | 4 ++--
> kernel/rcu/tree_plugin.h | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index eeafb546a7a0..f83c7b1d6110 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1169,7 +1169,7 @@ void __init rcu_init_nohz(void)
> struct rcu_data *rdp;
>
> #if defined(CONFIG_NO_HZ_FULL)
> - if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
> + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
> need_rcu_nocb_mask = true;
> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>
> @@ -1348,7 +1348,7 @@ static void __init rcu_organize_nocb_kthreads(void)
> */
> void rcu_bind_current_to_nocb(void)
> {
> - if (cpumask_available(rcu_nocb_mask) && cpumask_weight(rcu_nocb_mask))
> + if (cpumask_available(rcu_nocb_mask) && !cpumask_empty(rcu_nocb_mask))
> WARN_ON(sched_setaffinity(current->pid, rcu_nocb_mask));
> }
> EXPORT_SYMBOL_GPL(rcu_bind_current_to_nocb);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c5b45c2f68a1..0dc0c8d6717c 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1215,7 +1215,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> cpu != outgoingcpu)
> cpumask_set_cpu(cpu, cm);
> cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_RCU));
> - if (cpumask_weight(cm) == 0)
> + if (cpumask_empty(cm))
> cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_RCU));
> set_cpus_allowed_ptr(t, cm);
> free_cpumask_var(cm);
> --
> 2.30.2
>

2022-01-24 16:01:56

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 01/54] net/dsa: don't use bitmap_weight() in b53_arl_read()



On 1/23/2022 10:38 AM, Yury Norov wrote:
> Don't call bitmap_weight() if the following code can get by
> without it.
>
> Signed-off-by: Yury Norov <[email protected]>

Acked-by: Florian Fainelli <[email protected]>
--
Florian

2022-01-24 16:02:21

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 19/54] drivers/irqchip: replace cpumask_weight with cpumask_empty where appropriate



On 1/23/2022 10:38 AM, Yury Norov wrote:
> bcm6345_l1_of_init() calls cpumask_weight() to check if any bit of a given
> cpumask is set. We can do it more efficiently with cpumask_empty() because
> cpumask_empty() stops traversing the cpumask as soon as it finds first set
> bit, while cpumask_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>

Acked-by: Florian Fainelli <[email protected]>
--
Florian

2022-01-24 16:02:48

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 02/54] net/ethernet: don't use bitmap_weight() in bcm_sysport_rule_set()



On 1/23/2022 10:38 AM, Yury Norov wrote:
> Don't call bitmap_weight() if the following code can get by
> without it.
>
> Signed-off-by: Yury Norov <[email protected]>

Acked-by: Florian Fainelli <[email protected]>
--
Florian

2022-01-24 18:50:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 47/54] sched: replace cpumask_weight with cpumask_weight_eq where appropriate

On Sun, Jan 23, 2022 at 10:39:18AM -0800, Yury Norov wrote:
> kernel/sched code uses cpumask_weight() to compare the weight of
> cpumask with a given number. We can do it more efficiently with
> cpumask_weight_eq because conditional cpumask_weight may stop
> traversing the cpumask earlier, as soon as condition is met.
>

Same as with the other patch, you're just making the code more difficult
to read for no reason.

2022-01-24 18:50:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 41/54] arch/x86: replace cpumask_weight with cpumask_weight_eq where appropriate

On Sun, Jan 23, 2022 at 10:39:12AM -0800, Yury Norov wrote:
> smpboot code in somw places calls cpumask_weight() to compare the weight
> of cpumask with a given number. We can do it more efficiently with
> cpumask_weight_eq() because conditional cpumask_weight may stop traversing
> the cpumask earlier, as soon as condition is met.

Why use a more complicated API for code that has no performance
requirements?

From where I'm sitting this is a net negative for making the code harder
to read.

2022-01-24 18:50:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 48/54] kernel/time: replace cpumask_weight with cpumask_weight_eq where appropriate

On Sun, Jan 23, 2022 at 10:39:19AM -0800, Yury Norov wrote:
> tick_cleanup_dead_cpu() calls cpumask_weight() to compare the weight
> of cpumask with a given number. We can do it more efficiently with
> cpumask_weight_eq() because conditional cpumask_weight may stop
> traversing the cpumask earlier, as soon as condition is met.

But again, nobody gives a crap about performance here..

> Signed-off-by: Yury Norov <[email protected]>
> ---
> kernel/time/clockevents.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 003ccf338d20..32d6629a55b2 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -648,7 +648,7 @@ void tick_cleanup_dead_cpu(int cpu)
> */
> list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
> if (cpumask_test_cpu(cpu, dev->cpumask) &&
> - cpumask_weight(dev->cpumask) == 1 &&
> + cpumask_weight_eq(dev->cpumask, 1) &&
> !tick_is_broadcast_device(dev)) {
> BUG_ON(!clockevent_state_detached(dev));
> list_del(&dev->list);
> --
> 2.30.2
>

2022-01-24 18:56:14

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH 06/54] x86/kvm: replace bitmap_weight with bitmap_empty where appropriate

Yury Norov <[email protected]> writes:

> In some places kvm/hyperv.c code calls bitmap_weight() to check if any bit
> of a given bitmap is set. It's better to use bitmap_empty() in that case
> because bitmap_empty() stops traversing the bitmap as soon as it finds
> first set bit, while bitmap_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/x86/kvm/hyperv.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 6e38a7d22e97..2c3400dea4b3 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -90,7 +90,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
> {
> struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
> struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
> - int auto_eoi_old, auto_eoi_new;
> + bool auto_eoi_old, auto_eoi_new;
>
> if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
> return;
> @@ -100,16 +100,16 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
> else
> __clear_bit(vector, synic->vec_bitmap);
>
> - auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
> + auto_eoi_old = bitmap_empty(synic->auto_eoi_bitmap, 256);

I would've preferred this written as

auto_eoi_old = !bitmap_empty(synic->auto_eoi_bitmap, 256);

so the variable would indicate wether AutoEOI was previosly enabled, not
disabled.

>
> if (synic_has_vector_auto_eoi(synic, vector))
> __set_bit(vector, synic->auto_eoi_bitmap);
> else
> __clear_bit(vector, synic->auto_eoi_bitmap);
>
> - auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
> + auto_eoi_new = bitmap_empty(synic->auto_eoi_bitmap, 256);

Same here, of course. "auto_eoi_new = true" sounds like "AutoEOI is now
enabled".

>
> - if (!!auto_eoi_old == !!auto_eoi_new)
> + if (auto_eoi_old == auto_eoi_new)
> return;
>
> down_write(&vcpu->kvm->arch.apicv_update_lock);

The change look good to me otherwise, feel free to add

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2022-01-24 18:56:54

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH 41/54] arch/x86: replace cpumask_weight with cpumask_weight_eq where appropriate

Yury Norov <[email protected]> writes:

> smpboot code in some places calls cpumask_weight() to compare the weight
> of cpumask with a given number. We can do it more efficiently with
> cpumask_weight_eq() because conditional cpumask_weight may stop traversing
> the cpumask earlier, as soon as condition is met.

I think this is misleading. cpumask_weight_eq() with any implementation
can only stop earlier if the condition is NOT met (when the number of
set bits is already higher than needed); to check for equality, all bits
always need to be examined.
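
To illustrate the point, a minimal sketch of such a helper (simplified, not
the series' actual implementation); note the early return only fires once the
running count exceeds @num:

static bool weight_eq_sketch(const unsigned long *bits, unsigned int nbits,
			     unsigned int num)
{
	unsigned int k, w = 0;

	for (k = 0; k < nbits / BITS_PER_LONG; k++) {
		w += hweight_long(bits[k]);
		if (w > num)
			return false;	/* early exit: already too many bits */
	}

	if (nbits % BITS_PER_LONG)
		w += hweight_long(bits[k] & BITMAP_LAST_WORD_MASK(nbits));

	return w == num;	/* proving equality needs the full scan */
}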

>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 617012f4619f..e851e9945eb5 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1608,7 +1608,7 @@ static void remove_siblinginfo(int cpu)
> /*
> * last thread sibling in this cpu core going down
> */
> - if (cpumask_weight(topology_sibling_cpumask(cpu)) == 1)
> + if (cpumask_weight_eq(topology_sibling_cpumask(cpu), 1))
> cpu_data(sibling).booted_cores--;
> }
>
> @@ -1617,7 +1617,7 @@ static void remove_siblinginfo(int cpu)
>
> for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
> cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
> - if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
> + if (cpumask_weight_eq(topology_sibling_cpumask(sibling), 1))
> cpu_data(sibling).smt_active = false;
> }

--
Vitaly

2022-01-24 18:57:31

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq

Yury Norov <[email protected]> writes:

> init_vp_index() calls cpumask_weight() to compare the weights of cpumasks
> We can do it more efficiently with cpumask_weight_eq because conditional
> cpumask_weight may stop traversing the cpumask earlier (at least one), as
> soon as condition is met.

Same comment as for "PATCH 41/54": cpumask_weight_eq() can only stop
earlier if the condition is not met; to prove the equality, all bits
always have to be examined.

>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/hv/channel_mgmt.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 60375879612f..7420a5fd47b5 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
> alloced_mask = &hv_context.hv_numa_map[numa_node];
>
> - if (cpumask_weight(alloced_mask) ==
> - cpumask_weight(cpumask_of_node(numa_node))) {
> + if (cpumask_weight_eq(alloced_mask,
> + cpumask_weight(cpumask_of_node(numa_node)))) {

This code is not performance critical and I prefer the old version:

cpumask_weight() == cpumask_weight()

looks better than

cpumask_weight_eq(..., cpumask_weight())

(let alone the inner cpumask_of_node()) to me.

> /*
> * We have cycled through all the CPUs in the node;
> * reset the alloced map.

--
Vitaly

2022-01-24 19:06:05

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH 16/54] cpufreq: replace cpumask_weight with cpumask_empty where appropriate

On Sun, Jan 23, 2022 at 10:38:47AM -0800, Yury Norov wrote:
> drivers/cpufreq calls cpumask_weight() to check if any bit of a given
> cpumask is set. We can do it more efficiently with cpumask_empty() because
> cpumask_empty() stops traversing the cpumask as soon as it finds first set
> bit, while cpumask_weight() counts all bits unconditionally.
>

Reviewed-by: Sudeep Holla <[email protected]> (for SCMI cpufreq driver)

--
Regards,
Sudeep

2022-01-24 19:12:43

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 10/54] net: ethernet: replace bitmap_weight with bitmap_empty for qlogic

On Sun, Jan 23, 2022 at 10:38:41AM -0800, Yury Norov wrote:
> qlogic/qed code calls bitmap_weight() to check if any bit of a given
> bitmap is set. It's better to use bitmap_empty() in that case because
> bitmap_empty() stops traversing the bitmap as soon as it finds first
> set bit, while bitmap_weight() counts all bits unconditionally.

> - if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
> + if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))

> - (bitmap_weight((unsigned long *)&pmap[item],
> + (!bitmap_empty((unsigned long *)&pmap[item],

Side note, these castings remind me of a previous discussion, and I'm
wondering if you have this kind of potentially problematic place in your
TODO as a subject to fix.


--
With Best Regards,
Andy Shevchenko


2022-01-24 19:13:10

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

On Sun, Jan 23, 2022 at 10:38:58AM -0800, Yury Norov wrote:
> Many kernel users use bitmap_weight() to compare the result against
> some number or expression:
>
> if (bitmap_weight(...) > 1)
> do_something();
>
> It works OK, but may be significantly improved for large bitmaps: if
> first few words count set bits to a number greater than given, we can
> stop counting and immediately return.
>
> The same idea would work in other direction: if we know that the number
> of set bits that we counted so far is small enough, so that it would be
> smaller than required number even if all bits of the rest of the bitmap
> are set, we can stop counting earlier.
>
> This patch adds new bitmap_weight_cmp() as suggested by Michał Mirosław
> and a family of eq, gt, ge, lt and le wrappers to allow this optimization.

lt, and le

> The following patches apply new functions where appropriate.

What I missed in the above message is rough statistics: some of these are
used more often, some less, and some, perhaps, were just added for the sake
of symmetry (the latter is what would be important to see, i.e. whether
there are APIs which have no users at all).

> Suggested-by: "Michał Mirosław" <[email protected]> (for bitmap_weight_cmp)

Please, avoid using double quotes in the tags.

While at it, as a few folks already noticed, keep the subject lines aligned
with the policies established in the respective subsystems (in this case it
seems 'bitmap:' should suffice). I would recommend running

`git log --oneline --no-merges -- ...file(s)_in_question...`

to figure out what is the most used and best fit in each case individually.

...

> + * Returns zero if weight of @src is equal to @num;
> + * negative number if weight of @src is less than @num;
> + * positive number if weight of @src is greater than @num;

> + * NOTES
> + *
> + * Because number of set bits cannot decrease while counting, when user
> + * wants to know if the number of set bits in the bitmap is less than
> + * @num, calling
> + * bitmap_weight_cmp(..., @num) < 0
> + * is potentially less effective than
> + * bitmap_weight_cmp(..., @num - 1) <= 0
> + *
> + * Consider an example:
> + * bitmap_weight_cmp(1000 0000 0000 0000, 1) < 0
> + * ^
> + * stop here
> + *
> + * bitmap_weight_cmp(1000 0000 0000 0000, 0) <= 0
> + * ^
> + * stop here

This should probably precede the Returns paragraph; also, that paragraph can
be converted to a section in the documentation as follows:

*
* Returns:
* ...
*

...

> + if (num > (int)nbits || num < 0)

Wonder if

if (abs(num) > nbits)

would be sufficient.

> + return -num;

--
With Best Regards,
Andy Shevchenko


2022-01-24 19:13:15

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

On Mon, Jan 24, 2022 at 02:41:38PM +0200, Andy Shevchenko wrote:
> On Sun, Jan 23, 2022 at 10:38:58AM -0800, Yury Norov wrote:

...

> > + if (num > (int)nbits || num < 0)
>
> Wonder if
>
> if (abs(num) > nbits)
>
> would be sufficient.

Scratch it. Of course it won't work.

It may be the other way around:

if ((unsigned int)num > nbits)

> > + return -num;

--
With Best Regards,
Andy Shevchenko


2022-01-24 19:13:34

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 29/54] drivers/iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where appropriate

On Sun, Jan 23, 2022 at 10:39:00AM -0800, Yury Norov wrote:
> drivers/iio calls bitmap_weight() to compare the weight of bitmap with
> a given number. We can do it more efficiently with bitmap_weight_{eq, gt}
> because conditional bitmap_weight may stop traversing the bitmap earlier,
> as soon as condition is met.

...

> int i, j;
>
> for (i = 0, j = 0;
> - i < bitmap_weight(indio_dev->active_scan_mask,
> - indio_dev->masklength);
> + bitmap_weight_gt(indio_dev->active_scan_mask,
> + indio_dev->masklength, i);
> i++, j++) {
> j = find_next_bit(indio_dev->active_scan_mask,
> indio_dev->masklength, j);

This smells like room for improvement. Have you checked this deeply?
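
For reference, the whole construct could presumably be collapsed into a single
walk over the set bits. A rough, untested sketch of that shape, reusing the
variable names from the quoted code:

	i = 0;
	for_each_set_bit(j, indio_dev->active_scan_mask, indio_dev->masklength) {
		/* ... original loop body, with i as the running index ... */
		i++;
	}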

--
With Best Regards,
Andy Shevchenko


2022-01-24 19:13:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

On Mon, Jan 24, 2022 at 02:43:30PM +0200, Andy Shevchenko wrote:
> It may be the other way around:
>
> if ((unsigned int)num > nbits)

Yes, that's my preferred method too :-)
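
For the record, a negative num converts to a huge unsigned value, so a single
comparison covers both bounds. A tiny illustration (the helper name is made
up, for illustration only):

static inline bool weight_num_in_range(int num, unsigned int nbits)
{
	/*
	 * A negative num becomes a huge unsigned value, so this one
	 * comparison rejects both num < 0 and num > nbits.
	 */
	return (unsigned int)num <= nbits;
}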

2022-01-24 19:14:06

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 33/54] net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} for mellanox

On Sun, Jan 23, 2022 at 10:39:04AM -0800, Yury Norov wrote:
> Mellanox code uses bitmap_weight() to compare the weight of bitmap with
> a given number. We can do it more efficiently with bitmap_weight_{eq, ...}
> because conditional bitmap_weight may stop traversing the bitmap earlier,
> as soon as condition is met.

> - if (port <= 0 || port > m)
> + if (port <= 0 || bitmap_weight_lt(actv_ports.ports, dev->caps.num_ports, port))
> return -EINVAL;

Can we eliminate now the port <= 0 check? Or at least make it port == 0?

--
With Best Regards,
Andy Shevchenko


2022-01-25 14:25:46

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH 17/54] gpu: drm: replace cpumask_weight with cpumask_empty where appropriate


On 23/01/2022 18:38, Yury Norov wrote:
> i915_pmu_cpu_online() calls cpumask_weight() to check if any bit of a
> given cpumask is set. We can do it more efficiently with cpumask_empty()
> because cpumask_empty() stops traversing the cpumask as soon as it finds
> first set bit, while cpumask_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/gpu/drm/i915/i915_pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index ea655161793e..1894c876b31d 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> GEM_BUG_ON(!pmu->base.event_init);
>
> /* Select the first online CPU as a designated reader. */
> - if (!cpumask_weight(&i915_pmu_cpumask))
> + if (cpumask_empty(&i915_pmu_cpumask))
> cpumask_set_cpu(cpu, &i915_pmu_cpumask);
>
> return 0;
>

Reviewed-by: Tvrtko Ursulin <[email protected]>

I see it's a large series which only partially appeared on our mailing
lists. So for instance it hasn't got tested by our automated CI. (Not
that I expect any problems in this patch.)

What are the plans in terms of which tree will it get merged through?

Regards,

Tvrtko

2022-01-26 03:26:26

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 17/54] gpu: drm: replace cpumask_weight with cpumask_empty where appropriate

On Tue, Jan 25, 2022 at 1:28 AM Tvrtko Ursulin
<[email protected]> wrote:
>
>
> On 23/01/2022 18:38, Yury Norov wrote:
> > i915_pmu_cpu_online() calls cpumask_weight() to check if any bit of a
> > given cpumask is set. We can do it more efficiently with cpumask_empty()
> > because cpumask_empty() stops traversing the cpumask as soon as it finds
> > first set bit, while cpumask_weight() counts all bits unconditionally.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > drivers/gpu/drm/i915/i915_pmu.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index ea655161793e..1894c876b31d 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> > GEM_BUG_ON(!pmu->base.event_init);
> >
> > /* Select the first online CPU as a designated reader. */
> > - if (!cpumask_weight(&i915_pmu_cpumask))
> > + if (cpumask_empty(&i915_pmu_cpumask))
> > cpumask_set_cpu(cpu, &i915_pmu_cpumask);
> >
> > return 0;
> >
>
> Reviewed-by: Tvrtko Ursulin <[email protected]>
>
> I see it's a large series which only partially appeared on our mailing
> lists.

The series is here: https://lkml.org/lkml/2022/1/23/223
The branch: https://github.com/norov/linux/tree/bitmap-20220123

> So for instance it hasn't got tested by our automated CI. (Not
> that I expect any problems in this patch.)

Would be great if you give a test for the whole series, thanks!

> What are the plans in terms of which tree will it get merged through?

For the patches that will not be merged by maintainers of corresponding
subsystems, I'll use my bitmap branch and send it to linux-next.

Thanks,
Yury

2022-01-26 09:54:16

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 10/54] net: ethernet: replace bitmap_weight with bitmap_empty for qlogic

On Mon, Jan 24, 2022 at 4:29 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Sun, Jan 23, 2022 at 10:38:41AM -0800, Yury Norov wrote:
> > qlogic/qed code calls bitmap_weight() to check if any bit of a given
> > bitmap is set. It's better to use bitmap_empty() in that case because
> > bitmap_empty() stops traversing the bitmap as soon as it finds first
> > set bit, while bitmap_weight() counts all bits unconditionally.
>
> > - if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
> > + if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
>
> > - (bitmap_weight((unsigned long *)&pmap[item],
> > + (!bitmap_empty((unsigned long *)&pmap[item],
>
> Side note, these castings reminds me previous discussion and I'm wondering
> if you have this kind of potentially problematic places in your TODO as
> subject to fix.

In the discussion you mentioned above, the u32* was cast to u64*, which is
wrong. The code here is safe because in the worst case it casts u64* to u32*.
This would be OK wrt -Werror=array-bounds.

The function itself looks like it does these unsigned long <-> u64
conversions just for printing purposes. I'm not a qlogic expert, so let's
wait and see what people say.

The printing part may be refactored, though, to use the %pb format,
similarly to the snippet below (not tested).

Thanks,
Yury

diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 23b668de4640..72505517ced1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -336,17 +336,8 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print aligned non-zero lines, if any */
for (item = 0, line = 0; line < last_line; line++, item += 8)
- if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
- DP_NOTICE(p_hwfn,
- "line 0x%04x: 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
- line,
- pmap[item],
- pmap[item + 1],
- pmap[item + 2],
- pmap[item + 3],
- pmap[item + 4],
- pmap[item + 5],
- pmap[item + 6], pmap[item + 7]);
+ if (bitmap_weight(bmap->bitmap, 64 * 8))
+ DP_NOTICE(p_hwfn, "line 0x%04x: %512pb\n", line, bmap->bitmap);

/* print last unaligned non-zero line, if any */
if ((bmap->max_count % (64 * 8)) &&

2022-01-26 10:54:51

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 10/54] net: ethernet: replace bitmap_weight with bitmap_empty for qlogic

From: Yury Norov
> Sent: 25 January 2022 21:10
> On Mon, Jan 24, 2022 at 4:29 AM Andy Shevchenko
> <[email protected]> wrote:
> >
> > On Sun, Jan 23, 2022 at 10:38:41AM -0800, Yury Norov wrote:
> > > qlogic/qed code calls bitmap_weight() to check if any bit of a given
> > > bitmap is set. It's better to use bitmap_empty() in that case because
> > > bitmap_empty() stops traversing the bitmap as soon as it finds first
> > > set bit, while bitmap_weight() counts all bits unconditionally.
> >
> > > - if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
> > > + if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
> >
> > > - (bitmap_weight((unsigned long *)&pmap[item],
> > > + (!bitmap_empty((unsigned long *)&pmap[item],
> >
> > Side note, these castings reminds me previous discussion and I'm wondering
> > if you have this kind of potentially problematic places in your TODO as
> > subject to fix.
>
> In the discussion you mentioned above, the u32* was cast to u64*,
> which is wrong. The code
> here is safe because in the worst case, it casts u64* to u32*. This
> would be OK wrt
> -Werror=array-bounds.
>
> The function itself looks like doing this unsigned long <-> u64
> conversions just for printing
> purpose. I'm not a qlogic expert, so let's wait what people say?

It'll be wrong on BE systems.
You just can't cast the argument; it has to be long[].

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-01-26 12:37:43

by Thomas Bogendoerfer

[permalink] [raw]
Subject: Re: [PATCH 38/54] arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate

On Sun, Jan 23, 2022 at 10:39:09AM -0800, Yury Norov wrote:
> Mips code uses calls cpumask_weight() to compare the weight of
> cpumask with a given number. We can do it more efficiently with
> cpumask_weight_{eq, ...} because conditional cpumask_weight may stop
> traversing the cpumask earlier, as soon as condition is met.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/mips/cavium-octeon/octeon-irq.c | 4 ++--
> arch/mips/kernel/crash.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
> index 844f882096e6..914871f15fb7 100644
> --- a/arch/mips/cavium-octeon/octeon-irq.c
> +++ b/arch/mips/cavium-octeon/octeon-irq.c
> @@ -763,7 +763,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
> if (!cpumask_test_cpu(cpu, mask))
> return;
>
> - if (cpumask_weight(mask) > 1) {
> + if (cpumask_weight_gt(mask, 1)) {
> /*
> * It has multi CPU affinity, just remove this CPU
> * from the affinity set.
> @@ -795,7 +795,7 @@ static int octeon_irq_ciu_set_affinity(struct irq_data *data,
> * This removes the need to do locking in the .ack/.eoi
> * functions.
> */
> - if (cpumask_weight(dest) != 1)
> + if (!cpumask_weight_eq(dest, 1))
> return -EINVAL;
>
> if (!enable_one)
> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> index 81845ba04835..5b690d52491f 100644
> --- a/arch/mips/kernel/crash.c
> +++ b/arch/mips/kernel/crash.c
> @@ -72,7 +72,7 @@ static void crash_kexec_prepare_cpus(void)
> */
> pr_emerg("Sending IPI to other cpus...\n");
> msecs = 10000;
> - while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
> + while (cpumask_weight_lt(&cpus_in_crash, ncpus) && (--msecs > 0)) {
> cpu_relax();
> mdelay(1);
> }
> --
> 2.30.2

Acked-by: Thomas Bogendoerfer <[email protected]>

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2022-01-26 13:01:31

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 10/54] net: ethernet: replace bitmap_weight with bitmap_empty for qlogic

On Tue, Jan 25, 2022 at 2:15 PM David Laight <[email protected]> wrote:
>
> From: Yury Norov
> > Sent: 25 January 2022 21:10
> > On Mon, Jan 24, 2022 at 4:29 AM Andy Shevchenko
> > <[email protected]> wrote:
> > >
> > > On Sun, Jan 23, 2022 at 10:38:41AM -0800, Yury Norov wrote:
> > > > qlogic/qed code calls bitmap_weight() to check if any bit of a given
> > > > bitmap is set. It's better to use bitmap_empty() in that case because
> > > > bitmap_empty() stops traversing the bitmap as soon as it finds first
> > > > set bit, while bitmap_weight() counts all bits unconditionally.
> > >
> > > > - if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
> > > > + if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
> > >
> > > > - (bitmap_weight((unsigned long *)&pmap[item],
> > > > + (!bitmap_empty((unsigned long *)&pmap[item],
> > >
> > > Side note, these castings reminds me previous discussion and I'm wondering
> > > if you have this kind of potentially problematic places in your TODO as
> > > subject to fix.
> >
> > In the discussion you mentioned above, the u32* was cast to u64*,
> > which is wrong. The code
> > here is safe because in the worst case, it casts u64* to u32*. This
> > would be OK wrt
> > -Werror=array-bounds.
> >
> > The function itself looks like doing this unsigned long <-> u64
> > conversions just for printing
> > purpose. I'm not a qlogic expert, so let's wait what people say?
>
> It'll be wrong on BE systems.

The bitmap_weight() result will be correct. As you can see, the address
is 64-bit aligned anyway. The array boundary violation will never happen
either.

DP_NOTICE() may or may not be wrong. It depends on how important the
absolute position of the bit in the printed bitmap is. Nevertheless,
printk("%pb") is better and should be used.

This whole concern may be simply irrelevant if QED is not supported
on 32-bit BE machines. From what I can see, at least Infiniband requires
64BIT.

Thanks,
Yury

> You just can't cast the argument; it has to be long[].
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2022-01-26 20:22:58

by Matti Vaittinen

[permalink] [raw]
Subject: Re: [PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage

On 1/23/22 20:38, Yury Norov wrote:
> In many cases people use bitmap_weight()-based functions to compare
> the result against a number of expression:
>
> if (cpumask_weight(mask) > 1)
> do_something();
>
> This may take considerable amount of time on many-cpus machines because
> cpumask_weight() will traverse every word of underlying cpumask
> unconditionally.
>
> We can significantly improve on it for many real cases if stop traversing
> the mask as soon as we count cpus to any number greater than 1:
>
> if (cpumask_weight_gt(mask, 1))
> do_something();

I guess I am part of the recipient list because I did the original
suggestion of adding the single_bit_set()?

If this is the case - well, I do like this series. Overall it looks good
to me - but I for sure did not go through all the changes in detail ;)
If there is some other reason to loop me in (Eg, if someone expects me
to take a more specific look on something) - please give me a nudge.

Best Regards
-- Matti Vaittinen


--
The Linux Kernel guy at ROHM Semiconductors

Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~ this year is the year of a signature writers block ~~

2022-01-26 20:37:34

by Tariq Toukan

[permalink] [raw]
Subject: Re: [PATCH 04/54] net: mellanox: fix open-coded for_each_set_bit()



On 1/23/2022 8:38 PM, Yury Norov wrote:
> Mellanox driver has an open-coded for_each_set_bit(). Fix it.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx4/cmd.c | 23 ++++++-----------------
> 1 file changed, 6 insertions(+), 17 deletions(-)
>

Reviewed-by: Tariq Toukan <[email protected]>

Thanks,
Tariq

2022-01-26 20:51:15

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH 17/54] gpu: drm: replace cpumask_weight with cpumask_empty where appropriate


On 25/01/2022 18:16, Yury Norov wrote:
> On Tue, Jan 25, 2022 at 1:28 AM Tvrtko Ursulin
> <[email protected]> wrote:
>>
>>
>> On 23/01/2022 18:38, Yury Norov wrote:
>>> i915_pmu_cpu_online() calls cpumask_weight() to check if any bit of a
>>> given cpumask is set. We can do it more efficiently with cpumask_empty()
>>> because cpumask_empty() stops traversing the cpumask as soon as it finds
>>> first set bit, while cpumask_weight() counts all bits unconditionally.
>>>
>>> Signed-off-by: Yury Norov <[email protected]>
>>> ---
>>> drivers/gpu/drm/i915/i915_pmu.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>> index ea655161793e..1894c876b31d 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>>> GEM_BUG_ON(!pmu->base.event_init);
>>>
>>> /* Select the first online CPU as a designated reader. */
>>> - if (!cpumask_weight(&i915_pmu_cpumask))
>>> + if (cpumask_empty(&i915_pmu_cpumask))
>>> cpumask_set_cpu(cpu, &i915_pmu_cpumask);
>>>
>>> return 0;
>>>
>>
>> Reviewed-by: Tvrtko Ursulin <[email protected]>
>>
>> I see it's a large series which only partially appeared on our mailing
>> lists.
>
> The series is here: https://lkml.org/lkml/2022/1/23/223
> The branch: https://github.com/norov/linux/tree/bitmap-20220123
>
>> So for instance it hasn't got tested by our automated CI. (Not
>> that I expect any problems in this patch.)
>
> Would be great if you give a test for the whole series, thanks!

Can't really test the whole series for you, but if you want to send just
the i915 patch standalone to the intel-gfx mailing list, that would
trigger the CI run and if that passes we can merge that single one.

>> What are the plans in terms of which tree will it get merged through?
>
> For the patches that will not be merged by maintainers of corresponding
> subsystems, I'll use my bitmap branch and send it to linux-next.

Or I guess we can wait for them to trickle back to us this way.

Regards,

Tvrtko

2022-01-26 22:10:57

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

On Mon, Jan 24, 2022 at 4:42 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Sun, Jan 23, 2022 at 10:38:58AM -0800, Yury Norov wrote:
> > Many kernel users use bitmap_weight() to compare the result against
> > some number or expression:
> >
> > if (bitmap_weight(...) > 1)
> > do_something();
> >
> > It works OK, but may be significantly improved for large bitmaps: if
> > first few words count set bits to a number greater than given, we can
> > stop counting and immediately return.
> >
> > The same idea would work in other direction: if we know that the number
> > of set bits that we counted so far is small enough, so that it would be
> > smaller than required number even if all bits of the rest of the bitmap
> > are set, we can stop counting earlier.
> >
> > This patch adds new bitmap_weight_cmp() as suggested by Michał Mirosław
> > and a family of eq, gt, ge, lt and le wrappers to allow this optimization.
>
> lt, and le
>
> > The following patches apply new functions where appropriate.
>
> What I missed in the above message is rough statistics: some of these are
> used more often, some less, and some, perhaps, were just added for the sake
> of symmetry (the latter is what would be important to see, i.e. whether
> there are APIs which have no users at all).

These are my grep numbers. Some lines are declarations and comments, so
subtract 6 or 8 from each number, but all new functions have actual users.

$ git grep weight_eq|wc -l
35
$ git grep weight_gt|wc -l
20
$ git grep weight_ge|wc -l
25
$ git grep weight_lt|wc -l
14
$ git grep weight_le|wc -l
18

2022-01-28 07:11:04

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq

From: Vitaly Kuznetsov <[email protected]> Sent: Monday, January 24, 2022 1:20 AM
>
> Yury Norov <[email protected]> writes:
>
> > init_vp_index() calls cpumask_weight() to compare the weights of cpumasks
> > We can do it more efficiently with cpumask_weight_eq because conditional
> > cpumask_weight may stop traversing the cpumask earlier (at least one), as
> > soon as condition is met.
>
> Same comment as for "PATCH 41/54": cpumask_weight_eq() can only stop
> earlier if the condition is not met; to prove the equality, all bits
> always have to be examined.
>
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > drivers/hv/channel_mgmt.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 60375879612f..7420a5fd47b5 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
> > }
> > alloced_mask = &hv_context.hv_numa_map[numa_node];
> >
> > - if (cpumask_weight(alloced_mask) ==
> > - cpumask_weight(cpumask_of_node(numa_node))) {
> > + if (cpumask_weight_eq(alloced_mask,
> > + cpumask_weight(cpumask_of_node(numa_node)))) {
>
> This code is not performance critical and I prefer the old version:
>
> cpumask_weight() == cpumask_weight()
>
> looks better than
>
> cpumask_weight_eq(..., cpumask_weight())
>
> (let alone the inner cpumask_of_node()) to me.
>
> > /*
> > * We have cycled through all the CPUs in the node;
> > * reset the alloced map.
>
> --
> Vitaly

I agree with Vitaly in preferring the old version, and indeed performance
here is a shrug. But actually, I think the old version is a poorly coded way
to determine if the two cpumasks are equal. The following would correctly
capture the intent:

if (cpumask_equal(alloced_mask, cpumask_of_node(numa_node)))

Michael



2022-01-28 11:26:56

by Steve Wahl

[permalink] [raw]
Subject: Re: [PATCH 15/54] arch/x86: replace cpumask_weight with cpumask_empty where appropriate

Reviewed-by: Steve Wahl <[email protected]>

On Sun, Jan 23, 2022 at 10:38:46AM -0800, Yury Norov wrote:
> In some cases, arch/x86 code calls cpumask_weight() to check if any bit of
> a given cpumask is set. We can do it more efficiently with cpumask_empty()
> because cpumask_empty() stops traversing the cpumask as soon as it finds
> first set bit, while cpumask_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 +++++++-------
> arch/x86/mm/mmio-mod.c | 2 +-
> arch/x86/platform/uv/uv_nmi.c | 2 +-
> 3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b57b3db9a6a7..e23ff03290b8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -341,14 +341,14 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
>
> /* Check whether cpus belong to parent ctrl group */
> cpumask_andnot(tmpmask, newmask, &prgrp->cpu_mask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> rdt_last_cmd_puts("Can only add CPUs to mongroup that belong to parent\n");
> return -EINVAL;
> }
>
> /* Check whether cpus are dropped from this group */
> cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> /* Give any dropped cpus to parent rdtgroup */
> cpumask_or(&prgrp->cpu_mask, &prgrp->cpu_mask, tmpmask);
> update_closid_rmid(tmpmask, prgrp);
> @@ -359,7 +359,7 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
> * and update per-cpu rmid
> */
> cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> head = &prgrp->mon.crdtgrp_list;
> list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
> if (crgrp == rdtgrp)
> @@ -394,7 +394,7 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
>
> /* Check whether cpus are dropped from this group */
> cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> /* Can't drop from default group */
> if (rdtgrp == &rdtgroup_default) {
> rdt_last_cmd_puts("Can't drop CPUs from default group\n");
> @@ -413,12 +413,12 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
> * and update per-cpu closid/rmid.
> */
> cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) {
> if (r == rdtgrp)
> continue;
> cpumask_and(tmpmask1, &r->cpu_mask, tmpmask);
> - if (cpumask_weight(tmpmask1))
> + if (!cpumask_empty(tmpmask1))
> cpumask_rdtgrp_clear(r, tmpmask1);
> }
> update_closid_rmid(tmpmask, rdtgrp);
> @@ -488,7 +488,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
>
> /* check that user didn't specify any offline cpus */
> cpumask_andnot(tmpmask, newmask, cpu_online_mask);
> - if (cpumask_weight(tmpmask)) {
> + if (!cpumask_empty(tmpmask)) {
> ret = -EINVAL;
> rdt_last_cmd_puts("Can only assign online CPUs\n");
> goto unlock;
> diff --git a/arch/x86/mm/mmio-mod.c b/arch/x86/mm/mmio-mod.c
> index 933a2ebad471..c3317f0650d8 100644
> --- a/arch/x86/mm/mmio-mod.c
> +++ b/arch/x86/mm/mmio-mod.c
> @@ -400,7 +400,7 @@ static void leave_uniprocessor(void)
> int cpu;
> int err;
>
> - if (!cpumask_available(downed_cpus) || cpumask_weight(downed_cpus) == 0)
> + if (!cpumask_available(downed_cpus) || cpumask_empty(downed_cpus))
> return;
> pr_notice("Re-enabling CPUs...\n");
> for_each_cpu(cpu, downed_cpus) {
> diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
> index 1e9ff28bc2e0..ea277fc08357 100644
> --- a/arch/x86/platform/uv/uv_nmi.c
> +++ b/arch/x86/platform/uv/uv_nmi.c
> @@ -985,7 +985,7 @@ static int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
>
> /* Clear global flags */
> if (master) {
> - if (cpumask_weight(uv_nmi_cpu_mask))
> + if (!cpumask_empty(uv_nmi_cpu_mask))
> uv_nmi_cleanup_mask();
> atomic_set(&uv_nmi_cpus_in_nmi, -1);
> atomic_set(&uv_nmi_cpu, -1);
> --
> 2.30.2
>

--
Steve Wahl, Hewlett Packard Enterprise

2022-01-28 12:28:39

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 00/54] lib/bitmap: optimize bitmap_weight() usage

On Tue, Jan 25, 2022 at 11:30 PM Vaittinen, Matti
<[email protected]> wrote:
>
> On 1/23/22 20:38, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions to compare
> > the result against a number of expression:
> >
> > if (cpumask_weight(mask) > 1)
> > do_something();
> >
> > This may take considerable amount of time on many-cpus machines because
> > cpumask_weight() will traverse every word of underlying cpumask
> > unconditionally.
> >
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count cpus to any number greater than 1:
> >
> > if (cpumask_weight_gt(mask, 1))
> > do_something();
>
> I guess I am part of the recipient list because I did the original
> suggestion of adding the single_bit_set()?

Yes, because of single_bit_set()

> If this is the case - well, I do like this series. Overall it looks good
> to me - but I for sure did not go through all the changes in detail ;)
> If there is some other reason to loop me in (Eg, if someone expects me
> to take a more specific look on something) - please give me a nudge.

The key patch of the series is #27: "lib/bitmap: add bitmap_weight_{cmp, eq,
gt, ge, lt, le} functions"
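
Roughly, the comparison wrappers just layer on top of bitmap_weight_cmp().
A simplified sketch (the patch itself has the authoritative definitions):

/* weight == num */
static inline bool bitmap_weight_eq(const unsigned long *src,
				    unsigned int nbits, int num)
{
	return bitmap_weight_cmp(src, nbits, num) == 0;
}

/* weight > num */
static inline bool bitmap_weight_gt(const unsigned long *src,
				    unsigned int nbits, int num)
{
	return bitmap_weight_cmp(src, nbits, num) > 0;
}

/* weight < num; comparing against num - 1 lets the scan stop earlier */
static inline bool bitmap_weight_lt(const unsigned long *src,
				    unsigned int nbits, int num)
{
	return bitmap_weight_cmp(src, nbits, num - 1) <= 0;
}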

Feel free to add suggested/reviewed (or whatever you find appropriate) tags
if you want.

Thanks,
Yury

2022-01-30 03:40:47

by Matti Vaittinen

[permalink] [raw]
Subject: Re: [PATCH 27/54] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

On 1/23/22 20:38, Yury Norov wrote:
> Many kernel users use bitmap_weight() to compare the result against
> some number or expression:
>
> if (bitmap_weight(...) > 1)
> do_something();
>
> It works OK, but may be significantly improved for large bitmaps: if
> first few words count set bits to a number greater than given, we can
> stop counting and immediately return.
>
> The same idea would work in other direction: if we know that the number
> of set bits that we counted so far is small enough, so that it would be
> smaller than required number even if all bits of the rest of the bitmap
> are set, we can stop counting earlier.
>
> This patch adds new bitmap_weight_cmp() as suggested by Michał Mirosław
> and a family of eq, gt, ge, lt and le wrappers to allow this optimization.
> The following patches apply new functions where appropriate.
>

Thanks for pushing this improvement, Yury. Seeing how much this has
evolved from the single_bit_set() suggestion, it'd be a bit thick of
me to add a Suggested-by ;) I did review it though, and it looks good to me!

Reviewed-by: Matti Vaittinen <[email protected]>

> Suggested-by: "Michał Mirosław" <[email protected]> (for bitmap_weight_cmp)
> Signed-off-by: Yury Norov <[email protected]>
> ---
> include/linux/bitmap.h | 80 ++++++++++++++++++++++++++++++++++++++++++
> lib/bitmap.c | 21 +++++++++++
> 2 files changed, 101 insertions(+)


--
The Linux Kernel guy at ROHM Semiconductors

Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~ this year is the year of a signature writers block ~~

2022-01-30 20:48:37

by Vitaly Kuznetsov

[permalink] [raw]
Subject: RE: [PATCH 43/54] drivers/hv: replace cpumask_weight with cpumask_weight_eq

"Michael Kelley (LINUX)" <[email protected]> writes:

> From: Vitaly Kuznetsov <[email protected]> Sent: Monday, January 24, 2022 1:20 AM
>>
>> Yury Norov <[email protected]> writes:
>>
...
>> >
>> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> > index 60375879612f..7420a5fd47b5 100644
>> > --- a/drivers/hv/channel_mgmt.c
>> > +++ b/drivers/hv/channel_mgmt.c
>> > @@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
>> > }
>> > alloced_mask = &hv_context.hv_numa_map[numa_node];
>> >
>> > - if (cpumask_weight(alloced_mask) ==
>> > - cpumask_weight(cpumask_of_node(numa_node))) {
>> > + if (cpumask_weight_eq(alloced_mask,
>> > + cpumask_weight(cpumask_of_node(numa_node)))) {
>>
>> This code is not performance critical and I prefer the old version:
>>
>> cpumask_weight() == cpumask_weight()
>>
>> looks better than
>>
>> cpumask_weight_eq(..., cpumask_weight())
>>
>> (let alone the inner cpumask_of_node()) to me.
>>
>> > /*
>> > * We have cycled through all the CPUs in the node;
>> > * reset the alloced map.
>>
> I agree with Vitaly in preferring the old version, and indeed performance
> here is a shrug. But actually, I think the old version is a poorly coded way
> to determine if the two cpumasks are equal. The following would correctly
> capture the intent:
>
> if (cpumask_equal(alloced_mask, cpumask_of_node(numa_node)))
>

Indeed. While it seems that only CPUs from 'cpumask_of_node(numa_node)'
can be set in 'alloced_mask' (and thus the comparison is valid), there's
no real need to weigh anything. I'll send a patch.

--
Vitaly

2022-02-01 10:37:25

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 29/54] drivers/iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where appropriate

On Mon, 24 Jan 2022 14:46:43 +0200
Andy Shevchenko <[email protected]> wrote:

> On Sun, Jan 23, 2022 at 10:39:00AM -0800, Yury Norov wrote:
> > drivers/iio calls bitmap_weight() to compare the weight of bitmap with
> > a given number. We can do it more efficiently with bitmap_weight_{eq, gt}
> > because conditional bitmap_weight may stop traversing the bitmap earlier,
> > as soon as condition is met.
>
> ...
>
> > int i, j;
> >
> > for (i = 0, j = 0;
> > - i < bitmap_weight(indio_dev->active_scan_mask,
> > - indio_dev->masklength);
> > + bitmap_weight_gt(indio_dev->active_scan_mask,
> > + indio_dev->masklength, i);
> > i++, j++) {
> > j = find_next_bit(indio_dev->active_scan_mask,
> > indio_dev->masklength, j);
>
> This smells like room for improvement. Have you checked this deeply?
>

I have no idea what I was smoking that day.
It was near 10 years ago, so I'll blame my younger self ;)

Jonathan

2022-02-01 20:43:13

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 30/54] drivers/memstick: replace bitmap_weight with bitmap_weight_eq where appropriate

On Sun, 23 Jan 2022 at 19:41, Yury Norov <[email protected]> wrote:
>
> msb_validate_used_block_bitmap() calls bitmap_weight() to compare the
> weight of bitmap with a given number. We can do it more efficiently with
> bitmap_weight_eq because conditional bitmap_weight may stop traversing the
> bitmap earlier, as soon as condition is met.
>
> Signed-off-by: Yury Norov <[email protected]>

Acked-by: Ulf Hansson <[email protected]>

Kind regards
Uffe

> ---
> drivers/memstick/core/ms_block.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
> index 0cda6c6baefc..5cdd987e78f7 100644
> --- a/drivers/memstick/core/ms_block.c
> +++ b/drivers/memstick/core/ms_block.c
> @@ -155,8 +155,8 @@ static int msb_validate_used_block_bitmap(struct msb_data *msb)
> for (i = 0; i < msb->zone_count; i++)
> total_free_blocks += msb->free_block_count[i];
>
> - if (msb->block_count - bitmap_weight(msb->used_blocks_bitmap,
> - msb->block_count) == total_free_blocks)
> + if (bitmap_weight_eq(msb->used_blocks_bitmap, msb->block_count,
> + msb->block_count - total_free_blocks))
> return 0;
>
> pr_err("BUG: free block counts don't match the bitmap");
> --
> 2.30.2
>

2022-02-07 11:29:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 03/54] thermal/intel: don't use bitmap_weight() in end_power_clamp()

On Sun, Jan 23, 2022 at 7:39 PM Yury Norov <[email protected]> wrote:
>
> Don't call bitmap_weight() if the following code can get by
> without it.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/thermal/intel/intel_powerclamp.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
> index 14256421d98c..c841ab37e7c6 100644
> --- a/drivers/thermal/intel/intel_powerclamp.c
> +++ b/drivers/thermal/intel/intel_powerclamp.c
> @@ -556,12 +556,9 @@ static void end_power_clamp(void)
> * stop faster.
> */
> clamping = false;
> - if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
> - for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
> - pr_debug("clamping worker for cpu %d alive, destroy\n",
> - i);
> - stop_power_clamp_worker(i);
> - }
> + for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
> + pr_debug("clamping worker for cpu %d alive, destroy\n", i);
> + stop_power_clamp_worker(i);
> }
> }
>
> --

Applied as 5.18 material, thanks!

2022-02-09 13:44:35

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 16/54] cpufreq: replace cpumask_weight with cpumask_empty where appropriate

On 23-01-22, 10:38, Yury Norov wrote:
> drivers/cpufreq calls cpumask_weight() to check if any bit of a given
> cpumask is set. We can do it more efficiently with cpumask_empty() because
> cpumask_empty() stops traversing the cpumask as soon as it finds first set
> bit, while cpumask_weight() counts all bits unconditionally.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
> drivers/cpufreq/scmi-cpufreq.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)

Applied. Thanks.

--
viresh

2022-02-09 13:46:50

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 33/54] net: ethernet: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} for mellanox

On Mon, Jan 24, 2022 at 02:48:12PM +0200, Andy Shevchenko wrote:
> On Sun, Jan 23, 2022 at 10:39:04AM -0800, Yury Norov wrote:
> > Mellanox code uses bitmap_weight() to compare the weight of bitmap with
> > a given number. We can do it more efficiently with bitmap_weight_{eq, ...}
> > because conditional bitmap_weight may stop traversing the bitmap earlier,
> > as soon as condition is met.
>
> > - if (port <= 0 || port > m)
> > + if (port <= 0 || bitmap_weight_lt(actv_ports.ports, dev->caps.num_ports, port))
> > return -EINVAL;
>
> Can we eliminate now the port <= 0 check? Or at least make it port == 0?

The port is a parameter of an exported function. I'd rather not take this
risk. Even if it makes sense, it should be a separate patch anyway.

2022-02-15 17:23:42

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 12/54] tools/perf: replace bitmap_weight with bitmap_empty where appropriate

Em Sun, Jan 23, 2022 at 10:38:43AM -0800, Yury Norov escreveu:
> Some code in builtin-c2c.c calls bitmap_weight() to check if any bit of
> a given bitmap is set. It's better to use bitmap_empty() in that case
> because bitmap_empty() stops traversing the bitmap as soon as it finds
> first set bit, while bitmap_weight() counts all bits unconditionally.

Thanks, applied.

- Arnaldo


> Signed-off-by: Yury Norov <[email protected]>
> ---
> tools/perf/builtin-c2c.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 77dd4afacca4..14f787c67140 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -1080,7 +1080,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
> bitmap_zero(set, c2c.cpus_cnt);
> bitmap_and(set, c2c_he->cpuset, c2c.nodes[node], c2c.cpus_cnt);
>
> - if (!bitmap_weight(set, c2c.cpus_cnt)) {
> + if (bitmap_empty(set, c2c.cpus_cnt)) {
> if (c2c.node_info == 1) {
> ret = scnprintf(hpp->buf, hpp->size, "%21s", " ");
> advance_hpp(hpp, ret);
> @@ -1944,7 +1944,7 @@ static int set_nodestr(struct c2c_hist_entry *c2c_he)
> if (c2c_he->nodestr)
> return 0;
>
> - if (bitmap_weight(c2c_he->nodeset, c2c.nodes_cnt)) {
> + if (!bitmap_empty(c2c_he->nodeset, c2c.nodes_cnt)) {
> len = bitmap_scnprintf(c2c_he->nodeset, c2c.nodes_cnt,
> buf, sizeof(buf));
> } else {
> --
> 2.30.2

--

- Arnaldo