2021-12-18 21:20:22

by Yury Norov

[permalink] [raw]
Subject: [PATCH v2 00/17] lib/bitmap: optimize bitmap_weight() usage

In many cases people use bitmap_weight()-based functions to compare
the result against a number of expression:

if (cpumask_weight(...) > 1)
do_something();

This may take considerable amount of time on many-cpus machines because
cpumask_weight(...) will traverse every word of underlying cpumask
unconditionally.

We can significantly improve on it for many real cases if stop traversing
the mask as soon as we count cpus to any number greater than 1:

if (cpumask_weight_gt(..., 1))
do_something();

To implement this idea, the series adds bitmap_weight_cmp() function
and bitmap_weight_{eq,gt,ge,lt,le} macros on top of it; corresponding
wrappers in cpumask and nodemask.

There are 3 cpumasks, for which weight is counted frequently: possible,
present and active. They all are read-mostly, and to optimize counting
number of set bits for them, this series adds atomic counters, similarly
to online cpumask.

v1: https://lkml.org/lkml/2021/11/27/339
v2:
- add bitmap_weight_cmp();
- fix bitmap_weight_le semantics and provide full set of {eq,gt,ge,lt,le}
as wrappers around bitmap_weight_cmp();
- don't touch small bitmaps (less than 32 bits) - optimization works
only for large bitmaps;
- move bitmap_weight() == 0 -> bitmap_empty() conversion to a separate
patch, ditto cpumask_weight() and nodes_weight;
- add counters for possible, present and active cpus;
- drop bitmap_empty() where possible;
- various fixes around bit counting that spotted my eyes.

Yury Norov (17):
all: don't use bitmap_weight() where possible
drivers: rename num_*_cpus variables
fix open-coded for_each_set_bit()
all: replace bitmap_weight with bitmap_empty where appropriate
all: replace cpumask_weight with cpumask_empty where appropriate
all: replace nodes_weight with nodes_empty where appropriate
lib/bitmap: add bitmap_weight_{cmp,eq,gt,ge,lt,le} functions
all: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} where
appropriate
lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}
lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}
lib/nodemask: add num_node_state_eq()
kernel/cpu.c: fix init_cpu_online
kernel/cpu: add num_possible_cpus counter
kernel/cpu: add num_present_cpu counter
kernel/cpu: add num_active_cpu counter
tools/bitmap: sync bitmap_weight
MAINTAINERS: add cpumask and nodemask files to BITMAP_API

MAINTAINERS | 4 +
arch/alpha/kernel/process.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/ia64/mm/tlb.c | 2 +-
arch/mips/cavium-octeon/octeon-irq.c | 4 +-
arch/mips/kernel/crash.c | 2 +-
arch/nds32/kernel/perf_event_cpu.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/xmon/xmon.c | 4 +-
arch/s390/kernel/perf_cpum_cf.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +--
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/kvm/hyperv.c | 8 +-
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
drivers/acpi/numa/srat.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
drivers/firmware/psci/psci_checker.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
drivers/hv/channel_mgmt.c | 4 +-
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 +-
drivers/iio/industrialio-trigger.c | 2 +-
drivers/infiniband/hw/hfi1/affinity.c | 13 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
drivers/leds/trigger/ledtrig-cpu.c | 6 +-
drivers/memstick/core/ms_block.c | 4 +-
drivers/net/dsa/b53/b53_common.c | 6 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +-
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
.../marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
.../marvell/octeontx2/nic/otx2_flows.c | 8 +-
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/cmd.c | 33 ++---
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 +-
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 4 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/thunderx2_pmu.c | 4 +-
drivers/perf/xgene_pmu.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/scsi/storvsc_drv.c | 6 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
drivers/staging/media/tegra-video/vi.c | 2 +-
drivers/thermal/intel/intel_powerclamp.c | 9 +-
include/linux/bitmap.h | 80 +++++++++++
include/linux/cpumask.h | 131 +++++++++++++-----
include/linux/nodemask.h | 40 ++++++
kernel/cpu.c | 54 ++++++++
kernel/irq/affinity.c | 2 +-
kernel/padata.c | 2 +-
kernel/rcu/tree_nocb.h | 4 +-
kernel/rcu/tree_plugin.h | 2 +-
kernel/sched/core.c | 10 +-
kernel/sched/topology.c | 4 +-
kernel/time/clockevents.c | 2 +-
kernel/time/clocksource.c | 2 +-
lib/bitmap.c | 21 +++
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 2 +-
mm/vmstat.c | 4 +-
tools/include/linux/bitmap.h | 44 ++++++
tools/lib/bitmap.c | 20 +++
tools/perf/builtin-c2c.c | 4 +-
tools/perf/util/pmu.c | 2 +-
76 files changed, 480 insertions(+), 183 deletions(-)

--
2.30.2



2021-12-18 21:20:27

by Yury Norov

[permalink] [raw]
Subject: [PATCH 01/17] all: don't use bitmap_weight() where possible

Don't call bitmap_weight() if the following code can get by
without it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/dsa/b53/b53_common.c | 6 +-----
drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
drivers/thermal/intel/intel_powerclamp.c | 9 +++------
3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 3867f3d4545f..9a10d80125d9 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1620,12 +1620,8 @@ static int b53_arl_read(struct b53_device *dev, u64 mac,
return 0;
}

- if (bitmap_weight(free_bins, dev->num_arl_bins) == 0)
- return -ENOSPC;
-
*idx = find_first_bit(free_bins, dev->num_arl_bins);
-
- return -ENOENT;
+ return *idx >= dev->num_arl_bins ? -ENOSPC : -ENOENT;
}

static int b53_arl_op(struct b53_device *dev, int op, int port,
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 40933bf5a710..241696fdc6c7 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2177,13 +2177,9 @@ static int bcm_sysport_rule_set(struct bcm_sysport_priv *priv,
if (nfc->fs.ring_cookie != RX_CLS_FLOW_WAKE)
return -EOPNOTSUPP;

- /* All filters are already in use, we cannot match more rules */
- if (bitmap_weight(priv->filters, RXCHK_BRCM_TAG_MAX) ==
- RXCHK_BRCM_TAG_MAX)
- return -ENOSPC;
-
index = find_first_zero_bit(priv->filters, RXCHK_BRCM_TAG_MAX);
if (index >= RXCHK_BRCM_TAG_MAX)
+ /* All filters are already in use, we cannot match more rules */
return -ENOSPC;

/* Location is the classification ID, and index is the position
diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
index 14256421d98c..c841ab37e7c6 100644
--- a/drivers/thermal/intel/intel_powerclamp.c
+++ b/drivers/thermal/intel/intel_powerclamp.c
@@ -556,12 +556,9 @@ static void end_power_clamp(void)
* stop faster.
*/
clamping = false;
- if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
- for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping worker for cpu %d alive, destroy\n",
- i);
- stop_power_clamp_worker(i);
- }
+ for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
+ pr_debug("clamping worker for cpu %d alive, destroy\n", i);
+ stop_power_clamp_worker(i);
}
}

--
2.30.2


2021-12-18 21:20:35

by Yury Norov

[permalink] [raw]
Subject: [PATCH 02/17] drivers: rename num_*_cpus variables

Some drivers declare num_active_cpus and num_present_cpus,
despite that kernel has macros with corresponding names in
linux/cpumask.h, and the drivers include cpumask.h

The following patches switch num_*_cpus() to real functions,
which causes build failures for the drivers.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/leds/trigger/ledtrig-cpu.c | 6 +++---
drivers/scsi/storvsc_drv.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/leds/trigger/ledtrig-cpu.c b/drivers/leds/trigger/ledtrig-cpu.c
index 8af4f9bb9cde..767e9749ca41 100644
--- a/drivers/leds/trigger/ledtrig-cpu.c
+++ b/drivers/leds/trigger/ledtrig-cpu.c
@@ -39,7 +39,7 @@ struct led_trigger_cpu {
static DEFINE_PER_CPU(struct led_trigger_cpu, cpu_trig);

static struct led_trigger *trig_cpu_all;
-static atomic_t num_active_cpus = ATOMIC_INIT(0);
+static atomic_t _active_cpus = ATOMIC_INIT(0);

/**
* ledtrig_cpu - emit a CPU event as a trigger
@@ -79,8 +79,8 @@ void ledtrig_cpu(enum cpu_led_event ledevt)

/* Update trigger state */
trig->is_active = is_active;
- atomic_add(is_active ? 1 : -1, &num_active_cpus);
- active_cpus = atomic_read(&num_active_cpus);
+ atomic_add(is_active ? 1 : -1, &_active_cpus);
+ active_cpus = atomic_read(&_active_cpus);
total_cpus = num_present_cpus();

led_trigger_event(trig->_trig,
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..705dd4ebde98 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1950,7 +1950,7 @@ static int storvsc_probe(struct hv_device *device,
{
int ret;
int num_cpus = num_online_cpus();
- int num_present_cpus = num_present_cpus();
+ int present_cpus = num_present_cpus();
struct Scsi_Host *host;
struct hv_host_device *host_dev;
bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
@@ -2060,7 +2060,7 @@ static int storvsc_probe(struct hv_device *device,
* Set the number of HW queues we are supporting.
*/
if (!dev_is_ide) {
- if (storvsc_max_hw_queues > num_present_cpus) {
+ if (storvsc_max_hw_queues > present_cpus) {
storvsc_max_hw_queues = 0;
storvsc_log(device, STORVSC_LOGGING_WARN,
"Resetting invalid storvsc_max_hw_queues value to default.\n");
@@ -2068,7 +2068,7 @@ static int storvsc_probe(struct hv_device *device,
if (storvsc_max_hw_queues)
host->nr_hw_queues = storvsc_max_hw_queues;
else
- host->nr_hw_queues = num_present_cpus;
+ host->nr_hw_queues = present_cpus;
}

/*
--
2.30.2


2021-12-18 21:20:43

by Yury Norov

[permalink] [raw]
Subject: [PATCH 03/17] fix open-coded for_each_set_bit()

Mellanox driver has an open-coded for_each_set_bit(). Fix it.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e10b7b04b894..c56d2194cbfc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1994,21 +1994,16 @@ static void mlx4_allocate_port_vpps(struct mlx4_dev *dev, int port)

static int mlx4_master_activate_admin_state(struct mlx4_priv *priv, int slave)
{
- int port, err;
+ int p, port, err;
struct mlx4_vport_state *vp_admin;
struct mlx4_vport_oper_state *vp_oper;
struct mlx4_slave_state *slave_state =
&priv->mfunc.master.slave_state[slave];
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

- for (port = min_port; port <= max_port; port++) {
- if (!test_bit(port - 1, actv_ports.ports))
- continue;
+ for_each_set_bit(p, actv_ports.ports, priv->dev.caps.num_ports) {
+ port = p + 1;
priv->mfunc.master.vf_oper[slave].smi_enabled[port] =
priv->mfunc.master.vf_admin[slave].enable_smi[port];
vp_oper = &priv->mfunc.master.vf_oper[slave].vport[port];
@@ -2063,19 +2058,13 @@ static int mlx4_master_activate_admin_state(struct mlx4_priv *priv, int slave)

static void mlx4_master_deactivate_admin_state(struct mlx4_priv *priv, int slave)
{
- int port;
+ int p, port;
struct mlx4_vport_oper_state *vp_oper;
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

-
- for (port = min_port; port <= max_port; port++) {
- if (!test_bit(port - 1, actv_ports.ports))
- continue;
+ for_each_set_bit(p, actv_ports.ports, priv->dev.caps.num_ports) {
+ port = p + 1;
priv->mfunc.master.vf_oper[slave].smi_enabled[port] =
MLX4_VF_SMI_DISABLED;
vp_oper = &priv->mfunc.master.vf_oper[slave].vport[port];
--
2.30.2


2021-12-18 21:20:46

by Yury Norov

[permalink] [raw]
Subject: [PATCH 04/17] all: replace bitmap_weight with bitmap_empty where appropriate

In many cases, kernel code calls bitmap_weight() to check if any bit of
a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
first set bit, while bitmap_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/nds32/kernel/perf_event_cpu.c | 2 +-
arch/x86/kvm/hyperv.c | 8 ++++----
drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +-
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++--
drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c | 4 ++--
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 ++--
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm_pmu.c | 4 ++--
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/xgene_pmu.c | 2 +-
tools/perf/builtin-c2c.c | 4 ++--
13 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a78a879e7ef1..ea44e9ecb5c7 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -695,7 +695,7 @@ static void nds32_pmu_enable(struct pmu *pmu)
{
struct nds32_pmu *nds32_pmu = to_nds32_pmu(pmu);
struct pmu_hw_events *hw_events = nds32_pmu->get_hw_events();
- int enabled = bitmap_weight(hw_events->used_mask,
+ bool enabled = !bitmap_empty(hw_events->used_mask,
nds32_pmu->num_events);

if (enabled)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6e38a7d22e97..2c3400dea4b3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -90,7 +90,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
{
struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
- int auto_eoi_old, auto_eoi_new;
+ bool auto_eoi_old, auto_eoi_new;

if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;
@@ -100,16 +100,16 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
else
__clear_bit(vector, synic->vec_bitmap);

- auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_old = bitmap_empty(synic->auto_eoi_bitmap, 256);

if (synic_has_vector_auto_eoi(synic, vector))
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);

- auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+ auto_eoi_new = bitmap_empty(synic->auto_eoi_bitmap, 256);

- if (!!auto_eoi_old == !!auto_eoi_new)
+ if (auto_eoi_old == auto_eoi_new)
return;

down_write(&vcpu->kvm->arch.apicv_update_lock);
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
index d7fa2c49e741..56a3063545ec 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c
@@ -68,7 +68,7 @@ static int smp_request_block(struct mdp5_smp *smp,
uint8_t reserved;

/* we shouldn't be requesting blocks for an in-use client: */
- WARN_ON(bitmap_weight(cs, cnt) > 0);
+ WARN_ON(!bitmap_empty(cs, cnt));

reserved = smp->reserved[cid];

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 61b2db3342ed..ac0fe04df2e0 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -267,8 +267,8 @@ ice_set_pfe_link(struct ice_vf *vf, struct virtchnl_pf_event *pfe,
*/
static bool ice_vf_has_no_qs_ena(struct ice_vf *vf)
{
- return (!bitmap_weight(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
- !bitmap_weight(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF));
+ return (bitmap_empty(vf->rxq_ena, ICE_MAX_RSS_QS_PER_VF) &&
+ bitmap_empty(vf->txq_ena, ICE_MAX_RSS_QS_PER_VF));
}

/**
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 77a13fb555fb..80b2d64b4136 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -353,7 +353,7 @@ int otx2_add_macfilter(struct net_device *netdev, const u8 *mac)
{
struct otx2_nic *pf = netdev_priv(netdev);

- if (bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (!bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(netdev,
"Add %pM to CGX/RPM DMAC filters list as well\n",
@@ -436,7 +436,7 @@ int otx2_get_maxflows(struct otx2_flow_config *flow_cfg)
return 0;

if (flow_cfg->nr_flows == flow_cfg->max_flows ||
- bitmap_weight(&flow_cfg->dmacflt_bmap,
+ !bitmap_empty(&flow_cfg->dmacflt_bmap,
flow_cfg->dmacflt_max_flows))
return flow_cfg->max_flows + flow_cfg->dmacflt_max_flows;
else
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 6080ebd9bd94..3d369ccc7ab9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1115,7 +1115,7 @@ static int otx2_cgx_config_loopback(struct otx2_nic *pf, bool enable)
struct msg_req *msg;
int err;

- if (enable && bitmap_weight(&pf->flow_cfg->dmacflt_bmap,
+ if (enable && !bitmap_empty(&pf->flow_cfg->dmacflt_bmap,
pf->flow_cfg->dmacflt_max_flows))
netdev_warn(pf->netdev,
"CGX/RPM internal loopback might not work as DMAC filters are active\n");
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 23b668de4640..b6e2e17bac04 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -336,7 +336,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print aligned non-zero lines, if any */
for (item = 0, line = 0; line < last_line; line++, item += 8)
- if (bitmap_weight((unsigned long *)&pmap[item], 64 * 8))
+ if (!bitmap_empty((unsigned long *)&pmap[item], 64 * 8))
DP_NOTICE(p_hwfn,
"line 0x%04x: 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
line,
@@ -350,7 +350,7 @@ void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,

/* print last unaligned non-zero line, if any */
if ((bmap->max_count % (64 * 8)) &&
- (bitmap_weight((unsigned long *)&pmap[item],
+ (!bitmap_empty((unsigned long *)&pmap[item],
bmap->max_count - item * 64))) {
offset = sprintf(str_last_line, "line 0x%04x: ", line);
for (; item < last_item; item++)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 071b4aeaddf2..134ecfca96a3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -76,7 +76,7 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
* We delay for a short while if an async destroy QP is still expected.
* Beyond the added delay we clear the bitmap anyway.
*/
- while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+ while (!bitmap_empty(rcid_map->bitmap, rcid_map->max_count)) {
/* If the HW device is during recovery, all resources are
* immediately reset without receiving a per-cid indication
* from HW. In this case we don't expect the cid bitmap to be
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 54aca3a62814..96e09fa40909 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1096,7 +1096,7 @@ static void cci_pmu_enable(struct pmu *pmu)
{
struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
- int enabled = bitmap_weight(hw_events->used_mask, cci_pmu->num_cntrs);
+ bool enabled = !bitmap_empty(hw_events->used_mask, cci_pmu->num_cntrs);
unsigned long flags;

if (!enabled)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 295cc7952d0e..a31b302b0ade 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -524,7 +524,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
@@ -785,7 +785,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);

if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index a738aeab5c04..358e4e284a62 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -393,7 +393,7 @@ EXPORT_SYMBOL_GPL(hisi_uncore_pmu_read);
void hisi_uncore_pmu_enable(struct pmu *pmu)
{
struct hisi_pmu *hisi_pmu = to_hisi_pmu(pmu);
- int enabled = bitmap_weight(hisi_pmu->pmu_events.used_mask,
+ bool enabled = !bitmap_empty(hisi_pmu->pmu_events.used_mask,
hisi_pmu->num_counters);

if (!enabled)
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 2b6d476bd213..88bd100a9633 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -867,7 +867,7 @@ static void xgene_perf_pmu_enable(struct pmu *pmu)
{
struct xgene_pmu_dev *pmu_dev = to_pmu_dev(pmu);
struct xgene_pmu *xgene_pmu = pmu_dev->parent;
- int enabled = bitmap_weight(pmu_dev->cntr_assign_mask,
+ bool enabled = !bitmap_empty(pmu_dev->cntr_assign_mask,
pmu_dev->max_counters);

if (!enabled)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index b5c67ef73862..51997386fb31 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1080,7 +1080,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
bitmap_zero(set, c2c.cpus_cnt);
bitmap_and(set, c2c_he->cpuset, c2c.nodes[node], c2c.cpus_cnt);

- if (!bitmap_weight(set, c2c.cpus_cnt)) {
+ if (bitmap_empty(set, c2c.cpus_cnt)) {
if (c2c.node_info == 1) {
ret = scnprintf(hpp->buf, hpp->size, "%21s", " ");
advance_hpp(hpp, ret);
@@ -1944,7 +1944,7 @@ static int set_nodestr(struct c2c_hist_entry *c2c_he)
if (c2c_he->nodestr)
return 0;

- if (bitmap_weight(c2c_he->nodeset, c2c.nodes_cnt)) {
+ if (!bitmap_empty(c2c_he->nodeset, c2c.nodes_cnt)) {
len = bitmap_scnprintf(c2c_he->nodeset, c2c.nodes_cnt,
buf, sizeof(buf));
} else {
--
2.30.2


2021-12-18 21:20:50

by Yury Norov

[permalink] [raw]
Subject: [PATCH 05/17] all: replace cpumask_weight with cpumask_empty where appropriate

In many cases, kernel code calls cpumask_weight() to check if any bit of
a given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/alpha/kernel/process.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 +++++++-------
arch/x86/mm/mmio-mod.c | 2 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
drivers/cpufreq/scmi-cpufreq.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/infiniband/hw/hfi1/affinity.c | 4 ++--
drivers/irqchip/irq-bcm6345-l1.c | 2 +-
kernel/irq/affinity.c | 2 +-
kernel/padata.c | 2 +-
kernel/rcu/tree_nocb.h | 4 ++--
kernel/rcu/tree_plugin.h | 2 +-
kernel/sched/core.c | 2 +-
kernel/sched/topology.c | 2 +-
kernel/time/clocksource.c | 2 +-
mm/vmstat.c | 4 ++--
18 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index f4759e4ee4a9..a4415ad44982 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -125,7 +125,7 @@ common_shutdown_1(void *generic_ptr)
/* Wait for the secondaries to halt. */
set_cpu_present(boot_cpuid, false);
set_cpu_possible(boot_cpuid, false);
- while (cpumask_weight(cpu_present_mask))
+ while (!cpumask_empty(cpu_present_mask))
barrier();
#endif

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 5010348fa21b..fd6301eafa9d 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -572,7 +572,7 @@ setup_arch (char **cmdline_p)
#ifdef CONFIG_ACPI_HOTPLUG_CPU
prefill_possible_map();
#endif
- per_cpu_scan_finalize((cpumask_weight(&early_cpu_possible_map) == 0 ?
+ per_cpu_scan_finalize((cpumask_empty(&early_cpu_possible_map) ?
32 : cpumask_weight(&early_cpu_possible_map)),
additional_cpus > 0 ? additional_cpus : 0);
#endif /* CONFIG_ACPI_NUMA */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b57b3db9a6a7..e23ff03290b8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -341,14 +341,14 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus belong to parent ctrl group */
cpumask_andnot(tmpmask, newmask, &prgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
rdt_last_cmd_puts("Can only add CPUs to mongroup that belong to parent\n");
return -EINVAL;
}

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Give any dropped cpus to parent rdtgroup */
cpumask_or(&prgrp->cpu_mask, &prgrp->cpu_mask, tmpmask);
update_closid_rmid(tmpmask, prgrp);
@@ -359,7 +359,7 @@ static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu rmid
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
if (crgrp == rdtgrp)
@@ -394,7 +394,7 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,

/* Check whether cpus are dropped from this group */
cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
/* Can't drop from default group */
if (rdtgrp == &rdtgroup_default) {
rdt_last_cmd_puts("Can't drop CPUs from default group\n");
@@ -413,12 +413,12 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
* and update per-cpu closid/rmid.
*/
cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) {
if (r == rdtgrp)
continue;
cpumask_and(tmpmask1, &r->cpu_mask, tmpmask);
- if (cpumask_weight(tmpmask1))
+ if (!cpumask_empty(tmpmask1))
cpumask_rdtgrp_clear(r, tmpmask1);
}
update_closid_rmid(tmpmask, rdtgrp);
@@ -488,7 +488,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,

/* check that user didn't specify any offline cpus */
cpumask_andnot(tmpmask, newmask, cpu_online_mask);
- if (cpumask_weight(tmpmask)) {
+ if (!cpumask_empty(tmpmask)) {
ret = -EINVAL;
rdt_last_cmd_puts("Can only assign online CPUs\n");
goto unlock;
diff --git a/arch/x86/mm/mmio-mod.c b/arch/x86/mm/mmio-mod.c
index 933a2ebad471..c3317f0650d8 100644
--- a/arch/x86/mm/mmio-mod.c
+++ b/arch/x86/mm/mmio-mod.c
@@ -400,7 +400,7 @@ static void leave_uniprocessor(void)
int cpu;
int err;

- if (!cpumask_available(downed_cpus) || cpumask_weight(downed_cpus) == 0)
+ if (!cpumask_available(downed_cpus) || cpumask_empty(downed_cpus))
return;
pr_notice("Re-enabling CPUs...\n");
for_each_cpu(cpu, downed_cpus) {
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 1e9ff28bc2e0..ea277fc08357 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -985,7 +985,7 @@ static int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)

/* Clear global flags */
if (master) {
- if (cpumask_weight(uv_nmi_cpu_mask))
+ if (!cpumask_empty(uv_nmi_cpu_mask))
uv_nmi_cleanup_mask();
atomic_set(&uv_nmi_cpus_in_nmi, -1);
atomic_set(&uv_nmi_cpu, -1);
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 05f3d7876e44..95a0c57ab5bb 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -482,7 +482,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
}

qcom_get_related_cpus(index, policy->cpus);
- if (!cpumask_weight(policy->cpus)) {
+ if (cpumask_empty(policy->cpus)) {
dev_err(dev, "Domain-%d failed to get related CPUs\n", index);
ret = -ENOENT;
goto error;
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 1e0cd4d165f0..919fa6e3f462 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -154,7 +154,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
* table and opp-shared.
*/
ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, priv->opp_shared_cpus);
- if (ret || !cpumask_weight(priv->opp_shared_cpus)) {
+ if (ret || cpumask_empty(priv->opp_shared_cpus)) {
/*
* Either opp-table is not set or no opp-shared was found.
* Use the CPU mask from SCMI to designate CPUs sharing an OPP
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 0b488d49694c..962e8d6bf6ea 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1048,7 +1048,7 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
GEM_BUG_ON(!pmu->base.event_init);

/* Select the first online CPU as a designated reader. */
- if (!cpumask_weight(&i915_pmu_cpumask))
+ if (cpumask_empty(&i915_pmu_cpumask))
cpumask_set_cpu(cpu, &i915_pmu_cpumask);

return 0;
diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 98c813ba4304..38eee675369a 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -667,7 +667,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* engines, use the same CPU cores as general/control
* context.
*/
- if (cpumask_weight(&entry->def_intr.mask) == 0)
+ if (cpumask_empty(&entry->def_intr.mask))
cpumask_copy(&entry->def_intr.mask,
&entry->general_intr_mask);
}
@@ -687,7 +687,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
* vectors, use the same CPU core as the general/control
* context.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 0)
+ if (cpumask_empty(&entry->comp_vect_mask))
cpumask_copy(&entry->comp_vect_mask,
&entry->general_intr_mask);
}
diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index fd079215c17f..142a7431745f 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -315,7 +315,7 @@ static int __init bcm6345_l1_of_init(struct device_node *dn,
cpumask_set_cpu(idx, &intc->cpumask);
}

- if (!cpumask_weight(&intc->cpumask)) {
+ if (cpumask_empty(&intc->cpumask)) {
ret = -ENODEV;
goto out_free;
}
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..18740faf0eb1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -258,7 +258,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
nodemask_t nodemsk = NODE_MASK_NONE;
struct node_vectors *node_vectors;

- if (!cpumask_weight(cpu_mask))
+ if (cpumask_empty(cpu_mask))
return 0;

nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
diff --git a/kernel/padata.c b/kernel/padata.c
index 18d3a5c699d8..e5819bb8bd1d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -181,7 +181,7 @@ int padata_do_parallel(struct padata_shell *ps,
goto out;

if (!cpumask_test_cpu(*cb_cpu, pd->cpumask.cbcpu)) {
- if (!cpumask_weight(pd->cpumask.cbcpu))
+ if (cpumask_empty(pd->cpumask.cbcpu))
goto out;

/* Select an alternate fallback CPU and notify the caller. */
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1e40519d1a05..bc038a451768 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1169,7 +1169,7 @@ void __init rcu_init_nohz(void)
struct rcu_data *rdp;

#if defined(CONFIG_NO_HZ_FULL)
- if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
+ if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
need_rcu_nocb_mask = true;
#endif /* #if defined(CONFIG_NO_HZ_FULL) */

@@ -1353,7 +1353,7 @@ static void __init rcu_organize_nocb_kthreads(void)
*/
void rcu_bind_current_to_nocb(void)
{
- if (cpumask_available(rcu_nocb_mask) && cpumask_weight(rcu_nocb_mask))
+ if (cpumask_available(rcu_nocb_mask) && !cpumask_empty(rcu_nocb_mask))
WARN_ON(sched_setaffinity(current->pid, rcu_nocb_mask));
}
EXPORT_SYMBOL_GPL(rcu_bind_current_to_nocb);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 54ef0e8c8742..3857ff6cb6f7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1216,7 +1216,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm);
cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_RCU));
- if (cpumask_weight(cm) == 0)
+ if (cpumask_empty(cm))
cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_RCU));
set_cpus_allowed_ptr(t, cm);
mutex_unlock(&rnp->boost_kthread_mutex);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 83872f95a1ea..9b3ec14227e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8715,7 +8715,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
{
int ret = 1;

- if (!cpumask_weight(cur))
+ if (cpumask_empty(cur))
return ret;

ret = dl_cpuset_cpumask_can_shrink(cur, trial);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d201a7052a29..8478e2a8cd65 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -74,7 +74,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
break;
}

- if (!cpumask_weight(sched_group_span(group))) {
+ if (cpumask_empty(sched_group_span(group))) {
printk(KERN_CONT "\n");
printk(KERN_ERR "ERROR: empty group\n");
break;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 95d7ca35bdf2..cee5da1e54c4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -343,7 +343,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
cpus_read_lock();
preempt_disable();
clocksource_verify_choose_cpus();
- if (cpumask_weight(&cpus_chosen) == 0) {
+ if (cpumask_empty(&cpus_chosen)) {
preempt_enable();
cpus_read_unlock();
pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d701c335628c..295642e2c24c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2032,7 +2032,7 @@ static void __init init_cpu_node_state(void)
int node;

for_each_online_node(node) {
- if (cpumask_weight(cpumask_of_node(node)) > 0)
+ if (!cpumask_empty(cpumask_of_node(node)))
node_set_state(node, N_CPU);
}
}
@@ -2059,7 +2059,7 @@ static int vmstat_cpu_dead(unsigned int cpu)

refresh_zone_stat_thresholds();
node_cpus = cpumask_of_node(node);
- if (cpumask_weight(node_cpus) > 0)
+ if (!cpumask_empty(node_cpus))
return 0;

node_clear_state(node, N_CPU);
--
2.30.2


2021-12-18 21:21:00

by Yury Norov

[permalink] [raw]
Subject: [PATCH 08/17] all: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} where appropriate

Kernel code calls bitmap_weight() to compare the weight of bitmap with
a given number. We can do it more efficiently with bitmap_weight_{eq, ...}
because conditional bitmap_weight may stop traversing the bitmap earlier,
as soon as condition is met.

This patch replaces bitmap_weight with conditional versions where possible,
except for small bitmaps which size is not configurable and known at
constant time. In that case conditional version of bitmap_weight would not
benefit due to small_const_nbits() optimization; but readability may
suffer.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 ++--
drivers/iio/industrialio-trigger.c | 2 +-
drivers/memstick/core/ms_block.c | 4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_flows.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++-------
drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
drivers/perf/thunderx2_pmu.c | 4 ++--
drivers/staging/media/tegra-video/vi.c | 2 +-
13 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e23ff03290b8..9d42e592c1cf 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2752,7 +2752,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
* bitmap_weight() does not access out-of-bound memory.
*/
tmp_cbm = cfg->new_ctrl;
- if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < r->cache.min_cbm_bits) {
+ if (bitmap_weight_lt(&tmp_cbm, r->cache.cbm_len, r->cache.min_cbm_bits)) {
rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id);
return -ENOSPC;
}
diff --git a/drivers/iio/dummy/iio_simple_dummy_buffer.c b/drivers/iio/dummy/iio_simple_dummy_buffer.c
index 59aa60d4ca37..cd2470ddf82b 100644
--- a/drivers/iio/dummy/iio_simple_dummy_buffer.c
+++ b/drivers/iio/dummy/iio_simple_dummy_buffer.c
@@ -72,8 +72,8 @@ static irqreturn_t iio_simple_dummy_trigger_h(int irq, void *p)
int i, j;

for (i = 0, j = 0;
- i < bitmap_weight(indio_dev->active_scan_mask,
- indio_dev->masklength);
+ bitmap_weight_gt(indio_dev->active_scan_mask,
+ indio_dev->masklength, i);
i++, j++) {
j = find_next_bit(indio_dev->active_scan_mask,
indio_dev->masklength, j);
diff --git a/drivers/iio/industrialio-trigger.c b/drivers/iio/industrialio-trigger.c
index f504ed351b3e..98c54022fecf 100644
--- a/drivers/iio/industrialio-trigger.c
+++ b/drivers/iio/industrialio-trigger.c
@@ -331,7 +331,7 @@ int iio_trigger_detach_poll_func(struct iio_trigger *trig,
{
struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(pf->indio_dev);
bool no_other_users =
- bitmap_weight(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER) == 1;
+ bitmap_weight_eq(trig->pool, CONFIG_IIO_CONSUMERS_PER_TRIGGER, 1);
int ret = 0;

if (trig->ops && trig->ops->set_trigger_state && no_other_users) {
diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 0cda6c6baefc..5cdd987e78f7 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -155,8 +155,8 @@ static int msb_validate_used_block_bitmap(struct msb_data *msb)
for (i = 0; i < msb->zone_count; i++)
total_free_blocks += msb->free_block_count[i];

- if (msb->block_count - bitmap_weight(msb->used_blocks_bitmap,
- msb->block_count) == total_free_blocks)
+ if (bitmap_weight_eq(msb->used_blocks_bitmap, msb->block_count,
+ msb->block_count - total_free_blocks))
return 0;

pr_err("BUG: free block counts don't match the bitmap");
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 214a38de3f41..35297d8a488b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -246,7 +246,7 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
#endif

/* Disable VMDq flag so device will be set in VM mode */
- if (bitmap_weight(adapter->fwd_bitmask, adapter->num_rx_pools) == 1) {
+ if (bitmap_weight_eq(adapter->fwd_bitmask, adapter->num_rx_pools, 1)) {
adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
rss = min_t(int, ixgbe_max_rss_indices(adapter),
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index d85db90632d6..a55fd1d0c653 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -287,7 +287,7 @@ static int otx2_set_channels(struct net_device *dev,
if (!channel->rx_count || !channel->tx_count)
return -EINVAL;

- if (bitmap_weight(&pfvf->rq_bmap, pfvf->hw.rx_queues) > 1) {
+ if (bitmap_weight_gt(&pfvf->rq_bmap, pfvf->hw.rx_queues, 1)) {
netdev_err(dev,
"Receive queues are in use by TC police action\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 80b2d64b4136..55c899a6fcdd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -1170,8 +1170,8 @@ int otx2_remove_flow(struct otx2_nic *pfvf, u32 location)
* interface mac address and configure CGX/RPM block in
* promiscuous mode
*/
- if (bitmap_weight(&flow_cfg->dmacflt_bmap,
- flow_cfg->dmacflt_max_flows) == 1)
+ if (bitmap_weight_eq(&flow_cfg->dmacflt_bmap,
+ flow_cfg->dmacflt_max_flows, 1))
otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);
} else {
err = otx2_remove_flow_msg(pfvf, flow->entry, false);
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index c56d2194cbfc..5bca0c68f00a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2792,9 +2792,8 @@ int mlx4_slave_convert_port(struct mlx4_dev *dev, int slave, int port)
{
unsigned n;
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(dev, slave);
- unsigned m = bitmap_weight(actv_ports.ports, dev->caps.num_ports);

- if (port <= 0 || port > m)
+ if (port <= 0 || bitmap_weight_lt(actv_ports.ports, dev->caps.num_ports, port))
return -EINVAL;

n = find_first_bit(actv_ports.ports, dev->caps.num_ports);
@@ -3404,10 +3403,6 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_active_ports actv_ports = mlx4_get_active_ports(
&priv->dev, slave);
- int min_port = find_first_bit(actv_ports.ports,
- priv->dev.caps.num_ports) + 1;
- int max_port = min_port - 1 +
- bitmap_weight(actv_ports.ports, priv->dev.caps.num_ports);

if (slave == mlx4_master_func_num(dev))
return 0;
@@ -3417,7 +3412,8 @@ int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave, int port,
enabled < 0 || enabled > 1)
return -EINVAL;

- if (min_port == max_port && dev->caps.num_ports > 1) {
+ if (dev->caps.num_ports > 1 &&
+ bitmap_weight_eq(actv_ports.ports, priv->dev.caps.num_ports, 1)) {
mlx4_info(dev, "SMI access disallowed for single ported VFs\n");
return -EPROTONOSUPPORT;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 414e390e6b48..0c09432ff389 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1435,8 +1435,8 @@ int mlx4_is_eq_shared(struct mlx4_dev *dev, int vector)
if (vector <= 0 || (vector >= dev->caps.num_comp_vectors + 1))
return -EINVAL;

- return !!(bitmap_weight(priv->eq_table.eq[vector].actv_ports.ports,
- dev->caps.num_ports) > 1);
+ return bitmap_weight_gt(priv->eq_table.eq[vector].actv_ports.ports,
+ dev->caps.num_ports, 1);
}
EXPORT_SYMBOL(mlx4_is_eq_shared);

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 42c96c9d7fb1..855aae326ccb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1300,8 +1300,8 @@ int mlx4_QUERY_DEV_CAP_wrapper(struct mlx4_dev *dev, int slave,
actv_ports = mlx4_get_active_ports(dev, slave);
first_port = find_first_bit(actv_ports.ports, dev->caps.num_ports);
for (slave_port = 0, real_port = first_port;
- real_port < first_port +
- bitmap_weight(actv_ports.ports, dev->caps.num_ports);
+ bitmap_weight_gt(actv_ports.ports, dev->caps.num_ports,
+ real_port - first_port);
++real_port, ++slave_port) {
if (flags & (MLX4_DEV_CAP_FLAG_WOL_PORT1 << real_port))
flags |= MLX4_DEV_CAP_FLAG_WOL_PORT1 << slave_port;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index b187c210d4d6..cfbaa7ac712f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1383,7 +1383,7 @@ static int mlx4_mf_bond(struct mlx4_dev *dev)
dev->persist->num_vfs + 1);

/* only single port vfs are allowed */
- if (bitmap_weight(slaves_port_1_2, dev->persist->num_vfs + 1) > 1) {
+ if (bitmap_weight_gt(slaves_port_1_2, dev->persist->num_vfs + 1, 1)) {
mlx4_warn(dev, "HA mode unsupported for dual ported VFs\n");
return -EINVAL;
}
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 05378c0fd8f3..ebfa66b212c7 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -623,8 +623,8 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
return;

/* Start timer for first event */
- if (bitmap_weight(tx2_pmu->active_counters,
- tx2_pmu->max_counters) == 1) {
+ if (bitmap_weight_eq(tx2_pmu->active_counters,
+ tx2_pmu->max_counters, 1)) {
hrtimer_start(&tx2_pmu->hrtimer,
ns_to_ktime(tx2_pmu->hrtimer_interval),
HRTIMER_MODE_REL_PINNED);
diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index 69d9787d5338..98d878a5ca6b 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -436,7 +436,7 @@ static int tegra_channel_enum_format(struct file *file, void *fh,
if (!IS_ENABLED(CONFIG_VIDEO_TEGRA_TPG))
fmts_bitmap = chan->fmts_bitmap;

- if (f->index >= bitmap_weight(fmts_bitmap, MAX_FORMAT_NUM))
+ if (bitmap_weight_le(fmts_bitmap, MAX_FORMAT_NUM, f->index))
return -EINVAL;

for (i = 0; i < f->index + 1; i++, index++)
--
2.30.2


2021-12-18 21:21:03

by Yury Norov

[permalink] [raw]
Subject: [PATCH 09/17] lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}

Kernel code calls cpumask_weight() to compare the weight of cpumask with
a given number. We can do it more efficiently with cpumask_weight_{eq, ...}
because conditional cpumask_weight may stop traversing the cpumask earlier,
as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
arch/ia64/mm/tlb.c | 2 +-
arch/mips/cavium-octeon/octeon-irq.c | 4 +-
arch/mips/kernel/crash.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/xmon/xmon.c | 4 +-
arch/s390/kernel/perf_cpum_cf.c | 2 +-
arch/x86/kernel/smpboot.c | 4 +-
drivers/firmware/psci/psci_checker.c | 2 +-
drivers/hv/channel_mgmt.c | 4 +-
drivers/infiniband/hw/hfi1/affinity.c | 9 ++---
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_iba7322.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
include/linux/cpumask.h | 50 ++++++++++++++++++++++++
kernel/sched/core.c | 8 ++--
kernel/sched/topology.c | 2 +-
kernel/time/clockevents.c | 2 +-
19 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index 135b5135cace..a5bce13ab047 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -332,7 +332,7 @@ __flush_tlb_range (struct vm_area_struct *vma, unsigned long start,

preempt_disable();
#ifdef CONFIG_SMP
- if (mm != current->active_mm || cpumask_weight(mm_cpumask(mm)) != 1) {
+ if (mm != current->active_mm || !cpumask_weight_eq(mm_cpumask(mm), 1)) {
ia64_global_tlb_purge(mm, start, end, nbits);
preempt_enable();
return;
diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 844f882096e6..914871f15fb7 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -763,7 +763,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
if (!cpumask_test_cpu(cpu, mask))
return;

- if (cpumask_weight(mask) > 1) {
+ if (cpumask_weight_gt(mask, 1)) {
/*
* It has multi CPU affinity, just remove this CPU
* from the affinity set.
@@ -795,7 +795,7 @@ static int octeon_irq_ciu_set_affinity(struct irq_data *data,
* This removes the need to do locking in the .ack/.eoi
* functions.
*/
- if (cpumask_weight(dest) != 1)
+ if (!cpumask_weight_eq(dest, 1))
return -EINVAL;

if (!enable_one)
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..5b690d52491f 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -72,7 +72,7 @@ static void crash_kexec_prepare_cpus(void)
*/
pr_emerg("Sending IPI to other cpus...\n");
msecs = 10000;
- while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
+ while (cpumask_weight_lt(&cpus_in_crash, ncpus) && (--msecs > 0)) {
cpu_relax();
mdelay(1);
}
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c338f9d8ab37..00da2064ddf3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1655,7 +1655,7 @@ void start_secondary(void *unused)
if (has_big_cores)
sibling_mask = cpu_smallcore_mask;

- if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+ if (cpumask_weight_gt(mask, cpumask_weight(sibling_mask(cpu))))
shared_caches = true;
}

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..62937a077de7 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -483,7 +483,7 @@ static void start_watchdog(void *arg)

wd_smp_lock(&flags);
cpumask_set_cpu(cpu, &wd_cpus_enabled);
- if (cpumask_weight(&wd_cpus_enabled) == 1) {
+ if (cpumask_weight_eq(&wd_cpus_enabled, 1)) {
cpumask_set_cpu(cpu, &wd_smp_cpus_pending);
wd_smp_last_reset_tb = get_tb();
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index f9ae0b398260..b9e9d0b20a7b 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -469,7 +469,7 @@ static bool wait_for_other_cpus(int ncpus)

/* We wait for 2s, which is a metric "little while" */
for (timeout = 20000; timeout != 0; --timeout) {
- if (cpumask_weight(&cpus_in_xmon) >= ncpus)
+ if (cpumask_weight_ge(&cpus_in_xmon, ncpus))
return true;
udelay(100);
barrier();
@@ -1338,7 +1338,7 @@ static int cpu_cmd(void)
case 'S':
case 't':
cpumask_copy(&xmon_batch_cpus, &cpus_in_xmon);
- if (cpumask_weight(&xmon_batch_cpus) <= 1) {
+ if (cpumask_weight_le(&xmon_batch_cpus, 1)) {
printf("There are no other cpus in xmon\n");
break;
}
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index ee8707abdb6a..4d217f7f5ccf 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -975,7 +975,7 @@ static int cfset_all_start(struct cfset_request *req)
return -ENOMEM;
cpumask_and(mask, &req->mask, cpu_online_mask);
on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
- if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
+ if (!cpumask_weight_eq(mask, atomic_read(&p.cpus_ack))) {
on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
rc = -EIO;
debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..e851e9945eb5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1608,7 +1608,7 @@ static void remove_siblinginfo(int cpu)
/*/
* last thread sibling in this cpu core going down
*/
- if (cpumask_weight(topology_sibling_cpumask(cpu)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(cpu), 1))
cpu_data(sibling).booted_cores--;
}

@@ -1617,7 +1617,7 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
- if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
+ if (cpumask_weight_eq(topology_sibling_cpumask(sibling), 1))
cpu_data(sibling).smt_active = false;
}

diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c
index 116eb465cdb4..90c9473832a9 100644
--- a/drivers/firmware/psci/psci_checker.c
+++ b/drivers/firmware/psci/psci_checker.c
@@ -90,7 +90,7 @@ static unsigned int down_and_up_cpus(const struct cpumask *cpus,
* cpu_down() checks the number of online CPUs before the TOS
* resident CPU.
*/
- if (cpumask_weight(offlined_cpus) + 1 == nb_available_cpus) {
+ if (cpumask_weight_eq(offlined_cpus, nb_available_cpus - 1)) {
if (ret != -EBUSY) {
pr_err("Unexpected return code %d while trying "
"to power down last online CPU %d\n",
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2829575fd9b7..da297220230d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -762,8 +762,8 @@ static void init_vp_index(struct vmbus_channel *channel)
}
alloced_mask = &hv_context.hv_numa_map[numa_node];

- if (cpumask_weight(alloced_mask) ==
- cpumask_weight(cpumask_of_node(numa_node))) {
+ if (cpumask_weight_eq(alloced_mask,
+ cpumask_weight(cpumask_of_node(numa_node)))) {
/*
* We have cycled through all the CPUs in the node;
* reset the alloced map.
diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 38eee675369a..7c5ca5c5306a 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -507,7 +507,7 @@ static int _dev_comp_vect_cpu_mask_init(struct hfi1_devdata *dd,
* available CPUs divide it by the number of devices in the
* local NUMA node.
*/
- if (cpumask_weight(&entry->comp_vect_mask) == 1) {
+ if (cpumask_weight_eq(&entry->comp_vect_mask, 1)) {
possible_cpus_comp_vect = 1;
dd_dev_warn(dd,
"Number of kernel receive queues is too large for completion vector affinity to be effective\n");
@@ -593,7 +593,7 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
{
struct hfi1_affinity_node *entry;
const struct cpumask *local_mask;
- int curr_cpu, possible, i, ret;
+ int curr_cpu, i, ret;
bool new_entry = false;

local_mask = cpumask_of_node(dd->node);
@@ -626,10 +626,9 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
local_mask);

/* fill in the receive list */
- possible = cpumask_weight(&entry->def_intr.mask);
curr_cpu = cpumask_first(&entry->def_intr.mask);

- if (possible == 1) {
+ if (cpumask_weight_eq(&entry->def_intr.mask, 1)) {
/* only one CPU, everyone will use it */
cpumask_set_cpu(curr_cpu, &entry->rcv_intr.mask);
cpumask_set_cpu(curr_cpu, &entry->general_intr_mask);
@@ -1017,7 +1016,7 @@ int hfi1_get_proc_affinity(int node)
cpu = cpumask_first(proc_mask);
cpumask_set_cpu(cpu, &set->used);
goto done;
- } else if (current->nr_cpus_allowed < cpumask_weight(&set->mask)) {
+ } else if (cpumask_weight_gt(&set->mask, current->nr_cpus_allowed)) {
hfi1_cdbg(PROC, "PID %u %s affinity set to CPU set(s) %*pbl",
current->pid, current->comm,
cpumask_pr_args(proc_mask));
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index aa290928cf96..add89bc21b0a 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -1151,7 +1151,7 @@ static void assign_ctxt_affinity(struct file *fp, struct qib_devdata *dd)
* reserve a processor for it on the local NUMA node.
*/
if ((weight >= qib_cpulist_count) &&
- (cpumask_weight(local_mask) <= qib_cpulist_count)) {
+ (cpumask_weight_le(local_mask, qib_cpulist_count))) {
for_each_cpu(local_cpu, local_mask)
if (!test_and_set_bit(local_cpu, qib_cpulist)) {
fd->rec_cpu_num = local_cpu;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index ab98b6a3ae1e..636a080b2952 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3405,7 +3405,7 @@ static void qib_setup_7322_interrupt(struct qib_devdata *dd, int clearpend)
local_mask = cpumask_of_pcibus(dd->pcidev->bus);
firstcpu = cpumask_first(local_mask);
if (firstcpu >= nr_cpu_ids ||
- cpumask_weight(local_mask) == num_online_cpus()) {
+ cpumask_weight_eq(local_mask, num_online_cpus())) {
local_mask = topology_core_cpumask(0);
firstcpu = cpumask_first(local_mask);
}
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index a56f01f659f8..325e9004dacd 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12643,7 +12643,7 @@ lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
* gone offline yet, we need >1.
*/
cpumask_and(tmp, maskp, cpu_online_mask);
- if (cpumask_weight(tmp) > 1)
+ if (cpumask_weight_gt(tmp, 1))
continue;

/* Now that we have an irq to shutdown, get the eq
diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..28b08568a349 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -561,7 +561,7 @@ int qman_test_stash(void)
{
int err;

- if (cpumask_weight(cpu_online_mask) < 2) {
+ if (cpumask_weight_lt(cpu_online_mask, 2)) {
pr_info("%s(): skip - only 1 CPU\n", __func__);
return 0;
}
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 64dae70d31f5..1906e3225737 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -575,6 +575,56 @@ static inline unsigned int cpumask_weight(const struct cpumask *srcp)
return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
}

+/**
+ * cpumask_weight_eq - Check if # of bits in *srcp is equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_eq(const struct cpumask *srcp, unsigned int num)
+{
+ return bitmap_weight_eq(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_gt - Check if # of bits in *srcp is greater than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_gt(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_gt(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_ge - Check if # of bits in *srcp is greater than or equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_ge(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_ge(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_lt - Check if # of bits in *srcp is less than a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_lt(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_lt(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
+/**
+ * cpumask_weight_le - Check if # of bits in *srcp is less than or equal to a given number
+ * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @num: the number to check.
+ */
+static inline bool cpumask_weight_le(const struct cpumask *srcp, int num)
+{
+ return bitmap_weight_le(cpumask_bits(srcp), nr_cpumask_bits, num);
+}
+
/**
* cpumask_shift_right - *dstp = *srcp >> n
* @dstp: the cpumask result
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b3ec14227e1..60f7d04a05f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6006,7 +6006,7 @@ static void sched_core_cpu_starting(unsigned int cpu)
WARN_ON_ONCE(rq->core != rq);

/* if we're the first, we'll be our own leader */
- if (cpumask_weight(smt_mask) == 1)
+ if (cpumask_weight_eq(smt_mask, 1))
goto unlock;

/* find the leader */
@@ -6047,7 +6047,7 @@ static void sched_core_cpu_deactivate(unsigned int cpu)
sched_core_lock(cpu, &flags);

/* if we're the last man standing, nothing to do */
- if (cpumask_weight(smt_mask) == 1) {
+ if (cpumask_weight_eq(smt_mask, 1)) {
WARN_ON_ONCE(rq->core != rq);
goto unlock;
}
@@ -9053,7 +9053,7 @@ int sched_cpu_activate(unsigned int cpu)
/*
* When going up, increment the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_inc_cpuslocked(&sched_smt_present);
#endif
set_cpu_active(cpu, true);
@@ -9128,7 +9128,7 @@ int sched_cpu_deactivate(unsigned int cpu)
/*
* When going down, decrement the number of cores with SMT present.
*/
- if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ if (cpumask_weight_eq(cpu_smt_mask(cpu), 2))
static_branch_dec_cpuslocked(&sched_smt_present);

sched_core_cpu_deactivate(cpu);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8478e2a8cd65..79395571599f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -169,7 +169,7 @@ static const unsigned int SD_DEGENERATE_GROUPS_MASK =

static int sd_degenerate(struct sched_domain *sd)
{
- if (cpumask_weight(sched_domain_span(sd)) == 1)
+ if (cpumask_weight_eq(sched_domain_span(sd), 1))
return 1;

/* Following flags need at least 2 groups */
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..32d6629a55b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -648,7 +648,7 @@ void tick_cleanup_dead_cpu(int cpu)
*/
list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
if (cpumask_test_cpu(cpu, dev->cpumask) &&
- cpumask_weight(dev->cpumask) == 1 &&
+ cpumask_weight_eq(dev->cpumask, 1) &&
!tick_is_broadcast_device(dev)) {
BUG_ON(!clockevent_state_detached(dev));
list_del(&dev->list);
--
2.30.2


2021-12-18 21:21:09

by Yury Norov

[permalink] [raw]
Subject: [PATCH 10/17] lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}

Kernel code calls nodes_weight() to compare the weight of nodemask with
a given number. We can do it more efficiently with nodes_weight_{eq, ...}
because conditional nodes_weight may stop traversing the nodemask earlier,
as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/acpi/numa/srat.c | 2 +-
include/linux/nodemask.h | 35 +++++++++++++++++++++++++++++++++++
mm/mempolicy.c | 2 +-
3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 66a0142dc78c..484b9307f8cc 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -67,7 +67,7 @@ int acpi_map_pxm_to_node(int pxm)
node = pxm_to_node_map[pxm];

if (node == NUMA_NO_NODE) {
- if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
+ if (nodes_weight_ge(nodes_found_map, MAX_NUMNODES))
return NUMA_NO_NODE;
node = first_unset_node(nodes_found_map);
__acpi_map_pxm_to_node(pxm, node);
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 567c3ddba2c4..197598e075e9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -38,6 +38,11 @@
* int nodes_empty(mask) Is mask empty (no bits sets)?
* int nodes_full(mask) Is mask full (all bits sets)?
* int nodes_weight(mask) Hamming weight - number of set bits
+ * bool nodes_weight_eq(src, nbits, num) Hamming Weight is equal to num
+ * bool nodes_weight_gt(src, nbits, num) Hamming Weight is greater than num
+ * bool nodes_weight_ge(src, nbits, num) Hamming Weight is greater than or equal to num
+ * bool nodes_weight_lt(src, nbits, num) Hamming Weight is less than num
+ * bool nodes_weight_le(src, nbits, num) Hamming Weight is less than or equal to num
*
* void nodes_shift_right(dst, src, n) Shift right
* void nodes_shift_left(dst, src, n) Shift left
@@ -240,6 +245,36 @@ static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
return bitmap_weight(srcp->bits, nbits);
}

+#define nodes_weight_eq(nodemask, num) __nodes_weight_eq(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_eq(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_eq(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_gt(nodemask, num) __nodes_weight_gt(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_gt(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_gt(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_ge(nodemask, num) __nodes_weight_ge(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_ge(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_ge(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_lt(nodemask, num) __nodes_weight_lt(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_lt(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_lt(srcp->bits, nbits, num);
+}
+
+#define nodes_weight_le(nodemask, num) __nodes_weight_le(&(nodemask), MAX_NUMNODES, (num))
+static inline int __nodes_weight_le(const nodemask_t *srcp, unsigned int nbits, int num)
+{
+ return bitmap_weight_le(srcp->bits, nbits, num);
+}
+
#define nodes_shift_right(dst, src, n) \
__nodes_shift_right(&(dst), &(src), (n), MAX_NUMNODES)
static inline void __nodes_shift_right(nodemask_t *dstp,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a86590b2507d..27817cf2f2a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1157,7 +1157,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
* [0-7] - > [3,4,5] moves only 0,1,2,6,7.
*/

- if ((nodes_weight(*from) != nodes_weight(*to)) &&
+ if (!nodes_weight_eq(*from, nodes_weight(*to)) &&
(node_isset(s, *to)))
continue;

--
2.30.2


2021-12-18 21:21:17

by Yury Norov

[permalink] [raw]
Subject: [PATCH 06/17] all: replace nodes_weight with nodes_empty where appropriate

Kernel code calls nodes_weight() to check if any bit of a given nodemask is
set. We can do it more efficiently with nodes_empty() because nodes_empty()
stops traversing the nodemask as soon as it finds first set bit, while
nodes_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>
---
arch/x86/mm/amdtopology.c | 2 +-
arch/x86/mm/numa_emulation.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 058b2f36b3a6..b3ca7d23e4b0 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -154,7 +154,7 @@ int __init amd_numa_init(void)
node_set(nodeid, numa_nodes_parsed);
}

- if (!nodes_weight(numa_nodes_parsed))
+ if (nodes_empty(numa_nodes_parsed))
return -ENOENT;

/*
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 1a02b791d273..9a9305367fdd 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -123,7 +123,7 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
* Continue to fill physical nodes with fake nodes until there is no
* memory left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
@@ -270,7 +270,7 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
* Fill physical nodes with fake nodes of size until there is no memory
* left on any of them.
*/
- while (nodes_weight(physnode_mask)) {
+ while (!nodes_empty(physnode_mask)) {
for_each_node_mask(i, physnode_mask) {
u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
u64 start, limit, end;
--
2.30.2


2021-12-18 21:21:21

by Yury Norov

[permalink] [raw]
Subject: [PATCH 07/17] lib/bitmap: add bitmap_weight_{cmp,eq,gt,ge,lt,le} functions

Many kernel users use bitmap_weight() to compare the result against
some number or expression:

if (bitmap_weight(...) > 1)
do_something();

It works OK, but may be significantly improved for large bitmaps: if
first few words count set bits to a number greater than given, we can
stop counting and immediately return.

The same idea would work in other direction: if we know that the number
of set bits that we counted so far is small enough, so that it would be
smaller than required number even if all bits of the rest of the bitmap
are set, we can stop counting earlier.

This patch adds new bitmap_weight_cmp() as suggested by Michał Mirosław
and a family of eq, gt, ge, lt and le wrappers to allow this optimization.
The following patches apply new functions where appropriate.

Suggested-by: "Michał Mirosław" <[email protected]> (for bitmap_weight_cmp)
Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitmap.h | 80 ++++++++++++++++++++++++++++++++++++++++++
lib/bitmap.c | 21 +++++++++++
2 files changed, 101 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 7dba0847510c..708e57b32362 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -51,6 +51,12 @@ struct device;
* bitmap_empty(src, nbits) Are all bits zero in *src?
* bitmap_full(src, nbits) Are all bits set in *src?
* bitmap_weight(src, nbits) Hamming Weight: number set bits
+ * bitmap_weight_cmp(src, nbits) compare Hamming Weight with a number
+ * bitmap_weight_eq(src, nbits, num) Hamming Weight == num
+ * bitmap_weight_gt(src, nbits, num) Hamming Weight > num
+ * bitmap_weight_ge(src, nbits, num) Hamming Weight >= num
+ * bitmap_weight_lt(src, nbits, num) Hamming Weight < num
+ * bitmap_weight_le(src, nbits, num) Hamming Weight <= num
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
@@ -162,6 +168,7 @@ int __bitmap_intersects(const unsigned long *bitmap1,
int __bitmap_subset(const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int nbits);
int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);
+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num);
void __bitmap_set(unsigned long *map, unsigned int start, int len);
void __bitmap_clear(unsigned long *map, unsigned int start, int len);

@@ -403,6 +410,79 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int
return __bitmap_weight(src, nbits);
}

+/**
+ * bitmap_weight_cmp - compares number of set bits in @src with @num.
+ * @src: source bitmap
+ * @nbits: length of bitmap in bits
+ * @num: number to compare with
+ *
+ * As opposite to bitmap_weight() this function doesn't necessarily
+ * traverse full bitmap and may return earlier.
+ *
+ * Returns zero if weight of @src is equal to @num;
+ * negative number if weight of @src is less than @num;
+ * positive number if weight of @src is greater than @num;
+ *
+ * NOTES
+ *
+ * Because number of set bits cannot decrease while counting, when user
+ * wants to know if the number of set bits in the bitmap is less than
+ * @num, calling
+ * bitmap_weight_cmp(..., @num) < 0
+ * is potentially less effective than
+ * bitmap_weight_cmp(..., @num - 1) <= 0
+ *
+ * Consider an example:
+ * bitmap_weight_cmp(1000 0000 0000 0000, 1) < 0
+ * ^
+ * stop here
+ *
+ * bitmap_weight_cmp(1000 0000 0000 0000, 0) <= 0
+ * ^
+ * stop here
+ */
+static __always_inline
+int bitmap_weight_cmp(const unsigned long *src, unsigned int nbits, int num)
+{
+ if (num > (int)nbits || num < 0)
+ return -num;
+
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) - num;
+
+ return __bitmap_weight_cmp(src, nbits, num);
+}
+
+static __always_inline
+bool bitmap_weight_eq(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) == 0;
+}
+
+static __always_inline
+bool bitmap_weight_gt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_ge(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_lt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) <= 0;
+}
+
+static __always_inline
+bool bitmap_weight_le(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) <= 0;
+}
+
static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
unsigned int nbits)
{
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 926408883456..fb84ca70c5d9 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -348,6 +348,27 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
}
EXPORT_SYMBOL(__bitmap_weight);

+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ goto out;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ goto out;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+out:
+ return w - num;
+}
+EXPORT_SYMBOL(__bitmap_weight_cmp);
+
void __bitmap_set(unsigned long *map, unsigned int start, int len)
{
unsigned long *p = map + BIT_WORD(start);
--
2.30.2


2021-12-18 21:21:47

by Yury Norov

[permalink] [raw]
Subject: [PATCH 11/17] lib/nodemask: add num_node_state_eq()

Kernel code calls num_node_state() to compare number of nodes with a given
number. The underlying code calls bitmap_weight(), and we can do it more
efficiently with num_node_state_eq because conditional nodes_weight may
stop traversing the nodemask earlier, as soon as condition is met.

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/nodemask.h | 5 +++++
mm/page_alloc.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 197598e075e9..c5014dbf3cce 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -466,6 +466,11 @@ static inline int num_node_state(enum node_states state)
return nodes_weight(node_states[state]);
}

+static inline int num_node_state_eq(enum node_states state, int num)
+{
+ return nodes_weight_eq(node_states[state], num);
+}
+
#define for_each_node_state(__node, __state) \
for_each_node_mask((__node), node_states[__state])

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..71f5652828b8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8323,7 +8323,7 @@ void __init page_alloc_init(void)
int ret;

#ifdef CONFIG_NUMA
- if (num_node_state(N_MEMORY) == 1)
+ if (num_node_state_eq(N_MEMORY, 1))
hashdist = 0;
#endif

--
2.30.2


2021-12-18 21:22:07

by Yury Norov

[permalink] [raw]
Subject: [PATCH 12/17] kernel/cpu.c: fix init_cpu_online

cpu_online_mask has an associate counter of online cpus, which should be
initialized in init_cpu_online()

Fixes: 0c09ab96fc82010 (cpu/hotplug: Cache number of online CPUs)
Signed-off-by: Yury Norov <[email protected]>
---
kernel/cpu.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 407a2568f35e..cd7605204d4d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2616,6 +2616,7 @@ void init_cpu_possible(const struct cpumask *src)
void init_cpu_online(const struct cpumask *src)
{
cpumask_copy(&__cpu_online_mask, src);
+ atomic_set(&__num_online_cpus, cpumask_weight(cpu_online_mask));
}

void set_cpu_online(unsigned int cpu, bool online)
--
2.30.2


2021-12-18 21:22:31

by Yury Norov

[permalink] [raw]
Subject: [PATCH 13/17] kernel/cpu: add num_possible_cpus counter

Similarly to the online cpus, the cpu_possible_mask is actively used
in the kernel. This patch adds a counter for possible cpus, so that
users that call num_possible_cpus() would know the result immediately,
instead of calling the bitmap_weight for the mask underlying.

Suggested-by: Nicholas Piggin <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
---
include/linux/cpumask.h | 30 ++++++++++++++++--------------
kernel/cpu.c | 22 ++++++++++++++++++++++
2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 1906e3225737..0be2504d8e4c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -99,6 +99,7 @@ extern struct cpumask __cpu_dying_mask;
#define cpu_dying_mask ((const struct cpumask *)&__cpu_dying_mask)

extern atomic_t __num_online_cpus;
+extern atomic_t __num_possible_cpus;

extern cpumask_t cpus_booted_once_mask;

@@ -870,19 +871,8 @@ void init_cpu_present(const struct cpumask *src);
void init_cpu_possible(const struct cpumask *src);
void init_cpu_online(const struct cpumask *src);

-static inline void reset_cpu_possible_mask(void)
-{
- bitmap_zero(cpumask_bits(&__cpu_possible_mask), NR_CPUS);
-}
-
-static inline void
-set_cpu_possible(unsigned int cpu, bool possible)
-{
- if (possible)
- cpumask_set_cpu(cpu, &__cpu_possible_mask);
- else
- cpumask_clear_cpu(cpu, &__cpu_possible_mask);
-}
+void set_cpu_possible(unsigned int cpu, bool possible);
+void reset_cpu_possible_mask(void);

static inline void
set_cpu_present(unsigned int cpu, bool present)
@@ -962,7 +952,19 @@ static inline unsigned int num_online_cpus(void)
{
return atomic_read(&__num_online_cpus);
}
-#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
+
+/**
+ * num_possible_cpus() - Read the number of possible CPUs
+ *
+ * Despite the fact that __num_possible_cpus is of type atomic_t, this
+ * interface gives only a momentary snapshot and is not protected against
+ * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
+ * region.
+ */
+static inline unsigned int num_possible_cpus(void)
+{
+ return atomic_read(&__num_possible_cpus);
+}
#define num_present_cpus() cpumask_weight(cpu_present_mask)
#define num_active_cpus() cpumask_weight(cpu_active_mask)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index cd7605204d4d..a0a815911173 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2583,10 +2583,13 @@ EXPORT_SYMBOL(cpu_all_bits);
#ifdef CONFIG_INIT_ALL_POSSIBLE
struct cpumask __cpu_possible_mask __read_mostly
= {CPU_BITS_ALL};
+atomic_t __num_possible_cpus __read_mostly = ATOMIC_INIT(NR_CPUS);
#else
struct cpumask __cpu_possible_mask __read_mostly;
+atomic_t __num_possible_cpus __read_mostly;
#endif
EXPORT_SYMBOL(__cpu_possible_mask);
+EXPORT_SYMBOL(__num_possible_cpus);

struct cpumask __cpu_online_mask __read_mostly;
EXPORT_SYMBOL(__cpu_online_mask);
@@ -2611,6 +2614,7 @@ void init_cpu_present(const struct cpumask *src)
void init_cpu_possible(const struct cpumask *src)
{
cpumask_copy(&__cpu_possible_mask, src);
+ atomic_set(&__num_possible_cpus, cpumask_weight(cpu_possible_mask));
}

void init_cpu_online(const struct cpumask *src)
@@ -2640,6 +2644,24 @@ void set_cpu_online(unsigned int cpu, bool online)
}
}

+void reset_cpu_possible_mask(void)
+{
+ bitmap_zero(cpumask_bits(&__cpu_possible_mask), NR_CPUS);
+ atomic_set(&__num_possible_cpus, 0);
+}
+
+void set_cpu_possible(unsigned int cpu, bool possible)
+{
+ if (possible) {
+ if (!cpumask_test_and_set_cpu(cpu, &__cpu_possible_mask))
+ atomic_inc(&__num_possible_cpus);
+ } else {
+ if (cpumask_test_and_clear_cpu(cpu, &__cpu_possible_mask))
+ atomic_dec(&__num_possible_cpus);
+ }
+}
+EXPORT_SYMBOL(set_cpu_possible);
+
/*
* Activate the first processor.
*/
--
2.30.2


2021-12-18 21:22:34

by Yury Norov

[permalink] [raw]
Subject: [PATCH 14/17] kernel/cpu: add num_present_cpu counter

Similarly to the online cpus, the cpu_present_mask is actively used
in the kernel. This patch adds a counter for present cpus, so that
users that call num_present_cpus() would know the result immediately,
instead of calling the bitmap_weight for the mask.

Suggested-by: Nicholas Piggin <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
---
include/linux/cpumask.h | 25 +++++++++++++++----------
kernel/cpu.c | 16 ++++++++++++++++
2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 0be2504d8e4c..c2a9d15e2cbd 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -100,6 +100,7 @@ extern struct cpumask __cpu_dying_mask;

extern atomic_t __num_online_cpus;
extern atomic_t __num_possible_cpus;
+extern atomic_t __num_present_cpus;

extern cpumask_t cpus_booted_once_mask;

@@ -873,15 +874,7 @@ void init_cpu_online(const struct cpumask *src);

void set_cpu_possible(unsigned int cpu, bool possible);
void reset_cpu_possible_mask(void);
-
-static inline void
-set_cpu_present(unsigned int cpu, bool present)
-{
- if (present)
- cpumask_set_cpu(cpu, &__cpu_present_mask);
- else
- cpumask_clear_cpu(cpu, &__cpu_present_mask);
-}
+void set_cpu_present(unsigned int cpu, bool present);

void set_cpu_online(unsigned int cpu, bool online);

@@ -965,7 +958,19 @@ static inline unsigned int num_possible_cpus(void)
{
return atomic_read(&__num_possible_cpus);
}
-#define num_present_cpus() cpumask_weight(cpu_present_mask)
+
+/**
+ * num_present_cpus() - Read the number of present CPUs
+ *
+ * Despite the fact that __num_present_cpus is of type atomic_t, this
+ * interface gives only a momentary snapshot and is not protected against
+ * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
+ * region.
+ */
+static inline unsigned int num_present_cpus(void)
+{
+ return atomic_read(&__num_present_cpus);
+}
#define num_active_cpus() cpumask_weight(cpu_active_mask)

static inline bool cpu_online(unsigned int cpu)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index a0a815911173..1f7ea1bdde1a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2597,6 +2597,9 @@ EXPORT_SYMBOL(__cpu_online_mask);
struct cpumask __cpu_present_mask __read_mostly;
EXPORT_SYMBOL(__cpu_present_mask);

+atomic_t __num_present_cpus __read_mostly;
+EXPORT_SYMBOL(__num_present_cpus);
+
struct cpumask __cpu_active_mask __read_mostly;
EXPORT_SYMBOL(__cpu_active_mask);

@@ -2609,6 +2612,7 @@ EXPORT_SYMBOL(__num_online_cpus);
void init_cpu_present(const struct cpumask *src)
{
cpumask_copy(&__cpu_present_mask, src);
+ atomic_set(&__num_present_cpus, cpumask_weight(cpu_present_mask));
}

void init_cpu_possible(const struct cpumask *src)
@@ -2662,6 +2666,18 @@ void set_cpu_possible(unsigned int cpu, bool possible)
}
EXPORT_SYMBOL(set_cpu_possible);

+void set_cpu_present(unsigned int cpu, bool present)
+{
+ if (present) {
+ if (!cpumask_test_and_set_cpu(cpu, &__cpu_present_mask))
+ atomic_inc(&__num_present_cpus);
+ } else {
+ if (cpumask_test_and_clear_cpu(cpu, &__cpu_present_mask))
+ atomic_dec(&__num_present_cpus);
+ }
+}
+EXPORT_SYMBOL(set_cpu_present);
+
/*
* Activate the first processor.
*/
--
2.30.2


2021-12-18 21:22:39

by Yury Norov

[permalink] [raw]
Subject: [PATCH 15/17] kernel/cpu: add num_active_cpu counter

Similarly to the online cpus, the cpu_active_mask is actively used
in the kernel. This patch adds a counter for active cpus, so that
users that call num_active_cpus() would know the result immediately,
instead of calling the bitmap_weight for the mask.

Suggested-by: Nicholas Piggin <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
---
include/linux/cpumask.h | 26 +++++++++++++++-----------
kernel/cpu.c | 15 +++++++++++++++
2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2a9d15e2cbd..0add872898f8 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -101,6 +101,7 @@ extern struct cpumask __cpu_dying_mask;
extern atomic_t __num_online_cpus;
extern atomic_t __num_possible_cpus;
extern atomic_t __num_present_cpus;
+extern atomic_t __num_active_cpus;

extern cpumask_t cpus_booted_once_mask;

@@ -875,17 +876,8 @@ void init_cpu_online(const struct cpumask *src);
void set_cpu_possible(unsigned int cpu, bool possible);
void reset_cpu_possible_mask(void);
void set_cpu_present(unsigned int cpu, bool present);
-
void set_cpu_online(unsigned int cpu, bool online);
-
-static inline void
-set_cpu_active(unsigned int cpu, bool active)
-{
- if (active)
- cpumask_set_cpu(cpu, &__cpu_active_mask);
- else
- cpumask_clear_cpu(cpu, &__cpu_active_mask);
-}
+void set_cpu_active(unsigned int cpu, bool active);

static inline void
set_cpu_dying(unsigned int cpu, bool dying)
@@ -971,7 +963,19 @@ static inline unsigned int num_present_cpus(void)
{
return atomic_read(&__num_present_cpus);
}
-#define num_active_cpus() cpumask_weight(cpu_active_mask)
+
+/**
+ * num_active_cpus() - Read the number of active CPUs
+ *
+ * Despite the fact that __num_active_cpus is of type atomic_t, this
+ * interface gives only a momentary snapshot and is not protected against
+ * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
+ * region.
+ */
+static inline unsigned int num_active_cpus(void)
+{
+ return atomic_read(&__num_active_cpus);
+}

static inline bool cpu_online(unsigned int cpu)
{
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1f7ea1bdde1a..62b411d88810 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2603,6 +2603,9 @@ EXPORT_SYMBOL(__num_present_cpus);
struct cpumask __cpu_active_mask __read_mostly;
EXPORT_SYMBOL(__cpu_active_mask);

+atomic_t __num_active_cpus __read_mostly;
+EXPORT_SYMBOL(__num_active_cpus);
+
struct cpumask __cpu_dying_mask __read_mostly;
EXPORT_SYMBOL(__cpu_dying_mask);

@@ -2678,6 +2681,18 @@ void set_cpu_present(unsigned int cpu, bool present)
}
EXPORT_SYMBOL(set_cpu_present);

+void set_cpu_active(unsigned int cpu, bool active)
+{
+ if (active) {
+ if (!cpumask_test_and_set_cpu(cpu, &__cpu_active_mask))
+ atomic_inc(&__num_active_cpus);
+ } else {
+ if (cpumask_test_and_clear_cpu(cpu, &__cpu_active_mask))
+ atomic_dec(&__num_active_cpus);
+ }
+}
+EXPORT_SYMBOL(set_cpu_active);
+
/*
* Activate the first processor.
*/
--
2.30.2


2021-12-18 21:22:50

by Yury Norov

[permalink] [raw]
Subject: [PATCH 16/17] tools/bitmap: sync bitmap_weight

Pull bitmap_weight_{cmp,eq,gt,ge,lt,le} from mother kernel and
use where applicable.

Signed-off-by: Yury Norov <[email protected]>
---
tools/include/linux/bitmap.h | 44 ++++++++++++++++++++++++++++++++++++
tools/lib/bitmap.c | 20 ++++++++++++++++
tools/perf/util/pmu.c | 2 +-
3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index ea97804d04d4..e8ae9a85d555 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -12,6 +12,8 @@
unsigned long name[BITS_TO_LONGS(bits)]

int __bitmap_weight(const unsigned long *bitmap, int bits);
+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits,
+ unsigned int num);
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits);
int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
@@ -68,6 +70,48 @@ static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
return __bitmap_weight(src, nbits);
}

+static __always_inline
+int bitmap_weight_cmp(const unsigned long *src, unsigned int nbits, int num)
+{
+ if (num > (int)nbits || num < 0)
+ return -num;
+
+ if (small_const_nbits(nbits))
+ return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) - num;
+
+ return __bitmap_weight_cmp(src, nbits, num);
+}
+
+static __always_inline
+bool bitmap_weight_eq(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) == 0;
+}
+
+static __always_inline
+bool bitmap_weight_gt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_ge(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) > 0;
+}
+
+static __always_inline
+bool bitmap_weight_lt(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num - 1) <= 0;
+}
+
+static __always_inline
+bool bitmap_weight_le(const unsigned long *src, unsigned int nbits, int num)
+{
+ return bitmap_weight_cmp(src, nbits, num) <= 0;
+}
+
static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index db466ef7be9d..06e58fee8523 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -18,6 +18,26 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
return w;
}

+int __bitmap_weight_cmp(const unsigned long *bitmap, unsigned int bits, int num)
+{
+ unsigned int k, w, lim = bits / BITS_PER_LONG;
+
+ for (k = 0, w = 0; k < lim; k++) {
+ if (w + bits - k * BITS_PER_LONG < num)
+ goto out;
+
+ w += hweight_long(bitmap[k]);
+
+ if (w > num)
+ goto out;
+ }
+
+ if (bits % BITS_PER_LONG)
+ w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+out:
+ return w - num;
+}
+
void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits)
{
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6ae58406f4fc..015ee1321c7c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1314,7 +1314,7 @@ static int pmu_config_term(const char *pmu_name,
*/
if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
if (term->no_value &&
- bitmap_weight(format->bits, PERF_PMU_FORMAT_BITS) > 1) {
+ bitmap_weight_gt(format->bits, PERF_PMU_FORMAT_BITS, 1)) {
if (err) {
parse_events_error__handle(err, term->err_val,
strdup("no value assigned for term"),
--
2.30.2


2021-12-18 21:22:52

by Yury Norov

[permalink] [raw]
Subject: [PATCH 17/17] MAINTAINERS: add cpumask and nodemask files to BITMAP_API

cpumask and nodemask APIs are thin wrappers around basic bitmap API, and
corresponding files are not formally maintained. This patch adds them to
BITMAP_API section, so that bitmap folks would have closer look at it.

Signed-off-by: Yury Norov <[email protected]>
---
MAINTAINERS | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5964e047bc04..ecd41988c871 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3392,10 +3392,14 @@ R: Andy Shevchenko <[email protected]>
R: Rasmus Villemoes <[email protected]>
S: Maintained
F: include/linux/bitmap.h
+F: include/linux/cpumask.h
F: include/linux/find.h
+F: include/linux/nodemask.h
F: lib/bitmap.c
+F: lib/cpumask.c
F: lib/find_bit.c
F: lib/find_bit_benchmark.c
+F: lib/nodemask.c
F: lib/test_bitmap.c
F: tools/include/linux/bitmap.h
F: tools/include/linux/find.h
--
2.30.2


2021-12-18 22:16:18

by Michał Mirosław

[permalink] [raw]
Subject: Re: [PATCH 01/17] all: don't use bitmap_weight() where possible

On Sat, Dec 18, 2021 at 01:19:57PM -0800, Yury Norov wrote:
> Don't call bitmap_weight() if the following code can get by
> without it.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> drivers/net/dsa/b53/b53_common.c | 6 +-----
> drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
> drivers/thermal/intel/intel_powerclamp.c | 9 +++------
> 3 files changed, 5 insertions(+), 16 deletions(-)
[...]

Looks good, but I think this needs to be split per subsystem.

Best Regards
Micha??Miros?aw

2021-12-18 23:28:43

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 01/17] all: don't use bitmap_weight() where possible

On Sat, Dec 18, 2021 at 2:16 PM Michał Mirosław <[email protected]> wrote:
>
> On Sat, Dec 18, 2021 at 01:19:57PM -0800, Yury Norov wrote:
> > Don't call bitmap_weight() if the following code can get by
> > without it.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > drivers/net/dsa/b53/b53_common.c | 6 +-----
> > drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-----
> > drivers/thermal/intel/intel_powerclamp.c | 9 +++------
> > 3 files changed, 5 insertions(+), 16 deletions(-)
> [...]
>
> Looks good,

Does it mean Acked-by, Reviewed-by, or something else?

> but I think this needs to be split per subsystem.

What you ask breaks rules:

Documentation/process/submitting-patches.rst:

Separate each **logical change** into a separate patch.

For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more patches. If your changes include an API update, and a new
driver which uses that new API, separate those into two patches.

On the other hand, if you make a single change to numerous files,
group those changes into a single patch. Thus a single logical change
is contained within a single patch.

This is not a dead rule, refer for example the 96d4f267e40f9 ("Remove
'type' argument from access_ok() functioin.")

Or this: https://lkml.org/lkml/2021/6/14/1736

Thanks,
Yury

2021-12-20 16:41:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 08/17] all: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} where appropriate

On Sat, Dec 18, 2021 at 01:20:04PM -0800, Yury Norov wrote:
> Kernel code calls bitmap_weight() to compare the weight of bitmap with
> a given number. We can do it more efficiently with bitmap_weight_{eq, ...}
> because conditional bitmap_weight may stop traversing the bitmap earlier,
> as soon as condition is met.
>
> This patch replaces bitmap_weight with conditional versions where possible,
> except for small bitmaps which size is not configurable and known at
> constant time. In that case conditional version of bitmap_weight would not
> benefit due to small_const_nbits() optimization; but readability may
> suffer.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
> drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 ++--
> drivers/iio/industrialio-trigger.c | 2 +-
> drivers/memstick/core/ms_block.c | 4 ++--
> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +-
> .../net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
> .../net/ethernet/marvell/octeontx2/nic/otx2_flows.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++-------
> drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx4/fw.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> drivers/perf/thunderx2_pmu.c | 4 ++--
> drivers/staging/media/tegra-video/vi.c | 2 +-
> 13 files changed, 21 insertions(+), 25 deletions(-)

"all" is not how to submit changes to the kernel. Please break them up
into subsystem-specific patches, and send them after your core changes
are accepted.

good luck!

greg k-h

2021-12-21 13:14:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 15/17] kernel/cpu: add num_active_cpu counter

On Sat, Dec 18, 2021 at 01:20:11PM -0800, Yury Norov wrote:
> Similarly to the online cpus, the cpu_active_mask is actively used
> in the kernel. This patch adds a counter for active cpus, so that
> users that call num_active_cpus() would know the result immediately,
> instead of calling the bitmap_weight for the mask.

Who cares about num_active_cpus() ?

> +EXPORT_SYMBOL(set_cpu_active);

NAK, this should *never*ever* be used from a module.


2021-12-21 13:15:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 14/17] kernel/cpu: add num_present_cpu counter

On Sat, Dec 18, 2021 at 01:20:10PM -0800, Yury Norov wrote:
> +EXPORT_SYMBOL(set_cpu_present);

NAK

2021-12-21 13:16:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 13/17] kernel/cpu: add num_possible_cpus counter

On Sat, Dec 18, 2021 at 01:20:09PM -0800, Yury Norov wrote:
> Similarly to the online cpus, the cpu_possible_mask is actively used
> in the kernel. This patch adds a counter for possible cpus, so that
> users that call num_possible_cpus() would know the result immediately,
> instead of calling the bitmap_weight for the mask underlying.

So what user actually cares about performance here enough to warrant
this?


> +EXPORT_SYMBOL(set_cpu_possible);

NAK