The drivers currently rely on irq_set_affinity_hint() to either set the
affinity_hint that is consumed by the userspace and/or to enforce a custom
affinity.
irq_set_affinity_hint() as the name suggests is originally introduced to
only set the affinity_hint to help the userspace in guiding the interrupts
and not the affinity itself. However, since the commit
e2e64a932556 "genirq: Set initial affinity in irq_set_affinity_hint()"
irq_set_affinity_hint() also started applying the provided cpumask (if not
NULL) as the affinity for the interrupts. The issue that this commit was
trying to solve is to allow the drivers to enforce their affinity mask to
distribute the interrupts across the CPUs such that they don't always end
up on CPU0. This issue has been resolved within the irq subsystem since the
commit
a0c9259dc4e1 "irq/matrix: Spread interrupts on allocation"
Hence, there is no need for the drivers to overwrite the affinity to spread
as it is dynamically performed at the time of allocation.
Also, irq_set_affinity_hint() setting affinity unconditionally introduces
issues for the drivers that only want to set their affinity_hint and not the
affinity itself as for these driver interrupts the default_smp_affinity_mask
is completely ignored (for detailed investigation please refer to [1]).
Unfortunately reverting the commit e2e64a932556 is not an option at this
point for two reasons [2]:
- Several drivers for a valid reason (performance) rely on this API to
enforce their own affinity mask
- Until very recently this was the only exported interface that was
available
To clear this out Thomas has come up with the following interfaces:
- irq_set_affinity(): only sets affinity of an IRQ [3]
- irq_update_affinity_hint(): Only sets the hint [4]
- irq_set_affinity_and_hint(): Sets both affinity and the hint mask [4]
The first API is already merged in the linux-next tree and a v2 for the
remaining two is included with this patch-set.
To move to the stage where we can safely get rid of the
irq_set_affinity_hint(), which has been marked deprecated, we have to
move all its consumers to these new interfaces. In this patch-set, I have
done that for a few drivers and will hopefully try to move the remaining of
them in the coming days.
Testing
-------
In terms of testing, I have performed some basic testing on x86 to verify
things such as the interrupts are evenly spread on all CPUs, hint mask is
correctly set etc. for the drivers - i40e, iavf, mlx5, mlx4, ixgbe, i40iw
and enic on top of:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
master - 5.13.0-rc6
So more testing is probably required for these and the drivers that I didn't
test and any help will be much appreciated.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/linux-pci/[email protected]/
[3] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[4] https://lore.kernel.org/patchwork/patch/1434326/
Nitesh Narayan Lal (13):
iavf: Use irq_update_affinity_hint
i40e: Use irq_update_affinity_hint
scsi: megaraid_sas: Use irq_set_affinity_and_hint
scsi: mpt3sas: Use irq_set_affinity_and_hint
RDMA/i40iw: Use irq_update_affinity_hint
enic: Use irq_update_affinity_hint
be2net: Use irq_update_affinity_hint
ixgbe: Use irq_update_affinity_hint
mailbox: Use irq_update_affinity_hint
scsi: lpfc: Use irq_set_affinity
hinic: Use irq_set_affinity_and_hint
net/mlx5: Use irq_update_affinity_hint
net/mlx4: Use irq_update_affinity_hint
Thomas Gleixner (1):
genirq: Provide new interfaces for affinity hints
drivers/infiniband/hw/i40iw/i40iw_main.c | 4 +-
drivers/mailbox/bcm-flexrm-mailbox.c | 4 +-
drivers/net/ethernet/cisco/enic/enic_main.c | 6 +--
drivers/net/ethernet/emulex/benet/be_main.c | 4 +-
drivers/net/ethernet/huawei/hinic/hinic_rx.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++--
drivers/net/ethernet/intel/iavf/iavf_main.c | 8 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +--
drivers/net/ethernet/mellanox/mlx4/eq.c | 6 +--
.../net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 +--
drivers/scsi/lpfc/lpfc_init.c | 4 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 25 ++++++-----
drivers/scsi/mpt3sas/mpt3sas_base.c | 15 ++++---
include/linux/interrupt.h | 41 ++++++++++++++++++-
kernel/irq/manage.c | 8 ++--
15 files changed, 95 insertions(+), 54 deletions(-)
--
The driver uses irq_set_affinity_hint() for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides for the i40e interrupts
The latter is done to ensure that all the interrupts are evenly spread
across all available CPUs. However, since commit a0c9259dc4e1 ("irq/matrix:
Spread interrupts on allocation") the spreading of interrupts is
dynamically performed at the time of allocation. Hence, there is no need
for the drivers to enforce their own affinity for the spreading of
interrupts.
Also, irq_set_affinity_hint() applying the provided cpumask as an affinity
for the interrupt is an undocumented side effect. To remove this side
effect irq_set_affinity_hint() has been marked as deprecated and new
interfaces have been introduced. Hence, replace the irq_set_affinity_hint()
with the new interface irq_update_affinity_hint() that only sets the
pointer for the affinity_hint.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 704e474879c5..a4439a86aeb8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3873,10 +3873,10 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
*
* get_cpu_mask returns a static constant mask with
* a permanent lifetime so it's ok to pass to
- * irq_set_affinity_hint without making a copy.
+ * irq_update_affinity_hint without making a copy.
*/
cpu = cpumask_local_spread(q_vector->v_idx, -1);
- irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
+ irq_update_affinity_hint(irq_num, get_cpu_mask(cpu));
}
vsi->irqs_ready = true;
@@ -3887,7 +3887,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
vector--;
irq_num = pf->msix_entries[base + vector].vector;
irq_set_affinity_notifier(irq_num, NULL);
- irq_set_affinity_hint(irq_num, NULL);
+ irq_update_affinity_hint(irq_num, NULL);
free_irq(irq_num, &vsi->q_vectors[vector]);
}
return err;
@@ -4695,7 +4695,7 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
/* clear the affinity notifier in the IRQ descriptor */
irq_set_affinity_notifier(irq_num, NULL);
/* remove our suggested affinity mask for this IRQ */
- irq_set_affinity_hint(irq_num, NULL);
+ irq_update_affinity_hint(irq_num, NULL);
synchronize_irq(irq_num);
free_irq(irq_num, vsi->q_vectors[i]);
--
2.27.0
The driver uses irq_set_affinity_hint() specifically for the high IOPS
queue interrupts for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides
The driver enforces its own affinity to bind the high IOPS queue interrupts
to the local NUMA node. However, irq_set_affinity_hint() applying the
provided cpumask as an affinity for the interrupt is an undocumented side
effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
that clearly indicates the purpose of the usage and is meant to apply the
affinity and set the affinity_hint pointer. Also, replace
irq_set_affinity_hint() with irq_update_affinity_hint() when only
affinity_hint needs to be updated.
Change the megasas_set_high_iops_queue_affinity_hint function name to
megasas_set_high_iops_queue_affinity_and_hint to clearly indicate that the
function is setting both affinity and affinity_hint.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 25 ++++++++++++++---------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 4d4e9dbe5193..54f4eac09589 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5666,7 +5666,7 @@ megasas_setup_irqs_msix(struct megasas_instance *instance, u8 is_probe)
"Failed to register IRQ for vector %d.\n", i);
for (j = 0; j < i; j++) {
if (j < instance->low_latency_index_start)
- irq_set_affinity_hint(
+ irq_update_affinity_hint(
pci_irq_vector(pdev, j), NULL);
free_irq(pci_irq_vector(pdev, j),
&instance->irq_context[j]);
@@ -5709,7 +5709,7 @@ megasas_destroy_irqs(struct megasas_instance *instance) {
if (instance->msix_vectors)
for (i = 0; i < instance->msix_vectors; i++) {
if (i < instance->low_latency_index_start)
- irq_set_affinity_hint(
+ irq_update_affinity_hint(
pci_irq_vector(instance->pdev, i), NULL);
free_irq(pci_irq_vector(instance->pdev, i),
&instance->irq_context[i]);
@@ -5840,22 +5840,27 @@ int megasas_get_device_list(struct megasas_instance *instance)
}
/**
- * megasas_set_high_iops_queue_affinity_hint - Set affinity hint for high IOPS queues
- * @instance: Adapter soft state
- * return: void
+ * megasas_set_high_iops_queue_affinity_and_hint - Set affinity and hint
+ * for high IOPS queues
+ * @instance: Adapter soft state
+ * return: void
*/
static inline void
-megasas_set_high_iops_queue_affinity_hint(struct megasas_instance *instance)
+megasas_set_high_iops_queue_affinity_and_hint(struct megasas_instance *instance)
{
int i;
+ unsigned int irq;
int local_numa_node;
+ const struct cpumask *mask;
if (instance->perf_mode == MR_BALANCED_PERF_MODE) {
local_numa_node = dev_to_node(&instance->pdev->dev);
- for (i = 0; i < instance->low_latency_index_start; i++)
- irq_set_affinity_hint(pci_irq_vector(instance->pdev, i),
- cpumask_of_node(local_numa_node));
+ for (i = 0; i < instance->low_latency_index_start; i++) {
+ irq = pci_irq_vector(instance->pdev, i);
+ mask = cpumask_of_node(local_numa_node);
+ irq_update_affinity_hint(irq, mask);
+ }
}
}
@@ -5944,7 +5949,7 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance)
instance->msix_vectors = 0;
if (instance->smp_affinity_enable)
- megasas_set_high_iops_queue_affinity_hint(instance);
+ megasas_set_high_iops_queue_affinity_and_hint(instance);
}
/**
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/infiniband/hw/i40iw/i40iw_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_main.c b/drivers/infiniband/hw/i40iw/i40iw_main.c
index b496f30ce066..433d91c30cae 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_main.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_main.c
@@ -266,7 +266,7 @@ static void i40iw_disable_irq(struct i40iw_sc_dev *dev,
i40iw_wr32(dev->hw, I40E_PFINT_DYN_CTLN(msix_vec->idx - 1), 0);
else
i40iw_wr32(dev->hw, I40E_VFINT_DYN_CTLN1(msix_vec->idx - 1), 0);
- irq_set_affinity_hint(msix_vec->irq, NULL);
+ irq_update_affinity_hint(msix_vec->irq, NULL);
free_irq(msix_vec->irq, dev_id);
}
@@ -696,7 +696,7 @@ static enum i40iw_status_code i40iw_configure_ceq_vector(struct i40iw_device *iw
cpumask_clear(&msix_vec->mask);
cpumask_set_cpu(msix_vec->cpu_affinity, &msix_vec->mask);
- irq_set_affinity_hint(msix_vec->irq, &msix_vec->mask);
+ irq_update_affinity_hint(msix_vec->irq, &msix_vec->mask);
if (status) {
i40iw_pr_err("ceq irq config fail\n");
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/cisco/enic/enic_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index d0a8f7106958..29b7f0ab6b5e 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -150,10 +150,10 @@ static void enic_set_affinity_hint(struct enic *enic)
!cpumask_available(enic->msix[i].affinity_mask) ||
cpumask_empty(enic->msix[i].affinity_mask))
continue;
- err = irq_set_affinity_hint(enic->msix_entry[i].vector,
+ err = irq_update_affinity_hint(enic->msix_entry[i].vector,
enic->msix[i].affinity_mask);
if (err)
- netdev_warn(enic->netdev, "irq_set_affinity_hint failed, err %d\n",
+ netdev_warn(enic->netdev, "irq_update_affinity_hint failed, err %d\n",
err);
}
@@ -173,7 +173,7 @@ static void enic_unset_affinity_hint(struct enic *enic)
int i;
for (i = 0; i < enic->intr_count; i++)
- irq_set_affinity_hint(enic->msix_entry[i].vector, NULL);
+ irq_update_affinity_hint(enic->msix_entry[i].vector, NULL);
}
static int enic_udp_tunnel_set_port(struct net_device *netdev,
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/emulex/benet/be_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index b6eba29d8e99..7b7d97fcf008 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3491,7 +3491,7 @@ static int be_msix_register(struct be_adapter *adapter)
if (status)
goto err_msix;
- irq_set_affinity_hint(vec, eqo->affinity_mask);
+ irq_update_affinity_hint(vec, eqo->affinity_mask);
}
return 0;
@@ -3552,7 +3552,7 @@ static void be_irq_unregister(struct be_adapter *adapter)
/* MSIx */
for_all_evt_queues(adapter, eqo, i) {
vec = be_msix_vec_get(adapter, eqo);
- irq_set_affinity_hint(vec, NULL);
+ irq_update_affinity_hint(vec, NULL);
free_irq(vec, eqo);
}
--
2.27.0
The driver uses irq_set_affinity_hint() to:
- Set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- Enforce affinity
As per commit 6ac17fe8c14a ("mailbox: bcm-flexrm-mailbox: Set IRQ affinity
hint for FlexRM ring IRQs") the latter is done to ensure that the FlexRM
ring interrupts are evenly spread across all available CPUs. However, since
commit a0c9259dc4e1 ("irq/matrix: Spread interrupts on allocation") the
spreading of interrupts is dynamically performed at the time of allocation.
Hence, there is no need for the drivers to enforce their own affinity for
the spreading of interrupts.
Also, irq_set_affinity_hint() applying the provided cpumask as an affinity
for the interrupt is an undocumented side effect. To remove this side
effect irq_set_affinity_hint() has been marked as deprecated and new
interfaces have been introduced. Hence, replace the irq_set_affinity_hint()
with the new interface irq_update_affinity_hint() that only sets the
affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/mailbox/bcm-flexrm-mailbox.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
index b4f33dc399a0..abdd06d1986a 100644
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1298,7 +1298,7 @@ static int flexrm_startup(struct mbox_chan *chan)
val = (num_online_cpus() < val) ? val / num_online_cpus() : 1;
cpumask_set_cpu((ring->num / val) % num_online_cpus(),
&ring->irq_aff_hint);
- ret = irq_set_affinity_hint(ring->irq, &ring->irq_aff_hint);
+ ret = irq_update_affinity_hint(ring->irq, &ring->irq_aff_hint);
if (ret) {
dev_err(ring->mbox->dev,
"failed to set IRQ affinity hint for ring%d\n",
@@ -1425,7 +1425,7 @@ static void flexrm_shutdown(struct mbox_chan *chan)
/* Release IRQ */
if (ring->irq_requested) {
- irq_set_affinity_hint(ring->irq, NULL);
+ irq_update_affinity_hint(ring->irq, NULL);
free_irq(ring->irq, ring);
ring->irq_requested = false;
}
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2ac5b82676f3..b66040319bc6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3243,7 +3243,7 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
/* If Flow Director is enabled, set interrupt affinity */
if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
/* assign the mask for this irq */
- irq_set_affinity_hint(entry->vector,
+ irq_update_affinity_hint(entry->vector,
&q_vector->affinity_mask);
}
}
@@ -3260,7 +3260,7 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
free_queue_irqs:
while (vector) {
vector--;
- irq_set_affinity_hint(adapter->msix_entries[vector].vector,
+ irq_update_affinity_hint(adapter->msix_entries[vector].vector,
NULL);
free_irq(adapter->msix_entries[vector].vector,
adapter->q_vector[vector]);
@@ -3394,7 +3394,7 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
continue;
/* clear the affinity_mask in the IRQ descriptor */
- irq_set_affinity_hint(entry->vector, NULL);
+ irq_update_affinity_hint(entry->vector, NULL);
free_irq(entry->vector, q_vector);
}
--
2.27.0
The driver uses irq_set_affinity_hint() to:
- Set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- Enforce affinity
As per commit 352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu
for NUMA"), the hinic driver enforces its own affinity to bind IRQs to the
local NUMA node. However, irq_set_affinity_hint() applying the provided
cpumask as an affinity for the interrupt is an undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked as
deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface
irq_set_affinity_and_hint() that applies the affinity and updates the
affinity_hint pointer. Also, use irq_update_affinity() when only
affinity_hint needs to be updated.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/huawei/hinic/hinic_rx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index cce08647b9b2..c6cac4bbdb49 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -547,7 +547,7 @@ static int rx_request_irq(struct hinic_rxq *rxq)
goto err_req_irq;
cpumask_set_cpu(qp->q_id % num_online_cpus(), &rq->affinity_mask);
- err = irq_set_affinity_hint(rq->irq, &rq->affinity_mask);
+ err = irq_set_affinity_and_hint(rq->irq, &rq->affinity_mask);
if (err)
goto err_irq_affinity;
@@ -564,7 +564,7 @@ static void rx_free_irq(struct hinic_rxq *rxq)
{
struct hinic_rq *rq = rxq->rq;
- irq_set_affinity_hint(rq->irq, NULL);
+ irq_update_affinity_hint(rq->irq, NULL);
free_irq(rq->irq, rxq);
rx_del_napi(rxq);
}
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index c3373fb1cd7f..a1b9434f4e25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -276,8 +276,8 @@ static int set_comp_irq_affinity_hint(struct mlx5_core_dev *mdev, int i)
cpumask_set_cpu(cpumask_local_spread(i, mdev->priv.numa_node),
irq->mask);
if (IS_ENABLED(CONFIG_SMP) &&
- irq_set_affinity_hint(irqn, irq->mask))
- mlx5_core_warn(mdev, "irq_set_affinity_hint failed, irq 0x%.4x",
+ irq_update_affinity_hint(irqn, irq->mask))
+ mlx5_core_warn(mdev, "irq_update_affinity_hint failed, irq 0x%.4x",
irqn);
return 0;
@@ -291,7 +291,7 @@ static void clear_comp_irq_affinity_hint(struct mlx5_core_dev *mdev, int i)
irq = mlx5_irq_get(mdev, vecidx);
irqn = pci_irq_vector(mdev->pdev, vecidx);
- irq_set_affinity_hint(irqn, NULL);
+ irq_update_affinity_hint(irqn, NULL);
free_cpumask_var(irq->mask);
}
--
2.27.0
On 2021-06-17 19:22, Nitesh Narayan Lal wrote:
> The driver uses irq_set_affinity_hint() specifically for the high IOPS
> queue interrupts for two purposes:
>
> - To set the affinity_hint which is consumed by the userspace for
> distributing the interrupts
>
> - To apply an affinity that it provides
>
> The driver enforces its own affinity to bind the high IOPS queue interrupts
> to the local NUMA node. However, irq_set_affinity_hint() applying the
> provided cpumask as an affinity for the interrupt is an undocumented side
> effect.
>
> To remove this side effect irq_set_affinity_hint() has been marked
> as deprecated and new interfaces have been introduced. Hence, replace the
> irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
> that clearly indicates the purpose of the usage and is meant to apply the
> affinity and set the affinity_hint pointer. Also, replace
> irq_set_affinity_hint() with irq_update_affinity_hint() when only
> affinity_hint needs to be updated.
>
> Change the megasas_set_high_iops_queue_affinity_hint function name to
> megasas_set_high_iops_queue_affinity_and_hint to clearly indicate that the
> function is setting both affinity and affinity_hint.
>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> ---
> drivers/scsi/megaraid/megaraid_sas_base.c | 25 ++++++++++++++---------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 4d4e9dbe5193..54f4eac09589 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -5666,7 +5666,7 @@ megasas_setup_irqs_msix(struct megasas_instance *instance, u8 is_probe)
> "Failed to register IRQ for vector %d.\n", i);
> for (j = 0; j < i; j++) {
> if (j < instance->low_latency_index_start)
> - irq_set_affinity_hint(
> + irq_update_affinity_hint(
> pci_irq_vector(pdev, j), NULL);
> free_irq(pci_irq_vector(pdev, j),
> &instance->irq_context[j]);
> @@ -5709,7 +5709,7 @@ megasas_destroy_irqs(struct megasas_instance *instance) {
> if (instance->msix_vectors)
> for (i = 0; i < instance->msix_vectors; i++) {
> if (i < instance->low_latency_index_start)
> - irq_set_affinity_hint(
> + irq_update_affinity_hint(
> pci_irq_vector(instance->pdev, i), NULL);
> free_irq(pci_irq_vector(instance->pdev, i),
> &instance->irq_context[i]);
> @@ -5840,22 +5840,27 @@ int megasas_get_device_list(struct megasas_instance *instance)
> }
>
> /**
> - * megasas_set_high_iops_queue_affinity_hint - Set affinity hint for high IOPS queues
> - * @instance: Adapter soft state
> - * return: void
> + * megasas_set_high_iops_queue_affinity_and_hint - Set affinity and hint
> + * for high IOPS queues
> + * @instance: Adapter soft state
> + * return: void
> */
> static inline void
> -megasas_set_high_iops_queue_affinity_hint(struct megasas_instance *instance)
> +megasas_set_high_iops_queue_affinity_and_hint(struct megasas_instance *instance)
> {
> int i;
> + unsigned int irq;
> int local_numa_node;
> + const struct cpumask *mask;
>
> if (instance->perf_mode == MR_BALANCED_PERF_MODE) {
> local_numa_node = dev_to_node(&instance->pdev->dev);
Drive-by nit: you could assign mask in this scope.
> - for (i = 0; i < instance->low_latency_index_start; i++)
> - irq_set_affinity_hint(pci_irq_vector(instance->pdev, i),
> - cpumask_of_node(local_numa_node));
> + for (i = 0; i < instance->low_latency_index_start; i++) {
> + irq = pci_irq_vector(instance->pdev, i);
> + mask = cpumask_of_node(local_numa_node);
> + irq_update_affinity_hint(irq, mask);
And this doesn't seem to match what the commit message says?
Robin.
> + }
> }
> }
>
> @@ -5944,7 +5949,7 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance)
> instance->msix_vectors = 0;
>
> if (instance->smp_affinity_enable)
> - megasas_set_high_iops_queue_affinity_hint(instance);
> + megasas_set_high_iops_queue_affinity_and_hint(instance);
> }
>
> /**
>
On Thu, Jun 17, 2021 at 3:31 PM Robin Murphy <[email protected]> wrote:
>
> On 2021-06-17 19:22, Nitesh Narayan Lal wrote:
> > The driver uses irq_set_affinity_hint() specifically for the high IOPS
> > queue interrupts for two purposes:
> >
> > - To set the affinity_hint which is consumed by the userspace for
> > distributing the interrupts
> >
> > - To apply an affinity that it provides
> >
> > The driver enforces its own affinity to bind the high IOPS queue interrupts
> > to the local NUMA node. However, irq_set_affinity_hint() applying the
> > provided cpumask as an affinity for the interrupt is an undocumented side
> > effect.
> >
> > To remove this side effect irq_set_affinity_hint() has been marked
> > as deprecated and new interfaces have been introduced. Hence, replace the
> > irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
> > that clearly indicates the purpose of the usage and is meant to apply the
> > affinity and set the affinity_hint pointer. Also, replace
> > irq_set_affinity_hint() with irq_update_affinity_hint() when only
> > affinity_hint needs to be updated.
> >
> > Change the megasas_set_high_iops_queue_affinity_hint function name to
> > megasas_set_high_iops_queue_affinity_and_hint to clearly indicate that the
> > function is setting both affinity and affinity_hint.
> >
> > Signed-off-by: Nitesh Narayan Lal <[email protected]>
> > ---
> > drivers/scsi/megaraid/megaraid_sas_base.c | 25 ++++++++++++++---------
> > 1 file changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> > index 4d4e9dbe5193..54f4eac09589 100644
> > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > @@ -5666,7 +5666,7 @@ megasas_setup_irqs_msix(struct megasas_instance *instance, u8 is_probe)
> > "Failed to register IRQ for vector %d.\n", i);
> > for (j = 0; j < i; j++) {
> > if (j < instance->low_latency_index_start)
> > - irq_set_affinity_hint(
> > + irq_update_affinity_hint(
> > pci_irq_vector(pdev, j), NULL);
> > free_irq(pci_irq_vector(pdev, j),
> > &instance->irq_context[j]);
> > @@ -5709,7 +5709,7 @@ megasas_destroy_irqs(struct megasas_instance *instance) {
> > if (instance->msix_vectors)
> > for (i = 0; i < instance->msix_vectors; i++) {
> > if (i < instance->low_latency_index_start)
> > - irq_set_affinity_hint(
> > + irq_update_affinity_hint(
> > pci_irq_vector(instance->pdev, i), NULL);
> > free_irq(pci_irq_vector(instance->pdev, i),
> > &instance->irq_context[i]);
> > @@ -5840,22 +5840,27 @@ int megasas_get_device_list(struct megasas_instance *instance)
> > }
> >
> > /**
> > - * megasas_set_high_iops_queue_affinity_hint - Set affinity hint for high IOPS queues
> > - * @instance: Adapter soft state
> > - * return: void
> > + * megasas_set_high_iops_queue_affinity_and_hint - Set affinity and hint
> > + * for high IOPS queues
> > + * @instance: Adapter soft state
> > + * return: void
> > */
> > static inline void
> > -megasas_set_high_iops_queue_affinity_hint(struct megasas_instance *instance)
> > +megasas_set_high_iops_queue_affinity_and_hint(struct megasas_instance *instance)
> > {
> > int i;
> > + unsigned int irq;
> > int local_numa_node;
> > + const struct cpumask *mask;
> >
> > if (instance->perf_mode == MR_BALANCED_PERF_MODE) {
> > local_numa_node = dev_to_node(&instance->pdev->dev);
>
> Drive-by nit: you could assign mask in this scope.
>
Sure
> > - for (i = 0; i < instance->low_latency_index_start; i++)
> > - irq_set_affinity_hint(pci_irq_vector(instance->pdev, i),
> > - cpumask_of_node(local_numa_node));
> > + for (i = 0; i < instance->low_latency_index_start; i++) {
> > + irq = pci_irq_vector(instance->pdev, i);
> > + mask = cpumask_of_node(local_numa_node);
> > + irq_update_affinity_hint(irq, mask);
>
> And this doesn't seem to match what the commit message says?
>
Clearly, thanks for catching it.
This should be irq_set_affinity_and_hint().
> Robin.
>
> > + }
> > }
> > }
> >
> > @@ -5944,7 +5949,7 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance)
> > instance->msix_vectors = 0;
> >
> > if (instance->smp_affinity_enable)
> > - megasas_set_high_iops_queue_affinity_hint(instance);
> > + megasas_set_high_iops_queue_affinity_and_hint(instance);
> > }
> >
> > /**
> >
>
--
Thanks
Nitesh
From: Thomas Gleixner <[email protected]>
The discussion about removing the side effect of irq_set_affinity_hint() of
actually applying the cpumask (if not NULL) as affinity to the interrupt,
unearthed a few unpleasantries:
1) The modular perf drivers rely on the current behaviour for the very
wrong reasons.
2) While none of the other drivers prevents user space from changing
the affinity, a cursorily inspection shows that there are at least
expectations in some drivers.
#1 needs to be cleaned up anyway, so that's not a problem
#2 might result in subtle regressions especially when irqbalanced (which
nowadays ignores the affinity hint) is disabled.
Provide new interfaces:
irq_update_affinity_hint() - Only sets the affinity hint pointer
irq_set_affinity_and_hint() - Set the pointer and apply the affinity to
the interrupt
Make irq_set_affinity_hint() a wrapper around irq_apply_affinity_hint() and
document it to be phased out.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Nitesh Narayan Lal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/interrupt.h | 41 ++++++++++++++++++++++++++++++++++++++-
kernel/irq/manage.c | 8 ++++----
2 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 2ed65b01c961..4ca491a76033 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -328,7 +328,46 @@ extern int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask);
extern int irq_can_set_affinity(unsigned int irq);
extern int irq_select_affinity(unsigned int irq);
-extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+extern int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
+ bool setaffinity);
+
+/**
+ * irq_update_affinity_hint - Update the affinity hint
+ * @irq: Interrupt to update
+ * @cpumask: cpumask pointer (NULL to clear the hint)
+ *
+ * Updates the affinity hint, but does not change the affinity of the interrupt.
+ */
+static inline int
+irq_update_affinity_hint(unsigned int irq, const struct cpumask *m)
+{
+ return __irq_apply_affinity_hint(irq, m, false);
+}
+
+/**
+ * irq_set_affinity_and_hint - Update the affinity hint and apply the provided
+ * cpumask to the interrupt
+ * @irq: Interrupt to update
+ * @cpumask: cpumask pointer (NULL to clear the hint)
+ *
+ * Updates the affinity hint and if @cpumask is not NULL it applies it as
+ * the affinity of that interrupt.
+ */
+static inline int
+irq_set_affinity_and_hint(unsigned int irq, const struct cpumask *m)
+{
+ return __irq_apply_affinity_hint(irq, m, true);
+}
+
+/*
+ * Deprecated. Use irq_update_affinity_hint() or irq_set_affinity_and_hint()
+ * instead.
+ */
+static inline int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
+{
+ return irq_set_affinity_and_hint(irq, m);
+}
+
extern int irq_update_affinity_desc(unsigned int irq,
struct irq_affinity_desc *affinity);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ef30b4762947..837b63e63111 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -487,7 +487,8 @@ int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask)
}
EXPORT_SYMBOL_GPL(irq_force_affinity);
-int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
+int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
+ bool setaffinity)
{
unsigned long flags;
struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
@@ -496,12 +497,11 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
return -EINVAL;
desc->affinity_hint = m;
irq_put_desc_unlock(desc, flags);
- /* set the initial affinity to prevent every interrupt being on CPU0 */
- if (m)
+ if (m && setaffinity)
__irq_set_affinity(irq, m, false);
return 0;
}
-EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
+EXPORT_SYMBOL_GPL(__irq_apply_affinity_hint);
static void irq_affinity_notify(struct work_struct *work)
{
--
2.27.0
The driver uses irq_set_affinity_hint() for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides for the iavf interrupts
The latter is done to ensure that all the interrupts are evenly spread
across all available CPUs. However, since commit a0c9259dc4e1 ("irq/matrix:
Spread interrupts on allocation") the spreading of interrupts is
dynamically performed at the time of allocation. Hence, there is no need
for the drivers to enforce their own affinity for the spreading of
interrupts.
Also, irq_set_affinity_hint() applying the provided cpumask as an affinity
for the interrupt is an undocumented side effect. To remove this side
effect irq_set_affinity_hint() has been marked as deprecated and new
interfaces have been introduced. Hence, replace the irq_set_affinity_hint()
with the new interface irq_update_affinity_hint() that only sets the
pointer for the affinity_hint.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index e612c24fa384..f2e8fab53cb9 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -449,10 +449,10 @@ iavf_request_traffic_irqs(struct iavf_adapter *adapter, char *basename)
irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
/* Spread the IRQ affinity hints across online CPUs. Note that
* get_cpu_mask returns a mask with a permanent lifetime so
- * it's safe to use as a hint for irq_set_affinity_hint.
+ * it's safe to use as a hint for irq_update_affinity_hint.
*/
cpu = cpumask_local_spread(q_vector->v_idx, -1);
- irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
+ irq_update_affinity_hint(irq_num, get_cpu_mask(cpu));
}
return 0;
@@ -462,7 +462,7 @@ iavf_request_traffic_irqs(struct iavf_adapter *adapter, char *basename)
vector--;
irq_num = adapter->msix_entries[vector + NONQ_VECS].vector;
irq_set_affinity_notifier(irq_num, NULL);
- irq_set_affinity_hint(irq_num, NULL);
+ irq_update_affinity_hint(irq_num, NULL);
free_irq(irq_num, &adapter->q_vectors[vector]);
}
return err;
@@ -514,7 +514,7 @@ static void iavf_free_traffic_irqs(struct iavf_adapter *adapter)
for (vector = 0; vector < q_vectors; vector++) {
irq_num = adapter->msix_entries[vector + NONQ_VECS].vector;
irq_set_affinity_notifier(irq_num, NULL);
- irq_set_affinity_hint(irq_num, NULL);
+ irq_update_affinity_hint(irq_num, NULL);
free_irq(irq_num, &adapter->q_vectors[vector]);
}
}
--
2.27.0
The driver uses irq_set_affinity_hint() specifically for the high IOPS
queue interrupts for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides
The driver enforces its own affinity to bind the high IOPS queue interrupts
to the local NUMA node. However, irq_set_affinity_hint() applying the
provided cpumask as an affinity (if not NULL) for the interrupt is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
that clearly indicates the purpose of the usage and is meant to apply the
affinity and set the affinity_hint pointer. Also, replace
irq_set_affinity_hint() with irq_update_affinity_hint() when only
affinity_hint needs to be updated.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 5779f313f6f8..c112c30577bb 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2998,8 +2998,8 @@ _base_free_irq(struct MPT3SAS_ADAPTER *ioc)
list_for_each_entry_safe(reply_q, next, &ioc->reply_queue_list, list) {
list_del(&reply_q->list);
if (ioc->smp_affinity_enable)
- irq_set_affinity_hint(pci_irq_vector(ioc->pdev,
- reply_q->msix_index), NULL);
+ irq_update_affinity_hint(pci_irq_vector(ioc->pdev,
+ reply_q->msix_index), NULL);
free_irq(pci_irq_vector(ioc->pdev, reply_q->msix_index),
reply_q);
kfree(reply_q);
@@ -3055,15 +3055,13 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index)
* @ioc: per adapter object
*
* The enduser would need to set the affinity via /proc/irq/#/smp_affinity
- *
- * It would nice if we could call irq_set_affinity, however it is not
- * an exported symbol
*/
static void
_base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
{
- unsigned int cpu, nr_cpus, nr_msix, index = 0;
+ unsigned int cpu, nr_cpus, nr_msix, index = 0, irq;
struct adapter_reply_queue *reply_q;
+ const struct cpumask *mask;
int local_numa_node;
if (!_base_is_controller_msix_enabled(ioc))
@@ -3090,8 +3088,9 @@ _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
local_numa_node = dev_to_node(&ioc->pdev->dev);
for (index = 0; index < ioc->high_iops_queues;
index++) {
- irq_set_affinity_hint(pci_irq_vector(ioc->pdev,
- index), cpumask_of_node(local_numa_node));
+ irq = pci_irq_vector(ioc->pdev, index);
+ mask = cpumask_of_node(local_numa_node);
+ irq_set_affinity_and_hint(irq, mask);
}
}
--
2.27.0
The driver uses irq_set_affinity_hint to set the affinity for the lpfc
interrupts to a mask corresponding to the local NUMA node to avoid
performance overhead on AMD architectures.
However, irq_set_affinity_hint() setting the affinity is an undocumented
side effect that this function also sets the affinity under the hood.
To remove this side effect irq_set_affinity_hint() has been marked as
deprecated and new interfaces have been introduced.
Also, as per the commit dcaa21367938 ("scsi: lpfc: Change default IRQ model
on AMD architectures"):
"On AMD architecture, revert the irq allocation to the normal style
(non-managed) and then use irq_set_affinity_hint() to set the cpu affinity
and disable user-space rebalancing."
we don't really need to set the affinity_hint as user-space rebalancing for
the lpfc interrupts is not desired.
Hence, replace the irq_set_affinity_hint() with irq_set_affinity() which
only applies the affinity for the interrupts.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/scsi/lpfc/lpfc_init.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 5f018d02bf56..d6e48414018d 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11360,7 +11360,7 @@ lpfc_irq_set_aff(struct lpfc_hba_eq_hdl *eqhdl, unsigned int cpu)
cpumask_clear(&eqhdl->aff_mask);
cpumask_set_cpu(cpu, &eqhdl->aff_mask);
irq_set_status_flags(eqhdl->irq, IRQ_NO_BALANCING);
- irq_set_affinity_hint(eqhdl->irq, &eqhdl->aff_mask);
+ irq_set_affinity(eqhdl->irq, &eqhdl->aff_mask);
}
/**
@@ -11649,7 +11649,6 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
for (--index; index >= 0; index--) {
eqhdl = lpfc_get_eq_hdl(index);
lpfc_irq_clear_aff(eqhdl);
- irq_set_affinity_hint(eqhdl->irq, NULL);
free_irq(eqhdl->irq, eqhdl);
}
@@ -11810,7 +11809,6 @@ lpfc_sli4_disable_intr(struct lpfc_hba *phba)
for (index = 0; index < phba->cfg_irq_chann; index++) {
eqhdl = lpfc_get_eq_hdl(index);
lpfc_irq_clear_aff(eqhdl);
- irq_set_affinity_hint(eqhdl->irq, NULL);
free_irq(eqhdl->irq, eqhdl);
}
} else {
--
2.27.0
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/eq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 9e48509ed3b2..f549d697ca95 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -244,9 +244,9 @@ static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
cpumask_empty(eq->affinity_mask))
return;
- hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
+ hint_err = irq_update_affinity_hint(eq->irq, eq->affinity_mask);
if (hint_err)
- mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
+ mlx4_warn(dev, "irq_update_affinity_hint failed, err %d\n", hint_err);
}
#endif
@@ -1124,7 +1124,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
if (eq_table->eq[i].have_irq) {
free_cpumask_var(eq_table->eq[i].affinity_mask);
#if defined(CONFIG_SMP)
- irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
+ irq_update_affinity_hint(eq_table->eq[i].irq, NULL);
#endif
free_irq(eq_table->eq[i].irq, eq_table->eq + i);
eq_table->eq[i].have_irq = 0;
--
2.27.0
On Thu, Jun 17, 2021 at 02:22:42PM -0400, Nitesh Narayan Lal wrote:
> The driver uses irq_set_affinity_hint() to update the affinity_hint mask
> that is consumed by the userspace to distribute the interrupts. However,
> under the hood irq_set_affinity_hint() also applies the provided cpumask
> (if not NULL) as the affinity for the given interrupt which is an
> undocumented side effect.
>
> To remove this side effect irq_set_affinity_hint() has been marked
> as deprecated and new interfaces have been introduced. Hence, replace the
> irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
> that only updates the affinity_hint pointer.
>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx4/eq.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
> index 9e48509ed3b2..f549d697ca95 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> @@ -244,9 +244,9 @@ static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
> cpumask_empty(eq->affinity_mask))
> return;
>
> - hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
> + hint_err = irq_update_affinity_hint(eq->irq, eq->affinity_mask);
> if (hint_err)
> - mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
> + mlx4_warn(dev, "irq_update_affinity_hint failed, err %d\n", hint_err);
> }
> #endif
>
> @@ -1124,7 +1124,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
> if (eq_table->eq[i].have_irq) {
> free_cpumask_var(eq_table->eq[i].affinity_mask);
> #if defined(CONFIG_SMP)
> - irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
> + irq_update_affinity_hint(eq_table->eq[i].irq, NULL);
> #endif
This #if/endif can be deleted.
Thanks,
Reviewed-by: Leon Romanovsky <[email protected]>
On Thu, Jun 17, 2021 at 02:22:41PM -0400, Nitesh Narayan Lal wrote:
> The driver uses irq_set_affinity_hint() to update the affinity_hint mask
> that is consumed by the userspace to distribute the interrupts. However,
> under the hood irq_set_affinity_hint() also applies the provided cpumask
> (if not NULL) as the affinity for the given interrupt which is an
> undocumented side effect.
>
> To remove this side effect irq_set_affinity_hint() has been marked
> as deprecated and new interfaces have been introduced. Hence, replace the
> irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
> that only updates the affinity_hint pointer.
>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
Thanks,
Reviewed-by: Leon Romanovsky <[email protected]>
On Mon, Jun 21, 2021 at 4:02 AM Leon Romanovsky <[email protected]> wrote:
>
> On Thu, Jun 17, 2021 at 02:22:42PM -0400, Nitesh Narayan Lal wrote:
> > The driver uses irq_set_affinity_hint() to update the affinity_hint mask
> > that is consumed by the userspace to distribute the interrupts. However,
> > under the hood irq_set_affinity_hint() also applies the provided cpumask
> > (if not NULL) as the affinity for the given interrupt which is an
> > undocumented side effect.
> >
> > To remove this side effect irq_set_affinity_hint() has been marked
> > as deprecated and new interfaces have been introduced. Hence, replace the
> > irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
> > that only updates the affinity_hint pointer.
> >
> > Signed-off-by: Nitesh Narayan Lal <[email protected]>
> > ---
> > drivers/net/ethernet/mellanox/mlx4/eq.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > index 9e48509ed3b2..f549d697ca95 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > @@ -244,9 +244,9 @@ static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
> > cpumask_empty(eq->affinity_mask))
> > return;
> >
> > - hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
> > + hint_err = irq_update_affinity_hint(eq->irq, eq->affinity_mask);
> > if (hint_err)
> > - mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
> > + mlx4_warn(dev, "irq_update_affinity_hint failed, err %d\n", hint_err);
> > }
> > #endif
> >
> > @@ -1124,7 +1124,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
> > if (eq_table->eq[i].have_irq) {
> > free_cpumask_var(eq_table->eq[i].affinity_mask);
> > #if defined(CONFIG_SMP)
> > - irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
> > + irq_update_affinity_hint(eq_table->eq[i].irq, NULL);
> > #endif
>
> This #if/endif can be deleted.
>
Ack will get rid of them in v2.
> Thanks,
> Reviewed-by: Leon Romanovsky <[email protected]>
>
--
Thanks
Nitesh
On Thu, Jun 17, 2021 at 2:23 PM Nitesh Narayan Lal <[email protected]> wrote:
>
> From: Thomas Gleixner <[email protected]>
>
> The discussion about removing the side effect of irq_set_affinity_hint() of
> actually applying the cpumask (if not NULL) as affinity to the interrupt,
> unearthed a few unpleasantries:
>
> 1) The modular perf drivers rely on the current behaviour for the very
> wrong reasons.
>
> 2) While none of the other drivers prevents user space from changing
> the affinity, a cursorily inspection shows that there are at least
> expectations in some drivers.
>
> #1 needs to be cleaned up anyway, so that's not a problem
>
> #2 might result in subtle regressions especially when irqbalanced (which
> nowadays ignores the affinity hint) is disabled.
>
> Provide new interfaces:
>
> irq_update_affinity_hint() - Only sets the affinity hint pointer
> irq_set_affinity_and_hint() - Set the pointer and apply the affinity to
> the interrupt
>
> Make irq_set_affinity_hint() a wrapper around irq_apply_affinity_hint() and
> document it to be phased out.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> include/linux/interrupt.h | 41 ++++++++++++++++++++++++++++++++++++++-
> kernel/irq/manage.c | 8 ++++----
> 2 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 2ed65b01c961..4ca491a76033 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -328,7 +328,46 @@ extern int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask);
> extern int irq_can_set_affinity(unsigned int irq);
> extern int irq_select_affinity(unsigned int irq);
>
> -extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
> +extern int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
> + bool setaffinity);
> +
> +/**
> + * irq_update_affinity_hint - Update the affinity hint
> + * @irq: Interrupt to update
> + * @cpumask: cpumask pointer (NULL to clear the hint)
> + *
> + * Updates the affinity hint, but does not change the affinity of the interrupt.
> + */
> +static inline int
> +irq_update_affinity_hint(unsigned int irq, const struct cpumask *m)
> +{
> + return __irq_apply_affinity_hint(irq, m, false);
> +}
> +
> +/**
> + * irq_set_affinity_and_hint - Update the affinity hint and apply the provided
> + * cpumask to the interrupt
> + * @irq: Interrupt to update
> + * @cpumask: cpumask pointer (NULL to clear the hint)
> + *
> + * Updates the affinity hint and if @cpumask is not NULL it applies it as
> + * the affinity of that interrupt.
> + */
> +static inline int
> +irq_set_affinity_and_hint(unsigned int irq, const struct cpumask *m)
> +{
> + return __irq_apply_affinity_hint(irq, m, true);
> +}
> +
> +/*
> + * Deprecated. Use irq_update_affinity_hint() or irq_set_affinity_and_hint()
> + * instead.
> + */
> +static inline int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
> +{
> + return irq_set_affinity_and_hint(irq, m);
> +}
> +
> extern int irq_update_affinity_desc(unsigned int irq,
> struct irq_affinity_desc *affinity);
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index ef30b4762947..837b63e63111 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -487,7 +487,8 @@ int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask)
> }
> EXPORT_SYMBOL_GPL(irq_force_affinity);
>
> -int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
> +int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
> + bool setaffinity)
> {
> unsigned long flags;
> struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
> @@ -496,12 +497,11 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
> return -EINVAL;
> desc->affinity_hint = m;
> irq_put_desc_unlock(desc, flags);
> - /* set the initial affinity to prevent every interrupt being on CPU0 */
> - if (m)
> + if (m && setaffinity)
> __irq_set_affinity(irq, m, false);
> return 0;
> }
> -EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
> +EXPORT_SYMBOL_GPL(__irq_apply_affinity_hint);
>
> static void irq_affinity_notify(struct work_struct *work)
> {
> --
> 2.27.0
>
It turns out that this patch has an issue. The new interfaces are not
added under the #ifdef (CONFIG_SMP)'s else section.
I will fix it and send a v2 with other changes.
--
Thanks
Nitesh
On Mon, Jun 21, 2021 at 4:02 AM Leon Romanovsky <[email protected]> wrote:
>
> On Thu, Jun 17, 2021 at 02:22:42PM -0400, Nitesh Narayan Lal wrote:
> > The driver uses irq_set_affinity_hint() to update the affinity_hint mask
> > that is consumed by the userspace to distribute the interrupts. However,
> > under the hood irq_set_affinity_hint() also applies the provided cpumask
> > (if not NULL) as the affinity for the given interrupt which is an
> > undocumented side effect.
> >
> > To remove this side effect irq_set_affinity_hint() has been marked
> > as deprecated and new interfaces have been introduced. Hence, replace the
> > irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
> > that only updates the affinity_hint pointer.
> >
> > Signed-off-by: Nitesh Narayan Lal <[email protected]>
> > ---
> > drivers/net/ethernet/mellanox/mlx4/eq.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > index 9e48509ed3b2..f549d697ca95 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > @@ -244,9 +244,9 @@ static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
> > cpumask_empty(eq->affinity_mask))
> > return;
> >
> > - hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
> > + hint_err = irq_update_affinity_hint(eq->irq, eq->affinity_mask);
> > if (hint_err)
> > - mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
> > + mlx4_warn(dev, "irq_update_affinity_hint failed, err %d\n", hint_err);
> > }
> > #endif
> >
> > @@ -1124,7 +1124,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
> > if (eq_table->eq[i].have_irq) {
> > free_cpumask_var(eq_table->eq[i].affinity_mask);
> > #if defined(CONFIG_SMP)
> > - irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
> > + irq_update_affinity_hint(eq_table->eq[i].irq, NULL);
> > #endif
>
> This #if/endif can be deleted.
I think we also can get rid of the other #if/endif CONFIG_SMP
occurrences that are present around mlx4_set_eq_affinity_hint()
definition and call, isn't it?
There is already a check-in interrupt.h so doing it again in the
driver looks like an unwanted repetition IMHO.
>
> Thanks,
> Reviewed-by: Leon Romanovsky <[email protected]>
>
--
Thanks
Nitesh