2022-04-21 18:42:26

by Marc Zyngier

Subject: Re: [PATCH] irqchip/armada-370-xp: Enable MSI affinity configuration

Hi Nathan,

On Thu, 21 Apr 2022 02:57:28 +0100,
Nathan Rossi <[email protected]> wrote:
>
> From: Nathan Rossi <[email protected]>
>
> With multiple devices attached via PCIe to an Armada 385 it is possible
> to overwhelm a single CPU with MSI interrupts. Under certain scenarios
> configuring the interrupts to be handled by more than one CPU would
> prevent the system from being overwhelmed. However the
> irqchip-armada-370-xp driver is configured to only handle MSIs on the
> boot CPU, and provides no affinity configuration.
>
> This change adds support to the armada-370-xp driver to allow for
> configuring the affinity of specific MSI irqs and to generate the
> interrupts on secondary CPUs. This is done by enabling the private
> doorbell for all online CPUs and configures all CPUs to unmask MSI
> specific private doorbell bits. The CPU affinity selection of the
> interrupt is handled by the target list of the software triggered
> interrupt value, which is provided as the MSI message. The message has
> the associated CPU bit set for the target CPU. For private doorbell
> interrupts only one bit can be set otherwise all CPUs will receive the
> interrupt, so the lowest CPU in the affinity mask is used. This means
> that by default the first CPU will handle all the interrupts as was the
> case before.
>
> Signed-off-by: Nathan Rossi <[email protected]>
> ---
> drivers/irqchip/irq-armada-370-xp.c | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> index 5b8d571c04..42c257f576 100644
> --- a/drivers/irqchip/irq-armada-370-xp.c
> +++ b/drivers/irqchip/irq-armada-370-xp.c
> @@ -209,15 +209,37 @@ static struct msi_domain_info armada_370_xp_msi_domain_info = {
>
> static void armada_370_xp_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> {
> +#ifdef CONFIG_SMP
> + unsigned int cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> +
> + msg->data = (1 << (cpu + 8)) | (data->hwirq + PCI_MSI_DOORBELL_START);

BIT(cpu + 8) | ...

> +#else
> + msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);

This paints the existing code a bit differently. This seems to target
all 4 CPUs. Why is that? I'd expect only bit 8 to be set, and the
whole #ifdefery to go away.
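
To make the suggestion concrete, here is a minimal userspace sketch of
the data word with a single target bit. It is not the driver code: the
helper name msi_data_for_cpu(), the MSI_MAX_CPUS limit and the value
used for PCI_MSI_DOORBELL_START are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Assumed value for illustration; the driver defines the real one. */
#define PCI_MSI_DOORBELL_START 16U
/* Assumed: the CPU target list occupies bits 8..11 of the data word. */
#define MSI_MAX_CPUS 4U

/* Build the MSI data word so that exactly one CPU target bit is set. */
static uint32_t msi_data_for_cpu(unsigned int cpu, unsigned int hwirq)
{
	assert(cpu < MSI_MAX_CPUS);
	return BIT(cpu + 8) | (hwirq + PCI_MSI_DOORBELL_START);
}
```

With cpu == 0 this sets only bit 8 in the target field, whereas the
existing 0xf00 constant sets bits 8 through 11 at once.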

> +#endif
> msg->address_lo = lower_32_bits(msi_doorbell_addr);
> msg->address_hi = upper_32_bits(msi_doorbell_addr);
> - msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
> }
>
> static int armada_370_xp_msi_set_affinity(struct irq_data *irq_data,
> const struct cpumask *mask, bool force)
> {
> - return -EINVAL;
> +#ifdef CONFIG_SMP
> + unsigned int cpu;
> +
> + if (!force)
> + cpu = cpumask_any_and(mask, cpu_online_mask);
> + else
> + cpu = cpumask_first(mask);
> +
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + irq_data_update_effective_affinity(irq_data, cpumask_of(cpu));
> +
> + return IRQ_SET_MASK_OK;
> +#else
> + return -EINVAL;
> +#endif
> }
>
> static struct irq_chip armada_370_xp_msi_bottom_irq_chip = {
> @@ -482,6 +504,7 @@ static void armada_xp_mpic_smp_cpu_init(void)
> static void armada_xp_mpic_reenable_percpu(void)
> {
> unsigned int irq;
> + u32 reg;
>
> /* Re-enable per-CPU interrupts that were enabled before suspend */
> for (irq = 0; irq < ARMADA_370_XP_MAX_PER_CPU_IRQS; irq++) {
> @@ -501,6 +524,13 @@ static void armada_xp_mpic_reenable_percpu(void)
> }
>
> ipi_resume();
> +
> + /* Enable MSI doorbell mask and combined cpu local interrupt */
> + reg = readl(per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS)
> + | PCI_MSI_DOORBELL_MASK;
> + writel(reg, per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS);
> + /* Unmask local doorbell interrupt */
> + writel(1, per_cpu_int_base + ARMADA_370_XP_INT_CLEAR_MASK_OFFS);

This is a duplicate of what is already in armada_370_xp_msi_init().
Please refactor it so that this doesn't happen twice on the first CPU.

This otherwise seems like a valuable improvement on the current
behaviour.

M.

--
Without deviation from the norm, progress is not possible.


2022-04-22 03:11:55

by Nathan Rossi

Subject: Re: [PATCH] irqchip/armada-370-xp: Enable MSI affinity configuration

On Thu, 21 Apr 2022 at 16:54, Marc Zyngier <[email protected]> wrote:
>
> Hi Nathan,
>
> On Thu, 21 Apr 2022 02:57:28 +0100,
> Nathan Rossi <[email protected]> wrote:
> >
> > From: Nathan Rossi <[email protected]>
> >
> > With multiple devices attached via PCIe to an Armada 385 it is possible
> > to overwhelm a single CPU with MSI interrupts. Under certain scenarios
> > configuring the interrupts to be handled by more than one CPU would
> > prevent the system from being overwhelmed. However the
> > irqchip-armada-370-xp driver is configured to only handle MSIs on the
> > boot CPU, and provides no affinity configuration.
> >
> > This change adds support to the armada-370-xp driver to allow for
> > configuring the affinity of specific MSI irqs and to generate the
> > interrupts on secondary CPUs. This is done by enabling the private
> > doorbell for all online CPUs and configures all CPUs to unmask MSI
> > specific private doorbell bits. The CPU affinity selection of the
> > interrupt is handled by the target list of the software triggered
> > interrupt value, which is provided as the MSI message. The message has
> > the associated CPU bit set for the target CPU. For private doorbell
> > interrupts only one bit can be set otherwise all CPUs will receive the
> > interrupt, so the lowest CPU in the affinity mask is used. This means
> > that by default the first CPU will handle all the interrupts as was the
> > case before.
> >
> > Signed-off-by: Nathan Rossi <[email protected]>
> > ---
> > drivers/irqchip/irq-armada-370-xp.c | 34 ++++++++++++++++++++++++++++++++--
> > 1 file changed, 32 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> > index 5b8d571c04..42c257f576 100644
> > --- a/drivers/irqchip/irq-armada-370-xp.c
> > +++ b/drivers/irqchip/irq-armada-370-xp.c
> > @@ -209,15 +209,37 @@ static struct msi_domain_info armada_370_xp_msi_domain_info = {
> >
> > static void armada_370_xp_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> > {
> > +#ifdef CONFIG_SMP
> > + unsigned int cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> > +
> > + msg->data = (1 << (cpu + 8)) | (data->hwirq + PCI_MSI_DOORBELL_START);
>
> BIT(cpu + 8) | ...
>
> > +#else
> > + msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
>
> This paints the existing code a bit differently. This seems to target
> all 4 CPUs. Why is that? I'd expect only bit 8 to be set, and the
> whole #ifdefery to go away.

I am not sure why this is targeting 4 CPUs. It will be masked by the
percpu doorbell mask register and is effectively BIT(8), at least
based on the documentation I have (only for Armada 370/38x), which is
why I left it as an #ifdef. I was also not able to find any specifics
in git history as to why it is targeting all 4 CPUs. However, this
value was added with the initial driver implementation, when only
Armada 370 was available in the kernel, so it is perhaps an
inconsistent value that was never an issue due to the bits being
reserved. I will remove the #ifdef in a v2 patch that addresses your
other comments.

>
> > +#endif
> > msg->address_lo = lower_32_bits(msi_doorbell_addr);
> > msg->address_hi = upper_32_bits(msi_doorbell_addr);
> > - msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
> > }
> >
> > static int armada_370_xp_msi_set_affinity(struct irq_data *irq_data,
> > const struct cpumask *mask, bool force)
> > {
> > - return -EINVAL;
> > +#ifdef CONFIG_SMP
> > + unsigned int cpu;
> > +
> > + if (!force)
> > + cpu = cpumask_any_and(mask, cpu_online_mask);
> > + else
> > + cpu = cpumask_first(mask);
> > +
> > + if (cpu >= nr_cpu_ids)
> > + return -EINVAL;
> > +
> > + irq_data_update_effective_affinity(irq_data, cpumask_of(cpu));
> > +
> > + return IRQ_SET_MASK_OK;
> > +#else
> > + return -EINVAL;
> > +#endif
> > }
> >
> > static struct irq_chip armada_370_xp_msi_bottom_irq_chip = {
> > @@ -482,6 +504,7 @@ static void armada_xp_mpic_smp_cpu_init(void)
> > static void armada_xp_mpic_reenable_percpu(void)
> > {
> > unsigned int irq;
> > + u32 reg;
> >
> > /* Re-enable per-CPU interrupts that were enabled before suspend */
> > for (irq = 0; irq < ARMADA_370_XP_MAX_PER_CPU_IRQS; irq++) {
> > @@ -501,6 +524,13 @@ static void armada_xp_mpic_reenable_percpu(void)
> > }
> >
> > ipi_resume();
> > +
> > + /* Enable MSI doorbell mask and combined cpu local interrupt */
> > + reg = readl(per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS)
> > + | PCI_MSI_DOORBELL_MASK;
> > + writel(reg, per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS);
> > + /* Unmask local doorbell interrupt */
> > + writel(1, per_cpu_int_base + ARMADA_370_XP_INT_CLEAR_MASK_OFFS);
>
> This is a duplicate of what is already in armada_370_xp_msi_init().
> Please refactor it so that this doesn't happen twice on the first CPU.

It is duplicated; however, armada_xp_mpic_reenable_percpu() is not
called on the boot CPU, as the hotplug callback is registered with
cpuhp_setup_state_nocalls().

Thanks,
Nathan


>
> This otherwise seems like a valuable improvement on the current
> behaviour.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

2022-04-22 19:14:34

by Andrew Lunn

Subject: Re: [PATCH] irqchip/armada-370-xp: Enable MSI affinity configuration

> > > static void armada_370_xp_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> > > {
> > > +#ifdef CONFIG_SMP
> > > + unsigned int cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> > > +
> > > + msg->data = (1 << (cpu + 8)) | (data->hwirq + PCI_MSI_DOORBELL_START);
> >
> > BIT(cpu + 8) | ...
> >
> > > +#else
> > > + msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
> >
> > This paints the existing code a bit differently. This seems to target
> > all 4 CPUs. Why is that? I'd expect only bit 8 to be set, and the
> > whole #ifdefery to go away.
>
> I will remove the #ifdef in a v2 patch that addresses your
> other comments.

Please try to remove all the #ifdef'ery.

> > > static int armada_370_xp_msi_set_affinity(struct irq_data *irq_data,
> > > const struct cpumask *mask, bool force)
> > > {
> > > - return -EINVAL;
> > > +#ifdef CONFIG_SMP
> > > + unsigned int cpu;
> > > +
> > > + if (!force)
> > > + cpu = cpumask_any_and(mask, cpu_online_mask);
> > > + else
> > > + cpu = cpumask_first(mask);
> > > +
> > > + if (cpu >= nr_cpu_ids)
> > > + return -EINVAL;
> > > +
> > > + irq_data_update_effective_affinity(irq_data, cpumask_of(cpu));
> > > +
> > > + return IRQ_SET_MASK_OK;
> > > +#else
> > > + return -EINVAL;
> > > +#endif

A quick look in cpumask.h suggests that if NR_CPUS == 1, there are
stub functions which return constant values. So you might not need
this #ifdef. However, I'm a network guy, not a scheduling guy, so
don't trust what I say...
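
As a rough illustration of why the #ifdef may be unnecessary, here is a
userspace model of the selection logic with plain bitmasks standing in
for cpumasks. Everything in it is an assumption for the sketch:
mask_first() and pick_target_cpu() are made-up names, nr_cpu_ids is
passed as a parameter, and cpumask_any_and() is modelled as "first CPU
in the intersection".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Lowest set bit below nr_cpu_ids, or nr_cpu_ids if none is set. */
static unsigned int mask_first(unsigned long mask, unsigned int nr_cpu_ids)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return nr_cpu_ids;
}

/* Pick one target CPU; without force, only online CPUs are considered. */
static int pick_target_cpu(unsigned long mask, unsigned long online,
			   unsigned int nr_cpu_ids, bool force,
			   unsigned int *cpu_out)
{
	unsigned int cpu;

	assert(cpu_out != NULL);
	cpu = mask_first(force ? mask : (mask & online), nr_cpu_ids);
	if (cpu >= nr_cpu_ids)
		return -EINVAL;
	*cpu_out = cpu;
	return 0;
}
```

In this model the same code path handles nr_cpu_ids == 1: a mask
containing CPU 0 selects CPU 0, and an empty mask yields -EINVAL.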

Andrew

2022-04-22 20:51:19

by Marc Zyngier

Subject: Re: [PATCH] irqchip/armada-370-xp: Enable MSI affinity configuration

On Thu, 21 Apr 2022 09:32:23 +0100,
Nathan Rossi <[email protected]> wrote:
>
> On Thu, 21 Apr 2022 at 16:54, Marc Zyngier <[email protected]> wrote:
> >
> > Hi Nathan,
> >
> > On Thu, 21 Apr 2022 02:57:28 +0100,
> > Nathan Rossi <[email protected]> wrote:
> > >
> > > From: Nathan Rossi <[email protected]>
> > >
> > > With multiple devices attached via PCIe to an Armada 385 it is possible
> > > to overwhelm a single CPU with MSI interrupts. Under certain scenarios
> > > configuring the interrupts to be handled by more than one CPU would
> > > prevent the system from being overwhelmed. However the
> > > irqchip-armada-370-xp driver is configured to only handle MSIs on the
> > > boot CPU, and provides no affinity configuration.
> > >
> > > This change adds support to the armada-370-xp driver to allow for
> > > configuring the affinity of specific MSI irqs and to generate the
> > > interrupts on secondary CPUs. This is done by enabling the private
> > > doorbell for all online CPUs and configures all CPUs to unmask MSI
> > > specific private doorbell bits. The CPU affinity selection of the
> > > interrupt is handled by the target list of the software triggered
> > > interrupt value, which is provided as the MSI message. The message has
> > > the associated CPU bit set for the target CPU. For private doorbell
> > > interrupts only one bit can be set otherwise all CPUs will receive the
> > > interrupt, so the lowest CPU in the affinity mask is used. This means
> > > that by default the first CPU will handle all the interrupts as was the
> > > case before.
> > >
> > > Signed-off-by: Nathan Rossi <[email protected]>
> > > ---
> > > drivers/irqchip/irq-armada-370-xp.c | 34 ++++++++++++++++++++++++++++++++--
> > > 1 file changed, 32 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> > > index 5b8d571c04..42c257f576 100644
> > > --- a/drivers/irqchip/irq-armada-370-xp.c
> > > +++ b/drivers/irqchip/irq-armada-370-xp.c
> > > @@ -209,15 +209,37 @@ static struct msi_domain_info armada_370_xp_msi_domain_info = {
> > >
> > > static void armada_370_xp_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> > > {
> > > +#ifdef CONFIG_SMP
> > > + unsigned int cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> > > +
> > > + msg->data = (1 << (cpu + 8)) | (data->hwirq + PCI_MSI_DOORBELL_START);
> >
> > BIT(cpu + 8) | ...
> >
> > > +#else
> > > + msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
> >
> > This paints the existing code a bit differently. This seems to target
> > all 4 CPUs. Why is that? I'd expect only bit 8 to be set, and the
> > whole #ifdefery to go away.
>
> I am not sure why this is targeting 4 CPUs, it will be masked by the
> percpu doorbell mask register and is effectively BIT(8). At least
> based on the documentation I have (only for armada 370/38x), which is
> why I left it as an #ifdef. I was also not able to find any specifics
> as to why it is targeting all 4 CPUs in git history. However this
> value was added with the initial driver implementation when only
> armada 370 was available in the kernel, so it is perhaps an
> inconsistent value that was never an issue due to the bits being
> reserved. I will remove the #ifdef in a v2 patch that addresses your
> other comments.

I guess we can get at least some testing from the platform maintainers
to check that this doesn't regress the UP systems.

>
> >
> > > +#endif
> > > msg->address_lo = lower_32_bits(msi_doorbell_addr);
> > > msg->address_hi = upper_32_bits(msi_doorbell_addr);
> > > - msg->data = 0xf00 | (data->hwirq + PCI_MSI_DOORBELL_START);
> > > }
> > >
> > > static int armada_370_xp_msi_set_affinity(struct irq_data *irq_data,
> > > const struct cpumask *mask, bool force)
> > > {
> > > - return -EINVAL;
> > > +#ifdef CONFIG_SMP
> > > + unsigned int cpu;
> > > +
> > > + if (!force)
> > > + cpu = cpumask_any_and(mask, cpu_online_mask);
> > > + else
> > > + cpu = cpumask_first(mask);
> > > +
> > > + if (cpu >= nr_cpu_ids)
> > > + return -EINVAL;
> > > +
> > > + irq_data_update_effective_affinity(irq_data, cpumask_of(cpu));
> > > +
> > > + return IRQ_SET_MASK_OK;
> > > +#else
> > > + return -EINVAL;
> > > +#endif
> > > }
> > >
> > > static struct irq_chip armada_370_xp_msi_bottom_irq_chip = {
> > > @@ -482,6 +504,7 @@ static void armada_xp_mpic_smp_cpu_init(void)
> > > static void armada_xp_mpic_reenable_percpu(void)
> > > {
> > > unsigned int irq;
> > > + u32 reg;
> > >
> > > /* Re-enable per-CPU interrupts that were enabled before suspend */
> > > for (irq = 0; irq < ARMADA_370_XP_MAX_PER_CPU_IRQS; irq++) {
> > > @@ -501,6 +524,13 @@ static void armada_xp_mpic_reenable_percpu(void)
> > > }
> > >
> > > ipi_resume();
> > > +
> > > + /* Enable MSI doorbell mask and combined cpu local interrupt */
> > > + reg = readl(per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS)
> > > + | PCI_MSI_DOORBELL_MASK;
> > > + writel(reg, per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS);
> > > + /* Unmask local doorbell interrupt */
> > > + writel(1, per_cpu_int_base + ARMADA_370_XP_INT_CLEAR_MASK_OFFS);
> >
> > This is a duplicate of what is already in armada_370_xp_msi_init().
> > Please refactor it so that this doesn't happen twice on the first CPU.
>
> It is duplicated, however armada_xp_mpic_reenable_percpu is not called
> on the boot cpu as the setup is called with cpuhp_setup_state_nocalls.

Ah, right. Make sure we can get rid of the code duplication then.
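
Something along these lines is what I have in mind, shown as a
userspace model rather than driver code. The struct percpu_regs layout,
the function names and the PCI_MSI_DOORBELL_MASK value are assumptions
for illustration; the real code would use readl()/writel() on
per_cpu_int_base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the driver defines the real one. */
#define PCI_MSI_DOORBELL_MASK 0xFFFF0000U

/* Hypothetical stand-in for one CPU's banked interrupt registers. */
struct percpu_regs {
	uint32_t in_drbel_msk;   /* doorbell mask register */
	uint32_t int_clear_mask; /* last value written to the clear-mask register */
};

/* Single helper holding the MSI doorbell enable sequence. */
static void msi_reenable_percpu(struct percpu_regs *regs)
{
	assert(regs != NULL);
	/* Enable the MSI doorbell bits, keeping the other bits intact */
	regs->in_drbel_msk |= PCI_MSI_DOORBELL_MASK;
	/* Unmask the local doorbell interrupt */
	regs->int_clear_mask = 1;
}

/* Boot CPU path: called once from the MSI init code. */
static void msi_init_boot_cpu(struct percpu_regs *regs)
{
	msi_reenable_percpu(regs);
}

/* Secondary CPU path: called when a CPU comes online. */
static void secondary_cpu_online(struct percpu_regs *regs)
{
	msi_reenable_percpu(regs);
}
```

Both paths then share one copy of the sequence, and each CPU still
runs it only once.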

Thanks,

M.

--
Without deviation from the norm, progress is not possible.