2020-11-25 15:11:34

by Laurent Vivier

Subject: [PATCH v3 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries

With virtio, in the multiqueue case, each queue IRQ is normally
bound to a different CPU using the affinity mask.

This works fine on x86_64 but is totally ignored on pseries.

This is not obvious at first look because irqbalance is doing
some balancing to improve that.

It appears that the "managed" flag set in the MSI entry
is never copied to the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs()
to irq_domain_alloc_descs() by adding irq_create_mapping_affinity(),
a variant of irq_create_mapping() that takes an affinity parameter.

The first patch adds the parameter (no functional change), the
second patch passes the actual affinity mask to
irq_create_mapping_affinity() in rtas_setup_msi_irqs().
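The wrapper approach described above can be sketched in userspace C. This is
only an illustrative model under stated assumptions, not the kernel code:
struct affinity_desc, struct irq_slot, alloc_desc(), create_mapping(),
create_mapping_affinity() and irq_slot_of() are hypothetical stand-ins for
struct irq_affinity_desc, the IRQ descriptor, irq_domain_alloc_descs(),
irq_create_mapping() and irq_create_mapping_affinity(), and a plain bitmask
replaces the kernel's cpumask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct irq_affinity_desc. */
struct affinity_desc {
	unsigned long mask;	/* one bit per CPU (the kernel uses a cpumask) */
	int is_managed;
};

/* Hypothetical stand-in for an allocated interrupt descriptor. */
struct irq_slot {
	unsigned long mask;
	int is_managed;
};

#define ALL_CPUS (~0UL)
#define MAX_IRQS 8

static struct irq_slot irq_table[MAX_IRQS];
static unsigned int next_virq = 1;	/* 0 means "no mapping" */

/* Models irq_domain_alloc_descs(): uses the affinity only if one is given. */
static unsigned int alloc_desc(const struct affinity_desc *affinity)
{
	unsigned int virq;

	if (next_virq >= MAX_IRQS)
		return 0;
	virq = next_virq++;
	irq_table[virq].mask = affinity ? affinity->mask : ALL_CPUS;
	irq_table[virq].is_managed = affinity ? affinity->is_managed : 0;
	return virq;
}

/* Models the new entry point: the affinity is forwarded to the allocator. */
static unsigned int create_mapping_affinity(const struct affinity_desc *affinity)
{
	return alloc_desc(affinity);
}

/* Models the legacy entry point: a thin wrapper that passes no affinity. */
static inline unsigned int create_mapping(void)
{
	return create_mapping_affinity(NULL);
}

/* Look up the descriptor that was set up for a given virq. */
static const struct irq_slot *irq_slot_of(unsigned int virq)
{
	assert(virq > 0 && virq < MAX_IRQS);
	return &irq_table[virq];
}
```

In this model, a caller that keeps using create_mapping() gets a descriptor
spanning all CPUs with the managed flag cleared, which mirrors the "no
functional change" property of the first patch, while a caller that passes a
descriptor gets its mask and managed flag preserved.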



For instance, with a 32-CPU VM and a 32-queue virtio-scsi interface:

    ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

    for IRQ in $(grep virtio2-request /proc/interrupts | cut -d: -f1); do
        for file in /proc/irq/$IRQ/ ; do
            echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
        done
    done



Without the patch (and without irqbalance):

    IRQ: 268 CPU: 0-31
    IRQ: 269 CPU: 0-31
    IRQ: 270 CPU: 0-31
    IRQ: 271 CPU: 0-31
    IRQ: 272 CPU: 0-31
    IRQ: 273 CPU: 0-31
    IRQ: 274 CPU: 0-31
    IRQ: 275 CPU: 0-31
    IRQ: 276 CPU: 0-31
    IRQ: 277 CPU: 0-31
    IRQ: 278 CPU: 0-31
    IRQ: 279 CPU: 0-31
    IRQ: 280 CPU: 0-31
    IRQ: 281 CPU: 0-31
    IRQ: 282 CPU: 0-31
    IRQ: 283 CPU: 0-31
    IRQ: 284 CPU: 0-31
    IRQ: 285 CPU: 0-31
    IRQ: 286 CPU: 0-31
    IRQ: 287 CPU: 0-31
    IRQ: 288 CPU: 0-31
    IRQ: 289 CPU: 0-31
    IRQ: 290 CPU: 0-31
    IRQ: 291 CPU: 0-31
    IRQ: 292 CPU: 0-31
    IRQ: 293 CPU: 0-31
    IRQ: 294 CPU: 0-31
    IRQ: 295 CPU: 0-31
    IRQ: 296 CPU: 0-31
    IRQ: 297 CPU: 0-31
    IRQ: 298 CPU: 0-31
    IRQ: 299 CPU: 0-31



With the patch:

    IRQ: 265 CPU: 0
    IRQ: 266 CPU: 1
    IRQ: 267 CPU: 2
    IRQ: 268 CPU: 3
    IRQ: 269 CPU: 4
    IRQ: 270 CPU: 5
    IRQ: 271 CPU: 6
    IRQ: 272 CPU: 7
    IRQ: 273 CPU: 8
    IRQ: 274 CPU: 9
    IRQ: 275 CPU: 10
    IRQ: 276 CPU: 11
    IRQ: 277 CPU: 12
    IRQ: 278 CPU: 13
    IRQ: 279 CPU: 14
    IRQ: 280 CPU: 15
    IRQ: 281 CPU: 16
    IRQ: 282 CPU: 17
    IRQ: 283 CPU: 18
    IRQ: 284 CPU: 19
    IRQ: 285 CPU: 20
    IRQ: 286 CPU: 21
    IRQ: 287 CPU: 22
    IRQ: 288 CPU: 23
    IRQ: 289 CPU: 24
    IRQ: 290 CPU: 25
    IRQ: 291 CPU: 26
    IRQ: 292 CPU: 27
    IRQ: 293 CPU: 28
    IRQ: 294 CPU: 29
    IRQ: 295 CPU: 30
    IRQ: 299 CPU: 31



This matches what we have on an x86_64 system.
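
The difference between the two listings comes down to how many CPUs each
smp_affinity_list entry names. As a rough sketch, a small C helper could
classify such an entry; cpulist_weight() and irq_is_pinned() are hypothetical
names introduced here for illustration, and the parser assumes only the plain
"N" and "N-M" comma-separated forms seen above (no stride syntax).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named by a cpulist string such as "0-31", "5" or "0,2-3".
 * A trailing newline, as read from /proc, is accepted.
 * Returns -1 if the string cannot be parsed.
 */
static int cpulist_weight(const char *list)
{
	int count = 0;

	assert(list != NULL);
	while (*list != '\0' && *list != '\n') {
		char *end;
		long first, last;

		first = strtol(list, &end, 10);
		if (end == list || first < 0)
			return -1;
		last = first;
		if (*end == '-') {
			const char *start = end + 1;

			last = strtol(start, &end, 10);
			if (end == start || last < first)
				return -1;
		}
		count += (int)(last - first + 1);
		if (*end == ',')
			end++;		/* move on to the next range */
		else if (*end != '\0' && *end != '\n')
			return -1;	/* unexpected character */
		list = end;
	}
	return count;
}

/* An IRQ counts as pinned when its affinity list names exactly one CPU. */
static int irq_is_pinned(const char *list)
{
	return cpulist_weight(list) == 1;
}
```

With this helper, every entry of the first listing ("0-31") would be reported
as not pinned, and every entry of the second listing as pinned.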



v3: update changelog of PATCH 1 with comments from Thomas Gleixner and
    Marc Zyngier.
v2: add a wrapper around original irq_create_mapping() with the
    affinity parameter. Update comments

Laurent Vivier (2):
  genirq/irqdomain: Add an irq_create_mapping_affinity() function
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/powerpc/platforms/pseries/msi.c |  3 ++-
 include/linux/irqdomain.h            | 12 ++++++++++--
 kernel/irq/irqdomain.c               | 13 ++++++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

--
2.28.0





2020-11-25 15:14:32

by Laurent Vivier

Subject: [PATCH v3 1/2] genirq/irqdomain: Add an irq_create_mapping_affinity() function

There is currently no way to convey the affinity of an interrupt
via irq_create_mapping(), which creates issues for devices that
expect that affinity to be managed by the kernel.

In order to sort this out, rename irq_create_mapping() to
irq_create_mapping_affinity() with an additional affinity parameter
that can conveniently be passed down to irq_domain_alloc_descs().

irq_create_mapping() is then re-implemented as a wrapper around
irq_create_mapping_affinity().

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
---
include/linux/irqdomain.h | 12 ++++++++++--
kernel/irq/irqdomain.c | 13 ++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
 extern void irq_domain_disassociate(struct irq_domain *domain,
 				    unsigned int irq);

-extern unsigned int irq_create_mapping(struct irq_domain *host,
-				       irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+				      irq_hw_number_t hwirq,
+				      const struct irq_affinity_desc *affinity);
 extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
 extern void irq_dispose_mapping(unsigned int virq);

+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+					      irq_hw_number_t hwirq)
+{
+	return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);

 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
  *
  * Only one mapping per hardware interrupt is permitted. Returns a linux
  * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-				irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+				       irq_hw_number_t hwirq,
+				       const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node;
 	int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}

 	/* Allocate a virtual interrupt number */
-	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+				      affinity);
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,

 	return virq;
 }
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);

 /**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
--
2.28.0

2020-11-25 15:14:52

by Laurent Vivier

Subject: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

With virtio multiqueue, normally each queue IRQ is mapped to a CPU.

But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
this is broken on pseries.

The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added irq_create_mapping_affinity(), which takes
an affinity parameter, we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
---
arch/powerpc/platforms/pseries/msi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}

-		virq = irq_create_mapping(NULL, hwirq);
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);

 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
--
2.28.0

2020-11-25 16:10:29

by Denis Kirjanov

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

On 11/25/20, Laurent Vivier <[email protected]> wrote:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
> this is broken on pseries.

Please add "Fixes" tag.

Thanks!

>
> The affinity is correctly computed in msi_desc but this is not applied
> to the system IRQs.
>
> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
> lost at this point and never passed to irq_domain_alloc_descs()
> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
> because irq_create_mapping() doesn't take an affinity parameter.
>
> As the previous patch has added the affinity parameter to
> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs().
>
> With this change, the virtqueues are correctly dispatched between the CPUs
> on pseries.
>
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
> Signed-off-by: Laurent Vivier <[email protected]>
> Reviewed-by: Greg Kurz <[email protected]>
> ---
> arch/powerpc/platforms/pseries/msi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/msi.c
> b/arch/powerpc/platforms/pseries/msi.c
> index 133f6adcb39c..b3ac2455faad 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int
> nvec_in, int type)
> return hwirq;
> }
>
> - virq = irq_create_mapping(NULL, hwirq);
> + virq = irq_create_mapping_affinity(NULL, hwirq,
> + entry->affinity);
>
> if (!virq) {
> pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> --
> 2.28.0
>
>

2020-11-25 16:28:10

by Laurent Vivier

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

On 25/11/2020 17:05, Denis Kirjanov wrote:
> On 11/25/20, Laurent Vivier <[email protected]> wrote:
>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>
>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>> this is broken on pseries.
>
> Please add "Fixes" tag.

In fact, the code in commit 0d9f0a52c8b9f is correct.

The problem is with MSI/X IRQ affinity on pseries, so this patch fixes more than
virtio_scsi. I mentioned this commit because it makes the problem clearly visible.
Perhaps I should in fact remove this line?

Thanks,
Laurent

>
> Thanks!
>
>>
>> The affinity is correctly computed in msi_desc but this is not applied
>> to the system IRQs.
>>
>> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
>> lost at this point and never passed to irq_domain_alloc_descs()
>> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
>> because irq_create_mapping() doesn't take an affinity parameter.
>>
>> As the previous patch has added the affinity parameter to
>> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
>> to irq_domain_alloc_descs().
>>
>> With this change, the virtqueues are correctly dispatched between the CPUs
>> on pseries.
>>
>> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
>> Signed-off-by: Laurent Vivier <[email protected]>
>> Reviewed-by: Greg Kurz <[email protected]>
>> ---
>> arch/powerpc/platforms/pseries/msi.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/msi.c
>> b/arch/powerpc/platforms/pseries/msi.c
>> index 133f6adcb39c..b3ac2455faad 100644
>> --- a/arch/powerpc/platforms/pseries/msi.c
>> +++ b/arch/powerpc/platforms/pseries/msi.c
>> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int
>> nvec_in, int type)
>> return hwirq;
>> }
>>
>> - virq = irq_create_mapping(NULL, hwirq);
>> + virq = irq_create_mapping_affinity(NULL, hwirq,
>> + entry->affinity);
>>
>> if (!virq) {
>> pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
>> --
>> 2.28.0
>>
>>
>

2020-11-25 16:46:17

by Marc Zyngier

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

On 2020-11-25 16:24, Laurent Vivier wrote:
> On 25/11/2020 17:05, Denis Kirjanov wrote:
>> On 11/25/20, Laurent Vivier <[email protected]> wrote:
>>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>>
>>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ
>>> affinity")
>>> this is broken on pseries.
>>
>> Please add "Fixes" tag.
>
> In fact, the code in commit 0d9f0a52c8b9f is correct.
>
> The problem is with MSI/X irq affinity and pseries. So this patch
> fixes more than virtio_scsi. I put this information because this
> commit allows to clearly show the problem. Perhaps I should remove
> this line in fact?

This patch does not fix virtio_scsi at all, which as you noticed, is
correct. It really fixes the PPC MSI setup, which is starting to show
its age. So getting rid of the reference seems like the right thing to
do.

I'm also not keen on the BugId thing. It should really be a lore link.
I also cannot find any such tag in the kernel, nor is it a documented
practice. The last reference to a Bugzilla entry seems to have happened
with 786b5219081ff16 (five years ago).

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-25 18:15:42

by Greg Kurz

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

On Wed, 25 Nov 2020 16:42:30 +0000
Marc Zyngier <[email protected]> wrote:

> On 2020-11-25 16:24, Laurent Vivier wrote:
> > On 25/11/2020 17:05, Denis Kirjanov wrote:
> >> On 11/25/20, Laurent Vivier <[email protected]> wrote:
> >>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
> >>>
> >>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ
> >>> affinity")
> >>> this is broken on pseries.
> >>
> >> Please add "Fixes" tag.
> >
> > In fact, the code in commit 0d9f0a52c8b9f is correct.
> >
> > The problem is with MSI/X irq affinity and pseries. So this patch
> > fixes more than virtio_scsi. I put this information because this
> > commit allows to clearly show the problem. Perhaps I should remove
> > this line in fact?
>
> This patch does not fix virtio_scsi at all, which as you noticed, is
> correct. It really fixes the PPC MSI setup, which is starting to show
> its age. So getting rid of the reference seems like the right thing to
> do.
>
> I'm also not keen on the BugId thing. It should really be a lore link.
> I also cannot find any such tag in the kernel, nor is it a documented
> practice. The last reference to a Bugzilla entry seems to have happened
> with 786b5219081ff16 (five years ago).
>

My bad, I suggested BugId to Laurent but the intent was actually BugLink,
which seems to be commonly used in the kernel.

Cheers,

--
Greg

> Thanks,
>
> M.

2020-11-26 09:39:54

by Michael Ellerman

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

Marc Zyngier <[email protected]> writes:
> On 2020-11-25 16:24, Laurent Vivier wrote:
>> On 25/11/2020 17:05, Denis Kirjanov wrote:
>>> On 11/25/20, Laurent Vivier <[email protected]> wrote:
>>>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>>>
>>>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ
>>>> affinity")
>>>> this is broken on pseries.
>>>
>>> Please add "Fixes" tag.
>>
>> In fact, the code in commit 0d9f0a52c8b9f is correct.
>>
>> The problem is with MSI/X irq affinity and pseries. So this patch
>> fixes more than virtio_scsi. I put this information because this
>> commit allows to clearly show the problem. Perhaps I should remove
>> this line in fact?
>
> This patch does not fix virtio_scsi at all, which as you noticed, is
> correct. It really fixes the PPC MSI setup, which is starting to show
> its age. So getting rid of the reference seems like the right thing to
> do.

It's still useful to refer to that commit if the code worked prior to
that commit. But you should make it clearer that 0d9f0a52c8b9f wasn't in
error, it just exposed an existing shortcoming of the arch code.

cheers

2020-11-26 09:40:00

by Michael Ellerman

Subject: Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

Laurent Vivier <[email protected]> writes:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
> this is broken on pseries.
>
> The affinity is correctly computed in msi_desc but this is not applied
> to the system IRQs.
>
> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
> lost at this point and never passed to irq_domain_alloc_descs()
> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
> because irq_create_mapping() doesn't take an affinity parameter.
>
> As the previous patch has added the affinity parameter to
> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs().
>
> With this change, the virtqueues are correctly dispatched between the CPUs
> on pseries.
>
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
> Signed-off-by: Laurent Vivier <[email protected]>
> Reviewed-by: Greg Kurz <[email protected]>
> ---
> arch/powerpc/platforms/pseries/msi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: Michael Ellerman <[email protected]>

cheers

> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index 133f6adcb39c..b3ac2455faad 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
> return hwirq;
> }
>
> - virq = irq_create_mapping(NULL, hwirq);
> + virq = irq_create_mapping_affinity(NULL, hwirq,
> + entry->affinity);
>
> if (!virq) {
> pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> --
> 2.28.0