2023-05-27 06:11:36

by Huacai Chen

Subject: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Adjust the return value semantics of msi_domain_prepare_irqs() so that
the input nvec can be modified by overriding msi_domain_ops::
msi_prepare(). This is needed by the following patch.

Before:
0 on success, a negative error code on failure.

After:
= 0: Success;
> 0: The modified nvec;
< 0: Error code.

Callers are also updated.

Signed-off-by: Huacai Chen <[email protected]>
---
 drivers/base/platform-msi.c |  2 +-
 kernel/irq/msi.c            | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f37ad34c80ec..e4a517c144e7 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -298,7 +298,7 @@ __platform_msi_create_device_domain(struct device *dev,
 
 	platform_msi_set_proxy_dev(&data->arg);
 	err = msi_domain_prepare_irqs(domain->parent, dev, nvec, &data->arg);
-	if (err)
+	if (err < 0)
 		goto free_domain;
 
 	return domain;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7a97bcb086bf..d151936aec05 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1058,6 +1058,12 @@ bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
 	return ret;
 }
 
+/*
+ * Return value:
+ * = 0: Success;
+ * > 0: The modified nvec;
+ * < 0: Error code.
+ */
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {
@@ -1260,8 +1266,10 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 	int i, ret, virq;
 
 	ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg);
-	if (ret)
+	if (ret < 0)
 		return ret;
+	if (ret > 0)
+		ctrl->nirqs = ret;
 
 	/*
 	 * This flag is set by the PCI layer as we need to activate
--
2.39.1



2023-05-27 14:54:24

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> Adjust the return value semantics of msi_domain_prepare_irqs() so that
> the input nvec can be modified by overriding msi_domain_ops::
> msi_prepare(). This is needed by the following patch.
>
> Before:
> 0 on success, a negative error code on failure.
>
> After:
> = 0: Success;
> > 0: The modified nvec;
> < 0: Error code.

This explains what the patch does, but provides no justification for it
and no analysis of why it is correct for the existing use cases.

That Loongson MSI domain is a PCI/MSI domain. PCI/MSI already has a
mechanism to return the actual possible number of vectors if the
underlying space is exhausted.
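
(Illustration only, not code from any driver in this thread: the
driver-facing side of that mechanism is the min/max negotiation in
pci_alloc_irq_vectors().)

#include <linux/pci.h>

/*
 * Sketch of the existing PCI/MSI negotiation: ask for up to 64 vectors
 * but accept as few as 1. The return value is the number of vectors
 * actually allocated, or a negative error code.
 */
static int example_request_vectors(struct pci_dev *pdev)
{
	int nvec;

	nvec = pci_alloc_irq_vectors(pdev, 1, 64, PCI_IRQ_MSIX | PCI_IRQ_MSI);
	if (nvec < 0)
		return nvec;	/* not even a single vector available */

	return nvec;		/* the driver adapts to what it got */
}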

Why is that not sufficient for your problem at hand?

Thanks,

tglx

2023-05-28 03:44:52

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Thomas,

On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > Adjust the return value semantics of msi_domain_prepare_irqs() so that
> > the input nvec can be modified by overriding msi_domain_ops::
> > msi_prepare(). This is needed by the following patch.
> >
> > Before:
> > 0 on success, a negative error code on failure.
> >
> > After:
> > = 0: Success;
> > > 0: The modified nvec;
> > < 0: Error code.
>
> This explains what the patch does, but provides no justification for it
> and no analysis of why it is correct for the existing use cases.
I checked all msi_prepare() callbacks and none of them return positive
values now, so I think it is correct.

>
> That Loongson MSI domain is a PCI/MSI domain. PCI/MSI already has a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
>
> Why is that not sufficient for your problem at hand?
Hmm, maybe I should make things clearer. We want to do some proactive
throttling here. For example, if we have two NICs, we want both of
them to get 32 MSI vectors, rather than one getting 64 vectors and the
other falling back to a legacy IRQ.
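
Roughly, what we have in mind looks like the sketch below (illustration
only, not the actual loongson-pch-msi code; the cap value and the
function name are made up):

#include <linux/device.h>
#include <linux/irqdomain.h>
#include <linux/msi.h>
#include <linux/string.h>

#define PCH_MSI_VECS_PER_DEV	32	/* assumed per-device cap */

/*
 * Hypothetical msi_prepare() override: with the adjusted return value
 * semantics of this patch, returning a positive number asks the MSI
 * core to shrink the request to that many vectors.
 */
static int pch_msi_domain_prepare(struct irq_domain *domain,
				  struct device *dev, int nvec,
				  msi_alloc_info_t *arg)
{
	memset(arg, 0, sizeof(*arg));

	if (nvec > PCH_MSI_VECS_PER_DEV)
		return PCH_MSI_VECS_PER_DEV;

	return 0;	/* keep the requested nvec */
}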

Huacai
>
> Thanks,
>
> tglx

2023-05-28 07:53:26

by Marc Zyngier

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Sat, 27 May 2023 15:03:29 +0100,
Thomas Gleixner <[email protected]> wrote:
>
> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > Adjust the return value semantics of msi_domain_prepare_irqs() so that
> > the input nvec can be modified by overriding msi_domain_ops::
> > msi_prepare(). This is needed by the following patch.
> >
> > Before:
> > 0 on success, a negative error code on failure.
> >
> > After:
> > = 0: Success;
> > > 0: The modified nvec;
> > < 0: Error code.
>
> This explains what the patch does, but provides no justification for it
> and no analysis of why it is correct for the existing use cases.
>
> That Loongson MSI domain is a PCI/MSI domain. PCI/MSI already has a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
>
> Why is that not sufficient for your problem at hand?

I've already made that point, but it seems that the argument is
falling on deaf ears.

Being able to allocate MSIs is not a guarantee, and is always
opportunistic. If some drivers fail badly because they don't get
the number of MSIs they need, then they need fixing.

I really don't see the point in papering over this at the lowest level
of the stack.

M.

--
Without deviation from the norm, progress is not possible.

2023-05-28 12:18:18

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Marc,

On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <[email protected]> wrote:
>
> On Sat, 27 May 2023 15:03:29 +0100,
> Thomas Gleixner <[email protected]> wrote:
> >
> > On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > > Adjust the return value semantics of msi_domain_prepare_irqs() so that
> > > the input nvec can be modified by overriding msi_domain_ops::
> > > msi_prepare(). This is needed by the following patch.
> > >
> > > Before:
> > > 0 on success, a negative error code on failure.
> > >
> > > After:
> > > = 0: Success;
> > > > 0: The modified nvec;
> > > < 0: Error code.
> >
> > This explains what the patch does, but provides no justification for it
> > and no analysis of why it is correct for the existing use cases.
> >
> > That Loongson MSI domain is a PCI/MSI domain. PCI/MSI already has a
> > mechanism to return the actual possible number of vectors if the
> > underlying space is exhausted.
> >
> > Why is that not sufficient for your problem at hand?
>
> I've already made that point, but it seems that the argument is
> falling on deaf ears.
I'm very sorry that I didn't answer your question directly.

>
> Being able to allocate MSIs is not a guarantee, and is always
> opportunistic. If some drivers fail badly because they don't get
> the number of MSIs they need, then they need fixing.
Yes, I know allocating MSIs is not a guarantee, and most existing
drivers will fall back to legacy IRQs when the allocation fails.
However, as I replied in an earlier mail, we want to do some proactive
throttling in the loongson-pch-msi irqchip driver rather than consume
MSI vectors aggressively. For example, if we have two NICs, we want
both of them to get 32 MSI vectors, not one exhausting all available
vectors while the other falls back to a legacy IRQ.

I hope I have explained this clearly, thanks.

Huacai

>
> I really don't see the point in papering over this at the lowest level
> of the stack.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

2023-05-29 08:23:23

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Sun, May 28 2023 at 11:42, Huacai Chen wrote:
> On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <[email protected]> wrote:
>> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
>> > After:
>> > = 0: Success;
>> > > 0: The modified nvec;
>> > < 0: Error code.
>>
>> This explains what the patch does, but provides no justification for it
>> and no analysis of why it is correct for the existing use cases.
> I checked all msi_prepare() callbacks and none of them return positive
> values now, so I think it is correct.

Still, you failed to say so in the changelog. It's not helpful that you
merely think it is correct. The point is that you have to make clear why
it _IS_ correct.

Thanks,

tglx

2023-05-29 09:31:22

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Sun, May 28 2023 at 20:07, Huacai Chen wrote:
> On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <[email protected]> wrote:
>>
>> Being able to allocate MSIs is not a guarantee, and is always
>> opportunistic. If some drivers fail badly because they don't get
>> the number of MSIs they need, then they need fixing.
>
> Yes, I know allocating MSIs is not a guarantee, and most existing
> drivers will fall back to legacy IRQs when the allocation fails.
> However, as I replied in an earlier mail, we want to do some proactive
> throttling in the loongson-pch-msi irqchip driver rather than consume
> MSI vectors aggressively. For example, if we have two NICs, we want
> both of them to get 32 MSI vectors, not one exhausting all available
> vectors while the other falls back to a legacy IRQ.

By default you allow up to 256 interrupts to be allocated, right? So to
prevent vector exhaustion, the admin needs to reboot the machine and set
a command line parameter to limit this, right? As that parameter is not
documented, the admin is left to guess a number. That's impractical and
just a horrible band-aid.

Thanks,

tglx



2023-05-29 09:43:19

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Thomas,

On Mon, May 29, 2023 at 3:44 PM Thomas Gleixner <[email protected]> wrote:
>
> On Sun, May 28 2023 at 11:42, Huacai Chen wrote:
> > On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <[email protected]> wrote:
> >> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> >> > After:
> >> > = 0: Success;
> >> > > 0: The modified nvec;
> >> > < 0: Error code.
> >>
> >> This explains what the patch does, but provides no justification for it
> >> and no analysis of why it is correct for the existing use cases.
> > I checked all msi_prepare() callbacks and none of them return positive
> > values now, so I think it is correct.
>
> Still, you failed to say so in the changelog. It's not helpful that you
> merely think it is correct. The point is that you have to make clear why
> it _IS_ correct.
OK, I think I should add more information in the next version, thanks.

Huacai
>
> Thanks,
>
> tglx

2023-05-29 09:45:22

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Thomas,

On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <[email protected]> wrote:
>
> On Sun, May 28 2023 at 20:07, Huacai Chen wrote:
> > On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <[email protected]> wrote:
> >>
> >> Being able to allocate MSIs is not a guarantee, and is always
> >> opportunistic. If some drivers fail badly because they don't get
> >> the number of MSIs they need, then they need fixing.
> >
> > Yes, I know allocating MSIs is not a guarantee, and most existing
> > drivers will fall back to legacy IRQs when the allocation fails.
> > However, as I replied in an earlier mail, we want to do some proactive
> > throttling in the loongson-pch-msi irqchip driver rather than consume
> > MSI vectors aggressively. For example, if we have two NICs, we want
> > both of them to get 32 MSI vectors, not one exhausting all available
> > vectors while the other falls back to a legacy IRQ.
>
> By default you allow up to 256 interrupts to be allocated, right? So to
> prevent vector exhaustion, the admin needs to reboot the machine and set
> a command line parameter to limit this, right? As that parameter is not
> documented, the admin is left to guess a number. That's impractical and
> just a horrible band-aid.
OK, I think I should update the documentation in the next version.

Huacai
>
> Thanks,
>
> tglx
>
>

2023-05-29 20:33:07

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Huacai!

On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <[email protected]> wrote:
>> By default you allow up to 256 interrupts to be allocated, right? So to
>> prevent vector exhaustion, the admin needs to reboot the machine and set
>> a command line parameter to limit this, right? As that parameter is not
>> documented, the admin is left to guess a number. That's impractical and
>> just a horrible band-aid.
>
> OK, I think I should update the documentation in the next version.

Updating documentation neither makes it more practical (it still
requires a reboot) nor does it justify the abuse of the msi_prepare()
callback.

The only reason why this hack "works" is that there is a historical
mechanism which tells the PCI/MSI core that the number of requested
vectors cannot be allocated, but that there would be $N vectors
possible. But even that return value has no guarantee.

This mechanism is ill defined and really should go away.

Adding yet another way to limit this via msi_prepare() is just
proliferating this ill defined mechanism and I have zero interest in
that.

Let's take a step back and look at the larger picture:

1) A PCI/MSI irqdomain is attached to a PCI bus

2) The number of PCI devices on that PCI bus is usually known at boot
time _before_ the first device driver is probed.

That's not entirely true for PCI hotplug devices, but that's hardly
relevant for an architecture which got designed less than 10 years
ago and the architects decided that 256 MSI vectors are good enough
for up to 256 CPUs. The concept of per CPU queues was already known
at that time, no?

So the irqdomain can tell the PCI/MSI core the maximum number of vectors
available for a particular bus, right?

The default, i.e. if the irqdomain does not expose that information,
would be "unlimited", i.e. ULONG_MAX.

Now take that number and divide it by the number of devices on the bus
and you get at least a sensible limit which does not immediately cause
vector exhaustion.

That limit might be suboptimal if there are lots of other devices on
that bus which just require one or two vectors, but that's something
which can be optimized via a generic command line option or even a sysfs
mechanism.
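
A back-of-the-envelope sketch of that calculation (the function name is
made up, purely for illustration):

#include <linux/limits.h>
#include <linux/pci.h>

/*
 * Illustration only: derive a per-device vector limit from the maximum
 * number of vectors the irqdomain exposes for the bus (ULONG_MAX when
 * it exposes nothing) and the number of devices on that bus.
 */
static unsigned long pci_bus_per_device_limit(unsigned long domain_max,
					      struct pci_bus *bus)
{
	unsigned int ndevs = 0;
	struct pci_dev *pdev;

	if (domain_max == ULONG_MAX)
		return ULONG_MAX;	/* no limit exposed: unlimited */

	list_for_each_entry(pdev, &bus->devices, bus_list)
		ndevs++;

	return ndevs ? domain_max / ndevs : domain_max;
}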

Thanks,

tglx






2023-05-30 08:39:32

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Thomas,

On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <[email protected]> wrote:
>
> Huacai!
>
> On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> > On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <[email protected]> wrote:
> >> By default you allow up to 256 interrupts to be allocated, right? So to
> >> prevent vector exhaustion, the admin needs to reboot the machine and set
> >> a command line parameter to limit this, right? As that parameter is not
> >> documented, the admin is left to guess a number. That's impractical and
> >> just a horrible band-aid.
> >
> > OK, I think I should update the documentation in the next version.
>
> Updating documentation neither makes it more practical (it still
> requires a reboot) nor does it justify the abuse of the msi_prepare()
> callback.
>
> The only reason why this hack "works" is that there is a historical
> mechanism which tells the PCI/MSI core that the number of requested
> vectors cannot be allocated, but that there would be $N vectors
> possible. But even that return value has no guarantee.
>
> This mechanism is ill defined and really should go away.
>
> Adding yet another way to limit this via msi_prepare() is just
> proliferating this ill defined mechanism and I have zero interest in
> that.
>
> Let's take a step back and look at the larger picture:
>
> 1) A PCI/MSI irqdomain is attached to a PCI bus
>
> 2) The number of PCI devices on that PCI bus is usually known at boot
> time _before_ the first device driver is probed.
>
> That's not entirely true for PCI hotplug devices, but that's hardly
> relevant for an architecture which got designed less than 10 years
> ago and the architects decided that 256 MSI vectors are good enough
> for up to 256 CPUs. The concept of per CPU queues was already known
> at that time, no?
Does this solution depend on the per-device msi domain? Can we do that
if we use the global msi domain?

>
> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
> available for a particular bus, right?
>
> The default, i.e if the irqdomain does not expose that information,
> would be "unlimited", i.e. ULONG_MAX.
OK, thanks, but how to expose? By msi_domain_info::hwsize?

>
> Now take that number and divide it by the number of devices on the bus
> and you get at least a sensible limit which does not immediately cause
> vector exhaustion.
>
> That limit might be suboptimal if there are lots of other devices on
> that bus which just require one or two vectors, but that's something
> which can be optimized via a generic command line option or even a sysfs
> mechanism.
Hmm, if we still use the command line, then we still have some similar
drawbacks.

Huacai
>
> Thanks,
>
> tglx
>
>
>
>
>

2023-05-30 12:30:31

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <[email protected]> wrote:
>> Now take that number and divide it by the number of devices on the bus
>> and you get at least a sensible limit which does not immediately cause
>> vector exhaustion.
>>
>> That limit might be suboptimal if there are lots of other devices on
>> that bus which just require one or two vectors, but that's something
>> which can be optimized via a generic command line option or even a sysfs
>> mechanism.
> Hmm, if we still use the command line, then we still have some similar
> drawbacks.

Only for optimization. Without the optimization the limit might end up
being overbroad, but it definitely prevents vector exhaustion. For quite
some cases this might even be the proper limit, which does not need any
further tweaking.

Thanks,

tglx

2023-05-30 15:19:31

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <[email protected]> wrote:
>> Let's take a step back and look at the larger picture:
>>
>> 1) A PCI/MSI irqdomain is attached to a PCI bus
>>
>> 2) The number of PCI devices on that PCI bus is usually known at boot
>> time _before_ the first device driver is probed.
>>
>> That's not entirely true for PCI hotplug devices, but that's hardly
>> relevant for an architecture which got designed less than 10 years
>> ago and the architects decided that 256 MSI vectors are good enough
>> for up to 256 CPUs. The concept of per CPU queues was already known
>> at that time, no?
> Does this solution depend on the per-device msi domain? Can we do that
> if we use the global msi domain?

In principle it should not depend on per-device MSI domains, but I
really don't want to add new functionality to the old operating models
as that does not create an incentive for people to convert their stuff
over.

>> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
>> available for a particular bus, right?
>>
>> The default, i.e if the irqdomain does not expose that information,
>> would be "unlimited", i.e. ULONG_MAX.
> OK, thanks, but how to expose? By msi_domain_info::hwsize?

Probably. Needs a proper helper around it.
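
Something along these lines, perhaps (the helper name is made up, and
treating hwsize == 0 as "unlimited" is an assumption):

#include <linux/irqdomain.h>
#include <linux/limits.h>
#include <linux/msi.h>

/* Hypothetical helper, not an existing kernel API. */
static inline unsigned int msi_domain_max_vecs(struct irq_domain *domain)
{
	struct msi_domain_info *info = domain->host_data;

	/* Treat an unset hwsize as "no limit". */
	return (info && info->hwsize) ? info->hwsize : UINT_MAX;
}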

Thanks,

tglx

2023-06-01 15:24:35

by Huacai Chen

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

Hi, Thomas,

On Tue, May 30, 2023 at 11:03 PM Thomas Gleixner <[email protected]> wrote:
>
> On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> > On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <[email protected]> wrote:
> >> Let's take a step back and look at the larger picture:
> >>
> >> 1) A PCI/MSI irqdomain is attached to a PCI bus
> >>
> >> 2) The number of PCI devices on that PCI bus is usually known at boot
> >> time _before_ the first device driver is probed.
> >>
> >> That's not entirely true for PCI hotplug devices, but that's hardly
> >> relevant for an architecture which got designed less than 10 years
> >> ago and the architects decided that 256 MSI vectors are good enough
> >> for up to 256 CPUs. The concept of per CPU queues was already known
> >> at that time, no?
> > Does this solution depend on the per-device msi domain? Can we do that
> > if we use the global msi domain?
>
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.
Thank you for your advice, but for our scenario its effect is no
better than this patch (because not all devices are aggressive vector
consumers), so we will give up on this approach. :)

And as Jason said in another thread, this problem can be solved by
simply reducing the number of queues via ethtool. So let's drop this
patch, since it is not acceptable.

Huacai
>
> >> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
> >> available for a particular bus, right?
> >>
> >> The default, i.e if the irqdomain does not expose that information,
> >> would be "unlimited", i.e. ULONG_MAX.
> > OK, thanks, but how to expose? By msi_domain_info::hwsize?
>
> Probably. Needs a proper helper around it.
>
> Thanks,
>
> tglx

2023-06-02 01:29:04

by maobibo

Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()



On 2023/5/30 23:03, Thomas Gleixner wrote:
> On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
>> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <[email protected]> wrote:
>>> Let's take a step back and look at the larger picture:
>>>
>>> 1) A PCI/MSI irqdomain is attached to a PCI bus
>>>
>>> 2) The number of PCI devices on that PCI bus is usually known at boot
>>> time _before_ the first device driver is probed.
>>>
>>> That's not entirely true for PCI hotplug devices, but that's hardly
>>> relevant for an architecture which got designed less than 10 years
>>> ago and the architects decided that 256 MSI vectors are good enough
>>> for up to 256 CPUs. The concept of per CPU queues was already known
>>> at that time, no?
>> Does this solution depend on the per-device msi domain? Can we do that
>> if we use the global msi domain?
>
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.
>
>>> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
>>> available for a particular bus, right?
>>>
>>> The default, i.e if the irqdomain does not expose that information,
>>> would be "unlimited", i.e. ULONG_MAX.
>> OK, thanks, but how to expose? By msi_domain_info::hwsize?
>
> Probably. Needs a proper helper around it.

This is not a common issue; a command line parameter and documentation
are not a suitable fix here.

Can we add a weak function like this?
int __weak arch_set_max_msix_vectors(void)
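
For example, something like this (a rough sketch; the exact semantics,
e.g. 0 meaning "no limit", are just my assumption):

/* Generic weak default (core code): no architecture-imposed cap. */
int __weak arch_set_max_msix_vectors(void)
{
	return 0;	/* 0 == no limit */
}

/* Strong override in arch code (separate file), e.g.: */
int arch_set_max_msix_vectors(void)
{
	return 256;	/* cap MSI/MSI-X vectors on this platform */
}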

Regards
Bibo, mao
>
> Thanks,
>
> tglx