2013-04-29 04:30:33

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested

Hi Gentleman,

This update will allow to conserve interrupt resources when PCI
device uses multiple-MSI mode and sends lesser power-of-two MSIs.

I have held this off for some time, since there were no such usages
(at least known to me). But recently PLX Technology confirmed they
do have, i.e. their new PEX8796 chip needs 18 MSIs.

The series is against the tip/apic.

Alexander Gordeev (2):
PCI/MSI: Allocate as many multiple-MSIs as requested
x86/MSI: Allocate as many multiple-MSIs as requested

drivers/iommu/irq_remapping.c | 7 ++++---
drivers/pci/msi.c | 10 ++++++++--
include/linux/msi.h | 1 +
3 files changed, 13 insertions(+), 5 deletions(-)

--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]


2013-04-29 04:31:19

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH -tip/apic 1/2] PCI/MSI: Allocate as many multiple-MSIs as requested

When multiple MSIs are enabled with pci_enable_msi_block(), the
requested number of interrupts 'nvec' is rounded up to the nearest
power-of-two value. The result is then used for setting up the
number of MSI messages in the PCI device and allocation of
interrupt resources in the operating system (i.e. vector numbers).
Thus, in cases when a device driver requests some number of MSIs
and this number is not a power-of-two value, the extra operating
system resources (allocated as the result of rounding) are wasted.

This fix introduces 'msi_desc::nvec' field to address the above
issue. When non-zero, it will report the actual number of MSIs the
device will send, as requested by the device driver. This value
should be used by architectures to properly set up and tear down
associated interrupt resources.

Note, although the existing 'msi_desc::multiple' field might seem
redundant, in fact in does not. In general case the number of MSIs a
PCI device is initialized with is not necessarily the closest power-
of-two value of the number of MSIs the device will send. Thus, in
theory it would not be always possible to derive the former from the
latter and we need to keep them both, to stress this corner case.
Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
would not save us any space.

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/pci/msi.c | 10 ++++++++--
include/linux/msi.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 00cc78c..014b9d5 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (entry->irq == 0)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
for (i = 0; i < nvec; i++)
arch_teardown_msi_irq(entry->irq + i);
}
@@ -340,7 +343,10 @@ static void free_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (!entry->irq)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
#ifdef CONFIG_GENERIC_HARDIRQS
for (i = 0; i < nvec; i++)
BUG_ON(irq_has_action(entry->irq + i));
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ce93a34..0e20dfc 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -35,6 +35,7 @@ struct msi_desc {

u32 masked; /* mask bits */
unsigned int irq;
+ unsigned int nvec; /* number of messages */
struct list_head list;

union {
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2013-04-29 04:32:01

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/iommu/irq_remapping.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..315c563 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
unsigned int irq;
struct msi_desc *msidesc;

- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);

node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;

- msidesc->msi_attrib.multiple = ilog2(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
index = msi_alloc_remapped_irq(dev, irq, nvec);
@@ -95,6 +95,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;

return ret;
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2013-04-29 07:21:36

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested

>>> On 29.04.13 at 06:33, Alexander Gordeev <[email protected]> wrote:
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
> unsigned int irq;
> struct msi_desc *msidesc;
>
> - nvec = __roundup_pow_of_two(nvec);
> -
> WARN_ON(!list_is_singular(&dev->msi_list));
> msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
> WARN_ON(msidesc->irq);
> WARN_ON(msidesc->msi_attrib.multiple);
> + WARN_ON(msidesc->nvec);
>
> node = dev_to_node(&dev->dev);
> irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
> if (irq == 0)
> return -ENOSPC;
>
> - msidesc->msi_attrib.multiple = ilog2(nvec);
> + msidesc->nvec = nvec;
> + msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
> for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
> if (!sub_handle) {
> index = msi_alloc_remapped_irq(dev, irq, nvec);

This breaks the interface to IOMMU-specific code: While Intel's
implementation does bump the number of allocated IRTEs to a
power of 2, AMD's doesn't, and hence the tail entries in the block
that don't get allocated here can get used for another device,
thus creating a security hole when both devices aren't owned by
the same guest (with the host being considered a special kind of
guest for this purpose).

IOW, while you can conserve on the number of vectors allocated,
you can't on the IRTEs, and I think this should be taken care of in
the generic IOMMU code, not in the individual vendor
implementations.

Jan

2013-04-30 11:09:59

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/iommu/irq_remapping.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..9eeb6cf 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void)

static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
{
- int node, ret, sub_handle, index = 0;
+ int node, ret, sub_handle, nvec_pow2, index = 0;
unsigned int irq;
struct msi_desc *msidesc;

- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);

node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;

- msidesc->msi_attrib.multiple = ilog2(nvec);
+ nvec_pow2 = __roundup_pow_of_two(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
- index = msi_alloc_remapped_irq(dev, irq, nvec);
+ index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
if (index < 0) {
ret = index;
goto error;
@@ -95,6 +96,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;

return ret;
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2013-05-02 12:36:35

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested

On Tue, Apr 30, 2013 at 01:11:11PM +0200, Alexander Gordeev wrote:
> Signed-off-by: Alexander Gordeev <[email protected]>

Please add a proper commit message explaining why this change is
necessary or what problem it fixes.


Joerg

2013-05-03 07:11:47

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs

Current multiple-MSI implementation does not take into account
actual number of requested MSIs and always rounds that number
to a closest power-of-two value. Yet, a number of MSIs a PCI
device could send (and therefore a number of messages a device
driver could request) may be a lesser power-of-two. As result,
resources allocated for extra MSIs just wasted.

This update takes advantage of 'msi_desc::nvec' field introduced
with generic MSI code to track number of requested and used MSIs.
As result, resources associated with interrupts are conserved.
Of those resources most noticeable are x86 interrupt vectors.

The initial version of this fix also consumed on IRTEs, but Jan
noticed that a malfunctioning PCI device might send a message
number it did not claim and thus refer an IRTE it does not own.
To avoid this security hole the old-way approach preserved and
as many IRTEs are reserved as the device could possibly send.

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/iommu/irq_remapping.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..9eeb6cf 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void)

static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
{
- int node, ret, sub_handle, index = 0;
+ int node, ret, sub_handle, nvec_pow2, index = 0;
unsigned int irq;
struct msi_desc *msidesc;

- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);

node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;

- msidesc->msi_attrib.multiple = ilog2(nvec);
+ nvec_pow2 = __roundup_pow_of_two(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
- index = msi_alloc_remapped_irq(dev, irq, nvec);
+ index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
if (index < 0) {
ret = index;
goto error;
@@ -95,6 +96,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;

return ret;
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]