LinuxLists.cc - [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

2013-09-02 08:57:15

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

This series is aimed to conserve on othewise wasted interrupt
resources for 10 of 16 unused MSI vectors for AHCI devices on
Intel chipsets.

Alexander Gordeev (4):
PCI/MSI: Introduce pci_enable_msi_block_part() interface
MSI/x86: Support pci_enable_msi_block_part() interface
AHCI: Conserve interrupts with pci_enable_msi_block_part() interface
PCI/MSI: Get rid of pci_enable_msi_block_auto() interface

Documentation/PCI/MSI-HOWTO.txt | 46 +++++++++++++++++-
arch/mips/pci/msi-octeon.c | 2 +-
arch/powerpc/kernel/msi.c | 4 +-
arch/s390/pci/pci.c | 2 +-
arch/x86/include/asm/pci.h | 8 ++-
arch/x86/include/asm/x86_init.h | 3 +-
arch/x86/kernel/apic/io_apic.c | 3 +-
drivers/ata/ahci.c | 48 +++++++++++--------
drivers/iommu/irq_remapping.c | 14 +++---
drivers/pci/msi.c | 98 ++++++++++++++++++++-------------------
include/linux/msi.h | 5 +-
include/linux/pci.h | 9 ++--
12 files changed, 150 insertions(+), 92 deletions(-)

--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-02 08:57:42

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH 1/4] PCI/MSI: Introduce pci_enable_msi_block_part() interface

There are PCI devices that require a particular value written
to the Multiple Message Enable (MME) register while aligned on
power of 2 boundary value of actually used MSI vectors 'nvec'
is a lesser of that MME value:

roundup_pow_of_two(nvec) < 'Multiple Message Enable'

However the existing pci_enable_msi_block() interface is not
able to configure such devices, since the value written to the
MME register is calculated from the number of requested MSIs
'nvec':

'Multiple Message Enable' = roundup_pow_of_two(nvec)

In this case the result written to the MME register may not
satisfy the aforementioned PCI devices requirement and therefore
the PCI functions will not operate in a desired mode.

This update introduces pci_enable_msi_block_part() extension to
pci_enable_msi_block() interface that accepts extra 'nvec_mme'
argument which is then written to the MME register while the
value of 'nvec' is still used to setup as many interrupts as
requested.

Signed-off-by: Alexander Gordeev <[email protected]>
---
Documentation/PCI/MSI-HOWTO.txt | 56 ++++++++++++++++++++++++----
arch/mips/pci/msi-octeon.c | 2 +-
arch/powerpc/kernel/msi.c | 4 +-
arch/s390/pci/pci.c | 2 +-
arch/x86/include/asm/pci.h | 8 +++--
arch/x86/include/asm/x86_init.h | 3 +-
arch/x86/kernel/apic/io_apic.c | 3 +-
drivers/iommu/irq_remapping.c | 2 +-
drivers/pci/msi.c | 77 ++++++++++++++++++++++++++-------------
include/linux/msi.h | 5 ++-
include/linux/pci.h | 8 ++++
11 files changed, 125 insertions(+), 45 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index a091780..32d7d15 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -127,7 +127,47 @@ on the number of vectors that can be allocated; pci_enable_msi_block()
returns as soon as it finds any constraint that doesn't allow the
call to succeed.

-4.2.3 pci_enable_msi_block_auto
+4.2.3 pci_enable_msi_block_part
+
+int pci_enable_msi_block_part(struct pci_dev *dev, int count, int alloc)
+
+This variation on the above call allows a device driver to request 'alloc'
+number of multiple MSIs while setup 'count' number of MSIs, which could be
+a lesser of 'alloc'. The MSI specification only allows interrupts to be
+allocated in powers of two, up to a maximum of 2^5 (32).
+
+In case the driver wants to allocate a maximum possible number of MSIs
+for the device it may pass a negative number as 'alloc' parameter.
+
+If this function returns 0, it has succeeded in allocating 'alloc'
+interrupts and setting up 'count' interrupts. In this case, the function
+enables MSI on this device and updates dev->irq to be the lowest of the
+new interrupts assigned to it. The other interrupts assigned to the
+device are in the range dev->irq to dev->irq + count - 1.
+
+If this function returns -ERANGE, it indicates 'count' is greater than
+'alloc' and the driver should adjust either or both parameters.
+
+If this function returns other negative number, it indicates an error
+and the driver should not attempt to request any more MSI interrupts
+for this device. If this function returns a positive number, it is
+less than 'alloc' and indicates the number of interrupts that could have
+been allocated. In neither case is the irq value updated or the device
+switched into MSI mode.
+
+The device driver must decide what action to take if
+pci_enable_msi_block_part() returns a value less than 'alloc'. For
+instance, the driver could still make use of fewer interrupts; in this
+case the driver should possibly adjust 'count' parameter and call
+pci_enable_msi_block_part() again or even call pci_enable_msi_block()
+instead. Note that it is not guaranteed to succeed, even when the
+'alloc' has been reduced to the value returned from a previous call to
+pci_enable_msi_block_part(). This is because there are multiple
+constraints on the number of vectors that can be allocated;
+pci_enable_msi_block_part() returns as soon as it finds any constraint
+that doesn't allow the call to succeed.
+
+4.2.4 pci_enable_msi_block_auto

int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *count)

@@ -153,16 +193,16 @@ succeeds, but returns a value less than the number of interrupts supported.
If the device driver does not need to know the number of interrupts
supported, it can set the pointer count to NULL.

-4.2.4 pci_disable_msi
+4.2.5 pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

-This function should be used to undo the effect of pci_enable_msi() or
-pci_enable_msi_block() or pci_enable_msi_block_auto(). Calling it restores
-dev->irq to the pin-based interrupt number and frees the previously
-allocated message signaled interrupt(s). The interrupt may subsequently be
-assigned to another device, so drivers should not cache the value of
-dev->irq.
+This function should be used to undo the effect of pci_enable_msi_block(),
+pci_enable_msi(), pci_enable_msi_block_auto() or pci_enable_msi_block_part().
+Calling it restores dev->irq to the pin-based interrupt number and frees the
+previously allocated message signaled interrupt(s). The interrupt may
+subsequently be assigned to another device, so drivers should not cache the
+value of dev->irq.

Before calling this function, a device driver must always call free_irq()
on any interrupt for which it previously called request_irq().
diff --git a/arch/mips/pci/msi-octeon.c b/arch/mips/pci/msi-octeon.c
index d37be36..c9aaf8d 100644
--- a/arch/mips/pci/msi-octeon.c
+++ b/arch/mips/pci/msi-octeon.c
@@ -177,7 +177,7 @@ msi_irq_allocated:
return 0;
}

-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int nvec_mme, int type)
{
struct msi_desc *entry;
int ret;
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..fc70513 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -13,7 +13,7 @@

#include <asm/machdep.h>

-int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
+int arch_msi_check_device(struct pci_dev* dev, int nvec, int nvec_mme, int type)
{
if (!ppc_md.setup_msi_irqs || !ppc_md.teardown_msi_irqs) {
pr_debug("msi: Platform doesn't provide MSI callbacks.\n");
@@ -32,7 +32,7 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
return 0;
}

-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int nvec_mme, int type)
{
return ppc_md.setup_msi_irqs(dev, nvec, type);
}
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index e2956ad..688a5db 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -538,7 +538,7 @@ static void zpci_teardown_msi(struct pci_dev *pdev)
aisb_max--;
}

-int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int nvec_mme, int type)
{
pr_debug("%s: requesting %d MSI-X interrupts...", __func__, nvec);
if (type != PCI_CAP_ID_MSIX && type != PCI_CAP_ID_MSI)
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index d9e9e6c..620642f 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -101,9 +101,10 @@ extern void pci_iommu_alloc(void);

#ifdef CONFIG_PCI_MSI
/* MSI arch specific hooks */
-static inline int x86_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+static inline int x86_setup_msi_irqs(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type)
{
- return x86_msi.setup_msi_irqs(dev, nvec, type);
+ return x86_msi.setup_msi_irqs(dev, nvec, nvec_mme, type);
}

static inline void x86_teardown_msi_irqs(struct pci_dev *dev)
@@ -125,7 +126,8 @@ static inline void x86_restore_msi_irqs(struct pci_dev *dev, int irq)
#define arch_restore_msi_irqs x86_restore_msi_irqs
/* implemented in arch/x86/kernel/apic/io_apic. */
struct msi_desc;
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
+int native_setup_msi_irqs(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type);
void native_teardown_msi_irq(unsigned int irq);
void native_restore_msi_irqs(struct pci_dev *dev, int irq);
int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 828a156..04a8767 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -174,7 +174,8 @@ struct pci_dev;
struct msi_msg;

struct x86_msi_ops {
- int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
+ int (*setup_msi_irqs)(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type);
void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
unsigned int dest, struct msi_msg *msg,
u8 hpet_id);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 9ed796c..21f6a44 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3132,7 +3132,8 @@ int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
return 0;
}

-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int native_setup_msi_irqs(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type)
{
unsigned int irq, irq_want;
struct msi_desc *msidesc;
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 39f81ae..1a220a0 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -142,7 +142,7 @@ error:
}

static int irq_remapping_setup_msi_irqs(struct pci_dev *dev,
- int nvec, int type)
+ int nvec, int nvec_mme, int type)
{
if (type == PCI_CAP_ID_MSI)
return do_setup_msi_irqs(dev, nvec);
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index aca7578..a5c958f 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -31,7 +31,8 @@ static int pci_msi_enable = 1;
/* Arch hooks */

#ifndef arch_msi_check_device
-int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
+int arch_msi_check_device(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type)
{
return 0;
}
@@ -43,7 +44,8 @@ int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
#endif

#ifdef HAVE_DEFAULT_MSI_SETUP_IRQS
-int default_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int default_setup_msi_irqs(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type)
{
struct msi_desc *entry;
int ret;
@@ -540,6 +542,7 @@ out_unroll:
* msi_capability_init - configure device's MSI capability structure
* @dev: pointer to the pci_dev data structure of MSI device function
* @nvec: number of interrupts to allocate
+ * @nvec_mme: number of interrupts to write to Multiple Message Enable register
*
* Setup the MSI capability structure of the device with the requested
* number of interrupts. A return value of zero indicates the successful
@@ -547,7 +550,7 @@ out_unroll:
* an error, and a positive return value indicates the number of interrupts
* which could have been allocated.
*/
-static int msi_capability_init(struct pci_dev *dev, int nvec)
+static int msi_capability_init(struct pci_dev *dev, int nvec, int nvec_mme)
{
struct msi_desc *entry;
int ret;
@@ -582,7 +585,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
list_add_tail(&entry->list, &dev->msi_list);

/* Configure MSI capability structure */
- ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
+ ret = arch_setup_msi_irqs(dev, nvec, nvec_mme, PCI_CAP_ID_MSI);
if (ret) {
msi_mask_irq(entry, mask, ~mask);
free_msi_irqs(dev);
@@ -700,7 +703,8 @@ static int msix_capability_init(struct pci_dev *dev,
if (ret)
return ret;

- ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
+ /* nvec_mme parameter does not make sense in case of MSI-X */
+ ret = arch_setup_msi_irqs(dev, nvec, -1, PCI_CAP_ID_MSIX);
if (ret)
goto error;

@@ -755,13 +759,15 @@ error:
* pci_msi_check_device - check whether MSI may be enabled on a device
* @dev: pointer to the pci_dev data structure of MSI device function
* @nvec: how many MSIs have been requested ?
+ * @nvec_mme: how many MSIs write to Multiple Message Enable register ?
* @type: are we checking for MSI or MSI-X ?
*
* Look at global flags, the device itself, and its parent busses
* to determine if MSI/-X are supported for the device. If MSI/-X is
* supported return 0, else return an error code.
**/
-static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
+static int pci_msi_check_device(struct pci_dev *dev,
+ int nvec, int nvec_mme, int type)
{
struct pci_bus *bus;
int ret;
@@ -789,27 +795,15 @@ static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
return -EINVAL;

- ret = arch_msi_check_device(dev, nvec, type);
+ ret = arch_msi_check_device(dev, nvec, nvec_mme, type);
if (ret)
return ret;

return 0;
}

-/**
- * pci_enable_msi_block - configure device's MSI capability structure
- * @dev: device to configure
- * @nvec: number of interrupts to configure
- *
- * Allocate IRQs for a device with the MSI capability.
- * This function returns a negative errno if an error occurs. If it
- * is unable to allocate the number of interrupts requested, it returns
- * the number of interrupts it might be able to allocate. If it successfully
- * allocates at least the number of interrupts requested, it returns 0 and
- * updates the @dev's irq member to the lowest new interrupt number; the
- * other interrupt numbers allocated to this device are consecutive.
- */
-int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
+int pci_enable_msi_block_part(struct pci_dev *dev,
+ unsigned int nvec, int nvec_mme)
{
int status, maxvec;
u16 msgctl;
@@ -819,10 +813,17 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)

pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &msgctl);
maxvec = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
- if (nvec > maxvec)
+
+ if (nvec_mme < 0)
+ nvec_mme = maxvec;
+ if (nvec_mme > maxvec)
return maxvec;
+ if (__roundup_pow_of_two(nvec_mme) != nvec_mme)
+ return -EINVAL;
+ if (nvec > nvec_mme)
+ return -ERANGE;

- status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
+ status = pci_msi_check_device(dev, nvec, nvec_mme, PCI_CAP_ID_MSI);
if (status)
return status;

@@ -835,9 +836,34 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
return -EINVAL;
}

- status = msi_capability_init(dev, nvec);
+ status = msi_capability_init(dev, nvec, nvec_mme);
return status;
}
+EXPORT_SYMBOL(pci_enable_msi_block_part);
+
+/**
+ * pci_enable_msi_block - configure device's MSI capability structure
+ * @dev: device to configure
+ * @nvec: number of interrupts to configure
+ *
+ * Allocate IRQs for a device with the MSI capability.
+ * This function returns a negative errno if an error occurs. If it
+ * is unable to allocate the number of interrupts requested, it returns
+ * the number of interrupts it might be able to allocate. If it successfully
+ * allocates at least the number of interrupts requested, it returns 0 and
+ * updates the @dev's irq member to the lowest new interrupt number; the
+ * other interrupt numbers allocated to this device are consecutive.
+ */
+int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
+{
+ /*
+ * Archtectures which do not support nvec_mme should ignore it.
+ * However, it would be surprising if an architecture write to
+ * the Multiple Message Enable register something else than nvec
+ * rounded up to the power of two.
+ */
+ return pci_enable_msi_block_part(dev, nvec, __roundup_pow_of_two(nvec));
+}
EXPORT_SYMBOL(pci_enable_msi_block);

int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
@@ -941,7 +967,8 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
if (!entries || !dev->msix_cap)
return -EINVAL;

- status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX);
+ /* nvec_mme parameter does not make sense in case of MSI-X */
+ status = pci_msi_check_device(dev, nvec, -1, PCI_CAP_ID_MSIX);
if (status)
return status;

diff --git a/include/linux/msi.h b/include/linux/msi.h
index ee66f3a..e27ad31 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -55,8 +55,9 @@ struct msi_desc {
*/
int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
void arch_teardown_msi_irq(unsigned int irq);
-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
+int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int nvec_mme, int type);
void arch_teardown_msi_irqs(struct pci_dev *dev);
-int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
+int arch_msi_check_device(struct pci_dev* dev,
+ int nvec, int nvec_mme, int type);

#endif /* LINUX_MSI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0fd1f15..6552cee 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1122,6 +1122,12 @@ struct msix_entry {

#ifndef CONFIG_PCI_MSI
+static inline int
+pci_enable_msi_block_part(struct pci_dev *dev, unsigned int nvec, int nvec_mme)
+{
+ return -1;
+}
+
static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
{
return -1;
@@ -1163,6 +1169,8 @@ static inline int pci_msi_enabled(void)
return 0;
}
#else
+int pci_enable_msi_block_part(struct pci_dev *dev,
+ unsigned int nvec, int nvec_mme);
int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
void pci_msi_shutdown(struct pci_dev *dev);
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-02 08:58:07

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH 2/4] MSI/x86: Support pci_enable_msi_block_part() interface

This update is a prerequisite for the forthcoming update
of the AHCI device driver to conserve 10/16 MSIs on Intel
chipsets. The update makes use of 'nvec_mme' parameter
for pci_enable_msi_block_part() interface.

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/iommu/irq_remapping.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 1a220a0..f443e23 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -49,9 +49,9 @@ static void irq_remapping_disable_io_apic(void)
disconnect_bsp_APIC(0);
}

-static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
+static int do_setup_msi_irqs(struct pci_dev *dev, int nvec, int nvec_mme)
{
- int node, ret, sub_handle, nvec_pow2, index = 0;
+ int node, ret, sub_handle, index = 0;
unsigned int irq;
struct msi_desc *msidesc;

@@ -60,18 +60,18 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
WARN_ON(msidesc->nvec_used);
+ BUG_ON(nvec_mme != __roundup_pow_of_two(nvec_mme));

node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;

- nvec_pow2 = __roundup_pow_of_two(nvec);
msidesc->nvec_used = nvec;
- msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
+ msidesc->msi_attrib.multiple = ilog2(nvec_mme);
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
- index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
+ index = msi_alloc_remapped_irq(dev, irq, nvec_mme);
if (index < 0) {
ret = index;
goto error;
@@ -145,7 +145,7 @@ static int irq_remapping_setup_msi_irqs(struct pci_dev *dev,
int nvec, int nvec_mme, int type)
{
if (type == PCI_CAP_ID_MSI)
- return do_setup_msi_irqs(dev, nvec);
+ return do_setup_msi_irqs(dev, nvec, nvec_mme);
else
return do_setup_msix_irqs(dev, nvec);
}
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-02 08:58:37

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Make use of the new pci_enable_msi_block_part() interface
and conserve on othewise wasted interrupt resources for 10
of 16 unused MSI vectors on Intel chipsets.

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/ata/ahci.c | 48 +++++++++++++++++++++++++++++-------------------
1 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index db4380d..b0c1648 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1091,26 +1091,36 @@ static inline void ahci_gtf_filter_workaround(struct ata_host *host)
{}
#endif

-int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
+int ahci_init_interrupts(struct pci_dev *pdev, unsigned int n_ports,
+ struct ahci_host_priv *hpriv)
{
int rc;
unsigned int maxvec;

- if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
- rc = pci_enable_msi_block_auto(pdev, &maxvec);
- if (rc > 0) {
- if ((rc == maxvec) || (rc == 1))
- return rc;
- /*
- * Assume that advantage of multipe MSIs is negated,
- * so fallback to single MSI mode to save resources
- */
- pci_disable_msi(pdev);
- if (!pci_enable_msi(pdev))
- return 1;
- }
- }
+ if (hpriv->flags & AHCI_HFLAG_NO_MSI)
+ goto intx;
+
+ rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);
+ if (!rc)
+ return AHCI_MAX_PORTS;
+ if (rc < 0)
+ goto intx;
+
+ maxvec = rc;
+ rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);
+ if (!rc)
+ return maxvec;
+ if (rc < 0)
+ goto intx;
+
+ /*
+ * Assume that advantage of multipe MSIs is negated,
+ * so fallback to single MSI mode to save resources
+ */
+ if (!pci_enable_msi(pdev))
+ return 1;

+intx:
pci_intx(pdev, 1);
return 0;
}
@@ -1277,10 +1287,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)

hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];

- n_msis = ahci_init_interrupts(pdev, hpriv);
- if (n_msis > 1)
- hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
-
/* save initial config */
ahci_pci_save_initial_config(pdev, hpriv);

@@ -1327,6 +1333,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
*/
n_ports = max(ahci_nr_ports(hpriv->cap), fls(hpriv->port_map));

+ n_msis = ahci_init_interrupts(pdev, n_ports, hpriv);
+ if (n_msis > 1)
+ hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
+
host = ata_host_alloc_pinfo(&pdev->dev, ppi, n_ports);
if (!host)
return -ENOMEM;
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-02 08:59:00

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH 4/4] PCI/MSI: Get rid of pci_enable_msi_block_auto() interface

There are no consumers of pci_enable_msi_block_auto()
interface have left. Eliminate it for now.

Signed-off-by: Alexander Gordeev <[email protected]>
---
Documentation/PCI/MSI-HOWTO.txt | 12 ++++++------
drivers/pci/msi.c | 25 -------------------------
include/linux/pci.h | 7 -------
3 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 32d7d15..a8019e3 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -197,12 +197,12 @@ supported, it can set the pointer count to NULL.

void pci_disable_msi(struct pci_dev *dev)

-This function should be used to undo the effect of pci_enable_msi_block(),
-pci_enable_msi(), pci_enable_msi_block_auto() or pci_enable_msi_block_part().
-Calling it restores dev->irq to the pin-based interrupt number and frees the
-previously allocated message signaled interrupt(s). The interrupt may
-subsequently be assigned to another device, so drivers should not cache the
-value of dev->irq.
+This function should be used to undo the effect of pci_enable_msi() or
+pci_enable_msi_block() or pci_enable_msi_block_part(). Calling it restores
+dev->irq to the pin-based interrupt number and frees the previously
+allocated message signaled interrupt(s). The interrupt may subsequently be
+assigned to another device, so drivers should not cache the value of
+dev->irq.

Before calling this function, a device driver must always call free_irq()
on any interrupt for which it previously called request_irq().
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a5c958f..d99f022 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -866,31 +866,6 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
}
EXPORT_SYMBOL(pci_enable_msi_block);

-int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
-{
- int ret, nvec;
- u16 msgctl;
-
- if (!dev->msi_cap)
- return -EINVAL;
-
- pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &msgctl);
- ret = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
-
- if (maxvec)
- *maxvec = ret;
-
- do {
- nvec = ret;
- ret = pci_enable_msi_block(dev, nvec);
- } while (ret > 0);
-
- if (ret < 0)
- return ret;
- return nvec;
-}
-EXPORT_SYMBOL(pci_enable_msi_block_auto);
-
void pci_msi_shutdown(struct pci_dev *dev)
{
struct msi_desc *desc;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6552cee..89935b4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1133,12 +1133,6 @@ static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
return -1;
}

-static inline int
-pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
-{
- return -1;
-}
-
static inline void pci_msi_shutdown(struct pci_dev *dev)
{ }
static inline void pci_disable_msi(struct pci_dev *dev)
@@ -1172,7 +1166,6 @@ static inline int pci_msi_enabled(void)
int pci_enable_msi_block_part(struct pci_dev *dev,
unsigned int nvec, int nvec_mme);
int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
-int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
void pci_msi_shutdown(struct pci_dev *dev);
void pci_disable_msi(struct pci_dev *dev);
int pci_msix_table_size(struct pci_dev *dev);
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-03 07:56:56

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v2 4/4] PCI/MSI: Get rid of pci_enable_msi_block_auto() interface

There are no consumers of pci_enable_msi_block_auto()
interface have left. Eliminate it for now.

Signed-off-by: Alexander Gordeev <[email protected]>
---
Documentation/PCI/MSI-HOWTO.txt | 40 ++++++--------------------------------
drivers/pci/msi.c | 25 ------------------------
include/linux/pci.h | 7 ------
3 files changed, 7 insertions(+), 65 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 32d7d15..0c318c8 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -167,42 +167,16 @@ constraints on the number of vectors that can be allocated;
pci_enable_msi_block_part() returns as soon as it finds any constraint
that doesn't allow the call to succeed.

-4.2.4 pci_enable_msi_block_auto
-
-int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *count)
-
-This variation on pci_enable_msi() call allows a device driver to request
-the maximum possible number of MSIs. The MSI specification only allows
-interrupts to be allocated in powers of two, up to a maximum of 2^5 (32).
-
-If this function returns a positive number, it indicates that it has
-succeeded and the returned value is the number of allocated interrupts. In
-this case, the function enables MSI on this device and updates dev->irq to
-be the lowest of the new interrupts assigned to it. The other interrupts
-assigned to the device are in the range dev->irq to dev->irq + returned
-value - 1.
-
-If this function returns a negative number, it indicates an error and
-the driver should not attempt to request any more MSI interrupts for
-this device.
-
-If the device driver needs to know the number of interrupts the device
-supports it can pass the pointer count where that number is stored. The
-device driver must decide what action to take if pci_enable_msi_block_auto()
-succeeds, but returns a value less than the number of interrupts supported.
-If the device driver does not need to know the number of interrupts
-supported, it can set the pointer count to NULL.
-
-4.2.5 pci_disable_msi
+4.2.4 pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

-This function should be used to undo the effect of pci_enable_msi_block(),
-pci_enable_msi(), pci_enable_msi_block_auto() or pci_enable_msi_block_part().
-Calling it restores dev->irq to the pin-based interrupt number and frees the
-previously allocated message signaled interrupt(s). The interrupt may
-subsequently be assigned to another device, so drivers should not cache the
-value of dev->irq.
+This function should be used to undo the effect of pci_enable_msi() or
+pci_enable_msi_block() or pci_enable_msi_block_part(). Calling it restores
+dev->irq to the pin-based interrupt number and frees the previously
+allocated message signaled interrupt(s). The interrupt may subsequently be
+assigned to another device, so drivers should not cache the value of
+dev->irq.

Before calling this function, a device driver must always call free_irq()
on any interrupt for which it previously called request_irq().
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a5c958f..d99f022 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -866,31 +866,6 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
}
EXPORT_SYMBOL(pci_enable_msi_block);

-int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
-{
- int ret, nvec;
- u16 msgctl;
-
- if (!dev->msi_cap)
- return -EINVAL;
-
- pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &msgctl);
- ret = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
-
- if (maxvec)
- *maxvec = ret;
-
- do {
- nvec = ret;
- ret = pci_enable_msi_block(dev, nvec);
- } while (ret > 0);
-
- if (ret < 0)
- return ret;
- return nvec;
-}
-EXPORT_SYMBOL(pci_enable_msi_block_auto);
-
void pci_msi_shutdown(struct pci_dev *dev)
{
struct msi_desc *desc;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6552cee..89935b4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1133,12 +1133,6 @@ static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
return -1;
}

-static inline int
-pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
-{
- return -1;
-}
-
static inline void pci_msi_shutdown(struct pci_dev *dev)
{ }
static inline void pci_disable_msi(struct pci_dev *dev)
@@ -1172,7 +1166,6 @@ static inline int pci_msi_enabled(void)
int pci_enable_msi_block_part(struct pci_dev *dev,
unsigned int nvec, int nvec_mme);
int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
-int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
void pci_msi_shutdown(struct pci_dev *dev);
void pci_disable_msi(struct pci_dev *dev);
int pci_msix_table_size(struct pci_dev *dev);
--
1.7.7.6

--
Regards,
Alexander Gordeev
[email protected]

2013-09-03 13:33:32

[permalink] [raw]

Subject: Re: [PATCH 2/4] MSI/x86: Support pci_enable_msi_block_part() interface

On Mon, Sep 02, 2013 at 11:00:01AM +0200, Alexander Gordeev wrote:
> @@ -60,18 +60,18 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
> WARN_ON(msidesc->irq);
> WARN_ON(msidesc->msi_attrib.multiple);
> WARN_ON(msidesc->nvec_used);
> + BUG_ON(nvec_mme != __roundup_pow_of_two(nvec_mme));

Wouldn't !is_power_of_2() be better?

Thanks.

--
tejun

2013-09-03 13:55:46

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Mon, Sep 02, 2013 at 10:59:07AM +0200, Alexander Gordeev wrote:
> This series is aimed to conserve on othewise wasted interrupt
> resources for 10 of 16 unused MSI vectors for AHCI devices on
> Intel chipsets.

Hmmm.... I've been looking at the code and and a curiosity. Why does
multiple MSI support implicitly enabled threaded IRQ handling? Why
are those two linked? Also, do you have any numbers to show that this
actually is better? Handling the processing off to a thread isn't a
light operation.

Thanks.

--
tejun

2013-09-03 14:10:01

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 09:55:41AM -0400, Tejun Heo wrote:
> Hmmm.... I've been looking at the code and and a curiosity. Why does
> multiple MSI support implicitly enabled threaded IRQ handling? Why
> are those two linked? Also, do you have any numbers to show that this
> actually is better? Handling the processing off to a thread isn't a
> light operation.

Also, it probably is a good idea to skip dummy ports when requesting
irqs from ahci_host_activate(), which BTW should probably be renamed
to ahci_host_activate_mmsi().

Thanks.

--
tejun

2013-09-03 14:18:30

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Mon, Sep 02, 2013 at 11:00:28AM +0200, Alexander Gordeev wrote:
> + if (hpriv->flags & AHCI_HFLAG_NO_MSI)
> + goto intx;
> +
> + rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);
> + if (!rc)
> + return AHCI_MAX_PORTS;
> + if (rc < 0)
> + goto intx;
> +
> + maxvec = rc;
> + rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);
> + if (!rc)
> + return maxvec;
> + if (rc < 0)
> + goto intx;

Why is the above fallback necessary? The only other number which
makes sense is roundup_pow_of_two(n_ports), right? What does the
above fallback logic buy us?

Thanks.

--
tejun

2013-09-03 14:19:15

[permalink] [raw]

Subject: Re: [PATCH 1/4] PCI/MSI: Introduce pci_enable_msi_block_part() interface

On Mon, Sep 02, 2013 at 10:59:33AM +0200, Alexander Gordeev wrote:
> + if (__roundup_pow_of_two(nvec_mme) != nvec_mme)

Why the __ prefixed version?

--
tejun

2013-09-03 14:55:26

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 09:55:41AM -0400, Tejun Heo wrote:
> Hello,
>
> On Mon, Sep 02, 2013 at 10:59:07AM +0200, Alexander Gordeev wrote:
> > This series is aimed to conserve on othewise wasted interrupt
> > resources for 10 of 16 unused MSI vectors for AHCI devices on
> > Intel chipsets.
>
> Hmmm.... I've been looking at the code and and a curiosity. Why does
> multiple MSI support implicitly enabled threaded IRQ handling? Why
> are those two linked? Also, do you have any numbers to show that this
> actually is better? Handling the processing off to a thread isn't a
> light operation.

Just to emphasize the purpose of this series - it is aimed on configuring
(allocating x86 vectors, irq_desc's etc.) 6/16 interrupts instead of current
16/16 - while 10/16 are unused. Nothing more.

Multiple MSI support enables threaded IRQ handling, because at the time of
posting I did not want to intrude into the existing single-MSI codebase while
multiple MSI/multipe CPU approach gained good numbers (below).

Besides, the ports interrupt handling moved from the interrupt-disabled
hardware context to the interrupt-enabled threaded context and also ports
got per-port locks instead of the single host lock. AFAIR I did not post
the numbers, but it was something close to no-contention.

Host-wide lock statistics summary:

Taken using command 'if=/dev/sd{a,b,c} of=/dev/null' running in parallel
on three SATA HDDs:
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ahci_interrupt()
Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-

Test holdtime-total waittime-total acquis. holdtime-avg waittime-avg
#
01. 22565801.27 93056.41 6377094 3.538571216 0.014592291
02. 20358324.12 60825.47 6411012 3.175524257 0.009487655
03. 21190730.38 63610.38 6384508 3.319085884 0.009963239
04. 22491064.54 80624.96 6398390 3.515113105 0.012600820
05. 21986193.10 78034.38 6379092 3.446602291 0.012232835
06. 20556437.70 62314.64 6287673 3.269323596 0.009910604
07. 20477137.55 66190.92 6388507 3.205308776 0.010360937
08. 21442240.03 69306.80 6402109 3.349246323 0.010825620
----------- -----------
avg 3.352346931 0.011246750

ahci_hw_interrupt()
Capabilities: [80] MSI: Enable+ Count=16/16 Maskable- 64bit-

Test holdtime-total waittime-total acquis. holdtime-avg waittime-avg
#
09. 8936643.32 54559.79 6505606 1.373683454 0.008386581
10. 8579972.36 51496.56 6496233 1.320761180 0.007927142
11. 8138708.47 54061.46 6494647 1.253141005 0.008324003
12. 8322068.81 60427.51 6509975 1.278356493 0.009282295
13. 8527745.18 65978.83 6515589 1.308821839 0.010126303
14. 8662461.99 57291.39 6510702 1.330495850 0.008799572
15. 8054911.26 69186.41 6514262 1.236504037 0.010620759
16. 9618631.17 39726.37 6517385 1.475842101 0.006095446
----------- -----------
avg 1.322200745 0.008695263

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-03 16:17:16

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 10:18:24AM -0400, Tejun Heo wrote:
> On Mon, Sep 02, 2013 at 11:00:28AM +0200, Alexander Gordeev wrote:
> > + if (hpriv->flags & AHCI_HFLAG_NO_MSI)
> > + goto intx;
> > +
> > + rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);

We start with maximum possible number of ports AHCI_MAX_PORTS

> > + if (!rc)
> > + return AHCI_MAX_PORTS;

If we succeeded the device is indeed supports all AHCI_MAX_PORTS
and we report it.

> > + if (rc < 0)
> > + goto intx;

If pci_enable_msi_block_part() failed we should not make further
attempts and fallback to simple IRQ.

> > + maxvec = rc;

The device supports a lesser of AHCI_MAX_PORTS, because the previous
pci_enable_msi_block_part() has not succeeded nor failed. Thus, rc
contains number of supported MSIs. In case of ICH this will be 16
rather than 32.

Actually, while I was writing this I realized this could be a number
of MSIs that could have been enabled this device, not the maximum
number of supported MSIs - these two may differ. I think MRSM should
be checked. But I will continue as if it always the same.

> > + rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);

Try pci_enable_msi_block_part() with the maximum number of supported MSIs.

> > + if (!rc)
> > + return maxvec;

If we succeeded report the number of enabled MSIs.

> > + if (rc < 0)
> > + goto intx;

If pci_enable_msi_block_part() failed we should not make further
attempts and fallback to simple IRQ.

> Why is the above fallback necessary? The only other number which
> makes sense is roundup_pow_of_two(n_ports), right? What does the
> above fallback logic buy us?

We must enable maximum possible number of MSIs - the one reported in
Multiple Message Capable register. Otherwise ICH device will fallback
to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
ICH will enforce MRSM mode.

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-03 16:26:16

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Tue, Sep 03, 2013 at 04:57:19PM +0200, Alexander Gordeev wrote:
> Multiple MSI support enables threaded IRQ handling, because at the time of
> posting I did not want to intrude into the existing single-MSI codebase while
> multiple MSI/multipe CPU approach gained good numbers (below).

Please separate out threaded IRQ support from multiple MSI.

> Besides, the ports interrupt handling moved from the interrupt-disabled
> hardware context to the interrupt-enabled threaded context and also ports
> got per-port locks instead of the single host lock. AFAIR I did not post
> the numbers, but it was something close to no-contention.

Which is a completely separate issue, right? Let's please not mix the
two.

Thanks.

--
tejun

2013-09-03 18:27:42

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Tue, Sep 03, 2013 at 06:19:06PM +0200, Alexander Gordeev wrote:
> We must enable maximum possible number of MSIs - the one reported in
> Multiple Message Capable register. Otherwise ICH device will fallback
> to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
> in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
> ICH will enforce MRSM mode.

Hmmm... I think the interface in general is pretty messy. Wouldn't it
be much cleaner to have a separate function to query MSICAP and let
the function just return success / failure?

Thanks.

--
tejun

2013-09-04 07:21:09

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 02:27:31PM -0400, Tejun Heo wrote: > Hello,
>
> On Tue, Sep 03, 2013 at 06:19:06PM +0200, Alexander Gordeev wrote:
> > We must enable maximum possible number of MSIs - the one reported in
> > Multiple Message Capable register. Otherwise ICH device will fallback
> > to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
> > in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
> > ICH will enforce MRSM mode.
>
> Hmmm... I think the interface in general is pretty messy. Wouldn't it
> be much cleaner to have a separate function to query MSICAP and let
> the function just return success / failure?

Actually, sorry for misleading. It is only ICH (I am aware of) that works
this way and I was focused on.

I think a general approach that will cover it all (including ICH and undesired
sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
ensure MRSM bit is unset. If not - double MME and retry. If still not and the
limit is reached - fall back to single MSI.

Makes sense?

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 08:04:34

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 12:24:11PM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 03, 2013 at 04:57:19PM +0200, Alexander Gordeev wrote:
> > Multiple MSI support enables threaded IRQ handling, because at the time of
> > posting I did not want to intrude into the existing single-MSI codebase while
> > multiple MSI/multipe CPU approach gained good numbers (below).
>
> Please separate out threaded IRQ support from multiple MSI.

This series does not really bring any structural change to the AHCI code -
just tweaks the MSI initialization.

As threaded IRQ support vs multiple MSI is a separate issue what about
addressing it in a separate effort? I would have to think on how to do it
eigher :)

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 12:30:47

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Tue, Sep 03, 2013 at 10:09:54AM -0400, Tejun Heo wrote:
> On Tue, Sep 03, 2013 at 09:55:41AM -0400, Tejun Heo wrote:
> > Hmmm.... I've been looking at the code and and a curiosity. Why does
> > multiple MSI support implicitly enabled threaded IRQ handling? Why
> > are those two linked? Also, do you have any numbers to show that this
> > actually is better? Handling the processing off to a thread isn't a
> > light operation.
>
> Also, it probably is a good idea to skip dummy ports when requesting
> irqs from ahci_host_activate(), which BTW should probably be renamed
> to ahci_host_activate_mmsi().

Hmm.. that is actually a great idea. What I am not sure about whether is
a dummy port still can send (spurious?) interrupts? The hardware interrupt
handler would have to be reworked then. Seems as a yet another topic to me ;)

> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 14:15:20

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v2 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Make use of the new pci_enable_msi_block_part() interface
and conserve on othewise wasted interrupt resources for 10
of 16 unused MSI vectors on Intel chipsets.

Signed-off-by: Alexander Gordeev <[email protected]>

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index db4380d..41f9c08 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1091,26 +1091,39 @@ static inline void ahci_gtf_filter_workaround(struct ata_host *host)
{}
#endif

-int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
+static int ahci_get_mrsm(struct ahci_host_priv *hpriv)
+{
+ void __iomem *mmio = hpriv->mmio;
+ u32 ctl = readl(mmio + HOST_CTL);
+
+ return (ctl & HOST_MRSM);
+}
+
+int ahci_init_interrupts(struct pci_dev *pdev, unsigned int n_ports,
+ struct ahci_host_priv *hpriv)
{
int rc;
- unsigned int maxvec;
+ unsigned int n_msis;

- if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
- rc = pci_enable_msi_block_auto(pdev, &maxvec);
- if (rc > 0) {
- if ((rc == maxvec) || (rc == 1))
- return rc;
- /*
- * Assume that advantage of multipe MSIs is negated,
- * so fallback to single MSI mode to save resources
- */
- pci_disable_msi(pdev);
- if (!pci_enable_msi(pdev))
- return 1;
- }
+ if (hpriv->flags & AHCI_HFLAG_NO_MSI)
+ goto intx;
+
+ for (n_msis = roundup_pow_of_two(n_ports);
+ n_msis <= AHCI_MAX_PORTS; n_msis *= 2) {
+ rc = pci_enable_msi_block_part(pdev, n_ports, n_msis);
+ if (rc < 0)
+ goto intx;
+ if (rc)
+ break;
+ if (!ahci_get_mrsm(hpriv))
+ return n_msis;
+ pci_disable_msi(pdev);
}

+ if (!pci_enable_msi(pdev))
+ return 1;
+
+intx:
pci_intx(pdev, 1);
return 0;
}
@@ -1277,10 +1290,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)

hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];

- n_msis = ahci_init_interrupts(pdev, hpriv);
- if (n_msis > 1)
- hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
-
/* save initial config */
ahci_pci_save_initial_config(pdev, hpriv);

@@ -1327,6 +1336,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
*/
n_ports = max(ahci_nr_ports(hpriv->cap), fls(hpriv->port_map));

+ n_msis = ahci_init_interrupts(pdev, n_ports, hpriv);
+ if (n_msis > 1)
+ hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
+
host = ata_host_alloc_pinfo(&pdev->dev, ppi, n_ports);
if (!host)
return -ENOMEM;
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 1145637..19bc846 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -91,6 +91,7 @@ enum {
/* HOST_CTL bits */
HOST_RESET = (1 << 0), /* reset controller; self-clear */
HOST_IRQ_EN = (1 << 1), /* global IRQ enable */
+ HOST_MRSM = (1 << 2), /* MSI Revert to Single Message */
HOST_AHCI_EN = (1 << 31), /* AHCI enabled */

/* HOST_CAP bits */

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 14:56:10

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Wed, Sep 04, 2013 at 09:22:57AM +0200, Alexander Gordeev wrote:
> I think a general approach that will cover it all (including ICH and undesired
> sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
> ensure MRSM bit is unset. If not - double MME and retry. If still not and the
> limit is reached - fall back to single MSI.
>
> Makes sense?

Ugh... I really don't want this sort of retry loop. I'm still a bit
lost on why this is necessary. Can you please elaborate?

Thanks.

--
tejun

2013-09-04 14:58:01

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Wed, Sep 04, 2013 at 02:32:39PM +0200, Alexander Gordeev wrote:
> Hmm.. that is actually a great idea. What I am not sure about whether is
> a dummy port still can send (spurious?) interrupts? The hardware interrupt
> handler would have to be reworked then. Seems as a yet another topic to me ;)

Only unimplemented ports are assigned to be dummies and it's highly
unlikely that they'll generate spurious interrupts. That said, I
think it'd be a good idea to be ready for that if that doesn't add too
much complexity / overhead.

Thanks.

--
tejun

2013-09-04 15:02:45

[permalink] [raw]

Subject: Re: [PATCH 0/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Wed, Sep 04, 2013 at 10:06:23AM +0200, Alexander Gordeev wrote:
> > Please separate out threaded IRQ support from multiple MSI.
>
> This series does not really bring any structural change to the AHCI code -
> just tweaks the MSI initialization.
>
> As threaded IRQ support vs multiple MSI is a separate issue what about
> addressing it in a separate effort? I would have to think on how to do it
> eigher :)

Oh, I'm not saying that these all should be done together but it looks
like you're interested in pushing things further in this respect and I
wanna have an idea about the overall direction / future before acking
further changes. IOW, I'm happy that you're working on it but also
wanna make sure you're gonna see things through and know where things
are headed.

Thanks!

--
tejun

2013-09-04 16:12:50

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Wed, Sep 04, 2013 at 10:55:59AM -0400, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 04, 2013 at 09:22:57AM +0200, Alexander Gordeev wrote:
> > I think a general approach that will cover it all (including ICH and undesired
> > sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
> > ensure MRSM bit is unset. If not - double MME and retry. If still not and the
> > limit is reached - fall back to single MSI.
> >
> > Makes sense?
>
> Ugh... I really don't want this sort of retry loop. I'm still a bit
> lost on why this is necessary. Can you please elaborate?

Sure.

1. We do not support sharing MSI messages since there is no appropriate
interrupt handling for it. I am not sure if any hardware supports it at
all. This assumption is just for clarity here.

2. We can not just read out MMC value and try to write it to MME, because
we are not sure the hardware honours the specification. I.e. in case of 6
ports and MMC value of 16 the value of 8 in MME could enable multiple MSIs
while the value of 16 could enforce MRSM. Contradicts to the AHCI
specification, but look how weird ICH is [4].

3. Enabling more MSIs than needed (MME == MMC instead of MME < MMC) could
lead to unnecessary allocation of internal device resources. Bit lame, but
still true.

4. We can not derive the value of MME needed from the number of ports, at
least in case of ICH. I.e. with 6 ports and MMC value of 16 the value of 8
in MME is what would be expected according to the AHCI specification. But
the ICH reserves it and MRSMs in case 8 is written to MME.

So to cover all the above assumptions we need to scan from lowest possible
and watch the MRSM bit is unset.

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 18:06:19

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

Hello,

On Wed, Sep 04, 2013 at 06:14:43PM +0200, Alexander Gordeev wrote:
> 1. We do not support sharing MSI messages since there is no appropriate
> interrupt handling for it. I am not sure if any hardware supports it at
> all. This assumption is just for clarity here.

I don't think it even matters. If we can configure all MSIs, good.
If not, using single interrupt is good enough. Being able to allocate
only some of the interrupts is highly unlikely to begin with and
there's no reason to meddle with extra complexity.

> 2. We can not just read out MMC value and try to write it to MME, because
> we are not sure the hardware honours the specification. I.e. in case of 6
> ports and MMC value of 16 the value of 8 in MME could enable multiple MSIs
> while the value of 16 could enforce MRSM. Contradicts to the AHCI
> specification, but look how weird ICH is [4].
>
> 3. Enabling more MSIs than needed (MME == MMC instead of MME < MMC) could
> lead to unnecessary allocation of internal device resources. Bit lame, but
> still true.
>
> 4. We can not derive the value of MME needed from the number of ports, at
> least in case of ICH. I.e. with 6 ports and MMC value of 16 the value of 8
> in MME is what would be expected according to the AHCI specification. But
> the ICH reserves it and MRSMs in case 8 is written to MME.
>
> So to cover all the above assumptions we need to scan from lowest possible
> and watch the MRSM bit is unset.

I don't think it's necessary / a good idea to try to support
everything. Just following the spec would be fine. If that doesn't
work for too many devices, maybe just do one fallback?

Thanks.

--
tejun

2013-09-04 18:45:28

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Wed, Sep 04, 2013 at 02:06:07PM -0400, Tejun Heo wrote:
> I don't think it's necessary / a good idea to try to support
> everything. Just following the spec would be fine. If that doesn't
> work for too many devices, maybe just do one fallback?

Calling pci_enable_msi_block_part() in a loop may seem not that terrible
if you think of the original interface - pci_enable_msi_block(). If we
were to enable multiple MSIs using that one we still had to use it pretty
much the same way ;)

Anyway. I will send a version which reads MMC out.

> Thanks.
>
> --
> tejun

--
Regards,
Alexander Gordeev
[email protected]

2013-09-04 18:51:27

[permalink] [raw]

Subject: Re: [PATCH 3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

On Wed, Sep 04, 2013 at 08:47:16PM +0200, Alexander Gordeev wrote:
> On Wed, Sep 04, 2013 at 02:06:07PM -0400, Tejun Heo wrote:
> > I don't think it's necessary / a good idea to try to support
> > everything. Just following the spec would be fine. If that doesn't
> > work for too many devices, maybe just do one fallback?
>
> Calling pci_enable_msi_block_part() in a loop may seem not that terrible
> if you think of the original interface - pci_enable_msi_block(). If we
> were to enable multiple MSIs using that one we still had to use it pretty
> much the same way ;)

I don't know. The thing is, do we even want to meddle with
controllers which only accept a value which isn't the value specified
in the spec or the maximum value described as supported? It's icky
and we are far better of playing it safe. This has the possibility of
breaking boot on many configurations.

Thanks.

--
tejun