2012-10-01 08:08:27

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs

Currently multiple MSI mode is limited to a single vector per device (at
least on x86 and PPC). This series breathes life into pci_enable_msi_block()
and makes it possible to set interrupt affinity for multiple IRQs, similarly
to MSI-X. Yet, only for x86 and only when IOMMUs are present.

Although IRQ and PCI subsystems are modified, the current behaviour left
intact. The drivers could just start using multiple MSIs just by following
the existing documentation.

The AHCI device driver makes use of the new mode and a new function -
pci_enable_msi_block_auto() - patches 4,5.

The series is adapted to Ingo's -tip repository.


Compared to v2:

1/5: v1 is conditionally acked by Suresh

Based on Yinghai's comment:
- create_irqs() is updated to be __create_irqs(, count, );
- create_irq_nr() done as __create_irqs(, 1, ) call;

2,3/5: No changes since v1, conditionally acked by Suresh

4/5: No changes, v2 acked by Bjorn

5/5: "Shared Last Message" mode leftovers removed;

Based on Jeff's comments:

- PCI-specific code moved from libahci.c to ahci.c -
ahci_host_activate() lost scsi_host_template* parameter
as result;

- spinlock initialization moved to ahci_port_start() -
AHCI_HFLAG_MULTI_MSI flag introduced for that;


Alexander Gordeev (5):
1/5 x86, MSI: Support multiple MSIs in presense of IRQ remapping
2/5 x86, MSI: Allocate as many multiple IRQs as requested
3/5 x86, MSI: Minor readability fixes
4/5 PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
5/5 AHCI: Support multiple MSIs

Documentation/PCI/MSI-HOWTO.txt | 37 +++++++-
arch/x86/kernel/apic/io_apic.c | 178 ++++++++++++++++++++++++++++++++-------
drivers/ata/ahci.c | 91 +++++++++++++++++++-
drivers/ata/ahci.h | 6 ++
drivers/ata/libahci.c | 118 ++++++++++++++++++++++++--
drivers/pci/msi.c | 36 ++++++++-
include/linux/irq.h | 6 ++
include/linux/msi.h | 1 +
include/linux/pci.h | 7 ++
kernel/irq/chip.c | 30 +++++--
kernel/irq/irqdesc.c | 31 +++++++
11 files changed, 486 insertions(+), 55 deletions(-)

--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]


2012-10-01 08:09:43

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping

The MSI specification has several constraints in comparison with MSI-X,
most notable of them is the inability to configure MSIs independently.
As a result, it is impossible to dispatch interrupts from different
queues to different CPUs. This is largely devalues the support of
multiple MSIs in SMP systems.

Also, a necessity to allocate a contiguous block of vector numbers for
devices capable of multiple MSIs might cause a considerable pressure on
x86 interrupt vector allocator and could lead to fragmentation of the
interrupt vectors space.

This patch overcomes both drawbacks in presense of IRQ remapping and
lets devices take advantage of multiple queues and per-IRQ affinity
assignments.

Signed-off-by: Alexander Gordeev <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 174 +++++++++++++++++++++++++++++++++------
include/linux/irq.h | 6 ++
kernel/irq/chip.c | 30 +++++--
kernel/irq/irqdesc.c | 31 +++++++
4 files changed, 206 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c265593..d5cb13c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -305,6 +305,11 @@ static int alloc_irq_from(unsigned int from, int node)
return irq_alloc_desc_from(from, node);
}

+static int alloc_irqs_from(unsigned int from, unsigned int count, int node)
+{
+ return irq_alloc_descs_from(from, count, node);
+}
+
static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
{
free_irq_cfg(at, cfg);
@@ -2991,37 +2996,58 @@ device_initcall(ioapic_init_ops);
/*
* Dynamic irq allocate and deallocation
*/
-unsigned int create_irq_nr(unsigned int from, int node)
+unsigned int __create_irqs(unsigned int from, unsigned int count, int node)
{
- struct irq_cfg *cfg;
+ struct irq_cfg **cfg;
unsigned long flags;
- unsigned int ret = 0;
- int irq;
+ int irq, i;

if (from < nr_irqs_gsi)
from = nr_irqs_gsi;

- irq = alloc_irq_from(from, node);
- if (irq < 0)
- return 0;
- cfg = alloc_irq_cfg(irq, node);
- if (!cfg) {
- free_irq_at(irq, NULL);
+ cfg = kzalloc_node(count * sizeof(cfg[0]), GFP_KERNEL, node);
+ if (!cfg)
return 0;
+
+ irq = alloc_irqs_from(from, count, node);
+ if (irq < 0)
+ goto out_cfgs;
+
+ for (i = 0; i < count; i++) {
+ cfg[i] = alloc_irq_cfg(irq + i, node);
+ if (!cfg[i])
+ goto out_irqs;
}

raw_spin_lock_irqsave(&vector_lock, flags);
- if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
- ret = irq;
+ for (i = 0; i < count; i++)
+ if (__assign_irq_vector(irq + i, cfg[i], apic->target_cpus()))
+ goto out_vecs;
raw_spin_unlock_irqrestore(&vector_lock, flags);

- if (ret) {
- irq_set_chip_data(irq, cfg);
- irq_clear_status_flags(irq, IRQ_NOREQUEST);
- } else {
- free_irq_at(irq, cfg);
+ for (i = 0; i < count; i++) {
+ irq_set_chip_data(irq + i, cfg[i]);
+ irq_clear_status_flags(irq + i, IRQ_NOREQUEST);
}
- return ret;
+
+ kfree(cfg);
+ return irq;
+
+out_vecs:
+ for (; i; i--)
+ __clear_irq_vector(irq + i - 1, cfg[i - 1]);
+ raw_spin_unlock_irqrestore(&vector_lock, flags);
+out_irqs:
+ for (i = 0; i < count; i++)
+ free_irq_at(irq + i, cfg[i]);
+out_cfgs:
+ kfree(cfg);
+ return 0;
+}
+
+unsigned int create_irq_nr(unsigned int from, int node)
+{
+ return __create_irqs(from, 1, node);
}

int create_irq(void)
@@ -3054,6 +3080,27 @@ void destroy_irq(unsigned int irq)
free_irq_at(irq, cfg);
}

+static inline void destroy_irqs(unsigned int irq, unsigned int count)
+{
+ unsigned int i;
+ for (i = 0; i < count; i++)
+ destroy_irq(irq + i);
+}
+
+static inline int
+can_create_pow_of_two_irqs(unsigned int from, unsigned int count)
+{
+ if ((count > 1) && (count % 2))
+ return -EINVAL;
+
+ for (; count; count = count / 2) {
+ if (!irq_can_alloc_irqs(from, count))
+ return count;
+ }
+
+ return -ENOSPC;
+}
+
/*
* MSI message composition
*/
@@ -3145,18 +3192,25 @@ static struct irq_chip msi_chip = {
.irq_retrigger = ioapic_retrigger_irq,
};

-static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
+static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
+ unsigned int irq_base, unsigned int irq_offset)
{
struct irq_chip *chip = &msi_chip;
struct msi_msg msg;
+ unsigned int irq = irq_base + irq_offset;
int ret;

ret = msi_compose_msg(dev, irq, &msg, -1);
if (ret < 0)
return ret;

- irq_set_msi_desc(irq, msidesc);
- write_msi_msg(irq, &msg);
+ irq_set_msi_desc_off(irq_base, irq_offset, msidesc);
+
+ /* MSI-X message is written per-IRQ, the offset is always 0.
+ * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
+ */
+ if (!irq_offset)
+ write_msi_msg(irq, &msg);

if (irq_remapped(irq_get_chip_data(irq))) {
irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
@@ -3170,16 +3224,12 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
return 0;
}

-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int setup_msix_irqs(struct pci_dev *dev, int nvec)
{
int node, ret, sub_handle, index = 0;
unsigned int irq, irq_want;
struct msi_desc *msidesc;

- /* x86 doesn't support multiple MSI yet */
- if (type == PCI_CAP_ID_MSI && nvec > 1)
- return 1;
-
node = dev_to_node(&dev->dev);
irq_want = nr_irqs_gsi;
sub_handle = 0;
@@ -3208,7 +3258,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
goto error;
}
no_ir:
- ret = setup_msi_irq(dev, msidesc, irq);
+ ret = setup_msi_irq(dev, msidesc, irq, 0);
if (ret < 0)
goto error;
sub_handle++;
@@ -3220,6 +3270,76 @@ error:
return ret;
}

+int setup_msi_irqs(struct pci_dev *dev, int nvec)
+{
+ int node, ret, sub_handle, index = 0;
+ unsigned int irq;
+ struct msi_desc *msidesc;
+
+ if (nvec > 1 && !irq_remapping_enabled)
+ return 1;
+
+ nvec = __roundup_pow_of_two(nvec);
+ ret = can_create_pow_of_two_irqs(nr_irqs_gsi, nvec);
+ if (ret != nvec)
+ return ret;
+
+ WARN_ON(!list_is_singular(&dev->msi_list));
+ msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
+ WARN_ON(msidesc->irq);
+ WARN_ON(msidesc->msi_attrib.multiple);
+
+ node = dev_to_node(&dev->dev);
+ irq = __create_irqs(nr_irqs_gsi, nvec, node);
+ if (irq == 0)
+ return -ENOSPC;
+
+ if (!irq_remapping_enabled) {
+ ret = setup_msi_irq(dev, msidesc, irq, 0);
+ if (ret < 0)
+ goto error;
+ return 0;
+ }
+
+ msidesc->msi_attrib.multiple = ilog2(nvec);
+ for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
+ if (!sub_handle) {
+ index = msi_alloc_remapped_irq(dev, irq, nvec);
+ if (index < 0) {
+ ret = index;
+ goto error;
+ }
+ } else {
+ ret = msi_setup_remapped_irq(dev, irq + sub_handle,
+ index, sub_handle);
+ if (ret < 0)
+ goto error;
+ }
+ ret = setup_msi_irq(dev, msidesc, irq, sub_handle);
+ if (ret < 0)
+ goto error;
+ }
+ return 0;
+
+error:
+ destroy_irqs(irq, nvec);
+
+ /* Restore altered MSI descriptor fields and prevent just destroyed
+ * IRQs from tearing down again in default_teardown_msi_irqs()
+ */
+ msidesc->irq = 0;
+ msidesc->msi_attrib.multiple = 0;
+
+ return ret;
+}
+
+int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+ if (type == PCI_CAP_ID_MSI)
+ return setup_msi_irqs(dev, nvec);
+ return setup_msix_irqs(dev, nvec);
+}
+
void native_teardown_msi_irq(unsigned int irq)
{
destroy_irq(irq);
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 216b0ba..c3ba39f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -522,6 +522,8 @@ extern int irq_set_handler_data(unsigned int irq, void *data);
extern int irq_set_chip_data(unsigned int irq, void *data);
extern int irq_set_irq_type(unsigned int irq, unsigned int type);
extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
+extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
+ struct msi_desc *entry);
extern struct irq_data *irq_get_irq_data(unsigned int irq);

static inline struct irq_chip *irq_get_chip(unsigned int irq)
@@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
#define irq_alloc_desc_from(from, node) \
irq_alloc_descs(-1, from, 1, node)

+#define irq_alloc_descs_from(from, cnt, node) \
+ irq_alloc_descs(-1, from, cnt, node)
+
void irq_free_descs(unsigned int irq, unsigned int cnt);
int irq_reserve_irqs(unsigned int from, unsigned int cnt);
+int irq_can_alloc_irqs(unsigned int from, unsigned int cnt);

static inline void irq_free_desc(unsigned int irq)
{
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 57d86d0..2230389 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -90,27 +90,41 @@ int irq_set_handler_data(unsigned int irq, void *data)
EXPORT_SYMBOL(irq_set_handler_data);

/**
- * irq_set_msi_desc - set MSI descriptor data for an irq
- * @irq: Interrupt number
- * @entry: Pointer to MSI descriptor data
+ * irq_set_msi_desc_off - set MSI descriptor data for an irq at offset
+ * @irq_base: Interrupt number base
+ * @irq_offset: Interrupt number offset
+ * @entry: Pointer to MSI descriptor data
*
- * Set the MSI descriptor entry for an irq
+ * Set the MSI descriptor entry for an irq at offset
*/
-int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
+int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
+ struct msi_desc *entry)
{
unsigned long flags;
- struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
+ struct irq_desc *desc = irq_get_desc_lock(irq_base + irq_offset, &flags, IRQ_GET_DESC_CHECK_GLOBAL);

if (!desc)
return -EINVAL;
desc->irq_data.msi_desc = entry;
- if (entry)
- entry->irq = irq;
+ if (entry && !irq_offset)
+ entry->irq = irq_base;
irq_put_desc_unlock(desc, flags);
return 0;
}

/**
+ * irq_set_msi_desc - set MSI descriptor data for an irq
+ * @irq: Interrupt number
+ * @entry: Pointer to MSI descriptor data
+ *
+ * Set the MSI descriptor entry for an irq
+ */
+int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
+{
+ return irq_set_msi_desc_off(irq, 0, entry);
+}
+
+/**
* irq_set_chip_data - set irq chip data for an irq
* @irq: Interrupt number
* @data: Pointer to chip specific data
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..8287b78 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -210,6 +210,13 @@ static int irq_expand_nr_irqs(unsigned int nr)
return 0;
}

+static int irq_can_expand_nr_irqs(unsigned int nr)
+{
+ if (nr > IRQ_BITMAP_BITS)
+ return -ENOMEM;
+ return 0;
+}
+
int __init early_irq_init(void)
{
int i, initcnt, node = first_online_node;
@@ -414,6 +421,30 @@ int irq_reserve_irqs(unsigned int from, unsigned int cnt)
}

/**
+ * irq_can_alloc_irqs - checks if a range of irqs could be allocated
+ * @from: check from irq number
+ * @cnt: number of irqs to check
+ *
+ * Returns 0 on success or an appropriate error code
+ */
+int irq_can_alloc_irqs(unsigned int from, unsigned int cnt)
+{
+ unsigned int start;
+ int ret = 0;
+
+ if (!cnt)
+ return -EINVAL;
+
+ mutex_lock(&sparse_irq_lock);
+ start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
+ from, cnt, 0);
+ mutex_unlock(&sparse_irq_lock);
+ if (start + cnt > nr_irqs)
+ ret = irq_can_expand_nr_irqs(start + cnt);
+ return ret;
+}
+
+/**
* irq_get_next_irq - get next allocated irq number
* @offset: where to start the search
*
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2012-10-01 08:10:47

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested

When multiple MSIs are enabled with pci_enable_msi_block() the number of
allocated IRQs 'nvec' is rounded up to the nearest value of power of two.
That could lead to a condition when number of requested and used IRQs is
less than number of actually allocated IRQs.

This fix introduces 'msi_desc::nvec' field to address the above issue -
when non-zero, it holds the number of allocated IRQs. Otherwise, the old
method is used.

Signed-off-by: Alexander Gordeev <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 16 +++++++---------
drivers/pci/msi.c | 10 ++++++++--
include/linux/msi.h | 1 +
3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index d5cb13c..84d632b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3088,16 +3088,12 @@ static inline void destroy_irqs(unsigned int irq, unsigned int count)
}

static inline int
-can_create_pow_of_two_irqs(unsigned int from, unsigned int count)
+can_create_irqs(unsigned int from, unsigned int count)
{
- if ((count > 1) && (count % 2))
- return -EINVAL;
-
- for (; count; count = count / 2) {
+ for (; count; count = count - 1) {
if (!irq_can_alloc_irqs(from, count))
return count;
}
-
return -ENOSPC;
}

@@ -3279,8 +3275,7 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
if (nvec > 1 && !irq_remapping_enabled)
return 1;

- nvec = __roundup_pow_of_two(nvec);
- ret = can_create_pow_of_two_irqs(nr_irqs_gsi, nvec);
+ ret = can_create_irqs(nr_irqs_gsi, nvec);
if (ret != nvec)
return ret;

@@ -3288,11 +3283,13 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);

node = dev_to_node(&dev->dev);
irq = __create_irqs(nr_irqs_gsi, nvec, node);
if (irq == 0)
return -ENOSPC;
+ msidesc->nvec = nvec;

if (!irq_remapping_enabled) {
ret = setup_msi_irq(dev, msidesc, irq, 0);
@@ -3301,7 +3298,7 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
return 0;
}

- msidesc->msi_attrib.multiple = ilog2(nvec);
+ msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
index = msi_alloc_remapped_irq(dev, irq, nvec);
@@ -3329,6 +3326,7 @@ error:
*/
msidesc->irq = 0;
msidesc->msi_attrib.multiple = 0;
+ msidesc->nvec = 0;

return ret;
}
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a825d78..f0752d1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (entry->irq == 0)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
for (i = 0; i < nvec; i++)
arch_teardown_msi_irq(entry->irq + i);
}
@@ -336,7 +339,10 @@ static void free_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (!entry->irq)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
for (i = 0; i < nvec; i++)
BUG_ON(irq_has_action(entry->irq + i));
}
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ce93a34..6f4dfba 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -35,6 +35,7 @@ struct msi_desc {

u32 masked; /* mask bits */
unsigned int irq;
+ unsigned int nvec;
struct list_head list;

union {
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2012-10-01 08:12:58

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto()

The new function pci_enable_msi_block_auto() tries to allocate maximum
possible number of MSIs up to the number the device supports. It
generalizes a pattern when pci_enable_msi_block() is contiguously called
until it succeeds or fails.

Opposite to pci_enable_msi_block() which takes the number of MSIs to
allocate as a input parameter, pci_enable_msi_block_auto() could be used
by device drivers to obtain the number of assigned MSIs and the number
of MSIs the device supports.

Signed-off-by: Alexander Gordeev <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
Documentation/PCI/MSI-HOWTO.txt | 37 ++++++++++++++++++++++++++++++++-----
drivers/pci/msi.c | 26 ++++++++++++++++++++++++++
include/linux/pci.h | 7 +++++++
3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 53e6fca..a091780 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -127,15 +127,42 @@ on the number of vectors that can be allocated; pci_enable_msi_block()
returns as soon as it finds any constraint that doesn't allow the
call to succeed.

-4.2.3 pci_disable_msi
+4.2.3 pci_enable_msi_block_auto
+
+int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *count)
+
+This variation on pci_enable_msi() call allows a device driver to request
+the maximum possible number of MSIs. The MSI specification only allows
+interrupts to be allocated in powers of two, up to a maximum of 2^5 (32).
+
+If this function returns a positive number, it indicates that it has
+succeeded and the returned value is the number of allocated interrupts. In
+this case, the function enables MSI on this device and updates dev->irq to
+be the lowest of the new interrupts assigned to it. The other interrupts
+assigned to the device are in the range dev->irq to dev->irq + returned
+value - 1.
+
+If this function returns a negative number, it indicates an error and
+the driver should not attempt to request any more MSI interrupts for
+this device.
+
+If the device driver needs to know the number of interrupts the device
+supports it can pass the pointer count where that number is stored. The
+device driver must decide what action to take if pci_enable_msi_block_auto()
+succeeds, but returns a value less than the number of interrupts supported.
+If the device driver does not need to know the number of interrupts
+supported, it can set the pointer count to NULL.
+
+4.2.4 pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This function should be used to undo the effect of pci_enable_msi() or
-pci_enable_msi_block(). Calling it restores dev->irq to the pin-based
-interrupt number and frees the previously allocated message signaled
-interrupt(s). The interrupt may subsequently be assigned to another
-device, so drivers should not cache the value of dev->irq.
+pci_enable_msi_block() or pci_enable_msi_block_auto(). Calling it restores
+dev->irq to the pin-based interrupt number and frees the previously
+allocated message signaled interrupt(s). The interrupt may subsequently be
+assigned to another device, so drivers should not cache the value of
+dev->irq.

Before calling this function, a device driver must always call free_irq()
on any interrupt for which it previously called request_irq().
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f0752d1..690b268 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -845,6 +845,32 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
}
EXPORT_SYMBOL(pci_enable_msi_block);

+int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
+{
+ int ret, pos, nvec;
+ u16 msgctl;
+
+ pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+ if (!pos)
+ return -EINVAL;
+
+ pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+ ret = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
+
+ if (maxvec)
+ *maxvec = ret;
+
+ do {
+ nvec = ret;
+ ret = pci_enable_msi_block(dev, nvec);
+ } while (ret > 0);
+
+ if (ret < 0)
+ return ret;
+ return nvec;
+}
+EXPORT_SYMBOL(pci_enable_msi_block_auto);
+
void pci_msi_shutdown(struct pci_dev *dev)
{
struct msi_desc *desc;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5faa831..b8a9454 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1070,6 +1070,12 @@ static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
return -1;
}

+static inline int
+pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
+{
+ return -1;
+}
+
static inline void pci_msi_shutdown(struct pci_dev *dev)
{ }
static inline void pci_disable_msi(struct pci_dev *dev)
@@ -1101,6 +1107,7 @@ static inline int pci_msi_enabled(void)
}
#else
extern int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
+extern int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
extern void pci_msi_shutdown(struct pci_dev *dev);
extern void pci_disable_msi(struct pci_dev *dev);
extern int pci_msix_table_size(struct pci_dev *dev);
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2012-10-01 08:13:43

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

Take advantage of multiple MSIs implementation on x86 - on systems with
IRQ remapping AHCI ports not only get assigned separate MSI vectors -
but also separate IRQs. As result, interrupts generated by different
ports could be serviced on different CPUs rather than on a single one.

In cases when number of allocated MSIs is less than requested the Sharing
Last MSI mode does not get used, no matter implemented in hardware or not.
Instead, the driver assumes the advantage of multiple MSIs is negated and
falls back to the single MSI mode as if MRSM bit was set (some Intel chips
implement this strategy anyway - MRSM bit gets set even if the number of
allocated MSIs exceeds the number of implemented ports).

Signed-off-by: Alexander Gordeev <[email protected]>
---
drivers/ata/ahci.c | 91 ++++++++++++++++++++++++++++++++++++--
drivers/ata/ahci.h | 6 +++
drivers/ata/libahci.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 205 insertions(+), 10 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 7862d17..5463fcea 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1057,6 +1057,84 @@ static inline void ahci_gtf_filter_workaround(struct ata_host *host)
{}
#endif

+int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
+{
+ int rc;
+ unsigned int maxvec;
+
+ if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
+ rc = pci_enable_msi_block_auto(pdev, &maxvec);
+ if (rc > 0) {
+ if ((rc == maxvec) || (rc == 1))
+ return rc;
+ /* assume that advantage of multipe MSIs is negated,
+ * so fallback to single MSI mode to save resources */
+ pci_disable_msi(pdev);
+ if (!pci_enable_msi(pdev))
+ return 1;
+ }
+ }
+
+ pci_intx(pdev, 1);
+ return 0;
+}
+
+/**
+ * ahci_host_activate - start AHCI host, request IRQs and register it
+ * @host: target ATA host
+ * @irq: base IRQ number to request
+ * @n_msis: number of MSIs allocated for this host
+ * @irq_handler: irq_handler used when requesting IRQs
+ * @irq_flags: irq_flags used when requesting IRQs
+ *
+ * Similar to ata_host_activate, but requests IRQs according to AHCI-1.1
+ * when multiple MSIs were allocated. That is one MSI per port, starting
+ * from @irq.
+ *
+ * LOCKING:
+ * Inherited from calling layer (may sleep).
+ *
+ * RETURNS:
+ * 0 on success, -errno otherwise.
+ */
+int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis)
+{
+ int i, rc;
+
+ /* Sharing Last Message among several ports is not supported */
+ if (n_msis < host->n_ports)
+ return -EINVAL;
+
+ rc = ata_host_start(host);
+ if (rc)
+ return rc;
+
+ for (i = 0; i < host->n_ports; i++) {
+ rc = devm_request_threaded_irq(host->dev,
+ irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED,
+ dev_driver_string(host->dev), host->ports[i]);
+ if (rc)
+ goto out_free_irqs;
+ }
+
+ for (i = 0; i < host->n_ports; i++)
+ ata_port_desc(host->ports[i], "irq %d", irq + i);
+
+ rc = ata_host_register(host, &ahci_sht);
+ if (rc)
+ goto out_free_all_irqs;
+
+ return 0;
+
+out_free_all_irqs:
+ i = host->n_ports;
+out_free_irqs:
+ for (; i; i--)
+ devm_free_irq(host->dev, irq + i - 1, host->ports[i]);
+
+ return rc;
+}
+
static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
unsigned int board_id = ent->driver_data;
@@ -1065,7 +1143,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
struct device *dev = &pdev->dev;
struct ahci_host_priv *hpriv;
struct ata_host *host;
- int n_ports, i, rc;
+ int n_ports, n_msis, i, rc;
int ahci_pci_bar = AHCI_PCI_BAR_STANDARD;

VPRINTK("ENTER\n");
@@ -1150,11 +1228,12 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (ahci_sb600_enable_64bit(pdev))
hpriv->flags &= ~AHCI_HFLAG_32BIT_ONLY;

- if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
- pci_intx(pdev, 1);
-
hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];

+ n_msis = ahci_init_interrupts(pdev, hpriv);
+ if (n_msis > 1)
+ hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
+
/* save initial config */
ahci_pci_save_initial_config(pdev, hpriv);

@@ -1250,6 +1329,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
ahci_pci_print_info(host);

pci_set_master(pdev);
+
+ if (hpriv->flags & AHCI_HFLAG_MULTI_MSI)
+ return ahci_host_activate(host, pdev->irq, n_msis);
+
return ata_host_activate(host, pdev->irq, ahci_interrupt, IRQF_SHARED,
&ahci_sht);
}
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 57eb1c2..24251e8 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -216,6 +216,7 @@ enum {
AHCI_HFLAG_DELAY_ENGINE = (1 << 15), /* do not start engine on
port start (wait until
error-handling stage) */
+ AHCI_HFLAG_MULTI_MSI = (1 << 16), /* multiple PCI MSIs */

/* ap->flags bits */

@@ -282,6 +283,8 @@ struct ahci_port_priv {
unsigned int ncq_saw_d2h:1;
unsigned int ncq_saw_dmas:1;
unsigned int ncq_saw_sdb:1;
+ u32 intr_status; /* interrupts to handle */
+ spinlock_t lock;
u32 intr_mask; /* interrupts to enable */
bool fbs_supported; /* set iff FBS is supported */
bool fbs_enabled; /* set iff FBS is enabled */
@@ -343,7 +346,10 @@ void ahci_set_em_messages(struct ahci_host_priv *hpriv,
struct ata_port_info *pi);
int ahci_reset_em(struct ata_host *host);
irqreturn_t ahci_interrupt(int irq, void *dev_instance);
+irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance);
+irqreturn_t ahci_thread_fn(int irq, void *dev_instance);
void ahci_print_info(struct ata_host *host, const char *scc_s);
+int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis);

static inline void __iomem *__ahci_port_base(struct ata_host *host,
unsigned int port_no)
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 555c07a..3b54e84 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1639,19 +1639,16 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
ata_port_abort(ap);
}

-static void ahci_port_intr(struct ata_port *ap)
+static void ahci_handle_port_interrupt(struct ata_port *ap,
+ void __iomem *port_mmio, u32 status)
{
- void __iomem *port_mmio = ahci_port_base(ap);
struct ata_eh_info *ehi = &ap->link.eh_info;
struct ahci_port_priv *pp = ap->private_data;
struct ahci_host_priv *hpriv = ap->host->private_data;
int resetting = !!(ap->pflags & ATA_PFLAG_RESETTING);
- u32 status, qc_active = 0;
+ u32 qc_active = 0;
int rc;

- status = readl(port_mmio + PORT_IRQ_STAT);
- writel(status, port_mmio + PORT_IRQ_STAT);
-
/* ignore BAD_PMP while resetting */
if (unlikely(resetting))
status &= ~PORT_IRQ_BAD_PMP;
@@ -1727,6 +1724,107 @@ static void ahci_port_intr(struct ata_port *ap)
}
}

+void ahci_port_intr(struct ata_port *ap)
+{
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 status;
+
+ status = readl(port_mmio + PORT_IRQ_STAT);
+ writel(status, port_mmio + PORT_IRQ_STAT);
+
+ ahci_handle_port_interrupt(ap, port_mmio, status);
+}
+
+irqreturn_t ahci_thread_fn(int irq, void *dev_instance)
+{
+ struct ata_port *ap = dev_instance;
+ struct ahci_port_priv *pp = ap->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ unsigned long flags;
+ u32 status;
+
+ spin_lock_irqsave(&ap->host->lock, flags);
+ status = pp->intr_status;
+ if (status)
+ pp->intr_status = 0;
+ spin_unlock_irqrestore(&ap->host->lock, flags);
+
+ spin_lock_bh(ap->lock);
+ ahci_handle_port_interrupt(ap, port_mmio, status);
+ spin_unlock_bh(ap->lock);
+
+ return IRQ_HANDLED;
+}
+EXPORT_SYMBOL_GPL(ahci_thread_fn);
+
+void ahci_hw_port_interrupt(struct ata_port *ap)
+{
+ void __iomem *port_mmio = ahci_port_base(ap);
+ struct ahci_port_priv *pp = ap->private_data;
+ u32 status;
+
+ status = readl(port_mmio + PORT_IRQ_STAT);
+ writel(status, port_mmio + PORT_IRQ_STAT);
+
+ pp->intr_status |= status;
+}
+
+irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance)
+{
+ struct ata_port *ap_this = dev_instance;
+ struct ahci_port_priv *pp = ap_this->private_data;
+ struct ata_host *host = ap_this->host;
+ struct ahci_host_priv *hpriv = host->private_data;
+ void __iomem *mmio = hpriv->mmio;
+ unsigned int i;
+ u32 irq_stat, irq_masked;
+
+ VPRINTK("ENTER\n");
+
+ spin_lock(&host->lock);
+
+ irq_stat = readl(mmio + HOST_IRQ_STAT);
+
+ if (!irq_stat) {
+ u32 status = pp->intr_status;
+
+ spin_unlock(&host->lock);
+
+ VPRINTK("EXIT\n");
+
+ return status ? IRQ_WAKE_THREAD : IRQ_NONE;
+ }
+
+ irq_masked = irq_stat & hpriv->port_map;
+
+ for (i = 0; i < host->n_ports; i++) {
+ struct ata_port *ap;
+
+ if (!(irq_masked & (1 << i)))
+ continue;
+
+ ap = host->ports[i];
+ if (ap) {
+ ahci_hw_port_interrupt(ap);
+ VPRINTK("port %u\n", i);
+ } else {
+ VPRINTK("port %u (no irq)\n", i);
+ if (ata_ratelimit())
+ dev_warn(host->dev,
+ "interrupt on disabled port %u\n", i);
+ }
+ }
+
+ writel(irq_stat, mmio + HOST_IRQ_STAT);
+
+ spin_unlock(&host->lock);
+
+ VPRINTK("EXIT\n");
+
+ return IRQ_WAKE_THREAD;
+}
+EXPORT_SYMBOL_GPL(ahci_hw_interrupt);
+
irqreturn_t ahci_interrupt(int irq, void *dev_instance)
{
struct ata_host *host = dev_instance;
@@ -2105,6 +2203,14 @@ static int ahci_port_start(struct ata_port *ap)
*/
pp->intr_mask = DEF_PORT_IRQ;

+ /*
+ * Switch to per-port locking in case each port has its own MSI vector.
+ */
+ if ((hpriv->flags & AHCI_HFLAG_MULTI_MSI)) {
+ spin_lock_init(&pp->lock);
+ ap->lock = &pp->lock;
+ }
+
ap->private_data = pp;

/* engage engines, captain */
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2012-10-01 08:54:43

by Alexander Gordeev

[permalink] [raw]
Subject: [PATCH v3 -tip 3/5] x86, MSI: Minor readability fixes

Signed-off-by: Alexander Gordeev <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 84d632b..0489699 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3123,7 +3123,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,

if (irq_remapped(cfg)) {
compose_remapped_msi_msg(pdev, irq, dest, msg, hpet_id);
- return err;
+ return 0;
}

if (x2apic_enabled())
@@ -3150,7 +3150,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
MSI_DATA_DELIVERY_LOWPRI) |
MSI_DATA_VECTOR(cfg->vector);

- return err;
+ return 0;
}

static int
@@ -3232,7 +3232,7 @@ int setup_msix_irqs(struct pci_dev *dev, int nvec)
list_for_each_entry(msidesc, &dev->msi_list, list) {
irq = create_irq_nr(irq_want, node);
if (irq == 0)
- return -1;
+ return -ENOSPC;
irq_want = irq + 1;
if (!irq_remapping_enabled)
goto no_ir;
--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]

2012-10-02 03:04:37

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On 10/01/2012 04:13 AM, Alexander Gordeev wrote:
> Take advantage of multiple MSIs implementation on x86 - on systems with
> IRQ remapping AHCI ports not only get assigned separate MSI vectors -
> but also separate IRQs. As result, interrupts generated by different
> ports could be serviced on different CPUs rather than on a single one.
>
> In cases when number of allocated MSIs is less than requested the Sharing
> Last MSI mode does not get used, no matter implemented in hardware or not.
> Instead, the driver assumes the advantage of multiple MSIs is negated and
> falls back to the single MSI mode as if MRSM bit was set (some Intel chips
> implement this strategy anyway - MRSM bit gets set even if the number of
> allocated MSIs exceeds the number of implemented ports).
>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> drivers/ata/ahci.c | 91 ++++++++++++++++++++++++++++++++++++--
> drivers/ata/ahci.h | 6 +++
> drivers/ata/libahci.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 205 insertions(+), 10 deletions(-)

Acked-by: Jeff Garzik <[email protected]>

Normally, this amount of changes would -really- need to go through the
libata tree. However, given the amount of dependencies, it either needs
a merge tree or to go through the PCI tree...?

Any maintainer comments on disposition?

Jeff



2012-10-02 04:22:02

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On Mon, Oct 1, 2012 at 9:04 PM, Jeff Garzik <[email protected]> wrote:
> On 10/01/2012 04:13 AM, Alexander Gordeev wrote:
>>
>> Take advantage of multiple MSIs implementation on x86 - on systems with
>> IRQ remapping AHCI ports not only get assigned separate MSI vectors -
>> but also separate IRQs. As result, interrupts generated by different
>> ports could be serviced on different CPUs rather than on a single one.
>>
>> In cases when number of allocated MSIs is less than requested the Sharing
>> Last MSI mode does not get used, no matter implemented in hardware or not.
>> Instead, the driver assumes the advantage of multiple MSIs is negated and
>> falls back to the single MSI mode as if MRSM bit was set (some Intel chips
>> implement this strategy anyway - MRSM bit gets set even if the number of
>> allocated MSIs exceeds the number of implemented ports).
>>
>> Signed-off-by: Alexander Gordeev <[email protected]>
>> ---
>> drivers/ata/ahci.c | 91 ++++++++++++++++++++++++++++++++++++--
>> drivers/ata/ahci.h | 6 +++
>> drivers/ata/libahci.c | 118
>> ++++++++++++++++++++++++++++++++++++++++++++++---
>> 3 files changed, 205 insertions(+), 10 deletions(-)
>
>
> Acked-by: Jeff Garzik <[email protected]>
>
> Normally, this amount of changes would -really- need to go through the
> libata tree. However, given the amount of dependencies, it either needs a
> merge tree or to go through the PCI tree...?
>
> Any maintainer comments on disposition?

For what it's worth, the bulk of this change is outside PCI, so it
doesn't seem to me like it should go through the PCI tree. I think I
did ack the part that touched PCI, and there's not much activity in
the PCI MSI area right now, so I'm fine with it going through libata
or whatever people think makes sense.

2012-10-02 04:49:08

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On 10/02/2012 12:21 AM, Bjorn Helgaas wrote:
> On Mon, Oct 1, 2012 at 9:04 PM, Jeff Garzik <[email protected]> wrote:
>> On 10/01/2012 04:13 AM, Alexander Gordeev wrote:
>>>
>>> Take advantage of multiple MSIs implementation on x86 - on systems with
>>> IRQ remapping AHCI ports not only get assigned separate MSI vectors -
>>> but also separate IRQs. As result, interrupts generated by different
>>> ports could be serviced on different CPUs rather than on a single one.
>>>
>>> In cases when number of allocated MSIs is less than requested the Sharing
>>> Last MSI mode does not get used, no matter implemented in hardware or not.
>>> Instead, the driver assumes the advantage of multiple MSIs is negated and
>>> falls back to the single MSI mode as if MRSM bit was set (some Intel chips
>>> implement this strategy anyway - MRSM bit gets set even if the number of
>>> allocated MSIs exceeds the number of implemented ports).
>>>
>>> Signed-off-by: Alexander Gordeev <[email protected]>
>>> ---
>>> drivers/ata/ahci.c | 91 ++++++++++++++++++++++++++++++++++++--
>>> drivers/ata/ahci.h | 6 +++
>>> drivers/ata/libahci.c | 118
>>> ++++++++++++++++++++++++++++++++++++++++++++++---
>>> 3 files changed, 205 insertions(+), 10 deletions(-)
>>
>>
>> Acked-by: Jeff Garzik <[email protected]>
>>
>> Normally, this amount of changes would -really- need to go through the
>> libata tree. However, given the amount of dependencies, it either needs a
>> merge tree or to go through the PCI tree...?
>>
>> Any maintainer comments on disposition?
>
> For what it's worth, the bulk of this change is outside PCI, so it
> doesn't seem to me like it should go through the PCI tree. I think I
> did ack the part that touched PCI, and there's not much activity in
> the PCI MSI area right now, so I'm fine with it going through libata
> or whatever people think makes sense.

That works for me, too. I'm ready to queue it, if libata tree is fine
with people.

Jeff



2012-10-02 04:55:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping


* Alexander Gordeev <[email protected]> wrote:

> The MSI specification has several constraints in comparison with MSI-X,
> most notable of them is the inability to configure MSIs independently.
> As a result, it is impossible to dispatch interrupts from different
> queues to different CPUs. This is largely devalues the support of
> multiple MSIs in SMP systems.
>
> Also, a necessity to allocate a contiguous block of vector numbers for
> devices capable of multiple MSIs might cause a considerable pressure on
> x86 interrupt vector allocator and could lead to fragmentation of the
> interrupt vectors space.
>
> This patch overcomes both drawbacks in presense of IRQ remapping and
> lets devices take advantage of multiple queues and per-IRQ affinity
> assignments.
>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 174 +++++++++++++++++++++++++++++++++------
> include/linux/irq.h | 6 ++
> kernel/irq/chip.c | 30 +++++--
> kernel/irq/irqdesc.c | 31 +++++++
> 4 files changed, 206 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index c265593..d5cb13c 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -305,6 +305,11 @@ static int alloc_irq_from(unsigned int from, int node)
> return irq_alloc_desc_from(from, node);
> }
>
> +static int alloc_irqs_from(unsigned int from, unsigned int count, int node)
> +{
> + return irq_alloc_descs_from(from, count, node);
> +}
> +
> static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
> {
> free_irq_cfg(at, cfg);
> @@ -2991,37 +2996,58 @@ device_initcall(ioapic_init_ops);
> /*
> * Dynamic irq allocate and deallocation
> */
> -unsigned int create_irq_nr(unsigned int from, int node)
> +unsigned int __create_irqs(unsigned int from, unsigned int count, int node)
> {
> - struct irq_cfg *cfg;
> + struct irq_cfg **cfg;
> unsigned long flags;
> - unsigned int ret = 0;
> - int irq;
> + int irq, i;
>
> if (from < nr_irqs_gsi)
> from = nr_irqs_gsi;
>
> - irq = alloc_irq_from(from, node);
> - if (irq < 0)
> - return 0;
> - cfg = alloc_irq_cfg(irq, node);
> - if (!cfg) {
> - free_irq_at(irq, NULL);
> + cfg = kzalloc_node(count * sizeof(cfg[0]), GFP_KERNEL, node);
> + if (!cfg)
> return 0;
> +
> + irq = alloc_irqs_from(from, count, node);
> + if (irq < 0)
> + goto out_cfgs;
> +
> + for (i = 0; i < count; i++) {
> + cfg[i] = alloc_irq_cfg(irq + i, node);
> + if (!cfg[i])
> + goto out_irqs;
> }
>
> raw_spin_lock_irqsave(&vector_lock, flags);
> - if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
> - ret = irq;
> + for (i = 0; i < count; i++)
> + if (__assign_irq_vector(irq + i, cfg[i], apic->target_cpus()))
> + goto out_vecs;
> raw_spin_unlock_irqrestore(&vector_lock, flags);
>
> - if (ret) {
> - irq_set_chip_data(irq, cfg);
> - irq_clear_status_flags(irq, IRQ_NOREQUEST);
> - } else {
> - free_irq_at(irq, cfg);
> + for (i = 0; i < count; i++) {
> + irq_set_chip_data(irq + i, cfg[i]);
> + irq_clear_status_flags(irq + i, IRQ_NOREQUEST);
> }
> - return ret;
> +
> + kfree(cfg);
> + return irq;
> +
> +out_vecs:
> + for (; i; i--)
> + __clear_irq_vector(irq + i - 1, cfg[i - 1]);
> + raw_spin_unlock_irqrestore(&vector_lock, flags);
> +out_irqs:
> + for (i = 0; i < count; i++)
> + free_irq_at(irq + i, cfg[i]);
> +out_cfgs:
> + kfree(cfg);
> + return 0;
> +}
> +
> +unsigned int create_irq_nr(unsigned int from, int node)
> +{
> + return __create_irqs(from, 1, node);
> }
>
> int create_irq(void)
> @@ -3054,6 +3080,27 @@ void destroy_irq(unsigned int irq)
> free_irq_at(irq, cfg);
> }
>
> +static inline void destroy_irqs(unsigned int irq, unsigned int count)
> +{
> + unsigned int i;
> + for (i = 0; i < count; i++)

Missing newline.

> + destroy_irq(irq + i);
> +}
> +
> +static inline int
> +can_create_pow_of_two_irqs(unsigned int from, unsigned int count)
> +{
> + if ((count > 1) && (count % 2))
> + return -EINVAL;
> +
> + for (; count; count = count / 2) {
> + if (!irq_can_alloc_irqs(from, count))
> + return count;
> + }
> +
> + return -ENOSPC;
> +}
> +
> /*
> * MSI message composition
> */
> @@ -3145,18 +3192,25 @@ static struct irq_chip msi_chip = {
> .irq_retrigger = ioapic_retrigger_irq,
> };
>
> -static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
> +static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
> + unsigned int irq_base, unsigned int irq_offset)
> {
> struct irq_chip *chip = &msi_chip;
> struct msi_msg msg;
> + unsigned int irq = irq_base + irq_offset;
> int ret;
>
> ret = msi_compose_msg(dev, irq, &msg, -1);
> if (ret < 0)
> return ret;
>
> - irq_set_msi_desc(irq, msidesc);
> - write_msi_msg(irq, &msg);
> + irq_set_msi_desc_off(irq_base, irq_offset, msidesc);
> +
> + /* MSI-X message is written per-IRQ, the offset is always 0.
> + * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
> + */

Please use the customary (multi-line) comment style:

/*
* Comment .....
* ...... goes here.
*/

specified in Documentation/CodingStyle.


> + if (!irq_offset)
> + write_msi_msg(irq, &msg);
>
> if (irq_remapped(irq_get_chip_data(irq))) {
> irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
> @@ -3170,16 +3224,12 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
> return 0;
> }
>
> -int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +int setup_msix_irqs(struct pci_dev *dev, int nvec)
> {
> int node, ret, sub_handle, index = 0;
> unsigned int irq, irq_want;
> struct msi_desc *msidesc;
>
> - /* x86 doesn't support multiple MSI yet */
> - if (type == PCI_CAP_ID_MSI && nvec > 1)
> - return 1;
> -
> node = dev_to_node(&dev->dev);
> irq_want = nr_irqs_gsi;
> sub_handle = 0;
> @@ -3208,7 +3258,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> goto error;
> }
> no_ir:
> - ret = setup_msi_irq(dev, msidesc, irq);
> + ret = setup_msi_irq(dev, msidesc, irq, 0);
> if (ret < 0)
> goto error;
> sub_handle++;
> @@ -3220,6 +3270,76 @@ error:
> return ret;
> }
>
> +int setup_msi_irqs(struct pci_dev *dev, int nvec)
> +{
> + int node, ret, sub_handle, index = 0;
> + unsigned int irq;
> + struct msi_desc *msidesc;
> +
> + if (nvec > 1 && !irq_remapping_enabled)
> + return 1;
> +
> + nvec = __roundup_pow_of_two(nvec);
> + ret = can_create_pow_of_two_irqs(nr_irqs_gsi, nvec);
> + if (ret != nvec)
> + return ret;
> +
> + WARN_ON(!list_is_singular(&dev->msi_list));
> + msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
> + WARN_ON(msidesc->irq);
> + WARN_ON(msidesc->msi_attrib.multiple);
> +
> + node = dev_to_node(&dev->dev);
> + irq = __create_irqs(nr_irqs_gsi, nvec, node);
> + if (irq == 0)
> + return -ENOSPC;
> +
> + if (!irq_remapping_enabled) {
> + ret = setup_msi_irq(dev, msidesc, irq, 0);
> + if (ret < 0)
> + goto error;
> + return 0;
> + }
> +
> + msidesc->msi_attrib.multiple = ilog2(nvec);
> + for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
> + if (!sub_handle) {
> + index = msi_alloc_remapped_irq(dev, irq, nvec);
> + if (index < 0) {
> + ret = index;
> + goto error;
> + }
> + } else {
> + ret = msi_setup_remapped_irq(dev, irq + sub_handle,
> + index, sub_handle);
> + if (ret < 0)
> + goto error;
> + }
> + ret = setup_msi_irq(dev, msidesc, irq, sub_handle);
> + if (ret < 0)
> + goto error;
> + }
> + return 0;
> +
> +error:
> + destroy_irqs(irq, nvec);
> +
> + /* Restore altered MSI descriptor fields and prevent just destroyed
> + * IRQs from tearing down again in default_teardown_msi_irqs()
> + */

Ditto.

> + msidesc->irq = 0;
> + msidesc->msi_attrib.multiple = 0;
> +
> + return ret;
> +}
> +
> +int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +{
> + if (type == PCI_CAP_ID_MSI)
> + return setup_msi_irqs(dev, nvec);
> + return setup_msix_irqs(dev, nvec);
> +}
> +
> void native_teardown_msi_irq(unsigned int irq)
> {
> destroy_irq(irq);
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 216b0ba..c3ba39f 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -522,6 +522,8 @@ extern int irq_set_handler_data(unsigned int irq, void *data);
> extern int irq_set_chip_data(unsigned int irq, void *data);
> extern int irq_set_irq_type(unsigned int irq, unsigned int type);
> extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
> +extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
> + struct msi_desc *entry);
> extern struct irq_data *irq_get_irq_data(unsigned int irq);
>
> static inline struct irq_chip *irq_get_chip(unsigned int irq)
> @@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> #define irq_alloc_desc_from(from, node) \
> irq_alloc_descs(-1, from, 1, node)
>
> +#define irq_alloc_descs_from(from, cnt, node) \
> + irq_alloc_descs(-1, from, cnt, node)
> +

Please use inlines instead of macros. Might transform the one
above it as well in the process.

> void irq_free_descs(unsigned int irq, unsigned int cnt);
> int irq_reserve_irqs(unsigned int from, unsigned int cnt);
> +int irq_can_alloc_irqs(unsigned int from, unsigned int cnt);
>
> static inline void irq_free_desc(unsigned int irq)
> {
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 57d86d0..2230389 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -90,27 +90,41 @@ int irq_set_handler_data(unsigned int irq, void *data)
> EXPORT_SYMBOL(irq_set_handler_data);
>
> /**
> - * irq_set_msi_desc - set MSI descriptor data for an irq
> - * @irq: Interrupt number
> - * @entry: Pointer to MSI descriptor data
> + * irq_set_msi_desc_off - set MSI descriptor data for an irq at offset
> + * @irq_base: Interrupt number base
> + * @irq_offset: Interrupt number offset
> + * @entry: Pointer to MSI descriptor data
> *
> - * Set the MSI descriptor entry for an irq
> + * Set the MSI descriptor entry for an irq at offset
> */
> -int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
> +int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
> + struct msi_desc *entry)
> {
> unsigned long flags;
> - struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
> + struct irq_desc *desc = irq_get_desc_lock(irq_base + irq_offset, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
>
> if (!desc)
> return -EINVAL;
> desc->irq_data.msi_desc = entry;
> - if (entry)
> - entry->irq = irq;
> + if (entry && !irq_offset)
> + entry->irq = irq_base;
> irq_put_desc_unlock(desc, flags);
> return 0;
> }
>
> /**
> + * irq_set_msi_desc - set MSI descriptor data for an irq
> + * @irq: Interrupt number
> + * @entry: Pointer to MSI descriptor data
> + *
> + * Set the MSI descriptor entry for an irq
> + */
> +int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
> +{
> + return irq_set_msi_desc_off(irq, 0, entry);
> +}
> +
> +/**
> * irq_set_chip_data - set irq chip data for an irq
> * @irq: Interrupt number
> * @data: Pointer to chip specific data
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 192a302..8287b78 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -210,6 +210,13 @@ static int irq_expand_nr_irqs(unsigned int nr)
> return 0;
> }
>
> +static int irq_can_expand_nr_irqs(unsigned int nr)
> +{
> + if (nr > IRQ_BITMAP_BITS)
> + return -ENOMEM;
> + return 0;
> +}
> +
> int __init early_irq_init(void)
> {
> int i, initcnt, node = first_online_node;
> @@ -414,6 +421,30 @@ int irq_reserve_irqs(unsigned int from, unsigned int cnt)
> }
>
> /**
> + * irq_can_alloc_irqs - checks if a range of irqs could be allocated
> + * @from: check from irq number
> + * @cnt: number of irqs to check
> + *
> + * Returns 0 on success or an appropriate error code
> + */
> +int irq_can_alloc_irqs(unsigned int from, unsigned int cnt)
> +{
> + unsigned int start;
> + int ret = 0;
> +
> + if (!cnt)
> + return -EINVAL;
> +
> + mutex_lock(&sparse_irq_lock);
> + start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
> + from, cnt, 0);
> + mutex_unlock(&sparse_irq_lock);
> + if (start + cnt > nr_irqs)
> + ret = irq_can_expand_nr_irqs(start + cnt);
> + return ret;

How is this supposed to work wrt. races?

Thanks,

Ingo

2012-10-02 04:57:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 3/5] x86, MSI: Minor readability fixes


* Alexander Gordeev <[email protected]> wrote:

> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 84d632b..0489699 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -3123,7 +3123,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
>
> if (irq_remapped(cfg)) {
> compose_remapped_msi_msg(pdev, irq, dest, msg, hpet_id);
> - return err;
> + return 0;
> }
>
> if (x2apic_enabled())
> @@ -3150,7 +3150,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
> MSI_DATA_DELIVERY_LOWPRI) |
> MSI_DATA_VECTOR(cfg->vector);
>
> - return err;
> + return 0;
> }
>
> static int
> @@ -3232,7 +3232,7 @@ int setup_msix_irqs(struct pci_dev *dev, int nvec)
> list_for_each_entry(msidesc, &dev->msi_list, list) {
> irq = create_irq_nr(irq_want, node);
> if (irq == 0)
> - return -1;
> + return -ENOSPC;
> irq_want = irq + 1;
> if (!irq_remapping_enabled)
> goto no_ir;

This should be backported into earlier patches.

Thanks,

Ingo

2012-10-02 04:58:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested


* Alexander Gordeev <[email protected]> wrote:

> When multiple MSIs are enabled with pci_enable_msi_block() the number of
> allocated IRQs 'nvec' is rounded up to the nearest value of power of two.
> That could lead to a condition when number of requested and used IRQs is
> less than number of actually allocated IRQs.
>
> This fix introduces 'msi_desc::nvec' field to address the above issue -
> when non-zero, it holds the number of allocated IRQs. Otherwise, the old
> method is used.
>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 16 +++++++---------
> drivers/pci/msi.c | 10 ++++++++--
> include/linux/msi.h | 1 +
> 3 files changed, 16 insertions(+), 11 deletions(-)

This should be switched with the first patch: first extend the
generic MSI code, then add x86 support for that variant.

Adding multi-MSI support in one patch then tweaking it in the
very next patch makes little sense and cannot possibly have been
tested much so it's a potential bisection trap.

Thanks,

Ingo

2012-10-02 05:09:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs


* Alexander Gordeev <[email protected]> wrote:

> Take advantage of multiple MSIs implementation on x86 - on systems with
> IRQ remapping AHCI ports not only get assigned separate MSI vectors -
> but also separate IRQs. As result, interrupts generated by different
> ports could be serviced on different CPUs rather than on a single one.
>
> In cases when number of allocated MSIs is less than requested the Sharing
> Last MSI mode does not get used, no matter implemented in hardware or not.
> Instead, the driver assumes the advantage of multiple MSIs is negated and
> falls back to the single MSI mode as if MRSM bit was set (some Intel chips
> implement this strategy anyway - MRSM bit gets set even if the number of
> allocated MSIs exceeds the number of implemented ports).
>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> drivers/ata/ahci.c | 91 ++++++++++++++++++++++++++++++++++++--
> drivers/ata/ahci.h | 6 +++
> drivers/ata/libahci.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 205 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 7862d17..5463fcea 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1057,6 +1057,84 @@ static inline void ahci_gtf_filter_workaround(struct ata_host *host)
> {}
> #endif
>
> +int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
> +{
> + int rc;
> + unsigned int maxvec;
> +
> + if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
> + rc = pci_enable_msi_block_auto(pdev, &maxvec);
> + if (rc > 0) {
> + if ((rc == maxvec) || (rc == 1))
> + return rc;
> + /* assume that advantage of multipe MSIs is negated,
> + * so fallback to single MSI mode to save resources */

Please use the customary (multi-line) comment style:

/*
* Comment .....
* ...... goes here.
*/

specified in Documentation/CodingStyle.

> + pci_disable_msi(pdev);
> + if (!pci_enable_msi(pdev))
> + return 1;
> + }
> + }
> +
> + pci_intx(pdev, 1);
> + return 0;
> +}
> +
> +/**
> + * ahci_host_activate - start AHCI host, request IRQs and register it
> + * @host: target ATA host
> + * @irq: base IRQ number to request
> + * @n_msis: number of MSIs allocated for this host
> + * @irq_handler: irq_handler used when requesting IRQs
> + * @irq_flags: irq_flags used when requesting IRQs
> + *
> + * Similar to ata_host_activate, but requests IRQs according to AHCI-1.1
> + * when multiple MSIs were allocated. That is one MSI per port, starting
> + * from @irq.
> + *
> + * LOCKING:
> + * Inherited from calling layer (may sleep).
> + *
> + * RETURNS:
> + * 0 on success, -errno otherwise.
> + */
> +int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis)
> +{
> + int i, rc;
> +
> + /* Sharing Last Message among several ports is not supported */
> + if (n_msis < host->n_ports)
> + return -EINVAL;
> +
> + rc = ata_host_start(host);
> + if (rc)
> + return rc;
> +
> + for (i = 0; i < host->n_ports; i++) {
> + rc = devm_request_threaded_irq(host->dev,
> + irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED,
> + dev_driver_string(host->dev), host->ports[i]);
> + if (rc)
> + goto out_free_irqs;
> + }
> +
> + for (i = 0; i < host->n_ports; i++)
> + ata_port_desc(host->ports[i], "irq %d", irq + i);
> +
> + rc = ata_host_register(host, &ahci_sht);
> + if (rc)
> + goto out_free_all_irqs;
> +
> + return 0;
> +
> +out_free_all_irqs:
> + i = host->n_ports;
> +out_free_irqs:
> + for (; i; i--)
> + devm_free_irq(host->dev, irq + i - 1, host->ports[i]);

Please test the failure path somehow - for example I'm quite
sure that the line above is buggy and will crash/hang a real
system: it should be host->ports[i-1].

Writing it as:

devm_free_irq(host->dev, irq + i-1, host->ports[i-1]);

would help readability as well as bit - or just:

for (i--; i >= 0; i--)
devm_free_irq(host->dev, irq + i, host->ports[i]);

(untested)

> +
> + return rc;
> +}
> +
> static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
> unsigned int board_id = ent->driver_data;
> @@ -1065,7 +1143,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> struct device *dev = &pdev->dev;
> struct ahci_host_priv *hpriv;
> struct ata_host *host;
> - int n_ports, i, rc;
> + int n_ports, n_msis, i, rc;
> int ahci_pci_bar = AHCI_PCI_BAR_STANDARD;
>
> VPRINTK("ENTER\n");
> @@ -1150,11 +1228,12 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> if (ahci_sb600_enable_64bit(pdev))
> hpriv->flags &= ~AHCI_HFLAG_32BIT_ONLY;
>
> - if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
> - pci_intx(pdev, 1);
> -
> hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];
>
> + n_msis = ahci_init_interrupts(pdev, hpriv);
> + if (n_msis > 1)
> + hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
> +
> /* save initial config */
> ahci_pci_save_initial_config(pdev, hpriv);
>
> @@ -1250,6 +1329,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> ahci_pci_print_info(host);
>
> pci_set_master(pdev);
> +
> + if (hpriv->flags & AHCI_HFLAG_MULTI_MSI)
> + return ahci_host_activate(host, pdev->irq, n_msis);
> +
> return ata_host_activate(host, pdev->irq, ahci_interrupt, IRQF_SHARED,
> &ahci_sht);
> }
> diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
> index 57eb1c2..24251e8 100644
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -216,6 +216,7 @@ enum {
> AHCI_HFLAG_DELAY_ENGINE = (1 << 15), /* do not start engine on
> port start (wait until
> error-handling stage) */
> + AHCI_HFLAG_MULTI_MSI = (1 << 16), /* multiple PCI MSIs */
>
> /* ap->flags bits */
>
> @@ -282,6 +283,8 @@ struct ahci_port_priv {
> unsigned int ncq_saw_d2h:1;
> unsigned int ncq_saw_dmas:1;
> unsigned int ncq_saw_sdb:1;
> + u32 intr_status; /* interrupts to handle */
> + spinlock_t lock;

In general it's nice to add a small comment that explains what
data the lock protects precisely and in what circumstances it's
used - especially as it's in the middle of a structure, making
it unclear at first sight whether it's for the whole ahci_port
descriptor or just part of it.

> u32 intr_mask; /* interrupts to enable */
> bool fbs_supported; /* set iff FBS is supported */
> bool fbs_enabled; /* set iff FBS is enabled */
> @@ -343,7 +346,10 @@ void ahci_set_em_messages(struct ahci_host_priv *hpriv,
> struct ata_port_info *pi);
> int ahci_reset_em(struct ata_host *host);
> irqreturn_t ahci_interrupt(int irq, void *dev_instance);
> +irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance);
> +irqreturn_t ahci_thread_fn(int irq, void *dev_instance);
> void ahci_print_info(struct ata_host *host, const char *scc_s);
> +int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis);
>
> static inline void __iomem *__ahci_port_base(struct ata_host *host,
> unsigned int port_no)
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 555c07a..3b54e84 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -1639,19 +1639,16 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
> ata_port_abort(ap);
> }
>
> -static void ahci_port_intr(struct ata_port *ap)
> +static void ahci_handle_port_interrupt(struct ata_port *ap,
> + void __iomem *port_mmio, u32 status)
> {
> - void __iomem *port_mmio = ahci_port_base(ap);
> struct ata_eh_info *ehi = &ap->link.eh_info;
> struct ahci_port_priv *pp = ap->private_data;
> struct ahci_host_priv *hpriv = ap->host->private_data;
> int resetting = !!(ap->pflags & ATA_PFLAG_RESETTING);
> - u32 status, qc_active = 0;
> + u32 qc_active = 0;
> int rc;
>
> - status = readl(port_mmio + PORT_IRQ_STAT);
> - writel(status, port_mmio + PORT_IRQ_STAT);
> -
> /* ignore BAD_PMP while resetting */
> if (unlikely(resetting))
> status &= ~PORT_IRQ_BAD_PMP;
> @@ -1727,6 +1724,107 @@ static void ahci_port_intr(struct ata_port *ap)
> }
> }
>
> +void ahci_port_intr(struct ata_port *ap)
> +{
> + void __iomem *port_mmio = ahci_port_base(ap);
> + u32 status;
> +
> + status = readl(port_mmio + PORT_IRQ_STAT);
> + writel(status, port_mmio + PORT_IRQ_STAT);
> +
> + ahci_handle_port_interrupt(ap, port_mmio, status);
> +}
> +
> +irqreturn_t ahci_thread_fn(int irq, void *dev_instance)
> +{
> + struct ata_port *ap = dev_instance;
> + struct ahci_port_priv *pp = ap->private_data;
> + void __iomem *port_mmio = ahci_port_base(ap);
> + unsigned long flags;
> + u32 status;
> +
> + spin_lock_irqsave(&ap->host->lock, flags);
> + status = pp->intr_status;
> + if (status)
> + pp->intr_status = 0;
> + spin_unlock_irqrestore(&ap->host->lock, flags);
> +
> + spin_lock_bh(ap->lock);
> + ahci_handle_port_interrupt(ap, port_mmio, status);
> + spin_unlock_bh(ap->lock);
> +
> + return IRQ_HANDLED;
> +}
> +EXPORT_SYMBOL_GPL(ahci_thread_fn);
> +
> +void ahci_hw_port_interrupt(struct ata_port *ap)
> +{
> + void __iomem *port_mmio = ahci_port_base(ap);
> + struct ahci_port_priv *pp = ap->private_data;
> + u32 status;
> +
> + status = readl(port_mmio + PORT_IRQ_STAT);
> + writel(status, port_mmio + PORT_IRQ_STAT);
> +
> + pp->intr_status |= status;
> +}
> +
> +irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance)
> +{
> + struct ata_port *ap_this = dev_instance;
> + struct ahci_port_priv *pp = ap_this->private_data;
> + struct ata_host *host = ap_this->host;
> + struct ahci_host_priv *hpriv = host->private_data;
> + void __iomem *mmio = hpriv->mmio;
> + unsigned int i;
> + u32 irq_stat, irq_masked;
> +
> + VPRINTK("ENTER\n");

Is this per IRQ handler execution debugging code still needed?
Same for the other VPRINTK() lines in this patch.

Thanks,

Ingo

2012-10-02 11:03:52

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping

On Tue, Oct 02, 2012 at 06:55:18AM +0200, Ingo Molnar wrote:

Thanks for the review, Ingo.

> > @@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> > #define irq_alloc_desc_from(from, node) \
> > irq_alloc_descs(-1, from, 1, node)
> >
> > +#define irq_alloc_descs_from(from, cnt, node) \
> > + irq_alloc_descs(-1, from, cnt, node)
> > +
>
> Please use inlines instead of macros. Might transform the one
> above it as well in the process.

You mean here do not introduce irq_alloc_descs_from, but rather use
irq_alloc_descs() directly?

> > +int irq_can_alloc_irqs(unsigned int from, unsigned int cnt)
> > +{
> > + unsigned int start;
> > + int ret = 0;
> > +
> > + if (!cnt)
> > + return -EINVAL;
> > +
> > + mutex_lock(&sparse_irq_lock);
> > + start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
> > + from, cnt, 0);
> > + mutex_unlock(&sparse_irq_lock);
> > + if (start + cnt > nr_irqs)
> > + ret = irq_can_expand_nr_irqs(start + cnt);
> > + return ret;
>
> How is this supposed to work wrt. races?

It is not supposed. Just a quick check if there are enough bits before an
attempt to allocate memory in __create_irqs(). Otherwise __create_irqs()
might allocate irq_cfg's, then realize there are no bits, then deallocate
and fail.

But strictly speaking, irq_can_alloc_irqs() is unnecessary.

--
Regards,
Alexander Gordeev
[email protected]

2012-10-02 11:25:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping


* Alexander Gordeev <[email protected]> wrote:

> On Tue, Oct 02, 2012 at 06:55:18AM +0200, Ingo Molnar wrote:
>
> Thanks for the review, Ingo.
>
> > > @@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> > > #define irq_alloc_desc_from(from, node) \
> > > irq_alloc_descs(-1, from, 1, node)
> > >
> > > +#define irq_alloc_descs_from(from, cnt, node) \
> > > + irq_alloc_descs(-1, from, cnt, node)
> > > +
> >
> > Please use inlines instead of macros. Might transform the one
> > above it as well in the process.
>
> You mean here do not introduce irq_alloc_descs_from, but rather use
> irq_alloc_descs() directly?

My suggestion is to add irq_alloc_descs_from() as a (very
simple) inline function and change irq_alloc_desc_from() to be
an inline function as well.

> > > +int irq_can_alloc_irqs(unsigned int from, unsigned int cnt)
> > > +{
> > > + unsigned int start;
> > > + int ret = 0;
> > > +
> > > + if (!cnt)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&sparse_irq_lock);
> > > + start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
> > > + from, cnt, 0);
> > > + mutex_unlock(&sparse_irq_lock);
> > > + if (start + cnt > nr_irqs)
> > > + ret = irq_can_expand_nr_irqs(start + cnt);
> > > + return ret;
> >
> > How is this supposed to work wrt. races?
>
> It is not supposed. Just a quick check if there are enough bits before an
> attempt to allocate memory in __create_irqs(). Otherwise __create_irqs()
> might allocate irq_cfg's, then realize there are no bits, then deallocate
> and fail.
>
> But strictly speaking, irq_can_alloc_irqs() is unnecessary.

Why complicate it if it's unnecessary? The function is inviting
wrong logic: it *cannot* tell whether there are enough bits,
because the check is racy.

So I'd suggest to keep this out - this will further simplify the
patches.

Thanks,

Ingo

2012-10-02 16:39:43

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On Tue, Oct 02, 2012 at 07:09:29AM +0200, Ingo Molnar wrote:
> > +irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance)
> > +{
> > + struct ata_port *ap_this = dev_instance;
> > + struct ahci_port_priv *pp = ap_this->private_data;
> > + struct ata_host *host = ap_this->host;
> > + struct ahci_host_priv *hpriv = host->private_data;
> > + void __iomem *mmio = hpriv->mmio;
> > + unsigned int i;
> > + u32 irq_stat, irq_masked;
> > +
> > + VPRINTK("ENTER\n");
>
> Is this per IRQ handler execution debugging code still needed?
> Same for the other VPRINTK() lines in this patch.

These VPRINKs are only to make new handlers look like ahci_interrupt()
which did not change. I believe, if they need to go it is better to
remove them altogether, with a separate followup patch.

Jeff, what do you think?

Thanks!

--
Regards,
Alexander Gordeev
[email protected]

2012-10-02 16:45:33

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On 10/02/2012 12:42 PM, Alexander Gordeev wrote:
> On Tue, Oct 02, 2012 at 07:09:29AM +0200, Ingo Molnar wrote:
>>> +irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance)
>>> +{
>>> + struct ata_port *ap_this = dev_instance;
>>> + struct ahci_port_priv *pp = ap_this->private_data;
>>> + struct ata_host *host = ap_this->host;
>>> + struct ahci_host_priv *hpriv = host->private_data;
>>> + void __iomem *mmio = hpriv->mmio;
>>> + unsigned int i;
>>> + u32 irq_stat, irq_masked;
>>> +
>>> + VPRINTK("ENTER\n");
>>
>> Is this per IRQ handler execution debugging code still needed?
>> Same for the other VPRINTK() lines in this patch.
>
> These VPRINKs are only to make new handlers look like ahci_interrupt()
> which did not change. I believe, if they need to go it is better to
> remove them altogether, with a separate followup patch.

Definitely followup patch material + discussion :)

For the moment, the above is consistent with existing code, and by
default compiled out, as one would expect.

Jeff




2012-10-04 07:55:09

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping

On Tue, Oct 02, 2012 at 01:25:24PM +0200, Ingo Molnar wrote:
> > > > @@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> > > > #define irq_alloc_desc_from(from, node) \
> > > > irq_alloc_descs(-1, from, 1, node)
> > > >
> > > > +#define irq_alloc_descs_from(from, cnt, node) \
> > > > + irq_alloc_descs(-1, from, cnt, node)
> > > > +
> > >
> > > Please use inlines instead of macros. Might transform the one
> > > above it as well in the process.
> >
> > You mean here do not introduce irq_alloc_descs_from, but rather use
> > irq_alloc_descs() directly?
>
> My suggestion is to add irq_alloc_descs_from() as a (very
> simple) inline function and change irq_alloc_desc_from() to be
> an inline function as well.

These defines were added on purpose with commit ec53cf2 ("irq: don't put
module.h into irq.h for tracking irqgen modules.") - the relevant hunk is
below. I suppose we do not want to revert it?


@@ -567,29 +567,21 @@ static inline struct msi_desc *irq_data_get_msi(struct irq_data *d)
int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
struct module *owner);

-static inline int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt,
- int node)
-{
- return __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE);
-}
+/* use macros to avoid needing export.h for THIS_MODULE */
+#define irq_alloc_descs(irq, from, cnt, node) \
+ __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE)

-void irq_free_descs(unsigned int irq, unsigned int cnt);
-int irq_reserve_irqs(unsigned int from, unsigned int cnt);
+#define irq_alloc_desc(node) \
+ irq_alloc_descs(-1, 0, 1, node)

-static inline int irq_alloc_desc(int node)
-{
- return irq_alloc_descs(-1, 0, 1, node);
-}
+#define irq_alloc_desc_at(at, node) \
+ irq_alloc_descs(at, at, 1, node)

-static inline int irq_alloc_desc_at(unsigned int at, int node)
-{
- return irq_alloc_descs(at, at, 1, node);
-}
+#define irq_alloc_desc_from(from, node) \
+ irq_alloc_descs(-1, from, 1, node)

-static inline int irq_alloc_desc_from(unsigned int from, int node)
-{
- return irq_alloc_descs(-1, from, 1, node);
-}
+void irq_free_descs(unsigned int irq, unsigned int cnt);
+int irq_reserve_irqs(unsigned int from, unsigned int cnt);

static inline void irq_free_desc(unsigned int irq)
{



--
Regards,
Alexander Gordeev
[email protected]

2012-10-04 09:17:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping


* Alexander Gordeev <[email protected]> wrote:

> On Tue, Oct 02, 2012 at 01:25:24PM +0200, Ingo Molnar wrote:
> > > > > @@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> > > > > #define irq_alloc_desc_from(from, node) \
> > > > > irq_alloc_descs(-1, from, 1, node)
> > > > >
> > > > > +#define irq_alloc_descs_from(from, cnt, node) \
> > > > > + irq_alloc_descs(-1, from, cnt, node)
> > > > > +
> > > >
> > > > Please use inlines instead of macros. Might transform the one
> > > > above it as well in the process.
> > >
> > > You mean here do not introduce irq_alloc_descs_from, but rather use
> > > irq_alloc_descs() directly?
> >
> > My suggestion is to add irq_alloc_descs_from() as a (very
> > simple) inline function and change irq_alloc_desc_from() to be
> > an inline function as well.
>
> These defines were added on purpose with commit ec53cf2 ("irq:
> don't put module.h into irq.h for tracking irqgen modules.") -
> the relevant hunk is below. I suppose we do not want to revert
> it?

Sigh - that commit is really making a step backwards, but indeed
you are probably right that reintroducing the inlines would
create header dependency problems - which should be addressed in
another patch.

So I concur with your original approach that added a macro.

Thanks,

Ingo

2012-10-04 14:36:15

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 -tip 5/5] AHCI: Support multiple MSIs

On Tue, Oct 02, 2012 at 12:45:26PM -0400, Jeff Garzik wrote:
> >>>+ VPRINTK("ENTER\n");
> >>
> >>Is this per IRQ handler execution debugging code still needed?
> >>Same for the other VPRINTK() lines in this patch.
> >
> >These VPRINKs are only to make new handlers look like ahci_interrupt()
> >which did not change. I believe, if they need to go it is better to
> >remove them altogether, with a separate followup patch.
>
> Definitely followup patch material + discussion :)

It appears the discussion comes first :)

As DPRINTKs/VPRINTKs are all over the ATA stack, I do not really realize
why it should be eliminated in one places (i.e. in hardware handlers) and
left in anothers. So as long as you would not like to remove them all, I
would leave it as is for now.

Does it make sense for you?

--
Regards,
Alexander Gordeev
[email protected]