2013-06-07 22:31:37

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 00/27] x86, irq: support ioapic device hotplug

Hi,

Intel cpu (from IVB) include cpu, mem controller, and IIO.
When hotadd cpu physically, it will involve cpu hotplug, mem hotplug,
and pci root hotplug.

IIO includes pci host bridge, ioapic controller and iommu.
pci devices will need to use ioapic and iommu.
So to make pci root bus hotplug working, we need to add ioapic hotplug support.

Now ioapics are with ACPI MADT table. During booting, kernel will parse
MADT and put info into ioapic arrays.
Also Bjorn added one pci device based driver, but it is not wired in yet,
as it need to call acpi_register_ioapic, and that is TBD.

This patchset will
1. extend genirq to support reserve/realloc method.
because we want irqs for one ioapic controller to be linear mapping
to the gsi range.
2. change ioapic array operation code so we could insert new ioapic and
remove one leave the blank slot.
3. record irq_base in gsi_config in ioapic struct, and use it to convert gsi
to irq for pci device using that ioapic controller
4. make static ioapic path (from MADT) use same code as hot-add path,
with reserve/realloc.
5. change ioapic add/removing to use acpi way, as that is only needed when
pci root bus hotplug. Also make it support the case that ioapic
controller is hiding in pci config space or ioapic address is not
managed by pci reallocation subsys.

Also include some cleanups
1. print MSI-X clearly in /proc/interrupts and dmesg.
2. convert irq_2_pin to be generic list.

It is based on linus's tree of 2013/06/07 aka v3.10-rc3+.

could get them from:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-irq

-v3: split last patch to 4 patches per request from Bjorn.
drop one patch that will put pci device name in proc/interrupts
update changelog.

Yinghai Lu (27):
x86, irq: Change irq_remap_modify_chip_defaults to static
x86, irq: Modify irq chip once for irq remapping
x86, irq: Print out MSI/MSI-X clearly
x86, irq: Show MSI-X in /proc/interrupt
x86, irq: Make dmar_msi/hpet_msi irq_chip name consistent
ia64, irq: Add dummy create_irq_nr()
iommu, irq: Allocate irq_desc for dmar_msi with local node
x86, irq: kill create_irq()
x86, irq: Convert irq_2_pin list to generic list
genirq: Split __irq_reserve_irqs from irq_alloc_descs
x86, irq: Add realloc_irq_and_cfg_at()
x86, irq: Move down arch_early_irq_init()
x86, irq: Split out alloc_ioapic_save_registers()
xen, irq: call irq_realloc_desc_at() at first
x86, irq: pre-reserve irq range/realloc for booting path
x86, irq: Add ioapic_gsi_to_irq
genirq: Bail out early in free_desc()
x86, irq: More strict checking about registering ioapic
x86, irq: Make mp_register_ioapic handle hot-added ioapic
x86, irq: Add mp_unregister_ioapic to handle hot-remove ioapic
x86, irq: Make ioapics loop skip blank slots
x86, ioapic: Find usable ioapic id for 64bit.
x86: Move declaration for mp_register_ioapic()
PCI, x86: Make ioapic hotplug support built-in
PCI, x86, ACPI: Link acpi ioapic register to ioapic
PCI, x86, ACPI: Enable ioapic hotplug support with acpi host bridge.
PCI, x86, ACPI: get ioapic address from acpi device

arch/ia64/kernel/irq_ia64.c | 10 +
arch/x86/include/asm/hw_irq.h | 2 +-
arch/x86/include/asm/io_apic.h | 7 +
arch/x86/include/asm/irq_remapping.h | 6 -
arch/x86/include/asm/mpspec.h | 21 +-
arch/x86/kernel/acpi/boot.c | 32 +--
arch/x86/kernel/apic/apic.c | 9 +-
arch/x86/kernel/apic/io_apic.c | 441 ++++++++++++++++++++++++++---------
drivers/acpi/pci_root.c | 4 +
drivers/iommu/dmar.c | 2 +-
drivers/iommu/irq_remapping.c | 16 +-
drivers/pci/Kconfig | 3 +-
drivers/pci/ioapic.c | 232 ++++++++++++------
drivers/xen/events.c | 8 +-
include/linux/irq.h | 6 +
include/linux/pci-acpi.h | 8 +
kernel/irq/irqdesc.c | 83 +++++--
17 files changed, 650 insertions(+), 240 deletions(-)

--
1.8.1.4


2013-06-07 22:31:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 06/27] ia64, irq: Add dummy create_irq_nr()

create_irq() will return -1 when fail to allocate.
create_irq_nr() will return 0 when fail to allocate.

Will use it to fix one return value checking for dmar_msi irq.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: [email protected]
---
arch/ia64/kernel/irq_ia64.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 1034884..38e46df 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -429,6 +429,16 @@ int create_irq(void)
return irq;
}

+unsigned int create_irq_nr(unsigned int from, int node)
+{
+ int irq = create_irq();
+
+ if (irq < 0)
+ irq = 0;
+
+ return irq;
+}
+
void destroy_irq(unsigned int irq)
{
dynamic_irq_cleanup(irq);
--
1.8.1.4

2013-06-07 22:31:53

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 23/27] x86: Move declaration for mp_register_ioapic()

Address compiling problem that Fengguang report.

Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/mpspec.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index e70b7e5..13439c5 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -95,10 +95,27 @@ static inline void early_reserve_e820_mpc_new(void) { }
#endif

void __cpuinit generic_processor_info(int apicid, int version);
-#ifdef CONFIG_ACPI
+
+#ifdef CONFIG_X86_IO_APIC
int __mp_register_ioapic(int id, u32 address, u32 gsi_base, bool hot);
int mp_unregister_ioapic(u32 gsi_base);
extern void mp_register_ioapic(int id, u32 address, u32 gsi_base);
+#else
+static inline int __mp_register_ioapic(int id, u32 address, u32 gsi_base,
+ bool hot)
+{
+ return 0;
+}
+static inline int mp_unregister_ioapic(u32 gsi_base)
+{
+ return 0;
+}
+static inline void mp_register_ioapic(int id, u32 address, u32 gsi_base)
+{
+}
+#endif
+
+#ifdef CONFIG_ACPI
extern void mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger,
u32 gsi);
extern void mp_config_acpi_legacy_irqs(void);
--
1.8.1.4

2013-06-07 22:32:03

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 26/27] PCI, x86, ACPI: Enable ioapic hotplug support with acpi host bridge.

We need to have ioapic setup before normal pci drivers.
otherwise other pci driver can not setup irq.

So we should not treat them as normal pci devices.
Also we will need to support ioapic hotplug without pci device around.

We need to call ioapic add/remove during host-bridge add/remove.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/acpi/pci_root.c | 4 ++
drivers/pci/ioapic.c | 147 ++++++++++++++++++++++++++++++-----------------
include/linux/pci-acpi.h | 8 +++
3 files changed, 106 insertions(+), 53 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e427dc5..dadcbe9 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -542,6 +542,8 @@ static int acpi_pci_root_add(struct acpi_device *device,
if (system_state != SYSTEM_BOOTING)
pci_enable_bridges(root->bus);

+ acpi_pci_ioapic_add(root);
+
pci_bus_add_devices(root->bus);
return 1;

@@ -561,6 +563,8 @@ static void acpi_pci_root_remove(struct acpi_device *device)

pci_stop_root_bus(root->bus);

+ acpi_pci_ioapic_remove(root);
+
device_set_run_wake(root->bus->bridge, false);
pci_acpi_remove_bus_pm_notifier(device);

diff --git a/drivers/pci/ioapic.c b/drivers/pci/ioapic.c
index 7d6b157..60351b2 100644
--- a/drivers/pci/ioapic.c
+++ b/drivers/pci/ioapic.c
@@ -22,101 +22,142 @@
#include <linux/slab.h>
#include <acpi/acpi_bus.h>

-struct ioapic {
- acpi_handle handle;
+struct acpi_pci_ioapic {
+ acpi_handle root_handle;
u32 gsi_base;
+ struct pci_dev *pdev;
+ struct list_head list;
};

-static int ioapic_probe(struct pci_dev *dev, const struct pci_device_id *ent)
+static LIST_HEAD(ioapic_list);
+static DEFINE_MUTEX(ioapic_list_lock);
+
+static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
+ u32 *pgsi_base)
{
- acpi_handle handle;
acpi_status status;
unsigned long long gsb;
- struct ioapic *ioapic;
+ struct pci_dev *dev;
+ u32 gsi_base;
int ret;
char *type;
- struct resource *res;
+ struct resource r;
+ struct resource *res = &r;
+ char objname[64];
+ struct acpi_buffer buffer = {sizeof(objname), objname};

- handle = DEVICE_ACPI_HANDLE(&dev->dev);
- if (!handle)
- return -EINVAL;
+ *pdev = NULL;
+ *pgsi_base = 0;

status = acpi_evaluate_integer(handle, "_GSB", NULL, &gsb);
- if (ACPI_FAILURE(status))
- return -EINVAL;
-
- /*
- * The previous code in acpiphp evaluated _MAT if _GSB failed, but
- * ACPI spec 4.0 sec 6.2.2 requires _GSB for hot-pluggable I/O APICs.
- */
+ if (ACPI_FAILURE(status) || !gsb)
+ return;

- ioapic = kzalloc(sizeof(*ioapic), GFP_KERNEL);
- if (!ioapic)
- return -ENOMEM;
+ dev = acpi_get_pci_dev(handle);
+ if (!dev)
+ return;

- ioapic->handle = handle;
- ioapic->gsi_base = (u32) gsb;
+ acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);

- if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
- type = "IOAPIC";
- else
- type = "IOxAPIC";
+ gsi_base = gsb;
+ type = "IOxAPIC";

ret = pci_enable_device(dev);
if (ret < 0)
- goto exit_free;
+ goto exit_put;

pci_set_master(dev);

+ if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
+ type = "IOAPIC";
+
if (pci_request_region(dev, 0, type))
goto exit_disable;

res = &dev->resource[0];
- if (acpi_register_ioapic(ioapic->handle, res->start, ioapic->gsi_base))
+
+ if (acpi_register_ioapic(handle, res->start, gsi_base))
goto exit_release;

- pci_set_drvdata(dev, ioapic);
- dev_info(&dev->dev, "%s at %pR, GSI %u\n", type, res, ioapic->gsi_base);
- return 0;
+ pr_info("%s %s %s at %pR, GSI %u\n",
+ dev_name(&dev->dev), objname, type,
+ res, gsi_base);
+
+ *pdev = dev;
+ *pgsi_base = gsi_base;
+ return;

exit_release:
pci_release_region(dev, 0);
exit_disable:
pci_disable_device(dev);
-exit_free:
- kfree(ioapic);
- return -ENODEV;
+exit_put:
+ pci_dev_put(dev);
}

-static void ioapic_remove(struct pci_dev *dev)
+static void handle_ioapic_remove(acpi_handle handle, struct pci_dev *dev,
+ u32 gsi_base)
{
- struct ioapic *ioapic = pci_get_drvdata(dev);
+ acpi_unregister_ioapic(handle, gsi_base);

- acpi_unregister_ioapic(ioapic->handle, ioapic->gsi_base);
pci_release_region(dev, 0);
pci_disable_device(dev);
- kfree(ioapic);
+ pci_dev_put(dev);
}

+static acpi_status register_ioapic(acpi_handle handle, u32 lvl,
+ void *context, void **rv)
+{
+ acpi_handle root_handle = context;
+ struct pci_dev *pdev;
+ u32 gsi_base;
+ struct acpi_pci_ioapic *ioapic;

-static DEFINE_PCI_DEVICE_TABLE(ioapic_devices) = {
- { PCI_DEVICE_CLASS(PCI_CLASS_SYSTEM_PIC_IOAPIC, ~0) },
- { PCI_DEVICE_CLASS(PCI_CLASS_SYSTEM_PIC_IOXAPIC, ~0) },
- { }
-};
-MODULE_DEVICE_TABLE(pci, ioapic_devices);
+ handle_ioapic_add(handle, &pdev, &gsi_base);
+ if (!gsi_base)
+ return AE_OK;

-static struct pci_driver ioapic_driver = {
- .name = "ioapic",
- .id_table = ioapic_devices,
- .probe = ioapic_probe,
- .remove = ioapic_remove,
-};
+ ioapic = kzalloc(sizeof(*ioapic), GFP_KERNEL);
+ if (!ioapic) {
+ pr_err("%s: cannot allocate memory\n", __func__);
+ handle_ioapic_remove(root_handle, pdev, gsi_base);
+ return AE_OK;
+ }
+ ioapic->root_handle = root_handle;
+ ioapic->pdev = pdev;
+ ioapic->gsi_base = gsi_base;
+
+ mutex_lock(&ioapic_list_lock);
+ list_add(&ioapic->list, &ioapic_list);
+ mutex_unlock(&ioapic_list_lock);
+
+ return AE_OK;
+}

-static int __init ioapic_init(void)
+void acpi_pci_ioapic_add(struct acpi_pci_root *root)
{
- return pci_register_driver(&ioapic_driver);
+ acpi_status status;
+
+ status = acpi_walk_namespace(ACPI_TYPE_DEVICE, root->device->handle,
+ (u32)1, register_ioapic, NULL,
+ root->device->handle,
+ NULL);
+ if (ACPI_FAILURE(status))
+ pr_err("%s: register_ioapic failure - %d", __func__, status);
}
-module_init(ioapic_init);

-MODULE_LICENSE("GPL");
+void acpi_pci_ioapic_remove(struct acpi_pci_root *root)
+{
+ struct acpi_pci_ioapic *ioapic, *tmp;
+
+ mutex_lock(&ioapic_list_lock);
+ list_for_each_entry_safe(ioapic, tmp, &ioapic_list, list) {
+ if (root->device->handle != ioapic->root_handle)
+ continue;
+ list_del(&ioapic->list);
+ handle_ioapic_remove(ioapic->root_handle, ioapic->pdev,
+ ioapic->gsi_base);
+ kfree(ioapic);
+ }
+ mutex_unlock(&ioapic_list_lock);
+}
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 1704479..b2a2ced 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -69,6 +69,14 @@ static inline void acpiphp_remove_slots(struct pci_bus *bus) { }
static inline void acpiphp_check_host_bridge(acpi_handle handle) { }
#endif

+#ifdef CONFIG_PCI_IOAPIC
+void acpi_pci_ioapic_add(struct acpi_pci_root *root);
+void acpi_pci_ioapic_remove(struct acpi_pci_root *root);
+#else
+static inline void acpi_pci_ioapic_add(struct acpi_pci_root *root) { }
+static inline void acpi_pci_ioapic_remove(struct acpi_pci_root *root) { }
+#endif
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
--
1.8.1.4

2013-06-07 22:32:23

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 27/27] PCI, x86, ACPI: get ioapic address from acpi device

Some ioapic controllers do not show up on pci config space,
or pci device is there but no bar is used and is set by firmware in
other non standard registers.

We can get ioapic address from ACPI0009's _CRS.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/ioapic.c | 86 +++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 71 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/ioapic.c b/drivers/pci/ioapic.c
index 60351b2..41f7c69 100644
--- a/drivers/pci/ioapic.c
+++ b/drivers/pci/ioapic.c
@@ -32,6 +32,36 @@ struct acpi_pci_ioapic {
static LIST_HEAD(ioapic_list);
static DEFINE_MUTEX(ioapic_list_lock);

+static acpi_status setup_res(struct acpi_resource *acpi_res, void *data)
+{
+ struct resource *res;
+ struct acpi_resource_address64 addr;
+ acpi_status status;
+ unsigned long flags;
+ u64 start, end;
+
+ status = acpi_resource_to_address64(acpi_res, &addr);
+ if (!ACPI_SUCCESS(status))
+ return AE_OK;
+
+ if (addr.resource_type == ACPI_MEMORY_RANGE) {
+ if (addr.info.mem.caching == ACPI_PREFETCHABLE_MEMORY)
+ return AE_OK;
+ flags = IORESOURCE_MEM;
+ } else
+ return AE_OK;
+
+ start = addr.minimum + addr.translation_offset;
+ end = addr.maximum + addr.translation_offset;
+
+ res = data;
+ res->flags = flags;
+ res->start = start;
+ res->end = end;
+
+ return AE_OK;
+}
+
static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
u32 *pgsi_base)
{
@@ -54,33 +84,56 @@ static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
return;

dev = acpi_get_pci_dev(handle);
- if (!dev)
- return;
+ if (!dev || !pci_resource_len(dev, 0)) {
+ struct acpi_device_info *info;
+ char *hid = NULL;
+
+ status = acpi_get_object_info(handle, &info);
+ if (ACPI_FAILURE(status))
+ goto exit_put;
+ if (info->valid & ACPI_VALID_HID)
+ hid = info->hardware_id.string;
+ if (!hid || strcmp(hid, "ACPI0009")) {
+ kfree(info);
+ goto exit_put;
+ }
+ kfree(info);
+ memset(res, 0, sizeof(*res));
+ acpi_walk_resources(handle, METHOD_NAME__CRS, setup_res, res);
+ if (!res->flags)
+ goto exit_put;
+ }

acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);

gsi_base = gsb;
type = "IOxAPIC";
+ if (dev) {
+ ret = pci_enable_device(dev);
+ if (ret < 0)
+ goto exit_put;

- ret = pci_enable_device(dev);
- if (ret < 0)
- goto exit_put;
-
- pci_set_master(dev);
+ pci_set_master(dev);

- if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
- type = "IOAPIC";
+ if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
+ type = "IOAPIC";

- if (pci_request_region(dev, 0, type))
- goto exit_disable;
+ if (pci_resource_len(dev, 0)) {
+ if (pci_request_region(dev, 0, type))
+ goto exit_disable;

- res = &dev->resource[0];
+ res = &dev->resource[0];
+ }
+ }

- if (acpi_register_ioapic(handle, res->start, gsi_base))
- goto exit_release;
+ if (acpi_register_ioapic(handle, res->start, gsi_base)) {
+ if (dev)
+ goto exit_release;
+ return;
+ }

pr_info("%s %s %s at %pR, GSI %u\n",
- dev_name(&dev->dev), objname, type,
+ dev ? dev_name(&dev->dev) : "", objname, type,
res, gsi_base);

*pdev = dev;
@@ -100,6 +153,9 @@ static void handle_ioapic_remove(acpi_handle handle, struct pci_dev *dev,
{
acpi_unregister_ioapic(handle, gsi_base);

+ if (!dev)
+ return;
+
pci_release_region(dev, 0);
pci_disable_device(dev);
pci_dev_put(dev);
--
1.8.1.4

2013-06-07 22:31:51

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 24/27] PCI, x86: Make ioapic hotplug support built-in

ioapic hotplug should be built-in like pci root bus hotplug.

Also need to make it depends on X86_IO_APIC.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/Kconfig | 3 ++-
drivers/pci/ioapic.c | 7 -------
2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 6d51aa6..720989f 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -110,10 +110,11 @@ config PCI_PASID
If unsure, say N.

config PCI_IOAPIC
- tristate "PCI IO-APIC hotplug support" if X86
+ bool "PCI IO-APIC hotplug support" if X86
depends on PCI
depends on ACPI
depends on HOTPLUG
+ depends on X86_IO_APIC
default !X86

config PCI_LABEL
diff --git a/drivers/pci/ioapic.c b/drivers/pci/ioapic.c
index 3c6bbdd..7d6b157 100644
--- a/drivers/pci/ioapic.c
+++ b/drivers/pci/ioapic.c
@@ -117,13 +117,6 @@ static int __init ioapic_init(void)
{
return pci_register_driver(&ioapic_driver);
}
-
-static void __exit ioapic_exit(void)
-{
- pci_unregister_driver(&ioapic_driver);
-}
-
module_init(ioapic_init);
-module_exit(ioapic_exit);

MODULE_LICENSE("GPL");
--
1.8.1.4

2013-06-07 22:32:47

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 12/27] x86, irq: Move down arch_early_irq_init()

Change position only.

Prepare to update arch_early_irq_init(), it needs to call some static
functions.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 89 +++++++++++++++++++++---------------------
1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a157a56..2fcf813 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -185,51 +185,6 @@ static struct irq_pin_list *alloc_irq_pin_list(int node)
return kzalloc_node(sizeof(struct irq_pin_list), GFP_KERNEL, node);
}

-
-/* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
-static struct irq_cfg irq_cfgx[NR_IRQS_LEGACY];
-
-int __init arch_early_irq_init(void)
-{
- struct irq_cfg *cfg;
- int count, node, i;
-
- if (!legacy_pic->nr_legacy_irqs)
- io_apic_irqs = ~0UL;
-
- for (i = 0; i < nr_ioapics; i++) {
- ioapics[i].saved_registers =
- kzalloc(sizeof(struct IO_APIC_route_entry) *
- ioapics[i].nr_registers, GFP_KERNEL);
- if (!ioapics[i].saved_registers)
- pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
- }
-
- cfg = irq_cfgx;
- count = ARRAY_SIZE(irq_cfgx);
- node = cpu_to_node(0);
-
- /* Make sure the legacy interrupts are marked in the bitmap */
- irq_reserve_irqs(0, legacy_pic->nr_legacy_irqs);
-
- for (i = 0; i < count; i++) {
- INIT_LIST_HEAD(&cfg[i].irq_2_pin);
- irq_set_chip_data(i, &cfg[i]);
- zalloc_cpumask_var_node(&cfg[i].domain, GFP_KERNEL, node);
- zalloc_cpumask_var_node(&cfg[i].old_domain, GFP_KERNEL, node);
- /*
- * For legacy IRQ's, start with assigning irq0 to irq15 to
- * IRQ0_VECTOR to IRQ15_VECTOR for all cpu's.
- */
- if (i < legacy_pic->nr_legacy_irqs) {
- cfg[i].vector = IRQ0_VECTOR + i;
- cpumask_setall(cfg[i].domain);
- }
- }
-
- return 0;
-}
-
static struct irq_cfg *irq_cfg(unsigned int irq)
{
return irq_get_chip_data(irq);
@@ -332,6 +287,50 @@ static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
return alloc_irq_and_cfg_at(at, node);
}

+/* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
+static struct irq_cfg irq_cfgx[NR_IRQS_LEGACY];
+
+int __init arch_early_irq_init(void)
+{
+ struct irq_cfg *cfg;
+ int count, node, i;
+
+ if (!legacy_pic->nr_legacy_irqs)
+ io_apic_irqs = ~0UL;
+
+ for (i = 0; i < nr_ioapics; i++) {
+ ioapics[i].saved_registers =
+ kzalloc(sizeof(struct IO_APIC_route_entry) *
+ ioapics[i].nr_registers, GFP_KERNEL);
+ if (!ioapics[i].saved_registers)
+ pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
+ }
+
+ cfg = irq_cfgx;
+ count = ARRAY_SIZE(irq_cfgx);
+ node = cpu_to_node(0);
+
+ /* Make sure the legacy interrupts are marked in the bitmap */
+ irq_reserve_irqs(0, legacy_pic->nr_legacy_irqs);
+
+ for (i = 0; i < count; i++) {
+ INIT_LIST_HEAD(&cfg[i].irq_2_pin);
+ irq_set_chip_data(i, &cfg[i]);
+ zalloc_cpumask_var_node(&cfg[i].domain, GFP_KERNEL, node);
+ zalloc_cpumask_var_node(&cfg[i].old_domain, GFP_KERNEL, node);
+ /*
+ * For legacy IRQ's, start with assigning irq0 to irq15 to
+ * IRQ0_VECTOR to IRQ15_VECTOR for all cpu's.
+ */
+ if (i < legacy_pic->nr_legacy_irqs) {
+ cfg[i].vector = IRQ0_VECTOR + i;
+ cpumask_setall(cfg[i].domain);
+ }
+ }
+
+ return 0;
+}
+
struct io_apic {
unsigned int index;
unsigned int unused[3];
--
1.8.1.4

2013-06-07 22:32:58

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 20/27] x86, irq: Add mp_unregister_ioapic to handle hot-remove ioapic

It will free ioapic related irq_desc and also clear allocated_irqs bits.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/apic/io_apic.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 11bb6ea..e70b7e5 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -97,6 +97,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
void __cpuinit generic_processor_info(int apicid, int version);
#ifdef CONFIG_ACPI
int __mp_register_ioapic(int id, u32 address, u32 gsi_base, bool hot);
+int mp_unregister_ioapic(u32 gsi_base);
extern void mp_register_ioapic(int id, u32 address, u32 gsi_base);
extern void mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger,
u32 gsi);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c581c4c..8a3fedf 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -256,6 +256,14 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

+static void free_irqs(unsigned int from, int cnt)
+{
+ int i;
+
+ for (i = from; i < from + cnt; i++)
+ free_irq_at(i, irq_get_chip_data(i));
+}
+
static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
{
struct irq_desc *desc = irq_to_desc(at);
@@ -329,6 +337,16 @@ static void __init reserve_ioapic_gsi_irq_extra(void)
}
}

+static void free_ioapic_gsi_irq_base(int idx)
+{
+ struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(idx);
+ int irq_base = gsi_cfg->irq_base;
+ int irq_cnt = gsi_cfg->gsi_end - gsi_cfg->gsi_base + 1;
+
+ if (irq_base > 0)
+ free_irqs(irq_base, irq_cnt);
+}
+
static void alloc_ioapic_saved_registers(int idx)
{
if (ioapics[idx].saved_registers)
@@ -342,6 +360,11 @@ static void alloc_ioapic_saved_registers(int idx)
pr_err("IOAPIC %d: suspend/resume impossible!\n", idx);
}

+static void free_ioapic_saved_registers(int idx)
+{
+ kfree(ioapics[idx].saved_registers);
+}
+
int __init arch_early_irq_init(void)
{
int node = cpu_to_node(0);
@@ -3980,6 +4003,25 @@ void mp_register_ioapic(int id, u32 address, u32 gsi_base)
__mp_register_ioapic(id, address, gsi_base, false);
}

+int mp_unregister_ioapic(u32 gsi_base)
+{
+ int idx;
+
+ idx = mp_find_ioapic(gsi_base);
+ if (idx < 0)
+ return -EINVAL;
+
+ free_ioapic_saved_registers(idx);
+
+ free_ioapic_gsi_irq_base(idx);
+
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ memset(&ioapics[idx], 0, sizeof(struct ioapic));
+ ioapics[idx].mp_config.apicid = 0xff;
+
+ return 0;
+}
+
/* Enable IOAPIC early just for system timer */
void __init pre_init_apic_IRQ0(void)
{
--
1.8.1.4

2013-06-07 22:33:07

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 11/27] x86, irq: Add realloc_irq_and_cfg_at()

For ioapic hot-add support, it would be easy if we put all irqs
for that ioapic controller together.

We can reserve irq range at first, then reallocate those
pre-reserved one when it is needed.

Add realloc_irq_and_cfg_at() to really allocate irq_desc and cfg,
because pre-reserved only mark bits in allocate_irqs bit maps.

The reasons for not allocating them during reserving:
1. only several pins in ioapic are used, allocate for all pins, will
waste memory for not used pins.
2. relocate later could make sure irq_desc is allocated on local node ram.
as dev->node is set at that point.

-v2: update changelog by adding reasons, requested by Konrad.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 32 +++++++++++++++++++++++++++++++-
include/linux/irq.h | 5 +++++
kernel/irq/irqdesc.c | 26 ++++++++++++++++++++++++++
3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 670c538..a157a56 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -301,6 +301,36 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

+static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
+{
+ struct irq_desc *desc = irq_to_desc(at);
+ struct irq_cfg *cfg;
+ int res;
+
+ if (desc) {
+ if (irq_desc_get_irq_data(desc)->node == node)
+ return alloc_irq_and_cfg_at(at, node);
+
+ cfg = irq_desc_get_chip_data(desc);
+ if (cfg) {
+ /* shared irq */
+ if (!list_empty(&cfg->irq_2_pin))
+ return cfg;
+ free_irq_cfg(at, cfg);
+ }
+ }
+
+ res = irq_realloc_desc_at(at, node);
+ if (res >= 0) {
+ cfg = alloc_irq_cfg(at, node);
+ if (cfg) {
+ irq_set_chip_data(at, cfg);
+ return cfg;
+ }
+ }
+
+ return alloc_irq_and_cfg_at(at, node);
+}

struct io_apic {
unsigned int index;
@@ -3352,7 +3382,7 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
static int
io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr)
{
- struct irq_cfg *cfg = alloc_irq_and_cfg_at(irq, node);
+ struct irq_cfg *cfg = realloc_irq_and_cfg_at(irq, node);
int ret;

if (!cfg)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 4e0fcbb..9c6c047 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -602,6 +602,11 @@ void irq_free_descs(unsigned int irq, unsigned int cnt);
int irq_reserve_irqs(unsigned int from, unsigned int cnt);
int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);

+int __irq_realloc_desc(int at, int node, struct module *owner);
+/* use macros to avoid needing export.h for THIS_MODULE */
+#define irq_realloc_desc_at(at, node) \
+ __irq_realloc_desc(at, node, THIS_MODULE)
+
static inline void irq_free_desc(unsigned int irq)
{
irq_free_descs(irq, 1);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 3b9fb92..b48f65b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -99,6 +99,11 @@ EXPORT_SYMBOL_GPL(nr_irqs);
static DEFINE_MUTEX(sparse_irq_lock);
static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);

+static bool __irq_is_reserved(int irq)
+{
+ return !!test_bit(irq, allocated_irqs);
+}
+
#ifdef CONFIG_SPARSE_IRQ

static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
@@ -410,6 +415,27 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
EXPORT_SYMBOL_GPL(__irq_alloc_descs);

/**
+ * irq_realloc_desc - allocate irq descriptor for irq that is already reserved
+ * @irq: Allocate for specific irq number if irq >= 0
+ * @node: Preferred node on which the irq descriptor should be allocated
+ * @owner: Owning module (can be NULL)
+ *
+ * Returns the irq number or error code
+ */
+int __ref
+__irq_realloc_desc(int irq, int node, struct module *owner)
+{
+ if (!__irq_is_reserved(irq))
+ return -EINVAL;
+
+ if (irq_to_desc(irq))
+ free_desc(irq);
+
+ return alloc_descs(irq, 1, node, owner);
+}
+EXPORT_SYMBOL_GPL(__irq_realloc_desc);
+
+/**
* irq_reserve_irqs - mark irqs allocated
* @from: mark from irq number
* @cnt: number of irqs to mark
--
1.8.1.4

2013-06-07 22:33:15

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 05/27] x86, irq: Make dmar_msi/hpet_msi irq_chip name consistent

All others are using "-" instead of "_".

Change dmar_msi and hpet_msi to use "-".

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f8d4d8d..436fb38 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3199,7 +3199,7 @@ dmar_msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
}

static struct irq_chip dmar_msi_type = {
- .name = "DMAR_MSI",
+ .name = "DMAR-MSI",
.irq_unmask = dmar_msi_unmask,
.irq_mask = dmar_msi_mask,
.irq_ack = ack_apic_edge,
@@ -3247,7 +3247,7 @@ static int hpet_msi_set_affinity(struct irq_data *data,
}

struct irq_chip hpet_msi_type = {
- .name = "HPET_MSI",
+ .name = "HPET-MSI",
.irq_unmask = hpet_msi_unmask,
.irq_mask = hpet_msi_mask,
.irq_ack = ack_apic_edge,
--
1.8.1.4

2013-06-07 22:33:19

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 02/27] x86, irq: Modify irq chip once for irq remapping

Current code: after irq remapping is enabled, irq_chip fields are modified
during every irq setup.
mp_register_gsi
io_apic_set_pci_routing
io_apic_setup_irq_pin
setup_ioapic_irq
ioapic_register_intr
setup_remapped_irq
native_setup_msi_irqs
setup_msi_irq
setup_remapped_irq
default_setup_hpet_msi
setup_remapped_irq
that is not efficient.

We only need to modify those irq chip one time just after we enable
irq mapping.

Change irq_remap_modify_chip_defaults() to __init as it only gets
called during booting stage, via irq_remap_modify_chips().

Affected irq_chip: ioapic_chip, msi_chip, hpet_msi_type.
We don't need to use #ifdef in irq_remap_modify_chips():
IRQ_REMAP only support x86_64 and X86_IO_APIC and PCI_MSI.
HPET_TIMER is set when x86_64 is set.
When we have IRQ_REMAP enabled, al three chips are defined and
used.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/io_apic.h | 5 +++++
arch/x86/kernel/apic/apic.c | 9 ++++++++-
arch/x86/kernel/apic/io_apic.c | 8 +++-----
drivers/iommu/irq_remapping.c | 11 +++++++++--
4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 459e50a..eaff3ad 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -215,6 +215,11 @@ static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned

extern void io_apic_eoi(unsigned int apic, unsigned int vector);

+extern struct irq_chip ioapic_chip;
+extern struct irq_chip msi_chip;
+extern struct irq_chip hpet_msi_type;
+void irq_remap_modify_chips(void);
+
#else /* !CONFIG_X86_IO_APIC */

#define io_apic_assign_pci_irqs 0
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 904611b..ff50b90 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1552,6 +1552,8 @@ void enable_x2apic(void)
int __init enable_IR(void)
{
#ifdef CONFIG_IRQ_REMAP
+ int ret;
+
if (!irq_remapping_supported()) {
pr_debug("intr-remapping not supported\n");
return -1;
@@ -1563,7 +1565,12 @@ int __init enable_IR(void)
return -1;
}

- return irq_remapping_enable();
+ ret = irq_remapping_enable();
+
+ if (ret >= 0)
+ irq_remap_modify_chips();
+
+ return ret;
#endif
return -1;
}
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 9ed796c..f78f7f6 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1238,8 +1238,6 @@ void __setup_vector_irq(int cpu)
raw_spin_unlock(&vector_lock);
}

-static struct irq_chip ioapic_chip;
-
#ifdef CONFIG_X86_32
static inline int IO_APIC_irq_trigger(int irq)
{
@@ -2511,7 +2509,7 @@ static void ack_apic_level(struct irq_data *data)
ioapic_irqd_unmask(data, cfg, masked);
}

-static struct irq_chip ioapic_chip __read_mostly = {
+struct irq_chip ioapic_chip __read_mostly = {
.name = "IO-APIC",
.irq_startup = startup_ioapic_irq,
.irq_mask = mask_ioapic_irq,
@@ -3093,7 +3091,7 @@ msi_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
* IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
* which implement the MSI or MSI-X Capability Structure.
*/
-static struct irq_chip msi_chip = {
+struct irq_chip msi_chip = {
.name = "PCI-MSI",
.irq_unmask = unmask_msi_irq,
.irq_mask = mask_msi_irq,
@@ -3240,7 +3238,7 @@ static int hpet_msi_set_affinity(struct irq_data *data,
return IRQ_SET_MASK_OK_NOCOPY;
}

-static struct irq_chip hpet_msi_type = {
+struct irq_chip hpet_msi_type = {
.name = "HPET_MSI",
.irq_unmask = hpet_msi_unmask,
.irq_mask = hpet_msi_mask,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 07ce86a..21ef344 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -373,19 +373,26 @@ static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
seq_printf(p, " IR-%s", data->chip->name);
}

-static void irq_remap_modify_chip_defaults(struct irq_chip *chip)
+static void __init irq_remap_modify_chip_defaults(struct irq_chip *chip)
{
+ printk(KERN_DEBUG "irq_chip: %s ==> IR-%s", chip->name, chip->name);
chip->irq_print_chip = ir_print_prefix;
chip->irq_ack = ir_ack_apic_edge;
chip->irq_eoi = ir_ack_apic_level;
chip->irq_set_affinity = x86_io_apic_ops.set_affinity;
}

+void __init irq_remap_modify_chips(void)
+{
+ irq_remap_modify_chip_defaults(&ioapic_chip);
+ irq_remap_modify_chip_defaults(&msi_chip);
+ irq_remap_modify_chip_defaults(&hpet_msi_type);
+}
+
bool setup_remapped_irq(int irq, struct irq_cfg *cfg, struct irq_chip *chip)
{
if (!irq_remapped(cfg))
return false;
irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
- irq_remap_modify_chip_defaults(chip);
return true;
}
--
1.8.1.4

2013-06-07 22:33:24

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 13/27] x86, irq: Split out alloc_ioapic_save_registers()

Split alloc_ioapic_save_registers() from early_irq_init(),
so it will be per ioapic.

Will call that later for hot-added ioapic controller.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 2fcf813..23b0be5 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -290,6 +290,19 @@ static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
/* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
static struct irq_cfg irq_cfgx[NR_IRQS_LEGACY];

+static void alloc_ioapic_saved_registers(int idx)
+{
+ if (ioapics[idx].saved_registers)
+ return;
+
+ ioapics[idx].saved_registers =
+ kzalloc(sizeof(struct IO_APIC_route_entry) *
+ ioapics[idx].nr_registers, GFP_KERNEL);
+
+ if (!ioapics[idx].saved_registers)
+ pr_err("IOAPIC %d: suspend/resume impossible!\n", idx);
+}
+
int __init arch_early_irq_init(void)
{
struct irq_cfg *cfg;
@@ -298,13 +311,8 @@ int __init arch_early_irq_init(void)
if (!legacy_pic->nr_legacy_irqs)
io_apic_irqs = ~0UL;

- for (i = 0; i < nr_ioapics; i++) {
- ioapics[i].saved_registers =
- kzalloc(sizeof(struct IO_APIC_route_entry) *
- ioapics[i].nr_registers, GFP_KERNEL);
- if (!ioapics[i].saved_registers)
- pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
- }
+ for (i = 0; i < nr_ioapics; i++)
+ alloc_ioapic_saved_registers(i);

cfg = irq_cfgx;
count = ARRAY_SIZE(irq_cfgx);
--
1.8.1.4

2013-06-07 22:33:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 09/27] x86, irq: Convert irq_2_pin list to generic list

Now irq_2_pin list is own grown list.
We can use generic list to replace it so we could use generic helper
functions to operate it.

Also make free_irq_cfg() free irq_2_pin list to support coming ioapic
hotplug.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/hw_irq.h | 2 +-
arch/x86/kernel/apic/io_apic.c | 22 ++++++++++++----------
2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 1da97ef..dc3942c 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -122,7 +122,7 @@ struct irq_2_irte {
* Most irqs are mapped 1:1 with pins.
*/
struct irq_cfg {
- struct irq_pin_list *irq_2_pin;
+ struct list_head irq_2_pin;
cpumask_var_t domain;
cpumask_var_t old_domain;
u8 vector;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 70b6617..670c538 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,7 +66,7 @@
#define __apicdebuginit(type) static type __init

#define for_each_irq_pin(entry, head) \
- for (entry = head; entry; entry = entry->next)
+ list_for_each_entry(entry, &head, list)

/*
* Is the SiS APIC rmw bug present ?
@@ -176,8 +176,8 @@ void mp_save_irq(struct mpc_intsrc *m)
}

struct irq_pin_list {
+ struct list_head list;
int apic, pin;
- struct irq_pin_list *next;
};

static struct irq_pin_list *alloc_irq_pin_list(int node)
@@ -213,6 +213,7 @@ int __init arch_early_irq_init(void)
irq_reserve_irqs(0, legacy_pic->nr_legacy_irqs);

for (i = 0; i < count; i++) {
+ INIT_LIST_HEAD(&cfg[i].irq_2_pin);
irq_set_chip_data(i, &cfg[i]);
zalloc_cpumask_var_node(&cfg[i].domain, GFP_KERNEL, node);
zalloc_cpumask_var_node(&cfg[i].old_domain, GFP_KERNEL, node);
@@ -245,6 +246,7 @@ static struct irq_cfg *alloc_irq_cfg(unsigned int irq, int node)
goto out_cfg;
if (!zalloc_cpumask_var_node(&cfg->old_domain, GFP_KERNEL, node))
goto out_domain;
+ INIT_LIST_HEAD(&cfg->irq_2_pin);
return cfg;
out_domain:
free_cpumask_var(cfg->domain);
@@ -255,11 +257,15 @@ out_cfg:

static void free_irq_cfg(unsigned int at, struct irq_cfg *cfg)
{
+ struct irq_pin_list *entry, *tmp;
+
if (!cfg)
return;
irq_set_chip_data(at, NULL);
free_cpumask_var(cfg->domain);
free_cpumask_var(cfg->old_domain);
+ list_for_each_entry_safe(entry, tmp, &cfg->irq_2_pin, list)
+ kfree(entry);
kfree(cfg);
}

@@ -420,15 +426,12 @@ static void ioapic_mask_entry(int apic, int pin)
*/
static int __add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin)
{
- struct irq_pin_list **last, *entry;
+ struct irq_pin_list *entry;

/* don't allow duplicates */
- last = &cfg->irq_2_pin;
- for_each_irq_pin(entry, cfg->irq_2_pin) {
+ for_each_irq_pin(entry, cfg->irq_2_pin)
if (entry->apic == apic && entry->pin == pin)
return 0;
- last = &entry->next;
- }

entry = alloc_irq_pin_list(node);
if (!entry) {
@@ -439,7 +442,7 @@ static int __add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pi
entry->apic = apic;
entry->pin = pin;

- *last = entry;
+ list_add_tail(&entry->list, &cfg->irq_2_pin);
return 0;
}

@@ -1622,8 +1625,7 @@ __apicdebuginit(void) print_IO_APICs(void)
cfg = irq_get_chip_data(irq);
if (!cfg)
continue;
- entry = cfg->irq_2_pin;
- if (!entry)
+ if (list_empty(&cfg->irq_2_pin))
continue;
printk(KERN_DEBUG "IRQ%d ", irq);
for_each_irq_pin(entry, cfg->irq_2_pin)
--
1.8.1.4

2013-06-07 22:33:42

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 04/27] x86, irq: Show MSI-X in /proc/interrupt

Now MSI-X is shown as MSI in /proc/interrupt.

We could use new added irq_print_chip() interface to append -X for MSI-X.

After this patch, we will have
PCI-MSI-edge
PCI-MSI-X-edge
IR-PCI-MSI-edge
IR-PCI-MSI-X-edge
in the /proc/interrupt

-v2: do not need to check if msi_desc is null in msi_irq_print_chip().

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 7 +++++++
drivers/iommu/irq_remapping.c | 5 ++++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 059589f..f8d4d8d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3087,6 +3087,12 @@ msi_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
return IRQ_SET_MASK_OK_NOCOPY;
}

+static void msi_irq_print_chip(struct irq_data *data, struct seq_file *p)
+{
+ seq_printf(p, " %s%s", data->chip->name,
+ data->msi_desc->msi_attrib.is_msix ? "-X" : "");
+}
+
/*
* IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
* which implement the MSI or MSI-X Capability Structure.
@@ -3098,6 +3104,7 @@ struct irq_chip msi_chip = {
.irq_ack = ack_apic_edge,
.irq_set_affinity = msi_set_affinity,
.irq_retrigger = ioapic_retrigger_irq,
+ .irq_print_chip = msi_irq_print_chip,
};

int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 21ef344..05496bd 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -370,7 +370,10 @@ static void ir_ack_apic_level(struct irq_data *data)

static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
{
- seq_printf(p, " IR-%s", data->chip->name);
+ seq_printf(p, " IR-%s%s", data->chip->name,
+ data->msi_desc ?
+ (data->msi_desc->msi_attrib.is_msix ? "-X" : "")
+ : "");
}

static void __init irq_remap_modify_chip_defaults(struct irq_chip *chip)
--
1.8.1.4

2013-06-07 22:33:47

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 19/27] x86, irq: Make mp_register_ioapic handle hot-added ioapic

It needs to reserve irq range in allocated_irqs bitmaps
and irq_base will be used to get right irq for ioapic/pin or gsi.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/apic/io_apic.c | 59 ++++++++++++++++++++++++++++++++----------
2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 3e2f42a..11bb6ea 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -96,6 +96,7 @@ static inline void early_reserve_e820_mpc_new(void) { }

void __cpuinit generic_processor_info(int apicid, int version);
#ifdef CONFIG_ACPI
+int __mp_register_ioapic(int id, u32 address, u32 gsi_base, bool hot);
extern void mp_register_ioapic(int id, u32 address, u32 gsi_base);
extern void mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger,
u32 gsi);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 850d8e99..c581c4c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3890,7 +3890,7 @@ static __init int bad_ioapic_register(int idx)
return 0;
}

-void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
+int __mp_register_ioapic(int id, u32 address, u32 gsi_base, bool hotadd)
{
int idx;
int entries;
@@ -3898,11 +3898,19 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)

idx = __mp_find_ioapic(gsi_base, true);
if (idx >= 0)
- return;
+ return -EINVAL;

idx = nr_ioapics;
+ if (hotadd) {
+ /* find free spot */
+ for (idx = 0; idx < nr_ioapics; idx++)
+ if (!ioapics[idx].nr_registers &&
+ ioapics[idx].mp_config.apicid == 0xff)
+ break;
+ }
+
if (bad_ioapic(idx, address))
- return;
+ return -EINVAL;

ioapics[idx].mp_config.type = MP_IOAPIC;
ioapics[idx].mp_config.flags = MPC_APIC_USABLE;
@@ -3910,10 +3918,8 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)

set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);

- if (bad_ioapic_register(idx)) {
- clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
- return;
- }
+ if (bad_ioapic_register(idx))
+ goto failed;

ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
ioapics[idx].mp_config.apicver = io_apic_get_version(idx);
@@ -3924,10 +3930,8 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
*/
entries = io_apic_get_redir_entries(idx);

- if (!entries || entries > MP_MAX_IOAPIC_PIN) {
- clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
- return;
- }
+ if (!entries || entries > MP_MAX_IOAPIC_PIN)
+ goto failed;

gsi_cfg = mp_ioapic_gsi_routing(idx);
gsi_cfg->gsi_base = gsi_base;
@@ -3938,15 +3942,42 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
*/
ioapics[idx].nr_registers = entries;

- if (gsi_cfg->gsi_end >= gsi_top)
- gsi_top = gsi_cfg->gsi_end + 1;
+ if (!hotadd) {
+ /*
+ * irqs will be reserved in arch_early_irq_init()
+ * don't need to update gsi_top for hot add case
+ */
+ if (gsi_cfg->gsi_end >= gsi_top)
+ gsi_top = gsi_cfg->gsi_end + 1;
+ } else {
+ int irq = reserve_ioapic_gsi_irq_base(idx);
+
+ if (irq < 0)
+ goto failed;
+
+ alloc_ioapic_saved_registers(idx);
+ }

pr_info("IOAPIC[%d]: apic_id %d, version %d, address 0x%x, GSI %d-%d\n",
idx, mpc_ioapic_id(idx),
mpc_ioapic_ver(idx), mpc_ioapic_addr(idx),
gsi_cfg->gsi_base, gsi_cfg->gsi_end);

- nr_ioapics++;
+ if (idx == nr_ioapics)
+ nr_ioapics++;
+
+ return 0;
+
+failed:
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ memset(&ioapics[idx], 0, sizeof(struct ioapic));
+ ioapics[idx].mp_config.apicid = 0xff;
+ return -EINVAL;
+}
+
+void mp_register_ioapic(int id, u32 address, u32 gsi_base)
+{
+ __mp_register_ioapic(id, address, gsi_base, false);
}

/* Enable IOAPIC early just for system timer */
--
1.8.1.4

2013-06-07 22:33:56

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 16/27] x86, irq: Add ioapic_gsi_to_irq

For hot add ioapic, irq_base is not equal to gsi_base.
We need a way to do mapping between gsi and irq.

Also remove irq_to_gsi() that is confusing, just use that array
directly as only caller already check input irq before.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/io_apic.h | 1 +
arch/x86/kernel/acpi/boot.c | 22 +++++-----------------
arch/x86/kernel/apic/io_apic.c | 29 ++++++++++++++++++++++++++++-
3 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 8181fd8..02ac411 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -180,6 +180,7 @@ struct mp_ioapic_gsi{
};
extern struct mp_ioapic_gsi mp_gsi_routing[];
extern u32 gsi_top;
+int ioapic_gsi_to_irq(u32 gsi);
int mp_find_ioapic(u32 gsi);
int mp_find_ioapic_pin(int ioapic, u32 gsi);
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..7190d0d 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -111,6 +111,10 @@ static unsigned int gsi_to_irq(unsigned int gsi)
}
}

+#ifdef CONFIG_X86_IO_APIC
+ if (acpi_irq_model == ACPI_IRQ_MODEL_IOAPIC)
+ return ioapic_gsi_to_irq(gsi);
+#endif
/* Provide an identity mapping of gsi == irq
* except on truly weird platforms that have
* non isa irqs in the first 16 gsis.
@@ -123,22 +127,6 @@ static unsigned int gsi_to_irq(unsigned int gsi)
return irq;
}

-static u32 irq_to_gsi(int irq)
-{
- unsigned int gsi;
-
- if (irq < NR_IRQS_LEGACY)
- gsi = isa_irq_to_gsi[irq];
- else if (irq < gsi_top)
- gsi = irq;
- else if (irq < (gsi_top + NR_IRQS_LEGACY))
- gsi = irq - gsi_top;
- else
- gsi = 0xffffffff;
-
- return gsi;
-}
-
/*
* Temporarily use the virtual area starting from FIX_IO_APIC_BASE_END,
* to map the target physical address. The problem is that set_fixmap()
@@ -528,7 +516,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
{
if (isa_irq >= 16)
return -1;
- *gsi = irq_to_gsi(isa_irq);
+ *gsi = isa_irq_to_gsi[isa_irq];
return 0;
}

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 65863bd..7f95bbe 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1044,13 +1044,16 @@ static int pin_2_irq(int idx, int apic, int pin)

if (test_bit(bus, mp_bus_not_pci)) {
irq = mp_irqs[idx].srcbusirq;
- } else {
+ } else if (gsi_cfg->gsi_base == gsi_cfg->irq_base) {
u32 gsi = gsi_cfg->gsi_base + pin;

if (gsi >= NR_IRQS_LEGACY)
irq = gsi;
else
irq = gsi_top + gsi;
+ } else {
+ /* hotadd ioapic */
+ irq = gsi_cfg->irq_base + pin;
}

#ifdef CONFIG_X86_32
@@ -1476,6 +1479,30 @@ static void __init setup_IO_APIC_irqs(void)
__io_apic_setup_irqs(ioapic_idx);
}

+int ioapic_gsi_to_irq(u32 gsi)
+{
+ int ioapic_idx = 0, irq = gsi;
+ struct mp_ioapic_gsi *gsi_cfg;
+
+ ioapic_idx = mp_find_ioapic(gsi);
+ if (ioapic_idx < 0)
+ return -1;
+
+ gsi_cfg = mp_ioapic_gsi_routing(ioapic_idx);
+ if (gsi_cfg->gsi_base == gsi_cfg->irq_base) {
+ if (gsi < NR_IRQS_LEGACY)
+ irq = gsi_top + gsi;
+ } else {
+ int pin = mp_find_ioapic_pin(ioapic_idx, gsi);
+
+ if (pin < 0)
+ return -1;
+ /* hotadd ioapic */
+ irq = gsi_cfg->irq_base + pin;
+ }
+
+ return irq;
+}
/*
* for the gsit that is not in first ioapic
* but could not use acpi_register_gsi()
--
1.8.1.4

2013-06-07 22:33:53

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 15/27] x86, irq: pre-reserve irq range/realloc for booting path

We will use reserve/realloc_irq_and_cfg_at for hotplug ioapic path.

To make thing simple, we could make booting path use same code.

All gsi range will be reserved at first, and realloc will really
allocate those irq_desc/cfg when it is used.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/io_apic.h | 1 +
arch/x86/kernel/apic/io_apic.c | 78 ++++++++++++++++++++++++++++++------------
2 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index eaff3ad..8181fd8 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -176,6 +176,7 @@ extern void setup_ioapic_ids_from_mpc_nocheck(void);
struct mp_ioapic_gsi{
u32 gsi_base;
u32 gsi_end;
+ u32 irq_base;
};
extern struct mp_ioapic_gsi mp_gsi_routing[];
extern u32 gsi_top;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 23b0be5..65863bd 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -287,8 +287,47 @@ static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
return alloc_irq_and_cfg_at(at, node);
}

-/* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
-static struct irq_cfg irq_cfgx[NR_IRQS_LEGACY];
+static int reserve_ioapic_gsi_irq_base(int idx)
+{
+ int irq;
+ struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(idx);
+ int cnt = gsi_cfg->gsi_end - gsi_cfg->gsi_base + 1;
+
+ irq = __irq_reserve_irqs(-1, gsi_cfg->gsi_base, cnt);
+ if (irq >= 0) {
+ gsi_cfg->irq_base = irq;
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "IOAPIC[%d]: apic_id %d, GSI %d-%d ==> irq %d-%d reserved\n",
+ idx, mpc_ioapic_id(idx),
+ gsi_cfg->gsi_base, gsi_cfg->gsi_end,
+ irq, irq + cnt - 1);
+ } else
+ apic_printk(APIC_VERBOSE, KERN_WARNING
+ "IOAPIC[%d]: apic_id %d, GSI %d-%d ==> irq reserve failed\n",
+ idx, mpc_ioapic_id(idx),
+ gsi_cfg->gsi_base, gsi_cfg->gsi_end);
+
+ return irq;
+}
+
+static void __init reserve_ioapic_gsi_irq_extra(void)
+{
+ int irq;
+
+ /* to prevent hot add ioapic taking those slots */
+ if (gsi_top) {
+ irq = irq_reserve_irqs(gsi_top, NR_IRQS_LEGACY);
+ if (irq >= 0)
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "IOAPIC[extra]: GSI %d-%d ==> irq %d-%d reserved\n",
+ gsi_top, gsi_top + NR_IRQS_LEGACY - 1,
+ irq, irq + NR_IRQS_LEGACY - 1);
+ else
+ apic_printk(APIC_VERBOSE, KERN_WARNING
+ "IOAPIC[extra]: GSI %d-%d ==> irq reserve failed\n",
+ gsi_top, gsi_top + NR_IRQS_LEGACY - 1);
+ }
+}

static void alloc_ioapic_saved_registers(int idx)
{
@@ -305,8 +344,9 @@ static void alloc_ioapic_saved_registers(int idx)

int __init arch_early_irq_init(void)
{
+ int node = cpu_to_node(0);
struct irq_cfg *cfg;
- int count, node, i;
+ int i;

if (!legacy_pic->nr_legacy_irqs)
io_apic_irqs = ~0UL;
@@ -314,26 +354,19 @@ int __init arch_early_irq_init(void)
for (i = 0; i < nr_ioapics; i++)
alloc_ioapic_saved_registers(i);

- cfg = irq_cfgx;
- count = ARRAY_SIZE(irq_cfgx);
- node = cpu_to_node(0);
+ for (i = 0; i < nr_ioapics; i++)
+ reserve_ioapic_gsi_irq_base(i);

- /* Make sure the legacy interrupts are marked in the bitmap */
- irq_reserve_irqs(0, legacy_pic->nr_legacy_irqs);
+ reserve_ioapic_gsi_irq_extra();

- for (i = 0; i < count; i++) {
- INIT_LIST_HEAD(&cfg[i].irq_2_pin);
- irq_set_chip_data(i, &cfg[i]);
- zalloc_cpumask_var_node(&cfg[i].domain, GFP_KERNEL, node);
- zalloc_cpumask_var_node(&cfg[i].old_domain, GFP_KERNEL, node);
- /*
- * For legacy IRQ's, start with assigning irq0 to irq15 to
- * IRQ0_VECTOR to IRQ15_VECTOR for all cpu's.
- */
- if (i < legacy_pic->nr_legacy_irqs) {
- cfg[i].vector = IRQ0_VECTOR + i;
- cpumask_setall(cfg[i].domain);
- }
+ /*
+ * For legacy IRQ's, start with assigning irq0 to irq15 to
+ * IRQ0_VECTOR to IRQ15_VECTOR for all cpu's.
+ */
+ for (i = 0; i < legacy_pic->nr_legacy_irqs; i++) {
+ cfg = realloc_irq_and_cfg_at(i, node);
+ cfg->vector = IRQ0_VECTOR + i;
+ cpumask_setall(cfg->domain);
}

return 0;
@@ -3467,7 +3500,8 @@ int __init arch_probe_nr_irqs(void)
if (nr < nr_irqs)
nr_irqs = nr;

- return NR_IRQS_LEGACY;
+ /* x86 arch code will allocate irq_desc/cfg */
+ return 0;
}

int io_apic_set_pci_routing(struct device *dev, int irq,
--
1.8.1.4

2013-06-07 22:34:15

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 22/27] x86, ioapic: Find usable ioapic id for 64bit.

Checking the id in register, if that is duplicated, will pick one and
update id register.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 970c4fd..8eb4cd3 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3660,27 +3660,54 @@ static int __init io_apic_get_unique_id(int ioapic, int apic_id)
return apic_id;
}

-static u8 __init io_apic_unique_id(u8 id)
+static u8 __init io_apic_unique_id(int idx, u8 id)
{
if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
!APIC_XAPIC(apic_version[boot_cpu_physical_apicid]))
- return io_apic_get_unique_id(nr_ioapics, id);
+ return io_apic_get_unique_id(idx, id);
else
return id;
}
#else
-static u8 __init io_apic_unique_id(u8 id)
+static u8 io_apic_unique_id(int idx, u8 id)
{
int i;
+ u8 new_id;
+ unsigned long flags;
DECLARE_BITMAP(used, 256);
+ union IO_APIC_reg_00 reg_00;

bitmap_zero(used, 256);
- for (i = 0; i < nr_ioapics; i++) {
+ for (i = 0; i < nr_ioapics; i++)
__set_bit(mpc_ioapic_id(i), used);
- }
if (!test_bit(id, used))
return id;
- return find_first_zero_bit(used, 256);
+
+ /* check register at first */
+ raw_spin_lock_irqsave(&ioapic_lock, flags);
+ reg_00.raw = io_apic_read(idx, 0);
+ raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+ new_id = reg_00.bits.ID;
+ if (!test_bit(new_id, used)) {
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "IOAPIC[%d]: Using reg apic_id %d instead of %d\n",
+ idx, new_id, id);
+ return new_id;
+ }
+
+ new_id = find_first_zero_bit(used, 256);
+ reg_00.bits.ID = new_id;
+ raw_spin_lock_irqsave(&ioapic_lock, flags);
+ io_apic_write(idx, 0, reg_00.raw);
+ reg_00.raw = io_apic_read(idx, 0);
+ raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+
+ /* Sanity check */
+ if (reg_00.bits.ID != new_id)
+ pr_warn("IOAPIC[%d]: Unable to change apic_id to %d!\n",
+ idx, new_id);
+
+ return new_id;
}
#endif

@@ -3961,7 +3988,7 @@ int __mp_register_ioapic(int id, u32 address, u32 gsi_base, bool hotadd)
if (bad_ioapic_register(idx))
goto failed;

- ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
+ ioapics[idx].mp_config.apicid = io_apic_unique_id(idx, id);
ioapics[idx].mp_config.apicver = io_apic_get_version(idx);

/*
--
1.8.1.4

2013-06-07 22:33:32

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 01/27] x86, irq: Change irq_remap_modify_chip_defaults to static

Change irq_remap_modify_chip_defaults() to static, as we have no
outside user.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 6 ------
drivers/iommu/irq_remapping.c | 2 +-
2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index d806b22..606f42e 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -56,8 +56,6 @@ extern bool setup_remapped_irq(int irq,
struct irq_cfg *cfg,
struct irq_chip *chip);

-void irq_remap_modify_chip_defaults(struct irq_chip *chip);
-
#else /* CONFIG_IRQ_REMAP */

static inline void setup_irq_remapping_ops(void) { }
@@ -91,10 +89,6 @@ static inline void panic_if_irq_remap(const char *msg)
{
}

-static inline void irq_remap_modify_chip_defaults(struct irq_chip *chip)
-{
-}
-
static inline bool setup_remapped_irq(int irq,
struct irq_cfg *cfg,
struct irq_chip *chip)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index dcfea4e..07ce86a 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -373,7 +373,7 @@ static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
seq_printf(p, " IR-%s", data->chip->name);
}

-void irq_remap_modify_chip_defaults(struct irq_chip *chip)
+static void irq_remap_modify_chip_defaults(struct irq_chip *chip)
{
chip->irq_print_chip = ir_print_prefix;
chip->irq_ack = ir_ack_apic_edge;
--
1.8.1.4

2013-06-07 22:35:00

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 18/27] x86, irq: More strict checking about registering ioapic

1. check overlaping gsi range
for hot-add ioapic case, BIOS may have some entries in MADT
and also have setting in pci root bus with _GSB of DSDT.

2. make bad_ioapics check idx instead of nr_ioapics.
for hotadd ioapic could find spare slot in the middle later.

3. check if entries is in right range.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7f95bbe..850d8e99 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3818,7 +3818,7 @@ void __init ioapic_insert_resources(void)
}
}

-int mp_find_ioapic(u32 gsi)
+static int __mp_find_ioapic(u32 gsi, bool quiet)
{
int i = 0;

@@ -3833,10 +3833,16 @@ int mp_find_ioapic(u32 gsi)
return i;
}

- printk(KERN_ERR "ERROR: Unable to locate IOAPIC for GSI %d\n", gsi);
+ if (!quiet)
+ pr_err("ERROR: Unable to locate IOAPIC for GSI %d\n", gsi);
return -1;
}

+int mp_find_ioapic(u32 gsi)
+{
+ return __mp_find_ioapic(gsi, false);
+}
+
int mp_find_ioapic_pin(int ioapic, u32 gsi)
{
struct mp_ioapic_gsi *gsi_cfg;
@@ -3851,11 +3857,11 @@ int mp_find_ioapic_pin(int ioapic, u32 gsi)
return gsi - gsi_cfg->gsi_base;
}

-static __init int bad_ioapic(unsigned long address)
+static __init int bad_ioapic(int idx, unsigned long address)
{
- if (nr_ioapics >= MAX_IO_APICS) {
+ if (idx >= MAX_IO_APICS) {
pr_warn("WARNING: Max # of I/O APICs (%d) exceeded (found %d), skipping\n",
- MAX_IO_APICS, nr_ioapics);
+ MAX_IO_APICS, idx);
return 1;
}
if (!address) {
@@ -3886,14 +3892,17 @@ static __init int bad_ioapic_register(int idx)

void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
{
- int idx = 0;
+ int idx;
int entries;
struct mp_ioapic_gsi *gsi_cfg;

- if (bad_ioapic(address))
+ idx = __mp_find_ioapic(gsi_base, true);
+ if (idx >= 0)
return;

idx = nr_ioapics;
+ if (bad_ioapic(idx, address))
+ return;

ioapics[idx].mp_config.type = MP_IOAPIC;
ioapics[idx].mp_config.flags = MPC_APIC_USABLE;
@@ -3914,6 +3923,12 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
* and to prevent reprogramming of IOAPIC pins (PCI GSIs).
*/
entries = io_apic_get_redir_entries(idx);
+
+ if (!entries || entries > MP_MAX_IOAPIC_PIN) {
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ return;
+ }
+
gsi_cfg = mp_ioapic_gsi_routing(idx);
gsi_cfg->gsi_base = gsi_base;
gsi_cfg->gsi_end = gsi_base + entries - 1;
--
1.8.1.4

2013-06-07 22:35:50

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 03/27] x86, irq: Print out MSI/MSI-X clearly

Print out exact MSI or MSI-X instead of MSI/MSI-X.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f78f7f6..059589f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3125,7 +3125,8 @@ int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,

irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge");

- dev_printk(KERN_DEBUG, &dev->dev, "irq %d for MSI/MSI-X\n", irq);
+ dev_printk(KERN_DEBUG, &dev->dev, "irq %d for MSI%s\n", irq,
+ msidesc->msi_attrib.is_msix ? "-X" : "");

return 0;
}
--
1.8.1.4

2013-06-07 22:36:11

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 08/27] x86, irq: kill create_irq()

create_irq() will return -1 when failing to allocate.
create_irq_nr() will return 0 when failing to allocate.

It only causes confusion.

Now we have no user for create_irq(), so remove create_irq() for x86.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 15 ---------------
1 file changed, 15 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 436fb38..70b6617 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2968,21 +2968,6 @@ unsigned int create_irq_nr(unsigned int from, int node)
return __create_irqs(from, 1, node);
}

-int create_irq(void)
-{
- int node = cpu_to_node(0);
- unsigned int irq_want;
- int irq;
-
- irq_want = nr_irqs_gsi;
- irq = create_irq_nr(irq_want, node);
-
- if (irq == 0)
- irq = -1;
-
- return irq;
-}
-
void destroy_irq(unsigned int irq)
{
struct irq_cfg *cfg = irq_get_chip_data(irq);
--
1.8.1.4

2013-06-07 22:36:50

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 21/27] x86, irq: Make ioapics loop skip blank slots

When multiple ioapics get added and removed, we could have blank slots
in middle of ioapics array.

Add skip code for this case by checking nr_registers.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8a3fedf..970c4fd 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -793,7 +793,8 @@ int save_ioapic_entries(void)
int err = 0;

for (apic = 0; apic < nr_ioapics; apic++) {
- if (!ioapics[apic].saved_registers) {
+ if (!ioapics[apic].saved_registers &&
+ ioapics[apic].nr_registers) {
err = -ENOMEM;
continue;
}
@@ -899,9 +900,12 @@ static int __init find_isa_irq_apic(int irq, int type)
if (i < mp_irq_entries) {
int ioapic_idx;

- for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++)
+ for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
+ if (!ioapics[ioapic_idx].nr_registers)
+ continue;
if (mpc_ioapic_id(ioapic_idx) == mp_irqs[i].dstapic)
return ioapic_idx;
+ }
}

return -1;
@@ -1121,10 +1125,13 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
for (i = 0; i < mp_irq_entries; i++) {
int lbus = mp_irqs[i].srcbus;

- for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++)
+ for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
+ if (!ioapics[ioapic_idx].nr_registers)
+ continue;
if (mpc_ioapic_id(ioapic_idx) == mp_irqs[i].dstapic ||
mp_irqs[i].dstapic == MP_APIC_ALL)
break;
+ }

if (!test_bit(lbus, mp_bus_not_pci) &&
!mp_irqs[i].irqtype &&
@@ -2084,6 +2091,9 @@ void __init setup_ioapic_ids_from_mpc_nocheck(void)
* Set the IOAPIC ID to the value stored in the MPC table.
*/
for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
+ if (!ioapics[ioapic_idx].nr_registers)
+ continue;
+
/* Read the register 0 value */
raw_spin_lock_irqsave(&ioapic_lock, flags);
reg_00.raw = io_apic_read(ioapic_idx, 0);
@@ -3013,8 +3023,12 @@ static void ioapic_resume(void)
{
int ioapic_idx;

- for (ioapic_idx = nr_ioapics - 1; ioapic_idx >= 0; ioapic_idx--)
+ for (ioapic_idx = nr_ioapics - 1; ioapic_idx >= 0; ioapic_idx--) {
+ if (!ioapics[ioapic_idx].nr_registers)
+ continue;
+
resume_ioapic_id(ioapic_idx);
+ }

restore_ioapic_entries();
}
@@ -3851,8 +3865,11 @@ static int __mp_find_ioapic(u32 gsi, bool quiet)
/* Find the IOAPIC that manages this GSI. */
for (i = 0; i < nr_ioapics; i++) {
struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(i);
- if ((gsi >= gsi_cfg->gsi_base)
- && (gsi <= gsi_cfg->gsi_end))
+
+ if (!ioapics[i].nr_registers)
+ continue;
+
+ if ((gsi >= gsi_cfg->gsi_base) && (gsi <= gsi_cfg->gsi_end))
return i;
}

--
1.8.1.4

2013-06-07 22:31:49

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 25/27] PCI, x86, ACPI: Link acpi ioapic register to ioapic

During ioapic hotplug, acpi_register_ioapic will be called.
Now for x86, that function is blank.
Fill that will update __mp_register_ioapic to use those ioapic.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/acpi/boot.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7190d0d..5dca557 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -700,16 +700,18 @@ EXPORT_SYMBOL(acpi_unmap_lsapic);

int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base)
{
- /* TBD */
- return -EINVAL;
+ unsigned long long id = 0;
+
+ acpi_evaluate_integer(handle, "_UID", NULL, &id);
+
+ return __mp_register_ioapic(id, phys_addr, gsi_base, true);
}

EXPORT_SYMBOL(acpi_register_ioapic);

int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base)
{
- /* TBD */
- return -EINVAL;
+ return mp_unregister_ioapic(gsi_base);
}

EXPORT_SYMBOL(acpi_unregister_ioapic);
--
1.8.1.4

2013-06-07 22:31:45

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 17/27] genirq: Bail out early in free_desc()

We pre-reserve irq range for hot-added ioapic, and later only
some are used via realloc.
So during hot-remove, we need to clear bits in allocated_irqs
for both case.

Check if the irq_desc is there at first, and bail out early if
irq_desc is not allocated yet.
We can use irq_free_descs to clear allocated_irqs bits for
preserved irqs only.

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/irq/irqdesc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index b48f65b..32f099e 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -170,6 +170,9 @@ static void free_desc(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);

+ if (!desc)
+ return;
+
unregister_irq_proc(irq, desc);

mutex_lock(&sparse_irq_lock);
--
1.8.1.4

2013-06-07 22:31:43

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 14/27] xen, irq: call irq_realloc_desc_at() at first

To make x86 irq allocation to be same with booting path and ioapic
hot add path, We will pre-reserve irq for all gsi at first.
We have to use realloc here, otherwise irq_alloc_desc_at will fail
because bit is already get marked for pre-reserved in irq bitmaps.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: [email protected]
---
drivers/xen/events.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 6a6bbe4..b0caa4e 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -505,8 +505,12 @@ static int __must_check xen_allocate_irq_gsi(unsigned gsi)
/* Legacy IRQ descriptors are already allocated by the arch. */
if (gsi < NR_IRQS_LEGACY)
irq = gsi;
- else
- irq = irq_alloc_desc_at(gsi, -1);
+ else {
+ /* for x86, irq already get reserved for gsi */
+ irq = irq_realloc_desc_at(gsi, -1);
+ if (irq < 0)
+ irq = irq_alloc_desc_at(gsi, -1);
+ }

xen_irq_init(irq);

--
1.8.1.4

2013-06-07 22:38:22

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

irq_alloc_descs and irq_reserve_irqs are almost the same.
Separate code out to __irq_reserved_irqs, and other two reuse
__irq_reserve_irqs.

We will use __irq_reserve_irqs for coming ioapic hotplug support.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Alexander Gordeev <[email protected]>
---
include/linux/irq.h | 1 +
kernel/irq/irqdesc.c | 54 ++++++++++++++++++++++++++++++----------------------
2 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index bc4e066..4e0fcbb 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -600,6 +600,7 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,

void irq_free_descs(unsigned int irq, unsigned int cnt);
int irq_reserve_irqs(unsigned int from, unsigned int cnt);
+int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);

static inline void irq_free_desc(unsigned int irq)
{
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..3b9fb92 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -340,18 +340,15 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
EXPORT_SYMBOL_GPL(irq_free_descs);

/**
- * irq_alloc_descs - allocate and initialize a range of irq descriptors
- * @irq: Allocate for specific irq number if irq >= 0
+ * __irq_reserve_descs - reserve and initialize a range of irq descriptors
+ * @irq: Reserve for specific irq number if irq >= 0
* @from: Start the search from this irq number
- * @cnt: Number of consecutive irqs to allocate.
- * @node: Preferred node on which the irq descriptor should be allocated
- * @owner: Owning module (can be NULL)
+ * @cnt: Number of consecutive irqs to reserve.
*
* Returns the first irq number or error code
*/
int __ref
-__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
- struct module *owner)
+__irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt)
{
int start, ret;

@@ -369,7 +366,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
from, cnt, 0);
ret = -EEXIST;
- if (irq >=0 && start != irq)
+ if (irq >= 0 && start != irq)
goto err;

if (start + cnt > nr_irqs) {
@@ -380,12 +377,36 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,

bitmap_set(allocated_irqs, start, cnt);
mutex_unlock(&sparse_irq_lock);
- return alloc_descs(start, cnt, node, owner);
+ return start;

err:
mutex_unlock(&sparse_irq_lock);
return ret;
}
+
+/**
+ * irq_alloc_descs - allocate and initialize a range of irq descriptors
+ * @irq: Allocate for specific irq number if irq >= 0
+ * @from: Start the search from this irq number
+ * @cnt: Number of consecutive irqs to allocate.
+ * @node: Preferred node on which the irq descriptor should be allocated
+ * @owner: Owning module (can be NULL)
+ *
+ * Returns the first irq number or error code
+ */
+int __ref
+__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
+ struct module *owner)
+{
+ int start;
+
+ start = __irq_reserve_irqs(irq, from, cnt);
+
+ if (start < 0)
+ return start;
+
+ return alloc_descs(start, cnt, node, owner);
+}
EXPORT_SYMBOL_GPL(__irq_alloc_descs);

/**
@@ -397,20 +418,7 @@ EXPORT_SYMBOL_GPL(__irq_alloc_descs);
*/
int irq_reserve_irqs(unsigned int from, unsigned int cnt)
{
- unsigned int start;
- int ret = 0;
-
- if (!cnt || (from + cnt) > nr_irqs)
- return -EINVAL;
-
- mutex_lock(&sparse_irq_lock);
- start = bitmap_find_next_zero_area(allocated_irqs, nr_irqs, from, cnt, 0);
- if (start == from)
- bitmap_set(allocated_irqs, start, cnt);
- else
- ret = -EEXIST;
- mutex_unlock(&sparse_irq_lock);
- return ret;
+ return __irq_reserve_irqs(from, from, cnt);
}

/**
--
1.8.1.4

2013-06-07 22:38:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3 07/27] iommu, irq: Allocate irq_desc for dmar_msi with local node

iommu irq's irq_desc should be on local node ram.

Fix the return value checking problem.

create_irq() will return -1 when fail to allocate.
create_irq_nr() will return 0 when fail to allocate.

here only check !irq, so need to change it to use create_irq_nr instead.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Donald Dutile <[email protected]>
Acked-by: Donald Dutile <[email protected]>
---
drivers/iommu/dmar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index a7967ce..c09e60d 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1273,7 +1273,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
if (iommu->irq)
return 0;

- irq = create_irq();
+ irq = create_irq_nr(0, iommu->node);
if (!irq) {
pr_err("IOMMU: no free vectors\n");
return -EINVAL;
--
1.8.1.4

Subject: Re: [PATCH v3 02/27] x86, irq: Modify irq chip once for irq remapping

On Fri, Jun 07, 2013 at 03:30:48PM -0700, Yinghai Lu wrote:
> Current code: after irq remapping is enabled, irq_chip fields are modified
> during every irq setup.
> mp_register_gsi
> io_apic_set_pci_routing
> io_apic_setup_irq_pin
> setup_ioapic_irq
> ioapic_register_intr
> setup_remapped_irq
> native_setup_msi_irqs
> setup_msi_irq
> setup_remapped_irq
> default_setup_hpet_msi
> setup_remapped_irq
> that is not efficient.
>
> We only need to modify those irq chip one time just after we enable
> irq mapping.

The overhead you talk about is calling setup_remapped_irq() for every
interrupt from the mp_register_gsi() call chain? MSI & HPET should happen only
once, or do I miss something?

> Change irq_remap_modify_chip_defaults() to __init as it only gets
> called during booting stage, via irq_remap_modify_chips().
>
> Affected irq_chip: ioapic_chip, msi_chip, hpet_msi_type.
> We don't need to use #ifdef in irq_remap_modify_chips():
> IRQ_REMAP only support x86_64 and X86_IO_APIC and PCI_MSI.
> HPET_TIMER is set when x86_64 is set.
> When we have IRQ_REMAP enabled, al three chips are defined and
> used.

all, not al

Still, the user could disable hpet or apic from the commandline but this
should cause any harm as the irq chips shouldn't be used then.

> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 904611b..ff50b90 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1552,6 +1552,8 @@ void enable_x2apic(void)
> int __init enable_IR(void)
> {
> #ifdef CONFIG_IRQ_REMAP
> + int ret;
> +
> if (!irq_remapping_supported()) {
> pr_debug("intr-remapping not supported\n");
> return -1;
> @@ -1563,7 +1565,12 @@ int __init enable_IR(void)
> return -1;
> }
>
> - return irq_remapping_enable();
> + ret = irq_remapping_enable();
> +
> + if (ret >= 0)
> + irq_remap_modify_chips();

This looks like ehm, well not well.
Could you please change this to:
ret = irq_remapping_enable();
if (ret)
return ret;

irq_remap_modify_chips();

> +
> + return ret;
> #endif
> return -1;
> }
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 07ce86a..21ef344 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -373,19 +373,26 @@ static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
> seq_printf(p, " IR-%s", data->chip->name);
> }
>
> -static void irq_remap_modify_chip_defaults(struct irq_chip *chip)
> +static void __init irq_remap_modify_chip_defaults(struct irq_chip *chip)
> {
> + printk(KERN_DEBUG "irq_chip: %s ==> IR-%s", chip->name, chip->name);

If you need this please use pr_debug and add a \n at the end.

> chip->irq_print_chip = ir_print_prefix;
> chip->irq_ack = ir_ack_apic_edge;
> chip->irq_eoi = ir_ack_apic_level;
> chip->irq_set_affinity = x86_io_apic_ops.set_affinity;
> }
>
> +void __init irq_remap_modify_chips(void)
> +{
> + irq_remap_modify_chip_defaults(&ioapic_chip);
> + irq_remap_modify_chip_defaults(&msi_chip);
> + irq_remap_modify_chip_defaults(&hpet_msi_type);
> +}
> +
> bool setup_remapped_irq(int irq, struct irq_cfg *cfg, struct irq_chip *chip)
> {
> if (!irq_remapped(cfg))
> return false;
> irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
> - irq_remap_modify_chip_defaults(chip);

chip is not required, and can be removed.

> return true;
> }

Sebastian

Subject: Re: [PATCH v3 05/27] x86, irq: Make dmar_msi/hpet_msi irq_chip name consistent

On Fri, Jun 07, 2013 at 03:30:51PM -0700, Yinghai Lu wrote:
> All others are using "-" instead of "_".

Who are "all others"? According to my grep it is 121 vs 44. So even without
your two here we still 42 doing it the other way around so it is not all.

However it might make sense to use _ instead of " " within a chip's name and
- as a delimiter for modes like edge or level. However that is way beyond the
scope of this patch and I am not sure if anyone finds this usefull at all.

Saying this, I am fine with this patch :)

Sebastian

Subject: Re: [PATCH v3 06/27] ia64, irq: Add dummy create_irq_nr()

On Fri, Jun 07, 2013 at 03:30:52PM -0700, Yinghai Lu wrote:
> create_irq() will return -1 when fail to allocate.
The ia64 code here will return -ENOSPC.

> create_irq_nr() will return 0 when fail to allocate.
>
> Will use it to fix one return value checking for dmar_msi irq.

What about to unify the interface? Using -1 is kinda bad.

Sebastian

Subject: Re: [PATCH v3 07/27] iommu, irq: Allocate irq_desc for dmar_msi with local node

On Fri, Jun 07, 2013 at 03:30:53PM -0700, Yinghai Lu wrote:
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index a7967ce..c09e60d 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1273,7 +1273,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
> if (iommu->irq)
> return 0;
>
> - irq = create_irq();
> + irq = create_irq_nr(0, iommu->node);
> if (!irq) {

This is okay. However you fix a bug here so you might want first fix the bug
with a Cc: stable tag and then do this.

> pr_err("IOMMU: no free vectors\n");
> return -EINVAL;

Sebastian

Subject: Re: [PATCH v3 08/27] x86, irq: kill create_irq()

On Fri, Jun 07, 2013 at 03:30:54PM -0700, Yinghai Lu wrote:
> create_irq() will return -1 when failing to allocate.
> create_irq_nr() will return 0 when failing to allocate.
>
> It only causes confusion.
>
> Now we have no user for create_irq(), so remove create_irq() for x86.

Oh thank you so much. That is what I just asked for :)

Sebastian

Subject: Re: [PATCH v3 09/27] x86, irq: Convert irq_2_pin list to generic list

On Fri, Jun 07, 2013 at 03:30:55PM -0700, Yinghai Lu wrote:
> Now irq_2_pin list is own grown list.
> We can use generic list to replace it so we could use generic helper
> functions to operate it.

Nice.

Sebastian

Subject: Re: [PATCH v3 11/27] x86, irq: Add realloc_irq_and_cfg_at()

On Fri, Jun 07, 2013 at 03:30:57PM -0700, Yinghai Lu wrote:
> For ioapic hot-add support, it would be easy if we put all irqs
> for that ioapic controller together.
>
> We can reserve irq range at first, then reallocate those
> pre-reserved one when it is needed.
>
> Add realloc_irq_and_cfg_at() to really allocate irq_desc and cfg,
> because pre-reserved only mark bits in allocate_irqs bit maps.
>
> The reasons for not allocating them during reserving:
> 1. only several pins in ioapic are used, allocate for all pins, will
> waste memory for not used pins.
> 2. relocate later could make sure irq_desc is allocated on local node ram.
> as dev->node is set at that point.
>
> -v2: update changelog by adding reasons, requested by Konrad.

I think it will be better to split out the gen irq changes out of x86 / apic
specific code.

I don't what to say. You are worried about the extra unused memory in case
of irq_desc right? The OF code has more or less the same problem. They use
irq_of_parse_and_map() / irq_create_mapping() to create a mapping between
virq (the linux number) and hw-irq (pin on the irq chip). This mapping is
created once the irq chip is detected. They don't care about virqs (aka
linux numbers) to be in a row and they allocate all of irqdesc at once.

Once you get irqdomain to be used within ioapic, then your "linux irq
number" vs "hw irq number" should disapper. Plus you can remove the whole
gsi_number thingy. And then you start working on getting irqdesc allocated
later, say at request_irq() time and free at free_irq().

However I am not sure if this is worth it. The advantage would be that you
would once one infrastcuture for linux-number vs hw-number mapping and the
delayed irqdesc allocate + numa node rellocating.

> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 32 +++++++++++++++++++++++++++++++-
> include/linux/irq.h | 5 +++++
> kernel/irq/irqdesc.c | 26 ++++++++++++++++++++++++++
> 3 files changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 670c538..a157a56 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -301,6 +301,36 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
> irq_free_desc(at);
> }
>
> +static struct irq_cfg *realloc_irq_and_cfg_at(unsigned int at, int node)
> +{
> + struct irq_desc *desc = irq_to_desc(at);
> + struct irq_cfg *cfg;
> + int res;
> +
> + if (desc) {
> + if (irq_desc_get_irq_data(desc)->node == node)
> + return alloc_irq_and_cfg_at(at, node);
> +
> + cfg = irq_desc_get_chip_data(desc);
> + if (cfg) {
> + /* shared irq */
> + if (!list_empty(&cfg->irq_2_pin))
> + return cfg;
> + free_irq_cfg(at, cfg);
> + }
> + }
> +
> + res = irq_realloc_desc_at(at, node);
> + if (res >= 0) {
> + cfg = alloc_irq_cfg(at, node);
> + if (cfg) {
> + irq_set_chip_data(at, cfg);
> + return cfg;
> + }
> + }

This looks somehow hackish. If irqdesc exists on another node then it
looks here like you overwrite it with a new one but __irq_realloc_desc()
deallocates the old one. As I said, this looks very hackish.

> +
> + return alloc_irq_and_cfg_at(at, node);
> +}
>
> struct io_apic {
> unsigned int index;
> @@ -3352,7 +3382,7 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
> static int
> io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr)
> {
> - struct irq_cfg *cfg = alloc_irq_and_cfg_at(irq, node);
> + struct irq_cfg *cfg = realloc_irq_and_cfg_at(irq, node);
> int ret;
>
> if (!cfg)
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 4e0fcbb..9c6c047 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -602,6 +602,11 @@ void irq_free_descs(unsigned int irq, unsigned int cnt);
> int irq_reserve_irqs(unsigned int from, unsigned int cnt);
> int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);
>
> +int __irq_realloc_desc(int at, int node, struct module *owner);
> +/* use macros to avoid needing export.h for THIS_MODULE */
If you put this in line with the other functions, you wouldn't need to
copy the comment.

> +#define irq_realloc_desc_at(at, node) \
> + __irq_realloc_desc(at, node, THIS_MODULE)
> +
> static inline void irq_free_desc(unsigned int irq)
> {
> irq_free_descs(irq, 1);
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 3b9fb92..b48f65b 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -99,6 +99,11 @@ EXPORT_SYMBOL_GPL(nr_irqs);
> static DEFINE_MUTEX(sparse_irq_lock);
> static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);
>
> +static bool __irq_is_reserved(int irq)
> +{
> + return !!test_bit(irq, allocated_irqs);

This is not reserved, this is allocated.

> +}
> +
> #ifdef CONFIG_SPARSE_IRQ
>
> static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
> @@ -410,6 +415,27 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> EXPORT_SYMBOL_GPL(__irq_alloc_descs);
>
> /**
> + * irq_realloc_desc - allocate irq descriptor for irq that is already reserved
> + * @irq: Allocate for specific irq number if irq >= 0
> + * @node: Preferred node on which the irq descriptor should be allocated
> + * @owner: Owning module (can be NULL)
> + *
> + * Returns the irq number or error code
> + */
> +int __ref
> +__irq_realloc_desc(int irq, int node, struct module *owner)
> +{
> + if (!__irq_is_reserved(irq))
> + return -EINVAL;

I don't like the part where it is named reserved but means allocated

> +
> + if (irq_to_desc(irq))
> + free_desc(irq);

and free if it is already avaiable because it should not be allocated.

> +
> + return alloc_descs(irq, 1, node, owner);
> +}
> +EXPORT_SYMBOL_GPL(__irq_realloc_desc);

Sebastian

2013-06-10 13:51:04

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

On Fri, Jun 07, 2013 at 03:30:56PM -0700, Yinghai Lu wrote:
> irq_alloc_descs and irq_reserve_irqs are almost the same.
> Separate code out to __irq_reserved_irqs, and other two reuse
> __irq_reserve_irqs.
>
> We will use __irq_reserve_irqs for coming ioapic hotplug support.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Alexander Gordeev <[email protected]>
> ---
> include/linux/irq.h | 1 +
> kernel/irq/irqdesc.c | 54 ++++++++++++++++++++++++++++++----------------------
> 2 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index bc4e066..4e0fcbb 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -600,6 +600,7 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>
> void irq_free_descs(unsigned int irq, unsigned int cnt);
> int irq_reserve_irqs(unsigned int from, unsigned int cnt);
> +int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);
>
> static inline void irq_free_desc(unsigned int irq)
> {
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 192a302..3b9fb92 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -340,18 +340,15 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
> EXPORT_SYMBOL_GPL(irq_free_descs);
>
> /**
> - * irq_alloc_descs - allocate and initialize a range of irq descriptors
> - * @irq: Allocate for specific irq number if irq >= 0
> + * __irq_reserve_descs - reserve and initialize a range of irq descriptors
> + * @irq: Reserve for specific irq number if irq >= 0
> * @from: Start the search from this irq number
> - * @cnt: Number of consecutive irqs to allocate.
> - * @node: Preferred node on which the irq descriptor should be allocated
> - * @owner: Owning module (can be NULL)
> + * @cnt: Number of consecutive irqs to reserve.
> *
> * Returns the first irq number or error code
> */
> int __ref
> -__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> - struct module *owner)
> +__irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt)
> {
> int start, ret;
>
> @@ -369,7 +366,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
> from, cnt, 0);
> ret = -EEXIST;
> - if (irq >=0 && start != irq)
> + if (irq >= 0 && start != irq)
> goto err;
>
> if (start + cnt > nr_irqs) {
> @@ -380,12 +377,36 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>
> bitmap_set(allocated_irqs, start, cnt);
> mutex_unlock(&sparse_irq_lock);
> - return alloc_descs(start, cnt, node, owner);
> + return start;
>
> err:
> mutex_unlock(&sparse_irq_lock);
> return ret;
> }
> +
> +/**
> + * irq_alloc_descs - allocate and initialize a range of irq descriptors
> + * @irq: Allocate for specific irq number if irq >= 0
> + * @from: Start the search from this irq number
> + * @cnt: Number of consecutive irqs to allocate.
> + * @node: Preferred node on which the irq descriptor should be allocated
> + * @owner: Owning module (can be NULL)
> + *
> + * Returns the first irq number or error code
> + */
> +int __ref
> +__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> + struct module *owner)
> +{
> + int start;
> +
> + start = __irq_reserve_irqs(irq, from, cnt);
> +
> + if (start < 0)
> + return start;
> +
> + return alloc_descs(start, cnt, node, owner);

I think alloc_descs() fail path is needed before return.

> +}
> EXPORT_SYMBOL_GPL(__irq_alloc_descs);
>
> /**
> @@ -397,20 +418,7 @@ EXPORT_SYMBOL_GPL(__irq_alloc_descs);
> */
> int irq_reserve_irqs(unsigned int from, unsigned int cnt)
> {
> - unsigned int start;
> - int ret = 0;
> -
> - if (!cnt || (from + cnt) > nr_irqs)
> - return -EINVAL;
> -
> - mutex_lock(&sparse_irq_lock);
> - start = bitmap_find_next_zero_area(allocated_irqs, nr_irqs, from, cnt, 0);
> - if (start == from)
> - bitmap_set(allocated_irqs, start, cnt);
> - else
> - ret = -EEXIST;
> - mutex_unlock(&sparse_irq_lock);
> - return ret;
> + return __irq_reserve_irqs(from, from, cnt);
> }
>
> /**
> --
> 1.8.1.4
>

--
Regards,
Alexander Gordeev
[email protected]

2013-06-10 19:16:21

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

On Mon, Jun 10, 2013 at 6:51 AM, Alexander Gordeev <[email protected]> wrote:
> On Fri, Jun 07, 2013 at 03:30:56PM -0700, Yinghai Lu wrote:
>> irq_alloc_descs and irq_reserve_irqs are almost the same.
>> Separate code out to __irq_reserved_irqs, and other two reuse
>> __irq_reserve_irqs.
>>
>> We will use __irq_reserve_irqs for coming ioapic hotplug support.
>>
>> Signed-off-by: Yinghai Lu <[email protected]>
>> Cc: Alexander Gordeev <[email protected]>
>> ---
>> include/linux/irq.h | 1 +
>> kernel/irq/irqdesc.c | 54 ++++++++++++++++++++++++++++++----------------------
>> 2 files changed, 32 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/linux/irq.h b/include/linux/irq.h
>> index bc4e066..4e0fcbb 100644
>> --- a/include/linux/irq.h
>> +++ b/include/linux/irq.h
>> @@ -600,6 +600,7 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>
>> void irq_free_descs(unsigned int irq, unsigned int cnt);
>> int irq_reserve_irqs(unsigned int from, unsigned int cnt);
>> +int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);
>>
>> static inline void irq_free_desc(unsigned int irq)
>> {
>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index 192a302..3b9fb92 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -340,18 +340,15 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
>> EXPORT_SYMBOL_GPL(irq_free_descs);
>>
>> /**
>> - * irq_alloc_descs - allocate and initialize a range of irq descriptors
>> - * @irq: Allocate for specific irq number if irq >= 0
>> + * __irq_reserve_descs - reserve and initialize a range of irq descriptors
>> + * @irq: Reserve for specific irq number if irq >= 0
>> * @from: Start the search from this irq number
>> - * @cnt: Number of consecutive irqs to allocate.
>> - * @node: Preferred node on which the irq descriptor should be allocated
>> - * @owner: Owning module (can be NULL)
>> + * @cnt: Number of consecutive irqs to reserve.
>> *
>> * Returns the first irq number or error code
>> */
>> int __ref
>> -__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>> - struct module *owner)
>> +__irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt)
>> {
>> int start, ret;
>>
>> @@ -369,7 +366,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>> start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
>> from, cnt, 0);
>> ret = -EEXIST;
>> - if (irq >=0 && start != irq)
>> + if (irq >= 0 && start != irq)
>> goto err;
>>
>> if (start + cnt > nr_irqs) {
>> @@ -380,12 +377,36 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>
>> bitmap_set(allocated_irqs, start, cnt);
>> mutex_unlock(&sparse_irq_lock);
>> - return alloc_descs(start, cnt, node, owner);
>> + return start;
>>
>> err:
>> mutex_unlock(&sparse_irq_lock);
>> return ret;
>> }
>> +
>> +/**
>> + * irq_alloc_descs - allocate and initialize a range of irq descriptors
>> + * @irq: Allocate for specific irq number if irq >= 0
>> + * @from: Start the search from this irq number
>> + * @cnt: Number of consecutive irqs to allocate.
>> + * @node: Preferred node on which the irq descriptor should be allocated
>> + * @owner: Owning module (can be NULL)
>> + *
>> + * Returns the first irq number or error code
>> + */
>> +int __ref
>> +__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>> + struct module *owner)
>> +{
>> + int start;
>> +
>> + start = __irq_reserve_irqs(irq, from, cnt);
>> +
>> + if (start < 0)
>> + return start;
>> +
>> + return alloc_descs(start, cnt, node, owner);
>
> I think alloc_descs() fail path is needed before return.

__irq_reserve_irqs already return -EEXIST etc,

old kernel is like:

>> @@ -380,12 +377,36 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,

bitmap_set(allocated_irqs, start, cnt);
mutex_unlock(&sparse_irq_lock);
return alloc_descs(start, cnt, node, owner);

err:
mutex_unlock(&sparse_irq_lock);
return ret;

so i don't change the fail path handling.

Thanks

Yinghai

2013-06-10 19:40:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

On Fri, 7 Jun 2013, Yinghai Lu wrote:
> /**
> - * irq_alloc_descs - allocate and initialize a range of irq descriptors
> - * @irq: Allocate for specific irq number if irq >= 0
> + * __irq_reserve_descs - reserve and initialize a range of irq descriptors

Did you even bother to run docbook ?

No you didn't. Otherwise you'd have noticed that documentation for
__irq_reserve_descs is not the right thing for a function named
__irq_reserve_irqs.

> + * __irq_reserve_descs - reserve and initialize a range of irq descriptors

The function only reserves a range of irq descriptors and does not
initialize them.

> +
> +/**
> + * irq_alloc_descs - allocate and initialize a range of irq descriptors

Again you document non matching function names.

> +int __ref
> +__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> + struct module *owner)

Sloppy as usual along with a sucky changelog as usual.....

Thanks,

tglx

2013-06-10 19:41:47

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

On Mon, Jun 10, 2013 at 12:16:16PM -0700, Yinghai Lu wrote:
> On Mon, Jun 10, 2013 at 6:51 AM, Alexander Gordeev <[email protected]> wrote:
> >> +/**
> >> + * irq_alloc_descs - allocate and initialize a range of irq descriptors
> >> + * @irq: Allocate for specific irq number if irq >= 0
> >> + * @from: Start the search from this irq number
> >> + * @cnt: Number of consecutive irqs to allocate.
> >> + * @node: Preferred node on which the irq descriptor should be allocated
> >> + * @owner: Owning module (can be NULL)
> >> + *
> >> + * Returns the first irq number or error code
> >> + */
> >> +int __ref
> >> +__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> >> + struct module *owner)
> >> +{
> >> + int start;
> >> +
> >> + start = __irq_reserve_irqs(irq, from, cnt);
> >> +
> >> + if (start < 0)
> >> + return start;
> >> +
> >> + return alloc_descs(start, cnt, node, owner);
> >
> > I think alloc_descs() fail path is needed before return.
>
> __irq_reserve_irqs already return -EEXIST etc,
>
> old kernel is like:
>
> >> @@ -380,12 +377,36 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>
> bitmap_set(allocated_irqs, start, cnt);
> mutex_unlock(&sparse_irq_lock);
> return alloc_descs(start, cnt, node, owner);
>
> err:
> mutex_unlock(&sparse_irq_lock);
> return ret;
>
> so i don't change the fail path handling.

I rather meant the bits should be unset in case alloc_descs() failed.
But I failed to notice alloc_descs() does it. Therefore..

Reviewed-by: Alexander Gordeev <[email protected]>

> Thanks
>
> Yinghai

--
Regards,
Alexander Gordeev
[email protected]

2013-06-10 20:13:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 11/27] x86, irq: Add realloc_irq_and_cfg_at()

On Fri, 7 Jun 2013, Yinghai Lu wrote:

> For ioapic hot-add support, it would be easy if we put all irqs
> for that ioapic controller together.
>
> We can reserve irq range at first, then reallocate those

No. We do not reallocate something which does not exist in the first
place.

> pre-reserved one when it is needed.
>
> Add realloc_irq_and_cfg_at() to really allocate irq_desc and cfg,
> because pre-reserved only mark bits in allocate_irqs bit maps.
>
> The reasons for not allocating them during reserving:
> 1. only several pins in ioapic are used, allocate for all pins, will
> waste memory for not used pins.
> 2. relocate later could make sure irq_desc is allocated on local node ram.
> as dev->node is set at that point.

This is not relocating. Your changelog sucks as much as your code.

> -v2: update changelog by adding reasons, requested by Konrad.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 32 +++++++++++++++++++++++++++++++-
> include/linux/irq.h | 5 +++++
> kernel/irq/irqdesc.c | 26 ++++++++++++++++++++++++++

No, we do not add new code to the core and use it in the same patch at
some random other place. The core code change wants to be separate and
have a separate changelog.

> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -602,6 +602,11 @@ void irq_free_descs(unsigned int irq, unsigned int cnt);
> int irq_reserve_irqs(unsigned int from, unsigned int cnt);
> int __irq_reserve_irqs(int irq, unsigned int from, unsigned int cnt);
>
> +int __irq_realloc_desc(int at, int node, struct module *owner);
> +/* use macros to avoid needing export.h for THIS_MODULE */

You must be kidding. export.h has been split out from module.h exactly
to avoid horrible comments like the above and nonsense like this:

> +#define irq_realloc_desc_at(at, node) \
> + __irq_realloc_desc(at, node, THIS_MODULE)
> +

> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 3b9fb92..b48f65b 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -99,6 +99,11 @@ EXPORT_SYMBOL_GPL(nr_irqs);
> static DEFINE_MUTEX(sparse_irq_lock);
> static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);
>
> +static bool __irq_is_reserved(int irq)
> +{
> + return !!test_bit(irq, allocated_irqs);

What's the point of this? Why not use test_bit() directly in the code?

If we really want this to be a function, then it should be inline and
it could do without the pointless and !! nonsense.

> static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
> @@ -410,6 +415,27 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
> EXPORT_SYMBOL_GPL(__irq_alloc_descs);
>
> /**
> + * irq_realloc_desc - allocate irq descriptor for irq that is already reserved

And of course you are documenting crap again.

> + * @irq: Allocate for specific irq number if irq >= 0
> + * @node: Preferred node on which the irq descriptor should be allocated
> + * @owner: Owning module (can be NULL)
> + *
> + * Returns the irq number or error code
> + */
> +int __ref
> +__irq_realloc_desc(int irq, int node, struct module *owner)

What's the point of this line split ?

> +{
> + if (!__irq_is_reserved(irq))
> + return -EINVAL;

So this function can operate safely w/o holding sparse_irq_lock?

> + if (irq_to_desc(irq))
> + free_desc(irq);

You unconditionally throw away an existing irq descriptor? No, you
should bail out here. The function name is a misnomer as it does not
match the funciton description:

irq_realloc_desc - allocate irq descriptor for irq that is already reserved

You want to allocate an irq descriptor for a reserved irq. That's what
the function is about, not about reallocating an existing irq
descriptor.

So what you want is:

irq_alloc_reserved_desc - allocate irq descriptor for irq that is already reserved

and then bail out if the irq descriptor already exists.

> + return alloc_descs(irq, 1, node, owner);

> +EXPORT_SYMBOL_GPL(__irq_realloc_desc);

What's the point of exporting this?

Thanks,

tglx

2013-06-10 20:43:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 17/27] genirq: Bail out early in free_desc()

On Fri, 7 Jun 2013, Yinghai Lu wrote:

> We pre-reserve irq range for hot-added ioapic, and later only
> some are used via realloc.
> So during hot-remove, we need to clear bits in allocated_irqs
> for both case.
>
> Check if the irq_desc is there at first, and bail out early if
> irq_desc is not allocated yet.
> We can use irq_free_descs to clear allocated_irqs bits for
> preserved irqs only.

This changelog is a nightmare as usual.

It has nothing to do with pre reservation and hot-added ioapics. This
is generic code and does not care at all about x86 specific crappola.

Your change is adding a generic sanity check into free_desc().

So first of all the patch subject is bogus:

genirq: Bail out early in free_desc()

That's missing _WHY_ it bails out early.

And then the changelog itself drivels about completely irrelevant
nonsense instead of explaining the change itself.

So what I want to see here is something like this:

"genirq: Do not free unallocated irq descriptors

Hot-added interrupt controllers can reserve a range of interrupt
numbers, but only allocate some of them. To simplify the release on
hot-remove allow them to iterate over the reserved range and let the
free_desc() code return early when the descriptor does not exist."

Can you see the difference?

I told you more than once, that I'm not accepting your sloppy crap
anymore. I'm simply not buying your claim that you think "chinese"
when writing "english". You are simply too lazy to give a rats ass
about it. That applies to your code and to your changelogs in the same
way.

I'm really tired of dealing with shit like this.

No thanks

tglx

> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> kernel/irq/irqdesc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index b48f65b..32f099e 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -170,6 +170,9 @@ static void free_desc(unsigned int irq)
> {
> struct irq_desc *desc = irq_to_desc(irq);
>
> + if (!desc)
> + return;
> +
> unregister_irq_proc(irq, desc);
>
> mutex_lock(&sparse_irq_lock);
> --
> 1.8.1.4
>
>

2013-06-10 23:17:36

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 02/27] x86, irq: Modify irq chip once for irq remapping

On Sun, Jun 9, 2013 at 7:54 AM, Sebastian Andrzej Siewior
<[email protected]> wrote:
> On Fri, Jun 07, 2013 at 03:30:48PM -0700, Yinghai Lu wrote:
>> Current code: after irq remapping is enabled, irq_chip fields are modified
>> during every irq setup.
>> mp_register_gsi
>> io_apic_set_pci_routing
>> io_apic_setup_irq_pin
>> setup_ioapic_irq
>> ioapic_register_intr
>> setup_remapped_irq
>> native_setup_msi_irqs
>> setup_msi_irq
>> setup_remapped_irq
>> default_setup_hpet_msi
>> setup_remapped_irq
>> that is not efficient.
>>
>> We only need to modify those irq chip one time just after we enable
>> irq mapping.
>
> The overhead you talk about is calling setup_remapped_irq() for every
> interrupt from the mp_register_gsi() call chain? MSI & HPET should happen only
> once, or do I miss something?

yes, and for every pci_enable_msi per pci device.

>
>> Change irq_remap_modify_chip_defaults() to __init as it only gets
>> called during booting stage, via irq_remap_modify_chips().
>>
>> Affected irq_chip: ioapic_chip, msi_chip, hpet_msi_type.
>> We don't need to use #ifdef in irq_remap_modify_chips():
>> IRQ_REMAP only support x86_64 and X86_IO_APIC and PCI_MSI.
>> HPET_TIMER is set when x86_64 is set.
>> When we have IRQ_REMAP enabled, al three chips are defined and
>> used.
>
> all, not al
>
> Still, the user could disable hpet or apic from the commandline but this
> should cause any harm as the irq chips shouldn't be used then.

should "not" ?

>
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index 904611b..ff50b90 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -1552,6 +1552,8 @@ void enable_x2apic(void)
>> int __init enable_IR(void)
>> {
>> #ifdef CONFIG_IRQ_REMAP
>> + int ret;
>> +
>> if (!irq_remapping_supported()) {
>> pr_debug("intr-remapping not supported\n");
>> return -1;
>> @@ -1563,7 +1565,12 @@ int __init enable_IR(void)
>> return -1;
>> }
>>
>> - return irq_remapping_enable();
>> + ret = irq_remapping_enable();
>> +
>> + if (ret >= 0)
>> + irq_remap_modify_chips();
>
> This looks like ehm, well not well.
> Could you please change this to:
> ret = irq_remapping_enable();
> if (ret)
> return ret;
>
> irq_remap_modify_chips();
>
>> +
>> + return ret;

So you want to use

ret = irq_remapping_enable();
if (ret < 0)
return ret;

irq_remap_modify_chips();

return ret;

instead of:

ret = irq_remapping_enable();
if (ret >= 0)
irq_remap_modify_chips();

return ret;


>> #endif

>> return -1;
>> }
>> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
>> index 07ce86a..21ef344 100644
>> --- a/drivers/iommu/irq_remapping.c
>> +++ b/drivers/iommu/irq_remapping.c
>> @@ -373,19 +373,26 @@ static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
>> seq_printf(p, " IR-%s", data->chip->name);
>> }
>>
>> -static void irq_remap_modify_chip_defaults(struct irq_chip *chip)
>> +static void __init irq_remap_modify_chip_defaults(struct irq_chip *chip)
>> {
>> + printk(KERN_DEBUG "irq_chip: %s ==> IR-%s", chip->name, chip->name);
>
> If you need this please use pr_debug and add a \n at the end.

No, I hate pr_debug, as it is useless unless have DEBUG defined.
later even ask user to append "debug ignore_loglevel", we still get nothing
unless user recompile the kernel.

will add "\n", not sure how that get dropped.

>
>> chip->irq_print_chip = ir_print_prefix;
>> chip->irq_ack = ir_ack_apic_edge;
>> chip->irq_eoi = ir_ack_apic_level;
>> chip->irq_set_affinity = x86_io_apic_ops.set_affinity;
>> }
>>
>> +void __init irq_remap_modify_chips(void)
>> +{
>> + irq_remap_modify_chip_defaults(&ioapic_chip);
>> + irq_remap_modify_chip_defaults(&msi_chip);
>> + irq_remap_modify_chip_defaults(&hpet_msi_type);
>> +}
>> +
>> bool setup_remapped_irq(int irq, struct irq_cfg *cfg, struct irq_chip *chip)
>> {
>> if (!irq_remapped(cfg))
>> return false;
>> irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
>> - irq_remap_modify_chip_defaults(chip);
>
> chip is not required, and can be removed.

yes.

Thanks

Yinghai

2013-06-10 23:40:07

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 05/27] x86, irq: Make dmar_msi/hpet_msi irq_chip name consistent

On Sun, Jun 9, 2013 at 8:16 AM, Sebastian Andrzej Siewior
<[email protected]> wrote:
> On Fri, Jun 07, 2013 at 03:30:51PM -0700, Yinghai Lu wrote:
>> All others are using "-" instead of "_".
>
> Who are "all others"? According to my grep it is 121 vs 44. So even without
> your two here we still 42 doing it the other way around so it is not all.

I mean in arch/x86

arch/x86/kernel/apic/io_apic.c:struct irq_chip ioapic_chip __read_mostly = {
arch/x86/kernel/apic/io_apic.c- .name = "IO-APIC",
--
arch/x86/kernel/apic/io_apic.c:static struct irq_chip lapic_chip
__read_mostly = {
arch/x86/kernel/apic/io_apic.c- .name = "local-APIC",
--
arch/x86/kernel/apic/io_apic.c:struct irq_chip msi_chip = {
arch/x86/kernel/apic/io_apic.c- .name = "PCI-MSI",
--
arch/x86/kernel/apic/io_apic.c:static struct irq_chip dmar_msi_type = {
arch/x86/kernel/apic/io_apic.c- .name = "DMAR_MSI",
--
arch/x86/kernel/apic/io_apic.c:struct irq_chip hpet_msi_type = {
arch/x86/kernel/apic/io_apic.c- .name = "HPET_MSI",
--
arch/x86/kernel/apic/io_apic.c:static struct irq_chip ht_irq_chip = {
arch/x86/kernel/apic/io_apic.c- .name = "PCI-HT",
--
arch/x86/kernel/i8259.c:struct irq_chip i8259A_chip = {
arch/x86/kernel/i8259.c- .name = "XT-PIC",
--
arch/x86/lguest/boot.c:static struct irq_chip lguest_irq_controller = {
arch/x86/lguest/boot.c- .name = "lguest",
--
arch/x86/platform/uv/uv_irq.c:static struct irq_chip uv_irq_chip = {
arch/x86/platform/uv/uv_irq.c- .name = "UV-CORE",
--
arch/x86/platform/visws/visws_quirks.c:static struct irq_chip
cobalt_irq_type = {
arch/x86/platform/visws/visws_quirks.c- .name = "Cobalt-APIC",
--
arch/x86/platform/visws/visws_quirks.c:static struct irq_chip
piix4_master_irq_type = {
arch/x86/platform/visws/visws_quirks.c- .name = "PIIX4-master",
--
arch/x86/platform/visws/visws_quirks.c:static struct irq_chip
piix4_virtual_irq_type = {
arch/x86/platform/visws/visws_quirks.c- .name = "PIIX4-virtual",
--


>
> However it might make sense to use _ instead of " " within a chip's name and
> - as a delimiter for modes like edge or level. However that is way beyond the
> scope of this patch and I am not sure if anyone finds this usefull at all.
>
> Saying this, I am fine with this patch :)

ok.

Thanks

Yinghai

2013-06-10 23:41:42

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 06/27] ia64, irq: Add dummy create_irq_nr()

On Sun, Jun 9, 2013 at 8:22 AM, Sebastian Andrzej Siewior
<[email protected]> wrote:
> On Fri, Jun 07, 2013 at 03:30:52PM -0700, Yinghai Lu wrote:
>> create_irq() will return -1 when fail to allocate.
> The ia64 code here will return -ENOSPC.
>
>> create_irq_nr() will return 0 when fail to allocate.
>>
>> Will use it to fix one return value checking for dmar_msi irq.
>
> What about to unify the interface? Using -1 is kinda bad.

after some following patch, create_irq() in x86 get killed.

Still need ia64 guys to kill create_irq() in arch/ia64.

Thanks

Yinghai

2013-06-10 23:43:13

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 07/27] iommu, irq: Allocate irq_desc for dmar_msi with local node

On Sun, Jun 9, 2013 at 8:31 AM, Sebastian Andrzej Siewior
<[email protected]> wrote:
> On Fri, Jun 07, 2013 at 03:30:53PM -0700, Yinghai Lu wrote:
>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>> index a7967ce..c09e60d 100644
>> --- a/drivers/iommu/dmar.c
>> +++ b/drivers/iommu/dmar.c
>> @@ -1273,7 +1273,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
>> if (iommu->irq)
>> return 0;
>>
>> - irq = create_irq();
>> + irq = create_irq_nr(0, iommu->node);
>> if (!irq) {
>
> This is okay. However you fix a bug here so you might want first fix the bug
> with a Cc: stable tag and then do this.

ok, will add cc to stable to two related patches.

Thanks

Yinghai

2013-06-10 23:55:41

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v3 10/27] genirq: Split __irq_reserve_irqs from irq_alloc_descs

On Mon, Jun 10, 2013 at 12:39 PM, Thomas Gleixner <[email protected]> wrote:
> On Fri, 7 Jun 2013, Yinghai Lu wrote:
>> /**
>> - * irq_alloc_descs - allocate and initialize a range of irq descriptors
>> - * @irq: Allocate for specific irq number if irq >= 0
>> + * __irq_reserve_descs - reserve and initialize a range of irq descriptors
>
> Did you even bother to run docbook ?
>
> No you didn't. Otherwise you'd have noticed that documentation for
> __irq_reserve_descs is not the right thing for a function named
> __irq_reserve_irqs.

sorry, will them consistent.

Thanks

Yinghai

2013-06-11 21:53:16

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v3 06/27] ia64, irq: Add dummy create_irq_nr()

> Still need ia64 guys to kill create_irq() in arch/ia64.

Was there already a patch to do that? I'm afraid my eyes tend to glaze over when I
see [part 64/87: ia64 ...] and assume that its general cleanup that will flow through
with all the other parts of the patch series. Please point me at something that you
want me to apply.

If not - then what is the recommended replacement? There seem to be a half dozen
places in arch/ia64 where create_irq() is called.

-Tony