2010-12-09 21:01:25

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH] Fix when booting Xen+Linux under QEMU.

Please take a look. The first patch just cleans up the find_unbound_irq
so it is easier to understand what it does.

The second patch hardness the Xen IRQ (event channels) allocation finder
if you try to boot a minimalistic 32-bit Linux kernel along with a
Xen hypervisor under QEMU.

We end up with a weird scenario where the nr_irq_gsi was greater than nr_irq
by 16. This code hardness it by pointing out the issue to the user and tries
to continue - BUT this might impact the PCI device allocation.

Tested also on normal machine with no regressions found.


2010-12-09 21:00:26

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 1/2] xen/irq: Cleanup the find_unbound_irq

The "find_unbound_irq" is a bit unusual - it allocates
virtual IRQ (event channels) in reverse order. This means
starting at the "top" of the available IRQs (nr_irqs) down
to the GSI/MSI IRQs (nr_irqs_gsi).

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/events.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 31af0ac..4d4a23d 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -405,15 +405,21 @@ static int find_unbound_irq(void)
{
struct irq_data *data;
int irq, res;
- int start = get_nr_hw_irqs();
+ int bottom = get_nr_hw_irqs();
+ int top = nr_irqs-1;

- if (start == nr_irqs)
+ if (bottom == nr_irqs)
goto no_irqs;

- /* nr_irqs is a magic value. Must not use it.*/
- for (irq = nr_irqs-1; irq > start; irq--) {
+ /* This loop starts from the top of IRQ space and goes down.
+ * We need this b/c if we have a PCI device in a Xen PV guest
+ * we do not have an IO-APIC (though the backend might have them)
+ * mapped in. To not have a collision of physical IRQs with the Xen
+ * event channels start at the top of the IRQ space for virtual IRQs.
+ */
+ for (irq = top; irq > bottom; irq--) {
data = irq_get_irq_data(irq);
- /* only 0->15 have init'd desc; handle irq > 16 */
+ /* only 15->0 have init'd desc; handle irq > 16 */
if (!data)
break;
if (data->chip == &no_irq_chip)
@@ -424,7 +430,7 @@ static int find_unbound_irq(void)
return irq;
}

- if (irq == start)
+ if (irq == bottom)
goto no_irqs;

res = irq_alloc_desc_at(irq, -1);
--
1.7.1

2010-12-09 21:00:54

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 2/2] xen/irq: Don't fall over when nr_irqs_gsi > nr_irqs.

This scenario where the nr_irq_gsi is greater than nr_irqs
is rather strange but lets still try to survive. Make sure
to print a warning so the user wouldn't be surprised in case
things don't work.

Solves a bootup-crash when booting Xen and Linux under QEMU.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/events.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 4d4a23d..11687dd 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -411,6 +411,7 @@ static int find_unbound_irq(void)
if (bottom == nr_irqs)
goto no_irqs;

+retry:
/* This loop starts from the top of IRQ space and goes down.
* We need this b/c if we have a PCI device in a Xen PV guest
* we do not have an IO-APIC (though the backend might have them)
@@ -434,6 +435,14 @@ static int find_unbound_irq(void)
goto no_irqs;

res = irq_alloc_desc_at(irq, -1);
+ if (res == -EEXIST) {
+ top--;
+ if (bottom > top)
+ printk(KERN_ERR "Eating in GSI/MSI space (%ld)!" \
+ " Your PCI device might not work!\n", top);
+ if (top > NR_IRQS_LEGACY)
+ goto retry;
+ }

if (WARN_ON(res != irq))
return -1;
--
1.7.1

2010-12-09 21:12:12

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] Fix when booting Xen+Linux under QEMU.

On 12/09/2010 01:01 PM, Konrad Rzeszutek Wilk wrote:
> Please take a look. The first patch just cleans up the find_unbound_irq
> so it is easier to understand what it does.
>
> The second patch hardness the Xen IRQ (event channels) allocation finder
> if you try to boot a minimalistic 32-bit Linux kernel along with a
> Xen hypervisor under QEMU.
>
> We end up with a weird scenario where the nr_irq_gsi was greater than nr_irq
> by 16. This code hardness it by pointing out the issue to the user and tries
> to continue - BUT this might impact the PCI device allocation.
>
> Tested also on normal machine with no regressions found.

I think I'd prefer to:

1. move to using all dynamic irqs, and use the core kernel irq
allocator (ie, resurrect IanC's patches)
2. kill the ioapic dummy page hack (I'm assuming that would be
trivial - at least to make it all zero - because the kernel won't
care about the number of GSIs at that point)

J

2010-12-10 03:54:08

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH] Fix when booting Xen+Linux under QEMU.

> I think I'd prefer to:
>
> 1. move to using all dynamic irqs, and use the core kernel irq
> allocator (ie, resurrect IanC's patches)

That would make it easier, except that they don't work when you have DomU
guest with Xen PCI front. Simply b/c the DomU guest has no idea of what GSI of
the host machine is. Hence the find_unbound_irq that starts from the top
(nr_irqs) and goes down to GSI count (nr_irq_gsi).

Maybe we can stick it in the Xen PCIfronted/Xen PCI backend a mechanism (some
key value) which will specify the host's nr_irq_gsi value. Then the DomU guest
can set that and then use the core kernel irq allocator...

2010-12-10 13:02:45

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH] Fix when booting Xen+Linux under QEMU.

On Thu, 9 Dec 2010, Jeremy Fitzhardinge wrote:
> 2. kill the ioapic dummy page hack (I'm assuming that would be
> trivial - at least to make it all zero - because the kernel won't
> care about the number of GSIs at that point)

That wouldn't work: mp_register_ioapic reads the number of entries for
each ioapic and it cannot be zero, because later on mp_find_ioapic is
going to find the ioapic a particular gsi belongs to based on the
previous results.
Mp_register_ioapic also increases gsi_top that ends up influencing the
value of nr_irqs_gsi.


Considering that nr_irqs is not a real upper limit on Xen because you
usually have at least 1024 evtchns anyway, I think we should reimplement
arch_probe_nr_irqs on Xen to set nr_irqs appropriately.
Something like this:

#define NR_EVTCHNS (sizeof(unsigned long) * 8 * sizeof(unsigned long))
int __init arch_probe_nr_irqs(void)
{
if (xen_domain())
nr_irqs = NR_EVTCHNS;

return NR_IRQS_LEGACY;
}

The problem is that #ifdef CONFIG_SPARSE_IRQ io_apic.c reimplements
arch_probe_nr_irqs too but only if CONFIG_PCI_MSI is also defined
nr_irqs is set high enough by that function.
So to work around this problem we might have to redefine
arch_probe_nr_irqs only #if !defined(CONFIG_SPARSE_IRQ) and we might also
have to automatically enable CONFIG_PCI_MSI if the user selects
CONFIG_SPARSE_IRQ on XEN somehow.