Thomas,
I wonder if you have any advice on the following.
Under Xen we have 1024 event channels per-VM which can be injected into
any VCPU. We map these into IRQs and inject them into the system through
the generic IRQ mechanisms.
Event channels are independent from the normal x86 concept of a vector,
although the two can coexist: e.g. in an HVM guest with PV extensions
you get both 256 vectors per CPU and 1024 event channels.
In some cases there is some rough equivalence between event channels and
x86 vectors. Specifically in domain 0 or HVM guests with the right PV
extensions host GSIs or emulated GSIs respectively can be bound to an
event channel as a "pirq". In this case we allocate IRQs such that
GSI==IRQ for consistency with the same kernel running natively.
For all other event channels we allocate the IRQs dynamically. Since
both event channels and x86 vectors can exist simultaneously we always
allocate an IRQ for dynamic event channels from above nr_irqs_gsi
(somewhat similar to MSIs on native I guess). Since nr_irqs_gsi under
Xen is always an overestimate compared with the actual number of host
GSIs (or accurate in the HVM with PV extensions case) there is no
problem with clashes between the 1-1 GSI==IRQ range and the dynamic
range.
However because nr_irqs on x86, including when running under Xen, is
derived from NR_VECTORS * nr_cpu_ids it is often the case that we can
run out of available IRQ numbers above the nr_irqs_gsi limit. In fact it
is sometimes the case that nr_irqs_gsi >= nr_irqs in which case no
dynamic event channels can be allocated at all!
To work around this Xen currently tries to allocate an IRQ from
nr_irqs_gsi..nr_irqs but if that doesn't work it will fall back to
using the IRQ space below nr_irqs_gsi. This risks clashing with
allocation in the 1-1 GSI<->IRQ region.
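
The allocation order described above can be sketched as a small standalone model. This is only an illustration, not the code in Xen's events.c: struct irq_space, alloc_dynamic_irq() and MODEL_MAX_IRQS are hypothetical names, and the downward scan order in the fallback path is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names come from the kernel. */
#define MODEL_MAX_IRQS 64

struct irq_space {
	int nr_irqs_gsi;              /* IRQs below this are the 1-1 GSI==IRQ range */
	int nr_irqs;                  /* total number of IRQ numbers available */
	bool in_use[MODEL_MAX_IRQS];  /* true once an IRQ number is taken */
};

/*
 * Pick an IRQ number for a dynamic event channel.
 * Returns the IRQ number, or -1 if nothing is free. *fell_back is set
 * when the number had to be taken from the GSI range.
 */
static int alloc_dynamic_irq(struct irq_space *s, bool *fell_back)
{
	int irq, top;

	assert(s->nr_irqs_gsi >= 0);
	assert(s->nr_irqs >= 0 && s->nr_irqs <= MODEL_MAX_IRQS);
	*fell_back = false;

	/* Preferred range [nr_irqs_gsi, nr_irqs); empty if nr_irqs_gsi >= nr_irqs. */
	for (irq = s->nr_irqs_gsi; irq < s->nr_irqs; irq++) {
		if (!s->in_use[irq]) {
			s->in_use[irq] = true;
			return irq;
		}
	}

	/* Fallback: highest free number below both limits (scan order assumed). */
	top = s->nr_irqs_gsi < s->nr_irqs ? s->nr_irqs_gsi : s->nr_irqs;
	for (irq = top - 1; irq >= 0; irq--) {
		if (!s->in_use[irq]) {
			s->in_use[irq] = true;
			*fell_back = true;  /* may clash with a later GSI==IRQ binding */
			return irq;
		}
	}
	return -1;
}
```

In this model, with nr_irqs_gsi >= nr_irqs the preferred loop never runs, so every dynamic event channel ends up in the fallback range.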
I'd very much like to remove this workaround (better described as a hack
I think) but in order to do so I need to make sure there are plenty of
IRQs between nr_irqs_gsi and nr_irqs. Effectively what we would like to
do is:
nr_irqs += NR_EVENT_CHANNELS;
somewhere, except obviously we don't want to just drop that into generic
code!
Do you have any hints as to an appropriate existing interface which
Xen could use here?
If not, any suggestions for what sort of interface might be acceptable to
add?
For example I was wondering about adding x86_info.irqs.probe_nr_irqs,
which returns a platform-specific additional number of IRQs, and having
arch_probe_nr_irqs add that value into its calculations.
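
A rough sketch of what such a hook might look like, as a standalone model rather than a patch: struct irq_probe_ops, xen_probe_nr_irqs() and probe_nr_irqs_model() are hypothetical names, and the base value passed in stands for whatever the existing arch calculation produces.

```c
#include <assert.h>
#include <stddef.h>

#define NR_EVENT_CHANNELS 1024  /* per-VM figure quoted earlier in this mail */

/* Hypothetical hook table, modelling the proposed probe_nr_irqs callback. */
struct irq_probe_ops {
	int (*probe_nr_irqs)(void);  /* extra IRQs the platform wants; may be NULL */
};

/* What a Xen implementation of the hook could return in this model. */
static int xen_probe_nr_irqs(void)
{
	return NR_EVENT_CHANNELS;
}

/*
 * Model of the arch calculation: take the base number of IRQs and add
 * whatever the platform hook asks for, if a hook is installed.
 */
static int probe_nr_irqs_model(int base, const struct irq_probe_ops *ops)
{
	assert(base >= 0);
	if (ops && ops->probe_nr_irqs)
		base += ops->probe_nr_irqs();
	return base;
}
```

A native platform would leave the callback NULL and see no change, which keeps the Xen-specific number out of generic code.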
Ian.
Ian,
On Wed, 16 Feb 2011, Ian Campbell wrote:
> I'd very much like to remove this workaround (better described as a hack
> I think) but in order to do so I need to make sure there are plenty of
> IRQs between nr_irqs_gsi and nr_irqs. Effectively what we would like to
> do is:
> nr_irqs += NR_EVENT_CHANNELS;
> somewhere, except obviously we don't want to just drop that into generic
> code!
>
> Do you have any hints as to an appropriate existing interface which
> Xen could use here?
>
> If not, any suggestions for what sort of interface might be acceptable to
> add?
>
> For example I was wondering about adding x86_info.irqs.probe_nr_irqs,
> which returns a platform-specific additional number of IRQs, and having
> arch_probe_nr_irqs add that value into its calculations.
I'm about to remove the nr_irqs <= NR_IRQS limitation. It's silly when we
deal with sparse irqs. So the idea is to have the initial nr_irqs set
in early boot to have a sensible size for allocating stuff. Later on
we can expand nr_irqs when the need arises.
It's not only Xen which wants to eliminate the limitation. Think about
irq expanders which are detected late in the boot. We have no sensible
way to reserve enough numbers for them at early boot as we don't know
whether that hardware is there or not.
So my plan for .39 is to ignore the NR_IRQS limitation in the sparse
case and make nr_irqs expandable, of course with a sensible upper limit
in the core code itself. It's basically the allocation bitmap which
limits it, but I doubt we'll hit 1 Million irq numbers in the
foreseeable future.
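
The idea can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the planned core code: struct irq_numbers, alloc_irq_model() and the MODEL_IRQ_BITMAP_BITS capacity are hypothetical, chosen only to show nr_irqs growing on demand while the bitmap size acts as the hard upper limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bitmap capacity for this model; the hard upper limit. */
#define MODEL_IRQ_BITMAP_BITS 4096

struct irq_numbers {
	unsigned int nr_irqs;  /* soft limit chosen at early boot */
	unsigned char allocated[MODEL_IRQ_BITMAP_BITS / 8];  /* one bit per irq */
};

static bool irq_bit_test(const struct irq_numbers *n, unsigned int irq)
{
	return (n->allocated[irq / 8] & (1u << (irq % 8))) != 0;
}

static void irq_bit_set(struct irq_numbers *n, unsigned int irq)
{
	n->allocated[irq / 8] |= (unsigned char)(1u << (irq % 8));
}

/*
 * Allocate the lowest free irq number at or above 'from'. If that
 * number lies beyond the current nr_irqs, nr_irqs is expanded to
 * cover it. Returns -1 once the bitmap itself is exhausted.
 */
static int alloc_irq_model(struct irq_numbers *n, unsigned int from)
{
	unsigned int irq;

	assert(n->nr_irqs <= MODEL_IRQ_BITMAP_BITS);
	for (irq = from; irq < MODEL_IRQ_BITMAP_BITS; irq++) {
		if (irq_bit_test(n, irq))
			continue;
		irq_bit_set(n, irq);
		if (irq >= n->nr_irqs)
			n->nr_irqs = irq + 1;  /* expand when the need arises */
		return (int)irq;
	}
	return -1;
}
```

In this model a late-detected irq expander or a Xen event channel simply asks for a number; the early-boot value of nr_irqs is only a starting size.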
Thanks,
tglx
On Wed, 2011-02-16 at 15:56 +0000, Thomas Gleixner wrote:
> I'm about to remove the nr_irqs <= NR_IRQS limitation. It's silly when we
> deal with sparse irqs. So the idea is to have the initial nr_irqs set
> in early boot to have a sensible size for allocating stuff. Later on
> we can expand nr_irqs when the need arises.
> It's not only Xen which wants to eliminate the limitation. Think about
> irq expanders which are detected late in the boot. We have no sensible
> way to reserve enough numbers for them at early boot as we don't know
> whether that hardware is there or not.
>
> So my plan for .39 is to ignore the NR_IRQS limitation in the sparse
> case and make nr_irqs expandable, of course with a sensible upper limit
> in the core code itself. It's basically the allocation bitmap which
> limits it, but I doubt we'll hit 1 Million irq numbers in the
> foreseeable future.
That sounds ideal, thanks!
I was hoping to get rid of the workaround in Xen events.c in the 2.6.39
timeframe too.
If you let me know when you have something I can test I'll combine with
the Xen side and give it a spin.
On a vaguely related note, what is the future of non-sparse IRQs (on x86
and/or generally)?
Ian.
On Wed, 16 Feb 2011, Ian Campbell wrote:
> On Wed, 2011-02-16 at 15:56 +0000, Thomas Gleixner wrote:
> > I'm about to remove the nr_irqs <= NR_IRQS limitation. It's silly when we
> > deal with sparse irqs. So the idea is to have the initial nr_irqs set
> > in early boot to have a sensible size for allocating stuff. Later on
> > we can expand nr_irqs when the need arises.
>
> > It's not only Xen which wants to eliminate the limitation. Think about
> > irq expanders which are detected late in the boot. We have no sensible
> > way to reserve enough numbers for them at early boot as we don't know
> > whether that hardware is there or not.
> >
> > So my plan for .39 is to ignore the NR_IRQS limitation in the sparse
> > case and make nr_irqs expandable, of course with a sensible upper limit
> > in the core code itself. It's basically the allocation bitmap which
> > limits it, but I doubt we'll hit 1 Million irq numbers in the
> > foreseeable future.
>
> That sounds ideal, thanks!
>
> I was hoping to get rid of the workaround in Xen events.c in the 2.6.39
> timeframe too.
>
> If you let me know when you have something I can test I'll combine with
> the Xen side and give it a spin.
>
> On a vaguely related note, what is the future of non-sparse IRQs (on x86
> and/or generally)?
In general I want to switch everything over to SPARSE_IRQ. When the
open coded access to irq_desc[] is gone, which should be mostly the
case in .39, then switching everything over should be a smooth
thing. For those archs which do not want to adjust the numbers
dynamically we simply allocate NR_IRQS in early_irq_init(). So they
won't even notice.
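
As a rough standalone illustration of that last point (not the real early_irq_init()): if every descriptor up to a fixed limit is allocated up front, lookups behave just like a static array even though the storage is sparse-style. The names struct irq_desc_model, desc_table, early_irq_init_model() and MODEL_NR_IRQS are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_IRQS 16  /* stands in for an arch's fixed NR_IRQS */

struct irq_desc_model {
	unsigned int irq;
};

/* Sparse-style storage: one pointer per irq number, NULL when absent. */
static struct irq_desc_model *desc_table[MODEL_NR_IRQS];

/* Accessor used instead of open coded array indexing. */
static struct irq_desc_model *irq_to_desc_model(unsigned int irq)
{
	return irq < MODEL_NR_IRQS ? desc_table[irq] : NULL;
}

/*
 * Model of the early init for an arch with static numbering: populate
 * all MODEL_NR_IRQS descriptors at once so none is ever missing later.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int early_irq_init_model(void)
{
	unsigned int i;

	for (i = 0; i < MODEL_NR_IRQS; i++) {
		struct irq_desc_model *d = calloc(1, sizeof(*d));

		if (!d)
			return -1;
		d->irq = i;
		desc_table[i] = d;
	}
	return 0;
}
```

Code that goes through the accessor rather than indexing irq_desc[] directly sees no difference between the two schemes, which is why the open coded accesses have to go first.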
Thanks,
tglx
On Wed, 2011-02-16 at 16:25 +0000, Thomas Gleixner wrote:
> On Wed, 16 Feb 2011, Ian Campbell wrote:
> > On a vaguely related note, what is the future of non-sparse IRQs (on x86
> > and/or generally)?
>
> In general I want to switch everything over to SPARSE_IRQ. When the
> open coded access to irq_desc[] is gone, which should be mostly the
> case in .39, then switching everything over should be a smooth
> thing. For those archs which do not want to adjust the numbers
> dynamically we simply allocate NR_IRQS in early_irq_init(). So they
> won't even notice.
Sweet, I won't worry myself unduly over the non-SPARSE_IRQ case then.
Thanks,
Ian.