2008-07-10 18:41:55

by Suresh Siddha

[permalink] [raw]
Subject: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

x2APIC architecture provides a new x2apic mode, which allows for the
increased range of processor addressability ( > 8 bit apic ID support),
MSR access to APIC registers, etc. x2apic specification can be found at
http://download.intel.com/design/processor/specupdt/318148.pdf
(located under http://developer.intel.com/products/processor/manuals/index.htm )

Interrupt-remapping is part of Intel Virtualization Technology for
Directed I/O architecture and the specification can be found from
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
(above link seems to be broken for the moment, but in general it should be
found under http://www.intel.com/technology/virtualization/ )

Interrupt-remapping architecture enables extended Interrupt Mode on x86
platforms supporting 32-bit APIC-IDs. This infrastructure allows
the existing interrupt sources such as I/OxAPICs and MSI/MSI-X devices work
seamlessly with apic-id's > 8 bits. As such, this is a pre-requisite for
enabling x2apic mode in the CPU.

This patchset adds 64-bit support for interrupt-remapping and x2apic, which
introduces apic_ops for basic APIC ops(uncached memory Vs MSR accesses etc),
new irq_chip's for supporting interrupt-remapping and new genapic for
supporting IPI's, logical cluster/physical x2apic modes.

irq migration in the presence of interrupt-remapping is done from the
process-context as opposed to interrupt-context. Interrupt-remapping
infrastrucutre allows us to do this migration in a simple fashion (atleast for
edge triggered interrupts).

Interrupt-remapping (CONFIG_INTR_REMAP) and DMA-remapping (CONFIG_DMAR)
can be enabled separately.

More details in the individual patches that follow.

Signed-off-by: Suresh Siddha <[email protected]>
--


2008-07-10 19:53:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Suresh Siddha <[email protected]> wrote:

> x2APIC architecture provides a new x2apic mode, which allows for the
> increased range of processor addressability ( > 8 bit apic ID
> support), MSR access to APIC registers, etc. x2apic specification can
> be found at
> http://download.intel.com/design/processor/specupdt/318148.pdf
> (located under
> http://developer.intel.com/products/processor/manuals/index.htm )
>
> Interrupt-remapping is part of Intel Virtualization Technology for
> Directed I/O architecture and the specification can be found from
> http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
> (above link seems to be broken for the moment, but in general it
> should be found under http://www.intel.com/technology/virtualization/
> )
>
> Interrupt-remapping architecture enables extended Interrupt Mode on
> x86 platforms supporting 32-bit APIC-IDs. This infrastructure allows
> the existing interrupt sources such as I/OxAPICs and MSI/MSI-X devices
> work seamlessly with apic-id's > 8 bits. As such, this is a
> pre-requisite for enabling x2apic mode in the CPU.
>
> This patchset adds 64-bit support for interrupt-remapping and x2apic,
> which introduces apic_ops for basic APIC ops(uncached memory Vs MSR
> accesses etc), new irq_chip's for supporting interrupt-remapping and
> new genapic for supporting IPI's, logical cluster/physical x2apic
> modes.
>
> irq migration in the presence of interrupt-remapping is done from the
> process-context as opposed to interrupt-context. Interrupt-remapping
> infrastrucutre allows us to do this migration in a simple fashion
> (atleast for edge triggered interrupts).
>
> Interrupt-remapping (CONFIG_INTR_REMAP) and DMA-remapping
> (CONFIG_DMAR) can be enabled separately.
>
> More details in the individual patches that follow.

quite some stuff!

For review and testing purposes i've created a new topic branch for
this: tip/x86/x2apic and have picked up your patches into it.

I've pushed it out, but it's not merged into tip/master yet (obviously,
you sent this just a few minutes ago :)

It integrates fine with tip/master. If you do this:

git-checkout tip/master
git-merge tip/x86/x2apic

you'll get a clean merge.

Btw., i threw it at the -tip test-cluster and got back a quick build
bugreport:

arch/x86/xen/enlighten.c: In function 'xen_patch':
arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
arch/x86/xen/enlighten.c: At top level:
arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive
#ifndedarch/x86/kernel/paravirt.c:384:2: error: #endif without #if

with this config:

http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

bisection shows that it was caused by:

| 38c56e6b674074f8ec98722ceca3d15771e17abe is first bad commit
| commit 38c56e6b674074f8ec98722ceca3d15771e17abe
| Author: Suresh Siddha <[email protected]>
| Date: Thu Jul 10 11:16:49 2008 -0700
|
| x64, x2apic/intr-remap: basic apic ops support

(that's all for tonight - will have another look tomorrow.)

Ingo

2008-07-10 20:12:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


At a quick skim nothing looks really bad, nothing looks really bad,
but you are dealing in an area of code that could be made much nicer
and if we are going to support a noticeably different style of irq
management we need to get in some of those pending cleanups so the
code does not fall down under it's own wait.

Suresh Siddha <[email protected]> writes:

> irq migration in the presence of interrupt-remapping is done from the
> process-context as opposed to interrupt-context. Interrupt-remapping
> infrastrucutre allows us to do this migration in a simple fashion (atleast for
> edge triggered interrupts).

Unless I have misread things this irq migration remains racy, as I did not
see any instructions that would guarantee that in flight irqs were flushed
to the cpus local apics before we cleaned up the destination.

You are sizing an array as NR_IRQS this is something there should be sufficient
existing infrastructure to avoid. Arrays sized by NR_IRQS is a significant
problem both for scaling the system up and down so ultimately we need to kill
this. For now we should not introduce any new arrays.

A lot of your code is generic, and some of it is for just x86_64. Since the
cpus are capable of running in 32bit mode. We really need to implement x86_32
and x86_64 support in the same code base. Which I believe means factoring out
pieces of io_apic_N.c into things such as msi.c that can be shared between the
two architectures.

Eric

2008-07-10 20:18:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Eric W. Biederman <[email protected]> wrote:

> A lot of your code is generic, and some of it is for just x86_64.
> Since the cpus are capable of running in 32bit mode. We really need
> to implement x86_32 and x86_64 support in the same code base. Which I
> believe means factoring out pieces of io_apic_N.c into things such as
> msi.c that can be shared between the two architectures.

i think the APIC code should be fully unified down the line - the
APIC/IOAPIC knows little about the mode the CPU is running in and has to
be programmed the same way independent of which mode the CPU is in. The
current fork between the 32-bit and 64-bit APIC code is in good part
artificial.

Ingo

2008-07-10 20:23:00

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
> quite some stuff!
>
> For review and testing purposes i've created a new topic branch for
> this: tip/x86/x2apic and have picked up your patches into it.
>
> I've pushed it out, but it's not merged into tip/master yet (obviously,
> you sent this just a few minutes ago :)
>
> It integrates fine with tip/master. If you do this:
>
> git-checkout tip/master
> git-merge tip/x86/x2apic
>
> you'll get a clean merge.
>
> Btw., i threw it at the -tip test-cluster and got back a quick build
> bugreport:
>
> arch/x86/xen/enlighten.c: In function 'xen_patch':
> arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> arch/x86/xen/enlighten.c: At top level:
> arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive
> #ifndedarch/x86/kernel/paravirt.c:384:2: error: #endif without #if
>
> with this config:
>
> http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

Yes. It is conflicting with Jermey's recent changes. I will post the fixes
soon, meanwhile everything should be ok with !CONFIG_PARAVIRT

thanks,
suresh

2008-07-10 21:12:16

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Ingo Molnar <[email protected]> writes:

> * Eric W. Biederman <[email protected]> wrote:
>
>> A lot of your code is generic, and some of it is for just x86_64.
>> Since the cpus are capable of running in 32bit mode. We really need
>> to implement x86_32 and x86_64 support in the same code base. Which I
>> believe means factoring out pieces of io_apic_N.c into things such as
>> msi.c that can be shared between the two architectures.
>
> i think the APIC code should be fully unified down the line - the
> APIC/IOAPIC knows little about the mode the CPU is running in and has to
> be programmed the same way independent of which mode the CPU is in. The
> current fork between the 32-bit and 64-bit APIC code is in good part
> artificial.

I completely agree. However
1) There is a fair amount of work involved in the unification so taking it in small pieces
is good.
2) We are doing much more then we should in ioapic_N.c anyway.

So it makes sense to grab the pieces we are actively working on factor them out and unify them
first.

The basic model I expect will end up looking something like:
Type of cpu irq reception. PIC mode, local APIC mode, ???? Virtualized modes????
Type of irq source. ioapic, msi, htirq, ??? Virtualized source ????
Type of configuration mptable, acpi mps table.

Some of this we have split out today and nicely factored. Other parts we don't.

In particular for setting up msi and ioapics we use exactly the same mapping
of bits. So we describe things a little differently from the irq reception layer
to the irq sending layer we should be able to reuse exactly the same msi and ioapic code
instead of having their setup methods test for irq_remapping().

Eric

2008-07-10 21:15:32

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, Jul 10, 2008 at 01:05:50PM -0700, Eric W. Biederman wrote:
>
> At a quick skim nothing looks really bad, nothing looks really bad,
> but you are dealing in an area of code that could be made much nicer
> and if we are going to support a noticeably different style of irq
> management we need to get in some of those pending cleanups so the
> code does not fall down under it's own wait.
>
> Suresh Siddha <[email protected]> writes:
>
> > irq migration in the presence of interrupt-remapping is done from the
> > process-context as opposed to interrupt-context. Interrupt-remapping
> > infrastrucutre allows us to do this migration in a simple fashion (atleast for
> > edge triggered interrupts).
>
> Unless I have misread things this irq migration remains racy, as I did not
> see any instructions that would guarantee that in flight irqs were flushed
> to the cpus local apics before we cleaned up the destination.

Flushing the interrupt entry cache will take care of this. We modify the IRTE
and then flush the interrupt entry cache before cleaning up the original
vector allocated.

Any new interrupts from the device will see the new entry. Old in flight
interrupts will be registered at the CPU before the flush of the cache is
complete.

>
> You are sizing an array as NR_IRQS this is something there should be sufficient
> existing infrastructure to avoid. Arrays sized by NR_IRQS is a significant
> problem both for scaling the system up and down so ultimately we need to kill
> this. For now we should not introduce any new arrays.

Ok. Ideally dynamic_irq_init()/cleanup() can take care of this. or
create_irq()/destroy_irq() and embed this as a pointer somewhere inside
irq_desc. I need to take a look at this more closer and post a fix up patch.

>
> A lot of your code is generic, and some of it is for just x86_64. Since the
> cpus are capable of running in 32bit mode. We really need to implement x86_32
> and x86_64 support in the same code base. Which I believe means factoring out
> pieces of io_apic_N.c into things such as msi.c that can be shared between the
> two architectures.

Yes, As you and Ingo mentioned, there is nothing 64bit specific and one
can easily add the 32bit support. But before that we need, some more
x86 unification and I am very short on resources currently :(

thanks,
suresh

2008-07-10 21:56:28

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
>
> Btw., i threw it at the -tip test-cluster and got back a quick build
> bugreport:
>
> arch/x86/xen/enlighten.c: In function 'xen_patch':
> arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> arch/x86/xen/enlighten.c: At top level:
> arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive
> #ifndedarch/x86/kernel/paravirt.c:384:2: error: #endif without #if
>
> with this config:
>
> http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

Ingo, that was my stupid typo. Please apply this patch. BTW, we
need some more xen64 paravirt fixes in this area. I will look at it
as soon as possible.
---

[patch] compile fix with x2apic patch and CONFIG_PARAVIRT

fix the typo.

Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index b80105a..4f29ff8 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -360,7 +360,7 @@ struct pv_cpu_ops pv_cpu_ops = {

struct pv_apic_ops pv_apic_ops = {
#ifdef CONFIG_X86_LOCAL_APIC
-#ifnded CONFIG_X86_64
+#ifndef CONFIG_X86_64
.apic_write = native_apic_mem_write,
.apic_write_atomic = native_apic_mem_write_atomic,
.apic_read = native_apic_mem_read,

2008-07-10 22:09:18

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Eric W. Biederman wrote:
> A lot of your code is generic, and some of it is for just x86_64. Since the
> cpus are capable of running in 32bit mode. We really need to implement x86_32
> and x86_64 support in the same code base. Which I believe means factoring out
> pieces of io_apic_N.c into things such as msi.c that can be shared between the
> two architectures.

in general the purpose of this is to support 256 and more logical threads....
.... not going to happen on 32 bit

2008-07-10 23:02:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Suresh Siddha <[email protected]> writes:

> Flushing the interrupt entry cache will take care of this. We modify the IRTE
> and then flush the interrupt entry cache before cleaning up the original
> vector allocated.
>
> Any new interrupts from the device will see the new entry. Old in flight
> interrupts will be registered at the CPU before the flush of the cache is
> complete.

That sounds nice in principle. I saw cpu cache flushes, I saw writes.
I did not see any reads which is necessary to get that behavior with
the standard pci transaction rules.

Having seen enough little races and misbehaving hardware I'm very paranoid
about irq migration. The current implementation is belt and suspenders
and I still think there are races that I have missed.

>> You are sizing an array as NR_IRQS this is something there should be
> sufficient
>> existing infrastructure to avoid. Arrays sized by NR_IRQS is a significant
>> problem both for scaling the system up and down so ultimately we need to kill
>> this. For now we should not introduce any new arrays.
>
> Ok. Ideally dynamic_irq_init()/cleanup() can take care of this. or
> create_irq()/destroy_irq() and embed this as a pointer somewhere inside
> irq_desc. I need to take a look at this more closer and post a fix up patch.

Sounds good. Ultimately we are looking at handler_data or chip_data.
There are very specific rules that meant I could not use them for
the msi data but otherwise I don't remember exactly what the are for.
IOMMU are covered though.

>> A lot of your code is generic, and some of it is for just x86_64. Since the
>> cpus are capable of running in 32bit mode. We really need to implement x86_32
>> and x86_64 support in the same code base. Which I believe means factoring out
>> pieces of io_apic_N.c into things such as msi.c that can be shared between the
>> two architectures.
>
> Yes, As you and Ingo mentioned, there is nothing 64bit specific and one
> can easily add the 32bit support. But before that we need, some more
> x86 unification and I am very short on resources currently :(

At least for msi the code you are working on was essentially unified
when it was written, it just happened to have two copies. I don't
think I'm asking for heaving lifting. Mostly just putting code that
is touched into something other then the growing monstrosity that is
ioapic.c

Further can we please see some better abstractions. In particular can
we generate a token for the irq destination. And have the msi and
ioapic setup read that token and program it into the hardware. The
rules for which bits go where is exactly the same both with and
without irq_remapping so having an if statement there seems to obscure
what is really happening. Especially if as it appears that we may be used
the new token format with x2apics without remapping.

My primary concern is that the end result be well factored irq handling code
so it is possible to get in there and look at the code and maintain it.

A small part of that is the 32bit support. Another part are the missing
abstractions I described. I don't know what else since I have barely scratched the surface patch
review wise.

Eric

2008-07-10 23:02:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Arjan van de Ven <[email protected]> writes:

> in general the purpose of this is to support 256 and more logical threads....
> .... not going to happen on 32 bit

Huh?

Intel is still making new 32bit processors with otherwise modern chipsets
like tolapai so I don't plan to count anything out. Especially things like
the iommu support for irqs may be interesting. Further the iommu irq
support is interesting to hypervisors so I would not at all be surprised
to see the code getting reused in that context, and with 32bit kernels.

I'm not asking for anyone to unify code they are not touching but
rather to code things so unification because trivial. The entire
purpose of having arch/x86. We have had this discussion and the
viewpoint that we won't add new hardware features to x86_32 and just
put it in maintenance mode lost, because the hardware manufactures
include Intel have not put x86_32 into strictly maintenance mode.

Eric

2008-07-11 02:35:22

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, Jul 10, 2008 at 03:52:50PM -0700, Eric W. Biederman wrote:
> Suresh Siddha <[email protected]> writes:
>
> > Flushing the interrupt entry cache will take care of this. We modify the IRTE
> > and then flush the interrupt entry cache before cleaning up the original
> > vector allocated.
> >
> > Any new interrupts from the device will see the new entry. Old in flight
> > interrupts will be registered at the CPU before the flush of the cache is
> > complete.
>
> That sounds nice in principle. I saw cpu cache flushes, I saw writes.
> I did not see any reads which is necessary to get that behavior with
> the standard pci transaction rules.

qi_flush_iec() will submit an invalidation descriptor and will wait
till it finishes the invalidation of the interrupt entry cache.
qi_submit_sync() will do the job. Descriptor completion ensures that
the inflight interrupts are flushed.

> Having seen enough little races and misbehaving hardware I'm very paranoid
> about irq migration. The current implementation is belt and suspenders
> and I still think there are races that I have missed.

Eric, This process irq migration is done on the cutting edge hardware
which was designed with all the feedback and experiences in the mind ;)

And also, I don't think we are deviating much from what we are currently doing.
We are still using cleanup vector etc, to clean up the previous vector
allocation.

> >> You are sizing an array as NR_IRQS this is something there should be
> > sufficient
> >> existing infrastructure to avoid. Arrays sized by NR_IRQS is a significant
> >> problem both for scaling the system up and down so ultimately we need to kill
> >> this. For now we should not introduce any new arrays.
> >
> > Ok. Ideally dynamic_irq_init()/cleanup() can take care of this. or
> > create_irq()/destroy_irq() and embed this as a pointer somewhere inside
> > irq_desc. I need to take a look at this more closer and post a fix up patch.
>
> Sounds good. Ultimately we are looking at handler_data or chip_data.
> There are very specific rules that meant I could not use them for
> the msi data but otherwise I don't remember exactly what the are for.
> IOMMU are covered though.

IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
interrupt-remapping, there are some interrupt resources like ioapics and
hpet, which don't have the corresponding pci dev. Will take a look at this.

> At least for msi the code you are working on was essentially unified
> when it was written, it just happened to have two copies. I don't
> think I'm asking for heaving lifting. Mostly just putting code that
> is touched into something other then the growing monstrosity that is
> ioapic.c

We can create msi.c which handles MSI specific handling. I will
look into this. But I def welcome somone beating me in posting those
patches :) I made a note of this however.

> Further can we please see some better abstractions. In particular can
> we generate a token for the irq destination. And have the msi and
> ioapic setup read that token and program it into the hardware. The
> rules for which bits go where is exactly the same both with and
> without irq_remapping so having an if statement there seems to obscure
> what is really happening. Especially if as it appears that we may be used
> the new token format with x2apics without remapping.

unfortunately x2apic can't be enabled with out enabling interrupt-remapping.
Interrupts don't work in majority of the configurations (as I mentioned
earlier). Programming IOAPIC RTE's and MSI address/data registers are
completely different based on the presence of interrupt-remapping.

>
> My primary concern is that the end result be well factored irq handling code
> so it is possible to get in there and look at the code and maintain it.
>
> A small part of that is the 32bit support. Another part are the missing
> abstractions I described. I don't know what else since I have barely scratched the surface patch
> review wise.

Please keep the expert comments coming.

thanks,
suresh

2008-07-11 03:22:18

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Suresh Siddha <[email protected]> writes:

>> That sounds nice in principle. I saw cpu cache flushes, I saw writes.
>> I did not see any reads which is necessary to get that behavior with
>> the standard pci transaction rules.
>
> qi_flush_iec() will submit an invalidation descriptor and will wait
> till it finishes the invalidation of the interrupt entry cache.
> qi_submit_sync() will do the job. Descriptor completion ensures that
> the inflight interrupts are flushed.

I will have to look a second time. It seems I did not see the wait.

>> Having seen enough little races and misbehaving hardware I'm very paranoid
>> about irq migration. The current implementation is belt and suspenders
>> and I still think there are races that I have missed.
>
> Eric, This process irq migration is done on the cutting edge hardware
> which was designed with all the feedback and experiences in the mind ;)
>
> And also, I don't think we are deviating much from what we are currently doing.
> We are still using cleanup vector etc, to clean up the previous vector
> allocation.

Yes. In general I think the design appears sound. I'm just not certain
of the implementation details. Having spent way to many hours
debugging some very subtle hardware issues, I am paranoid.


>> Sounds good. Ultimately we are looking at handler_data or chip_data.
>> There are very specific rules that meant I could not use them for
>> the msi data but otherwise I don't remember exactly what the are for.
>> IOMMU are covered though.
>
> IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
> interrupt-remapping, there are some interrupt resources like ioapics and
> hpet, which don't have the corresponding pci dev. Will take a look at this.
>
>> At least for msi the code you are working on was essentially unified
>> when it was written, it just happened to have two copies. I don't
>> think I'm asking for heaving lifting. Mostly just putting code that
>> is touched into something other then the growing monstrosity that is
>> ioapic.c
>
> We can create msi.c which handles MSI specific handling. I will
> look into this. But I def welcome somone beating me in posting those
> patches :) I made a note of this however.

Thanks. It is all of these little things. My hope is that we can
whittle down the unshared core instead of increasing the amount of
non-shared code.

>> Further can we please see some better abstractions. In particular can
>> we generate a token for the irq destination. And have the msi and
>> ioapic setup read that token and program it into the hardware. The
>> rules for which bits go where is exactly the same both with and
>> without irq_remapping so having an if statement there seems to obscure
>> what is really happening. Especially if as it appears that we may be used
>> the new token format with x2apics without remapping.
>
> unfortunately x2apic can't be enabled with out enabling interrupt-remapping.
> Interrupts don't work in majority of the configurations (as I mentioned
> earlier). Programming IOAPIC RTE's and MSI address/data registers are
> completely different based on the presence of interrupt-remapping.

Regardless of the x2apic mode issues there is a different issue here (address better in
another response where I gave an example).

There are a set of architecturally defined bits that can be
programmed. These same bits exist in the ioapic routing entry and in
the msi message.

Therefore we should have a generic mapping function that says give the architecturally
defined bits.

Then both the ioapic setup and the msi setup can call x86_irq_map(irq) get those
architectural bits and program them in their architecturally defined location.

For MSI it looks like you be able to take advantage of a few more bits, but the
same principle applies.

Getting these intermediate abstractions relatively clean is important so we can do
things with the hardware.

Eric

2008-07-11 10:28:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Suresh Siddha <[email protected]> wrote:

> On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
> >
> > Btw., i threw it at the -tip test-cluster and got back a quick build
> > bugreport:
> >
> > arch/x86/xen/enlighten.c: In function 'xen_patch':
> > arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> > arch/x86/xen/enlighten.c: At top level:
> > arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> > arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> > arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive
> > #ifndedarch/x86/kernel/paravirt.c:384:2: error: #endif without #if
> >
> > with this config:
> >
> > http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
>
> Ingo, that was my stupid typo. Please apply this patch. BTW, we
> need some more xen64 paravirt fixes in this area. I will look at it
> as soon as possible.

applied to tip/x86/x2apic - thanks Suresh.

Ingo

2008-07-11 20:10:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Ingo Molnar <[email protected]> wrote:

> > > http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> >
> > Ingo, that was my stupid typo. Please apply this patch. BTW, we need
> > some more xen64 paravirt fixes in this area. I will look at it as
> > soon as possible.
>
> applied to tip/x86/x2apic - thanks Suresh.

another problem is the redefinition of apic_read(), causing:

arch/x86/xen/enlighten.c: In function ‘xen_patch':
arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site' defined but not used
arch/x86/xen/enlighten.c: At top level:
arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(' token
arch/x86/xen/enlighten.c:1273: error: expected ‘}' before ‘.' token

with this config:

http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad

the continued spaghetti in all the APIC variants is quite ugly. This
should all be handled via a single apic_ops template that should cover
the paravirt and native variants as well.

Ingo

2008-07-11 20:32:27

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 01:09:57PM -0700, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > > http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> > >
> > > Ingo, that was my stupid typo. Please apply this patch. BTW, we need
> > > some more xen64 paravirt fixes in this area. I will look at it as
> > > soon as possible.
> >
> > applied to tip/x86/x2apic - thanks Suresh.
>
> another problem is the redefinition of apic_read(), causing:
>
> arch/x86/xen/enlighten.c: In function ‘xen_patch':
> arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site' defined but not used
> arch/x86/xen/enlighten.c: At top level:
> arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(' token
> arch/x86/xen/enlighten.c:1273: error: expected ‘}' before ‘.' token
>
> with this config:
>
> http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad
>
> the continued spaghetti in all the APIC variants is quite ugly. This
> should all be handled via a single apic_ops template that should cover
> the paravirt and native variants as well.

Ingo, I just posted the fix for this.

To cleanup the code:

struct pv_apic_ops {
#ifdef CONFIG_X86_LOCAL_APIC
/*
* Direct APIC operations, principally for VMI. Ideally
* these shouldn't be in this interface.
*/
void (*apic_write)(unsigned long reg, u32 v);
void (*apic_write_atomic)(unsigned long reg, u32 v);
u32 (*apic_read)(unsigned long reg);

Probably we should move the three above routines to basic apic_ops, which just
deal with the apic HW accesses and retain the below for pv_apic_ops, which
care more than the basic reg accesses. This will be true for both 32/64bits..

void (*setup_boot_clock)(void);
void (*setup_secondary_clock)(void);

void (*startup_ipi_hook)(int phys_apicid,
unsigned long start_eip,
unsigned long start_esp);
#endif
};


Unless there is an objection, I will post the fix.

thanks,
suresh

2008-07-11 20:42:34

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

got:

Linux version 2.6.26-rc9-tip-01763-g74f94b1-dirty (yhlu@linux-zpir)
(gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision
135036] (SUSE Linux) ) #320 SMP Fri Jul 11 13:35:20 PDT 2008
Command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 nopat initcall_debug apic=verbose pci=routeirq ip=dhcp
load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.26_k8.h
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)....
BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)......................
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)......................
BIOS-e820: 0000000000100000 - 0000000087fe0000 (usable)........................
BIOS-e820: 0000000087fe0000 - 0000000087fee000 (ACPI data).....................
BIOS-e820: 0000000087fee000 - 0000000087fff2e0 (ACPI NVS)......................
BIOS-e820: 0000000087fff2e0 - 0000000088000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000478000000 (usable)
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
PAT support disabled.
last_pfn = 0x478000 max_arch_pfn = 0x3ffffffff
last_pfn = 0x87fe0 max_arch_pfn = 0x3ffffffff
init_memory_mapping
Using GB pages for direct mapping
0000000000 - 0080000000 page 1G
0080000000 - 0087e00000 page 2M
0087e00000 - 0087fe0000 page 4k
kernel direct mapping tables up to 87fe0000 @ 8000-b000
last_map_addr: 87fe0000 end: 87fe0000
init_memory_mapping
Using GB pages for direct mapping
0100000000 - 0440000000 page 1G
0440000000 - 0478000000 page 2M
kernel direct mapping tables up to 478000000 @ a000-c000
last_map_addr: 478000000 end: 478000000
RAMDISK: 7e6d7000 - 7ffffe0b
DMI 2.3 present.
ACPI: RSDP 000FA740, 0024 (r2 SUN )
ACPI: XSDT 87FE0100, 006C (r1 SUN X6420 14 MSFT 97)
ACPI: FACP 87FE0290, 00F4 (r3 SUN X6420 14 MSFT 97)
ACPI: DSDT 87FE0500, 876F (r1 SUN X6420 14 INTL 20060512)
ACPI: FACS 87FEE000, 0040
ACPI: APIC 87FE0390, 00D8 (r1 SUN X6420 14 MSFT 97)
ACPI: SPCR 87FE0470, 0050 (r1 SUN X6420 14 MSFT 97)
ACPI: SLIT 87FE04C0, 003C (r1 SUN X6420 14 MSFT 97)
ACPI: OEMB 87FEE040, 0063 (r1 SUN X6420 14 MSFT 97)
ACPI: SRAT 87FE8C70, 0220 (r1 AMD HAMMER 1 AMD 1)
ACPI: HPET 87FE8E90, 0038 (r1 SUN X6420 14 MSFT 97)
ACPI: IPET 87FE8ED0, 0038 (r1 SUN X6420 14 MSFT 97)
ACPI: SSDT 87FE8F10, 2854 (r1 A M I POWERNOW 1 AMD 1)
SRAT: PXM 0 -> APIC 4 -> Node 0
SRAT: PXM 0 -> APIC 5 -> Node 0
SRAT: PXM 0 -> APIC 6 -> Node 0
SRAT: PXM 0 -> APIC 7 -> Node 0
SRAT: PXM 1 -> APIC 8 -> Node 1
SRAT: PXM 1 -> APIC 9 -> Node 1
SRAT: PXM 1 -> APIC 10 -> Node 1
SRAT: PXM 1 -> APIC 11 -> Node 1
SRAT: PXM 2 -> APIC 12 -> Node 2
SRAT: PXM 2 -> APIC 13 -> Node 2
SRAT: PXM 2 -> APIC 14 -> Node 2
SRAT: PXM 2 -> APIC 15 -> Node 2
SRAT: PXM 3 -> APIC 16 -> Node 3
SRAT: PXM 3 -> APIC 17 -> Node 3
SRAT: PXM 3 -> APIC 18 -> Node 3
SRAT: PXM 3 -> APIC 19 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
Entering add_active_range(0, 0x0, 0x9c) 0 entries of 3200 used
SRAT: Node 0 PXM 0 100000-88000000
Entering add_active_range(0, 0x100, 0x87fe0) 1 entries of 3200 used
SRAT: Node 0 PXM 0 100000000-178000000
Entering add_active_range(0, 0x100000, 0x178000) 2 entries of 3200 used
SRAT: Node 1 PXM 1 178000000-278000000
Entering add_active_range(1, 0x178000, 0x278000) 3 entries of 3200 used
SRAT: Node 2 PXM 2 278000000-378000000
Entering add_active_range(2, 0x278000, 0x378000) 4 entries of 3200 used
SRAT: Node 3 PXM 3 378000000-478000000
Entering add_active_range(3, 0x378000, 0x478000) 5 entries of 3200 used
ACPI: SLIT: nodes = 4
10 13 13 16
13 10 13 13
13 13 10 13
16 13 13 10
NUMA: Allocated memnodemap from b000 - 13f80
NUMA: Using 20 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000178000000
NODE_DATA [0000000000013f80 - 0000000000018f7f]
bootmap [0000000000019000 - 0000000000047fff] pages 2f
(9 early reservations) ==> bootmem
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
#2 [0000200000 - 00010ba6d4] TEXT DATA BSS ==> [0000200000 - 00010ba6d4]
#3 [007e6d7000 - 007ffffe0b] RAMDISK ==> [007e6d7000 - 007ffffe0b]
#4 [000009c800 - 0000100000] BIOS reserved ==> [000009c800 - 0000100000]
#5 [0000008000 - 000000a000] PGTABLE ==> [0000008000 - 000000a000]
#6 [000000a000 - 000000b000] PGTABLE ==> [000000a000 - 000000b000]
#7 [0000001000 - 000000103c] ACPI SLIT ==> [0000001000 - 000000103c]
#8 [000000b000 - 0000013f80] MEMNODEMAP ==> [000000b000 - 0000013f80]
Bootmem setup node 1 0000000178000000-0000000278000000
NODE_DATA [0000000178000000 - 0000000178004fff]
bootmap [0000000178005000 - 0000000178024fff] pages 20
(9 early reservations) ==> bootmem
#0 [0000000000 - 0000001000] BIOS data page
#1 [0000006000 - 0000008000] TRAMPOLINE
#2 [0000200000 - 00010ba6d4] TEXT DATA BSS
#3 [007e6d7000 - 007ffffe0b] RAMDISK
#4 [000009c800 - 0000100000] BIOS reserved
#5 [0000008000 - 000000a000] PGTABLE
#6 [000000a000 - 000000b000] PGTABLE
#7 [0000001000 - 000000103c] ACPI SLIT
#8 [000000b000 - 0000013f80] MEMNODEMAP
Bootmem setup node 2 0000000278000000-0000000378000000
NODE_DATA [0000000278000000 - 0000000278004fff]
bootmap [0000000278005000 - 0000000278024fff] pages 20
(9 early reservations) ==> bootmem
#0 [0000000000 - 0000001000] BIOS data page
#1 [0000006000 - 0000008000] TRAMPOLINE
#2 [0000200000 - 00010ba6d4] TEXT DATA BSS
#3 [007e6d7000 - 007ffffe0b] RAMDISK
#4 [000009c800 - 0000100000] BIOS reserved
#5 [0000008000 - 000000a000] PGTABLE
#6 [000000a000 - 000000b000] PGTABLE
#7 [0000001000 - 000000103c] ACPI SLIT
#8 [000000b000 - 0000013f80] MEMNODEMAP
Bootmem setup node 3 0000000378000000-0000000478000000
NODE_DATA [0000000378000000 - 0000000378004fff]
bootmap [0000000378005000 - 0000000378024fff] pages 20
(9 early reservations) ==> bootmem
#0 [0000000000 - 0000001000] BIOS data page
#1 [0000006000 - 0000008000] TRAMPOLINE
#2 [0000200000 - 00010ba6d4] TEXT DATA BSS
#3 [007e6d7000 - 007ffffe0b] RAMDISK
#4 [000009c800 - 0000100000] BIOS reserved
#5 [0000008000 - 000000a000] PGTABLE
#6 [000000a000 - 000000b000] PGTABLE
#7 [0000001000 - 000000103c] ACPI SLIT
#8 [000000b000 - 0000013f80] MEMNODEMAP
Scan SMP from ffff880000000000 for 1024 bytes.
Scan SMP from ffff88000009fc00 for 1024 bytes.
Scan SMP from ffff8800000f0000 for 65536 bytes.
found SMP MP-table at [ffff8800000ff780] 000ff780
[ffffe20000000000-ffffe27fffffffff] PGD ->ffff8800011bd000 on node 0
[ffffe20000000000-ffffe2003fffffff] PUD ->ffff8800011be000 on node 0
[ffffe20005240000-ffffe200053fffff] potential offnode page_structs
[ffffe20000000000-ffffe200053fffff] PMD ->
[ffff880001200000-ffff880004bfffff] on node 0
[ffffe20008a40000-ffffe20008bfffff] potential offnode page_structs
[ffffe20005400000-ffffe20008bfffff] PMD ->
[ffff880178200000-ffff88017b9fffff] on node 1
[ffffe2000c240000-ffffe2000c3fffff] potential offnode page_structs
[ffffe20008c00000-ffffe2000c3fffff] PMD ->
[ffff880278200000-ffff88027b9fffff] on node 2
[ffffe2000c400000-ffffe2000fbfffff] PMD ->
[ffff880378200000-ffff88037b9fffff] on node 3
Zone PFN ranges:
DMA 0x00000000 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00478000
Movable zone start PFN for each node
early_node_map[6] active PFN ranges
0: 0x00000000 -> 0x0000009c
0: 0x00000100 -> 0x00087fe0
0: 0x00100000 -> 0x00178000
1: 0x00178000 -> 0x00278000
2: 0x00278000 -> 0x00378000
3: 0x00378000 -> 0x00478000
On node 0 totalpages: 1048444
DMA zone: 56 pages used for memmap
DMA zone: 115 pages reserved
DMA zone: 3825 pages, LIFO batch:0
DMA32 zone: 14280 pages used for memmap
DMA32 zone: 538648 pages, LIFO batch:31
Normal zone: 6720 pages used for memmap
Normal zone: 484800 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
On node 1 totalpages: 1048576
DMA zone: 0 pages used for memmap
DMA32 zone: 0 pages used for memmap
Normal zone: 14336 pages used for memmap
Normal zone: 1034240 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
On node 2 totalpages: 1048576
DMA zone: 0 pages used for memmap
DMA32 zone: 0 pages used for memmap
Normal zone: 14336 pages used for memmap
Normal zone: 1034240 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
On node 3 totalpages: 1048576
DMA zone: 0 pages used for memmap
DMA32 zone: 0 pages used for memmap
Normal zone: 14336 pages used for memmap
Normal zone: 1034240 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x4008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x08] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x09] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x0a] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x0b] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x13] enabled)
ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 0, version 0, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x01] address[0xbdeff000] gsi_base[24])
IOAPIC[1]: apic_id 1, version 0, address 0xbdeff000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: HPET id: 0x0 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 16 CPUs, 0 hotplug CPUs
init_cpu_to_node:
cpu 0 -> apicid 4 -> node 0
cpu 1 -> apicid 5 -> node 0
cpu 2 -> apicid 6 -> node 0
cpu 3 -> apicid 7 -> node 0
cpu 4 -> apicid 8 -> node 1
cpu 5 -> apicid 9 -> node 1
cpu 6 -> apicid 10 -> node 1
cpu 7 -> apicid 11 -> node 1
cpu 8 -> apicid 12 -> node 2
cpu 9 -> apicid 13 -> node 2
cpu 10 -> apicid 14 -> node 2
cpu 11 -> apicid 15 -> node 2
cpu 12 -> apicid 16 -> node 3
cpu 13 -> apicid 17 -> node 3
cpu 14 -> apicid 18 -> node 3
cpu 15 -> apicid 19 -> node 3
mapped APIC to ffffffffff5fb000 ( fee00000)
mapped IOAPIC to ffffffffff5fa000 (00000000fec00000)
mapped IOAPIC to ffffffffff5f9000 (00000000bdeff000)
Allocating PCI resources starting at 90000000 (gap: 88000000:76c00000)
PERCPU: Allocating 53312 bytes of per cpu data
per cpu data for cpu0 on node0 at 00000000010e4000
per cpu data for cpu1 on node0 at 00000000010f2000
per cpu data for cpu2 on node0 at 0000000001100000
per cpu data for cpu3 on node0 at 000000000110e000
per cpu data for cpu4 on node1 at 000000017ba18000
per cpu data for cpu5 on node1 at 000000017ba26000
per cpu data for cpu6 on node1 at 000000017ba34000
per cpu data for cpu7 on node1 at 000000017ba42000
per cpu data for cpu8 on node2 at 000000027ba18000
per cpu data for cpu9 on node2 at 000000027ba26000
per cpu data for cpu10 on node2 at 000000027ba34000
per cpu data for cpu11 on node2 at 000000027ba42000
per cpu data for cpu12 on node3 at 000000037ba18000
per cpu data for cpu13 on node3 at 000000037ba26000
per cpu data for cpu14 on node3 at 000000037ba34000
per cpu data for cpu15 on node3 at 000000037ba42000
NR_CPUS: 128, nr_cpu_ids: 16, nr_node_ids 4
Built 4 zonelists in Zone order, mobility grouping on. Total pages: 4129993
Policy zone: Normal
Kernel command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 nopat initcall_debug apic=verbose pci=routeirq ip=dhcp
load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.26_k8.h
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC calibrated against PM_TIMER
Detected 2293.903 MHz processor.
spurious 8259A interrupt: IRQ7.
Console: colour VGA+ 80x25
console handover: boot [uart0] -> real [ttyS0]
Checking aperture...
No AGP bridge found
Node 0: aperture @ a21c000000 size 32 MB
Aperture beyond 4GB. Ignoring.
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 20000000
numa_free_all_bootmem node 0 done
numa_free_all_bootmem node 1 done
numa_free_all_bootmem node 2 done
numa_free_all_bootmem node 3 done
Memory: 16437296k/18743296k available (8383k kernel code, 339392k
reserved, 4020k data, 988k init)
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=4
hpet clockevent registered
Calibrating delay loop (skipped), value calculated using timer
frequency.. <6>4587.80 BogoMIPS (lpj=9175600)
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0/4 -> Node 0
Enable MMCONFIG on AMD Family 10h
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
using C1E aware idle routine
ACPI: Core revision 20080321
Parsing all Control Methods:
Table [DSDT](id 0001) - 1289 Objects with 114 Devices 462 Methods 26 Regions
Parsing all Control Methods:
Table [SSDT](id 0002) - 80 Objects with 0 Devices 0 Methods 0 Regions
tbxface-0598 [00] tb_load_namespace : ACPI Tables successfully acquired
evxfevnt-0091 [00] enable : Transition to ACPI mode successful
Setting APIC routing to physical flat
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320

Call Trace:
[<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
[<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
[<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
[<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
[<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
[<ffffffff80e4881a>] ? kernel_init+0x64/0x341
[<ffffffff8022a439>] ? child_rip+0xa/0x11
[<ffffffff80e487b6>] ? kernel_init+0x0/0x341
[<ffffffff8022a42f>] ? child_rip+0x0/0x11


guess read_apic_id changing cuase some problem...

YH

2008-07-11 20:45:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Yinghai Lu <[email protected]> wrote:

> Setting APIC routing to physical flat
> Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
> Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
>
> Call Trace:
> [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
> [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
> [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
> [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
> [<ffffffff8022a439>] ? child_rip+0xa/0x11
> [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
> [<ffffffff8022a42f>] ? child_rip+0x0/0x11
>
>
> guess read_apic_id changing cuase some problem...

i got a build error sooner so i've taken it out of tip/master again.

Ingo

2008-07-11 20:50:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Suresh Siddha <[email protected]> wrote:

> On Fri, Jul 11, 2008 at 01:09:57PM -0700, Ingo Molnar wrote:
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > > > http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> > > >
> > > > Ingo, that was my stupid typo. Please apply this patch. BTW, we need
> > > > some more xen64 paravirt fixes in this area. I will look at it as
> > > > soon as possible.
> > >
> > > applied to tip/x86/x2apic - thanks Suresh.
> >
> > another problem is the redefinition of apic_read(), causing:
> >
> > arch/x86/xen/enlighten.c: In function ‘xen_patch':
> > arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site' defined but not used
> > arch/x86/xen/enlighten.c: At top level:
> > arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(' token
> > arch/x86/xen/enlighten.c:1273: error: expected ‘}' before ‘.' token
> >
> > with this config:
> >
> > http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad
> >
> > the continued spaghetti in all the APIC variants is quite ugly. This
> > should all be handled via a single apic_ops template that should cover
> > the paravirt and native variants as well.
>
> Ingo, I just posted the fix for this.

applied to tip/x86/x2apic:

Suresh Siddha (3):
x2apic: uninline uv_init_apic_ldr()
x2apic: xen64 paravirt basic apic ops
x2apic: kernel-parameter documentation for "x2apic_phys"

thanks Suresh.

> To cleanup the code:
>
> struct pv_apic_ops {
> #ifdef CONFIG_X86_LOCAL_APIC
> /*
> * Direct APIC operations, principally for VMI. Ideally
> * these shouldn't be in this interface.
> */
> void (*apic_write)(unsigned long reg, u32 v);
> void (*apic_write_atomic)(unsigned long reg, u32 v);
> u32 (*apic_read)(unsigned long reg);
>
> Probably we should move the three above routines to basic apic_ops,
> which just deal with the apic HW accesses and retain the below for
> pv_apic_ops, which care more than the basic reg accesses. This will be
> true for both 32/64bits..
>
> void (*setup_boot_clock)(void);
> void (*setup_secondary_clock)(void);
>
> void (*startup_ipi_hook)(int phys_apicid,
> unsigned long start_eip,
> unsigned long start_esp);
> #endif
> };
>
> Unless there is an objection, I will post the fix.

ok. Jeremy, agreed?

Ingo

2008-07-11 21:24:27

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 01:45:21PM -0700, Ingo Molnar wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
> > Setting APIC routing to physical flat
> > Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
> > Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
> >
> > Call Trace:
> > [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
> > [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> > [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
> > [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> > [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
> > [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
> > [<ffffffff8022a439>] ? child_rip+0xa/0x11
> > [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
> > [<ffffffff8022a42f>] ? child_rip+0x0/0x11
> >
> >
> > guess read_apic_id changing cuase some problem...

Yinghai, Can you please try the appended patch to see if it fixes your problem?

I guess you need to try the tip/x86/x2apic tree now please :(

BTW, I don't know why we even do verify_local_APIC() in 64bit. Especially
when we don't care for the return value, it should be a no-op.

---
genapic's read_apic_id() returns the actual apic id extracted from
the APIC_ID register. And in some cases like UV, read_apic_id()
returns completely different values from APIC ID register.

Use the native apic register read, rather than genapic read_apic_id()
in verify_local_APIC()

And also, lapic_suspend() should also use native apic register read.

Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 9a319d7..dd05010 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -698,10 +698,10 @@ int __init verify_local_APIC(void)
/*
* The ID register is read/write in a real APIC.
*/
- reg0 = read_apic_id();
+ reg0 = apic_read(APIC_ID);
apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0);
apic_write(APIC_ID, reg0 ^ APIC_ID_MASK);
- reg1 = read_apic_id();
+ reg1 = apic_read(APIC_ID);
apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1);
apic_write(APIC_ID, reg0);
if (reg1 != (reg0 ^ APIC_ID_MASK))
@@ -1336,7 +1336,7 @@ static int lapic_suspend(struct sys_device *dev, pm_message_t state)

maxlvt = lapic_get_maxlvt();

- apic_pm_state.apic_id = read_apic_id();
+ apic_pm_state.apic_id = apic_read(APIC_ID);
apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
apic_pm_state.apic_ldr = apic_read(APIC_LDR);
apic_pm_state.apic_dfr = apic_read(APIC_DFR);

2008-07-11 22:02:56

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 2:24 PM, Suresh Siddha
<[email protected]> wrote:
> On Fri, Jul 11, 2008 at 01:45:21PM -0700, Ingo Molnar wrote:
>>
>> * Yinghai Lu <[email protected]> wrote:
>>
>> > Setting APIC routing to physical flat
>> > Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
>> > Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
>> >
>> > Call Trace:
>> > [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
>> > [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>> > [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
>> > [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>> > [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
>> > [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
>> > [<ffffffff8022a439>] ? child_rip+0xa/0x11
>> > [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
>> > [<ffffffff8022a42f>] ? child_rip+0x0/0x11
>> >
>> >
>> > guess read_apic_id changing cuase some problem...
>
> Yinghai, Can you please try the appended patch to see if it fixes your problem?
>
> I guess you need to try the tip/x86/x2apic tree now please :(
>
> BTW, I don't know why we even do verify_local_APIC() in 64bit. Especially
> when we don't care for the return value, it should be a no-op.
>
> ---
> genapic's read_apic_id() returns the actual apic id extracted from
> the APIC_ID register. And in some cases like UV, read_apic_id()
> returns completely different values from APIC ID register.
>
> Use the native apic register read, rather than genapic read_apic_id()
> in verify_local_APIC()
>
> And also, lapic_suspend() should also use native apic register read.
>
> Signed-off-by: Suresh Siddha <[email protected]>
> ---
>
> diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
> index 9a319d7..dd05010 100644
> --- a/arch/x86/kernel/apic_64.c
> +++ b/arch/x86/kernel/apic_64.c
> @@ -698,10 +698,10 @@ int __init verify_local_APIC(void)
> /*
> * The ID register is read/write in a real APIC.
> */
> - reg0 = read_apic_id();
> + reg0 = apic_read(APIC_ID);
> apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0);
> apic_write(APIC_ID, reg0 ^ APIC_ID_MASK);
> - reg1 = read_apic_id();
> + reg1 = apic_read(APIC_ID);
> apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1);
> apic_write(APIC_ID, reg0);
> if (reg1 != (reg0 ^ APIC_ID_MASK))
> @@ -1336,7 +1336,7 @@ static int lapic_suspend(struct sys_device *dev, pm_message_t state)
>
> maxlvt = lapic_get_maxlvt();
>
> - apic_pm_state.apic_id = read_apic_id();
> + apic_pm_state.apic_id = apic_read(APIC_ID);
> apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
> apic_pm_state.apic_ldr = apic_read(APIC_LDR);
> apic_pm_state.apic_dfr = apic_read(APIC_DFR);
>

works. it should be merged into the patch that introduce new read_apic_id

really should unify read_apic_id, GET_APIC_ID, GET_XAPIC_ID...

YH

2008-07-12 03:16:18

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

1. wonder if x2apic can be use with uniprocessor.

in APIC_init_uniprocessor, it will try to enable x2apic, but later

apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));

but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
SET_APIC_ID for different
genapic like 32bit.

2 check_x2apic is called in setup_arch, but it only set apic_ops,
and genapic still not changed, aka apic_flat...
wonder if you need to call setup_apic_routing to set genapic.

otherwise read_apic_id could have use the one from apic_flat....need
to shift......

3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
but 32bit version seems like to put GET_APIC_ID with genapic...

which one is better? 2 or 3

YH

2008-07-12 04:02:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

"Yinghai Lu" <[email protected]> writes:

> 1. wonder if x2apic can be use with uniprocessor.
>
> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>
> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>
> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
> SET_APIC_ID for different
> genapic like 32bit.
>
> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
> and genapic still not changed, aka apic_flat...
> wonder if you need to call setup_apic_routing to set genapic.
>
> otherwise read_apic_id could have use the one from apic_flat....need
> to shift......
>
> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
> but 32bit version seems like to put GET_APIC_ID with genapic...
>
> which one is better? 2 or 3

Z finish untangle SMP support from apic initialization and move the apic
initialization up into init_IRQ.

That is better but is likely the wrong short term approach.

Eric

2008-07-12 05:37:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Yinghai Lu <[email protected]> wrote:

> > Yinghai, Can you please try the appended patch to see if it fixes
> > your problem?

> works. it should be merged into the patch that introduce new
> read_apic_id

ok, that came from the x86/uv branch - but in that form it was not
affected, only tip/x2apic exposed the problem, right?

So i've rebased tip/x86/x2apic and moved Suresh's fix in front of the
other patches, to make it all bisectable.

i have also applied your apic unification patches. This is how the topic
layout looks like now:

Ingo Molnar (1):
Merge branch 'x86/core' into x86/x2apic

Suresh Siddha (31):
x64, x2apic/intr-remap: Interrupt-remapping and x2apic support, fix
x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization
x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus
x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping
x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code
x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection
x64, x2apic/intr-remap: parse ioapic scope under vt-d structures
x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h
x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d)
x64, x2apic/intr-remap: Interrupt remapping infrastructure
x64, x2apic/intr-remap: routines managing Interrupt remapping table entries.
x64, x2apic/intr-remap: generic irq migration support from process context
x64, x2apic/intr-remap: 8259 specific mask/unmask routines
x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup
x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines
x64, x2apic/intr-remap: basic apic ops support
x64, x2apic/intr-remap: cpuid bits for x2apic feature
x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk)
x64, x2apic/intr-remap: x2apic ops for x2apic mode support
x64, x2apic/intr-remap: introcude self IPI to genapic routines
x64, x2apic/intr-remap: x2apic cluster mode support
x64, x2apic/intr-remap: setup init_apic_ldr for UV
x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping
x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping
x64, x2apic/intr-remap: support for x2apic physical mode support
x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
x2apic: uninline uv_init_apic_ldr()
x2apic: xen64 paravirt basic apic ops
x2apic: kernel-parameter documentation for "x2apic_phys"

Yinghai Lu (3):
x86: let 32bit use apic_ops too
x86: mach_apicdef.h need to include before smp.h
x86: make read_apic_id return final apicid

Ingo

2008-07-12 06:06:52

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 10:37 PM, Ingo Molnar <[email protected]> wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
>> > Yinghai, Can you please try the appended patch to see if it fixes
>> > your problem?
>
>> works. it should be merged into the patch that introduce new
>> read_apic_id
>
> ok, that came from the x86/uv branch - but in that form it was not
> affected, only tip/x2apic exposed the problem, right?

because of new read_apic_id()

>
> So i've rebased tip/x86/x2apic and moved Suresh's fix in front of the
> other patches, to make it all bisectable.

should incorperate that fix into the patch that introduce new read_apic_id()

commit df8cc50cc9357ba5a5d6a07744fa36b16a81121c
Author: Suresh Siddha <[email protected]>
Date: Thu Jul 10 11:16:48 2008 -0700

x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines

Move the read_apic_id() to genapic routines.

YH

2008-07-12 06:17:18

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
<[email protected]> wrote:
> "Yinghai Lu" <[email protected]> writes:
>
>> 1. wonder if x2apic can be use with uniprocessor.
>>
>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>>
>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>
>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
>> SET_APIC_ID for different
>> genapic like 32bit.
>>
>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
>> and genapic still not changed, aka apic_flat...
>> wonder if you need to call setup_apic_routing to set genapic.
>>
>> otherwise read_apic_id could have use the one from apic_flat....need
>> to shift......
>>
>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
>> but 32bit version seems like to put GET_APIC_ID with genapic...
>>
>> which one is better? 2 or 3
>
> Z finish untangle SMP support from apic initialization and move the apic
> initialization up into init_IRQ.
>
> That is better but is likely the wrong short term approach.

plan to add get_apic_id(x) into 64bit genapic, and will use
#define GET_APIC_ID(x) genapic->get_apic_id(x)
#define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))

so it is identical to 32bit, and we smooth the merging of 32/64 apic code

also read the x2APIC spec pdf, it doesn't say anything about interrupt
remapping...need to be used with x2apic...

YH

2008-07-12 06:46:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Yinghai Lu <[email protected]> wrote:

> > other patches, to make it all bisectable.
>
> should incorperate that fix into the patch that introduce new
> read_apic_id()
>
> commit df8cc50cc9357ba5a5d6a07744fa36b16a81121c
> Author: Suresh Siddha <[email protected]>
> Date: Thu Jul 10 11:16:48 2008 -0700
>
> x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines
>
> Move the read_apic_id() to genapic routines.

ok, i've moved it straight after that commit. (bisecting hitting exactly
that window is not an issue and it's better if we see in the history the
types of breakages certain changes can cause - that helps people who
research yet-unfixed crashes, etc.)

Ingo

2008-07-12 07:12:18

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

"Yinghai Lu" <[email protected]> writes:

> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
> <[email protected]> wrote:
>> "Yinghai Lu" <[email protected]> writes:
>>
>>> 1. wonder if x2apic can be use with uniprocessor.
>>>
>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>>>
>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>>
>>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
>>> SET_APIC_ID for different
>>> genapic like 32bit.
>>>
>>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
>>> and genapic still not changed, aka apic_flat...
>>> wonder if you need to call setup_apic_routing to set genapic.
>>>
>>> otherwise read_apic_id could have use the one from apic_flat....need
>>> to shift......
>>>
>>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
>>> but 32bit version seems like to put GET_APIC_ID with genapic...
>>>
>>> which one is better? 2 or 3
>>
>> Z finish untangle SMP support from apic initialization and move the apic
>> initialization up into init_IRQ.
>>
>> That is better but is likely the wrong short term approach.
>
> plan to add get_apic_id(x) into 64bit genapic, and will use
> #define GET_APIC_ID(x) genapic->get_apic_id(x)
> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
>
> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
>
> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> remapping...need to be used with x2apic...

Clustered logical mode won't work as it requires > 16 bits of apicid.
So only flat physical mode will work.

Eric

2008-07-12 07:50:04

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Sat, Jul 12, 2008 at 12:02 AM, Eric W. Biederman
<[email protected]> wrote:
> "Yinghai Lu" <[email protected]> writes:
>
>> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>> "Yinghai Lu" <[email protected]> writes:
>>>
>>>> 1. wonder if x2apic can be use with uniprocessor.
>>>>
>>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>>>>
>>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>>>
>>>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
>>>> SET_APIC_ID for different
>>>> genapic like 32bit.
>>>>
>>>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
>>>> and genapic still not changed, aka apic_flat...
>>>> wonder if you need to call setup_apic_routing to set genapic.
>>>>
>>>> otherwise read_apic_id could have use the one from apic_flat....need
>>>> to shift......
>>>>
>>>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
>>>> but 32bit version seems like to put GET_APIC_ID with genapic...
>>>>
>>>> which one is better? 2 or 3
>>>
>>> Z finish untangle SMP support from apic initialization and move the apic
>>> initialization up into init_IRQ.
>>>
>>> That is better but is likely the wrong short term approach.
>>
>> plan to add get_apic_id(x) into 64bit genapic, and will use
>> #define GET_APIC_ID(x) genapic->get_apic_id(x)
>> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
>>
>> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
>>
>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>> remapping...need to be used with x2apic...
>
> Clustered logical mode won't work as it requires > 16 bits of apicid.
> So only flat physical mode will work.

current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

YH

2008-07-12 08:22:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

"Yinghai Lu" <[email protected]> writes:

>>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>>> remapping...need to be used with x2apic...
>>
>> Clustered logical mode won't work as it requires > 16 bits of apicid.
>> So only flat physical mode will work.
>
> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

There is a fixed defined mapping between logical & physical mappings,
so that may not be an issue.

A logical cluster apicid is encoded with the high 16bits being the
cluster number, and the low 16bits being a bitmap of which core
in the cluster to send the irq to. It sounded like a single
cluster can not span multiple sockets.

So in practice if you have 2 sockets you have a cluster id of 1.
Which means physical apic ids over 16 and logical apicids over 65536.

Eric

2008-07-12 08:37:17

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Sat, Jul 12, 2008 at 1:11 AM, Eric W. Biederman
<[email protected]> wrote:
> "Yinghai Lu" <[email protected]> writes:
>
>>>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>>>> remapping...need to be used with x2apic...
>>>
>>> Clustered logical mode won't work as it requires > 16 bits of apicid.
>>> So only flat physical mode will work.
>>
>> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...
>
> There is a fixed defined mapping between logical & physical mappings,
> so that may not be an issue.
>
> A logical cluster apicid is encoded with the high 16bits being the
> cluster number, and the low 16bits being a bitmap of which core
> in the cluster to send the irq to. It sounded like a single
> cluster can not span multiple sockets.
>
> So in practice if you have 2 sockets you have a cluster id of 1.
> Which means physical apic ids over 16 and logical apicids over 65536.
is it like
0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
cluster only have 16

YH

2008-07-12 09:52:18

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

"Yinghai Lu" <[email protected]> writes:

>> So in practice if you have 2 sockets you have a cluster id of 1.
>> Which means physical apic ids over 16 and logical apicids over 65536.
> is it like
> 0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
> cluster only have 16

Yes. At least that is how I read the x2apic documentation.

Eric

2008-07-13 00:55:58

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 08:16:03PM -0700, Yinghai Lu wrote:
> 1. wonder if x2apic can be use with uniprocessor.

>From the theory yes.

> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>
> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>
> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
> SET_APIC_ID for different
> genapic like 32bit.

apic_write for APIC_ID in x2apic is ignored. As it is a RO register.

> 2 check_x2apic is called in setup_arch, but it only set apic_ops,

BIOS can handover the control to OS in x2apic mode, so we need
to check very early about the mode of the APIC to use proper
accesses (mem Vs reg). BIOS can hand over in x2apic if the platform
has > 8 bit apic id's.

> and genapic still not changed, aka apic_flat...
> wonder if you need to call setup_apic_routing to set genapic.
>
> otherwise read_apic_id could have use the one from apic_flat....need
> to shift......

for setup_apic_routing(), we really need to know the platform capbilities,
like intr-remapping etc. So we can;t do the genapic setup so early.

Ideally, read_apic_id() can be part of native apic_ops, but UV has
a different implementation.

And also, typically for boot processor, xapic id and x2apic id will be same.
xapic id needs shifting but not x2apic id.

And also, we can re-set the boot_cpu_physical_apicid, after enabling x2apic.
I was planning to look at this in the next patchset.

> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
> but 32bit version seems like to put GET_APIC_ID with genapic...

yeah, UV has a different version. so need to think more about the clean solution

thanks,
suresh

>
> which one is better? 2 or 3
>
> YH

2008-07-13 01:00:17

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Fri, Jul 11, 2008 at 11:17:02PM -0700, Yinghai Lu wrote:
> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> remapping...need to be used with x2apic...

We are updating the spec with more clarifications. But in short,
intr-remapping needs to be enabled prior to enabling x2apic in the CPU.

physical mode might work for < 255 id's for some. But it is HW
implementation specific and might not work from generation to generation.
chipsets or cpu's may drop the interrupts if cpu and chipset are
in different modes (one in legacy mode and another in extended).

So Intel is recommending to enable Intr-remapping before enabling x2apic.

thanks,
suresh

2008-07-13 01:01:17

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Sat, Jul 12, 2008 at 12:49:53AM -0700, Yinghai Lu wrote:
> On Sat, Jul 12, 2008 at 12:02 AM, Eric W. Biederman
> <[email protected]> wrote:
> > "Yinghai Lu" <[email protected]> writes:
> >
> >> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
> >> <[email protected]> wrote:
> >>> "Yinghai Lu" <[email protected]> writes:
> >>>
> >>>> 1. wonder if x2apic can be use with uniprocessor.
> >>>>
> >>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
> >>>>
> >>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
> >>>>
> >>>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
> >>>> SET_APIC_ID for different
> >>>> genapic like 32bit.
> >>>>
> >>>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
> >>>> and genapic still not changed, aka apic_flat...
> >>>> wonder if you need to call setup_apic_routing to set genapic.
> >>>>
> >>>> otherwise read_apic_id could have use the one from apic_flat....need
> >>>> to shift......
> >>>>
> >>>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
> >>>> but 32bit version seems like to put GET_APIC_ID with genapic...
> >>>>
> >>>> which one is better? 2 or 3
> >>>
> >>> Z finish untangle SMP support from apic initialization and move the apic
> >>> initialization up into init_IRQ.
> >>>
> >>> That is better but is likely the wrong short term approach.
> >>
> >> plan to add get_apic_id(x) into 64bit genapic, and will use
> >> #define GET_APIC_ID(x) genapic->get_apic_id(x)
> >> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
> >>
> >> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
> >>
> >> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> >> remapping...need to be used with x2apic...
> >
> > Clustered logical mode won't work as it requires > 16 bits of apicid.
> > So only flat physical mode will work.
>
> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

read_apic_id() corresponds to physical apic id and it is same
irrespective of whether we use logical cluster mode or physical mode.

thanks,
suresh

2008-07-13 01:02:56

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Sat, Jul 12, 2008 at 01:37:01AM -0700, Yinghai Lu wrote:
> is it like
> 0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
> cluster only have 16

for logical cluster id's, yes.

2008-07-13 01:32:54

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Sat, Jul 12, 2008 at 12:02:54AM -0700, Eric W. Biederman wrote:
> "Yinghai Lu" <[email protected]> writes:
> > also read the x2APIC spec pdf, it doesn't say anything about interrupt
> > remapping...need to be used with x2apic...
>
> Clustered logical mode won't work as it requires > 16 bits of apicid.
> So only flat physical mode will work.

Eric, As I mentioned just now in another thread, even flat physical mode
might not work, if cpu or chipset thinks that they are in different modes.

For example if CPU is in extended mode, chipset may block non-remapped
intr-messages and not fwd these to the cpu. So while in theory, physical mode
for < 255 apic id's may work, Intel is not validating and not recommending to
enable x2apic mode in the CPU with out enabling intr-remapping in the chipset.

thanks,
suresh

2008-07-16 14:43:56

by Yong Wang

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, Jul 10, 2008 at 09:53:20PM +0200, Ingo Molnar wrote:
>
> quite some stuff!
>
> For review and testing purposes i've created a new topic branch for
> this: tip/x86/x2apic and have picked up your patches into it.
>
> I've pushed it out, but it's not merged into tip/master yet (obviously,
> you sent this just a few minutes ago :)
>
> It integrates fine with tip/master. If you do this:
>
> git-checkout tip/master
> git-merge tip/x86/x2apic
>
> you'll get a clean merge.
>

I'm seeing the following build error using default config.

AS arch/x86/lib/csum-copy_64.o
arch/x86/lib/csum-copy_64.S: Assembler messages:
arch/x86/lib/csum-copy_64.S:48: Error: Macro `ignore' was already defined
make[1]: *** [arch/x86/lib/csum-copy_64.o] Error 1
make: *** [arch/x86/lib] Error 2
make: *** Waiting for unfinished jobs....

Anyone encountered the same problem or any idea of what's wrong?

Thanks
-Yong

2008-07-16 14:53:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Yong Wang <[email protected]> wrote:

> On Thu, Jul 10, 2008 at 09:53:20PM +0200, Ingo Molnar wrote:
> >
> > quite some stuff!
> >
> > For review and testing purposes i've created a new topic branch for
> > this: tip/x86/x2apic and have picked up your patches into it.
> >
> > I've pushed it out, but it's not merged into tip/master yet (obviously,
> > you sent this just a few minutes ago :)
> >
> > It integrates fine with tip/master. If you do this:
> >
> > git-checkout tip/master
> > git-merge tip/x86/x2apic
> >
> > you'll get a clean merge.
> >
>
> I'm seeing the following build error using default config.
>
> AS arch/x86/lib/csum-copy_64.o
> arch/x86/lib/csum-copy_64.S: Assembler messages:
> arch/x86/lib/csum-copy_64.S:48: Error: Macro `ignore' was already defined
> make[1]: *** [arch/x86/lib/csum-copy_64.o] Error 1
> make: *** [arch/x86/lib] Error 2
> make: *** Waiting for unfinished jobs....
>
> Anyone encountered the same problem or any idea of what's wrong?

that should be fixed already.

What is the output of:

git-log tip/master | head

? The latest should be:

commit a328874646a654eabb67da88b3d0a606e552ffe7
Merge: 760e48e... 9d3b08d...
Author: Ingo Molnar <[email protected]>
Date: Tue Jul 15 17:16:40 2008 +0200

and there this error should not occur.

Ingo

2008-07-22 20:50:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Thu, 10 Jul 2008 21:53:20 +0200
Ingo Molnar <[email protected]> wrote:

> For review and testing purposes i've created a new topic branch for
> this: tip/x86/x2apic and have picked up your patches into it.

This has today turned up in linux-next and I'm having to rework 2.6.27
patches to compensate for it.

Which means that I now need to wait until this work goes into mainline
before I can merge those patches or I need to undo those fixes, route
around Suresh's changes and then force you to fix the resulting damage.

What's the score here? Is this stuff going into 2.6.27?

2008-07-22 21:01:27

by Mike Travis

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

Andrew Morton wrote:
> On Thu, 10 Jul 2008 21:53:20 +0200
> Ingo Molnar <[email protected]> wrote:
>
>> For review and testing purposes i've created a new topic branch for
>> this: tip/x86/x2apic and have picked up your patches into it.
>
> This has today turned up in linux-next and I'm having to rework 2.6.27
> patches to compensate for it.
>
> Which means that I now need to wait until this work goes into mainline
> before I can merge those patches or I need to undo those fixes, route
> around Suresh's changes and then force you to fix the resulting damage.
>
> What's the score here? Is this stuff going into 2.6.27?

Hi Andrew,

Jack is out at OLS this week, but yes I believe we're trying to push this
into 6.2.27 as that is what the distro's will be basing their distributions
on when the new system sees the light of day.

Is there something I can do to help? Perhaps work on conflict resolutions?

Thanks,
Mike

2008-07-22 21:18:56

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

On Tue, 22 Jul 2008 14:00:47 -0700
Mike Travis <[email protected]> wrote:

> Andrew Morton wrote:
> > On Thu, 10 Jul 2008 21:53:20 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> >> For review and testing purposes i've created a new topic branch for
> >> this: tip/x86/x2apic and have picked up your patches into it.
> >
> > This has today turned up in linux-next and I'm having to rework 2.6.27
> > patches to compensate for it.
> >
> > Which means that I now need to wait until this work goes into mainline
> > before I can merge those patches or I need to undo those fixes, route
> > around Suresh's changes and then force you to fix the resulting damage.
> >
> > What's the score here? Is this stuff going into 2.6.27?
>
> Hi Andrew,
>
> Jack is out at OLS this week, but yes I believe we're trying to push this
> into 6.2.27 as that is what the distro's will be basing their distributions
> on when the new system sees the light of day.

2.6.27 material shuld not be turning up in linux-next halfway through
the merge window.

> Is there something I can do to help? Perhaps work on conflict resolutions?

It was a simple fix.

2008-07-24 05:06:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support


* Mike Travis <[email protected]> wrote:

> Andrew Morton wrote:
> > On Thu, 10 Jul 2008 21:53:20 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> >> For review and testing purposes i've created a new topic branch for
> >> this: tip/x86/x2apic and have picked up your patches into it.
> >
> > This has today turned up in linux-next and I'm having to rework 2.6.27
> > patches to compensate for it.
> >
> > Which means that I now need to wait until this work goes into mainline
> > before I can merge those patches or I need to undo those fixes, route
> > around Suresh's changes and then force you to fix the resulting damage.
> >
> > What's the score here? Is this stuff going into 2.6.27?
>
> Hi Andrew,
>
> Jack is out at OLS this week, but yes I believe we're trying to push
> this into 6.2.27 as that is what the distro's will be basing their
> distributions on when the new system sees the light of day.

Note that this is the generic x2apic work from Intel, not the SGI/UV
specific x2apic stuff. The SGI-specific code is upstream already. (and
as i understand it was based on an earlier version of the Intel code)

The Intel x2apic code was submitted before the merge window but was
indeed pushed to linux-next during the window because that's when its
integration became fully ready. We'd rather not hold the whole merge
window and linux-next hostage with not fully ready stuff :-)

Ingo