Hello,
I am trying to use i386 SRAT and it is not working. The srat code
(get_memcfg_from_srat) needs to map in the SRAT table during boot to see
all the numa information. It gets the RSDP just fine but when it looks
up the RSDT the header is empty (I tried to print out RSDT header and it
was empty) and it exits :(
Excerpts from my boot log....
get_memcfg_from_srat: assigning address to rsdp
RSD PTR v0 [IBM ]
ACPI: RSDT signature incorrect
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
Node: 0, start_pfn: 0, end_pfn: 156
Something is wrong.
A while later in the boot I see.
Using APIC driver default
ACPI: RSDP (v000 IBM ) @ 0x000fdfc0
ACPI: RSDT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c2c0
ACPI: FADT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c240
ACPI: MADT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c0c0
ACPI: SRAT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9bf40
ACPI: DSDT (v001 IBM SERVIGIL 0x00001000 INTL 0x02002025) @ 0x00000000
Looks like the RSDT table is there....
I haven't booted i386 numa Summit in a while and was wondering if anyone
had any ideas?
Thanks,
Keith
On Tue, 2006-09-12 at 19:18 -0700, keith mannthey wrote:
> Hello,
> I am trying to use i386 SRAT and it is not working. The srat code
> (get_memcfg_from_srat) needs to map in the SRAT table during boot to see
> all the numa information. It gets the RSDP just fine but when it looks
> up the RSDT the header is empty (I tried to print out RSDT header and it
> was empty) and it exits :(
>
> Excerpts from my boot log....
>
> get_memcfg_from_srat: assigning address to rsdp
> RSD PTR v0 [IBM ]
> ACPI: RSDT signature incorrect
> failed to get NUMA memory information from SRAT table
> NUMA - single node, flat memory mode
> Node: 0, start_pfn: 0, end_pfn: 156
>
> Something is wrong.
>
> A while later in the boot I see.
>
> Using APIC driver default
> ACPI: RSDP (v000 IBM ) @ 0x000fdfc0
> ACPI: RSDT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c2c0
> ACPI: FADT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c240
> ACPI: MADT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9c0c0
> ACPI: SRAT (v001 IBM SERVIGIL 0x00001000 IBM 0x45444f43) @ 0xeff9bf40
> ACPI: DSDT (v001 IBM SERVIGIL 0x00001000 INTL 0x02002025) @ 0x00000000
>
> Looks like the RSDT table it there....
>
> I haven't booted i386 numa Summit in a while and was wondering if anyone
> had any ideas?
I found something odd. I went back to a kernel that I knew had booted
(2.6.16) and it was still broken. I realized that I was running my
kernel builds with a new config file, so I started poking around. In the
end, changing the config like this (dropping kdump)
@@ -232,8 +232,8 @@
 # CONFIG_HZ_1000 is not set
 CONFIG_HZ=250
 CONFIG_KEXEC=y
-CONFIG_CRASH_DUMP=y
-CONFIG_PHYSICAL_START=0x1000000
+# CONFIG_CRASH_DUMP is not set
+CONFIG_PHYSICAL_START=0x100000
 CONFIG_HOTPLUG_CPU=y
 # CONFIG_COMPAT_VDSO is not set
 CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
@@ -2872,7 +2872,6 @@
 #
 CONFIG_PROC_FS=y
 CONFIG_PROC_KCORE=y
-CONFIG_PROC_VMCORE=y
 CONFIG_PROC_SYSCTL=y
 CONFIG_SYSFS=y
 CONFIG_TMPFS=y
allowed the SRAT to be mapped during boot. I highly suspect the change
in CONFIG_PHYSICAL_START from 0x100000 to 0x1000000 is to blame for the
change in the behavior of boot_ioremap.
The SRAT code uses boot_ioremap to map in the table. With the change of
the kernel start address, the mappings that boot_ioremap returns are
messed up (the data is *not* mapped at the address it returns).
There is some disconnect between boot_ioremap and different
CONFIG_PHYSICAL_START values. I suspect efi (it uses boot_ioremap) will
also be broken with kdump.
Well, that is what I found. It is not a new problem, just a previously
unused config.
Any ideas?
Thanks,
Keith
Keith, can you get printouts of the phys_addrs it is trying to use
there? In fact, can you print out all of the calls to all of the
functions and all of their arguments in that file?
I'd also be especially interested in what the actual pte values are that
are getting set, what their addresses are, and what the vaddr of
boot_ioremap_space[] is.
Also, it might be possible that this data somehow got pushed above the
8MB boundary. Getting me those addresses will let me check that.
-- Dave
On Thu, 2006-09-14 at 15:01 -0700, Dave Hansen wrote:
> Keith, can you get printouts of the phys_addrs it is trying to use
> there? In fact, can you print out all of the calls to all of the
> functions and all of their arguments in that file?
The call to boot_ioremap is always the same
(works)
BIOS-e820: 0000000000000000 - 000000000009c400 (usable)
BIOS-e820: 000000000009c400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000eff91840 (usable)
BIOS-e820: 00000000eff91840 - 00000000eff9c340 (ACPI data)
BIOS-e820: 00000000eff9c340 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001d0000000 (usable)
Node: 0, start_pfn: 0, end_pfn: 156
Node: 0, start_pfn: 256, end_pfn: 982929
Node: 0, start_pfn: 1048576, end_pfn: 1900544
get_memcfg_from_srat: assigning address to rsdp fdfc0
RSD PTR v0 [IBM ]
rsdp->rsdt_address eff9c2c0
boot_ioremap phys_addr = eff9c2c0 long = 44
boot_ioremap and I return c04da2c0
rsdt = c04da2c0 header is RSDT4
Begin SRAT table scan....
....
(does not work)
BIOS-e820: 0000000000000000 - 000000000009c400 (usable)
BIOS-e820: 000000000009c400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000eff91840 (usable)
BIOS-e820: 00000000eff91840 - 00000000eff9c340 (ACPI data)
BIOS-e820: 00000000eff9c340 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001d0000000 (usable)
Node: 0, start_pfn: 0, end_pfn: 156
Node: 0, start_pfn: 256, end_pfn: 982929
Node: 0, start_pfn: 1048576, end_pfn: 1900544
get_memcfg_from_srat: assigning address to rsdp fdfc0
RSD PTR v0 [IBM ]
rsdp->rsdt_address eff9c2c0
boot_ioremap phys_addr = eff9c2c0 long = 44
boot_ioremap and I return c13db2c0
rsdt = c13db2c0 header is
ACPI: RSDT signature incorrect
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
...
> Also, it might be possible that this data somehow got pushed above the
> 8MB boundary. Getting me those addresses will let me check that.
I think the kernel starts @ 16mb with i386 kdump.
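For reference, the two boot_ioremap return values can be checked against
the 8MB boundary by hand. A minimal sketch in user-space C, assuming the
default i386 PAGE_OFFSET of 0xC0000000 (the helper names are made up for
illustration and are not kernel functions):

```c
#include <stdint.h>

/* Assumption: default i386 3G/1G split, so lowmem virtual = physical + PAGE_OFFSET. */
#define PAGE_OFFSET 0xC0000000u
/* The boundary in question: the first 8MB of physical memory. */
#define BOOT_MAP_LIMIT (8u * 1024u * 1024u)

/* Hypothetical helper: user-space stand-in for the kernel's __pa() on lowmem. */
uint32_t virt_to_phys_lowmem(uint32_t vaddr)
{
	return vaddr - PAGE_OFFSET;
}

/* Hypothetical helper: returns 1 if the address lies at or above 8MB physical. */
int above_8mb(uint32_t vaddr)
{
	return virt_to_phys_lowmem(vaddr) >= BOOT_MAP_LIMIT;
}
```

Fed the values from the logs above, c04da2c0 (the working boot) translates
to physical 0x4da2c0, below 8MB, while c13db2c0 (the failing boot)
translates to 0x13db2c0, above it.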
Thanks,
Keith
How do the ptes look?
-- Dave
On Thu, Sep 14, 2006 at 03:43:50PM -0700, keith mannthey wrote:
> On Thu, 2006-09-14 at 15:01 -0700, Dave Hansen wrote:
> > Keith, can you get printouts of the phys_addrs it is trying to use
> > there? In fact, can you print out all of the calls to all of the
> > functions and all of their arguments in that file?
> The call to boot_ioremap is always the same
>
> (works)
> BIOS-e820: 0000000000000000 - 000000000009c400 (usable)
> BIOS-e820: 000000000009c400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000eff91840 (usable)
> BIOS-e820: 00000000eff91840 - 00000000eff9c340 (ACPI data)
> BIOS-e820: 00000000eff9c340 - 00000000f0000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 00000001d0000000 (usable)
> Node: 0, start_pfn: 0, end_pfn: 156
> Node: 0, start_pfn: 256, end_pfn: 982929
> Node: 0, start_pfn: 1048576, end_pfn: 1900544
> get_memcfg_from_srat: assigning address to rsdp fdfc0
> RSD PTR v0 [IBM ]
> rsdp->rsdt_address eff9c2c0
> boot_ioremap phys_addr = eff9c2c0 long = 44
> boot_ioremap and I return c04da2c0
> rsdt = c04da2c0 header is RSDT4
> Begin SRAT table scan....
> ....
>
> (no works)
> BIOS-e820: 0000000000000000 - 000000000009c400 (usable)
> BIOS-e820: 000000000009c400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000eff91840 (usable)
> BIOS-e820: 00000000eff91840 - 00000000eff9c340 (ACPI data)
> BIOS-e820: 00000000eff9c340 - 00000000f0000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 00000001d0000000 (usable)
> Node: 0, start_pfn: 0, end_pfn: 156
> Node: 0, start_pfn: 256, end_pfn: 982929
> Node: 0, start_pfn: 1048576, end_pfn: 1900544
> get_memcfg_from_srat: assigning address to rsdp fdfc0
> RSD PTR v0 [IBM ]
> rsdp->rsdt_address eff9c2c0
> boot_ioremap phys_addr = eff9c2c0 long = 44
> boot_ioremap and I return c13db2c0
> rsdt = c13db2c0 header is
> ACPI: RSDT signature incorrect
> failed to get NUMA memory information from SRAT table
> NUMA - single node, flat memory mode
> ...
>
>
> > Also, it might be possible that this data somehow got pushed above the
> > 8MB boundary. Getting me those addresses will let me check that.
>
> I think the kernel starts @ 16mb with i386 kdump.
Yes, for kdump we generally start at an address other than 1MB. But in
the normal case we should also be able to boot at physical address 16MB.
I think I know what is going wrong here. boot_ioremap() assumes
that only the first 8MB of physical memory is mapped, and while
calculating the index into the page table (boot_pte_index) we truncate
any higher address bits.
When we compile the kernel for address 16MB, the initial boot page tables
map much more physical memory (roughly 20MB, assuming the kernel size to
be 4MB). So roughly 5 page table pages are required, say pg0, pg1, pg2,
pg3 and pg4. The kernel will most likely be mapped in pg4 and pg5. But
the current boot_ioremap() logic will map the new physical address either
in pg0 or pg1. Hence we don't update the right pte and end up
reading the wrong data.
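To make the truncation concrete, here is a minimal user-space sketch of
the index calculation. It assumes PAGE_SHIFT is 12, that the mask keeps
2048 entries (two page tables of 1024 ptes each, matching the 8MB figure
above), and a PAGE_OFFSET of 0xC0000000. The function names are invented
for illustration; c13db000 is the page-aligned form of the address
returned in Keith's failing log.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_OFFSET   0xC0000000u /* assumption: default i386 split */
#define PTRS_PER_PTE  1024u       /* assumption: non-PAE boot page tables */
#define BOOT_PTE_PTRS (PTRS_PER_PTE * 2u) /* assumption: pg0 + pg1 only */

/* Index with the high bits masked off, as described above. */
uint32_t masked_pte_index(uint32_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1u);
}

/* Page frame number of the physical address, with nothing masked off. */
uint32_t unmasked_pfn(uint32_t vaddr)
{
	return (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
}

/* Which page-table page (0 for pg0, 1 for pg1, ...) an index falls in. */
uint32_t page_table_number(uint32_t index)
{
	return index / PTRS_PER_PTE;
}
```

For c13db000 the masked index is 0x3db, which falls in pg0, whereas the
page frame of its physical address is 0x13db, which falls in pg4.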
I hope I understand the code right. :-)
Thanks
Vivek
On Thu, 2006-09-14 at 15:48 -0700, Dave Hansen wrote:
> How do the ptes look?
rsdp->rsdt_address eff9c2c0
boot_ioremap phys_addr = eff9c2c0 long = 44
__boot_ioremap phys_addr = eff9c000 pages = 1 source c13db000
setting pte c1682f6c to eff9c063
just flushed c13db000
boot_ioremap and I return c13db2c0
rsdt = c13db2c0 header is
ACPI: RSDT signature incorrect
The pte we get back from boot_vaddr_to_pte looks to be off (or the data
is). Seems odd that we set the pte at c1682f6c and then flush c13db000....
Still poking around. I just read Vivek's mail; he seems to be onto
something.
Thanks,
Keith
On Thu, 2006-09-14 at 19:04 -0400, Vivek Goyal wrote:
> I think I know what is going on wrong here. boot_ioremap() is assuming
> that only first 8MB of physical memory is being mapped and while
> calculating the index into page table (boot_pte_index) we will truncate
> any higher address bits.
Vivek, are those pte pages still all contiguous?
Yeah, that's probably it. Keith, I'm trying to think of reasons why we
need the mask here:
#define boot_pte_index(address) \
	(((address) >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1))
and I can't think of any other than just masking out the top of the
virtual address. You could do this a bunch of other ways, like __pa().
This might just work:
static unsigned long boot_pte_index(unsigned long vaddr)
{
	return __pa(vaddr) >> PAGE_SHIFT;
}
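As a sanity check of the two variants against Keith's trace, here is a
minimal user-space sketch. It assumes 4-byte ptes laid out contiguously
from pg0, a PAGE_OFFSET of 0xC0000000, and a 2048-entry mask; the pg0
address of c1682000 is inferred from the "setting pte c1682f6c" line and
is not printed anywhere in the logs. All names other than PAGE_SHIFT and
BOOT_PTE_PTRS are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_OFFSET   0xC0000000u /* assumption: default i386 split */
#define BOOT_PTE_PTRS 2048u       /* assumption: pg0 + pg1, 1024 ptes each */
#define BOOT_PTE_SIZE 4u          /* assumption: 4-byte boot ptes */

/* Index as computed by the existing macro. */
uint32_t boot_pte_index_masked(uint32_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1u);
}

/* Index as computed by the proposed function; the subtraction stands in for __pa(). */
uint32_t boot_pte_index_pa(uint32_t vaddr)
{
	return (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
}

/* Virtual address of the pte slot for an index, counting from the start of pg0. */
uint32_t boot_pte_slot(uint32_t pg0_vaddr, uint32_t index)
{
	return pg0_vaddr + index * BOOT_PTE_SIZE;
}
```

With pg0 at c1682000 and vaddr c13db000, the masked index gives slot
c1682f6c (the pte in the failing trace), while the __pa() based index
gives slot c1686f6c, four page-table pages further on. For the non-kdump
address c04da000 both variants give the same index, which would explain
why the mask went unnoticed.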
-- Dave
On Thu, Sep 14, 2006 at 04:21:33PM -0700, Dave Hansen wrote:
> On Thu, 2006-09-14 at 19:04 -0400, Vivek Goyal wrote:
> > I think I know what is going on wrong here. boot_ioremap() is assuming
> > that only first 8MB of physical memory is being mapped and while
> > calculating the index into page table (boot_pte_index) we will truncate
> > any higher address bits.
>
> Vivek, are those pte pages still all contiguous?
>
Yes, they are. (arch/i386/kernel/head.S)
> Yeah, that's probably it. Keith, I'm trying to think of reasons why we
> need the mask here:
>
> #define boot_pte_index(address) \
> (((address) >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1))
>
> and I can't think of any other than just masking out the top of the
> virtual address. You could do this a bunch of other ways, like __pa().
>
> This might just work:
>
> static unsigned long boot_pte_index(unsigned long vaddr)
> {
> return __pa(vaddr) >> PAGE_SHIFT;
> }
>
This looks good. Should work.
Thanks
Vivek
On Thu, 2006-09-14 at 16:21 -0700, Dave Hansen wrote:
> On Thu, 2006-09-14 at 19:04 -0400, Vivek Goyal wrote:
> > I think I know what is going on wrong here. boot_ioremap() is assuming
> > that only first 8MB of physical memory is being mapped and while
> > calculating the index into page table (boot_pte_index) we will truncate
> > any higher address bits.
>
> Vivek, are those pte pages still all contiguous?
>
> Yeah, that's probably it. Keith, I'm trying to think of reasons why we
> need the mask here:
>
> #define boot_pte_index(address) \
> (((address) >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1))
>
> and I can't think of any other than just masking out the top of the
> virtual address. You could do this a bunch of other ways, like __pa().
>
> This might just work:
>
> static unsigned long boot_pte_index(unsigned long vaddr)
> {
> return __pa(vaddr) >> PAGE_SHIFT;
> }
With
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
and the above __pa(vaddr) >> PAGE_SHIFT change, things worked for me, but
I am still confused as to why the pte we set is at c1686f6c while we flush
and return c13db000.
get_memcfg_from_srat: assigning address to rsdp fdfc0
RSD PTR v0 [IBM ]
rsdp->rsdt_address eff9c2c0
boot_ioremap phys_addr = eff9c2c0 long = 44
__boot_ioremap phys_addr = eff9c000 pages = 1 source c13db000
setting pte c1686f6c to eff9c063
just flushed c13db000
boot_ioremap and I return c13db2c0
rsdt = c13db2c0 header is RSDT4
Begin SRAT table scan....
thanks,
Keith
On Thu, Sep 14, 2006 at 04:43:33PM -0700, keith mannthey wrote:
> On Thu, 2006-09-14 at 16:21 -0700, Dave Hansen wrote:
> > On Thu, 2006-09-14 at 19:04 -0400, Vivek Goyal wrote:
> > > I think I know what is going on wrong here. boot_ioremap() is assuming
> > > that only first 8MB of physical memory is being mapped and while
> > > calculating the index into page table (boot_pte_index) we will truncate
> > > any higher address bits.
> >
> > Vivek, are those pte pages still all contiguous?
> >
> > Yeah, that's probably it. Keith, I'm trying to think of reasons why we
> > need the mask here:
> >
> > #define boot_pte_index(address) \
> > (((address) >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1))
> >
> > and I can't think of any other than just masking out the top of the
> > virtual address. You could do this a bunch of other ways, like __pa().
> >
> > This might just work:
> >
> > static unsigned long boot_pte_index(unsigned long vaddr)
> > {
> > return __pa(vaddr) >> PAGE_SHIFT;
> > }
> With
> CONFIG_KEXEC=y
> CONFIG_CRASH_DUMP=y
> CONFIG_PHYSICAL_START=0x1000000
> and the above __pa(vaddr) >> PAGE_SHIFT changed things worked for me but
> I am still confused as to why the pte we set is c1686f6c and we flush
> and return c13db000.
I think c13db000 is the virtual address of the symbol boot_ioremap_space,
and we are remapping this virtual address to physical address eff9c000
(eff9c063 is the pte value, i.e. that address plus the flag bits);
that's why we flush the tlb for this virtual address.
I guess c1686f6c is the virtual address of the PTE itself (somewhere
between pg0 and pg5). It is independent of the actual virtual address
(boot_ioremap_space) being remapped.
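The three numbers in the trace can be tied together with a small
user-space sketch. The flag bit values are the standard x86 32-bit pte
bits; the function names are made up for illustration and are not the
kernel's.

```c
#include <stdint.h>

#define PAGE_MASK 0xFFFFF000u

/* Standard x86 pte flag bits (low 12 bits of a 32-bit pte). */
#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* Physical page frame address stored in a pte value. */
uint32_t pte_frame(uint32_t pte)
{
	return pte & PAGE_MASK;
}

/* Flag bits stored in a pte value. */
uint32_t pte_flags(uint32_t pte)
{
	return pte & ~PAGE_MASK;
}

/*
 * Address handed back to the caller: the remapped window plus the
 * offset of the requested physical address within its page.
 */
uint32_t mapped_vaddr(uint32_t window_vaddr, uint32_t phys_addr)
{
	return window_vaddr + (phys_addr & ~PAGE_MASK);
}
```

Here eff9c063 splits into frame eff9c000 and flags 0x063 (present,
writable, accessed, dirty), and the window c13db000 plus the in-page
offset of eff9c2c0 gives the returned c13db2c0. The address of the pte
slot itself does not enter into either calculation.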
Thanks
Vivek
>
> get_memcfg_from_srat: assigning address to rsdp fdfc0
> RSD PTR v0 [IBM ]
> rsdp->rsdt_address eff9c2c0
> boot_ioremap phys_addr = eff9c2c0 long = 44
> __boot_ioremap phys_addr = eff9c000 pages = 1 source c13db000
> setting pte c1686f6c to eff9c063
> just flushed c13db000
> boot_ioremap and I return c13db2c0
> rsdt = c13db2c0 header is RSDT4
> Begin SRAT table scan....
>
> thanks,
> Keith
>
On Thu, 2006-09-14 at 19:59 -0400, Vivek Goyal wrote:
> On Thu, Sep 14, 2006 at 04:43:33PM -0700, keith mannthey wrote:
> > On Thu, 2006-09-14 at 16:21 -0700, Dave Hansen wrote:
> > > On Thu, 2006-09-14 at 19:04 -0400, Vivek Goyal wrote:
> > > > I think I know what is going on wrong here. boot_ioremap() is assuming
> > > > that only first 8MB of physical memory is being mapped and while
> > > > calculating the index into page table (boot_pte_index) we will truncate
> > > > any higher address bits.
> > >
> > > Vivek, are those pte pages still all contiguous?
> > >
> > > Yeah, that's probably it. Keith, I'm trying to think of reasons why we
> > > need the mask here:
> > >
> > > #define boot_pte_index(address) \
> > > (((address) >> PAGE_SHIFT) & (BOOT_PTE_PTRS - 1))
> > >
> > > and I can't think of any other than just masking out the top of the
> > > virtual address. You could do this a bunch of other ways, like __pa().
> > >
> > > This might just work:
> > >
> > > static unsigned long boot_pte_index(unsigned long vaddr)
> > > {
> > > return __pa(vaddr) >> PAGE_SHIFT;
> > > }
> > With
> > CONFIG_KEXEC=y
> > CONFIG_CRASH_DUMP=y
> > CONFIG_PHYSICAL_START=0x1000000
> > and the above __pa(vaddr) >> PAGE_SHIFT changed things worked for me but
> > I am still confused as to why the pte we set is c1686f6c and we flush
> > and return c13db000.
>
> I think c13db000 is the virtual address of symbol boot_ioremap_space,
> and we are remapping this virtual address to physical addres eff9c063,
> that's why we flush tlb for this virtual address.
>
> I guess c1686f6c is virtual address of PTE (somewhere between pg0 to pg5).
> It is independent of actual virtual address (boot_ioremap_space) being
> remapped.
Yup, thanks.
Dave, do you want to send out a patch?
Thanks,
Keith