2009-12-15 11:51:08

by Jens Axboe

Subject: kexec boot regression

Hi,

I have this big box that takes forever to boot, so I use kexec to boot
into new kernels. Works fine, but some time past 2.6.32 it stopped
working. Instead of wasting brain cycles on finding out why, I handed
the problem to my trusty regression friend - git bisect.

This is what it found (sorry Yinghai it's you again, you owe me a beer
for hours of 2.6.32-git bisecting ;-)


99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
Author: Yinghai Lu <[email protected]>
Date: Sun Oct 4 21:54:24 2009 -0700

x86/PCI: read root resources from IOH on Intel

For intel systems with multi IOH, we should read peer root resources
directly from PCI config space, and don't trust _CRS.


I could not revert this single commit, as a further commit made other
changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
that this kernel then works fine.

With current -git, I get tons and tons of:

[ 16.841724] pci 0000:00:01.0: BAR 7: no parent found for bridge [io 0x6000-0x6fff]
[ 16.850368] pci 0000:00:01.0: BAR 7: can't allocate [io 0x6000-0x6fff]
[ 16.857821] pci 0000:00:01.0: BAR 8: no parent found for bridge [mem 0x9bc00000-0x9bcfffff]
[ 16.867238] pci 0000:00:01.0: BAR 8: can't allocate [mem 0x9bc00000-0x9bcfffff]
[ 16.875492] pci 0000:00:02.0: BAR 7: no parent found for bridge [io 0x5000-0x5fff]
[ 16.884137] pci 0000:00:02.0: BAR 7: can't allocate [io 0x5000-0x5fff]
[ 16.891591] pci 0000:00:02.0: BAR 8: no parent found for bridge [mem 0x9bb00000-0x9bbfffff]
[ 16.901010] pci 0000:00:02.0: BAR 8: can't allocate [mem 0x9bb00000-0x9bbfffff]
[ 16.909264] pci 0000:00:03.0: BAR 7: no parent found for bridge [io 0x4000-0x4fff]
[ 16.917908] pci 0000:00:03.0: BAR 7: can't allocate [io 0x4000-0x4fff]
[...]

I can provide a full log if needed.

--
Jens Axboe


2009-12-15 12:03:34

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> Hi,
>
> I have this big box that takes forever to boot, so I use kexec to boot
> into new kernels. Works fine, but some time past 2.6.32 it stopped
> working. Instead of wasting brain cycles on finding out why, I handed
> the problem to my trusty regression friend - git bisect.
>
> This is what it found (sorry Yinghai it's you again, you owe me a beer
> for hours of 2.6.32-git bisecting ;-)

sure.

>
>
> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
> Author: Yinghai Lu <[email protected]>
> Date: Sun Oct 4 21:54:24 2009 -0700
>
> x86/PCI: read root resources from IOH on Intel
>
> For intel systems with multi IOH, we should read peer root resources
> directly from PCI config space, and don't trust _CRS.
>
>
> I could not revert this single commit, as a further commit made other
> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
> that this kernel then works fine.
>

let's see how the BIOS messed it up again!

> With current -git, I get tons and tons of:
>
> [ 16.841724] pci 0000:00:01.0: BAR 7: no parent found for bridge [io 0x6000-0x6fff]
> [ 16.850368] pci 0000:00:01.0: BAR 7: can't allocate [io 0x6000-0x6fff]
> [ 16.857821] pci 0000:00:01.0: BAR 8: no parent found for bridge [mem 0x9bc00000-0x9bcfffff]
> [ 16.867238] pci 0000:00:01.0: BAR 8: can't allocate [mem 0x9bc00000-0x9bcfffff]
> [ 16.875492] pci 0000:00:02.0: BAR 7: no parent found for bridge [io 0x5000-0x5fff]
> [ 16.884137] pci 0000:00:02.0: BAR 7: can't allocate [io 0x5000-0x5fff]
> [ 16.891591] pci 0000:00:02.0: BAR 8: no parent found for bridge [mem 0x9bb00000-0x9bbfffff]
> [ 16.901010] pci 0000:00:02.0: BAR 8: can't allocate [mem 0x9bb00000-0x9bbfffff]
> [ 16.909264] pci 0000:00:03.0: BAR 7: no parent found for bridge [io 0x4000-0x4fff]
> [ 16.917908] pci 0000:00:03.0: BAR 7: can't allocate [io 0x4000-0x4fff]
> [...]
>
> I can provide a full log if needed.

please.

YH

2009-12-15 12:14:41

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > Hi,
> >
> > I have this big box that takes forever to boot, so I use kexec to boot
> > into new kernels. Works fine, but some time past 2.6.32 it stopped
> > working. Instead of wasting brain cycles on finding out why, I handed
> > the problem to my trusty regression friend - git bisect.
> >
> > This is what it found (sorry Yinghai it's you again, you owe me a beer
> > for hours of 2.6.32-git bisecting ;-)
>
> sure.
>
> >
> >
> > 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
> > commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
> > Author: Yinghai Lu <[email protected]>
> > Date: Sun Oct 4 21:54:24 2009 -0700
> >
> > x86/PCI: read root resources from IOH on Intel
> >
> > For intel systems with multi IOH, we should read peer root resources
> > directly from PCI config space, and don't trust _CRS.
> >
> >
> > I could not revert this single commit, as a further commit made other
> > changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
> > that this kernel then works fine.
> >
>
> let see how BIOS mess it up again!

Heh, I had a feeling this was coming :-)

> please.

Please find two logs attached - one from a boot with -git and the two
patches reverted, and one from a boot with -git.

--
Jens Axboe


Attachments:
(No filename) (1.31 kB)
good-boot.log.gz (15.54 kB)
bad-boot.log.gz (14.39 kB)

2009-12-15 12:32:55

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> Hi,
>>>
>>> I have this big box that takes forever to boot, so I use kexec to boot
>>> into new kernels. Works fine, but some time past 2.6.32 it stopped
>>> working. Instead of wasting brain cycles on finding out why, I handed
>>> the problem to my trusty regression friend - git bisect.
>>>
>>> This is what it found (sorry Yinghai it's you again, you owe me a beer
>>> for hours of 2.6.32-git bisecting ;-)
>> sure.
>>
>>>
>>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
>>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
>>> Author: Yinghai Lu <[email protected]>
>>> Date: Sun Oct 4 21:54:24 2009 -0700
>>>
>>> x86/PCI: read root resources from IOH on Intel
>>>
>>> For intel systems with multi IOH, we should read peer root resources
>>> directly from PCI config space, and don't trust _CRS.
>>>
>>>
>>> I could not revert this single commit, as a further commit made other
>>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
>>> that this kernel then works fine.
>>>
>> let see how BIOS mess it up again!
>
> Heh, I had a feeling this was coming :-)
>
>> please.
>
> Please find two logs attached - one from a boot with -git and the two
> patches reverted, and one from a boot with -git.

please enable CONFIG_PCI_DEBUG and boot with "debug" on the boot command line.

Thanks

Yinghai

2009-12-15 12:39:53

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> Hi,
> >>>
> >>> I have this big box that takes forever to boot, so I use kexec to boot
> >>> into new kernels. Works fine, but some time past 2.6.32 it stopped
> >>> working. Instead of wasting brain cycles on finding out why, I handed
> >>> the problem to my trusty regression friend - git bisect.
> >>>
> >>> This is what it found (sorry Yinghai it's you again, you owe me a beer
> >>> for hours of 2.6.32-git bisecting ;-)
> >> sure.
> >>
> >>>
> >>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
> >>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
> >>> Author: Yinghai Lu <[email protected]>
> >>> Date: Sun Oct 4 21:54:24 2009 -0700
> >>>
> >>> x86/PCI: read root resources from IOH on Intel
> >>>
> >>> For intel systems with multi IOH, we should read peer root resources
> >>> directly from PCI config space, and don't trust _CRS.
> >>>
> >>>
> >>> I could not revert this single commit, as a further commit made other
> >>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
> >>> that this kernel then works fine.
> >>>
> >> let see how BIOS mess it up again!
> >
> > Heh, I had a feeling this was coming :-)
> >
> >> please.
> >
> > Please find two logs attached - one from a boot with -git and the two
> > patches reverted, and one from a boot with -git.
>
> please enabled CONFIG_PCI_DEBUG and boot with debug in boot command line.

On the good or bad kernel?

--
Jens Axboe

2009-12-15 12:56:58

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> Hi,
>>>>>
>>>>> I have this big box that takes forever to boot, so I use kexec to boot
>>>>> into new kernels. Works fine, but some time past 2.6.32 it stopped
>>>>> working. Instead of wasting brain cycles on finding out why, I handed
>>>>> the problem to my trusty regression friend - git bisect.
>>>>>
>>>>> This is what it found (sorry Yinghai it's you again, you owe me a beer
>>>>> for hours of 2.6.32-git bisecting ;-)
>>>> sure.
>>>>
>>>>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
>>>>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
>>>>> Author: Yinghai Lu <[email protected]>
>>>>> Date: Sun Oct 4 21:54:24 2009 -0700
>>>>>
>>>>> x86/PCI: read root resources from IOH on Intel
>>>>>
>>>>> For intel systems with multi IOH, we should read peer root resources
>>>>> directly from PCI config space, and don't trust _CRS.
>>>>>
>>>>>
>>>>> I could not revert this single commit, as a further commit made other
>>>>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
>>>>> that this kernel then works fine.
>>>>>
>>>> let see how BIOS mess it up again!
>>> Heh, I had a feeling this was coming :-)
>>>
>>>> please.
>>> Please find two logs attached - one from a boot with -git and the two
>>> patches reverted, and one from a boot with -git.
>> please enabled CONFIG_PCI_DEBUG and boot with debug in boot command line.
>
> On the good or bad kernel?

both please.

YH

2009-12-15 14:11:15

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> Jens Axboe wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have this big box that takes forever to boot, so I use kexec to boot
> >>>>> into new kernels. Works fine, but some time past 2.6.32 it stopped
> >>>>> working. Instead of wasting brain cycles on finding out why, I handed
> >>>>> the problem to my trusty regression friend - git bisect.
> >>>>>
> >>>>> This is what it found (sorry Yinghai it's you again, you owe me a beer
> >>>>> for hours of 2.6.32-git bisecting ;-)
> >>>> sure.
> >>>>
> >>>>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
> >>>>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
> >>>>> Author: Yinghai Lu <[email protected]>
> >>>>> Date: Sun Oct 4 21:54:24 2009 -0700
> >>>>>
> >>>>> x86/PCI: read root resources from IOH on Intel
> >>>>>
> >>>>> For intel systems with multi IOH, we should read peer root resources
> >>>>> directly from PCI config space, and don't trust _CRS.
> >>>>>
> >>>>>
> >>>>> I could not revert this single commit, as a further commit made other
> >>>>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
> >>>>> that this kernel then works fine.
> >>>>>
> >>>> let see how BIOS mess it up again!
> >>> Heh, I had a feeling this was coming :-)
> >>>
> >>>> please.
> >>> Please find two logs attached - one from a boot with -git and the two
> >>> patches reverted, and one from a boot with -git.
> >> please enabled CONFIG_PCI_DEBUG and boot with debug in boot command line.
> >
> > On the good or bad kernel?
>
> both please.

Attached.

--
Jens Axboe


Attachments:
(No filename) (1.70 kB)
good-log-debug.txt.gz (40.75 kB)
bad-log-debug.txt.gz (37.82 kB)

2009-12-15 18:42:43

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>
>>>>>> let see how BIOS mess it up again!
>>>>> Heh, I had a feeling this was coming :-)

[ 0.000000] user-defined physical RAM map:
[ 0.000000] user: 0000000000000100 - 0000000000098800 (usable)
[ 0.000000] user: 0000000000098800 - 00000000000a0000 (reserved)
[ 0.000000] user: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] user: 0000000000100000 - 0000000078c63000 (usable)
[ 0.000000] user: 0000000078c63000 - 0000000078e77000 (ACPI NVS)
[ 0.000000] user: 0000000078e77000 - 000000007924e000 (ACPI data)
[ 0.000000] user: 000000007924e000 - 00000000792c2000 (reserved)
[ 0.000000] user: 00000000792c2000 - 00000000792d2000 (ACPI data)
[ 0.000000] user: 00000000792d2000 - 00000000792e7000 (reserved)
[ 0.000000] user: 00000000792e7000 - 0000000079301000 (ACPI data)
[ 0.000000] user: 0000000079301000 - 0000000079303000 (reserved)
[ 0.000000] user: 0000000079303000 - 0000000079305000 (ACPI data)
[ 0.000000] user: 0000000079305000 - 0000000079310000 (reserved)
[ 0.000000] user: 0000000079310000 - 0000000079314000 (ACPI data)
[ 0.000000] user: 0000000079314000 - 0000000079319000 (reserved)
[ 0.000000] user: 0000000079319000 - 0000000079336000 (ACPI data)
[ 0.000000] user: 0000000079336000 - 0000000079358000 (reserved)
[ 0.000000] user: 0000000079358000 - 0000000079388000 (ACPI data)
[ 0.000000] user: 0000000079388000 - 00000000793c9000 (reserved)
[ 0.000000] user: 00000000793c9000 - 000000007968f000 (ACPI data)
[ 0.000000] user: 000000007968f000 - 00000000796bb000 (reserved)
[ 0.000000] user: 00000000796bb000 - 00000000799d8000 (ACPI data)
[ 0.000000] user: 00000000799d8000 - 0000000079bd8000 (ACPI NVS)
[ 0.000000] user: 0000000079bd8000 - 0000000079d87000 (ACPI data)
[ 0.000000] user: 0000000079d87000 - 0000000079d8a000 (reserved)
[ 0.000000] user: 0000000079d8a000 - 0000000079dca000 (ACPI data)
[ 0.000000] user: 0000000079dca000 - 0000000079dcb000 (reserved)
[ 0.000000] user: 0000000079dcb000 - 0000000079e1c000 (ACPI data)
[ 0.000000] user: 0000000079e1c000 - 0000000079e87000 (reserved)
[ 0.000000] user: 0000000079e87000 - 000000007bd5f000 (ACPI data)
[ 0.000000] user: 000000007bd5f000 - 000000007be4f000 (reserved)
[ 0.000000] user: 000000007be4f000 - 000000007bf87000 (ACPI data)
[ 0.000000] user: 0000000100000000 - 0000001080000000 (usable)
...
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] ACPI: [SRAT:0x01] ignored 16 entries of 32 found
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.
[ 0.000000] No NUMA configuration found

so SRAT is broken?
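As a cross-check on the 65419MB figure in the SRAT message, the "(usable)" ranges of the user-defined map can be summed by hand. The following is only a sketch: the helper name usable_mb is made up for illustration, and it assumes the line layout shown in the log above. Fed the three usable ranges from that log, it arrives at the same 65419MB the kernel reports.

```shell
#!/bin/sh
# Sum the "(usable)" ranges of an e820-style map read on stdin and print the
# total in whole MB. Lines are expected to look like the log above, e.g.
#   user: 0000000000100000 - 0000000078c63000 (usable)
# usable_mb is a hypothetical helper name, not part of any tool.
usable_mb() (
    # Reduce each usable line to "0xSTART 0xEND"; all other lines are dropped.
    sed -n 's/.*\([0-9a-fA-F]\{16\}\) - \([0-9a-fA-F]\{16\}\) (usable).*/0x\1 0x\2/p' |
    {
        total=0
        while read -r start end; do
            total=$(( total + $end - $start ))
        done
        # Bytes to whole megabytes.
        echo $(( total >> 20 ))
    }
)

# The three usable ranges from the kexec'ed boot log.
usable_mb <<'EOF'
[ 0.000000] user: 0000000000000100 - 0000000000098800 (usable)
[ 0.000000] user: 0000000000100000 - 0000000078c63000 (usable)
[ 0.000000] user: 0000000100000000 - 0000001080000000 (usable)
EOF
```

The same function can be pointed at a saved dmesg file, for example `usable_mb < bad-log-debug.txt`.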

if (max_entries && count > max_entries) {
	printk(KERN_WARNING PREFIX "[%4.4s:0x%02x] ignored %i entries of "
	       "%i found\n", id, entry_id, count - max_entries, count);
}
...

or what is your CONFIG_NODES_SHIFT? 3? can you try to set it to 6?
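To see why CONFIG_NODES_SHIFT could matter for the "ignored 16 entries of 32 found" line, here is a rough arithmetic sketch. It rests on an assumption not stated in this thread: that the SRAT memory-affinity parse limit is twice the maximum node count, and that the maximum node count is 1 << CONFIG_NODES_SHIFT. The helper name srat_memblk_limit is made up.

```shell
#!/bin/sh
# ASSUMPTION (not confirmed in this thread): the number of SRAT memory
# entries the kernel will parse is 2 * (1 << CONFIG_NODES_SHIFT).
# srat_memblk_limit is a hypothetical helper name.
srat_memblk_limit() {
    echo $(( (1 << $1) * 2 ))
}

for nodes_shift in 3 6; do
    printf 'CONFIG_NODES_SHIFT=%d: at most %d SRAT memory entries parsed\n' \
        "$nodes_shift" "$(srat_memblk_limit "$nodes_shift")"
done
```

Under that assumption a shift of 3 gives a limit of 16, which would fit 16 of the 32 entries being ignored, while a shift of 6 leaves room for all of them.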

[ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
[ 13.112475] PCI: not using MMCONFIG
[ 13.206650] ACPI: No dock devices found.

so mmconf is not used... <please ask for a BIOS fix!>

then we get

[ 13.990335] IOH bus: [00, 00]
[ 13.993707] IOH bus: 00 index 0 io port: [0, fff]
[ 13.999023] IOH bus: 00 index 1 mmio: [0, ffffff]
[ 14.004335] IOH bus: 00 index 2 mmio: [0, 3ffffff]

please check

[PATCH] x86/pci: intel ioh bus num reg accessing fix

The IOH bus number register is above offset 0x100, so it cannot be read
when mmconf is not enabled; skip the IOH resource reading in that case.

Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/pci/intel_bus.c | 4 ++++
1 file changed, 4 insertions(+)

Index: linux-2.6/arch/x86/pci/intel_bus.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/intel_bus.c
+++ linux-2.6/arch/x86/pci/intel_bus.c
@@ -49,6 +49,10 @@ static void __devinit pci_root_bus_res(s
 	u64 mmioh_base, mmioh_end;
 	int bus_base, bus_end;
 
+	/* some sys doesn't get mmconf enabled */
+	if (dev->cfg_size < 0x200)
+		return;
+
 	if (pci_root_num >= PCI_ROOT_NR) {
 		printk(KERN_DEBUG "intel_bus.c: PCI_ROOT_NR is too small\n");
 		return;
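
For readers who want to check from userspace which case a given box falls into, a rough sketch follows. It assumes that the size of a device's sysfs config file reflects its config space size: 256 bytes for conventional config space, 4096 bytes when extended config space is reachable. If only those two sizes occur, the patch's `cfg_size < 0x200` test amounts to "no extended config space". The helper names and the device path are illustrative guesses, not taken from the patch.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the sysfs "config" file is 256 bytes for
# conventional PCI config space and 4096 bytes when extended config space
# is available. cfg_size_of and ioh_regs_reachable are hypothetical names.

# Print the size in bytes of the given config file (GNU stat).
cfg_size_of() {
    stat -c %s "$1"
}

# Mirror the patch's test: succeed only if registers above 0x100 are reachable.
ioh_regs_reachable() {
    [ "$(( $1 ))" -ge "$(( 0x200 ))" ]
}

# Hypothetical device path; the IOH device on the affected box may differ.
cfg=/sys/bus/pci/devices/0000:00:00.0/config
if [ -e "$cfg" ]; then
    size=$(cfg_size_of "$cfg")
    if ioh_regs_reachable "$size"; then
        echo "cfg_size=$size: registers above 0x100 are reachable"
    else
        echo "cfg_size=$size: the IOH register reads would be skipped"
    fi
fi
```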

2009-12-15 18:48:06

by Matthew Wilcox

Subject: Re: kexec boot regression

On Tue, Dec 15, 2009 at 10:39:37AM -0800, Yinghai Lu wrote:
> + /* some sys doesn't get mmconf enabled */
> + if (dev->cfg_size < 0x200)
> + return;

What is the meaning of this mystic 0x200?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-12-15 18:54:49

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>
> >>>>>> let see how BIOS mess it up again!
> >>>>> Heh, I had a feeling this was coming :-)
>
> [ 0.000000] user-defined physical RAM map:
> [ 0.000000] user: 0000000000000100 - 0000000000098800 (usable)
> [ 0.000000] user: 0000000000098800 - 00000000000a0000 (reserved)
> [ 0.000000] user: 00000000000e0000 - 0000000000100000 (reserved)
> [ 0.000000] user: 0000000000100000 - 0000000078c63000 (usable)
> [ 0.000000] user: 0000000078c63000 - 0000000078e77000 (ACPI NVS)
> [ 0.000000] user: 0000000078e77000 - 000000007924e000 (ACPI data)
> [ 0.000000] user: 000000007924e000 - 00000000792c2000 (reserved)
> [ 0.000000] user: 00000000792c2000 - 00000000792d2000 (ACPI data)
> [ 0.000000] user: 00000000792d2000 - 00000000792e7000 (reserved)
> [ 0.000000] user: 00000000792e7000 - 0000000079301000 (ACPI data)
> [ 0.000000] user: 0000000079301000 - 0000000079303000 (reserved)
> [ 0.000000] user: 0000000079303000 - 0000000079305000 (ACPI data)
> [ 0.000000] user: 0000000079305000 - 0000000079310000 (reserved)
> [ 0.000000] user: 0000000079310000 - 0000000079314000 (ACPI data)
> [ 0.000000] user: 0000000079314000 - 0000000079319000 (reserved)
> [ 0.000000] user: 0000000079319000 - 0000000079336000 (ACPI data)
> [ 0.000000] user: 0000000079336000 - 0000000079358000 (reserved)
> [ 0.000000] user: 0000000079358000 - 0000000079388000 (ACPI data)
> [ 0.000000] user: 0000000079388000 - 00000000793c9000 (reserved)
> [ 0.000000] user: 00000000793c9000 - 000000007968f000 (ACPI data)
> [ 0.000000] user: 000000007968f000 - 00000000796bb000 (reserved)
> [ 0.000000] user: 00000000796bb000 - 00000000799d8000 (ACPI data)
> [ 0.000000] user: 00000000799d8000 - 0000000079bd8000 (ACPI NVS)
> [ 0.000000] user: 0000000079bd8000 - 0000000079d87000 (ACPI data)
> [ 0.000000] user: 0000000079d87000 - 0000000079d8a000 (reserved)
> [ 0.000000] user: 0000000079d8a000 - 0000000079dca000 (ACPI data)
> [ 0.000000] user: 0000000079dca000 - 0000000079dcb000 (reserved)
> [ 0.000000] user: 0000000079dcb000 - 0000000079e1c000 (ACPI data)
> [ 0.000000] user: 0000000079e1c000 - 0000000079e87000 (reserved)
> [ 0.000000] user: 0000000079e87000 - 000000007bd5f000 (ACPI data)
> [ 0.000000] user: 000000007bd5f000 - 000000007be4f000 (reserved)
> [ 0.000000] user: 000000007be4f000 - 000000007bf87000 (ACPI data)
> [ 0.000000] user: 0000000100000000 - 0000001080000000 (usable)
> ...
> [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
> [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
> [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
> [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
> [ 0.000000] ACPI: [SRAT:0x01] ignored 16 entries of 32 found
> [ 0.000000] NUMA: Using 31 for the hash shift.
> [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> [ 0.000000] SRAT: SRAT not used.
> [ 0.000000] No NUMA configuration found
>
> so SRAT is broken?
>
> if (max_entries && count > max_entries) {
> printk(KERN_WARNING PREFIX "[%4.4s:0x%02x] ignored %i entries of "
> "%i found\n", id, entry_id, count - max_entries, count);
> }
> ...
>
> or what is your CONFIG_NODES_SHIFT? 3? can you try to set it to 6?

Hmm funky, perhaps the BIOS changed that too. NUMA has otherwise been
working fine, didn't check whether it still did after a BIOS upgrade.
I'll try 6, it is set to 3 iirc.

> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> [ 13.112475] PCI: not using MMCONFIG
> [ 13.206650] ACPI: No dock devices found.
>
> so mmconf is not used...<ask BIOS fix it please!>

Reported, thanks.

> then we get
>
> [ 13.990335] IOH bus: [00, 00]
> [ 13.993707] IOH bus: 00 index 0 io port: [0, fff]
> [ 13.999023] IOH bus: 00 index 1 mmio: [0, ffffff]
> [ 14.004335] IOH bus: 00 index 2 mmio: [0, 3ffffff]
>
> please check
>
> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>
> it is above 0x100, so if mmconf is not enable, need to skip it

Will check that now.

--
Jens Axboe

2009-12-15 18:59:44

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>
> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources

On a "normal" non-kexec boot, I get:

[ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[ 12.216874] PCI: Using configuration type 1 for base access

--
Jens Axboe

2009-12-15 19:06:23

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>
>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>
> On a "normal" non-kexec boot, I get:
>
> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> [ 12.216874] PCI: Using configuration type 1 for base access
>

can you run the following script in the first kernel?

cd /sys/firmware/memmap
for dir in * ; do
    start=$(cat "$dir/start")
    end=$(cat "$dir/end")
    type=$(cat "$dir/type")
    # the sysfs "end" value is inclusive, so add 1 to print an exclusive end
    printf "%016x-%016x (%s)\n" "$start" "$(( $end + 1 ))" "$type" >> /tmp/memmap.txt
done

and send out /tmp/memmap.txt

What is your kexec-tools version? It could be too old.

YH

2009-12-15 19:11:43

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>
> >> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >
> > On a "normal" non-kexec boot, I get:
> >
> > [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> > [ 12.216874] PCI: Using configuration type 1 for base access
> >
>
> can you run following scripts in first kernel?
>
> cd /sys/firmware/memmap
> for dir in * ; do
> start=$(cat $dir/start)
> end=$(cat $dir/end)
> type=$(cat $dir/type)
> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
> done
>
> and send out /tmp/memmap.txt

Below.

> what is your kexec tools version? could be too old?

It says:

kexec-tools-testing 20080324 released 24th March 2008


0000000000000000-0000000000098800 (System RAM)
0000000000098800-00000000000a0000 (reserved)
0000000079301000-0000000079303000 (reserved)
0000000079303000-0000000079305000 (ACPI Tables)
0000000079305000-0000000079310000 (reserved)
0000000079310000-0000000079314000 (ACPI Tables)
0000000079314000-0000000079319000 (reserved)
0000000079319000-0000000079336000 (ACPI Tables)
0000000079336000-0000000079358000 (reserved)
0000000079358000-0000000079388000 (ACPI Tables)
0000000079388000-00000000793c9000 (reserved)
00000000793c9000-000000007968f000 (ACPI Tables)
00000000000e0000-0000000000100000 (reserved)
000000007968f000-00000000796bb000 (reserved)
00000000796bb000-00000000799d8000 (ACPI Tables)
00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
0000000079bd8000-0000000079d8b000 (ACPI Tables)
0000000079d8b000-0000000079d8c000 (reserved)
0000000079d8c000-0000000079dc8000 (ACPI Tables)
0000000079dc8000-0000000079dcb000 (reserved)
0000000079dcb000-0000000079e1c000 (ACPI Tables)
0000000079e1c000-0000000079e87000 (reserved)
0000000079e87000-000000007bd5f000 (ACPI Tables)
0000000000100000-0000000078c59000 (System RAM)
000000007bd5f000-000000007be4f000 (reserved)
000000007be4f000-000000007bf87000 (ACPI Tables)
000000007bf87000-000000007bfcf000 (ACPI Non-volatile Storage)
000000007bfcf000-000000007bfff000 (ACPI Tables)
000000007bfff000-0000000090000000 (reserved)
00000000fc000000-00000000fd000000 (reserved)
00000000fed1c000-00000000fed20000 (reserved)
00000000ff000000-0000000100000000 (reserved)
0000000100000000-0000001080000000 (System RAM)
0000000078c59000-0000000078e6d000 (ACPI Non-volatile Storage)
0000000078e6d000-000000007924e000 (ACPI Tables)
000000007924e000-00000000792c2000 (reserved)
00000000792c2000-00000000792d2000 (ACPI Tables)
00000000792d2000-00000000792e7000 (reserved)
00000000792e7000-0000000079301000 (ACPI Tables)

--
Jens Axboe
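
One way to compare the two boots is to ask whether the MMCONFIG window [mem 0x80000000-0x8fffffff] falls inside a "(reserved)" entry of a given map. The sketch below does that for listings in the memmap.txt format above. It is a simplification: it only accepts a single covering entry and makes no claim to match the kernel's own check. The function name range_reserved is made up for illustration.

```shell
#!/bin/sh
# Succeed if the half-open range [$1, $2) lies inside a single "(reserved)"
# line of a memmap.txt-style listing read on stdin.
# range_reserved is a hypothetical helper name; the kernel's real check is
# more involved than this single-entry containment test.
range_reserved() (
    want_start=$(( $1 ))
    want_end=$(( $2 ))
    # Reduce each reserved line to "0xSTART 0xEND"; other lines are dropped.
    sed -n 's/^\([0-9a-fA-F]\{16\}\)-\([0-9a-fA-F]\{16\}\) (reserved)$/0x\1 0x\2/p' |
    {
        while read -r start end; do
            if [ "$(( $start <= want_start && want_end <= $end ))" -eq 1 ]; then
                exit 0
            fi
        done
        exit 1
    }
)

# Two reserved entries copied from the first-kernel listing above.
if range_reserved 0x80000000 0x90000000 <<'EOF'
000000007bfff000-0000000090000000 (reserved)
00000000fc000000-00000000fd000000 (reserved)
EOF
then
    echo "MMCONFIG window is covered by a reserved entry"
else
    echo "MMCONFIG window is NOT covered by a reserved entry"
fi
```

Running `range_reserved 0x80000000 0x90000000 < /tmp/memmap.txt` performs the same test on the full file.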

2009-12-15 19:18:48

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>>>
>>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>>> On a "normal" non-kexec boot, I get:
>>>
>>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
>>> [ 12.216874] PCI: Using configuration type 1 for base access
>>>
>> can you run following scripts in first kernel?
>>
>> cd /sys/firmware/memmap
>> for dir in * ; do
>> start=$(cat $dir/start)
>> end=$(cat $dir/end)
>> type=$(cat $dir/type)
>> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
>> done
>>
>> and send out /tmp/memmap.txt
>
> Below.
>
>> what is your kexec tools version? could be too old?
>
> It says:
>
> kexec-tools-testing 20080324 released 24th March 2008
>
>
> 0000000000000000-0000000000098800 (System RAM)
> 0000000000098800-00000000000a0000 (reserved)
> 0000000079301000-0000000079303000 (reserved)
> 0000000079303000-0000000079305000 (ACPI Tables)
> 0000000079305000-0000000079310000 (reserved)
> 0000000079310000-0000000079314000 (ACPI Tables)
> 0000000079314000-0000000079319000 (reserved)
> 0000000079319000-0000000079336000 (ACPI Tables)
> 0000000079336000-0000000079358000 (reserved)
> 0000000079358000-0000000079388000 (ACPI Tables)
> 0000000079388000-00000000793c9000 (reserved)
> 00000000793c9000-000000007968f000 (ACPI Tables)
> 00000000000e0000-0000000000100000 (reserved)
> 000000007968f000-00000000796bb000 (reserved)
> 00000000796bb000-00000000799d8000 (ACPI Tables)
> 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
> 0000000079bd8000-0000000079d8b000 (ACPI Tables)
> 0000000079d8b000-0000000079d8c000 (reserved)
> 0000000079d8c000-0000000079dc8000 (ACPI Tables)
> 0000000079dc8000-0000000079dcb000 (reserved)
> 0000000079dcb000-0000000079e1c000 (ACPI Tables)
> 0000000079e1c000-0000000079e87000 (reserved)
> 0000000079e87000-000000007bd5f000 (ACPI Tables)
> 0000000000100000-0000000078c59000 (System RAM)
> 000000007bd5f000-000000007be4f000 (reserved)
> 000000007be4f000-000000007bf87000 (ACPI Tables)
> 000000007bf87000-000000007bfcf000 (ACPI Non-volatile Storage)
> 000000007bfcf000-000000007bfff000 (ACPI Tables)
> 000000007bfff000-0000000090000000 (reserved)
> 00000000fc000000-00000000fd000000 (reserved)
> 00000000fed1c000-00000000fed20000 (reserved)
> 00000000ff000000-0000000100000000 (reserved)
> 0000000100000000-0000001080000000 (System RAM)
> 0000000078c59000-0000000078e6d000 (ACPI Non-volatile Storage)
> 0000000078e6d000-000000007924e000 (ACPI Tables)
> 000000007924e000-00000000792c2000 (reserved)
> 00000000792c2000-00000000792d2000 (ACPI Tables)
> 00000000792d2000-00000000792e7000 (reserved)
> 00000000792e7000-0000000079301000 (ACPI Tables)
>

boot log of first kernel?

YH

2009-12-15 19:22:31

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>>>
> >>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >>> On a "normal" non-kexec boot, I get:
> >>>
> >>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> >>> [ 12.216874] PCI: Using configuration type 1 for base access
> >>>
> >> can you run following scripts in first kernel?
> >>
> >> cd /sys/firmware/memmap
> >> for dir in * ; do
> >> start=$(cat $dir/start)
> >> end=$(cat $dir/end)
> >> type=$(cat $dir/type)
> >> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
> >> done
> >>
> >> and send out /tmp/memmap.txt
> >
> > Below.
> >
> >> what is your kexec tools version? could be too old?
> >
> > It says:
> >
> > kexec-tools-testing 20080324 released 24th March 2008
> >
> >
> > 0000000000000000-0000000000098800 (System RAM)
> > 0000000000098800-00000000000a0000 (reserved)
> > 0000000079301000-0000000079303000 (reserved)
> > 0000000079303000-0000000079305000 (ACPI Tables)
> > 0000000079305000-0000000079310000 (reserved)
> > 0000000079310000-0000000079314000 (ACPI Tables)
> > 0000000079314000-0000000079319000 (reserved)
> > 0000000079319000-0000000079336000 (ACPI Tables)
> > 0000000079336000-0000000079358000 (reserved)
> > 0000000079358000-0000000079388000 (ACPI Tables)
> > 0000000079388000-00000000793c9000 (reserved)
> > 00000000793c9000-000000007968f000 (ACPI Tables)
> > 00000000000e0000-0000000000100000 (reserved)
> > 000000007968f000-00000000796bb000 (reserved)
> > 00000000796bb000-00000000799d8000 (ACPI Tables)
> > 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
> > 0000000079bd8000-0000000079d8b000 (ACPI Tables)
> > 0000000079d8b000-0000000079d8c000 (reserved)
> > 0000000079d8c000-0000000079dc8000 (ACPI Tables)
> > 0000000079dc8000-0000000079dcb000 (reserved)
> > 0000000079dcb000-0000000079e1c000 (ACPI Tables)
> > 0000000079e1c000-0000000079e87000 (reserved)
> > 0000000079e87000-000000007bd5f000 (ACPI Tables)
> > 0000000000100000-0000000078c59000 (System RAM)
> > 000000007bd5f000-000000007be4f000 (reserved)
> > 000000007be4f000-000000007bf87000 (ACPI Tables)
> > 000000007bf87000-000000007bfcf000 (ACPI Non-volatile Storage)
> > 000000007bfcf000-000000007bfff000 (ACPI Tables)
> > 000000007bfff000-0000000090000000 (reserved)
> > 00000000fc000000-00000000fd000000 (reserved)
> > 00000000fed1c000-00000000fed20000 (reserved)
> > 00000000ff000000-0000000100000000 (reserved)
> > 0000000100000000-0000001080000000 (System RAM)
> > 0000000078c59000-0000000078e6d000 (ACPI Non-volatile Storage)
> > 0000000078e6d000-000000007924e000 (ACPI Tables)
> > 000000007924e000-00000000792c2000 (reserved)
> > 00000000792c2000-00000000792d2000 (ACPI Tables)
> > 00000000792d2000-00000000792e7000 (reserved)
> > 00000000792e7000-0000000079301000 (ACPI Tables)
> >
>
> boot log of first kernel?

Hmm not completely sure, let me re-do it after a cold boot.

BTW, I just checked, and 2.6.32 has NUMA working fine. Below is the SRAT
and NUMA output from 2.6.32 (kexec'ed kernel). Is the check a newly
introduced one?

[ 0.000000] SRAT: PXM 0 -> APIC 0 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 64 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 32 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 96 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 2 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 66 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 34 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 98 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 4 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 68 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 36 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 100 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 6 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 70 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 38 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 102 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 16 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 80 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 48 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 112 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 18 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 82 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 50 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 114 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 20 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 84 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 52 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 116 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 22 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 86 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 54 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 118 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 1 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 65 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 33 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 97 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 3 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 67 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 35 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 99 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 5 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 69 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 37 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 101 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 7 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 71 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 39 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 103 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 17 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 81 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 49 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 113 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 19 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 83 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 51 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 115 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 21 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 85 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 53 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 117 -> Node 3
[ 0.000000] SRAT: PXM 0 -> APIC 23 -> Node 0
[ 0.000000] SRAT: PXM 2 -> APIC 87 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 55 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 119 -> Node 3
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] Bootmem setup node 0 0000000000000000-0000000480000000
[ 0.000000] NODE_DATA [0000000000048000 - 000000000004cfff]
[ 0.000000] bootmap [0000000000100000 - 000000000018ffff] pages 90
[ 0.000000] (8 early reservations) ==> bootmem [0000000000 - 0480000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #2 [0001000000 - 000200f260] TEXT DATA BSS ==> [0001000000 - 000200f260]
[ 0.000000] #3 [0000098800 - 0000100000] BIOS reserved ==> [0000098800 - 0000100000]
[ 0.000000] #4 [0002010000 - 000201035c] BRK ==> [0002010000 - 000201035c]
[ 0.000000] #5 [0000008000 - 000000a000] PGTABLE ==> [0000008000 - 000000a000]
[ 0.000000] #6 [000000a000 - 0000048000] PGTABLE ==> [000000a000 - 0000048000]
[ 0.000000] #7 [0000001000 - 000000103c] ACPI SLIT ==> [0000001000 - 000000103c]
[ 0.000000] Bootmem setup node 1 0000000880000000-0000000c80000000
[ 0.000000] NODE_DATA [0000000880000000 - 0000000880004fff]
[ 0.000000] bootmap [0000000880005000 - 0000000880084fff] pages 80
[ 0.000000] (8 early reservations) ==> bootmem [0880000000 - 0c80000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE
[ 0.000000] #2 [0001000000 - 000200f260] TEXT DATA BSS
[ 0.000000] #3 [0000098800 - 0000100000] BIOS reserved
[ 0.000000] #4 [0002010000 - 000201035c] BRK
[ 0.000000] #5 [0000008000 - 000000a000] PGTABLE
[ 0.000000] #6 [000000a000 - 0000048000] PGTABLE
[ 0.000000] #7 [0000001000 - 000000103c] ACPI SLIT
[ 0.000000] Bootmem setup node 2 0000000480000000-0000000880000000
[ 0.000000] NODE_DATA [0000000480000000 - 0000000480004fff]
[ 0.000000] bootmap [0000000480005000 - 0000000480084fff] pages 80
[ 0.000000] (8 early reservations) ==> bootmem [0480000000 - 0880000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE
[ 0.000000] #2 [0001000000 - 000200f260] TEXT DATA BSS
[ 0.000000] #3 [0000098800 - 0000100000] BIOS reserved
[ 0.000000] #4 [0002010000 - 000201035c] BRK
[ 0.000000] #5 [0000008000 - 000000a000] PGTABLE
[ 0.000000] #6 [000000a000 - 0000048000] PGTABLE
[ 0.000000] #7 [0000001000 - 000000103c] ACPI SLIT
[ 0.000000] Bootmem setup node 3 0000000c80000000-0000001080000000
[ 0.000000] NODE_DATA [0000000c80000000 - 0000000c80004fff]
[ 0.000000] bootmap [0000000c80005000 - 0000000c80084fff] pages 80
[ 0.000000] (8 early reservations) ==> bootmem [0c80000000 - 1080000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE
[ 0.000000] #2 [0001000000 - 000200f260] TEXT DATA BSS
[ 0.000000] #3 [0000098800 - 0000100000] BIOS reserved
[ 0.000000] #4 [0002010000 - 000201035c] BRK
[ 0.000000] #5 [0000008000 - 000000a000] PGTABLE
[ 0.000000] #6 [000000a000 - 0000048000] PGTABLE
[ 0.000000] #7 [0000001000 - 000000103c] ACPI SLIT
[ 0.000000] found SMP MP-table at [ffff8800000fddb0] fddb0
[ 0.000000] [ffffea0000000000-ffffea001d3fffff] PMD -> [ffff880028600000-ffff8800425fffff] on node 0
[ 0.000000] [ffffea001d400000-ffffea00373fffff] PMD -> [ffff880480200000-ffff88049a1fffff] on node 2
[ 0.000000] [ffffea0037400000-ffffea003fffffff] PMD -> [ffff880880200000-ffff880888dfffff] on node 1
[ 0.000000] [ffffea0040000000-ffffea00513fffff] PMD -> [ffff880889000000-ffff88089a3fffff] on node 1
[ 0.000000] [ffffea0051400000-ffffea006b3fffff] PMD -> [ffff880c80200000-ffff880c9a1fffff] on node 3
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000001 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x01080000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[6] active PFN ranges
[ 0.000000] 0: 0x00000001 -> 0x00000098
[ 0.000000] 0: 0x00000100 -> 0x00078c59
[ 0.000000] 0: 0x00100000 -> 0x00480000
[ 0.000000] 2: 0x00480000 -> 0x00880000
[ 0.000000] 1: 0x00880000 -> 0x00c80000
[ 0.000000] 3: 0x00c80000 -> 0x01080000
[ 0.000000] On node 0 totalpages: 4164592
[ 0.000000] DMA zone: 104 pages used for memmap
[ 0.000000] DMA zone: 185 pages reserved
[ 0.000000] DMA zone: 3702 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 26520 pages used for memmap
[ 0.000000] DMA32 zone: 464065 pages, LIFO batch:31
[ 0.000000] Normal zone: 93184 pages used for memmap
[ 0.000000] Normal zone: 3576832 pages, LIFO batch:31
[ 0.000000] On node 1 totalpages: 4194304
[ 0.000000] Normal zone: 106496 pages used for memmap
[ 0.000000] Normal zone: 4087808 pages, LIFO batch:31
[ 0.000000] On node 2 totalpages: 4194304
[ 0.000000] Normal zone: 106496 pages used for memmap
[ 0.000000] Normal zone: 4087808 pages, LIFO batch:31
[ 0.000000] On node 3 totalpages: 4194304
[ 0.000000] Normal zone: 106496 pages used for memmap
[ 0.000000] Normal zone: 4087808 pages, LIFO batch:31

--
Jens Axboe

2009-12-15 19:32:11

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Jens Axboe wrote:
> > boot log of first kernel?
>
> Hmm not completely sure, let me re-do it after a cold boot.

This is from a cold boot of 2.6.32.

0000000000000000-0000000000098800 (System RAM)
0000000000098800-00000000000a0000 (reserved)
0000000079301000-0000000079303000 (reserved)
0000000079303000-0000000079305000 (ACPI Tables)
0000000079305000-0000000079310000 (reserved)
0000000079310000-0000000079314000 (ACPI Tables)
0000000079314000-0000000079319000 (reserved)
0000000079319000-0000000079336000 (ACPI Tables)
0000000079336000-0000000079358000 (reserved)
0000000079358000-0000000079388000 (ACPI Tables)
0000000079388000-00000000793c9000 (reserved)
00000000793c9000-000000007968f000 (ACPI Tables)
00000000000e0000-0000000000100000 (reserved)
000000007968f000-00000000796bb000 (reserved)
00000000796bb000-00000000799d8000 (ACPI Tables)
00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
0000000079bd8000-0000000079d87000 (ACPI Tables)
0000000079d87000-0000000079d8a000 (reserved)
0000000079d8a000-0000000079dca000 (ACPI Tables)
0000000079dca000-0000000079dcb000 (reserved)
0000000079dcb000-0000000079e1c000 (ACPI Tables)
0000000079e1c000-0000000079e87000 (reserved)
0000000079e87000-000000007bd5f000 (ACPI Tables)
0000000000100000-0000000078c63000 (System RAM)
000000007bd5f000-000000007be4f000 (reserved)
000000007be4f000-000000007bf87000 (ACPI Tables)
000000007bf87000-000000007bfcf000 (ACPI Non-volatile Storage)
000000007bfcf000-000000007bfff000 (ACPI Tables)
000000007bfff000-0000000090000000 (reserved)
00000000fc000000-00000000fd000000 (reserved)
00000000fed1c000-00000000fed20000 (reserved)
00000000ff000000-0000000100000000 (reserved)
0000000100000000-0000001080000000 (System RAM)
0000000078c63000-0000000078e77000 (ACPI Non-volatile Storage)
0000000078e77000-000000007924e000 (ACPI Tables)
000000007924e000-00000000792c2000 (reserved)
00000000792c2000-00000000792d2000 (ACPI Tables)
00000000792d2000-00000000792e7000 (reserved)
00000000792e7000-0000000079301000 (ACPI Tables)
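
The cold-boot listing above differs from the earlier one in a handful of entries (for example, the low System RAM range ends at 78c63000 here but at 78c59000 before). A minimal sketch for spotting such differences, assuming each listing was saved to its own file; memmap_changes is a hypothetical helper name, not part of any tool mentioned in this thread:

```shell
# memmap_changes OLD NEW
# Print entries that appear in only one of the two listings, prefixed
# with "<" (only in OLD) or ">" (only in NEW). The fixed-width hex
# addresses sort correctly as plain text in the C locale.
memmap_changes() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/< /'
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/> /'
    rm -f "$old_sorted" "$new_sorted"
}
```

Called as memmap_changes old-memmap.txt cold-memmap.txt (both file names are placeholders), it prints only the entries whose boundaries moved between the two boots.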

--
Jens Axboe

2009-12-15 19:43:39

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>
> it is above 0x100, so if mmconf is not enable, need to skip it

This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
mmconf problem to begin with, are we now just working around the issue?
SRAT still reports issues, numa doesn't work.

--
Jens Axboe

2009-12-15 19:45:47

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>>>
>>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>>> On a "normal" non-kexec boot, I get:
>>>
>>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
>>> [ 12.216874] PCI: Using configuration type 1 for base access
>>>
>> can you run following scripts in first kernel?
>>
>> cd /sys/firmware/memmap
>> for dir in * ; do
>> start=$(cat $dir/start)
>> end=$(cat $dir/end)
>> type=$(cat $dir/type)
>> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
>> done
>>
>> and send out /tmp/memmap.txt
>
> Below.
>
>> what is your kexec tools version? could be too old?
>
> It says:
>
> kexec-tools-testing 20080324 released 24th March 2008
>
>
> 0000000000000000-0000000000098800 (System RAM)
> 0000000000098800-00000000000a0000 (reserved)
> 0000000079301000-0000000079303000 (reserved)
> 0000000079303000-0000000079305000 (ACPI Tables)
> 0000000079305000-0000000079310000 (reserved)
> 0000000079310000-0000000079314000 (ACPI Tables)
> 0000000079314000-0000000079319000 (reserved)
> 0000000079319000-0000000079336000 (ACPI Tables)
> 0000000079336000-0000000079358000 (reserved)
> 0000000079358000-0000000079388000 (ACPI Tables)
> 0000000079388000-00000000793c9000 (reserved)
> 00000000793c9000-000000007968f000 (ACPI Tables)
> 00000000000e0000-0000000000100000 (reserved)
> 000000007968f000-00000000796bb000 (reserved)
> 00000000796bb000-00000000799d8000 (ACPI Tables)
> 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
> 0000000079bd8000-0000000079d8b000 (ACPI Tables)
> 0000000079d8b000-0000000079d8c000 (reserved)
> 0000000079d8c000-0000000079dc8000 (ACPI Tables)
> 0000000079dc8000-0000000079dcb000 (reserved)
> 0000000079dcb000-0000000079e1c000 (ACPI Tables)
> 0000000079e1c000-0000000079e87000 (reserved)
> 0000000079e87000-000000007bd5f000 (ACPI Tables)
> 0000000000100000-0000000078c59000 (System RAM)
> 000000007bd5f000-000000007be4f000 (reserved)
> 000000007be4f000-000000007bf87000 (ACPI Tables)

So the following ranges are not passed to the second kernel by kexec?

> 000000007bf87000-000000007bfcf000 (ACPI Non-volatile Storage)
> 000000007bfcf000-000000007bfff000 (ACPI Tables)
> 000000007bfff000-0000000090000000 (reserved)
> 00000000fc000000-00000000fd000000 (reserved)
> 00000000fed1c000-00000000fed20000 (reserved)
> 00000000ff000000-0000000100000000 (reserved)
> 0000000100000000-0000001080000000 (System RAM)
> 0000000078c59000-0000000078e6d000 (ACPI Non-volatile Storage)
> 0000000078e6d000-000000007924e000 (ACPI Tables)
> 000000007924e000-00000000792c2000 (reserved)
> 00000000792c2000-00000000792d2000 (ACPI Tables)
> 00000000792d2000-00000000792e7000 (reserved)
> 00000000792e7000-0000000079301000 (ACPI Tables)
>

The second kernel only gets:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000100 - 0000000000098800 (usable)
[ 0.000000] BIOS-e820: 0000000000098800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 0000000078c63000 (usable)
[ 0.000000] BIOS-e820: 0000000078c63000 - 0000000078e77000 (ACPI NVS)
[ 0.000000] BIOS-e820: 0000000078e77000 - 000000007924e000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007924e000 - 00000000792c2000 (reserved)
[ 0.000000] BIOS-e820: 00000000792c2000 - 00000000792d2000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000792d2000 - 00000000792e7000 (reserved)
[ 0.000000] BIOS-e820: 00000000792e7000 - 0000000079301000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079301000 - 0000000079303000 (reserved)
[ 0.000000] BIOS-e820: 0000000079303000 - 0000000079305000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079305000 - 0000000079310000 (reserved)
[ 0.000000] BIOS-e820: 0000000079310000 - 0000000079314000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079314000 - 0000000079319000 (reserved)
[ 0.000000] BIOS-e820: 0000000079319000 - 0000000079336000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079336000 - 0000000079358000 (reserved)
[ 0.000000] BIOS-e820: 0000000079358000 - 0000000079388000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079388000 - 00000000793c9000 (reserved)
[ 0.000000] BIOS-e820: 00000000793c9000 - 000000007968f000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007968f000 - 00000000796bb000 (reserved)
[ 0.000000] BIOS-e820: 00000000796bb000 - 00000000799d8000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000799d8000 - 0000000079bd8000 (ACPI NVS)
[ 0.000000] BIOS-e820: 0000000079bd8000 - 0000000079d87000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079d87000 - 0000000079d8a000 (reserved)
[ 0.000000] BIOS-e820: 0000000079d8a000 - 0000000079dca000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079dca000 - 0000000079dcb000 (reserved)
[ 0.000000] BIOS-e820: 0000000079dcb000 - 0000000079e1c000 (ACPI data)
[ 0.000000] BIOS-e820: 0000000079e1c000 - 0000000079e87000 (reserved)
[ 0.000000] BIOS-e820: 0000000079e87000 - 000000007bd5f000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007bd5f000 - 000000007be4f000 (reserved)
[ 0.000000] BIOS-e820: 000000007be4f000 - 000000007bf87000 (ACPI data)

So the mmconf range is not reserved, and some ACPI data in
> 0000000078c59000-0000000078e6d000 (ACPI Non-volatile Storage)
namely 0000000078c59000 - 0000000078c63000, gets corrupted...
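
The reservation check discussed here can be sketched in shell. This is only an illustration, assuming a listing in the "start-end (type)" format produced by the /sys/firmware/memmap script quoted above, with exclusive end addresses; range_reserved is a hypothetical helper name. For the MMCONFIG window 0x80000000-0x8fffffff the exclusive end is 90000000.

```shell
# range_reserved FILE START END
# Succeeds if [START, END) lies inside a single "(reserved)" line of FILE.
# START and END are hex without a 0x prefix, END is exclusive.
range_reserved() {
    file=$1
    want_start=$(( 0x$2 ))
    want_end=$(( 0x$3 ))
    while read -r range type; do
        # Skip anything that is not a reserved entry (including blank lines).
        [ "$type" = "(reserved)" ] || continue
        start=$(( 0x${range%-*} ))
        end=$(( 0x${range#*-} ))
        if [ "$start" -le "$want_start" ] && [ "$want_end" -le "$end" ]; then
            return 0
        fi
    done < "$file"
    return 1
}
```

Run against the first kernel's listing, range_reserved /tmp/memmap.txt 80000000 90000000 should succeed because of the 000000007bfff000-0000000090000000 (reserved) entry. A listing that stops at 7bf87000, as the second kernel's map does, has no entry covering that window.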

YH

2009-12-15 19:48:23

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>>>
> >>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >>> On a "normal" non-kexec boot, I get:
> >>>
> >>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> >>> [ 12.216874] PCI: Using configuration type 1 for base access
> >>>
> >> can you run following scripts in first kernel?
> >>
> >> cd /sys/firmware/memmap
> >> for dir in * ; do
> >> start=$(cat $dir/start)
> >> end=$(cat $dir/end)
> >> type=$(cat $dir/type)
> >> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
> >> done
> >>
> >> and send out /tmp/memmap.txt
> >
> > Below.
> >
> >> what is your kexec tools version? could be too old?
> >
> > It says:
> >
> > kexec-tools-testing 20080324 released 24th March 2008
> >
> >
> > 0000000000000000-0000000000098800 (System RAM)
> > 0000000000098800-00000000000a0000 (reserved)
> > 0000000079301000-0000000079303000 (reserved)
> > 0000000079303000-0000000079305000 (ACPI Tables)
> > 0000000079305000-0000000079310000 (reserved)
> > 0000000079310000-0000000079314000 (ACPI Tables)
> > 0000000079314000-0000000079319000 (reserved)
> > 0000000079319000-0000000079336000 (ACPI Tables)
> > 0000000079336000-0000000079358000 (reserved)
> > 0000000079358000-0000000079388000 (ACPI Tables)
> > 0000000079388000-00000000793c9000 (reserved)
> > 00000000793c9000-000000007968f000 (ACPI Tables)
> > 00000000000e0000-0000000000100000 (reserved)
> > 000000007968f000-00000000796bb000 (reserved)
> > 00000000796bb000-00000000799d8000 (ACPI Tables)
> > 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
> > 0000000079bd8000-0000000079d8b000 (ACPI Tables)
> > 0000000079d8b000-0000000079d8c000 (reserved)
> > 0000000079d8c000-0000000079dc8000 (ACPI Tables)
> > 0000000079dc8000-0000000079dcb000 (reserved)
> > 0000000079dcb000-0000000079e1c000 (ACPI Tables)
> > 0000000079e1c000-0000000079e87000 (reserved)
> > 0000000079e87000-000000007bd5f000 (ACPI Tables)
> > 0000000000100000-0000000078c59000 (System RAM)
> > 000000007bd5f000-000000007be4f000 (reserved)
> > 000000007be4f000-000000007bf87000 (ACPI Tables)
>
> so following ranges are not passed to second kernel by kexec?

I have the following addition to my kexec kernel command line:

memmap=62G@4G

since that last big 62G RAM entry doesn't show up without it. That's why
you see a user-defined e820 map as well in the boot logs. So a kexec'ed
kernel is missing at least that entry.
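
As a quick check of the arithmetic, memmap=62G@4G describes 62 GiB of RAM starting at the 4 GiB mark, which is the same range as the last System RAM entry in the listing. A small sketch, where memmap_range is a hypothetical helper name and both arguments are assumed to be whole GiB values:

```shell
# memmap_range SIZE_GIB START_GIB
# Print the range covered by memmap=SIZE@START in the same
# "start-end" hex format as the memmap listing (end exclusive).
memmap_range() {
    gib=$(( 1 << 30 ))
    start=$(( $2 * gib ))
    end=$(( ($2 + $1) * gib ))
    printf "%016x-%016x\n" "$start" "$end"
}

memmap_range 62 4   # prints 0000000100000000-0000001080000000
```

That matches the 0000000100000000-0000001080000000 (System RAM) line, the entry that does not show up in the kexec'ed kernel without the option.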

I just tried with the latest and greatest kexec-tools (2.0.1) and
there's no difference.

--
Jens Axboe

2009-12-15 19:49:42

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>
>> it is above 0x100, so if mmconf is not enable, need to skip it
>
> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> mmconf problem to begin with, are we now just working around the issue?
> SRAT still reports issues, numa doesn't work.

That patch will be bulletproof... we need it.

We also still need to figure out why the memmap range is not passed properly.

Do you mean that with 2.6.32 kexec'ing 2.6.32, mmconf and NUMA work in the second kernel?

YH

2009-12-15 19:50:56

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>>>>>
>>>>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>>>>> On a "normal" non-kexec boot, I get:
>>>>>
>>>>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>>>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
>>>>> [ 12.216874] PCI: Using configuration type 1 for base access
>>>>>
>>>> can you run following scripts in first kernel?
>>>>
>>>> cd /sys/firmware/memmap
>>>> for dir in * ; do
>>>> start=$(cat $dir/start)
>>>> end=$(cat $dir/end)
>>>> type=$(cat $dir/type)
>>>> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
>>>> done
>>>>
>>>> and send out /tmp/memmap.txt
>>> Below.
>>>
>>>> what is your kexec tools version? could be too old?
>>> It says:
>>>
>>> kexec-tools-testing 20080324 released 24th March 2008
>>>
>>>
>>> 0000000000000000-0000000000098800 (System RAM)
>>> 0000000000098800-00000000000a0000 (reserved)
>>> 0000000079301000-0000000079303000 (reserved)
>>> 0000000079303000-0000000079305000 (ACPI Tables)
>>> 0000000079305000-0000000079310000 (reserved)
>>> 0000000079310000-0000000079314000 (ACPI Tables)
>>> 0000000079314000-0000000079319000 (reserved)
>>> 0000000079319000-0000000079336000 (ACPI Tables)
>>> 0000000079336000-0000000079358000 (reserved)
>>> 0000000079358000-0000000079388000 (ACPI Tables)
>>> 0000000079388000-00000000793c9000 (reserved)
>>> 00000000793c9000-000000007968f000 (ACPI Tables)
>>> 00000000000e0000-0000000000100000 (reserved)
>>> 000000007968f000-00000000796bb000 (reserved)
>>> 00000000796bb000-00000000799d8000 (ACPI Tables)
>>> 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
>>> 0000000079bd8000-0000000079d8b000 (ACPI Tables)
>>> 0000000079d8b000-0000000079d8c000 (reserved)
>>> 0000000079d8c000-0000000079dc8000 (ACPI Tables)
>>> 0000000079dc8000-0000000079dcb000 (reserved)
>>> 0000000079dcb000-0000000079e1c000 (ACPI Tables)
>>> 0000000079e1c000-0000000079e87000 (reserved)
>>> 0000000079e87000-000000007bd5f000 (ACPI Tables)
>>> 0000000000100000-0000000078c59000 (System RAM)
>>> 000000007bd5f000-000000007be4f000 (reserved)
>>> 000000007be4f000-000000007bf87000 (ACPI Tables)
>> so following ranges are not passed to second kernel by kexec?
>
> I have the following addition to my kexec kernel command line:
>
> memmap=62G@4G
>
> since that last big 62G RAM entry doesn't show up without it, that's why
> you see a user defined e820 map as well in the boot logs. So a kexec'ed
> kernel is missing at least that entry.
>
> I just tried with the latest and greatest kexec-tools (2.0.1) and
> there's no difference.

Does the current kernel kexec'ing 2.6.32 give working NUMA and mmconf in the second kernel?

YH

2009-12-15 19:51:18

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>
> >> it is above 0x100, so if mmconf is not enable, need to skip it
> >
> > This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> > mmconf problem to begin with, are we now just working around the issue?
> > SRAT still reports issues, numa doesn't work.
>
> that patch will be bullet proof... we need it.
>
> also still need to figure out why memmap range is not passed properly.
>
> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> second kernel?

Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
complaints and NUMA works fine.

--
Jens Axboe

2009-12-15 19:58:13

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>
>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>> mmconf problem to begin with, are we now just working around the issue?
>>> SRAT still reports issues, numa doesn't work.
>> that patch will be bullet proof... we need it.
>>
>> also still need to figure out why memmap range is not passed properly.
>>
>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>> second kernel?
>
> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> complaints and NUMA works fine.
>
How about:

current kernel booted and 2.6.32 kexec'ed works just fine, no SRAT
complaints and NUMA works fine?

YH

2009-12-15 19:58:01

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> Jens Axboe wrote:
> >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>>>>>
> >>>>>> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >>>>> On a "normal" non-kexec boot, I get:
> >>>>>
> >>>>> [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>>>> [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> >>>>> [ 12.216874] PCI: Using configuration type 1 for base access
> >>>>>
> >>>> can you run following scripts in first kernel?
> >>>>
> >>>> cd /sys/firmware/memmap
> >>>> for dir in * ; do
> >>>> start=$(cat $dir/start)
> >>>> end=$(cat $dir/end)
> >>>> type=$(cat $dir/type)
> >>>> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
> >>>> done
> >>>>
> >>>> and send out /tmp/memmap.txt
> >>> Below.
> >>>
> >>>> what is your kexec tools version? could be too old?
> >>> It says:
> >>>
> >>> kexec-tools-testing 20080324 released 24th March 2008
> >>>
> >>>
> >>> 0000000000000000-0000000000098800 (System RAM)
> >>> 0000000000098800-00000000000a0000 (reserved)
> >>> 0000000079301000-0000000079303000 (reserved)
> >>> 0000000079303000-0000000079305000 (ACPI Tables)
> >>> 0000000079305000-0000000079310000 (reserved)
> >>> 0000000079310000-0000000079314000 (ACPI Tables)
> >>> 0000000079314000-0000000079319000 (reserved)
> >>> 0000000079319000-0000000079336000 (ACPI Tables)
> >>> 0000000079336000-0000000079358000 (reserved)
> >>> 0000000079358000-0000000079388000 (ACPI Tables)
> >>> 0000000079388000-00000000793c9000 (reserved)
> >>> 00000000793c9000-000000007968f000 (ACPI Tables)
> >>> 00000000000e0000-0000000000100000 (reserved)
> >>> 000000007968f000-00000000796bb000 (reserved)
> >>> 00000000796bb000-00000000799d8000 (ACPI Tables)
> >>> 00000000799d8000-0000000079bd8000 (ACPI Non-volatile Storage)
> >>> 0000000079bd8000-0000000079d8b000 (ACPI Tables)
> >>> 0000000079d8b000-0000000079d8c000 (reserved)
> >>> 0000000079d8c000-0000000079dc8000 (ACPI Tables)
> >>> 0000000079dc8000-0000000079dcb000 (reserved)
> >>> 0000000079dcb000-0000000079e1c000 (ACPI Tables)
> >>> 0000000079e1c000-0000000079e87000 (reserved)
> >>> 0000000079e87000-000000007bd5f000 (ACPI Tables)
> >>> 0000000000100000-0000000078c59000 (System RAM)
> >>> 000000007bd5f000-000000007be4f000 (reserved)
> >>> 000000007be4f000-000000007bf87000 (ACPI Tables)
> >> so following ranges are not passed to second kernel by kexec?
> >
> > I have the following addition to my kexec kernel command line:
> >
> > memmap=62G@4G
> >
> > since that last big 62G RAM entry doesn't show up without it, that's why
> > you see a user defined e820 map as well in the boot logs. So a kexec'ed
> > kernel is missing at least that entry.
> >
> > I just tried with the latest and greatest kexec-tools (2.0.1) and
> > there's no difference.
>
> current kernel kexec 2.6.32 make numa and mmconf working on second kernel?

Just tested that configuration: booting current -git and then
kexec'ing into 2.6.32 gets me working NUMA, but mmconf still complains:

[ 15.669222] PCI: MCFG configuration 0: base 80000000 segment 0 buses 0 - 255
[ 15.677166] PCI: Not using MMCONFIG.
[...]
[ 15.971448] PCI: MCFG configuration 0: base 80000000 segment 0 buses 0 - 255
[ 16.066995] PCI: BIOS Bug: MCFG area at 80000000 is not reserved in ACPI motherboard resources
[ 16.076705] PCI: Not using MMCONFIG.

SRAT looks good:

[...]
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] NUMA: Using 31 for the hash shift.
[snip same working NUMA config]

--
Jens Axboe

2009-12-15 20:09:22

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>>>
> >>>> it is above 0x100, so if mmconf is not enable, need to skip it
> >>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> >>> mmconf problem to begin with, are we now just working around the issue?
> >>> SRAT still reports issues, numa doesn't work.
> >> that patch will be bullet proof... we need it.
> >>
> >> also still need to figure out why memmap range is not passed properly.
> >>
> >> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> >> second kernel?
> >
> > Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > complaints and NUMA works fine.
> >
> how about
>
> current kernel booted and 2.6.32 kexec'ed works just fine, no SRAT
> complaints and NUMA works fine. ?

Yes, that's exactly what happens, see the previous reply I sent. mmconf
still complains, though.

--
Jens Axboe

2009-12-15 20:16:11

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>
>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>> mmconf problem to begin with, are we now just working around the issue?
>>> SRAT still reports issues, numa doesn't work.
>> that patch will be bullet proof... we need it.
>>
>> also still need to figure out why memmap range is not passed properly.
>>
>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>> second kernel?
>
> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> complaints and NUMA works fine.

do you need
memmap=62G@4G
in this case?

YH

2009-12-15 20:19:14

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>>>
> >>>> it is above 0x100, so if mmconf is not enable, need to skip it
> >>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> >>> mmconf problem to begin with, are we now just working around the issue?
> >>> SRAT still reports issues, numa doesn't work.
> >> that patch will be bullet proof... we need it.
> >>
> >> also still need to figure out why memmap range is not passed properly.
> >>
> >> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> >> second kernel?
> >
> > Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > complaints and NUMA works fine.
>
> do you need
> memmap=62G@4G
> in this case?

Yes, I've needed that always.

--
Jens Axboe

2009-12-15 20:22:23

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>>>
>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>>>> mmconf problem to begin with, are we now just working around the issue?
>>>>> SRAT still reports issues, numa doesn't work.
>>>> that patch will be bullet proof... we need it.
>>>>
>>>> also still need to figure out why memmap range is not passed properly.
>>>>
>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>>>> second kernel?
>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
>>> complaints and NUMA works fine.
>> do you need
>> memmap=62G@4G
>> in this case?
>
> Yes, I've needed that always.

good,

can you enable debug option in kexec to see why kexec can not pass whole 38? range to second kernel?

YH

2009-12-15 20:42:25

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> Jens Axboe wrote:
> >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>>>>>
> >>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> >>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> >>>>> mmconf problem to begin with, are we now just working around the issue?
> >>>>> SRAT still reports issues, numa doesn't work.
> >>>> that patch will be bullet proof... we need it.
> >>>>
> >>>> also still need to figure out why memmap range is not passed properly.
> >>>>
> >>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> >>>> second kernel?
> >>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> >>> complaints and NUMA works fine.
> >> do you need
> >> memmap=62G@4G
> >> in this case?
> >
> > Yes, I've needed that always.
>
> good,
>
> can you enable debug option in kexec to see why kexec can not pass
> whole 38? range to second kernel?

Not getting any output so far, -d doesn't do much. Poking around in the
source...

--
Jens Axboe

2009-12-15 20:55:21

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > Jens Axboe wrote:
> > > On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >> Jens Axboe wrote:
> > >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>> Jens Axboe wrote:
> > >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> > >>>>>>
> > >>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> > >>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> > >>>>> mmconf problem to begin with, are we now just working around the issue?
> > >>>>> SRAT still reports issues, numa doesn't work.
> > >>>> that patch will be bullet proof... we need it.
> > >>>>
> > >>>> also still need to figure out why memmap range is not passed properly.
> > >>>>
> > >>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> > >>>> second kernel?
> > >>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > >>> complaints and NUMA works fine.
> > >> do you need
> > >> memmap=62G@4G
> > >> in this case?
> > >
> > > Yes, I've needed that always.
> >
> > good,
> >
> > can you enable debug option in kexec to see why kexec can not pass
> > whole 38? range to second kernel?
>
> Not getting any output so far, -d doesn't do much. Poking around in the
> source...

OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
total), that smells like just a kexec bug. Retesting -git...

--
Jens Axboe

2009-12-15 21:01:51

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> > > Jens Axboe wrote:
> > > > On Tue, Dec 15 2009, Yinghai Lu wrote:
> > > >> Jens Axboe wrote:
> > > >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > > >>>> Jens Axboe wrote:
> > > >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > > >>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> > > >>>>>>
> > > >>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> > > >>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> > > >>>>> mmconf problem to begin with, are we now just working around the issue?
> > > >>>>> SRAT still reports issues, numa doesn't work.
> > > >>>> that patch will be bullet proof... we need it.
> > > >>>>
> > > >>>> also still need to figure out why memmap range is not passed properly.
> > > >>>>
> > > >>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> > > >>>> second kernel?
> > > >>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > > >>> complaints and NUMA works fine.
> > > >> do you need
> > > >> memmap=62G@4G
> > > >> in this case?
> > > >
> > > > Yes, I've needed that always.
> > >
> > > good,
> > >
> > > can you enable debug option in kexec to see why kexec can not pass
> > > whole 38? range to second kernel?
> >
> > Not getting any output so far, -d doesn't do much. Poking around in the
> > source...
>
> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
> total), that smells like just a kexec bug. Retesting -git...

Current -git works fine when all the ranges are passed correctly. So, I
think, the only existing regression is the SRAT issue.

--
Jens Axboe

2009-12-15 21:27:39

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>> Jens Axboe wrote:
>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>>>>>>>
>>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
>>>>>>>>> SRAT still reports issues, numa doesn't work.
>>>>>>>> that patch will be bullet proof... we need it.
>>>>>>>>
>>>>>>>> also still need to figure out why memmap range is not passed properly.
>>>>>>>>
>>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>>>>>>>> second kernel?
>>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
>>>>>>> complaints and NUMA works fine.
>>>>>> do you need
>>>>>> memmap=62G@4G
>>>>>> in this case?
>>>>> Yes, I've needed that always.
>>>> good,
>>>>
>>>> can you enable debug option in kexec to see why kexec can not pass
>>>> whole 38? range to second kernel?
>>> Not getting any output so far, -d doesn't do much. Poking around in the
>>> source...
>> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
>> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
>> total), that smells like just a kexec bug. Retesting -git...
>
> Current -git works fine when all the ranges are passed correctly. So, I
> think, the only existing regression is the SRAT issue.

did you change node_shift?

YH

2009-12-15 21:37:13

by Markus Trippelsdorf

Subject: Re: kexec boot regression

On Tue, Dec 15, 2009 at 11:04:55AM -0800, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>
> >> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >
> > On a "normal" non-kexec boot, I get:
> >
> > [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> > [ 12.216874] PCI: Using configuration type 1 for base access
> >
>
> can you run following scripts in first kernel?
>
> cd /sys/firmware/memmap
> for dir in * ; do
> start=$(cat $dir/start)
> end=$(cat $dir/end)
> type=$(cat $dir/type)
> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" >> /tmp/memmap.txt
> done
>
> and send out /tmp/memmap.txt
>
> what is your kexec tools version? could be too old?

I have the same symptoms on my machine, but the underlying cause must be
different. I once reverted all Radeon related changes since 2.6.32 and
kexec started working again.

Full dmesg and the output of the script is attached.
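
For reference, the quoted shell loop can be sketched in Python roughly as follows. This is a minimal sketch, assuming the usual /sys/firmware/memmap layout of one directory per entry holding `start`, `end` and `type` files, with `end` inclusive (hence the +1, as in the script). The function names `read_firmware_memmap` and `format_memmap` are made up for illustration.

```python
import os


def read_firmware_memmap(root="/sys/firmware/memmap"):
    """Return sorted (start, end_exclusive, type) tuples from a memmap tree.

    Assumes each entry directory holds 'start', 'end' and 'type' files,
    with 'end' being the last byte of the range (inclusive).
    """
    entries = []
    for name in os.listdir(root):
        entry_dir = os.path.join(root, name)
        if not os.path.isdir(entry_dir):
            continue
        fields = {}
        for key in ("start", "end", "type"):
            with open(os.path.join(entry_dir, key)) as f:
                fields[key] = f.read().strip()
        start = int(fields["start"], 16)
        end = int(fields["end"], 16) + 1  # make the end exclusive
        entries.append((start, end, fields["type"]))
    return sorted(entries)


def format_memmap(entries):
    """Format entries the same way as the printf in the quoted script."""
    return ["%016x-%016x (%s)" % entry for entry in entries]
```

Printing `"\n".join(format_memmap(read_firmware_memmap()))` in the first kernel should give the same kind of listing as /tmp/memmap.txt, sorted by start address.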

kexec-tools 2.0.1 released 13th August 2009

--
Markus


Attachments:
(No filename) (1.29 kB)
memmap.txt (431.00 B)
dmesg (26.29 kB)

2009-12-15 21:30:39

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Jens Axboe wrote:
> >> On Tue, Dec 15 2009, Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> Jens Axboe wrote:
> >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>> Jens Axboe wrote:
> >>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>> Jens Axboe wrote:
> >>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>>>>>>>>>
> >>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> >>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> >>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
> >>>>>>>>> SRAT still reports issues, numa doesn't work.
> >>>>>>>> that patch will be bullet proof... we need it.
> >>>>>>>>
> >>>>>>>> also still need to figure out why memmap range is not passed properly.
> >>>>>>>>
> >>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> >>>>>>>> second kernel?
> >>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> >>>>>>> complaints and NUMA works fine.
> >>>>>> do you need
> >>>>>> memmap=62G@4G
> >>>>>> in this case?
> >>>>> Yes, I've needed that always.
> >>>> good,
> >>>>
> >>>> can you enable debug option in kexec to see why kexec can not pass
> >>>> whole 38? range to second kernel?
> >>> Not getting any output so far, -d doesn't do much. Poking around in the
> >>> source...
> >> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
> >> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
> >> total), that smells like just a kexec bug. Retesting -git...
> >
> > Current -git works fine when all the ranges are passed correctly. So, I
> > think, the only existing regression is the SRAT issue.
>
> did you change node_shift?

Yes:

CONFIG_NODES_SHIFT=6

What I don't get is that 2.6.32 and -git print the same PXM map, and in
both cases it's totalling exactly 64G. Yet it says:

SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
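
The "exactly 64G" figure can be checked by summing the five SRAT ranges printed at boot. A small sketch of that arithmetic, with the ranges copied from the "SRAT: Node ... PXM ..." lines in this thread (the helper name `total_mib` is just for illustration):

```python
# Byte ranges [start, end) from the "SRAT: Node N PXM M start-end" boot lines.
SRAT_RANGES = [
    (0x0, 0x80000000),
    (0x100000000, 0x480000000),
    (0x480000000, 0x880000000),
    (0x880000000, 0xc80000000),
    (0xc80000000, 0x1080000000),
]


def total_mib(ranges):
    """Sum the sizes of [start, end) byte ranges and return MiB."""
    return sum(end - start for start, end in ranges) // (1 << 20)


print(total_mib(SRAT_RANGES))  # 65536 MiB, i.e. 64 GiB
```

So the SRAT map itself spans more than the 65419MB of e820 RAM; the shortfall in the warning has to come from how the covered pages are counted, not from the map.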

--
Jens Axboe

2009-12-15 21:40:17

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > Jens Axboe wrote:
> > > On Tue, Dec 15 2009, Jens Axboe wrote:
> > >> On Tue, Dec 15 2009, Jens Axboe wrote:
> > >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>> Jens Axboe wrote:
> > >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>> Jens Axboe wrote:
> > >>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>>>> Jens Axboe wrote:
> > >>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> > >>>>>>>>>>
> > >>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> > >>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> > >>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
> > >>>>>>>>> SRAT still reports issues, numa doesn't work.
> > >>>>>>>> that patch will be bullet proof... we need it.
> > >>>>>>>>
> > >>>>>>>> also still need to figure out why memmap range is not passed properly.
> > >>>>>>>>
> > >>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> > >>>>>>>> second kernel?
> > >>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > >>>>>>> complaints and NUMA works fine.
> > >>>>>> do you need
> > >>>>>> memmap=62G@4G
> > >>>>>> in this case?
> > >>>>> Yes, I've needed that always.
> > >>>> good,
> > >>>>
> > >>>> can you enable debug option in kexec to see why kexec can not pass
> > >>>> whole 38? range to second kernel?
> > >>> Not getting any output so far, -d doesn't do much. Poking around in the
> > >>> source...
> > >> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
> > >> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
> > >> total), that smells like just a kexec bug. Retesting -git...
> > >
> > > Current -git works fine when all the ranges are passed correctly. So, I
> > > think, the only existing regression is the SRAT issue.
> >
> > did you change node_shift?
>
> Yes:
>
> CONFIG_NODES_SHIFT=6
>
> What I don't get is that 2.6.32 and -git print the same PXM map, and in
> both cases it's totalling exactly 64G. Yet it says:
>
> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.

Clue:

[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] pxm0: 0-480000 (4718592), absent 553990
[ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
[ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
[ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
[ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.

It's essentially disregarding pxm2, claiming all pages are absent.
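
The numbers in the warning can be reproduced from the pxm lines above. A rough sketch of the arithmetic, assuming 4 KiB pages and assuming the covered amount is simply span minus absent pages summed over the four lines (the kernel's own accounting may clamp differently):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# (span_pages, absent_pages) taken from the pxm0..pxm3 debug lines.
PXM_LINES = [
    (4718592, 553990),
    (4194304, 0),
    (4194304, 4194304),
    (4194304, 0),
]


def covered_mb(pxm_lines, page_size=PAGE_SIZE):
    """Pages covered by the PXMs (span minus absent), converted to MB."""
    pages = sum(span - absent for span, absent in pxm_lines)
    return pages * page_size // (1 << 20)


print(covered_mb(PXM_LINES))  # 49035

# Same calculation with pxm2 reported as fully present instead of absent.
fixed = [(span, 0 if absent == span else absent) for span, absent in PXM_LINES]
print(covered_mb(fixed))  # 65419
```

With pxm2 counted as present, the coverage comes out at 65419MB, the same figure the warning gives for e820 RAM, which is consistent with the pxm2 absent count being the only thing wrong.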

--
Jens Axboe

2009-12-15 21:44:24

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>>>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>> Jens Axboe wrote:
>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>> Jens Axboe wrote:
>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>>> Jens Axboe wrote:
>>>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>>>>>>>>>>
>>>>>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>>>>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>>>>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
>>>>>>>>>>>> SRAT still reports issues, numa doesn't work.
>>>>>>>>>>> that patch will be bullet proof... we need it.
>>>>>>>>>>>
>>>>>>>>>>> also still need to figure out why memmap range is not passed properly.
>>>>>>>>>>>
>>>>>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>>>>>>>>>>> second kernel?
>>>>>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
>>>>>>>>>> complaints and NUMA works fine.
>>>>>>>>> do you need
>>>>>>>>> memmap=62G@4G
>>>>>>>>> in this case?
>>>>>>>> Yes, I've needed that always.
>>>>>>> good,
>>>>>>>
>>>>>>> can you enable debug option in kexec to see why kexec can not pass
>>>>>>> whole 38? range to second kernel?
>>>>>> Not getting any output so far, -d doesn't do much. Poking around in the
>>>>>> source...
>>>>> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
>>>>> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
>>>>> total), that smells like just a kexec bug. Retesting -git...
>>>> Current -git works fine when all the ranges are passed correctly. So, I
>>>> think, the only existing regression is the SRAT issue.
>>> did you change node_shift?
>> Yes:
>>
>> CONFIG_NODES_SHIFT=6
>>
>> What I don't get is that 2.6.32 and -git print the same PXM map, and in
>> both cases it's totalling exactly 64G. Yet it says:
>>
>> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
>
> Clue:
>
> [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
> [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
> [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
> [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
> [ 0.000000] NUMA: Using 31 for the hash shift.
> [ 0.000000] pxm0: 0-480000 (4718592), absent 553990
> [ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
> [ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
> [ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
> [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> [ 0.000000] SRAT: SRAT not used.
>

oh, i post one patch last week,

can you check it?

YH


Attachments:
Attached Message (5.59 kB)

2009-12-15 21:47:48

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Jens Axboe wrote:
> >> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>> Jens Axboe wrote:
> >>>> On Tue, Dec 15 2009, Jens Axboe wrote:
> >>>>> On Tue, Dec 15 2009, Jens Axboe wrote:
> >>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>> Jens Axboe wrote:
> >>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>>> Jens Axboe wrote:
> >>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>>>>> Jens Axboe wrote:
> >>>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
> >>>>>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> >>>>>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
> >>>>>>>>>>>> SRAT still reports issues, numa doesn't work.
> >>>>>>>>>>> that patch will be bullet proof... we need it.
> >>>>>>>>>>>
> >>>>>>>>>>> also still need to figure out why memmap range is not passed properly.
> >>>>>>>>>>>
> >>>>>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
> >>>>>>>>>>> second kernel?
> >>>>>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> >>>>>>>>>> complaints and NUMA works fine.
> >>>>>>>>> do you need
> >>>>>>>>> memmap=62G@4G
> >>>>>>>>> in this case?
> >>>>>>>> Yes, I've needed that always.
> >>>>>>> good,
> >>>>>>>
> >>>>>>> can you enable debug option in kexec to see why kexec can not pass
> >>>>>>> whole 38? range to second kernel?
> >>>>>> Not getting any output so far, -d doesn't do much. Poking around in the
> >>>>>> source...
> >>>>> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
> >>>>> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
> >>>>> total), that smells like just a kexec bug. Retesting -git...
> >>>> Current -git works fine when all the ranges are passed correctly. So, I
> >>>> think, the only existing regression is the SRAT issue.
> >>> did you change node_shift?
> >> Yes:
> >>
> >> CONFIG_NODES_SHIFT=6
> >>
> >> What I don't get is that 2.6.32 and -git print the same PXM map, and in
> >> both cases it's totalling exactly 64G. Yet it says:
> >>
> >> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> >
> > Clue:
> >
> > [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> > [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
> > [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
> > [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
> > [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
> > [ 0.000000] NUMA: Using 31 for the hash shift.
> > [ 0.000000] pxm0: 0-480000 (4718592), absent 553990
> > [ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
> > [ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
> > [ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
> > [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> > [ 0.000000] SRAT: SRAT not used.
> >
>
> oh, i post one patch last week,
>
> can you check it?

Sure, let me try it. I already found out that commit 8716273c is the
guilty one (x86: Export srat physical topology).

--
Jens Axboe

2009-12-15 21:51:18

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>>>>>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>> Jens Axboe wrote:
>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>>> Jens Axboe wrote:
>>>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>>>>> Jens Axboe wrote:
>>>>>>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> it is above 0x100, so if mmconf is not enable, need to skip it
>>>>>>>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>>>>>>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
>>>>>>>>>>>>>> SRAT still reports issues, numa doesn't work.
>>>>>>>>>>>>> that patch will be bullet proof... we need it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> also still need to figure out why memmap range is not passed properly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> do you mean 2.6.32 kexec 2.6.32 it have worked mmconf and numa in
>>>>>>>>>>>>> second kernel?
>>>>>>>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
>>>>>>>>>>>> complaints and NUMA works fine.
>>>>>>>>>>> do you need
>>>>>>>>>>> memmap=62G@4G
>>>>>>>>>>> in this case?
>>>>>>>>>> Yes, I've needed that always.
>>>>>>>>> good,
>>>>>>>>>
>>>>>>>>> can you enable debug option in kexec to see why kexec can not pass
>>>>>>>>> whole 38? range to second kernel?
>>>>>>>> Not getting any output so far, -d doesn't do much. Poking around in the
>>>>>>>> source...
>>>>>>> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
>>>>>>> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
>>>>>>> total), that smells like just a kexec bug. Retesting -git...
>>>>>> Current -git works fine when all the ranges are passed correctly. So, I
>>>>>> think, the only existing regression is the SRAT issue.
>>>>> did you change node_shift?
>>>> Yes:
>>>>
>>>> CONFIG_NODES_SHIFT=6
>>>>
>>>> What I don't get is that 2.6.32 and -git print the same PXM map, and in
>>>> both cases it's totalling exactly 64G. Yet it says:
>>>>
>>>> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
>>> Clue:
>>>
>>> [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
>>> [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
>>> [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
>>> [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
>>> [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
>>> [ 0.000000] NUMA: Using 31 for the hash shift.
>>> [ 0.000000] pxm0: 0-480000 (4718592), absent 553990
>>> [ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
>>> [ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
>>> [ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
>>> [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
>>> [ 0.000000] SRAT: SRAT not used.
>>>
>> oh, i post one patch last week,
>>
>> can you check it?
>
> Sure, let me try it. I already found out that commit 8716273c is the
> guilty one (x86: Export srat physical topology).

ok, my patch should fix that.

YH

2009-12-15 21:52:19

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Jens Axboe wrote:
> > oh, i post one patch last week,
> >
> > can you check it?
>
> Sure, let me try it. I already found out that commit 8716273c is the
> guilty one (x86: Export srat physical topology).

Confirmed, -git with that patch works as well. So that's all of them I
think, can we please get this expedited in so that -rc1 will work?
Thanks!

--
Jens Axboe

2009-12-15 22:26:15

by Yinghai Lu

Subject: Re: kexec boot regression

Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
>>> oh, i post one patch last week,
>>>
>>> can you check it?
>> Sure, let me try it. I already found out that commit 8716273c is the
>> guilty one (x86: Export srat physical topology).
>
> Confirmed, -git with that patch works as well. So that's all of them I
> think, can we please get this expedited in so that -rc1 will work?
> Thanks!

updated version:

[PATCH] x86: fix checking of SRAT when node0 ram is not from 0 -v3

Found one system that boots from socket1 instead of socket0; SRAT gets rejected...

[ 0.000000] SRAT: Node 1 PXM 0 0-a0000
[ 0.000000] SRAT: Node 1 PXM 0 100000-80000000
[ 0.000000] SRAT: Node 1 PXM 0 100000000-2080000000
[ 0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000
[ 0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000
[ 0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000
[ 0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000
[ 0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000
[ 0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000
[ 0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000
...
[ 0.000000] NUMA: Allocated memnodemap from 500000 - 701040
[ 0.000000] NUMA: Using 20 for the hash shift.
[ 0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used
[ 0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used
[ 0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used
[ 0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used
[ 0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used
[ 0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used
[ 0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used
[ 0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used
[ 0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used
[ 0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used
[ 0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.

The early_node_map is not sorted because node0, with a non-zero start, comes first.

So sort it right away after all regions are registered.

Also fixes the regression introduced by 8716273c (x86: Export srat physical topology).

-v2: make it more robust for the cross-node case, e.g. node0 [0, 4g), [8g, 12g) and node1 [4g, 8g), [12g, 16g)
-v3: update comments.

Signed-off-by: Yinghai Lu <[email protected]>
Tested-by: Jens Axboe <[email protected]>

---
arch/x86/mm/srat_32.c | 2 ++
arch/x86/mm/srat_64.c | 4 +++-
include/linux/mm.h | 3 +++
mm/page_alloc.c | 4 ++--
4 files changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/srat_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_32.c
+++ linux-2.6/arch/x86/mm/srat_32.c
@@ -267,6 +267,8 @@ int __init get_memcfg_from_srat(void)
e820_register_active_regions(chunk->nid, chunk->start_pfn,
min(chunk->end_pfn, max_pfn));
}
+ /* for out of order entries in SRAT */
+ sort_node_map();

for_each_online_node(nid) {
unsigned long start = node_start_pfn[nid];
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -317,7 +317,7 @@ static int __init nodes_cover_memory(con
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
- pxmram -= absent_pages_in_range(s, e);
+ pxmram -= __absent_pages_in_range(i, s, e);
if ((long)pxmram < 0)
pxmram = 0;
}
@@ -373,6 +373,8 @@ int __init acpi_scan_nodes(unsigned long
for_each_node_mask(i, nodes_parsed)
e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
+ /* for out of order entries in SRAT */
+ sort_node_map();
if (!nodes_cover_memory(nodes)) {
bad_srat();
return -1;
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -1037,6 +1037,9 @@ extern void add_active_range(unsigned in
extern void remove_active_range(unsigned int nid, unsigned long start_pfn,
unsigned long end_pfn);
extern void remove_all_active_ranges(void);
+void sort_node_map(void);
+unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
+ unsigned long end_pfn);
extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
extern void get_pfn_range_for_nid(unsigned int nid,
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -3569,7 +3569,7 @@ static unsigned long __meminit zone_span
* Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
* then all holes in the requested range will be accounted for.
*/
-static unsigned long __meminit __absent_pages_in_range(int nid,
+unsigned long __meminit __absent_pages_in_range(int nid,
unsigned long range_start_pfn,
unsigned long range_end_pfn)
{
@@ -4098,7 +4098,7 @@ static int __init cmp_node_active_region
}

/* sort the node_map by start_pfn */
-static void __init sort_node_map(void)
+void __init sort_node_map(void)
{
sort(early_node_map, (size_t)nr_nodemap_entries,
sizeof(struct node_active_region),
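
Why the sort matters can be illustrated with a simplified model. This is a sketch only, not the kernel's code: `absent_pages_assuming_sorted` is a hypothetical helper that counts holes by walking the ranges in list order and stopping once it has passed the end of the queried range, which is only valid when the list is sorted by start pfn. The ranges are the pfn ranges from Jens' log, assumed to be registered in node-ID order (node 1 before node 2), and ignore the small e820 holes.

```python
def absent_pages_assuming_sorted(ranges, lo, hi):
    """Count pages in [lo, hi) not covered by `ranges`.

    Walks the list in the order given and stops early once the cursor
    reaches `hi`, so the result is only right for a start-sorted list.
    """
    holes = 0
    cursor = lo  # first pfn not yet accounted for
    for start, end in ranges:
        if cursor >= hi:
            break  # in a sorted list nothing later can overlap [lo, hi)
        start_c = min(max(start, lo), hi)  # clamp the range to [lo, hi]
        end_c = min(max(end, lo), hi)
        if start_c > cursor:
            holes += start_c - cursor  # gap before this range
        cursor = max(cursor, end_c)
    if cursor < hi:
        holes += hi - cursor  # gap after the last range
    return holes


# Registration order by node ID: node 0 (two ranges), node 1, node 2, node 3.
unsorted_map = [
    (0x0, 0x80000),
    (0x100000, 0x480000),
    (0x880000, 0xc80000),   # node 1
    (0x480000, 0x880000),   # node 2
    (0xc80000, 0x1080000),  # node 3
]

# Node 2's range looks completely absent when the map is walked unsorted.
print(absent_pages_assuming_sorted(unsorted_map, 0x480000, 0x880000))          # 4194304
print(absent_pages_assuming_sorted(sorted(unsorted_map), 0x480000, 0x880000))  # 0
```

In this model the walk reaches node 1's range first, treats everything below it as a hole and stops, which gives the same "absent 4194304" seen for pxm2. Sorting the map first, or restricting the walk to one node's own ranges, avoids that.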

2009-12-15 23:07:56

by Markus Trippelsdorf

Subject: Re: kexec boot regression radeon/kms (bisected)

On Tue, Dec 15, 2009 at 10:30:21PM +0100, Markus Trippelsdorf wrote:

> I have the same symptoms on my machine, but the underlying cause must be
> different. I once reverted all Radeon related changes since 2.6.32 and
> kexec started working again.
>
OK, I bisected this down to:

d8f60cfc93452d0554f6a701aa8e3236cbee4636 is the first bad commit
commit d8f60cfc93452d0554f6a701aa8e3236cbee4636
Author: Alex Deucher <[email protected]>
Date: Tue Dec 1 13:43:46 2009 -0500

drm/radeon/kms: Add support for interrupts on r6xx/r7xx chips (v3)
--
Markus

2009-12-16 10:01:10

by Jens Axboe

Subject: Re: kexec boot regression

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Jens Axboe wrote:
> >>> oh, i post one patch last week,
> >>>
> >>> can you check it?
> >> Sure, let me try it. I already found out that commit 8716273c is the
> >> guilty one (x86: Export srat physical topology).
> >
> > Confirmed, -git with that patch works as well. So that's all of them I
> > think, can we please get this expedited in so that -rc1 will work?
> > Thanks!
>
> updated version:
>
> [PATCH] x86: fix checking of SRAT when node0 ram is not from 0 -v3

Verified, this one works fine, too.

--
Jens Axboe