2009-06-26 15:59:29

by Mikael Pettersson

[permalink] [raw]
Subject: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

The combination of HIGHMEM64G and PCI doesn't work in 2.6.31-rc1,
causing a hang in PCI initialisation during boot:

Linux version 2.6.31-rc1 (mikpe@brewer) (gcc version 4.3.4 20090621 (prerelease) (GCC) ) #1 Fri Jun 26 16:01:50 CEST 2009
KERNEL supported cpus:
Intel GenuineIntel
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff90000 (usable)
BIOS-e820: 000000007ff90000 - 000000007ff9e000 (ACPI data)
BIOS-e820: 000000007ff9e000 - 000000007ffe0000 (ACPI NVS)
BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
DMI 2.4 present.
last_pfn = 0x200000 max_arch_pfn = 0x1000000
init_memory_mapping: 0000000000000000-00000000379fe000
NX (Execute Disable) protection: active
RAMDISK: 37ebb000 - 37fef7c6
Allocated new RAMDISK: 00221000 - 003557c6
Move RAMDISK from 0000000037ebb000 - 0000000037fef7c5 to 00221000 - 003557c5
7302MB HIGHMEM available.
889MB LOWMEM available.
mapped low ram: 0 - 379fe000
low ram: 0 - 379fe000
node 0 low ram: 00000000 - 379fe000
node 0 bootmap 00009000 - 0000ff40
(7 early reservations) ==> bootmem [0000000000 - 00379fe000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000100000 - 000021cf18] TEXT DATA BSS ==> [0000100000 - 000021cf18]
#2 [000009ec00 - 0000100000] BIOS reserved ==> [000009ec00 - 0000100000]
#3 [000021d000 - 000022022c] BRK ==> [000021d000 - 000022022c]
#4 [0000007000 - 0000009000] PGTABLE ==> [0000007000 - 0000009000]
#5 [0000221000 - 00003557c6] NEW RAMDISK ==> [0000221000 - 00003557c6]
#6 [0000009000 - 0000010000] BOOTMAP ==> [0000009000 - 0000010000]
Zone PFN ranges:
DMA 0x00000000 -> 0x00001000
Normal 0x00001000 -> 0x000379fe
HighMem 0x000379fe -> 0x00200000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0x00000000 -> 0x0000009e
0: 0x00000100 -> 0x0007ff90
0: 0x00100000 -> 0x00200000
Allocating PCI resources starting at 80000000 (gap: 80000000:7ee00000)
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1556269
Kernel command line: ro root=LABEL=32/ console=ttyS0,115200
PID hash table entries: 4096 (order: 12, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Initializing HighMem for node 0 (000379fe:00200000)
Memory: 6220996k/8388608k available (691k kernel code, 68768k reserved, 203k data, 144k init, 5379656k highmem)
virtual kernel memory layout:
fixmap : 0xfffe6000 - 0xfffff000 ( 100 kB)
pkmap : 0xffa00000 - 0xffc00000 (2048 kB)
vmalloc : 0xf81fe000 - 0xff9fe000 ( 120 MB)
lowmem : 0xc0000000 - 0xf79fe000 ( 889 MB)
.init : 0xc01e2000 - 0xc0206000 ( 144 kB)
.data : 0xc01acd86 - 0xc01dfbdc ( 203 kB)
.text : 0xc0100000 - 0xc01acd86 ( 691 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
NR_IRQS:16
Fast TSC calibration using PIT
Detected 2400.277 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.55 BogoMIPS (lpj=24002770)
Mount-cache hash table entries: 512
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
using mwait in idle threads.
CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
Checking 'hlt' instruction... OK.
PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
PCI: Using configuration type 1 for base access
PCI: Probing PCI hardware
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1a.7: PME# disabled
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1b.0: PME# disabled
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0294 (mask 0003)
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:02:02.0: PME# supported from D1 D2 D3hot D3cold
pci 0000:02:02.0: PME# disabled
pci 0000:00:1e.0: transparent bridge
pci 0000:00:1f.0: PIIX/ICH IRQ router [8086:2810]

At this point the kernel hangs hard until rebooted.

Rebooting with mem=2048M (to avoid issues with mappings >= 4GB)
allows PCI init to proceed and print:

pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
pci 0000:00:01.0: IO window: 0x9000-0xbfff
pci 0000:00:01.0: MEM window: 0xff800000-0xff8fffff
pci 0000:00:01.0: PREFETCH window: 0x000000bff00000-0x000000dfefffff
pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02
pci 0000:00:1e.0: IO window: 0xc000-0xcfff
pci 0000:00:1e.0: MEM window: 0xff900000-0xff9fffff
pci 0000:00:1e.0: PREFETCH window: disabled
pci 0000:00:01.0: found PCI INT A -> IRQ 11
pci 0000:00:01.0: sharing IRQ 11 with 0000:00:1a.0
pci 0000:00:01.0: sharing IRQ 11 with 0000:01:00.0
Unpacking initramfs...
Freeing initrd memory: 1233k freed
platform rtc_cmos: registered platform RTC device (no PNP device found)
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Platform driver 'serial8250' needs updating - please use dev_pm_ops
Freeing unused kernel memory: 144k freed
Red Hat nash version 5.1.19.0.3 starting
...

Booting with pci=off also allows the kernel to boot.
Reconfiguring with NOHIGHMEM or HIGHMEM4G also allows the kernel to boot.

This is a regression from 2.6.30 and earlier kernels.

/Mikael


2009-06-27 01:14:07

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Mikael Pettersson writes:
> The combination of HIGHMEM64G and PCI doesn't work in 2.6.31-rc1,
> causing a hang in PCI initialisation during boot:
>
> Linux version 2.6.31-rc1 (mikpe@brewer) (gcc version 4.3.4 20090621 (prerelease) (GCC) ) #1 Fri Jun 26 16:01:50 CEST 2009
> KERNEL supported cpus:
> Intel GenuineIntel
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
> BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007ff90000 (usable)
> BIOS-e820: 000000007ff90000 - 000000007ff9e000 (ACPI data)
> BIOS-e820: 000000007ff9e000 - 000000007ffe0000 (ACPI NVS)
> BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
> DMI 2.4 present.
> last_pfn = 0x200000 max_arch_pfn = 0x1000000
> init_memory_mapping: 0000000000000000-00000000379fe000
> NX (Execute Disable) protection: active
> RAMDISK: 37ebb000 - 37fef7c6
> Allocated new RAMDISK: 00221000 - 003557c6
> Move RAMDISK from 0000000037ebb000 - 0000000037fef7c5 to 00221000 - 003557c5
> 7302MB HIGHMEM available.
> 889MB LOWMEM available.
> mapped low ram: 0 - 379fe000
> low ram: 0 - 379fe000
> node 0 low ram: 00000000 - 379fe000
> node 0 bootmap 00009000 - 0000ff40
> (7 early reservations) ==> bootmem [0000000000 - 00379fe000]
> #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
> #1 [0000100000 - 000021cf18] TEXT DATA BSS ==> [0000100000 - 000021cf18]
> #2 [000009ec00 - 0000100000] BIOS reserved ==> [000009ec00 - 0000100000]
> #3 [000021d000 - 000022022c] BRK ==> [000021d000 - 000022022c]
> #4 [0000007000 - 0000009000] PGTABLE ==> [0000007000 - 0000009000]
> #5 [0000221000 - 00003557c6] NEW RAMDISK ==> [0000221000 - 00003557c6]
> #6 [0000009000 - 0000010000] BOOTMAP ==> [0000009000 - 0000010000]
> Zone PFN ranges:
> DMA 0x00000000 -> 0x00001000
> Normal 0x00001000 -> 0x000379fe
> HighMem 0x000379fe -> 0x00200000
> Movable zone start PFN for each node
> early_node_map[3] active PFN ranges
> 0: 0x00000000 -> 0x0000009e
> 0: 0x00000100 -> 0x0007ff90
> 0: 0x00100000 -> 0x00200000
> Allocating PCI resources starting at 80000000 (gap: 80000000:7ee00000)
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1556269
> Kernel command line: ro root=LABEL=32/ console=ttyS0,115200
> PID hash table entries: 4096 (order: 12, 16384 bytes)
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Initializing CPU#0
> Initializing HighMem for node 0 (000379fe:00200000)
> Memory: 6220996k/8388608k available (691k kernel code, 68768k reserved, 203k data, 144k init, 5379656k highmem)
> virtual kernel memory layout:
> fixmap : 0xfffe6000 - 0xfffff000 ( 100 kB)
> pkmap : 0xffa00000 - 0xffc00000 (2048 kB)
> vmalloc : 0xf81fe000 - 0xff9fe000 ( 120 MB)
> lowmem : 0xc0000000 - 0xf79fe000 ( 889 MB)
> .init : 0xc01e2000 - 0xc0206000 ( 144 kB)
> .data : 0xc01acd86 - 0xc01dfbdc ( 203 kB)
> .text : 0xc0100000 - 0xc01acd86 ( 691 kB)
> Checking if this processor honours the WP bit even in supervisor mode...Ok.
> NR_IRQS:16
> Fast TSC calibration using PIT
> Detected 2400.277 MHz processor.
> Console: colour VGA+ 80x25
> console [ttyS0] enabled
> Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.55 BogoMIPS (lpj=24002770)
> Mount-cache hash table entries: 512
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> using mwait in idle threads.
> CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
> Checking 'hlt' instruction... OK.
> PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> PCI: Using configuration type 1 for base access
> PCI: Probing PCI hardware
> pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:01.0: PME# disabled
> pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
> pci 0000:00:1a.7: PME# disabled
> pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:1b.0: PME# disabled
> pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
> pci 0000:00:1d.7: PME# disabled
> pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
> pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
> pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0294 (mask 0003)
> pci 0000:00:1f.2: PME# supported from D3hot
> pci 0000:00:1f.2: PME# disabled
> pci 0000:02:02.0: PME# supported from D1 D2 D3hot D3cold
> pci 0000:02:02.0: PME# disabled
> pci 0000:00:1e.0: transparent bridge
> pci 0000:00:1f.0: PIIX/ICH IRQ router [8086:2810]
>
> At this point the kernel hangs hard until rebooted.
>
> Rebooting with mem=2048M (to avoid issues with mappings >= 4GB)
> allows PCI init to proceed and print:
>
> pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
> pci 0000:00:01.0: IO window: 0x9000-0xbfff
> pci 0000:00:01.0: MEM window: 0xff800000-0xff8fffff
> pci 0000:00:01.0: PREFETCH window: 0x000000bff00000-0x000000dfefffff
> pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02
> pci 0000:00:1e.0: IO window: 0xc000-0xcfff
> pci 0000:00:1e.0: MEM window: 0xff900000-0xff9fffff
> pci 0000:00:1e.0: PREFETCH window: disabled
> pci 0000:00:01.0: found PCI INT A -> IRQ 11
> pci 0000:00:01.0: sharing IRQ 11 with 0000:00:1a.0
> pci 0000:00:01.0: sharing IRQ 11 with 0000:01:00.0
> Unpacking initramfs...
> Freeing initrd memory: 1233k freed
> platform rtc_cmos: registered platform RTC device (no PNP device found)
> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> Platform driver 'serial8250' needs updating - please use dev_pm_ops
> Freeing unused kernel memory: 144k freed
> Red Hat nash version 5.1.19.0.3 starting
> ...
>
> Booting with pci=off also allows the kernel to boot.
> Reconfiguring with NOHIGHMEM or HIGHMEM4G also allows the kernel to boot.
>
> This is a regression from 2.6.30 and earlier kernels.

I've now identified commit 95ee14e4379c5e19c0897c872350570402014742
"x86: cap iomem_resource to addressable physical memory" by hpa (cc:d)
as the culprit. Reverting it fixes my boot hang.

That commit was:

> x86: cap iomem_resource to addressable physical memory
>
> iomem_resource is by default initialized to -1, which means 64 bits of
> physical address space if 64-bit resources are enabled. However, x86
> CPUs cannot address 64 bits of physical address space. Thus, we want
> to cap the physical address space to what the union of all CPU can
> actually address.
>
> Without this patch, we may end up assigning inaccessible values to
> uninitialized 64-bit PCI memory resources.
>
> Signed-off-by: H. Peter Anvin <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Jesse Barnes <[email protected]>
> Cc: Martin Mares <[email protected]>
> Cc: [email protected]
> ---
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 3ffdcfa..5b9cb88 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -853,6 +853,9 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
> #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
> numa_add_cpu(smp_processor_id());
> #endif
> +
> + /* Cap the iomem address space to what is addressable on all CPUs */
> + iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
> }
>
> #ifdef CONFIG_X86_64

2009-06-27 04:26:06

by Grant Grundler

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

On Sat, Jun 27, 2009 at 03:13:52AM +0200, Mikael Pettersson wrote:
> Mikael Pettersson writes:
...
> > CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
> > Checking 'hlt' instruction... OK.
> > PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
...
> > pci 0000:00:1e.0: transparent bridge
> > pci 0000:00:1f.0: PIIX/ICH IRQ router [8086:2810]
> >
> > At this point the kernel hangs hard until rebooted.

...
> I've now identified commit 95ee14e4379c5e19c0897c872350570402014742
> "x86: cap iomem_resource to addressable physical memory" by hpa (cc:d)
> as the culprit. Reverting it fixes my boot hang.

Mikael,
thanks for tracking this down...can you dump the value of c->x86_phys_bits
please?

I have one question about the original commit below.

> > x86: cap iomem_resource to addressable physical memory
> >
> > iomem_resource is by default initialized to -1, which means 64 bits of
> > physical address space if 64-bit resources are enabled. However, x86
> > CPUs cannot address 64 bits of physical address space. Thus, we want
> > to cap the physical address space to what the union of all CPU can
> > actually address.
> >
> > Without this patch, we may end up assigning inaccessible values to
> > uninitialized 64-bit PCI memory resources.

In general, this makes sense to me and that's why I didn't comment
on the patch when it was originally submitted.

...
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -853,6 +853,9 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
> > #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
> > numa_add_cpu(smp_processor_id());
> > #endif
> > +
> > + /* Cap the iomem address space to what is addressable on all CPUs */
> > + iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;

Does x86_phys_bits represent the number of address lines/bits handled by
the memory controller, coming out of the CPU, or handled by the
"north bridge" (IO controller)?

I was assuming all three are the same thing but that might not be true
with "QPI" or whatever Intel is calling it's serial interconnect these days.
I'm wondering if the addressing capability of the CPU->memory controller
might be different than CPU->IO Controller.

Parallel interconnects are limited by the number of lines wired to
transmit address data and I expect that's where x86_phys_bits originally
came from. Chipsets _were_ all designed around those limits.

thanks,
grant

2009-06-27 04:56:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Can you send me /proc/cpuinfo and /proc/iomem from this box?

-hpa

Mikael Pettersson wrote:
> Mikael Pettersson writes:
> > The combination of HIGHMEM64G and PCI doesn't work in 2.6.31-rc1,
> > causing a hang in PCI initialisation during boot:
> >
> > Linux version 2.6.31-rc1 (mikpe@brewer) (gcc version 4.3.4 20090621 (prerelease) (GCC) ) #1 Fri Jun 26 16:01:50 CEST 2009
> > KERNEL supported cpus:
> > Intel GenuineIntel
> > BIOS-provided physical RAM map:
> > BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
> > BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
> > BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
> > BIOS-e820: 0000000000100000 - 000000007ff90000 (usable)
> > BIOS-e820: 000000007ff90000 - 000000007ff9e000 (ACPI data)
> > BIOS-e820: 000000007ff9e000 - 000000007ffe0000 (ACPI NVS)
> > BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
> > BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> > BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
> > BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
> > DMI 2.4 present.
> > last_pfn = 0x200000 max_arch_pfn = 0x1000000
> > init_memory_mapping: 0000000000000000-00000000379fe000
> > NX (Execute Disable) protection: active
> > RAMDISK: 37ebb000 - 37fef7c6
> > Allocated new RAMDISK: 00221000 - 003557c6
> > Move RAMDISK from 0000000037ebb000 - 0000000037fef7c5 to 00221000 - 003557c5
> > 7302MB HIGHMEM available.
> > 889MB LOWMEM available.
> > mapped low ram: 0 - 379fe000
> > low ram: 0 - 379fe000
> > node 0 low ram: 00000000 - 379fe000
> > node 0 bootmap 00009000 - 0000ff40
> > (7 early reservations) ==> bootmem [0000000000 - 00379fe000]
> > #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
> > #1 [0000100000 - 000021cf18] TEXT DATA BSS ==> [0000100000 - 000021cf18]
> > #2 [000009ec00 - 0000100000] BIOS reserved ==> [000009ec00 - 0000100000]
> > #3 [000021d000 - 000022022c] BRK ==> [000021d000 - 000022022c]
> > #4 [0000007000 - 0000009000] PGTABLE ==> [0000007000 - 0000009000]
> > #5 [0000221000 - 00003557c6] NEW RAMDISK ==> [0000221000 - 00003557c6]
> > #6 [0000009000 - 0000010000] BOOTMAP ==> [0000009000 - 0000010000]
> > Zone PFN ranges:
> > DMA 0x00000000 -> 0x00001000
> > Normal 0x00001000 -> 0x000379fe
> > HighMem 0x000379fe -> 0x00200000
> > Movable zone start PFN for each node
> > early_node_map[3] active PFN ranges
> > 0: 0x00000000 -> 0x0000009e
> > 0: 0x00000100 -> 0x0007ff90
> > 0: 0x00100000 -> 0x00200000
> > Allocating PCI resources starting at 80000000 (gap: 80000000:7ee00000)
> > Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1556269
> > Kernel command line: ro root=LABEL=32/ console=ttyS0,115200
> > PID hash table entries: 4096 (order: 12, 16384 bytes)
> > Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > Enabling fast FPU save and restore... done.
> > Enabling unmasked SIMD FPU exception support... done.
> > Initializing CPU#0
> > Initializing HighMem for node 0 (000379fe:00200000)
> > Memory: 6220996k/8388608k available (691k kernel code, 68768k reserved, 203k data, 144k init, 5379656k highmem)
> > virtual kernel memory layout:
> > fixmap : 0xfffe6000 - 0xfffff000 ( 100 kB)
> > pkmap : 0xffa00000 - 0xffc00000 (2048 kB)
> > vmalloc : 0xf81fe000 - 0xff9fe000 ( 120 MB)
> > lowmem : 0xc0000000 - 0xf79fe000 ( 889 MB)
> > .init : 0xc01e2000 - 0xc0206000 ( 144 kB)
> > .data : 0xc01acd86 - 0xc01dfbdc ( 203 kB)
> > .text : 0xc0100000 - 0xc01acd86 ( 691 kB)
> > Checking if this processor honours the WP bit even in supervisor mode...Ok.
> > NR_IRQS:16
> > Fast TSC calibration using PIT
> > Detected 2400.277 MHz processor.
> > Console: colour VGA+ 80x25
> > console [ttyS0] enabled
> > Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.55 BogoMIPS (lpj=24002770)
> > Mount-cache hash table entries: 512
> > CPU: L1 I cache: 32K, L1 D cache: 32K
> > CPU: L2 cache: 4096K
> > using mwait in idle threads.
> > CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
> > Checking 'hlt' instruction... OK.
> > PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> > PCI: Using configuration type 1 for base access
> > PCI: Probing PCI hardware
> > pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
> > pci 0000:00:01.0: PME# disabled
> > pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
> > pci 0000:00:1a.7: PME# disabled
> > pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
> > pci 0000:00:1b.0: PME# disabled
> > pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
> > pci 0000:00:1d.7: PME# disabled
> > pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
> > pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
> > pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0294 (mask 0003)
> > pci 0000:00:1f.2: PME# supported from D3hot
> > pci 0000:00:1f.2: PME# disabled
> > pci 0000:02:02.0: PME# supported from D1 D2 D3hot D3cold
> > pci 0000:02:02.0: PME# disabled
> > pci 0000:00:1e.0: transparent bridge
> > pci 0000:00:1f.0: PIIX/ICH IRQ router [8086:2810]
> >
> > At this point the kernel hangs hard until rebooted.
> >
> > Rebooting with mem=2048M (to avoid issues with mappings >= 4GB)
> > allows PCI init to proceed and print:
> >
> > pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
> > pci 0000:00:01.0: IO window: 0x9000-0xbfff
> > pci 0000:00:01.0: MEM window: 0xff800000-0xff8fffff
> > pci 0000:00:01.0: PREFETCH window: 0x000000bff00000-0x000000dfefffff
> > pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02
> > pci 0000:00:1e.0: IO window: 0xc000-0xcfff
> > pci 0000:00:1e.0: MEM window: 0xff900000-0xff9fffff
> > pci 0000:00:1e.0: PREFETCH window: disabled
> > pci 0000:00:01.0: found PCI INT A -> IRQ 11
> > pci 0000:00:01.0: sharing IRQ 11 with 0000:00:1a.0
> > pci 0000:00:01.0: sharing IRQ 11 with 0000:01:00.0
> > Unpacking initramfs...
> > Freeing initrd memory: 1233k freed
> > platform rtc_cmos: registered platform RTC device (no PNP device found)
> > Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > Platform driver 'serial8250' needs updating - please use dev_pm_ops
> > Freeing unused kernel memory: 144k freed
> > Red Hat nash version 5.1.19.0.3 starting
> > ...
> >
> > Booting with pci=off also allows the kernel to boot.
> > Reconfiguring with NOHIGHMEM or HIGHMEM4G also allows the kernel to boot.
> >
> > This is a regression from 2.6.30 and earlier kernels.
>
> I've now identified commit 95ee14e4379c5e19c0897c872350570402014742
> "x86: cap iomem_resource to addressable physical memory" by hpa (cc:d)
> as the culprit. Reverting it fixes my boot hang.
>
> That commit was:
>
>> x86: cap iomem_resource to addressable physical memory
>>
>> iomem_resource is by default initialized to -1, which means 64 bits of
>> physical address space if 64-bit resources are enabled. However, x86
>> CPUs cannot address 64 bits of physical address space. Thus, we want
>> to cap the physical address space to what the union of all CPU can
>> actually address.
>>
>> Without this patch, we may end up assigning inaccessible values to
>> uninitialized 64-bit PCI memory resources.
>>
>> Signed-off-by: H. Peter Anvin <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Jesse Barnes <[email protected]>
>> Cc: Martin Mares <[email protected]>
>> Cc: [email protected]
>> ---
>>
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index 3ffdcfa..5b9cb88 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -853,6 +853,9 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
>> #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
>> numa_add_cpu(smp_processor_id());
>> #endif
>> +
>> + /* Cap the iomem address space to what is addressable on all CPUs */
>> + iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
>> }
>>
>> #ifdef CONFIG_X86_64

2009-06-27 05:35:15

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Grant Grundler wrote:
>>> +
>>> + /* Cap the iomem address space to what is addressable on all CPUs */
>>> + iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
>
> Does x86_phys_bits represent the number of address lines/bits handled by
> the memory controller, coming out of the CPU, or handled by the
> "north bridge" (IO controller)?
>

x86_phys_bits represents the top end of what the processor can address.

> I was assuming all three are the same thing but that might not be true
> with "QPI" or whatever Intel is calling it's serial interconnect these days.
> I'm wondering if the addressing capability of the CPU->memory controller
> might be different than CPU->IO Controller.
>
> Parallel interconnects are limited by the number of lines wired to
> transmit address data and I expect that's where x86_phys_bits originally
> came from. Chipsets _were_ all designed around those limits.

Serial interconnects behave the same way, it's just that the address
bits are sent in serial order. Something is seriously goofy here, and
it's probably reasonably straightforward to figure out what.

-hpa

2009-06-27 09:42:32

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Grant Grundler writes:
> On Sat, Jun 27, 2009 at 03:13:52AM +0200, Mikael Pettersson wrote:
> > Mikael Pettersson writes:
> ...
> > > CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
> > > Checking 'hlt' instruction... OK.
> > > PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> ...
> > > pci 0000:00:1e.0: transparent bridge
> > > pci 0000:00:1f.0: PIIX/ICH IRQ router [8086:2810]
> > >
> > > At this point the kernel hangs hard until rebooted.
>
> ...
> > I've now identified commit 95ee14e4379c5e19c0897c872350570402014742
> > "x86: cap iomem_resource to addressable physical memory" by hpa (cc:d)
> > as the culprit. Reverting it fixes my boot hang.
>
> Mikael,
> thanks for tracking this down...can you dump the value of c->x86_phys_bits
> please?

Sure. This is from 2.6.31-rc1 with the commit commented out and
replaced by a debug printk:

identify_cpu: ->x86_phys_bits == 36, would change iomem_resource.end from 0xffffffffffffffff to 0x0000000fffffffff

which looks correct for HIGHMEM64G.

2009-06-27 09:45:33

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin writes:
> Can you send me /proc/cpuinfo and /proc/iomem from this box?

This is from 2.6.31-rc1 with the commit commented out.

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping : 6
cpu MHz : 1596.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 4799.81
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping : 6
cpu MHz : 1596.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 4800.01
clflush size : 64
power management:

00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000e4000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-7ff8ffff : System RAM
00100000-002e022e : Kernel code
002e022f-0038baf7 : Kernel data
003d8000-003fc9f3 : Kernel bss
7ff90000-7ff9dfff : ACPI Tables
7ff9e000-7ffdffff : ACPI Non-volatile Storage
7ffe0000-7fffffff : reserved
80000000-800000ff : 0000:00:1f.3
bff00000-dfefffff : PCI Bus 0000:01
c0000000-cfffffff : 0000:01:00.0
e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
e0000000-efffffff : pnp 00:0e
febfe000-febfec00 : pnp 00:09
fec00000-fec00fff : IOAPIC 0
fec00000-fec00fff : pnp 00:0b
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:09
fed20000-fed8ffff : pnp 00:09
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fee00000-fee00fff : pnp 00:0b
ff800000-ff8fffff : PCI Bus 0000:01
ff8c0000-ff8dffff : 0000:01:00.0
ff8e0000-ff8effff : 0000:01:00.1
ff8f0000-ff8fffff : 0000:01:00.0
ff900000-ff9fffff : PCI Bus 0000:02
ff9ffc00-ff9ffcff : 0000:02:02.0
ff9ffc00-ff9ffcff : 8139too
ffaf8000-ffafbfff : 0000:00:1b.0
ffaf8000-ffafbfff : ICH HD audio
ffaff000-ffaff3ff : 0000:00:1d.7
ffaff000-ffaff3ff : ehci_hcd
ffaff400-ffaff7ff : 0000:00:1a.7
ffaff400-ffaff7ff : ehci_hcd
ffaff800-ffafffff : 0000:00:1f.2
ffaff800-ffafffff : ahci
ffb00000-ffffffff : reserved
ffb00000-ffbfffff : pnp 00:09
fff00000-fffffffe : pnp 00:09
100000000-1ffffffff : System RAM
200000000-ffffffffffffffff : RAM buffer

With 2.6.30 things look similar, except 2.6.30 does not show the
last "200000000-ffffffffffffffff : RAM buffer" line.

2009-06-27 19:30:39

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Mikael Pettersson wrote:
>
> Sure. This is from 2.6.31-rc1 with the commit commented out and
> replaced by a debug printk:
>
> identify_cpu: ->x86_phys_bits == 36, would change iomem_resource.end from 0xffffffffffffffff to 0x0000000fffffffff
>
> which looks correct for HIGHMEM64G.

OK, I'm going to revert the checkin... however, we still need to figure
out what the right answer here is, and why this broke (and why none of
my tests showed this failure...)

However, it is clear that the answer is more complex than we originally
anticipated. Mikael, would you be willing to run some experiments over
the near future?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-27 19:34:17

by H. Peter Anvin

[permalink] [raw]
Subject: [tip:x86/urgent] Revert "x86: cap iomem_resource to addressable physical memory"

Commit-ID: fc1dbec8ae1a451557ade978bf7777b267690ca9
Gitweb: http://git.kernel.org/tip/fc1dbec8ae1a451557ade978bf7777b267690ca9
Author: H. Peter Anvin <[email protected]>
AuthorDate: Sat, 27 Jun 2009 12:22:27 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sat, 27 Jun 2009 12:22:27 -0700

Revert "x86: cap iomem_resource to addressable physical memory"

This reverts commit 95ee14e4379c5e19c0897c872350570402014742.
Mikael Petterson <[email protected]> reported that at least one of his
systems will not boot as a result. We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.

Revert the commit until we can root cause and correct this problem.

-stable team: this means the underlying commit should be rejected.

Reported-and-isolated-by: Mikael Petterson <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: [email protected]
Cc: Grant Grundler <[email protected]>


---
arch/x86/kernel/cpu/common.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6b26d4d..f1961c0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -848,9 +848,6 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
#if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
numa_add_cpu(smp_processor_id());
#endif
-
- /* Cap the iomem address space to what is addressable on all CPUs */
- iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
}

#ifdef CONFIG_X86_64

2009-06-27 21:34:49

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin writes:
> Mikael Pettersson wrote:
> >
> > Sure. This is from 2.6.31-rc1 with the commit commented out and
> > replaced by a debug printk:
> >
> > identify_cpu: ->x86_phys_bits == 36, would change iomem_resource.end from 0xffffffffffffffff to 0x0000000fffffffff
> >
> > which looks correct for HIGHMEM64G.
>
> OK, I'm going to revert the checkin... however, we still need to figure
> out what the right answer here is, and why this broke (and why none of
> my tests showed this failure...)
>
> However, it is clear that the answer is more complex than we originally
> anticipated. Mikael, would you be willing to run some experiments over
> the near future?

Yes

/Mikael

2009-06-27 21:50:17

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [tip:x86/urgent] Revert "x86: cap iomem_resource to addressable physical memory"

tip-bot for H. Peter Anvin writes:
> Commit-ID: fc1dbec8ae1a451557ade978bf7777b267690ca9
> Gitweb: http://git.kernel.org/tip/fc1dbec8ae1a451557ade978bf7777b267690ca9
> Author: H. Peter Anvin <[email protected]>
> AuthorDate: Sat, 27 Jun 2009 12:22:27 -0700
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Sat, 27 Jun 2009 12:22:27 -0700
>
> Revert "x86: cap iomem_resource to addressable physical memory"
>
> This reverts commit 95ee14e4379c5e19c0897c872350570402014742.
> Mikael Petterson <[email protected]> reported that at least one of his
> systems will not boot as a result. We have ruled out the detection
> algorithm malfunctioning, so it is not a matter of producing the
> incorrect bitmasks; rather, something in the application of them
> fails.
>
> Revert the commit until we can root cause and correct this problem.
>
> -stable team: this means the underlying commit should be rejected.
>
> Reported-and-isolated-by: Mikael Petterson <[email protected]>

Typo, please s/mikepe/mikpe/ here.

> Signed-off-by: H. Peter Anvin <[email protected]>
> LKML-Reference: <[email protected]>
> Cc: [email protected]
> Cc: Grant Grundler <[email protected]>
>
>
> ---
> arch/x86/kernel/cpu/common.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 6b26d4d..f1961c0 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -848,9 +848,6 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
> #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
> numa_add_cpu(smp_processor_id());
> #endif
> -
> - /* Cap the iomem address space to what is addressable on all CPUs */
> - iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
> }
>
> #ifdef CONFIG_X86_64
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2009-06-27 22:04:58

by H. Peter Anvin

[permalink] [raw]
Subject: [tip:x86/urgent] Revert "x86: cap iomem_resource to addressable physical memory"

Commit-ID: aadaf6b756031e14c0957aebc63d60f1d9a08524
Gitweb: http://git.kernel.org/tip/aadaf6b756031e14c0957aebc63d60f1d9a08524
Author: H. Peter Anvin <[email protected]>
AuthorDate: Sat, 27 Jun 2009 12:22:27 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sat, 27 Jun 2009 14:59:07 -0700

Revert "x86: cap iomem_resource to addressable physical memory"

This reverts commit 95ee14e4379c5e19c0897c872350570402014742.
Mikael Petterson <[email protected]> reported that at least one of his
systems will not boot as a result. We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.

Revert the commit until we can root cause and correct this problem.

-stable team: this means the underlying commit should be rejected.

Reported-and-isolated-by: Mikael Petterson <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: [email protected]
Cc: Grant Grundler <[email protected]>


---
arch/x86/kernel/cpu/common.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6b26d4d..f1961c0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -848,9 +848,6 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
#if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
numa_add_cpu(smp_processor_id());
#endif
-
- /* Cap the iomem address space to what is addressable on all CPUs */
- iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
}

#ifdef CONFIG_X86_64

2009-06-28 07:40:48

by H. Peter Anvin

[permalink] [raw]
Subject: [tip:x86/urgent] Revert "x86: cap iomem_resource to addressable physical memory"

Commit-ID: ff8a4bae459a9b6455504127fcb78fdbc8e50e4c
Gitweb: http://git.kernel.org/tip/ff8a4bae459a9b6455504127fcb78fdbc8e50e4c
Author: H. Peter Anvin <[email protected]>
AuthorDate: Sat, 27 Jun 2009 12:22:27 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 28 Jun 2009 09:38:47 +0200

Revert "x86: cap iomem_resource to addressable physical memory"

This reverts commit 95ee14e4379c5e19c0897c872350570402014742.
Mikael Petterson <[email protected]> reported that at least one of his
systems will not boot as a result. We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.

Revert the commit until we can root cause and correct this problem.

-stable team: this means the underlying commit should be rejected.

Reported-and-isolated-by: Mikael Petterson <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: [email protected]
Cc: Grant Grundler <[email protected]>


---
arch/x86/kernel/cpu/common.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6b26d4d..f1961c0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -848,9 +848,6 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
#if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
numa_add_cpu(smp_processor_id());
#endif
-
- /* Cap the iomem address space to what is addressable on all CPUs */
- iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
}

#ifdef CONFIG_X86_64

2009-06-29 02:24:19

by Grant Grundler

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

On Fri, Jun 26, 2009 at 10:31:49PM -0700, H. Peter Anvin wrote:
> Grant Grundler wrote:
>>>> +
>>>> + /* Cap the iomem address space to what is addressable on all CPUs */
>>>> + iomem_resource.end &= (1ULL << c->x86_phys_bits) - 1;
>>
>> Does x86_phys_bits represent the number of address lines/bits handled by
>> the memory controller, coming out of the CPU, or handled by the
>> "north bridge" (IO controller)?
>>
>
> x86_phys_bits represents the top end of what the processor can address.

Ok - I interpret that to mean the number of physical address bits
the processor can deal with regardless of the memory controller
or IO Controller.


>> I was assuming all three are the same thing but that might not be true
>> with "QPI" or whatever Intel is calling it's serial interconnect these days.
>> I'm wondering if the addressing capability of the CPU->memory controller
>> might be different than CPU->IO Controller.
>>
>> Parallel interconnects are limited by the number of lines wired to
>> transmit address data and I expect that's where x86_phys_bits originally
>> came from. Chipsets _were_ all designed around those limits.
>
> Serial interconnects behave the same way, it's just that the address
> bits are sent in serial order.

The bits going across the wire are the protocol for the interconnect
and not necessarily what the CPU has implemented.

> Something is seriously goofy here, and
> it's probably reasonably straightforward to figure out what.

Ok - just tracking it down will be difficult...really need to know
the state of the machine (MCE dump?) when it wedges and can then
determine which code is trying to access the range.

thanks,
grant

2009-06-29 02:29:19

by Grant Grundler

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

On Sat, Jun 27, 2009 at 11:45:24AM +0200, Mikael Pettersson wrote:
...
> fff00000-fffffffe : pnp 00:09
> 100000000-1ffffffff : System RAM
> 200000000-ffffffffffffffff : RAM buffer
>
> With 2.6.30 things look similar, except 2.6.30 does not show the
> last "200000000-ffffffffffffffff : RAM buffer" line.

BIOS e280 table didn't report that line.
I expect it's created by arch/x86/kernel/e820.c:
1398 /*
1399 * Try to bump up RAM regions to reasonable boundaries to
1400 * avoid stolen RAM:
1401 */
1402 for (i = 0; i < e820.nr_map; i++) {
1403 struct e820entry *entry = &e820_saved.map[i];
1404 resource_size_t start, end;
1405
1406 if (entry->type != E820_RAM)
1407 continue;
1408 start = entry->addr + entry->size;
1409 end = round_up(start, ram_alignment(start));
1410 if (start == end)
1411 continue;
1412 reserve_region_with_split(&iomem_resource, start,
1413 end - 1, "RAM buffer");
1414 }

hth,
grant

2009-06-29 05:04:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Grant Grundler wrote:
> On Sat, Jun 27, 2009 at 11:45:24AM +0200, Mikael Pettersson wrote:
> ...
>> fff00000-fffffffe : pnp 00:09
>> 100000000-1ffffffff : System RAM
>> 200000000-ffffffffffffffff : RAM buffer
>>
>> With 2.6.30 things look similar, except 2.6.30 does not show the
>> last "200000000-ffffffffffffffff : RAM buffer" line.
>
> BIOS e280 table didn't report that line.
> I expect it's created by arch/x86/kernel/e820.c:
> 1398 /*
> 1399 * Try to bump up RAM regions to reasonable boundaries to
> 1400 * avoid stolen RAM:
> 1401 */
> 1402 for (i = 0; i < e820.nr_map; i++) {
> 1403 struct e820entry *entry = &e820_saved.map[i];
> 1404 resource_size_t start, end;
> 1405
> 1406 if (entry->type != E820_RAM)
> 1407 continue;
> 1408 start = entry->addr + entry->size;
> 1409 end = round_up(start, ram_alignment(start));
> 1410 if (start == end)
> 1411 continue;
> 1412 reserve_region_with_split(&iomem_resource, start,
> 1413 end - 1, "RAM buffer");
> 1414 }
>

OK, this seems more than a wee bit strange, to say the least. We
shouldn't be reserving the entire address space; this is legitimate I/O
space.

However, the reservation suddenly being improper for the root resource
would definitely make things unhappy...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-29 11:12:20

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin writes:
> Grant Grundler wrote:
> > On Sat, Jun 27, 2009 at 11:45:24AM +0200, Mikael Pettersson wrote:
> > ...
> >> fff00000-fffffffe : pnp 00:09
> >> 100000000-1ffffffff : System RAM
> >> 200000000-ffffffffffffffff : RAM buffer
> >>
> >> With 2.6.30 things look similar, except 2.6.30 does not show the
> >> last "200000000-ffffffffffffffff : RAM buffer" line.
> >
> > BIOS e280 table didn't report that line.
> > I expect it's created by arch/x86/kernel/e820.c:
> > 1398 /*
> > 1399 * Try to bump up RAM regions to reasonable boundaries to
> > 1400 * avoid stolen RAM:
> > 1401 */
> > 1402 for (i = 0; i < e820.nr_map; i++) {
> > 1403 struct e820entry *entry = &e820_saved.map[i];
> > 1404 resource_size_t start, end;
> > 1405
> > 1406 if (entry->type != E820_RAM)
> > 1407 continue;
> > 1408 start = entry->addr + entry->size;
> > 1409 end = round_up(start, ram_alignment(start));
> > 1410 if (start == end)
> > 1411 continue;
> > 1412 reserve_region_with_split(&iomem_resource, start,
> > 1413 end - 1, "RAM buffer");
> > 1414 }
> >
>
> OK, this seems more than a wee bit strange, to say the least. We
> shouldn't be reserving the entire address space; this is legitimate I/O
> space.
>
> However, the reservation suddenly being improper for the root resource
> would definitely make things unhappy...

Reverting the two e820 changes in 2.6.31-rc1,
5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
work on my HIGHMEM64G machine.

Seems the e820 and the iomem_resource.end changes are Ok in
isolation but break when combined.

/Mikael

2009-06-29 11:22:00

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

On Mon, Jun 29, 2009 at 01:12:05PM +0200, Mikael Pettersson wrote:
> H. Peter Anvin writes:
> > Grant Grundler wrote:
> > > On Sat, Jun 27, 2009 at 11:45:24AM +0200, Mikael Pettersson wrote:
> > > ...
> > >> fff00000-fffffffe : pnp 00:09
> > >> 100000000-1ffffffff : System RAM
> > >> 200000000-ffffffffffffffff : RAM buffer
> > >>
> > >> With 2.6.30 things look similar, except 2.6.30 does not show the
> > >> last "200000000-ffffffffffffffff : RAM buffer" line.
> > >
> > > BIOS e280 table didn't report that line.
> > > I expect it's created by arch/x86/kernel/e820.c:
> > > 1398 /*
> > > 1399 * Try to bump up RAM regions to reasonable boundaries to
> > > 1400 * avoid stolen RAM:
> > > 1401 */
> > > 1402 for (i = 0; i < e820.nr_map; i++) {
> > > 1403 struct e820entry *entry = &e820_saved.map[i];
> > > 1404 resource_size_t start, end;
> > > 1405
> > > 1406 if (entry->type != E820_RAM)
> > > 1407 continue;
> > > 1408 start = entry->addr + entry->size;
> > > 1409 end = round_up(start, ram_alignment(start));
> > > 1410 if (start == end)
> > > 1411 continue;
> > > 1412 reserve_region_with_split(&iomem_resource, start,
> > > 1413 end - 1, "RAM buffer");
> > > 1414 }
> > >
> >
> > OK, this seems more than a wee bit strange, to say the least. We
> > shouldn't be reserving the entire address space; this is legitimate I/O
> > space.
> >
> > However, the reservation suddenly being improper for the root resource
> > would definitely make things unhappy...
>
> Reverting the two e820 changes in 2.6.31-rc1,
> 5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
> 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
> but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
> work on my HIGHMEM64G machine.
>
> Seems the e820 and the iomem_resource.end changes are Ok in
> isolation but break when combined.

With the e820 change reverted, what does /proc/iomem look like?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-06-29 11:57:32

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Matthew Wilcox writes:
> On Mon, Jun 29, 2009 at 01:12:05PM +0200, Mikael Pettersson wrote:
> > H. Peter Anvin writes:
> > > Grant Grundler wrote:
> > > > On Sat, Jun 27, 2009 at 11:45:24AM +0200, Mikael Pettersson wrote:
> > > > ...
> > > >> fff00000-fffffffe : pnp 00:09
> > > >> 100000000-1ffffffff : System RAM
> > > >> 200000000-ffffffffffffffff : RAM buffer
> > > >>
> > > >> With 2.6.30 things look similar, except 2.6.30 does not show the
> > > >> last "200000000-ffffffffffffffff : RAM buffer" line.
> > > >
> > > > BIOS e280 table didn't report that line.
> > > > I expect it's created by arch/x86/kernel/e820.c:
> > > > 1398 /*
> > > > 1399 * Try to bump up RAM regions to reasonable boundaries to
> > > > 1400 * avoid stolen RAM:
> > > > 1401 */
> > > > 1402 for (i = 0; i < e820.nr_map; i++) {
> > > > 1403 struct e820entry *entry = &e820_saved.map[i];
> > > > 1404 resource_size_t start, end;
> > > > 1405
> > > > 1406 if (entry->type != E820_RAM)
> > > > 1407 continue;
> > > > 1408 start = entry->addr + entry->size;
> > > > 1409 end = round_up(start, ram_alignment(start));
> > > > 1410 if (start == end)
> > > > 1411 continue;
> > > > 1412 reserve_region_with_split(&iomem_resource, start,
> > > > 1413 end - 1, "RAM buffer");
> > > > 1414 }
> > > >
> > >
> > > OK, this seems more than a wee bit strange, to say the least. We
> > > shouldn't be reserving the entire address space; this is legitimate I/O
> > > space.
> > >
> > > However, the reservation suddenly being improper for the root resource
> > > would definitely make things unhappy...
> >
> > Reverting the two e820 changes in 2.6.31-rc1,
> > 5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
> > 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
> > but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
> > work on my HIGHMEM64G machine.
> >
> > Seems the e820 and the iomem_resource.end changes are Ok in
> > isolation but break when combined.
>
> With the e820 change reverted, what does /proc/iomem look like?

00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000e4000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-7ff8ffff : System RAM
00100000-002e022e : Kernel code
002e022f-0038aaf7 : Kernel data
003d8000-003fc9f3 : Kernel bss
7ff90000-7ff9dfff : ACPI Tables
7ff9e000-7ffdffff : ACPI Non-volatile Storage
7ffe0000-7fffffff : reserved
88000000-880000ff : 0000:00:1f.3
bff00000-dfefffff : PCI Bus 0000:01
c0000000-cfffffff : 0000:01:00.0
e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
e0000000-efffffff : pnp 00:0e
febfe000-febfec00 : pnp 00:09
fec00000-fec00fff : IOAPIC 0
fec00000-fec00fff : pnp 00:0b
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:09
fed20000-fed8ffff : pnp 00:09
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fee00000-fee00fff : pnp 00:0b
ff800000-ff8fffff : PCI Bus 0000:01
ff8c0000-ff8dffff : 0000:01:00.0
ff8e0000-ff8effff : 0000:01:00.1
ff8f0000-ff8fffff : 0000:01:00.0
ff900000-ff9fffff : PCI Bus 0000:02
ff9ffc00-ff9ffcff : 0000:02:02.0
ff9ffc00-ff9ffcff : 8139too
ffaf8000-ffafbfff : 0000:00:1b.0
ffaf8000-ffafbfff : ICH HD audio
ffaff000-ffaff3ff : 0000:00:1d.7
ffaff000-ffaff3ff : ehci_hcd
ffaff400-ffaff7ff : 0000:00:1a.7
ffaff400-ffaff7ff : ehci_hcd
ffaff800-ffafffff : 0000:00:1f.2
ffaff800-ffafffff : ahci
ffb00000-ffffffff : reserved
ffb00000-ffbfffff : pnp 00:09
fff00000-fffffffe : pnp 00:09
100000000-1ffffffff : System RAM

2009-06-29 21:53:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

[Add Cc: Yinghai]

Mikael Pettersson wrote:
> > > >
> > > > OK, this seems more than a wee bit strange, to say the least. We
> > > > shouldn't be reserving the entire address space; this is legitimate I/O
> > > > space.
> > > >
> > > > However, the reservation suddenly being improper for the root resource
> > > > would definitely make things unhappy...
> > >
> > > Reverting the two e820 changes in 2.6.31-rc1,
> > > 5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
> > > 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
> > > but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
> > > work on my HIGHMEM64G machine.
> > >
> > > Seems the e820 and the iomem_resource.end changes are Ok in
> > > isolation but break when combined.
> >
> > With the e820 change reverted, what does /proc/iomem look like?
>

OK. This is starting to make sense. I suspect this is a similar issue
as 3b0fde0fac19c180317eb0601b3504083f4b9bf5 addresses, which is that the
e820 code assumes -- and I don't see any exception to that in
45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 -- that iomem_resource covers
the entire 64-bit address space that e820 knows. I wonder what happens
with "interestingly shaped" memory above 4 GB if resource_size_t is 32
bits with that code.

In terms of address space assignment, an alternate implementation of the
address space cap is to mark it reserved; that would unfortunately
result in an ugly turd at the end of /proc/iomem, but that can be
addressed if need be, too.

-hpa

2009-06-29 22:48:35

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> [Add Cc: Yinghai]
>
> Mikael Pettersson wrote:
>> > > >
>> > > > OK, this seems more than a wee bit strange, to say the least. We
>> > > > shouldn't be reserving the entire address space; this is legitimate I/O
>> > > > space.
>> > > >
>> > > > However, the reservation suddenly being improper for the root resource
>> > > > would definitely make things unhappy...
>> > >
>> > > Reverting the two e820 changes in 2.6.31-rc1,
>> > > 5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
>> > > 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
>> > > but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
>> > > work on my HIGHMEM64G machine.
>> > >
>> > > Seems the e820 and the iomem_resource.end changes are Ok in
>> > > isolation but break when combined.
>> >
>> > With the e820 change reverted, what does /proc/iomem look like?
>>
>
> OK. This is starting to make sense. I suspect this is a similar issue
> as 3b0fde0fac19c180317eb0601b3504083f4b9bf5 addresses, which is that the
> e820 code assumes -- and I don't see any exception to that in
> 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 -- that iomem_resource covers
> the entire 64-bit address space that e820 knows. I wonder what happens
> with "interestingly shaped" memory above 4 GB if resource_size_t is 32
> bits with that code.
>
> In terms of address space assignment, an alternate implementation of the
> address space cap is to mark it reserved; that would unfortunately
> result in an ugly turd at the end of /proc/iomem, but that can be
> addressed if need be, too.

always enable 64bit resource for 32bit too?

YH

2009-06-29 23:30:51

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
> H. Peter Anvin wrote:
>> [Add Cc: Yinghai]
>>
>> Mikael Pettersson wrote:
>>> > > >
>>> > > > OK, this seems more than a wee bit strange, to say the least. We
>>> > > > shouldn't be reserving the entire address space; this is legitimate I/O
>>> > > > space.
>>> > > >
>>> > > > However, the reservation suddenly being improper for the root resource
>>> > > > would definitely make things unhappy...
>>> > >
>>> > > Reverting the two e820 changes in 2.6.31-rc1,
>>> > > 5d423ccd7ba4285f1084e91b26805e1d0ae978ed and then
>>> > > 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90,
>>> > > but keeping the iomem_resource.end cap change, makes 2.6.31-rc1
>>> > > work on my HIGHMEM64G machine.
>>> > >
>>> > > Seems the e820 and the iomem_resource.end changes are Ok in
>>> > > isolation but break when combined.
>>> >
>>> > With the e820 change reverted, what does /proc/iomem look like?
>>>
>> OK. This is starting to make sense. I suspect this is a similar issue
>> as 3b0fde0fac19c180317eb0601b3504083f4b9bf5 addresses, which is that the
>> e820 code assumes -- and I don't see any exception to that in
>> 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 -- that iomem_resource covers
>> the entire 64-bit address space that e820 knows. I wonder what happens
>> with "interestingly shaped" memory above 4 GB if resource_size_t is 32
>> bits with that code.
>>
>> In terms of address space assignment, an alternate implementation of the
>> address space cap is to mark it reserved; that would unfortunately
>> result in an ugly turd at the end of /proc/iomem, but that can be
>> addressed if need be, too.

Mikael, can you try following patch on your system?

---
arch/x86/kernel/e820.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1400,8 +1400,8 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
@@ -1409,8 +1409,10 @@ void __init e820_reserve_resources_late(
end = round_up(start, ram_alignment(start));
if (start == end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ if (end != (resource_size_t)end)
+ continue;
+ reserve_region_with_split(&iomem_resource, (resource_size_t)start,
+ (resource_size_t)(end - 1), "RAM buffer");
}
}

2009-06-30 01:45:35

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Linus Torvalds wrote:
> ...
> end = round_up(start, ram_alignment(start)) - 1;
> if (end > MAX_RESOURCE_SIZE)
> end = MAX_RESOURCE_SIZE;
> if (start > end)
> continue;
>
> Because otherwise we'll just be ignoring resources that cross the resource
> size boundary, which sounds wrong.
>
> We _could_ have a RAM resource that crosses the 4GB boundary, after all.
>

We could, but the *alignment pad* shouldn't be able to cross a
power-of-two boundary ("end" is always an aligned-up version of "start").

> That said, I have to admit that I'm getting tired of these bugs that only
> happen when we have a 32-bit resource_size_t. So I can understand the
> attraction to just forcing it to 64-bit and forgetting about these
> irritating issues.

Probably would be worth figuring out just how much it would be.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 01:42:01

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Linus Torvalds wrote:
>
> On Mon, 29 Jun 2009, Yinghai Lu wrote:
>> + end = round_up(start, ram_alignment(start)) - 1;
>> + if (start > (resource_size_t)end)
>> continue;
>> - reserve_region_with_split(&iomem_resource, start,
>> - end - 1, "RAM buffer");
>> + reserve_region_with_split(&iomem_resource, (resource_size_t)start,
>> + (resource_size_t)end, "RAM buffer");
>
> Hmm. You shouldn't need the casts with reserve_region_with_split(), and
> they just make things uglier.
>
> Also, I wonder if we should do something like this instead
>
> #define MAX_RESOURCE_SIZE ((resource_size_t)-1)
>
> ...
> end = round_up(start, ram_alignment(start)) - 1;
> if (end > MAX_RESOURCE_SIZE)
> end = MAX_RESOURCE_SIZE;
> if (start > end)
> continue;
>
> Because otherwise we'll just be ignoring resources that cross the resource
> size boundary, which sounds wrong.
>
> We _could_ have a RAM resource that crosses the 4GB boundary, after all.
>
> Yeah, it doesn't happen much in practice, because usually the 3G-4G range
> is left for PCI mappings etc, so we might never hit this in practice, but
> still, this sounds like a more correct thing to do.
>
> It also avoids the cast. We simply cap the end to the max that
> 'resource_size_t' can hold.

Mikael, please try this on your system, and send out /proc/iomem

Thanks

Yinghai

[PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment

fix hang with HIGHMEM_64G and 32bit resource.

according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/e820.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
return 32*1024*1024;
}

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
void __init e820_reserve_resources_late(void)
{
int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = round_up(start, ram_alignment(start)) - 1;
+ if (end > MAX_RESOURCE_SIZE)
+ end = MAX_RESOURCE_SIZE;
+ if (start > end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, start, end,
+ "RAM buffer");
}
}

2009-06-30 01:28:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86



On Mon, 29 Jun 2009, Yinghai Lu wrote:
> + end = round_up(start, ram_alignment(start)) - 1;
> + if (start > (resource_size_t)end)
> continue;
> - reserve_region_with_split(&iomem_resource, start,
> - end - 1, "RAM buffer");
> + reserve_region_with_split(&iomem_resource, (resource_size_t)start,
> + (resource_size_t)end, "RAM buffer");

Hmm. You shouldn't need the casts with reserve_region_with_split(), and
they just make things uglier.

Also, I wonder if we should do something like this instead

#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

...
end = round_up(start, ram_alignment(start)) - 1;
if (end > MAX_RESOURCE_SIZE)
end = MAX_RESOURCE_SIZE;
if (start > end)
continue;

Because otherwise we'll just be ignoring resources that cross the resource
size boundary, which sounds wrong.

We _could_ have a RAM resource that crosses the 4GB boundary, after all.

Yeah, it doesn't happen much in practice, because usually the 3G-4G range
is left for PCI mappings etc, so we might never hit this in practice, but
still, this sounds like a more correct thing to do.

It also avoids the cast. We simply cap the end to the max that
'resource_size_t' can hold.

That said, I have to admit that I'm getting tired of these bugs that only
happen when we have a 32-bit resource_size_t. So I can understand the
attraction to just forcing it to 64-bit and forgetting about these
irritating issues.

Linus

2009-06-30 01:15:28

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Yinghai Lu wrote:
>> continue;
>> @@ -1409,8 +1409,10 @@ void __init e820_reserve_resources_late(
>> end = round_up(start, ram_alignment(start));
>> if (start == end)
>> continue;
>> - reserve_region_with_split(&iomem_resource, start,
>> - end - 1, "RAM buffer");
>> + if (end != (resource_size_t)end)
>> + continue;
>> + reserve_region_with_split(&iomem_resource, (resource_size_t)start,
>> + (resource_size_t)(end - 1), "RAM buffer");
>> }
>> }
>>
>
> That doesn't quite look right; for one thing it doesn't handle the
> (admittedly somewhat unlikely) case of end pointing to the end of the
> address space.
>
> Something like:
>
> if (start > (resource_size_t)end-1)
> continue;
>
> ... should work better.
>

ok this one?

---
arch/x86/kernel/e820.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1400,17 +1400,17 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = round_up(start, ram_alignment(start)) - 1;
+ if (start > (resource_size_t)end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, (resource_size_t)start,
+ (resource_size_t)end, "RAM buffer");
}
}

2009-06-30 01:20:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
>>>
>> That doesn't quite look right; for one thing it doesn't handle the
>> (admittedly somewhat unlikely) case of end pointing to the end of the
>> address space.
>>
>> Something like:
>>
>> if (start > (resource_size_t)end-1)
>> continue;
>>

> + if (start > (resource_size_t)end)
> continue;

Erk... thinko on my part. Should have been (resource_size_t)-1.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 00:29:18

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
> continue;
> @@ -1409,8 +1409,10 @@ void __init e820_reserve_resources_late(
> end = round_up(start, ram_alignment(start));
> if (start == end)
> continue;
> - reserve_region_with_split(&iomem_resource, start,
> - end - 1, "RAM buffer");
> + if (end != (resource_size_t)end)
> + continue;
> + reserve_region_with_split(&iomem_resource, (resource_size_t)start,
> + (resource_size_t)(end - 1), "RAM buffer");
> }
> }
>

That doesn't quite look right; for one thing it doesn't handle the
(admittedly somewhat unlikely) case of end pointing to the end of the
address space.

Something like:

if (start > (resource_size_t)end-1)
continue;

... should work better.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 01:24:46

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Yinghai Lu wrote:
>>>>
>>> That doesn't quite look right; for one thing it doesn't handle the
>>> (admittedly somewhat unlikely) case of end pointing to the end of the
>>> address space.
>>>
>>> Something like:
>>>
>>> if (start > (resource_size_t)end-1)
>>> continue;
>>>
>
>> + if (start > (resource_size_t)end)
>> continue;
>
> Erk... thinko on my part. Should have been (resource_size_t)-1.
>

how about start is already 32M ?
you will insert blank one...

64bit, we could expand range that is above 4g to be aligned.

YH

2009-06-30 00:26:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
>>>
>> OK. This is starting to make sense. I suspect this is a similar issue
>> as 3b0fde0fac19c180317eb0601b3504083f4b9bf5 addresses, which is that the
>> e820 code assumes -- and I don't see any exception to that in
>> 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 -- that iomem_resource covers
>> the entire 64-bit address space that e820 knows. I wonder what happens
>> with "interestingly shaped" memory above 4 GB if resource_size_t is 32
>> bits with that code.
>>
>> In terms of address space assignment, an alternate implementation of the
>> address space cap is to mark it reserved; that would unfortunately
>> result in an ugly turd at the end of /proc/iomem, but that can be
>> addressed if need be, too.
>
> always enable 64bit resource for 32bit too?
>

That would address the problem if combined with the "alternative
implementation" that I described below, but I'm not sure how well it
would go over, especially since the 32-bit x86 world is increasingly
getting concentrated on the very-resource-starved end of the computing
spectrum.

The bottom-line problem is the same: e820, and the e820 allocator, can
describe address space that lies outside our real range of possible
address space. What to do with that is easy -- it should simply be
ignored -- but it does lead to oddball sequencing issues. In that
sense, reserving a chunk of address space at the end is cleaner, but
that doesn't address the issue of what happens with a 32-bit
resource_size_t.

Unfortunately, my attempts at reproducing the problem locally has failed
so far.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 02:42:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
>
> how about start is already 32M ?
> you will insert blank one...
>

I was rather assuming that zero range length was handled elsewhere...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 09:11:25

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu writes:
> Linus Torvalds wrote:
> >
> > On Mon, 29 Jun 2009, Yinghai Lu wrote:
> >> + end = round_up(start, ram_alignment(start)) - 1;
> >> + if (start > (resource_size_t)end)
> >> continue;
> >> - reserve_region_with_split(&iomem_resource, start,
> >> - end - 1, "RAM buffer");
> >> + reserve_region_with_split(&iomem_resource, (resource_size_t)start,
> >> + (resource_size_t)end, "RAM buffer");
> >
> > Hmm. You shouldn't need the casts with reserve_region_with_split(), and
> > they just make things uglier.
> >
> > Also, I wonder if we should do something like this instead
> >
> > #define MAX_RESOURCE_SIZE ((resource_size_t)-1)
> >
> > ...
> > end = round_up(start, ram_alignment(start)) - 1;
> > if (end > MAX_RESOURCE_SIZE)
> > end = MAX_RESOURCE_SIZE;
> > if (start > end)
> > continue;
> >
> > Because otherwise we'll just be ignoring resources that cross the resource
> > size boundary, which sounds wrong.
> >
> > We _could_ have a RAM resource that crosses the 4GB boundary, after all.
> >
> > Yeah, it doesn't happen much in practice, because usually the 3G-4G range
> > is left for PCI mappings etc, so we might never hit this in practice, but
> > still, this sounds like a more correct thing to do.
> >
> > It also avoids the cast. We simply cap the end to the max that
> > 'resource_size_t' can hold.
>
> Mikael, please try this on your system, and send out /proc/iomem
>
> Thanks
>
> Yinghai
>
> [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment
>
> fix hang with HIGHMEM_64G and 32bit resource.
>
> according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.
>
> Signed-off-by: Yinghai Lu <[email protected]>

Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
/proc/iomem now looks as follows:

00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000e4000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-7ff8ffff : System RAM
00100000-002e022e : Kernel code
002e022f-0038aaf7 : Kernel data
003d8000-003fc9f3 : Kernel bss
7ff90000-7ff9dfff : ACPI Tables
7ff9e000-7ffdffff : ACPI Non-volatile Storage
7ffe0000-7fffffff : reserved
80000000-800000ff : 0000:00:1f.3
bff00000-dfefffff : PCI Bus 0000:01
c0000000-cfffffff : 0000:01:00.0
e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
e0000000-efffffff : pnp 00:0e
febfe000-febfec00 : pnp 00:09
fec00000-fec00fff : IOAPIC 0
fec00000-fec00fff : pnp 00:0b
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:09
fed20000-fed8ffff : pnp 00:09
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fee00000-fee00fff : pnp 00:0b
ff800000-ff8fffff : PCI Bus 0000:01
ff8c0000-ff8dffff : 0000:01:00.0
ff8e0000-ff8effff : 0000:01:00.1
ff8f0000-ff8fffff : 0000:01:00.0
ff900000-ff9fffff : PCI Bus 0000:02
ff9ffc00-ff9ffcff : 0000:02:02.0
ff9ffc00-ff9ffcff : 8139too
ffaf8000-ffafbfff : 0000:00:1b.0
ffaf8000-ffafbfff : ICH HD audio
ffaff000-ffaff3ff : 0000:00:1d.7
ffaff000-ffaff3ff : ehci_hcd
ffaff400-ffaff7ff : 0000:00:1a.7
ffaff400-ffaff7ff : ehci_hcd
ffaff800-ffafffff : 0000:00:1f.2
ffaff800-ffafffff : ahci
ffb00000-ffffffff : reserved
ffb00000-ffbfffff : pnp 00:09
fff00000-fffffffe : pnp 00:09
100000000-1ffffffff : System RAM

2009-06-30 14:50:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Mikael Pettersson wrote:
>
> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> /proc/iomem now looks as follows:
>

... as it should. So far so good, and this is a real problem.

However, there is something that really bothers me: *why does this help
on Mikael's system, which is PAE and therefore has a 64-bit
resource_size_t*? This whole patch should be a no-op! There is still
something that doesn't make sense.

The use of "unsigned long" in ram_alignment() will overflow after 2^52
bytes, but again, that's not the issue here, since the highest "start"
value we have is (0x2 << 32).

By process of elimination, the culprit must be round_up(), which reveals
that the macro definition of round_up() has a *very* sublte behavior
with mixed types:

#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

ram_alignment() returns unsigned long, which becomes (y). This means
that the mask word on the right hand of the & gets truncated to 32 bits
*before* the masking happens -- since ((y) - 1) is still unsigned long,
inverting it will not set bits [63..32] to on.

I think this macro is actively dangerous. Better would be:

({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })

... which is also multiple-inclusion-free at the cost of using gcc
({...}) constructs.

The deep irony in this is that in our particular case is perhaps that
align_up(x,y)-1 is the same thing as x | (y-1) which would have avoided
the problem...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-30 15:08:31

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Mikael Pettersson wrote:
> > Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> > /proc/iomem now looks as follows:
>
> ... as it should. So far so good, and this is a real problem.
>
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*? This whole patch should be a no-op! There is still
> something that doesn't make sense.
>
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).

I assume you meant "2^32" and (0x1 << 32)?

Eike


Attachments:
(No filename) (708.00 B)
signature.asc (198.00 B)
This is a digitally signed message part.
Download all attachments

2009-06-30 19:00:01

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Rolf Eike Beer wrote:
> H. Peter Anvin wrote:
>> Mikael Pettersson wrote:
>>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>>> /proc/iomem now looks as follows:
>> ... as it should. So far so good, and this is a real problem.
>>
>> However, there is something that really bothers me: *why does this help
>> on Mikael's system, which is PAE and therefore has a 64-bit
>> resource_size_t*? This whole patch should be a no-op! There is still
>> something that doesn't make sense.
>>
>> The use of "unsigned long" in ram_alignment() will overflow after 2^52
>> bytes, but again, that's not the issue here, since the highest "start"
>> value we have is (0x2 << 32).
>
> I assume you meant "2^32" and (0x1 << 32)?
>

No, I meant 2^52 and (0x2 << 32) [== 2^33.]

-hpa

2009-06-30 19:34:31

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Mikael Pettersson wrote:
>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>> /proc/iomem now looks as follows:
>>
>
> ... as it should. So far so good, and this is a real problem.
>
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*? This whole patch should be a no-op! There is still
> something that doesn't make sense.
>
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).
>
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* sublte behavior
> with mixed types:
>
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
>
> ram_alignment() returns unsigned long, which becomes (y). This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.
>
> I think this macro is actively dangerous. Better would be:
>
> ({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })
>
> ... which is also multiple-inclusion-free at the cost of using gcc
> ({...}) constructs.
>
> The deep irony in this is that in our particular case is perhaps that
> align_up(x,y)-1 is the same thing as x | (y-1) which would have avoided
> the problem...

agreed, that is why we change round_up to take u64.

wonder if we should kill round_up and use roundup instead.

in include/linux/kernel.h
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

YH

2009-06-30 19:46:18

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
>
> agreed, that is why we change round_up to take u64.
>

round_up() is a macro, it doesn't "take" anything per se...

> wonder if we should kill round_up and use roundup instead.
>
> in include/linux/kernel.h
> #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

Either that or we should change it to the form I specified...

-hpa

2009-06-30 20:06:32

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Yinghai Lu wrote:
>> agreed, that is why we change round_up to take u64.
>>
>
> round_up() is a macro, it doesn't "take" anything per se...
>
i mean
end = roundup(start, ram_alignment(start)) - 1;

static u64 ram_alignment(u64 pos)

and other calling to round_up


please check

[PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment -v2

fix hang with HIGHMEM_64G and 32bit resource.
according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.

analyzed by hpa

v2: use roundup instead

Reported-and-tested-by: Mikael Pettersson <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/e820.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
return 32*1024*1024;
}

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
void __init e820_reserve_resources_late(void)
{
int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = roundup(start, ram_alignment(start)) - 1;
+ if (end > MAX_RESOURCE_SIZE)
+ end = MAX_RESOURCE_SIZE;
+ if (start > end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, start, end,
+ "RAM buffer");
}
}

2009-06-30 20:12:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86



On Tue, 30 Jun 2009, H. Peter Anvin wrote:
>
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* sublte behavior
> with mixed types:
>
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
>
> ram_alignment() returns unsigned long, which becomes (y). This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.

Good catch.

Also, this shows another bug in the #define: it evaluates 'y' twice, which
is a no-no for something that _looks_ like a function.

> I think this macro is actively dangerous. Better would be:
>
> ({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })

Yes. Please make it so.

> The deep irony in this is that in our particular case is perhaps that
> align_up(x,y)-1 is the same thing as x | (y-1) which would have avoided
> the problem...

I don't know how deep that irony is, but I do agree that maybe we should
do that simplification too. In addition to fixing round_up() to not bite
future generations in the ass.

Linus

2009-06-30 21:22:42

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
> H. Peter Anvin wrote:
>> Yinghai Lu wrote:
>>> agreed, that is why we change round_up to take u64.
>>>
>> round_up() is a macro, it doesn't "take" anything per se...
>>
> i mean
> end = roundup(start, ram_alignment(start)) - 1;
>
> static u64 ram_alignment(u64 pos)
>
> and other calling to round_up
>

This is fixing the wrong problem. Fixing round_up is the right thing,
not working around its brain damage.

-hpa

2009-06-30 21:51:19

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

H. Peter Anvin wrote:
> Yinghai Lu wrote:
>> H. Peter Anvin wrote:
>>> Yinghai Lu wrote:
>>>> agreed, that is why we change round_up to take u64.
>>>>
>>> round_up() is a macro, it doesn't "take" anything per se...
>>>
>> i mean
>> end = roundup(start, ram_alignment(start)) - 1;
>>
>> static u64 ram_alignment(u64 pos)
>>
>> and other calling to round_up
>>
>
> This is fixing the wrong problem. Fixing round_up is the right thing,
> not working around its brain damage.
>

this one ?

or you want to move round_up/down to include/linux/kernel.h?

[PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment -v3

fix hang with HIGHMEM_64G and 32bit resource.
according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.

analyized by hpa

v2: use roundup instead
v3: from HPA update round_up with __type_of__

Reported-and-tested-by: Mikael Pettersson <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/include/asm/proto.h | 5 +++--
arch/x86/kernel/e820.c | 20 ++++++++++++--------
2 files changed, 15 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
return 32*1024*1024;
}

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
void __init e820_reserve_resources_late(void)
{
int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = round_up(start, ram_alignment(start)) - 1;
+ if (end > MAX_RESOURCE_SIZE)
+ end = MAX_RESOURCE_SIZE;
+ if (start > end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, start, end,
+ "RAM buffer");
}
}

Index: linux-2.6/arch/x86/include/asm/proto.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/proto.h
+++ linux-2.6/arch/x86/include/asm/proto.h
@@ -22,7 +22,8 @@ extern int reboot_force;

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);

-#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
-#define round_down(x, y) ((x) & ~((y) - 1))
+#define round_up(x, y) ({ __typeof__(x) __mask = (y)-1; \
+ ((x)+__mask) & ~__mask; })
+#define round_down(x, y) ({ __typeof__(x) __mask = (y)-1; (x) & ~__mask; })

#endif /* _ASM_X86_PROTO_H */

2009-06-30 22:12:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86



On Tue, 30 Jun 2009, Yinghai Lu wrote:
?
> +#define round_up(x, y) ({ __typeof__(x) __mask = (y)-1; \
> + ((x)+__mask) & ~__mask; })
> +#define round_down(x, y) ({ __typeof__(x) __mask = (y)-1; (x) & ~__mask; })

Yes, except we might as well simplify it. Do it without the statement
expressions, using just a single 'y'. Like this:

#define __round_mask(x,y) ((__typeof__(x))((y)-1))
#define round_up(x,y) (((x) | __round_mask(x,y))+1)
#define round_down(x,y) ((x) & ~__round_mask(x,y))

(Yeah, it uses 'x' twice, but the second one is for 'typeof', which
doesn't actually cause the value to be evaluated, so it's ok).

Now those 'round_xyz()' operations will always return a value of a type
that is the same as the type of 'x' except it's gone through the normal C
integer promotion rules (ie if 'x' is a smaller type than 'int', then it
will be promoted to 'int').

Not very well tested, but it _looks_ correct, and uses Peter's trick, and
willlet the compiler notice that

round_up(x,y)-1

is the same thing as

x | (y-1)

which it _could_ have perhaps figured out before, but now it's way more
obvious.
Linus

2009-06-30 22:31:32

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Linus Torvalds wrote:
>
> On Tue, 30 Jun 2009, Yinghai Lu wrote:
> ?
>> +#define round_up(x, y) ({ __typeof__(x) __mask = (y)-1; \
>> + ((x)+__mask) & ~__mask; })
>> +#define round_down(x, y) ({ __typeof__(x) __mask = (y)-1; (x) & ~__mask; })
>
> Yes, except we might as well simplify it. Do it without the statement
> expressions, using just a single 'y'. Like this:
>
> #define __round_mask(x,y) ((__typeof__(x))((y)-1))
> #define round_up(x,y) (((x) | __round_mask(x,y))+1)
> #define round_down(x,y) ((x) & ~__round_mask(x,y))
>
> (Yeah, it uses 'x' twice, but the second one is for 'typeof', which
> doesn't actually cause the value to be evaluated, so it's ok).
>
> Now those 'round_xyz()' operations will always return a value of a type
> that is the same as the type of 'x' except it's gone through the normal C
> integer promotion rules (ie if 'x' is a smaller type than 'int', then it
> will be promoted to 'int').
>
> Not very well tested, but it _looks_ correct, and uses Peter's trick, and
> willlet the compiler notice that
>
> round_up(x,y)-1
>
> is the same thing as
>
> x | (y-1)
>
> which it _could_ have perhaps figured out before, but now it's way more
> obvious.

how about x = 0, y = 0x100?

YH

2009-06-30 22:52:14

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
> Linus Torvalds wrote:
>> On Tue, 30 Jun 2009, Yinghai Lu wrote:
>> ?
>>> +#define round_up(x, y) ({ __typeof__(x) __mask = (y)-1; \
>>> + ((x)+__mask) & ~__mask; })
>>> +#define round_down(x, y) ({ __typeof__(x) __mask = (y)-1; (x) & ~__mask; })
>> Yes, except we might as well simplify it. Do it without the statement
>> expressions, using just a single 'y'. Like this:
>>
>> #define __round_mask(x,y) ((__typeof__(x))((y)-1))
>> #define round_up(x,y) (((x) | __round_mask(x,y))+1)
>> #define round_down(x,y) ((x) & ~__round_mask(x,y))
>>
>> (Yeah, it uses 'x' twice, but the second one is for 'typeof', which
>> doesn't actually cause the value to be evaluated, so it's ok).
>>
>> Now those 'round_xyz()' operations will always return a value of a type
>> that is the same as the type of 'x' except it's gone through the normal C
>> integer promotion rules (ie if 'x' is a smaller type than 'int', then it
>> will be promoted to 'int').
>>
>> Not very well tested, but it _looks_ correct, and uses Peter's trick, and
>> willlet the compiler notice that
>>
>> round_up(x,y)-1
>>
>> is the same thing as
>>
>> x | (y-1)
>>
>> which it _could_ have perhaps figured out before, but now it's way more
>> obvious.
>
> how about x = 0, y = 0x100?
>

also x=0x100000, and y=0x100?

YH

2009-06-30 22:56:34

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
>>>
>>> Not very well tested, but it _looks_ correct, and uses Peter's trick, and
>>> willlet the compiler notice that
>>>
>>> round_up(x,y)-1
>>>
>>> is the same thing as
>>>
>>> x | (y-1)
>>>
>>> which it _could_ have perhaps figured out before, but now it's way more
>>> obvious.
>> how about x = 0, y = 0x100?
>>
>
> also x=0x100000, and y=0x100?
>

Yes, it doesn't work when x is already aligned.

-hpa

2009-06-30 23:02:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86



On Tue, 30 Jun 2009, Yinghai Lu wrote:
>
> how about x = 0, y = 0x100?

Yeah, no. Peter's trick doesn't work. Good catch.

I don't see any single-use trick then, and so it needs the whole statement
expression mess.

Linus

2009-06-30 23:06:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86



On Tue, 30 Jun 2009, Linus Torvalds wrote:
>
> I don't see any single-use trick then, and so it needs the whole statement
> expression mess.

Hmm. Does (((x)-1) | mask)+1) work?

I haven't thought it fully through, but that _should_ take care of the
"already aligned" case, no?

Linus

2009-06-30 23:14:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Linus Torvalds wrote:
>
> On Tue, 30 Jun 2009, Linus Torvalds wrote:
>> I don't see any single-use trick then, and so it needs the whole statement
>> expression mess.
>
> Hmm. Does (((x)-1) | mask)+1) work?
>
> I haven't thought it fully through, but that _should_ take care of the
> "already aligned" case, no?

yes. that is right.

then how about
roundup(x,y)
round_up(x,y)

roundup doesn't need y is 2^n
but round_up does need y is 2^n, and only for x86

the name is confusing.

YH

2009-06-30 23:19:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Linus Torvalds wrote:
>
> On Tue, 30 Jun 2009, Yinghai Lu wrote:
>> how about x = 0, y = 0x100?
>
> Yeah, no. Peter's trick doesn't work. Good catch.
>
> I don't see any single-use trick then, and so it needs the whole statement
> expression mess.
>

Yes, the | trick only works if you know you have a nonzero alignment pad
(or you want a full-block pad in the empty case.)

-hpa

2009-06-30 23:21:40

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG 2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Yinghai Lu wrote:
> Linus Torvalds wrote:
>> On Tue, 30 Jun 2009, Linus Torvalds wrote:
>>> I don't see any single-use trick then, and so it needs the whole statement
>>> expression mess.
>> Hmm. Does (((x)-1) | mask)+1) work?
>>
>> I haven't thought it fully through, but that _should_ take care of the
>> "already aligned" case, no?
>
> yes. that is right.
>
> then how about
> roundup(x,y)
> round_up(x,y)
>
> roundup doesn't need y is 2^n
> but round_up does need y is 2^n, and only for x86
>

We should definitely move whatever to global. However, I do think it is
valuable to have something that can avoid divides even if the argument
is not necessarily a constant.

-hpa

2009-07-01 19:32:56

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment


fix hang with HIGHMEM_64G and 32bit resource.
according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.

analyized by hpa

Reported-and-tested-by: Mikael Pettersson <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/include/asm/proto.h | 3 ---
arch/x86/kernel/e820.c | 20 ++++++++++++--------
include/linux/kernel.h | 5 +++++
3 files changed, 17 insertions(+), 11 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
return 32*1024*1024;
}

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
void __init e820_reserve_resources_late(void)
{
int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = round_up(start, ram_alignment(start)) - 1;
+ if (end > MAX_RESOURCE_SIZE)
+ end = MAX_RESOURCE_SIZE;
+ if (start > end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, start, end,
+ "RAM buffer");
}
}

2009-07-01 19:34:20

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] fix round_up/down

From: H. Peter Anvin <[email protected]>

round_up with __type_of__
round_up use mask and | tricks from HPA and linus
also move round_up/down to kernel.h

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/um/sys-x86_64/signal.c | 2 --
arch/x86/include/asm/proto.h | 3 ---
drivers/md/dm-exception-store.c | 10 ----------
include/linux/kernel.h | 6 ++++++
4 files changed, 6 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/include/asm/proto.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/proto.h
+++ linux-2.6/arch/x86/include/asm/proto.h
@@ -22,7 +22,4 @@ extern int reboot_force;

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);

-#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
-#define round_down(x, y) ((x) & ~((y) - 1))
-
#endif /* _ASM_X86_PROTO_H */
Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -140,6 +140,7 @@ extern const char linux_proc_banner[];
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define rounddown(x, y) (((x) / (y)) * (y))
#define DIV_ROUND_CLOSEST(x, divisor)( \
{ \
typeof(divisor) __divisor = divisor; \
@@ -147,6 +148,11 @@ extern const char linux_proc_banner[];
} \
)

+/* need y is 2^n */
+#define __round_mask(x,y) ((__typeof__(x))((y)-1))
+#define round_up(x,y) ((((x)-1) | __round_mask(x,y))+1)
+#define round_down(x,y) ((x) & ~__round_mask(x,y))
+
#define _RET_IP_ (unsigned long)__builtin_return_address(0)
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })

Index: linux-2.6/arch/um/sys-x86_64/signal.c
===================================================================
--- linux-2.6.orig/arch/um/sys-x86_64/signal.c
+++ linux-2.6/arch/um/sys-x86_64/signal.c
@@ -165,8 +165,6 @@ struct rt_sigframe
struct _fpstate fpstate;
};

-#define round_down(m, n) (((m) / (n)) * (n))
-
int setup_signal_stack_si(unsigned long stack_top, int sig,
struct k_sigaction *ka, struct pt_regs * regs,
siginfo_t *info, sigset_t *set)
Index: linux-2.6/drivers/md/dm-exception-store.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-exception-store.c
+++ linux-2.6/drivers/md/dm-exception-store.c
@@ -138,16 +138,6 @@ int dm_exception_store_type_unregister(s
}
EXPORT_SYMBOL(dm_exception_store_type_unregister);

-/*
- * Round a number up to the nearest 'size' boundary. size must
- * be a power of 2.
- */
-static ulong round_up(ulong n, ulong size)
-{
- size--;
- return (n + size) & ~size;
-}
-
static int set_chunk_size(struct dm_exception_store *store,
const char *chunk_size_arg, char **error)
{

2009-07-01 19:43:42

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] fix round_up/down

On Wed, 2009-07-01 at 12:33 -0700, Yinghai Lu wrote:
> --- linux-2.6.orig/include/linux/kernel.h
> +++ linux-2.6/include/linux/kernel.h
[]
> #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#define rounddown(x, y) (((x) / (y)) * (y))
[]
> +#define round_up(x,y) ((((x)-1) | __round_mask(x,y))+1)
> +#define round_down(x,y) ((x) & ~__round_mask(x,y))

Isn't this just asking for trouble?

How about a better name?
Maybe masked_roundup, or roundup_ala_zorro...

2009-07-01 20:04:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] fix round_up/down

On Wed, 01 Jul 2009 12:39:35 -0700
Joe Perches <[email protected]> wrote:

> On Wed, 2009-07-01 at 12:33 -0700, Yinghai Lu wrote:
> > --- linux-2.6.orig/include/linux/kernel.h
> > +++ linux-2.6/include/linux/kernel.h
> []
> > #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> > +#define rounddown(x, y) (((x) / (y)) * (y))
> []
> > +#define round_up(x,y) ((((x)-1) | __round_mask(x,y))+1)
> > +#define round_down(x,y) ((x) & ~__round_mask(x,y))
>
> Isn't this just asking for trouble?

Yes, I think so. round_up() versus roundup() is a bit subtle!

> How about a better name?
> Maybe masked_roundup, or roundup_ala_zorro...

Yes. roundup() wasn't a well-chosen identifier, really. But I guess
it's compatible with the faster bitwise-based rounding operation so
it's OK to have a special fast version of roundup() for the cases we're
rounding up to a power-of-2.


umm, how about roundup_pow2()? Sucks?


Also, it it lower-case or all-caps? I think it should be all-caps. Because

a) it is a macro, and it can ONLY be implemented as a macro, so
there's no point in pretending that it might be a C function and that the
caller needn't care.

b) several of these macros evaluate their args multiple times and
hence will produce buggy or inefficeint code when passed expressions
with side-effects. So we should warn people that these things are macros.

We should also fix that, dammit. The proposed new rounddown()
above has this failing as well.

2009-07-02 18:11:42

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment -v2


fix hang with HIGHMEM_64G and 32bit resource.
according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.
analyized by hpa

Alex found:
for i386 machine the specjbb2005 still can not run with hugepage

-v2: it also fix hugepage problem

Reported-and-tested-by: Mikael Pettersson <[email protected]>
Reported-and-Tested-by: Alex Shi <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/include/asm/proto.h | 3 ---
arch/x86/kernel/e820.c | 20 ++++++++++++--------
include/linux/kernel.h | 5 +++++
3 files changed, 17 insertions(+), 11 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
}

/* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
{
- unsigned long mb = pos >> 20;
+ u64 mb = pos >> 20;

/* To 64kB in the first megabyte */
if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
return 32*1024*1024;
}

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
void __init e820_reserve_resources_late(void)
{
int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
* avoid stolen RAM:
*/
for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *entry = &e820_saved.map[i];
- resource_size_t start, end;
+ struct e820entry *entry = &e820.map[i];
+ u64 start, end;

if (entry->type != E820_RAM)
continue;
start = entry->addr + entry->size;
- end = round_up(start, ram_alignment(start));
- if (start == end)
+ end = round_up(start, ram_alignment(start)) - 1;
+ if (end > MAX_RESOURCE_SIZE)
+ end = MAX_RESOURCE_SIZE;
+ if (start > end)
continue;
- reserve_region_with_split(&iomem_resource, start,
- end - 1, "RAM buffer");
+ reserve_region_with_split(&iomem_resource, start, end,
+ "RAM buffer");
}
}

2009-07-03 08:06:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment -v2


* Yinghai Lu <[email protected]> wrote:

> +#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

> + end = round_up(start, ram_alignment(start)) - 1;
> + if (end > MAX_RESOURCE_SIZE)
> + end = MAX_RESOURCE_SIZE;

As Andrew noted it, this should probably have a comment along the
lines of:

/*
* Clip entries that go beyond our maximum resource
* awareness limit to the max. If we accepted them
* blindly we'd get random rounding artifacts and a
* possibly messed up resource tree and boot
* failures:
*/
if (end > MAX_RESOURCE_SIZE)
end = MAX_RESOURCE_SIZE;

Ingo