2006-08-31 10:48:24

by Paul Jackson

Subject: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

The following patch in 2.6.18-rc4-mm3 is broken on my x86_64:

account-for-memmap-and-optionally-the-kernel-image-as-holes.patch

The failure is 100% reproducible.

The system has a pair of dual-core Intel Xeon 5100 series (Woodcrest)
processors (4 logical CPUs total) and 2 GBytes of ram.

The .config is what one gets from 'make defconfig' for arch x86_64,
plus the following changes:

=========================== begin ===========================
--- .config.def 2006-08-31 04:29:22.100311614 -0500
+++ .config 2006-08-31 04:29:03.247761750 -0500
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18-rc4-mm3
-# Thu Aug 31 04:29:22 2006
+# Thu Aug 31 04:07:54 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
@@ -44,7 +44,7 @@
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
-# CONFIG_CPUSETS is not set
+CONFIG_CPUSETS=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
@@ -205,7 +205,7 @@
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
-CONFIG_ACPI_SONY=m
+# CONFIG_ACPI_SONY is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
@@ -1270,7 +1270,11 @@
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
-# CONFIG_XFS_FS is not set
+CONFIG_XFS_FS=y
+# CONFIG_XFS_QUOTA is not set
+# CONFIG_XFS_SECURITY is not set
+# CONFIG_XFS_POSIX_ACL is not set
+# CONFIG_XFS_RT is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
============================ end ============================

The boot fails with the following console output:

=========================== begin ===========================
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /vmlinuz.pj2 root=/dev/sda3 console=ttyS1,115200 showopts pj2
[Linux-bzImage, setup=0x1c00, size=0x2b66e5]

Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #48 SMP Thu Aug 31 04:22:41 CDT 2006
Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj2
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
BIOS-e820: 000000007f932000 - 000000007f9d0(ACPI NVS)
BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
BIOS-e820: 000000007fa420000 - 000000007fb2b000 (usable)
BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
B0000000000-000000007fc00000
Bootmem setup node 0 0000000000000000-000000007fc00000
Zone PFN raProcessor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6
ACPapic_id[0x85] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
ACPI: LAPIC (acpi_x02] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
ACPI: LAPIC_NM0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23
ACPI0x0b] address[0xfec84400] gsi_base[72])
IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95
AUsing ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 800000ot=/dev/sda3 console=ttyS1,115200 showopts pj2
Initializing CPU#0
PID hash table entries: 40962 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2052128k/2093056k available (3519k kerved, 2323k data, 280k init)
Calibrating delay using timer specific routine.. 5324.66 BogoMIPS (lpj=10649332)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781304
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x6
Initializing CPU#1
Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640330)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/6 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640332)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x7
Initializing CPU#3
Calibrating delay using timer specific routine.. 5320.04 BogoMIPS (lpj=10640092)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/7 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2660.007 MHz processor.
migration_cost=30,7937
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at a0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new interface driver uirq". If it helps, post a report
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timeow: b8b00000-b8bfffff
PCI: Bridge: 0000:03:00.2
IO window: disabled.
MEM window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:02.0
IO windowisabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:02.0
IOdow: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:05.0c:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: BridEFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: 1000-1fff
MEM window: b8c00terrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:00.0[A] - IRQ 169
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Inter Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
Total HugeTLB io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (def0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer: probe of 0000:00:03.0:pcie01 failed with error 1
aer: probe of 0000:00:04.0:pcie01 failed failed with error 2
aer: probe of 0000:00:07.0:pcie01 failed with error 2
ACPI: Power Button r Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x4
ACPI Exception (acpi_060707]
ACPI: Getting cpuindex for acpiid 0x6
ACPI Exception (acpi_processor-0681): AE_NOT_FOUReal Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/1655/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
IntI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
e1000: 0000:07:00.1:ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access Ascsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direswapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0

Call Trace:
[<ffffffff802025bc67>] __alloc_pages+0x229/0x2b2
[<ffffffff80274e46>] cache_grow+0x134/0x333
[<ffffffff802really_probe+0x47/0xc9
[<ffffffff803eea20>] __driver_attach+0x6f/0xaf
[<ffffffff803ee214>] bffffffff803abf12>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207046>] init+0x0/0x306
[<ffu 0 hot: high 186, batch 31 used:24
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, 15 used:0
Node 0 Normal per-cpu: empty
Active:0 inactive:0 dirty:0 writeback:0 unstable:0 freeB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_res 0*2048kB 496*4096kB = 2035560kB
Node 0 Normal: empty
Swap cache: add 0, delete 0, find 0/0, r swap cached
Kernel panic - not syncing: Out of memory and no killable processes...
============================ end ============================


Without this bad patch, the system boot continues with the following
messages, slightly overlapping my presentation with the above output:


========================== begin ===========================
...
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
SCSI devi: write through
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP fail sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
SCSI device sdb: 2923825152 512-byte hdwr s assuming drive cache: write through
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 sdb1
sd 0:2:1:0: Attached scsi disk sdb
sd 0:2:0:0: Attached scsi generic sg0 type 0
sd 0:2:aw1394: /dev/raw1394 device initialized
GSI 20 sharing vector 0x5A and IRQ 20
ACPI: PCI Interr1d.7: debug port 1
...
============================ end ============================


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401


2006-08-31 16:17:56

by Mel Gorman

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

On Thu, 31 Aug 2006, Paul Jackson wrote:

> The following patch in 2.6.18-rc4-mm3 is broken on my x86_64:
>
> account-for-memmap-and-optionally-the-kernel-image-as-holes.patch
>
> The failure is 100% reproducible.
>

Ok, I'm surprised that it is this patch that causes a problem. I felt the
patch would either explode everywhere or just work.

> The system has a pair of dual-core Intel Xeon 5100 series (Woodcrest)
> processors (4 logical CPUs total) and 2 GBytes of ram.
>
> The .config is what one gets from 'make defconfig' for arch x86_64,
> plus the following changes:
>
> =========================== begin ===========================
> --- .config.def 2006-08-31 04:29:22.100311614 -0500
> +++ .config 2006-08-31 04:29:03.247761750 -0500
> @@ -1,7 +1,7 @@
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.18-rc4-mm3
> -# Thu Aug 31 04:29:22 2006
> +# Thu Aug 31 04:07:54 2006
> #
> CONFIG_X86_64=y
> CONFIG_64BIT=y
> @@ -44,7 +44,7 @@
> # CONFIG_AUDIT is not set
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> -# CONFIG_CPUSETS is not set
> +CONFIG_CPUSETS=y
> # CONFIG_RELAY is not set
> CONFIG_INITRAMFS_SOURCE=""
> CONFIG_UID16=y
> @@ -205,7 +205,7 @@
> # CONFIG_ACPI_ASUS is not set
> # CONFIG_ACPI_IBM is not set
> # CONFIG_ACPI_TOSHIBA is not set
> -CONFIG_ACPI_SONY=m
> +# CONFIG_ACPI_SONY is not set
> CONFIG_ACPI_BLACKLIST_YEAR=0
> # CONFIG_ACPI_DEBUG is not set
> CONFIG_ACPI_EC=y
> @@ -1270,7 +1270,11 @@
> # CONFIG_REISERFS_FS_SECURITY is not set
> # CONFIG_JFS_FS is not set
> CONFIG_FS_POSIX_ACL=y
> -# CONFIG_XFS_FS is not set
> +CONFIG_XFS_FS=y
> +# CONFIG_XFS_QUOTA is not set
> +# CONFIG_XFS_SECURITY is not set
> +# CONFIG_XFS_POSIX_ACL is not set
> +# CONFIG_XFS_RT is not set
> # CONFIG_GFS2_FS is not set
> # CONFIG_OCFS2_FS is not set
> # CONFIG_MINIX_FS is not set
> ============================ end ============================
>

Nothing very surprising there.

> The boot fails with the following console output:
>


Ok, this is interesting. It appears that the log is truncated or somehow
corrupt.

> =========================== begin ===========================
> root (hd0,0)
> Filesystem type is ext2fs, partition type 0x83
> kernel /vmlinuz.pj2 root=/dev/sda3 console=ttyS1,115200 showopts pj2
> [Linux-bzImage, setup=0x1c00, size=0x2b66e5]
>
> Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #48 SMP Thu Aug 31 04:22:41 CDT 2006
> Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj2
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
> BIOS-e820: 000000007f932000 - 000000007f9d0(ACPI NVS)

Little bit missing here. I don't expect 000000007f9d0 to be truncated like
that.

> BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
> BIOS-e820: 000000007fa420000 - 000000007fb2b000 (usable)

or 000000007fa420000 to have an additional 0 at the end.

> BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
> B0000000000-000000007fc00000
> Bootmem setup node 0 0000000000000000-000000007fc00000

and this seems to interleave even though the bootmem setup node range
would match your physical memory.
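
The damage pointed out in the e820 lines above can also be spotted mechanically. A minimal sketch, assuming the layout seen in the intact entries of this log (two 16-digit hex addresses separated by " - ", then a type in parentheses); the helper name `check_e820_line` is hypothetical and not part of any kernel tool:

```python
import re

# Layout of an intact entry, as seen in the undamaged lines of the log
# (assumption: every valid entry uses exactly 16 hex digits per address).
E820_RE = re.compile(r"^BIOS-e820: ([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)$")


def check_e820_line(line):
    """Return (start, end, kind) for a well-formed entry, or None if it looks garbled."""
    m = E820_RE.match(line.strip())
    if m is None:
        return None
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    if start >= end:
        # A range that does not grow is treated as damaged too.
        return None
    return start, end, m.group(3)
```

Fed the lines from the report, this accepts the first three entries and rejects both the truncated `000000007f9d0(ACPI NVS)` entry and the one with the extra trailing 0.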

> Zone PFN raProcessor #0 (Bootup-CPU)

There is information missing here. That should be Zone PFN Ranges followed
by a list of active PFN ranges from your system. After that, I expect to
see a message like

X pages DMA reserved
Y pages used for memmap

Do you think this is a problem with your serial console or something else?
Do you see the Zone PFN ranges information when the patch is backed out?
Those messages, as well as booting with loglevel=8, would really help me
figure out what went pear shaped.

> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
> Processor #6
> ACPapic_id[0x85] disabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
> ACPI: LAPIC (acpi_x02] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
> ACPI: LAPIC_NM0x08] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23
> ACPI0x0b] address[0xfec84400] gsi_base[72])
> IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95
> AUsing ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at 800000ot=/dev/sda3 console=ttyS1,115200 showopts pj2
> Initializing CPU#0
> PID hash table entries: 40962 (order: 8, 1048576 bytes)
> Checking aperture...
> Memory: 2052128k/2093056k available (3519k kerved, 2323k data, 280k init)
> Calibrating delay using timer specific routine.. 5324.66 BogoMIPS (lpj=10649332)
> Mount-cache hash table entries: 256
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 0/0 -> Node 0
> using mwait in idle threads.
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 0
> CPU0: Thermal monitoring enabled (TM2)
> SMP alternatives: switching to UP code
> ACPI: Core revision 20060707
> Using local APIC timer interrupts.
> result 20781304
> Detected 20.781 MHz APIC timer.
> SMP alternatives: switching to SMP code
> Booting processor 1/4 APIC 0x6
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640330)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 1/6 -> Node 0
> CPU: Physical Processor ID: 3
> CPU: Processor Core ID: 0
> CPU1: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> SMP alternatives: switching to SMP code
> Booting processor 2/4 APIC 0x1
> Initializing CPU#2
> Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640332)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 2/1 -> Node 0
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 1
> CPU2: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> SMP alternatives: switching to SMP code
> Booting processor 3/4 APIC 0x7
> Initializing CPU#3
> Calibrating delay using timer specific routine.. 5320.04 BogoMIPS (lpj=10640092)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 3/7 -> Node 0
> CPU: Physical Processor ID: 3
> CPU: Processor Core ID: 1
> CPU3: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> Brought up 4 CPUs
> testing NMI watchdog ... OK.
> time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
> time.c: Detected 2660.007 MHz processor.
> migration_cost=30,7937
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at a0000000
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
> PCI: PXH quirk detected, disabling MSI for SHPC device
> PCI: PXH quirk detected, disabling MSI for SHPC device
> PCI: Transparent bridge - 0000:00:1e.0
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link Intel 82802 RNG detected
> SCSI subsystem initialized
> usbcore: registered new interface driver uirq". If it helps, post a report
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 64-bit timeow: b8b00000-b8bfffff
> PCI: Bridge: 0000:03:00.2
> IO window: disabled.
> MEM window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:02:02.0
> IO windowisabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:02.0
> IOdow: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:05.0c:00.2
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: BridEFETCH window: disabled.
> PCI: Bridge: 0000:00:1e.0
> IO window: 1000-1fff
> MEM window: b8c00terrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI Interrupt 0000:02:00.0[A] - IRQ 169
> ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI Inter Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
> IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
> TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
> TCP bind hash table entries: 65536 (order: 8, 1048576 TCP: Hash tables configured (established 262144 bind 65536)
> TCP reno registered
> Total HugeTLB io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (def0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> aer: probe of 0000:00:02.0:pcie01 failed with error 2
> aer: probe of 0000:00:03.0:pcie01 failed with error 1
> aer: probe of 0000:00:04.0:pcie01 failed failed with error 2
> aer: probe of 0000:00:07.0:pcie01 failed with error 2
> ACPI: Power Button r Device is not present [20060707]
> ACPI: Getting cpuindex for acpiid 0x4
> ACPI Exception (acpi_060707]
> ACPI: Getting cpuindex for acpiid 0x6
> ACPI Exception (acpi_processor-0681): AE_NOT_FOUReal Time Clock Driver v1.12ac
> Linux agpgart interface v0.101 (c) Dave Jones
> Serial: 8250/1655/O 0x2f8 (irq = 3) is a 16550A
> floppy0: no floppy controllers found
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> loop: loaded (max 8 devices)
> IntI 17 sharing vector 0x42 and IRQ 17
> ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
> e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> GSI 18 sharing vector 0x4A and IRQ 18
> ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
> e1000: 0000:07:00.1:ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
> Uniform CD-ROM driver Revision: 3.20
> megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
> megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
> megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
> ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
> scsi0 : LSI Logic SAS based MegaRAID driver
> scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:3:0: Direct-Access Ascsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> scsi 0:2:1:0: Direswapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0
>
> Call Trace:
> [<ffffffff802025bc67>] __alloc_pages+0x229/0x2b2
> [<ffffffff80274e46>] cache_grow+0x134/0x333
> [<ffffffff802really_probe+0x47/0xc9
> [<ffffffff803eea20>] __driver_attach+0x6f/0xaf
> [<ffffffff803ee214>] bffffffff803abf12>] acpi_ds_init_one_object+0x0/0x82
> [<ffffffff80207046>] init+0x0/0x306
> [<ffu 0 hot: high 186, batch 31 used:24
> cpu 0 cold: high 62, batch 15 used:0
> cpu 1 hot: high 186, 15 used:0
> Node 0 Normal per-cpu: empty
> Active:0 inactive:0 dirty:0 writeback:0 unstable:0 freeB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_res 0*2048kB 496*4096kB = 2035560kB

This is also garbled. It comes from show_free_areas() though, and it looks
like it is saying there are 496 blocks of 4096kB currently free. Not clear
at all how it managed to go OOM due to this patch.
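
As a quick check of the arithmetic on that line, here is a minimal sketch. It assumes the two surviving fragments ("496*4096kB" and the total "2035560kB") are themselves intact, which cannot be confirmed from a garbled log; the helper name `free_kb` is hypothetical:

```python
def free_kb(blocks):
    """Sum a free list given as (count, block_size_kb) pairs."""
    return sum(count * size_kb for count, size_kb in blocks)


# The only per-order term that survived intact in the garbled line.
largest_order_kb = free_kb([(496, 4096)])

# Total printed at the end of the same line.
reported_total_kb = 2035560

# Whatever is left must have come from the smaller orders that were dropped.
smaller_orders_kb = reported_total_kb - largest_order_kb
```

Under those assumptions the 4096kB blocks account for 2031616kB of the reported total, leaving 3944kB for the orders whose text was lost, so nearly all of the 2 GBytes appears as free at the moment of the OOM.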

> Node 0 Normal: empty
> Swap cache: add 0, delete 0, find 0/0, r swap cached
> Kernel panic - not syncing: Out of memory and no killable processes...
> ============================ end ============================
>
>
> Without this bad patch, the system boot continues with the following
> messages, slightly overlapping my presentation with the above output:
>
>
> ========================== begin ===========================
> ...
> ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
> scsi0 : LSI Logic SAS based MegaRAID driver
> scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> SCSI devi: write through
> SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
> sda: test WP fail sda1 sda2 sda3
> sd 0:2:0:0: Attached scsi disk sda
> SCSI device sdb: 2923825152 512-byte hdwr s assuming drive cache: write through
> SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 sdb1
> sd 0:2:1:0: Attached scsi disk sdb
> sd 0:2:0:0: Attached scsi generic sg0 type 0
> sd 0:2:aw1394: /dev/raw1394 device initialized
> GSI 20 sharing vector 0x5A and IRQ 20
> ACPI: PCI Interr1d.7: debug port 1
> ...
> ============================ end ============================
>
>
> --
> I won't rest till it's the best ...
> Programmer, Linux Scalability
> Paul Jackson <[email protected]> 1.925.600.0401
>

Can I see a full bootlog with the patch backed out to see if that console
garbling is still there please? Have you any idea why the console garbling
is happening?

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2006-08-31 17:02:33

by Paul Jackson

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

Mel wrote:
> Have you any idea why the console garbling is happening?

Yeah - you're right - it's garbled. Looks like it's dropping chars.

I don't know why, but I'm not surprised. It's a lab system with a
new (for us) way of rigging the console output. I just got this
particular x86_64's console connection to work at all yesterday.

I've been working indirectly through my good lab tech. I should
drive in to the lab that has this rig (an hour away) and check it
out in person, and see what can be done to get clean console output.

This may take a day or three to yield results, unless I get lucky.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-09-01 08:38:18

by Mel Gorman

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

On Thu, 31 Aug 2006, Paul Jackson wrote:

> Mel wrote:
>> Have you any idea why the console garbling is happening?
>
> Yeah - you're right - it's garbled. Looks like it's dropping chars.
>

Or writing some chars twice but at a different time. The system might be
one of those that fakes serial console output on the assumption the
operating system isn't doing the same thing. I've seen one or two blade
systems that did something like this with mixed results.

> I don't know why, but I'm not surprised. It's a lab system with a
> new (for us) way of rigging the console output. I just got this
> particular x86_64's console connection to work at all yesterday.
>
> I've been working indirectly through my good lab tech. I should
> drive in to the lab that has this rig (an hour away) and check it
> out in person, and see what can be done to get clean console output.
>

That is a bit of a sickener. It may be worth getting your good lab tech
to check if there is a configuration setting in the hardware for
simulating console output before you make the trip.

> This may take a day or three to yield results, unless I get lucky.
>

I have Keith's problem with reserve-based-hot-add to keep me occupied in
the meantime. Whenever you get the chance will be fine. Thanks a lot

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2006-09-02 03:25:19

by Paul Jackson

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

> That is a bit of a sickener. It may be worth getting your good lab tech
> to check if there is a configuration setting in the hardware for
> simulating console output before you make the trip.

Apparently my lab setup simply lacks correct flow control on the serial
console line. I hacked the 8250 serial driver in my kernel to put a one
msec delay between each character output, and it no longer drops
console output during boot.
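
The actual change was made inside the kernel's 8250 driver and is not shown in this thread. For illustration only, here is a minimal user-space sketch of the same pacing idea, plus the effect on throughput. The names `paced_write` and `chars_per_second` are hypothetical, and the figure of 10 bits per character assumes 8N1 framing on the 115200 baud line:

```python
import time


def paced_write(write_char, text, delay_s=0.001):
    """Emit text one character at a time, sleeping delay_s after each one."""
    for ch in text:
        write_char(ch)
        time.sleep(delay_s)
    return len(text)


def chars_per_second(baud, bits_per_char=10, extra_delay_s=0.0):
    """Upper bound on character rate for a serial line with an added per-char delay."""
    wire_time_s = bits_per_char / baud  # time one character spends on the wire
    return 1.0 / (wire_time_s + extra_delay_s)
```

With these assumptions an unpaced 115200 baud line can carry about 11520 characters per second, while a one msec delay per character caps it at roughly 920, which gives a receiver without working flow control far more time to keep up.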

> > This may take a day or three to yield results, unless I get lucky.
> >
>
> I have Keith's problem with reserve-based-hot-add to keep me occupied in
> the meantime. Whenever you get the chance will be fine. Thanks a lot

Ok, below is the console output for one of these crashes.

This output is missing the first couple dozen lines commencing with
grub announcing it is loading my kernel, as those lines seem to go via
a different serial driver that I didn't chase down to hack. Those
initial lines were still dropping lotsa chars. If you need those
initial lines bad, holler, and I can probably hack something to get
them to show up.

By the way, the crash continues to happen 100% with the patch:

patches/account-for-memmap-and-optionally-the-kernel-image-as-holes.patch

and zero percent without it. So this patch continues to be suspect
number one. There is no suspect number two ;).

Notice the really bogus looking memory size numbers on the line near
the end that begins "Node 0 DMA free: ...". No, this is not a
gazillion petabyte Altix. It's a mundane 2 GByte, 2 processor package
(4 cores total) Xeon system.

Without further ado ...
=======================

CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781258
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x6
Initializing CPU#1
Calibrating delay using timer specific routine.. 5320.09 BogoMIPS (lpj=10640184)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/6 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.27 BogoMIPS (lpj=10640543)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x7
Initializing CPU#3
Calibrating delay using timer specific routine.. 5320.03 BogoMIPS (lpj=10640065)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/7 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2660.003 MHz processor.
migration_cost=26,7972
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at a0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKH] (IRQs 5 7 10 *11)
Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
PCI-GART: No AMD northbridge found.
PCI: Bridge: 0000:03:00.0
IO window: disabled.
MEM window: b8900000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:03:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:00.0
IO window: disabled.
MEM window: b8900000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:02:01.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:02.0
IO window: 2000-2fff
MEM window: b8000000-b88fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:01:00.0
IO window: 2000-2fff
MEM window: b8000000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:01:00.3
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:02.0
IO window: 2000-2fff
MEM window: b8000000-b8afffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:00:03.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:04.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:05.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:0c:00.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:0c:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:06.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:07.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: 1000-1fff
MEM window: b8c00000-b8cfffff
PREFETCH window: b0000000-b7ffffff
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 [email protected]).
SGI XFS with large block/inode numbers, no debug enabled
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer: probe of 0000:00:03.0:pcie01 failed with error 1
aer: probe of 0000:00:04.0:pcie01 failed with error 2
aer: probe of 0000:00:05.0:pcie01 failed with error 2
aer: probe of 0000:00:06.0:pcie01 failed with error 2
aer: probe of 0000:00:07.0:pcie01 failed with error 2
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x4
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x5
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x6
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x7
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.1.9-k6
Copyright (c) 1999-2006 Intel Corporation.
GSI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
e1000: 0000:07:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d3
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k4-NAPI
e100: Cohda: DV-28E-N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
swapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0

Call Trace:
[<ffffffff8020acfa>] show_trace+0x34/0x47
[<ffffffff8020ad1f>] dump_stack+0x12/0x17
[<ffffffff8025a27f>] out_of_memory+0x79/0x282
[<ffffffff8025bce3>] __alloc_pages+0x229/0x2b2
[<ffffffff80274ec2>] cache_grow+0x134/0x333
[<ffffffff802753d2>] cache_alloc_refill+0x17e/0x1cc
[<ffffffff80275820>] kmem_cache_alloc+0x6c/0x76
[<ffffffff8047b8fb>] sd_revalidate_disk+0x3a/0xcdb
[<ffffffff8047d39d>] sd_probe+0x28b/0x31e
[<ffffffff803ee8b7>] really_probe+0x47/0xc9
[<ffffffff803eeaa8>] __driver_attach+0x6f/0xaf
[<ffffffff803ee29c>] bus_for_each_dev+0x43/0x6e
[<ffffffff803eddbc>] bus_add_driver+0x6b/0x18d
[<ffffffff80207184>] init+0x13e/0x306
[<ffffffff8020a3f8>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[<ffffffff803abf92>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207046>] init+0x0/0x306
[<ffffffff8020a3ee>] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:0
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, batch 31 used:0
cpu 1 cold: high 62, batch 15 used:0
cpu 2 hot: high 186, batch 31 used:0
cpu 2 cold: high 62, batch 15 used:0
cpu 3 hot: high 186, batch 31 used:0
cpu 3 cold: high 62, batch 15 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, batch 31 used:16
cpu 1 cold: high 62, batch 15 used:0
cpu 2 hot: high 186, batch 31 used:158
cpu 2 cold: high 62, batch 15 used:0
cpu 3 hot: high 186, batch 31 used:10
cpu 3 cold: high 62, batch 15 used:0
Node 0 Normal per-cpu: empty
Active:0 inactive:0 dirty:0 writeback:0 unstable:0 free:509486 slab:1362 mapped:0 pagetables:0
Node 0 DMA free:1616kB min:143085642166168kB low:178857052707708kB high:214628463249252kB active:0kB inactive:0kB present:18446744073709538996kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2026 2026
Node 0 DMA32 free:2036328kB min:5776kB low:7220kB high:8664kB active:0kB inactive:0kB present:2075356kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 2*4kB 3*8kB 1*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1616kB
Node 0 DMA32: 0*4kB 1*8kB 0*16kB 1*32kB 5*64kB 8*128kB 3*256kB 1*512kB 0*1024kB 1*2048kB 496*4096kB = 2036328kB
Node 0 Normal: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 0kB
Total swap = 0kB
Free swap: 0kB
523264 pages of RAM
10232 reserved pages
0 pages shared
0 pages swap cached
Kernel panic - not syncing: Out of memory and no killable processes...



--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

--
VGER BF report: U 0.5

2006-09-04 09:45:09

by mel

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

On (01/09/06 20:24), Paul Jackson didst pronounce:
> > That is a bit of a sickener. It may be worth getting your good lab tech
> > to check if there is a configuration setting in the hardware for
> > simulating console output before you make the trip.
>
> Apparently my lab setup simply lacks correct flow control on the serial
> console line. I hacked the 8250 serial driver in my kernel to put a one
> msec delay between each character output, and it no longer drops
> console output during boot.
>

Nice work.

> > > This may take a day or three to yield results, unless I get lucky.
> > >
> >
> > I have Keith's problem with reserve-based-hot-add to keep me occupied in
> > the meantime. Whenever you get the chance will be fine. Thanks a lot
>
> Ok, below is the console output for one of these crashes.
>
> This output is missing the first couple dozen lines commencing with
> grub announcing it is loading my kernel, as those lines seem to go via
> a different serial driver that I didn't chase down to hack. Those
> initial lines were still dropping lotsa chars. If you need those
> initial lines bad, holler, and I can probably hack something to get
> them to show up.
>

I could do with those lines, but I believe there was enough information
printed to determine why it failed to boot. I've attached a patch that
should boot the machine and assuming it works, I just need the output of
dmesg.

> By the way, the crash continues to happen 100% with the patch:
>
> patches/account-for-memmap-and-optionally-the-kernel-image-as-holes.patch
>

Not surprising considering what the min_free_kbytes is from this output!

> Node 0 DMA free:1616kB min:143085642166168kB low:178857052707708kB high:214628463249252kB active:0kB inactive:0kB present:18446744073709538996kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 2026 2026
> Node 0 DMA32 free:2036328kB min:5776kB low:7220kB high:8664kB active:0kB inactive:0kB present:2075356kB pages_scanned:0 all_unreclaimable? no
>

I believe it is because memmap was calculated to be bigger than it
possibly could be. Can you try booting the following patch with
loglevel=8 and send me the dmesg output if it boots please? Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-fix_accountmemmap/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c 2006-08-28 15:05:30.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_accountmemmap/mm/page_alloc.c 2006-09-04 10:36:04.000000000 +0100
@@ -2373,7 +2373,9 @@ unsigned long __meminit account_memmap(s
if (zone_index == memmap_zone_idx(pgdat->node_mem_map)) {
pages = pgdat->node_spanned_pages;
pages = (pages * sizeof(struct page)) >> PAGE_SHIFT;
- printk(KERN_DEBUG "%lu pages used for memmap\n", pages);
+ printk(KERN_DEBUG
+ " %s zone: %lu pages used for memmap\n",
+ zone_names[zone_index], pages);
}
return pages;
}
@@ -2411,7 +2413,9 @@ unsigned long account_memmap(struct pgli
}

pages >>= PAGE_SHIFT;
- printk(KERN_DEBUG "%lu pages used for SPARSE memmap\n", pages);
+ printk(KERN_DEBUG
+ " %s zone: %lu pages used for SPARSEMEM memmap\n",
+ zone_names[zone_index], pages);
return pages;
}
#endif
@@ -2437,17 +2441,24 @@ static void __meminit free_area_init_cor

for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize;
+ unsigned long size, realsize, memmap_size;

size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);

- realsize -= account_memmap(pgdat, j);
+ /* Account for the size of mem_map */
+ memmap_size = account_memmap(pgdat, j);
+ if (realsize >= memmap_size)
+ realsize -= memmap_size;
+ else
+ printk(KERN_WARNING "memmap_size of %lu exceeds %lu\n",
+ memmap_size, realsize);
+
/* Account for reserved DMA pages */
if (j == ZONE_DMA && realsize > dma_reserve) {
realsize -= dma_reserve;
- printk(KERN_DEBUG "%lu pages DMA reserved\n",
+ printk(KERN_DEBUG " DMA zone: %lu pages reserved\n",
dma_reserve);
}


--
VGER BF report: U 0.499996

2006-09-06 22:10:57

by Paul Jackson

Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

Mel Gorman wrote:
> I could do with those lines, but I believe there was enough information
> printed to determine why it failed to boot. I've attached a patch that
> should boot the machine and assuming it works, I just need the output of
> dmesg.

Yup - that patch booted it, and produced the output you asked for.

Here's the dmesg output from booting your patch:

Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #60 SMP Wed Sep 6 16:34:36 CDT 2006
Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
BIOS-e820: 000000007f932000 - 000000007f9d0000 (ACPI NVS)
BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
BIOS-e820: 000000007fa42000 - 000000007fa9a000 (reserved)
BIOS-e820: 000000007fa9a000 - 000000007fad6000 (usable)
BIOS-e820: 000000007fad6000 - 000000007fb1a000 (ACPI NVS)
BIOS-e820: 000000007fb1a000 - 000000007fb2b000 (usable)
BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
BIOS-e820: 000000007fb3a000 - 000000007fc00000 (usable)
BIOS-e820: 00000000ffc00000 - 00000000ffc0c000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 522546) 1 entries of 3200 used
Entering add_active_range(0, 522704, 522818) 2 entries of 3200 used
Entering add_active_range(0, 522906, 522966) 3 entries of 3200 used
Entering add_active_range(0, 523034, 523051) 4 entries of 3200 used
Entering add_active_range(0, 523066, 523264) 5 entries of 3200 used
end_pfn_map = 1047564
DMI 2.4 present.
ACPI: RSDP (v002 INTEL ) @ 0x00000000000f0350
ACPI: XSDT (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb39120
ACPI: FADT (v003 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb36000
ACPI: MADT (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb35000
ACPI: SPCR (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb2e000
ACPI: HPET (v001 INTEL S5000PAL 0x00000001 INTL 0x01000013) @ 0x000000007fb2d000
ACPI: MCFG (v001 INTEL S5000PAL 0x00000001 INTL 0x01000013) @ 0x000000007fb2c000
ACPI: SSDT (v002 INTEL S5000PAL 0x00004000 INTL 0x01000013) @ 0x000000007fb2b000
ACPI: DSDT (v002 INTEL S5000PAL 0x00000008 INTL 0x01000013) @ 0x0000000000000000
No NUMA configuration found
Faking a node at 0000000000000000-000000007fc00000
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 522546) 1 entries of 3200 used
Entering add_active_range(0, 522704, 522818) 2 entries of 3200 used
Entering add_active_range(0, 522906, 522966) 3 entries of 3200 used
Entering add_active_range(0, 523034, 523051) 4 entries of 3200 used
Entering add_active_range(0, 523066, 523264) 5 entries of 3200 used
Bootmem setup node 0 0000000000000000-000000007fc00000
Zone PFN ranges:
DMA 0 -> 4096
DMA32 4096 -> 1048576
Normal 1048576 -> 1048576
early_node_map[6] active PFN ranges
0: 0 -> 159
0: 256 -> 522546
0: 522704 -> 522818
0: 522906 -> 522966
0: 523034 -> 523051
0: 523066 -> 523264
On node 0 totalpages: 522838
DMA zone: 7154 pages used for memmap
memmap_size of 7154 exceeds 3999
DMA zone: 1732 pages reserved
DMA zone: 2267 pages, LIFO batch:0
DMA32 zone: 518839 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x84] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x85] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x87] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
IOAPIC[1]: apic_id 9, address 0xfec80000, GSI 24-47
ACPI: IOAPIC (id[0x0a] address[0xfec84000] gsi_base[48])
IOAPIC[2]: apic_id 10, address 0xfec84000, GSI 48-71
ACPI: IOAPIC (id[0x0b] address[0xfec84400] gsi_base[72])
IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 7fc00000:80000000)
SMP: Allowing 8 CPUs, 4 hotplug CPUs
PERCPU: Allocating 32000 bytes of per cpu data
Built 1 zonelists. Total pages: 521106
Kernel command line: root=/dev/sda3 console=ttyS1,115200 showopts pj1
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2052128k/2093056k available (3520k kernel code, 39224k reserved, 2327k data, 280k init)
Calibrating delay using timer specific routine.. 5324.65 BogoMIPS (lpj=10649309)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781307
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x6
Initializing CPU#1
Calibrating delay using timer specific routine.. 5320.18 BogoMIPS (lpj=10640368)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/6 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.13 BogoMIPS (lpj=10640272)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x7
Initializing CPU#3
Calibrating delay using timer specific routine.. 5320.15 BogoMIPS (lpj=10640316)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/7 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2660.008 MHz processor.
migration_cost=24,8015
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at a0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
Losing some ticks... checking if CPU frequency changed.
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
Boot video device is 0000:10:0c.0
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P32_._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE4._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE5._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC.PXHA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC.PXHB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE7._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKH] (IRQs 5 7 10 *11)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIE._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIO._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIO.PCIA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIP._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIQ._PRT]
Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
PCI-GART: No AMD northbridge found.
PCI: Bridge: 0000:03:00.0
IO window: disabled.
MEM window: b8900000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:03:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:00.0
IO window: disabled.
MEM window: b8900000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:02:01.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:02.0
IO window: 2000-2fff
MEM window: b8000000-b88fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:01:00.0
IO window: 2000-2fff
MEM window: b8000000-b89fffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:01:00.3
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:02.0
IO window: 2000-2fff
MEM window: b8000000-b8afffff
PREFETCH window: b8b00000-b8bfffff
PCI: Bridge: 0000:00:03.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:04.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:05.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:0c:00.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:0c:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:06.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:07.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: 1000-1fff
MEM window: b8c00000-b8cfffff
PREFETCH window: b0000000-b7ffffff
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:02.0 to 64
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:01:00.0 to 64
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:02:00.0 to 64
PCI: Setting latency timer of device 0000:03:00.0 to 64
PCI: Setting latency timer of device 0000:03:00.2 to 64
ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:02:01.0 to 64
ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:02:02.0 to 64
PCI: Setting latency timer of device 0000:01:00.3 to 64
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:03.0 to 64
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:04.0 to 64
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:05.0 to 64
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:0c:00.0 to 64
PCI: Setting latency timer of device 0000:0c:00.2 to 64
ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:07.0 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 [email protected]).
SGI XFS with large block/inode numbers, no debug enabled
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
PCI: Setting latency timer of device 0000:00:02.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:02.0:pcie00]
Allocate Port Service[0000:00:02.0:pcie01]
PCI: Setting latency timer of device 0000:00:03.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:03.0:pcie00]
Allocate Port Service[0000:00:03.0:pcie01]
PCI: Setting latency timer of device 0000:00:04.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:04.0:pcie00]
Allocate Port Service[0000:00:04.0:pcie01]
Allocate Port Service[0000:00:04.0:pcie02]
PCI: Setting latency timer of device 0000:00:05.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:05.0:pcie00]
Allocate Port Service[0000:00:05.0:pcie01]
Allocate Port Service[0000:00:05.0:pcie02]
PCI: Setting latency timer of device 0000:00:06.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:06.0:pcie00]
Allocate Port Service[0000:00:06.0:pcie01]
Allocate Port Service[0000:00:06.0:pcie02]
PCI: Setting latency timer of device 0000:00:07.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:07.0:pcie00]
Allocate Port Service[0000:00:07.0:pcie01]
PCI: Setting latency timer of device 0000:01:00.0 to 64
Allocate Port Service[0000:01:00.0:pcie10]
Allocate Port Service[0000:01:00.0:pcie11]
PCI: Setting latency timer of device 0000:02:00.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:00.0:pcie20]
Allocate Port Service[0000:02:00.0:pcie21]
Allocate Port Service[0000:02:00.0:pcie22]
PCI: Setting latency timer of device 0000:02:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:01.0:pcie20]
Allocate Port Service[0000:02:01.0:pcie21]
PCI: Setting latency timer of device 0000:02:02.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:02.0:pcie20]
Allocate Port Service[0000:02:02.0:pcie21]
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer_init: AER service init fails - No ACPI _OSC support
aer: probe of 0000:00:03.0:pcie01 failed with error 1
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:04.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:05.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:06.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:07.0:pcie01 failed with error 2
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x4
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x5
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x6
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x7
Real Time Clock Driver v1.12ac
hpet_resources: 0xfed00000 is busy
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.1.9-k6
Copyright (c) 1999-2006 Intel Corporation.
GSI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
PCI: Setting latency timer of device 0000:07:00.0 to 64
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
PCI: Setting latency timer of device 0000:07:00.1 to 64
e1000: 0000:07:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d3
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.57.
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ESB2: IDE controller at PCI slot 0000:00:1f.1
GSI 19 sharing vector 0x52 and IRQ 19
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 20 (level, low) -> IRQ 82
ESB2: chipset revision 9
ESB2: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x30a0-0x30a7, BIOS settings: hda:DMA, hdb:pio
Probing IDE interface ide0...
hda: DV-28E-N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP failed, assume Write Enabled
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP failed, assume Write Enabled
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 MB)
sdb: test WP failed, assume Write Enabled
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 MB)
sdb: test WP failed, assume Write Enabled
sdb: asking for cache data failed
sdb: assuming drive cache: write through
sdb: sdb1
sd 0:2:1:0: Attached scsi disk sdb
sd 0:2:0:0: Attached scsi generic sg0 type 0
sd 0:2:1:0: Attached scsi generic sg1 type 0
Fusion MPT base driver 3.04.01
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.04.01
Fusion MPT SAS Host driver 3.04.01
ieee1394: raw1394: /dev/raw1394 device initialized
GSI 20 sharing vector 0x5A and IRQ 20
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 90, io mem 0xb8d00000
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: new device found, idVendor=0000, idProduct=0000
usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.18-rc4-mm3 ehci_hcd
usb usb1: SerialNumber: 0000:00:1d.7
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 90, io base 0x00003080
usb usb2: new device found, idVendor=0000, idProduct=0000
usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
GSI 21 sharing vector 0x62 and IRQ 21
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 22 (level, low) -> IRQ 98
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 98, io base 0x00003060
usb usb3: new device found, idVendor=0000, idProduct=0000
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb3: SerialNumber: 0000:00:1d.1
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 90, io base 0x00003040
usb usb4: new device found, idVendor=0000, idProduct=0000
usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb4: SerialNumber: 0000:00:1d.2
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 22 (level, low) -> IRQ 98
PCI: Setting latency timer of device 0000:00:1d.3 to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 98, io base 0x00003020
usb usb5: new device found, idVendor=0000, idProduct=0000
usb usb5: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb5: SerialNumber: 0000:00:1d.3
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
usbcore: registered new interface driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: [email protected]
Intel 810 + AC97 Audio, version 1.01, 16:33:05 Sep 6 2006
oprofile: using timer interrupt.
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
input: AT Translated Set 2 keyboard as /class/input/input0
ACPI: (supports S0 S1 S4 S5)
logips2pp: Detected unknown logitech mouse model 1
input: PS/2 Logitech Mouse as /class/input/input1
XFS mounting filesystem sda3
Ending clean XFS mount for filesystem: sda3
VFS: Mounted root (xfs filesystem) readonly.
Freeing unused kernel memory: 280k freed
Adding 9438176k swap on /dev/disk/by-id/scsi-3600062b0000011200c86316cf30a2b4c-part2. Priority:-1 extents:1 across:9438176k
ADDRCONF(NETDEV_UP): eth0: link is not ready
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-09-07 14:17:07

by Mel Gorman

[permalink] [raw]
Subject: Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.

On Wed, 6 Sep 2006, Paul Jackson wrote:

> Mel Gorman wrote:
>> I could do with those lines, but I believe there was enough information
>> printed to determine why it failed to boot. I've attached a patch that
>> should boot the machine and assuming it works, I just need the output of
>> dmesg.
>
> Yup - that patch booted it, and produced the output you asked for.
>
> Here's the dmesg output from booting your patch:
>
> <dmesg log snipped>

Thanks. Now it's *painfully* obvious what went wrong: memmap is not
necessarily contained in one zone, and on your machine it spanned two zones.
A patch will follow this mail that fixes the underlying issue but keeps the
underflow check just in case. Please give it a test if you get the chance. It
passes regression tests here.

--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab

2006-09-07 14:27:48

by mel

[permalink] [raw]
Subject: [PATCH] Fix memmap accounting by approximating the map size

Arch-independent zone-sizing uses account_memmap() in an attempt to accurately
account for how much memory in a zone is used by memmap. The watermark and
per-cpu size initialisations then take the memmap into account. However,
the memmap may span multiple zones, and in one case this caused an underflow
that led to boot failures.

A fix that accounts perfectly for the memory consumed by memmap is complicated
and has no clear benefit. The architecture-specific code in x86_64 was simpler
because it approximated how much memory was consumed by the memmap backing
each zone, regardless of where the memmap was really stored.

This patch ditches the account_memmap() complexity and replaces it with the
simple approximation used by x86_64 while ensuring no underflow occurs.

Signed-off-by: Mel Gorman <[email protected]>

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-101_fix_account_memmap/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c 2006-08-28 15:05:30.000000000 +0100
+++ linux-2.6.18-rc4-mm3-101_fix_account_memmap/mm/page_alloc.c 2006-09-07 14:36:05.000000000 +0100
@@ -2364,58 +2364,6 @@ static void __init calculate_node_totalp
 							realtotalpages);
 }

-#ifdef CONFIG_FLAT_NODE_MEM_MAP
-/* Account for mem_map for CONFIG_FLAT_NODE_MEM_MAP */
-unsigned long __meminit account_memmap(struct pglist_data *pgdat,
-						int zone_index)
-{
-	unsigned long pages = 0;
-	if (zone_index == memmap_zone_idx(pgdat->node_mem_map)) {
-		pages = pgdat->node_spanned_pages;
-		pages = (pages * sizeof(struct page)) >> PAGE_SHIFT;
-		printk(KERN_DEBUG "%lu pages used for memmap\n", pages);
-	}
-	return pages;
-}
-#else
-/* Account for mem_map for CONFIG_SPARSEMEM */
-unsigned long account_memmap(struct pglist_data *pgdat, int zone_index)
-{
-	unsigned long pages = 0;
-	unsigned long memmap_pfn;
-	struct page *memmap_addr;
-	int pnum;
-	unsigned long pgdat_startpfn, pgdat_endpfn;
-	struct mem_section *section;
-
-	pgdat_startpfn = pgdat->node_start_pfn;
-	pgdat_endpfn = pgdat_startpfn + pgdat->node_spanned_pages;
-
-	/* Go through valid sections looking for memmap */
-	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!valid_section_nr(pnum))
-			continue;
-
-		section = __nr_to_section(pnum);
-		if (!section_has_mem_map(section))
-			continue;
-
-		memmap_addr = __section_mem_map_addr(section);
-		memmap_pfn = (unsigned long)memmap_addr >> PAGE_SHIFT;
-
-		if (memmap_pfn < pgdat_startpfn || memmap_pfn >= pgdat_endpfn)
-			continue;
-
-		if (zone_index == memmap_zone_idx(memmap_addr))
-			pages += (PAGES_PER_SECTION * sizeof(struct page));
-	}
-
-	pages >>= PAGE_SHIFT;
-	printk(KERN_DEBUG "%lu pages used for SPARSE memmap\n", pages);
-	return pages;
-}
-#endif
-
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -2437,17 +2385,32 @@ static void __meminit free_area_init_cor

 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, realsize;
+		unsigned long size, realsize, memmap_pages;

 		size = zone_spanned_pages_in_node(nid, j, zones_size);
 		realsize = size - zone_absent_pages_in_node(nid, j,
 								zholes_size);

-		realsize -= account_memmap(pgdat, j);
+		/*
+		 * Adjust realsize so that it accounts for how much memory
+		 * is used by this zone for memmap. This affects the watermark
+		 * and per-cpu initialisations
+		 */
+		memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
+		if (realsize >= memmap_pages) {
+			realsize -= memmap_pages;
+			printk(KERN_DEBUG
+				" %s zone: %lu pages used for memmap\n",
+				zone_names[j], memmap_pages);
+		} else
+			printk(KERN_WARNING
+				" %s zone: %lu pages exceeds realsize %lu\n",
+				zone_names[j], memmap_pages, realsize);
+
 		/* Account for reserved DMA pages */
 		if (j == ZONE_DMA && realsize > dma_reserve) {
 			realsize -= dma_reserve;
-			printk(KERN_DEBUG "%lu pages DMA reserved\n",
+			printk(KERN_DEBUG " DMA zone: %lu pages reserved\n",
 								dma_reserve);
 		}

2006-09-07 15:33:48

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] Fix memmap accounting by approximating the map size

Mel wrote:
> This patch ditches the account_memmap() complexity and replaces with the
> simple approximation used by x86_64 while ensuring no underflow occurs.

Works for me, on my x86_64 box. Thanks, Mel.

Acked-by: Paul Jackson <[email protected]>

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401