2005-09-15 16:51:49

by Petr Vandrovec

Subject: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Hello,
now that the crashes on UP systems have been sorted out, I tried to
put the new kernel on my SMP host - and sorry to say, but it does not
seem to work as advertised :-( It seems that we somehow got
blocks from CPU#1 into memory blocks on CPU#0, and free_block()
complains that the caller holds cachep->nodelists[0]->list_lock
while the nodeid for the block passed to free_block() comes from processor
(and node) #1...

I cannot find how this happened. Hopefully somebody else
will know... Meanwhile I'll try to get rid of PREEMPT; apparently,
although it is now masquerading as 'Low-latency desktop', it
is still somewhat dangerous. If this is triggered by preempt, that is.
Thanks,
Petr Vandrovec


ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe ([email protected]) and [email protected]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:1849
invalid operand: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-1619 #1
RIP: 0010:[<ffffffff8016e826>] <ffffffff8016e826>{free_block+294}
RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
RBP: ffff81007ffde000 R08: ffff81003ffaed90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007ffc9b50
R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
FS: 0000000000000000(0000) GS:ffffffff805fb800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
Stack: 0000000000000000 0000000000000000 0000000000000213 0000000200000000
ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
0000000000000000 ffff81007ffda080
Call Trace:<ffffffff8016fdc7>{drain_array_locked+167} <ffffffff8016feee>{cache_reap+206}
<ffffffff803a2374>{_spin_lock_irqsave+36} <ffffffff8016fe20>{cache_reap+0}
<ffffffff8014a1bc>{worker_thread+476} <ffffffff80132610>{default_wake_function+0}
<ffffffff80132610>{default_wake_function+0} <ffffffff80149fe0>{worker_thread+0}
<ffffffff8014ebc2>{kthread+146} <ffffffff8010ed12>{child_rip+8}
<ffffffff80149fe0>{worker_thread+0} <ffffffff8014eb30>{kthread+0}
<ffffffff8010ed0a>{child_rip+0}

Code: 0f 0b 68 bd aa 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
RIP <ffffffff8016e826>{free_block+294} RSP <ffff81007ff21d88>
<6>note: events/0[8] exited with preempt_count 1
hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
<and box is dead>


2005-09-15 17:33:30

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Petr Vandrovec wrote:
> Hello,
> so now once crashes on UP system were sorted out, I tried to
> put new kernel on my SMP host - and sorry to say, but it does not
> seem to work as advertised :-( It seems that we somehow got
> blocks from CPU#1 into memory blocks on CPU#0, and free_block
> complains that caller holds cachep->nodelists[0]->list_lock
> while nodeid for block passed to free_block() comes from processor
> (and node) #1...
>
> I cannot find how this happened. Hopefully somebody else
> will know... Meanwhile I'll try to get rid of PREEMPT, apparently
> although it is now masqueraded under 'Low-latency desktop' it
> is still somewhat dangerous. If it is triggered by preempt, that is.

It is not caused by preempt; a non-preempt kernel crashes in exactly the
same way.
Petr

> Thanks,
> Petr Vandrovec
>
>
> ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
> ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> parport: PnPBIOS parport detected.
> parport0: PC-style at 0x378 (0x778), irq 7, dma 3
> [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
> io scheduler noop registered
> io scheduler anticipatory registered
> io scheduler deadline registered
> io scheduler cfq registered
> pktcdvd: v0.2.0a 2004-07-14 Jens Axboe ([email protected]) and
> [email protected]
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> AMD8111: IDE controller at PCI slot 0000:00:07.1
> AMD8111: chipset revision 3
> AMD8111: not 100% native mode: will probe irqs later
> AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/slab.c:1849
> invalid operand: 0000 [1] PREEMPT SMP
> CPU 0
> Modules linked in:
> Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-1619 #1
> RIP: 0010:[<ffffffff8016e826>] <ffffffff8016e826>{free_block+294}
> RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
> RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
> RBP: ffff81007ffde000 R08: ffff81003ffaed90 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007ffc9b50
> R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
> FS: 0000000000000000(0000) GS:ffffffff805fb800(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
> Process events/0 (pid: 8, threadinfo ffff81007ff20000, task
> ffff81003ff8c790)
> Stack: 0000000000000000 0000000000000000 0000000000000213 0000000200000000
> ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
> 0000000000000000 ffff81007ffda080
> Call Trace:<ffffffff8016fdc7>{drain_array_locked+167}
> <ffffffff8016feee>{cache_reap+206}
> <ffffffff803a2374>{_spin_lock_irqsave+36}
> <ffffffff8016fe20>{cache_reap+0}
> <ffffffff8014a1bc>{worker_thread+476}
> <ffffffff80132610>{default_wake_function+0}
> <ffffffff80132610>{default_wake_function+0}
> <ffffffff80149fe0>{worker_thread+0}
> <ffffffff8014ebc2>{kthread+146} <ffffffff8010ed12>{child_rip+8}
> <ffffffff80149fe0>{worker_thread+0} <ffffffff8014eb30>{kthread+0}
> <ffffffff8010ed0a>{child_rip+0}
>
> Code: 0f 0b 68 bd aa 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
> RIP <ffffffff8016e826>{free_block+294} RSP <ffff81007ff21d88>
> <6>note: events/0[8] exited with preempt_count 1
> hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Probing IDE interface ide1...
> hdc: WDC WD1200JB-00CRA0, ATA DISK drive
> ide1 at 0x170-0x177,0x376 on irq 15
> hdc: max request size: 128KiB
> hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63,
> UDMA(100)
> hdc: cache flushes not supported
> hdc: hdc1
> libata version 1.12 loaded.
> sata_sil version 0.9
> ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
> <and box is dead>
>


2005-09-19 16:02:15

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Andrew Morton wrote:
> Petr Vandrovec <[email protected]> wrote:
>
>>Andrew Morton wrote:
>> > Petr Vandrovec <[email protected]> wrote:
>> >
>> >> so now once crashes on UP system were sorted out, I tried to
>> >> put new kernel on my SMP host - and sorry to say, but it does not
>> >> seem to work as advertised :-(
>> >
>> > .config (again), please.
>>
>> Any SMP with NUMA. One which I'm trying to debug now is attached.
>> It is available at http://vana.vc.cvut.cz/config as well.
>
> I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.

It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.

Any idea how to track the problem down? I'm not sure bisect will work without
a lot of interaction & patching, as almost all kernels after 2.6.13 were dying
with some other problems on that box...
Thanks,
Petr Vandrovec


Bootdata ok (command line is BOOT_IMAGE=Linux ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose)
Linux version 2.6.14-rc1-6c07 (root@vana) (gcc version 3.3.3 (Debian 20040401)) #4 SMP Mon Sep 19 17:44:44 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> Node 1
SRAT: Node 0 PXM 0 100000-3fffffff
SRAT: Node 1 PXM 1 40000000-7fffffff
SRAT: Node 0 PXM 0 0-3fffffff
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007ffeffff
ACPI: PM-Timer IO Port: 0x5008
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xff4ff000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xff4ff000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xff4fe000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xff4fe000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Setting APIC routing to flat
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7f780000)
Checking aperture...
CPU 0: aperture @ c0000000 size 512 MB
CPU 1: aperture @ c0000000 size 512 MB
Built 2 zonelists
Kernel command line: BOOT_IMAGE=Linux ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose
Parameter psmouse_noext is obsolete, ignored
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 1993.374 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2056328k/2097088k available (2616k kernel code, 40372k reserved, 1869k data, 248k init)
Calibrating delay using timer specific routine.. 3991.40 BogoMIPS (lpj=19957004)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.458 MHz APIC timer.
softlockup thread 0 started up.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3986.60 BogoMIPS (lpj=19933040)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1 -> Core 0
AMD Opteron(tm) Processor 246 stepping 0a
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -8 cycles, maxerr 1095 cycles)
Brought up 2 CPUs
softlockup thread 1 started up.
time.c: Using HPET based timekeeping.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Root Bridge [PCIB] (0000:04)
PCI: Probing PCI hardware (bus 04)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
TC classifier action (bugs to [email protected] cc [email protected])
hpet0: at MMIO 0xfec01000, IRQs 2, 8, 0
hpet0: 69ns tick, 3 32-bit timers
agpgart: Detected AMD 8151 AGP Bridge rev B3
agpgart: AGP aperture is 512M @ 0xc0000000
PCI-DMA: Disabling IOMMU.
pnp: 00:09: ioport range 0x680-0x6ff has been reserved
pnp: 00:09: ioport range 0x295-0x296 has been reserved
pnp: 00:09: ioport range 0xb78-0xb7f has been reserved
pnp: 00:09: ioport range 0xf78-0xf7f has been reserved
PCI: Bridge: 0000:00:06.0
IO window: 9000-afff
MEM window: ff100000-ff2fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: ff300000-ff3fffff
PREFETCH window: 9e900000-9e9fffff
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:04:01.0
IO window: c000-cfff
MEM window: ff500000-ff5fffff
PREFETCH window: 9eb00000-beafffff
IA-32 Microcode Update Driver: v1.14 <[email protected]>
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1127144995.320:1): initialized
Total HugeTLB memory allocated, 0
SELinux: Registering netfilter hooks
Initializing Cryptographic API
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz
radeonfb: PLL min 20000 max 40000
radeonfb: Monitor 1 type DFP found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
Console: switching to colour frame buffer device 240x75
radeonfb (0000:05:00.0): ATI Radeon Yd
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
Using specific hotkey driver
ACPI: CPU0 (power states: C1[C1])
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: CPU1 (power states: C1[C1])
Real Time Clock Driver v1.12
hpet_acpi_add: no address or irqs in _CRS
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 controller doesn't have AUX irq; using default 12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe ([email protected]) and [email protected]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:1849
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
0000000200000000
ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
0000000000000000 ffff81007ffda080
Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
<ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
<ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
<ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
<ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
<ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
<ffffffff8010ec1a>{child_rip+0}

Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177

2005-09-19 18:30:15

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Petr Vandrovec <[email protected]> wrote:
>
> Andrew Morton wrote:
> > Petr Vandrovec <[email protected]> wrote:
> >
> >>Andrew Morton wrote:
> >> > Petr Vandrovec <[email protected]> wrote:
> >> >
> >> >> so now once crashes on UP system were sorted out, I tried to
> >> >> put new kernel on my SMP host - and sorry to say, but it does not
> >> >> seem to work as advertised :-(
> >> >
> >> > .config (again), please.
> >>
> >> Any SMP with NUMA. One which I'm trying to debug now is attached.
> >> It is available at http://vana.vc.cvut.cz/config as well.
> >
> > I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.
>
> It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
> commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
> http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.
>
> ...
>
> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/slab.c:1849
> invalid operand: 0000 [1] SMP
> CPU 0
> Modules linked in:
> Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
> RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
> RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
> RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
> RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
> R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
> R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
> FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
> Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
> Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
> 0000000200000000
> ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
> 0000000000000000 ffff81007ffda080
> Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
> <ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
> <ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
> <ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
> <ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
> <ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
> <ffffffff8010ec1a>{child_rip+0}
>
> Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
> RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
it takes cachep->nodelists[node]->list_lock and then calls
drain_alien_cache() which appears to take the same lock. But that's not
the problem here.

The code in cache_reap() recalculates numa_node_id() multiple times, so if
the caller changes CPUs then this assertion will trigger. However it's
running under keventd here, which is pinned to a single CPU. Still, it
would be useful if you could try putting preempt_disable()s in
cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
once, and cache that in a local variable.

I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
asking for preempt non-atomicity bugs.
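
As a rough illustration of the hazard described above (reading the node twice versus reading it once), here is a minimal user-space sketch. It is not kernel code: toy_numa_node_id(), toy_migrate_to() and both toy_reap_*() helpers are hypothetical names invented for this example, and the "migration" is just a variable assignment standing in for a preemption that moves the task to another node.

```c
#include <stdbool.h>

/* Hypothetical stand-in for numa_node_id(): the node of the CPU the
 * caller happens to be running on at this instant. */
static int toy_current_node;

static int toy_numa_node_id(void)
{
	return toy_current_node;
}

/* Simulated preemption point: the task resumes on a CPU of another node. */
static void toy_migrate_to(int node)
{
	toy_current_node = node;
}

/* Looks the node up twice with a migration in between.
 * Returns true if the node whose lock was taken matches the node
 * later passed to the drain step. */
static bool toy_reap_reevaluating(void)
{
	toy_current_node = 0;
	int locked_node = toy_numa_node_id();	/* picks the list and its lock */
	toy_migrate_to(1);			/* task is preempted and moved */
	int drain_node = toy_numa_node_id();	/* second, independent lookup */
	return locked_node == drain_node;
}

/* Same sequence, but the node is read once and reused. */
static bool toy_reap_cached(void)
{
	toy_current_node = 0;
	int node = toy_numa_node_id();
	int locked_node = node;
	toy_migrate_to(1);
	int drain_node = node;			/* reuses the cached value */
	return locked_node == drain_node;
}
```

In this model toy_reap_reevaluating() reports a mismatch and toy_reap_cached() does not. Whether a migration can actually occur at that point in cache_reap() is exactly the open question in this thread, so the sketch only shows why the cached form is the safer pattern.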

2005-09-19 18:52:41

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Mon, 19 Sep 2005, Andrew Morton wrote:

> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.

drain_array_locked() calls check_spinlock_acquired_node(), which in
turn ensures that interrupts are off. So no move to a different processor
should be possible.

However, that is contradicted by __wake_up calling
drain_array_locked(). The process just woke up?

> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.

Accessing arrays indexed by node number even works if the process
continues to be executed on another node.

2005-09-19 18:56:29

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Andrew Morton wrote:
> Petr Vandrovec <[email protected]> wrote:
>
>>Andrew Morton wrote:
>>
>>>Petr Vandrovec <[email protected]> wrote:
>>>
>>>
>>>>Andrew Morton wrote:
>>>>
>>>>>Petr Vandrovec <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> so now once crashes on UP system were sorted out, I tried to
>>>>>>put new kernel on my SMP host - and sorry to say, but it does not
>>>>>>seem to work as advertised :-(
>>>>>
>>>>>.config (again), please.
>>>>
>>>>Any SMP with NUMA. One which I'm trying to debug now is attached.
>>>>It is available at http://vana.vc.cvut.cz/config as well.
>>>
>>>I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.
>>
>>It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
>>commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
>>http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.
>>
>>...
>>
>> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
>> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
>>----------- [cut here ] --------- [please bite here ] ---------
>>Kernel BUG at mm/slab.c:1849
>>invalid operand: 0000 [1] SMP
>>CPU 0
>>Modules linked in:
>>Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
>>RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
>>RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
>>RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
>>RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
>>RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
>>R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
>>R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
>>FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
>>Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
>>Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
>>0000000200000000
>> ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
>> 0000000000000000 ffff81007ffda080
>>Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
>> <ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
>> <ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
>> <ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
>> <ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
>> <ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
>> <ffffffff8010ec1a>{child_rip+0}
>>
>>Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
>>RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
>> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>
>
> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.
>
> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.

I thought this might be the problem, but as far as I can tell, while it is
a problem, it is not what happens here. free_block() simply finds that the
pointer it got from the caller belongs to a slab that belongs to CPU#1/node#1,
while the caller obtained the lock on the CPU#0/node#0 structures. This suggests
that drain_array_locked() was called with node #0 while the array_cache->entry
it got contains blocks which belong to node #1. That I cannot explain.
Petr
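
The condition described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the check in mm/slab.c: struct toy_slab, struct toy_obj and toy_count_foreign() are hypothetical names, and where the kernel would hit the BUG the model merely counts mismatches.

```c
#include <stddef.h>

/* Hypothetical model: every slab remembers the node its memory came from. */
struct toy_slab {
	int nodeid;
};

/* Hypothetical model: every cached object points back at its slab. */
struct toy_obj {
	const struct toy_slab *slab;
};

/* Counts objects whose slab belongs to a node other than the one whose
 * list lock the caller holds. A non-zero result corresponds to the
 * situation reported in this thread. */
static size_t toy_count_foreign(const struct toy_obj *objs, size_t n,
				int locked_node)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (objs[i].slab->nodeid != locked_node)
			bad++;
	}
	return bad;
}
```

With a per-CPU array holding two node-0 objects and one node-1 object, draining it under the node-0 lock yields one foreign object, which is the kind of entry that would trip the assertion.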

2005-09-19 19:09:00

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Mon, 19 Sep 2005, Petr Vandrovec wrote:

> I've thought that this is problem, but as far as I can tell while this is
> problem it does not happen here. Just free_block() finds that pointer it
> got from caller belongs to the slab that belongs to the CPU#1/node#1
> while caller obtained lock on CPU#0/node#0 structures. Which suggests
> that drain_array_locked() was issued with node #0 while array_cache->entry
> it got contains blocks which belong to node #1. Which I cannot explain.

That can happen if node 0 runs out of memory and the page allocator falls
back to taking memory from node 1 for node 0 requests.

Maybe we have a problem here.
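
To make the fallback scenario concrete, here is a toy allocator sketch. It assumes a simple "try the preferred node, then the others in order" policy; toy_free_pages[] and toy_alloc_page() are hypothetical and do not reflect the real page allocator or its zonelists.

```c
#define TOY_NODES 2

/* Hypothetical per-node count of free pages. */
static int toy_free_pages[TOY_NODES];

/* Tries the preferred node first and then falls back to the others.
 * Returns the node that actually supplied the page, or -1 if every
 * node is exhausted. */
static int toy_alloc_page(int preferred)
{
	for (int i = 0; i < TOY_NODES; i++) {
		int node = (preferred + i) % TOY_NODES;

		if (toy_free_pages[node] > 0) {
			toy_free_pages[node]--;
			return node;
		}
	}
	return -1;
}
```

Once node 0 is empty, a request that prefers node 0 is satisfied from node 1, so a cache that assumes "requested on node 0" implies "memory lives on node 0" ends up holding a node-1 page.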

2005-09-19 19:29:29

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Christoph Lameter <[email protected]> wrote:
>
> On Mon, 19 Sep 2005, Andrew Morton wrote:
>
> > Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> > it takes cachep->nodelists[node]->list_lock and then calls
> > drain_alien_cache() which appears to take the same lock. But that's not
> > the problem here.
> >
> > The code in cache_reap() recalculates numa_node_id() multiple times, so if
> > the caller changes CPUs then this assertion will trigger. However it's
> > running under keventd here, which is pinned to a single CPU. Still, it
> > would be useful if you could try putting preempt_disable()s in
> > cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> > once, and cache that in a local variable.
>
> drain_array_cache_locked calls check_spinlock_acquired_node which is in
> turn insuring that interrupts are off. So no move to a different processor
> should be possible.

	list_for_each(walk, &cache_chain) {
		kmem_cache_t *searchp;
		struct list_head* p;
		int tofree;
		struct slab *slabp;

		searchp = list_entry(walk, kmem_cache_t, next);

		if (searchp->flags & SLAB_NO_REAP)
			goto next;

		check_irq_on();

		l3 = searchp->nodelists[numa_node_id()];
		if (l3->alien)
			drain_alien_cache(searchp, l3);
->preempt here
		spin_lock_irq(&l3->list_lock);

		drain_array_locked(searchp, ac_data(searchp), 0,
				numa_node_id());
->oops, wrong node.


Still, this should all be pinned to one CPU, by happenstance.

> However, that is contradicted by __wake_up calling
> drain_array_cache_locked. The process just woke up?

Not sure what you mean here.

> > I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> > asking for preempt non-atomicity bugs.
>
> Accessing arrays indexed by node number even works if the process
> continues to be executed on another node.

That's a special case and the callers should be changed to use a new
raw_numa_node_id() in that case.

Code which calls numa_node_id() and then continues to use the result of
that in preemptible code is often buggy. Code which reevaluates
numa_node_id() in preemptible code and assumes that it returned the same
thing is even buggier (unless it happens to be CPU pinned).

numa_node_id() is doing a bad thing and should be converted to use
smp_processor_id() so we can identify all the possibly-buggy callsites.
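
The hazard described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: fake_numa_node_id(), simulate_migration() and both reap_* helpers are hypothetical stand-ins, not kernel functions, and the "migration" is just a write to a global. It shows why evaluating the node twice can give two different answers, while a cached local stays consistent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the "current node" is a global that a simulated
 * migration can change. The real numa_node_id() is not used here. */
static int current_node = 0;

int fake_numa_node_id(void)
{
	return current_node;
}

/* Stand-in for the task being preempted and resumed on another node. */
void simulate_migration(int new_node)
{
	current_node = new_node;
}

/* Pattern from the quoted loop: the node is evaluated twice, and a
 * migration may happen in between. Returns true if both uses agree. */
bool reap_reevaluating(int migrate_to)
{
	int locked_node = fake_numa_node_id();	/* node whose lock is taken */
	simulate_migration(migrate_to);		/* "->preempt here" */
	int drained_node = fake_numa_node_id();	/* node passed to the drain */

	return locked_node == drained_node;
}

/* Evaluate once and reuse the local: both uses agree by construction. */
bool reap_cached(int migrate_to)
{
	int node = fake_numa_node_id();
	int locked_node = node;
	simulate_migration(migrate_to);
	int drained_node = node;

	return locked_node == drained_node;
}
```

Note that the cached variant is only self-consistent. If the task really did migrate, the cached node would be stale with respect to any per-CPU data touched afterwards, which is the point made above about preemptible callers.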


2005-09-20 05:06:14

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Mon, 19 Sep 2005, Andrew Morton wrote:

> list_for_each(walk, &cache_chain) {
> kmem_cache_t *searchp;
> struct list_head* p;
> int tofree;
> struct slab *slabp;
>
> searchp = list_entry(walk, kmem_cache_t, next);
>
> if (searchp->flags & SLAB_NO_REAP)
> goto next;
>
> check_irq_on();
>
> l3 = searchp->nodelists[numa_node_id()];
> if (l3->alien)
> drain_alien_cache(searchp, l3);
> ->preempt here
> spin_lock_irq(&l3->list_lock);
>
> drain_array_locked(searchp, ac_data(searchp), 0,
> numa_node_id());
> ->oops, wrong node.

This is called from keventd which exists per processor. Hmmm... This looks
as if it can change processors after all but the slab allocator depends on
it running on the right processor. So does the page allocator. sigh. What
is the point of having per processor workqueues if they do not stay on
the assigned processor?

The fast fix for this case is to get the node number once and then use it
consistently. But we really need to audit the slab and page allocator for
additional cases like this or disable preempt and check for the right
processor in cache_reap().

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c 2005-09-19 14:10:33.489800899 -0700
+++ linux-2.6/mm/slab.c 2005-09-19 14:10:44.555105862 -0700
@@ -3262,6 +3262,7 @@
{
struct list_head *walk;
struct kmem_list3 *l3;
+ int node = numa_node_id();

if (down_trylock(&cache_chain_sem)) {
/* Give up. Setup the next iteration. */
@@ -3282,13 +3283,13 @@

check_irq_on();

- l3 = searchp->nodelists[numa_node_id()];
+ l3 = searchp->nodelists[node];
if (l3->alien)
drain_alien_cache(searchp, l3);
spin_lock_irq(&l3->list_lock);

drain_array_locked(searchp, ac_data(searchp), 0,
- numa_node_id());
+ node);

if (time_after(l3->next_reap, jiffies))
goto next_unlock;
@@ -3297,7 +3298,7 @@

if (l3->shared)
drain_array_locked(searchp, l3->shared, 0,
- numa_node_id());
+ node);

if (l3->free_touched) {
l3->free_touched = 0;

2005-09-20 05:17:11

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Christoph Lameter <[email protected]> wrote:
>
> On Mon, 19 Sep 2005, Andrew Morton wrote:
>
> > list_for_each(walk, &cache_chain) {
> > kmem_cache_t *searchp;
> > struct list_head* p;
> > int tofree;
> > struct slab *slabp;
> >
> > searchp = list_entry(walk, kmem_cache_t, next);
> >
> > if (searchp->flags & SLAB_NO_REAP)
> > goto next;
> >
> > check_irq_on();
> >
> > l3 = searchp->nodelists[numa_node_id()];
> > if (l3->alien)
> > drain_alien_cache(searchp, l3);
> > ->preempt here
> > spin_lock_irq(&l3->list_lock);
> >
> > drain_array_locked(searchp, ac_data(searchp), 0,
> > numa_node_id());
> > ->oops, wrong node.
>
> This is called from keventd which exists per processor. Hmmm... This looks
> as if it can change processors after all

Well no, it would be a big bug if a keventd thread were to change CPUs.

It's OK to rely upon the pinnedness of keventd I guess - a comment would be
nice.

> but the slab allocator depends on
> it running on the right processor. So does the page allocator. sigh. What
> is the point of having per processor workqueues if they do not stay on
> the assigned processor?

They do. I don't believe that preemption is the source of this BUG.
(Petr, does CONFIG_PREEMPT=n fix it?)

> The fast fix for this case is to get the node number once and then use it
> consistently.

If one is writing preempt-safe code then one should disable preemption
before copying the current CPU number into a local variable.

> But we really need to audit the slab and page allocator for
> additional cases like this or disable preempt and check for the right
> processor in cache_reap().

numa_node_id() must use smp_processor_id(), not raw_smp_processor_id().
Then all the runtime squawks need to be audited and fixed, or switched to
(new) raw_numa_node_id() if it is verified that a CPU/node switch at any
time is OK.
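
The rule stated above (disable preemption before copying the current CPU number) and the "runtime squawks" idea can be sketched as a small userspace model. Everything here is an assumption made for illustration: the model_* names, the counter-based preemption state and the squawk counter are hypothetical, and only mimic the idea that a checked CPU-id accessor complains when called from preemptible context.

```c
#include <assert.h>

/* Hypothetical model state, not kernel data structures. */
static int model_preempt_count;	/* > 0 means "preemption disabled" */
static int model_squawks;	/* number of debug complaints raised */
static int model_current_cpu;	/* CPU the modelled task runs on */

void model_preempt_disable(void)
{
	model_preempt_count++;
}

void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Checked accessor: complains if the caller could still be migrated. */
int model_smp_processor_id(void)
{
	if (model_preempt_count == 0)
		model_squawks++;
	return model_current_cpu;
}

/* Reads the CPU number while preemptible: triggers a squawk. */
int read_cpu_unsafely(void)
{
	return model_smp_processor_id();
}

/* Disables preemption first, so the value stays valid while it is used. */
int read_cpu_safely(void)
{
	int cpu;

	model_preempt_disable();
	cpu = model_smp_processor_id();
	/* ... work that depends on cpu would go here ... */
	model_preempt_enable();

	return cpu;
}
```

In this model, a raw (unchecked) accessor would simply skip the counter test, which is why callers that tolerate a CPU/node switch would need to opt in explicitly.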


2005-09-20 08:32:01

by Alok Kataria

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Hi,

Attached is a patch which stores the numa_node_id() result in a local variable
after disabling interrupts, in the cache_reap code path.
I was not able to reproduce the bug that Petr reported, but if
the cache reap threads do get scheduled across CPUs, which might be the
problem here, then this should fix it.

Andrew, I also have a patch which fixes the CPU_DOWN code path, which I
will send you later.

Thanks & Regards,
Alok.
On Tue, 2005-09-20 at 10:46, Andrew Morton wrote:
> Christoph Lameter <[email protected]> wrote:
> >
> > On Mon, 19 Sep 2005, Andrew Morton wrote:
> >
> > > list_for_each(walk, &cache_chain) {
> > > kmem_cache_t *searchp;
> > > struct list_head* p;
> > > int tofree;
> > > struct slab *slabp;
> > >
> > > searchp = list_entry(walk, kmem_cache_t, next);
> > >
> > > if (searchp->flags & SLAB_NO_REAP)
> > > goto next;
> > >
> > > check_irq_on();
> > >
> > > l3 = searchp->nodelists[numa_node_id()];
> > > if (l3->alien)
> > > drain_alien_cache(searchp, l3);
> > > ->preempt here
> > > spin_lock_irq(&l3->list_lock);
> > >
> > > drain_array_locked(searchp, ac_data(searchp), 0,
> > > numa_node_id());
> > > ->oops, wrong node.
> >
> > This is called from keventd which exists per processor. Hmmm... This looks
> > as if it can change processors after all
>
> Well no, it would be a big bug if a keventd thread were to change CPUs.
>
> It's OK to rely upon the pinnedness of keventd I guess - a comment would be
> nice.
>
> > but the slab allocator depends on
> > it running on the right processor. So does the page allocator. sigh. What
> > is the point of having per processor workqueues if they do not stay on
> > the assigned processor?
>
> They do. I don't believe that preemption is the source of this BUG.
> (Petr, does CONFIG_PREEMPT=n fix it?)
>
> > The fast fix for this case is to get the node number once and then use it
> > consistently.
>
> If one is writing preempt-safe code then one should disable preemption
> before copying the current CPU number into a local variable.
>
> > But we really need to audit the slab and page allocator for
> > additional cases like this or disable preempt and check for the right
> > processor in cache_reap().
>
> numa_node_id() must use smp_processor_id(), not raw_smp_processor_id().
> Then all the runtime squawks need to be audited and fixed, or switched to
> (new) raw_numa_node_id() if it is verified that a CPU/node switch at any
> time is OK.
>
--
There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.


Attachments:
cache_reap_nodeid.patch (1.41 kB)

2005-09-20 13:58:19

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Andrew Morton wrote:
> Christoph Lameter <[email protected]> wrote:
>
>>On Mon, 19 Sep 2005, Andrew Morton wrote:
>>
>>
>>> list_for_each(walk, &cache_chain) {
>>> kmem_cache_t *searchp;
>>> struct list_head* p;
>>> int tofree;
>>> struct slab *slabp;
>>>
>>> searchp = list_entry(walk, kmem_cache_t, next);
>>>
>>> if (searchp->flags & SLAB_NO_REAP)
>>> goto next;
>>>
>>> check_irq_on();
>>>
>>> l3 = searchp->nodelists[numa_node_id()];
>>> if (l3->alien)
>>> drain_alien_cache(searchp, l3);
>>>->preempt here
>>> spin_lock_irq(&l3->list_lock);
>>>
>>> drain_array_locked(searchp, ac_data(searchp), 0,
>>> numa_node_id());
>>>->oops, wrong node.
>>
>>This is called from keventd which exists per processor. Hmmm... This looks
>>as if it can change processors after all
>
>
> Well no, it would be a big bug if a keventd thread were to change CPUs.
>
> It's OK to rely upon the pinnedness of keventd I guess - a comment would be
> nice.
>
>
>>but the slab allocator depends on
>>it running on the right processor. So does the page allocator. sigh. What
>>is the point of having per processor workqueues if they do not stay on
>>the assigned processor?
>
>
> They do. I don't believe that preemption is the source of this BUG.
> (Petr, does CONFIG_PREEMPT=n fix it?)

No, it does not. I've even added printks here and there to show node number,
and everything works as it should. Maybe there are some problems with
numa_node_id() and migrating between processors when memory gets released,
I do not know.

The only thing I know is that if I add the WARN_ON below to free_block(), it
triggers...

@free_block
slabp = GET_PAGE_SLAB(virt_to_page(objp));
nodeid = slabp->nodeid;
+ WARN_ON(nodeid != numa_node_id()); <<<<<
l3 = cachep->nodelists[nodeid];
list_del(&slabp->list);
objnr = (objp - slabp->s_mem) / cachep->objsize;
check_spinlock_acquired_node(cachep, nodeid);
check_slabp(cachep, slabp);

... saying that keventd/0 tries to operate on
slab belonging to node#1, while having acquired lock for cachep belonging
to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
(check_spinlock_acquired_node(cachep, 0) would succeed).
Petr
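
The failing check described above can be reduced to a small userspace model. This is a sketch only: struct model_cache, struct model_slab and the model_* functions are hypothetical simplifications, with a boolean standing in for each node's list_lock. It shows why a slab tagged with node 1 fails the check when only node 0's lock is held.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 2

/* Hypothetical per-node list state: only "is the lock held" is modelled. */
struct model_list3 {
	bool locked;
};

struct model_cache {
	struct model_list3 nodelists[MODEL_MAX_NODES];
};

/* Hypothetical slab descriptor carrying the node it was allocated from. */
struct model_slab {
	int nodeid;
};

/* Models the "is this node's list lock held" assertion. */
bool model_lock_held(const struct model_cache *cachep, int node)
{
	return cachep->nodelists[node].locked;
}

/* Models the check in the snippet above: the node is taken from the slab
 * itself, so the caller must hold the lock for the slab's own node. */
bool model_free_block_check(const struct model_cache *cachep,
			    const struct model_slab *slabp)
{
	return model_lock_held(cachep, slabp->nodeid);
}
```

With node 0's lock held, a slab whose nodeid is 0 passes and a slab whose nodeid is 1 fails, which matches the report: the caller locked node 0 but the object belongs to a slab recorded as node 1.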

2005-09-21 01:04:22

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Tue, 20 Sep 2005, Petr Vandrovec wrote:

> slab belonging to node#1, while having acquired lock for cachep belonging
> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
> (check_spinlock_acquired_node(cachep, 0) would succeed).

Hmmm. If a node runs out of memory then pages from another node may end up
on the slab list of a node. But it seems that free_block cannot handle
that properly.

How are you producing the problem?

Could you try the following patch:

---

The NUMA slab allocator may allocate pages from foreign nodes onto the lists
for a particular node if that node runs out of memory. Inspecting the slab->nodeid
field will not reflect that the page is now in use for the slabs of another node.

This patch fixes that issue by adding a node parameter to free_block so that the caller
can indicate which node currently uses a slab.

It also removes the check for the current node from kmem_cache_alloc_node, since the
process may later shift to another node, which could lead to an allocation on a
different node than intended.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.14-rc1/mm/slab.c
===================================================================
--- linux-2.6.14-rc1.orig/mm/slab.c 2005-09-21 00:09:05.000000000 +0000
+++ linux-2.6.14-rc1/mm/slab.c 2005-09-21 00:48:12.000000000 +0000
@@ -639,7 +639,7 @@ static enum {

static DEFINE_PER_CPU(struct work_struct, reap_work);

-static void free_block(kmem_cache_t* cachep, void** objpp, int len);
+static void free_block(kmem_cache_t* cachep, void** objpp, int len, int node);
static void enable_cpucache (kmem_cache_t *cachep);
static void cache_reap (void *unused);
static int __node_shrink(kmem_cache_t *cachep, int node);
@@ -804,7 +804,7 @@ static inline void __drain_alien_cache(k

if (ac->avail) {
spin_lock(&rl3->list_lock);
- free_block(cachep, ac->entry, ac->avail);
+ free_block(cachep, ac->entry, ac->avail, node);
ac->avail = 0;
spin_unlock(&rl3->list_lock);
}
@@ -925,7 +925,7 @@ static int __devinit cpuup_callback(stru
/* Free limit for this kmem_list3 */
l3->free_limit -= cachep->batchcount;
if (nc)
- free_block(cachep, nc->entry, nc->avail);
+ free_block(cachep, nc->entry, nc->avail, node);

if (!cpus_empty(mask)) {
spin_unlock(&l3->list_lock);
@@ -934,7 +934,7 @@ static int __devinit cpuup_callback(stru

if (l3->shared) {
free_block(cachep, l3->shared->entry,
- l3->shared->avail);
+ l3->shared->avail, node);
kfree(l3->shared);
l3->shared = NULL;
}
@@ -1882,12 +1882,13 @@ static void do_drain(void *arg)
{
kmem_cache_t *cachep = (kmem_cache_t*)arg;
struct array_cache *ac;
+ int node = numa_node_id();

check_irq_off();
ac = ac_data(cachep);
- spin_lock(&cachep->nodelists[numa_node_id()]->list_lock);
- free_block(cachep, ac->entry, ac->avail);
- spin_unlock(&cachep->nodelists[numa_node_id()]->list_lock);
+ spin_lock(&cachep->nodelists[node]->list_lock);
+ free_block(cachep, ac->entry, ac->avail, node);
+ spin_unlock(&cachep->nodelists[node]->list_lock);
ac->avail = 0;
}

@@ -2608,7 +2609,7 @@ done:
/*
* Caller needs to acquire correct kmem_list's list_lock
*/
-static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects)
+static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects, int node)
{
int i;
struct kmem_list3 *l3;
@@ -2617,14 +2618,12 @@ static void free_block(kmem_cache_t *cac
void *objp = objpp[i];
struct slab *slabp;
unsigned int objnr;
- int nodeid = 0;

slabp = GET_PAGE_SLAB(virt_to_page(objp));
- nodeid = slabp->nodeid;
- l3 = cachep->nodelists[nodeid];
+ l3 = cachep->nodelists[node];
list_del(&slabp->list);
objnr = (objp - slabp->s_mem) / cachep->objsize;
- check_spinlock_acquired_node(cachep, nodeid);
+ check_spinlock_acquired_node(cachep, node);
check_slabp(cachep, slabp);


@@ -2664,13 +2663,14 @@ static void cache_flusharray(kmem_cache_
{
int batchcount;
struct kmem_list3 *l3;
+ int node = numa_node_id();

batchcount = ac->batchcount;
#if DEBUG
BUG_ON(!batchcount || batchcount > ac->avail);
#endif
check_irq_off();
- l3 = cachep->nodelists[numa_node_id()];
+ l3 = cachep->nodelists[node];
spin_lock(&l3->list_lock);
if (l3->shared) {
struct array_cache *shared_array = l3->shared;
@@ -2686,7 +2686,7 @@ static void cache_flusharray(kmem_cache_
}
}

- free_block(cachep, ac->entry, batchcount);
+ free_block(cachep, ac->entry, batchcount, node);
free_done:
#if STATS
{
@@ -2751,7 +2751,7 @@ static inline void __cache_free(kmem_cac
} else {
spin_lock(&(cachep->nodelists[nodeid])->
list_lock);
- free_block(cachep, &objp, 1);
+ free_block(cachep, &objp, 1, nodeid);
spin_unlock(&(cachep->nodelists[nodeid])->
list_lock);
}
@@ -2844,7 +2844,7 @@ void *kmem_cache_alloc_node(kmem_cache_t
unsigned long save_flags;
void *ptr;

- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
return __cache_alloc(cachep, flags);

if (unlikely(!cachep->nodelists[nodeid])) {
@@ -3079,7 +3079,7 @@ static int alloc_kmemlist(kmem_cache_t *

if ((nc = cachep->nodelists[node]->shared))
free_block(cachep, nc->entry,
- nc->avail);
+ nc->avail, node);

l3->shared = new;
if (!cachep->nodelists[node]->alien) {
@@ -3160,7 +3160,7 @@ static int do_tune_cpucache(kmem_cache_t
if (!ccold)
continue;
spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
- free_block(cachep, ccold->entry, ccold->avail);
+ free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
kfree(ccold);
}
@@ -3240,7 +3240,7 @@ static void drain_array_locked(kmem_cach
if (tofree > ac->avail) {
tofree = (ac->avail+1)/2;
}
- free_block(cachep, ac->entry, tofree);
+ free_block(cachep, ac->entry, tofree, node);
ac->avail -= tofree;
memmove(ac->entry, &(ac->entry[tofree]),
sizeof(void*)*ac->avail);
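
The interface change in the patch above (the caller tells free_block which node's lists it has locked, instead of free_block reading the node from the slab) can be sketched in userspace C. All names here are hypothetical models, not the kernel's types, and a boolean stands in for each list_lock. Whether this is the right fix for the reported BUG is not settled at this point in the thread; the sketch only illustrates the changed contract.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 2

/* Hypothetical per-node state: lock flag plus a count of freed objects. */
struct sketch_list3 {
	bool locked;
	int free_objects;
};

struct sketch_cache {
	struct sketch_list3 nodelists[SKETCH_MAX_NODES];
};

/* Hypothetical slab descriptor; nodeid records where the page came from. */
struct sketch_slab {
	int nodeid;
};

/* Caller-supplied node: the slab's own nodeid is deliberately ignored.
 * Returns false (and does nothing) if that node's lock is not held. */
bool sketch_free_block(struct sketch_cache *cachep,
		       const struct sketch_slab *slabp, int node)
{
	struct sketch_list3 *l3 = &cachep->nodelists[node];

	(void)slabp;		/* slabp->nodeid no longer selects the list */
	if (!l3->locked)
		return false;
	l3->free_objects++;	/* object is accounted to the caller's node */
	return true;
}
```

The design trade-off is visible in the model: a slab whose pages came from node 1 is accounted to node 0's lists when the caller passes node 0, so correctness now depends on every caller passing the node whose lock it actually holds.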

2005-09-21 01:22:15

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Christoph Lameter wrote:
> On Tue, 20 Sep 2005, Petr Vandrovec wrote:
>
>
>>slab belonging to node#1, while having acquired lock for cachep belonging
>>to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
>>(check_spinlock_acquired_node(cachep, 0) would succeed).
>
>
> Hmmm. If a node runs out of memory then pages from another node may end up
> on the slab list of a node. But it seems that free_block cannot handle
> that properly.
>
> How are you producing the problem?

Simple... I just boot any kernel after 2.6.13, and it dies in front of me.
Currently I'm using the config below, which I boot with 'rootdelay=60' so the panic
in keventd happens before the panic due to no root filesystem. No ACPI.
Nothing. 100% reproducible. Maybe I should enable embedded options and
remove all other device drivers still present in the kernel.

Below the config is the dmesg from 2.6.13, which has no problems coming up. Maybe
you'll find some clue there, but I see none. Node #0 has 1GB of memory, so
it should have no need to borrow blocks from node #1 when this kernel is able
to boot in 16MB of memory...
Petr


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.14-rc1-6c07
# Wed Sep 21 03:03:20 2005
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_SHOW_LOGO is not set

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_SYSCTL is not set
# CONFIG_HOTPLUG is not set
# CONFIG_IKCONFIG is not set
# CONFIG_CPUSETS is not set
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
# CONFIG_MODULES is not set

#
# Processor type and features
#
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
# CONFIG_MTRR is not set
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
CONFIG_K8_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_NUMA=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_NR_CPUS=8
CONFIG_HPET_TIMER=y
CONFIG_DUMMY_IOMMU=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_PHYSICAL_START=0x100000
# CONFIG_SECCOMP is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Power management options
#
# CONFIG_PM is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
# CONFIG_ACPI is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI etc.)
#
# CONFIG_PCI is not set

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#

#
# Executable file formats / Emulations
#
# CONFIG_BINFMT_ELF is not set
# CONFIG_BINFMT_MISC is not set
# CONFIG_IA32_EMULATION is not set

#
# Networking
#
# CONFIG_NET is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
# CONFIG_FW_LOADER is not set
# CONFIG_DEBUG_DRIVER is not set

#
# Connector - unified userspace <-> kernelspace linker
#

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#

#
# Block devices
#
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
# CONFIG_BLK_DEV_LOOP is not set
# CONFIG_BLK_DEV_RAM is not set
CONFIG_BLK_DEV_RAM_COUNT=16
# CONFIG_LBD is not set
# CONFIG_CDROM_PKTCDVD is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set

#
# ATA/ATAPI/MFM/RLL support
#
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# I2O device support
#

#
# Network device support
#
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set

#
# ISDN subsystem
#

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
# CONFIG_CONSOLE_KNOWS_9B is not set
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_GEN_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_AGP is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#

#
# I2C support
#
# CONFIG_I2C is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#

#
# Multimedia Capabilities Port drivers
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#

#
# Graphics support
#
# CONFIG_FB is not set
# CONFIG_VIDEO_SELECT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
# CONFIG_USB_ARCH_HAS_HCD is not set
# CONFIG_USB_ARCH_HAS_OHCI is not set

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#

#
# SN Devices
#

#
# Firmware Drivers
#
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_JBD is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_INOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_HFSPLUS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
# CONFIG_NLS is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
# CONFIG_FRAME_POINTER is not set
CONFIG_INIT_DEBUG=y
# CONFIG_KPROBES is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
# CONFIG_CRYPTO is not set

#
# Hardware crypto devices
#

#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set


Bootdata ok (command line is BOOT_IMAGE=2.6.13-64 ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose)
Linux version 2.6.13 (root@vana) (gcc version 3.3.3 (Debian 20040401)) #2 SMP Tue Aug 30 02:41:20 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f6e10
ACPI: XSDT (v001 A M I OEMXSDT 0x06000514 MSFT 0x00000097) @ 0x000000007fff0100
ACPI: FADT (v001 A M I OEMFACP 0x06000514 MSFT 0x00000097) @ 0x000000007fff0281
ACPI: MADT (v001 A M I OEMAPIC 0x06000514 MSFT 0x00000097) @ 0x000000007fff0380
ACPI: OEMB (v001 A M I OEMBIOS 0x06000514 MSFT 0x00000097) @ 0x000000007ffff040
ACPI: SRAT (v001 A M I OEMSRAT 0x06000514 MSFT 0x00000097) @ 0x000000007fff4260
ACPI: HPET (v001 A M I OEMHPET 0x06000514 MSFT 0x00000097) @ 0x000000007fff4370
ACPI: ASF! (v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x000000007fff43b0
ACPI: DSDT (v001 0AAAA 0AAAA001 0x00000001 INTL 0x02002026) @ 0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> CPU 1 -> Node 1
SRAT: Node 0 PXM 0 100000-3fffffff
SRAT: Node 1 PXM 1 40000000-7fffffff
SRAT: Node 0 PXM 0 0-3fffffff
Using 24 for the hash shift. Max adder is 7fffffff
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007ffeffff
On node 0 totalpages: 262046
DMA zone: 3999 pages, LIFO batch:1
Normal zone: 258047 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 262127
DMA zone: 0 pages, LIFO batch:1
Normal zone: 262127 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
ACPI: PM-Timer IO Port: 0x5008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xff4ff000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xff4ff000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xff4fe000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xff4fe000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 80000000:7f780000)
Checking aperture...
CPU 0: aperture @ c0000000 size 512 MB
CPU 1: aperture @ c0000000 size 512 MB
Built 2 zonelists
Kernel command line: BOOT_IMAGE=2.6.13-64 ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose
Parameter psmouse_noext is obsolete, ignored
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 1991.621 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2055660k/2097088k available (2668k kernel code, 0k reserved, 2318k data, 232k init)
Calibrating delay using timer specific routine.. 3988.04 BogoMIPS (lpj=19940224)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.447 MHz APIC timer.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3983.13 BogoMIPS (lpj=19915667)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1 -> Core 0
AMD Opteron(tm) Processor 246 stepping 0a
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -136 cycles, maxerr 901 cycles)
Brought up 2 CPUs
time.c: Using HPET based timekeeping.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050408
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] segment is 0
ACPI: Assume root bridge [\_SB_.PCIB] segment is 0
PCI: Scanning bus 0000:00
PCI: Found 0000:00:06.0 [1022/7460] 000604 01
PCI: Found 0000:00:07.0 [1022/7468] 000601 00
PCI: Found 0000:00:07.1 [1022/7469] 000101 00
PCI: Found 0000:00:07.2 [1022/746a] 000c05 00
PCI: Found 0000:00:07.3 [1022/746b] 000680 00
PCI: Found 0000:00:07.5 [1022/746d] 000401 00
PCI: Found 0000:00:0a.0 [1022/7450] 000604 01
PCI: Found 0000:00:0a.1 [1022/7451] 000800 00
PCI: Found 0000:00:0b.0 [1022/7450] 000604 01
PCI: Found 0000:00:0b.1 [1022/7451] 000800 00
PCI: Found 0000:00:18.0 [1022/1100] 000600 00
PCI: Found 0000:00:18.1 [1022/1101] 000600 00
PCI: Found 0000:00:18.2 [1022/1102] 000600 00
PCI: Found 0000:00:18.3 [1022/1103] 000600 00
PCI: Found 0000:00:19.0 [1022/1100] 000600 00
PCI: Found 0000:00:19.1 [1022/1101] 000600 00
PCI: Found 0000:00:19.2 [1022/1102] 000600 00
PCI: Found 0000:00:19.3 [1022/1103] 000600 00
PCI: Fixups for bus 0000:00
PCI: Scanning behind PCI bridge 0000:00:06.0, config 010100, pass 0
PCI: Scanning bus 0000:01
PCI: Found 0000:01:00.0 [1022/7464] 000c03 00
PCI: Found 0000:01:00.1 [1022/7464] 000c03 00
PCI: Found 0000:01:0b.0 [1095/3114] 000180 00
PCI: Found 0000:01:0c.0 [104c/8023] 000c00 00
PCI: Fixups for bus 0000:01
PCI: Bus scan for 0000:01 returning with max=01
PCI: Scanning behind PCI bridge 0000:00:0a.0, config 020200, pass 0
PCI: Scanning bus 0000:02
PCI: Found 0000:02:07.0 [1131/7146] 000480 00
PCI: Found 0000:02:09.0 [14e4/16a7] 000200 00
PCI: Fixups for bus 0000:02
PCI: Bus scan for 0000:02 returning with max=02
PCI: Scanning behind PCI bridge 0000:00:0b.0, config 030300, pass 0
PCI: Scanning bus 0000:03
PCI: Fixups for bus 0000:03
PCI: Bus scan for 0000:03 returning with max=03
PCI: Scanning behind PCI bridge 0000:00:06.0, config 010100, pass 1
PCI: Scanning behind PCI bridge 0000:00:0a.0, config 020200, pass 1
PCI: Scanning behind PCI bridge 0000:00:0b.0, config 030300, pass 1
PCI: Bus scan for 0000:00 returning with max=03
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLB._PRT]
ACPI: PCI Root Bridge [PCIB] (0000:04)
PCI: Probing PCI hardware (bus 04)
ACPI: Assume root bridge [\_SB_.PCI0] segment is 0
ACPI: Assume root bridge [\_SB_.PCIB] segment is 0
PCI: Scanning bus 0000:04
PCI: Found 0000:04:00.0 [1022/7454] 000600 00
PCI: Found 0000:04:01.0 [1022/7455] 000604 01
PCI: Fixups for bus 0000:04
PCI: Scanning behind PCI bridge 0000:04:01.0, config 050504, pass 0
PCI: Scanning bus 0000:05
PCI: Found 0000:05:00.0 [1002/5964] 000300 00
Boot video device is 0000:05:00.0
PCI: Found 0000:05:00.1 [1002/5d44] 000380 00
PCI: Fixups for bus 0000:05
PCI: Bus scan for 0000:05 returning with max=05
PCI: Scanning behind PCI bridge 0000:04:01.0, config 050504, pass 1
PCI: Bus scan for 0000:04 returning with max=05
ACPI: PCI Interrupt Routing Table [\_SB_.PCIB.PBP2._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
got res [80000000:8007ffff] bus [80000000:8007ffff] flags 7200 for BAR 6 of 0000:01:0b.0
PCI: Bridge: 0000:00:06.0
IO window: 9000-afff
MEM window: ff100000-ff2fffff
PREFETCH window: 80000000-800fffff
got res [9e900000:9e90ffff] bus [9e900000:9e90ffff] flags 7200 for BAR 6 of 0000:02:09.0
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: ff300000-ff3fffff
PREFETCH window: 9e900000-9e9fffff
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
got res [9eb00000:9eb1ffff] bus [9eb00000:9eb1ffff] flags 7202 for BAR 6 of 0000:05:00.0
PCI: Bridge: 0000:04:01.0
IO window: c000-cfff
MEM window: ff500000-ff5fffff
PREFETCH window: 9eb00000-beafffff
TC classifier action (bugs to [email protected] cc [email protected])
hpet0: at MMIO 0xfec01000, IRQs 2, 8, 0
hpet0: 69ns tick, 3 32-bit timers
agpgart: Detected AMD 8151 AGP Bridge rev B3
agpgart: AGP aperture is 512M @ 0xc0000000
PCI-DMA: Disabling IOMMU.
pnp: 00:09: ioport range 0x680-0x6ff has been reserved
pnp: 00:09: ioport range 0x295-0x296 has been reserved
pnp: 00:09: ioport range 0xb78-0xb7f has been reserved
pnp: 00:09: ioport range 0xf78-0xf7f has been reserved
IA-32 Microcode Update Driver: v1.14 <[email protected]>
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1127264758.390:1): initialized
Total HugeTLB memory allocated, 0
SELinux: Registering netfilter hooks
Initializing Cryptographic API
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz
radeonfb: PLL min 20000 max 40000
radeonfb: Monitor 1 type DFP found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
Console: switching to colour frame buffer device 240x75
radeonfb (0000:05:00.0): ATI Radeon Yd
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
Using specific hotkey driver
ACPI: CPU0 (power states: C1[C1])
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: CPU1 (power states: C1[C1])
Real Time Clock Driver v1.12
hpet_acpi_add: no address or irqs in _CRS
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 controller doesn't have AUX irq; using default 0xc
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 112
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe ([email protected]) and [email protected]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
ata1: SATA max UDMA/100 cmd 0xFFFFC20000004C80 ctl 0xFFFFC20000004C8A bmdma 0xFFFFC20000004C00 irq 177
ata2: SATA max UDMA/100 cmd 0xFFFFC20000004CC0 ctl 0xFFFFC20000004CCA bmdma 0xFFFFC20000004C08 irq 177
ata3: SATA max UDMA/100 cmd 0xFFFFC20000004E80 ctl 0xFFFFC20000004E8A bmdma 0xFFFFC20000004E00 irq 177
ata4: SATA max UDMA/100 cmd 0xFFFFC20000004EC0 ctl 0xFFFFC20000004ECA bmdma 0xFFFFC20000004E08 irq 177
ata1: dev 0 cfg 49:2f00 82:74eb 83:7feb 84:4123 85:74e8 86:3c03 87:4123 88:207f
ata1: dev 0 ATA, max UDMA/133, 781422768 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: no device found (phy stat 00000000)
scsi1 : sata_sil
ata3: no device found (phy stat 00000000)
scsi2 : sata_sil
ata4: no device found (phy stat 00000000)
scsi3 : sata_sil
Vendor: ATA Model: HDS724040KLSA80 Rev: KFAO
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
Fusion MPT misc device (ioctl) driver 3.03.02
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
aoe: aoe_init: AoE v2.6-10 initialised.
mice: PS/2 mouse device common for all mice
input: PC Speaker
i2c /dev entries driver
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 131072 (order: 9, 3145728 bytes)
TCP bind hash table entries: 65536 (order: 8, 1572864 bytes)
input: AT Translated Set 2 keyboard on isa0060/serio0
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
NET: Registered protocol family 17
BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
Found 512b device! Using larger block size...
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 232k freed
kjournald starting. Commit interval 5 seconds
Found 512b device! Using larger block size...
Found 512b device! Using larger block size...
Adding 1959916k swap on /dev/sda2. Priority:-1 extents:1
EXT3 FS on sda1, internal journal
i2c_adapter i2c-6: Detecting device at 6,0x2e with COMPANY: 0x41 and VERSTEP: 0x62
i2c_adapter i2c-6: Autodetecting device at 6,0x2e ...
lm85 6-002e: Initializing device
lm85 6-002e: LM85_REG_CONFIG is: 0x05
lm85 6-002e: Setting CONFIG to: 0x05
powernow-k8: Found 2 AMD Athlon 64 / Opteron processors (version 1.50.3)
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0xe (1200 mV)
cpu_init done, current fid 0xc, vid 0x2
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0xe (1200 mV)
cpu_init done, current fid 0xc, vid 0x2
tg3.c:v3.37 (August 25, 2005)
ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 185
eth0: Tigon3 [partno(BCM95703A30) rev 1002 PHY(5703)] (PCI:33MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2c:90:0a
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[763f0000]
Found 512b device! Using larger block size...
Found 512b device! Using larger block size...
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Found 512b device! Using larger block size...
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Intel 810 + AC97 Audio, version 1.01, 19:40:55 Aug 29 2005
ACPI: PCI Interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 177
i810: AMD-8111 IOHub found at IO 0xbc00 and 0xb800, MEM 0x0000 and 0x0000, IRQ 177
i810_audio: Audio Controller supports 6 channels.
i810_audio: Defaulting to base 2 channel mode.
i810_audio: Resetting connection 0
ac97_codec: AC97 Audio codec, id: ADS116 (Analog Devices AD1981B)
i810_audio: AC'97 codec 0 supports AMAP, total channels = 2
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 193
ohci_hcd 0000:01:00.0: Advanced Micro Devices [AMD] AMD-8111 USB
ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:01:00.0: irq 193, io mem 0xff2fd000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ACPI: PCI Interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 193
ohci_hcd 0000:01:00.1: Advanced Micro Devices [AMD] AMD-8111 USB (#2)
ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:01:00.1: irq 193, io mem 0xff2fe000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ieee1394: Initialized config rom entry `ip1394'
ohci1394: $Rev: 1299 $ Ben Collins <[email protected]>
ACPI: PCI Interrupt 0000:01:0c.0[A] -> GSI 19 (level, low) -> IRQ 193
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[193] MMIO=[ff2ff000-ff2ff7ff] Max Packet=[2048]
ieee1394: Host added: ID:BUS[0-00:1023] GUID[00e0810000303bdf]
lm85 6-002e: Reading sensor values
eth1394: $Rev: 1264 $ Ben Collins <[email protected]>
eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
lm85 6-002e: Reading config values
NET: Registered protocol family 15
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 7460 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7455 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
hw_random: AMD768 system management I/O registers at 0x5000.
hw_random hardware driver 1.0.0 loaded
saa7146: register extension 'budget_ci dvb'.
ACPI: PCI Interrupt 0000:02:07.0[A] -> GSI 26 (level, low) -> IRQ 201
saa7146: found saa7146 @ mem ffffc20000198c00 (revision 1, irq 201) (0x13c2,0x1011).
DVB: registering new adapter (TT-Budget/WinTV-NOVA-T PCI).
adapter has MAC addr = 00:d0:5c:03:23:34
DVB: registering frontend 0 (Philips TDA10045H DVB-T)...
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
hda: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NET: Registered protocol family 10
Disabled Privacy Extensions on device ffffffff8052efc0(lo)
IPv6 over IPv4 tunneling driver
WDT driver for the Winbond(TM) W83627HF Super I/O chip initialising.
w83627hf WDT: initialized. timeout=60 sec (nowayout=0)
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Installing knfsd (copyright (C) 1996 [email protected]).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
crc32c: Unknown symbol crc32c_le
Initializing IPsec netlink socket
ipcomp6: Unknown symbol xfrm6_tunnel_free_spi
ipcomp6: Unknown symbol xfrm6_tunnel_alloc_spi
ipcomp6: Unknown symbol xfrm6_tunnel_spi_lookup
NET: Registered protocol family 4
ioctl32(ipx_configure:6768): Unknown cmd fd(3) cmd(000089e1){00} arg(ffffd85b) on socket:[13557]
Process accounting paused
lm85 6-002e: Reading sensor values

2005-09-21 15:59:52

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Wed, 21 Sep 2005, Petr Vandrovec wrote:

> Simple... I just boot any kernel after 2.6.13, and it dies in front of me.
> Currently I'm using config below, which I boot with 'rootdelay=60' so panic
> in keventd happens before panic due to no root filesystem. No ACPI.
> Nothing. 100% reproducible. Maybe I should enable embedded options and
> remove all other device drivers still present in the kernel.
>
> Below the config is the dmesg from 2.6.13, which has no problems coming up.
> Maybe you'll find some clue there, but I see none. Node #0 has 1GB of memory,
> so it should have no need to borrow blocks from node #1 when this kernel is
> able to boot in 16MB of memory...

Hmm. This likely has something to do with debugging code. I was unable to
reproduce this on amd64 with your config. I get another failure with
2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
The system works fine if no debugging is configured:

kernel BUG at kernel/workqueue.c:541!
swapper[1]: bugcheck! 0 [1]
Modules linked in:

Pid: 1, CPU 0, comm: swapper
psr : 00001010085a6010 ifs : 8000000000000105 ip : [<a0000001000e5b10>]
Not tainted
ip is at init_workqueues+0x90/0xa0
unat: 0000000000000000 pfs : 0000000000000105 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 000000000001003e pr : 000000000000a541
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000e5b10 b6 : e000003002385ad0 b7 : e000000001fffc00
f6 : 1003e000000003f34da65 f7 : 1003e20c49ba5e353f7cf
f8 : 1003e00000000000000c8 f9 : 10006c7fffffffd73ea5c
f10 : 0fffdfffffffffaf00000 f11 : 1003e0000000000000000
r1 : a000000100cd7670 r2 : 0000000000000001 r3 : e00000b0057c0e00
r8 : 0000000000000029 r9 : 0000000000004000 r10 : 0000000000000001
r11 : 0000000000000002 r12 : e00000b0057c7de0 r13 : e00000b0057c0000
r14 : a0000001009690b0 r15 : e00000b0057c0df4 r16 : e00000b0057c0e00
r17 : 0000000000000001 r18 : 0000000000000002 r19 : a0000001009690b0
r20 : a000000100ad78f8 r21 : ffffffffffffffff r22 : e000000001fffc00
r23 : a000000100b11748 r24 : 0000000000000000 r25 : 0000000000000004
r26 : e00000b0057c0df0 r27 : e00000b0057c7cc8 r28 : e00000b0057c7cd0
r29 : 0000000000000c46 r30 : 0000000000000c46 r31 : 0000000000000308
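
Since the failure only shows up with debugging configured, it can help to list exactly which debugging options a given .config enables before comparing two machines. The following is a minimal sketch, not a tool used in this thread: the function name and the sample text are hypothetical, and the two enabled options in the sample are illustrative only, not taken from either config posted here.

```python
import re

def enabled_debug_options(config_text):
    """Return names of options set to y or m whose name contains DEBUG."""
    enabled = []
    for line in config_text.splitlines():
        # Enabled options look like CONFIG_FOO=y; disabled ones are
        # comments of the form "# CONFIG_FOO is not set" and are skipped.
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match and "DEBUG" in match.group(1):
            enabled.append(match.group(1))
    return enabled

# Hypothetical sample, for illustration only.
sample = """\
CONFIG_SMP=y
# CONFIG_PM_DEBUG is not set
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SPINLOCK=y
# CONFIG_ACPI_DEBUG is not set
"""

print(enabled_debug_options(sample))
```

Running this over two .config files and diffing the resulting lists narrows down which debugging feature differs between a failing and a working build.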

2005-09-22 19:53:14

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Wed, 21 Sep 2005, Christoph Lameter wrote:

> Hmm. This likely has something to do with debugging code. I was unable to
> reproduce this on amd64 with your config. I get another failure with
> 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> The system works fine if no debugging is configured:
>
> kernel BUG at kernel/workqueue.c:541!
> swapper[1]: bugcheck! 0 [1]

I fixed the above issue (a structure became larger than the maximum
size allowed by the slab allocator) and the kernel now boots fine on an
8-way ia64. I still cannot reproduce the problem you reported.
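
To illustrate how debugging options can push a structure past an allocator limit, here is a rough arithmetic sketch. It assumes the structure embeds one element per possible CPU, which this message does not state; the per-CPU element sizes, the header size, and the 128 KiB limit are all hypothetical numbers chosen for illustration. Only NR_CPUS=512 comes from the config below.

```python
NR_CPUS = 512  # CONFIG_NR_CPUS in the config below

# Assumed upper bound for a single allocation; hypothetical value.
MAX_ALLOC_BYTES = 128 * 1024

def embedded_array_size(per_cpu_bytes, header_bytes=64):
    """Size of a structure holding a fixed header plus one element per CPU."""
    return header_bytes + NR_CPUS * per_cpu_bytes

def fits(per_cpu_bytes):
    """True if the whole structure stays within the assumed limit."""
    return embedded_array_size(per_cpu_bytes) <= MAX_ALLOC_BYTES

# Hypothetical element sizes: debugging fields (lock owners, magic
# values, and so on) make each per-CPU element larger.
PLAIN_BYTES = 192
DEBUG_BYTES = 320

print(embedded_array_size(PLAIN_BYTES), fits(PLAIN_BYTES))
print(embedded_array_size(DEBUG_BYTES), fits(DEBUG_BYTES))
```

With these made-up numbers the non-debug layout fits and the debug layout does not, which matches the general pattern described above: the allocation fails only when debugging is configured and NR_CPUS is large.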

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.14-rc2-git1
# Thu Sep 22 10:31:35 2005
#

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
# CONFIG_IKCONFIG is not set
CONFIG_CPUSETS=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Processor type and features
#
CONFIG_IA64=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_TIME_INTERPOLATION=y
CONFIG_EFI=y
CONFIG_GENERIC_IOMAP=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_IA64_UNCACHED_ALLOCATOR=y
# CONFIG_IA64_GENERIC is not set
# CONFIG_IA64_DIG is not set
# CONFIG_IA64_HP_ZX1 is not set
# CONFIG_IA64_HP_ZX1_SWIOTLB is not set
CONFIG_IA64_SGI_SN2=y
# CONFIG_IA64_HP_SIM is not set
# CONFIG_ITANIUM is not set
CONFIG_MCKINLEY=y
# CONFIG_IA64_PAGE_SIZE_4KB is not set
# CONFIG_IA64_PAGE_SIZE_8KB is not set
CONFIG_IA64_PAGE_SIZE_16KB=y
# CONFIG_IA64_PAGE_SIZE_64KB is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_IA64_L1_CACHE_SHIFT=7
CONFIG_NUMA=y
CONFIG_VIRTUAL_MEM_MAP=y
CONFIG_HOLES_IN_ZONE=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
# CONFIG_IA64_CYCLONE is not set
CONFIG_IOSAPIC=y
CONFIG_IA64_SGI_SN_XP=m
CONFIG_FORCE_MAX_ZONEORDER=18
CONFIG_SMP=y
CONFIG_NR_CPUS=512
# CONFIG_HOTPLUG_CPU is not set
CONFIG_SCHED_SMT=y
CONFIG_PREEMPT=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_IA32_SUPPORT=y
CONFIG_COMPAT=y
CONFIG_IA64_MCA_RECOVERY=y
CONFIG_PERFMON=y
CONFIG_IA64_PALINFO=y

#
# Firmware Drivers
#
CONFIG_EFI_VARS=y
CONFIG_EFI_PCDP=y
# CONFIG_DELL_RBU is not set
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set

#
# Power management and ACPI
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_BUTTON is not set
# CONFIG_ACPI_FAN is not set
# CONFIG_ACPI_PROCESSOR is not set
CONFIG_ACPI_NUMA=y
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_ACPI_CONTAINER is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA)
#
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
# CONFIG_PCI_DEBUG is not set

#
# PCI Hotplug Support
#
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_SGI=y

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y
CONFIG_IPV6=m
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_NETFILTER is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_SCHED is not set
# CONFIG_NET_CLS_ROUTE is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_DEBUG_DRIVER is not set

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CDROM_PKTCDVD is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_ATA_OVER_ETH=m

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SGIIOC4=y
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
CONFIG_SCSI_SATA=y
# CONFIG_SCSI_SATA_AHCI is not set
# CONFIG_SCSI_SATA_SVW is not set
# CONFIG_SCSI_ATA_PIIX is not set
# CONFIG_SCSI_SATA_MV is not set
# CONFIG_SCSI_SATA_NV is not set
# CONFIG_SCSI_SATA_PROMISE is not set
# CONFIG_SCSI_SATA_QSTOR is not set
# CONFIG_SCSI_SATA_SX4 is not set
# CONFIG_SCSI_SATA_SIL is not set
# CONFIG_SCSI_SATA_SIS is not set
# CONFIG_SCSI_SATA_ULI is not set
# CONFIG_SCSI_SATA_VIA is not set
CONFIG_SCSI_SATA_VITESSE=y
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_FC is not set
CONFIG_SCSI_QLOGIC_1280=y
# CONFIG_SCSI_QLOGIC_1280_1040 is not set
CONFIG_SCSI_QLA2XXX=y
# CONFIG_SCSI_QLA21XX is not set
CONFIG_SCSI_QLA22XX=y
CONFIG_SCSI_QLA2300=y
CONFIG_SCSI_QLA2322=y
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_QLA24XX is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID5=y
# CONFIG_MD_RAID6 is not set
CONFIG_MD_MULTIPATH=y
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_EMC=m

#
# Fusion MPT device support
#
CONFIG_FUSION=y
CONFIG_FUSION_SPI=y
CONFIG_FUSION_FC=y
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# PHY device support
#

#
# Ethernet (10 or 100Mbit)
#
# CONFIG_NET_ETHERNET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SK98LIN is not set
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_CHELSIO_T1 is not set
# CONFIG_IXGB is not set
CONFIG_S2IO=m
# CONFIG_S2IO_NAPI is not set
# CONFIG_2BUFF_MODE is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
CONFIG_NETCONSOLE=y
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_RX is not set
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
# CONFIG_CYCLADES is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
# CONFIG_SYNCLINKMP is not set
# CONFIG_N_HDLC is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_STALDRV is not set
CONFIG_SGI_SNSC=y
CONFIG_SGI_TIOCX=y
CONFIG_SGI_MBCS=m

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_SGI_L1_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_SERIAL_SGI_IOC4=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_HW_RANDOM is not set
CONFIG_EFI_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_AGP is not set
# CONFIG_DRM is not set
CONFIG_RAW_DRIVER=m
# CONFIG_HPET is not set
CONFIG_MAX_RAW_DEVS=256
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_MMTIMER=y

#
# TPM devices
#
# CONFIG_TCG_TPM is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#

#
# Multimedia Capabilities Port drivers
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB=m
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
# CONFIG_USB_DEVICEFS is not set
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=m
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_ISP116X_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_BIG_ENDIAN is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_BLUETOOTH_TTY is not set
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information
#
# CONFIG_USB_STORAGE is not set

#
# USB Input Devices
#
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
# CONFIG_USB_HIDDEV is not set

#
# USB HID Boot Protocol drivers
#
# CONFIG_USB_KBD is not set
# CONFIG_USB_MOUSE is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_ACECAD is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_ITMTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_YEALINK is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set
# CONFIG_USB_KEYSPAN_REMOTE is not set
# CONFIG_USB_APPLETOUCH is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
CONFIG_USB_MON=y

#
# USB port drivers
#

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set

#
# USB DSL modem support
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
# CONFIG_INFINIBAND_USER_ACCESS is not set
CONFIG_INFINIBAND_MTHCA=m
# CONFIG_INFINIBAND_MTHCA_DEBUG is not set
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_DEBUG is not set

#
# SN Devices
#
CONFIG_SGI_IOC4=y

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_REISERFS_FS_SECURITY=y
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_XFS_FS=y
CONFIG_XFS_EXPORT=y
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_SECURITY is not set
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
CONFIG_NFS_DIRECTIO=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_RPCSEC_GSS_SPKM3 is not set
CONFIG_SMB_FS=m
# CONFIG_SMB_NLS_DEFAULT is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
CONFIG_EFI_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y

#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Profiling support
#
# CONFIG_PROFILING is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
# CONFIG_KPROBES is not set
CONFIG_IA64_GRANULE_16MB=y
# CONFIG_IA64_GRANULE_64MB is not set
# CONFIG_IA64_PRINT_HAZARDS is not set
# CONFIG_DISABLE_VHPT is not set
# CONFIG_IA64_DEBUG_CMPXCHG is not set
# CONFIG_IA64_DEBUG_IRQ is not set
CONFIG_SYSVIPC_COMPAT=y

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=m
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_TEST is not set

#
# Hardware crypto devices
#

2005-09-22 20:02:59

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Christoph Lameter <[email protected]> wrote:
>
> On Wed, 21 Sep 2005, Christoph Lameter wrote:
>
> > Hmm. This likely has something to do with debugging code. I was unable to
> > reproduce this on amd64 with your config. I get another failure with
> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> > The system works fine if no debugging is configured:
> >
> > kernel BUG at kernel/workqueue.c:541!
> > swapper[1]: bugcheck! 0 [1]
>
> I fixed the above issue (a structure became larger than the maximum
> allowed by the slab allocator) and the kernel boots fine now on an 8 way
> ia64. Cannot reproduce the problem.

Petr can. I think we're still waiting for him to test the below (please):




Begin forwarded message:

Date: Tue, 20 Sep 2005 18:03:54 -0700 (PDT)
From: Christoph Lameter <[email protected]>
To: Petr Vandrovec <[email protected]>
Cc: Andrew Morton <[email protected]>, [email protected], [email protected], [email protected]
Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849


On Tue, 20 Sep 2005, Petr Vandrovec wrote:

> slab belonging to node#1, while having acquired lock for cachep belonging
> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
> (check_spinlock_acquired_node(cachep, 0) would succeed).

Hmmm. If a node runs out of memory then pages from another node may end up
on the slab list of a node. But it seems that free_block cannot handle
that properly.

How are you producing the problem?

Could you try the following patch:

---

The numa slab allocator may allocate pages from foreign nodes onto the lists
for a particular node if a node runs out of memory. Inspecting the slab->nodeid
field will not reflect that the page is now in use for the slabs of another node.

This patch fixes that issue by adding a node field to free_block so that the caller
can indicate which node currently uses a slab.

Also removes the check for the current node from kmalloc_cache_node since the
process may shift later to another node which may lead to an allocation on another
node than intended.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.14-rc1/mm/slab.c
===================================================================
--- linux-2.6.14-rc1.orig/mm/slab.c 2005-09-21 00:09:05.000000000 +0000
+++ linux-2.6.14-rc1/mm/slab.c 2005-09-21 00:48:12.000000000 +0000
@@ -639,7 +639,7 @@ static enum {

static DEFINE_PER_CPU(struct work_struct, reap_work);

-static void free_block(kmem_cache_t* cachep, void** objpp, int len);
+static void free_block(kmem_cache_t* cachep, void** objpp, int len, int node);
static void enable_cpucache (kmem_cache_t *cachep);
static void cache_reap (void *unused);
static int __node_shrink(kmem_cache_t *cachep, int node);
@@ -804,7 +804,7 @@ static inline void __drain_alien_cache(k

if (ac->avail) {
spin_lock(&rl3->list_lock);
- free_block(cachep, ac->entry, ac->avail);
+ free_block(cachep, ac->entry, ac->avail, node);
ac->avail = 0;
spin_unlock(&rl3->list_lock);
}
@@ -925,7 +925,7 @@ static int __devinit cpuup_callback(stru
/* Free limit for this kmem_list3 */
l3->free_limit -= cachep->batchcount;
if (nc)
- free_block(cachep, nc->entry, nc->avail);
+ free_block(cachep, nc->entry, nc->avail, node);

if (!cpus_empty(mask)) {
spin_unlock(&l3->list_lock);
@@ -934,7 +934,7 @@ static int __devinit cpuup_callback(stru

if (l3->shared) {
free_block(cachep, l3->shared->entry,
- l3->shared->avail);
+ l3->shared->avail, node);
kfree(l3->shared);
l3->shared = NULL;
}
@@ -1882,12 +1882,13 @@ static void do_drain(void *arg)
{
kmem_cache_t *cachep = (kmem_cache_t*)arg;
struct array_cache *ac;
+ int node = numa_node_id();

check_irq_off();
ac = ac_data(cachep);
- spin_lock(&cachep->nodelists[numa_node_id()]->list_lock);
- free_block(cachep, ac->entry, ac->avail);
- spin_unlock(&cachep->nodelists[numa_node_id()]->list_lock);
+ spin_lock(&cachep->nodelists[node]->list_lock);
+ free_block(cachep, ac->entry, ac->avail, node);
+ spin_unlock(&cachep->nodelists[node]->list_lock);
ac->avail = 0;
}

@@ -2608,7 +2609,7 @@ done:
/*
* Caller needs to acquire correct kmem_list's list_lock
*/
-static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects)
+static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects, int node)
{
int i;
struct kmem_list3 *l3;
@@ -2617,14 +2618,12 @@ static void free_block(kmem_cache_t *cac
void *objp = objpp[i];
struct slab *slabp;
unsigned int objnr;
- int nodeid = 0;

slabp = GET_PAGE_SLAB(virt_to_page(objp));
- nodeid = slabp->nodeid;
- l3 = cachep->nodelists[nodeid];
+ l3 = cachep->nodelists[node];
list_del(&slabp->list);
objnr = (objp - slabp->s_mem) / cachep->objsize;
- check_spinlock_acquired_node(cachep, nodeid);
+ check_spinlock_acquired_node(cachep, node);
check_slabp(cachep, slabp);


@@ -2664,13 +2663,14 @@ static void cache_flusharray(kmem_cache_
{
int batchcount;
struct kmem_list3 *l3;
+ int node = numa_node_id();

batchcount = ac->batchcount;
#if DEBUG
BUG_ON(!batchcount || batchcount > ac->avail);
#endif
check_irq_off();
- l3 = cachep->nodelists[numa_node_id()];
+ l3 = cachep->nodelists[node];
spin_lock(&l3->list_lock);
if (l3->shared) {
struct array_cache *shared_array = l3->shared;
@@ -2686,7 +2686,7 @@ static void cache_flusharray(kmem_cache_
}
}

- free_block(cachep, ac->entry, batchcount);
+ free_block(cachep, ac->entry, batchcount, node);
free_done:
#if STATS
{
@@ -2751,7 +2751,7 @@ static inline void __cache_free(kmem_cac
} else {
spin_lock(&(cachep->nodelists[nodeid])->
list_lock);
- free_block(cachep, &objp, 1);
+ free_block(cachep, &objp, 1, nodeid);
spin_unlock(&(cachep->nodelists[nodeid])->
list_lock);
}
@@ -2844,7 +2844,7 @@ void *kmem_cache_alloc_node(kmem_cache_t
unsigned long save_flags;
void *ptr;

- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
return __cache_alloc(cachep, flags);

if (unlikely(!cachep->nodelists[nodeid])) {
@@ -3079,7 +3079,7 @@ static int alloc_kmemlist(kmem_cache_t *

if ((nc = cachep->nodelists[node]->shared))
free_block(cachep, nc->entry,
- nc->avail);
+ nc->avail, node);

l3->shared = new;
if (!cachep->nodelists[node]->alien) {
@@ -3160,7 +3160,7 @@ static int do_tune_cpucache(kmem_cache_t
if (!ccold)
continue;
spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
- free_block(cachep, ccold->entry, ccold->avail);
+ free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
kfree(ccold);
}
@@ -3240,7 +3240,7 @@ static void drain_array_locked(kmem_cach
if (tofree > ac->avail) {
tofree = (ac->avail+1)/2;
}
- free_block(cachep, ac->entry, tofree);
+ free_block(cachep, ac->entry, tofree, node);
ac->avail -= tofree;
memmove(ac->entry, &(ac->entry[tofree]),
sizeof(void*)*ac->avail);
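
The locking mismatch that the patch above addresses can be modelled in a few lines of user-space C. This is only a sketch under assumptions: the toy_* types and functions are hypothetical stand-ins, not the kernel's kmem_cache_t, struct slab or free_block(), and the list_lock is reduced to a boolean. It shows why choosing the list from slab->nodeid trips the lock check when the caller holds another node's lock, and why passing the node in avoids that.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 2

/* Hypothetical stand-in for struct kmem_list3: one per node. */
struct toy_list3 {
    bool locked;        /* models "caller holds list_lock" */
    int  free_objects;  /* objects returned to this node's lists */
};

/* Hypothetical stand-in for struct slab. */
struct toy_slab {
    int nodeid;         /* node recorded in the slab descriptor */
};

struct toy_cache {
    struct toy_list3 nodelists[TOY_MAX_NODES];
};

/*
 * Pre-patch shape: the list is chosen from slab->nodeid.
 * Returns false where the kernel's check_spinlock_acquired_node()
 * would have hit BUG(), i.e. the chosen list is not locked.
 */
bool toy_free_block_old(struct toy_cache *cachep, const struct toy_slab *slabp)
{
    struct toy_list3 *l3 = &cachep->nodelists[slabp->nodeid];

    if (!l3->locked)
        return false;
    l3->free_objects++;
    return true;
}

/*
 * Post-patch shape: the caller names the node whose list (and lock)
 * it is working on; slab->nodeid is no longer consulted.
 */
bool toy_free_block_new(struct toy_cache *cachep, const struct toy_slab *slabp,
                        int node)
{
    struct toy_list3 *l3;

    (void)slabp; /* deliberately ignored, as in the patch */
    assert(node >= 0 && node < TOY_MAX_NODES);
    l3 = &cachep->nodelists[node];
    if (!l3->locked)
        return false;
    l3->free_objects++;
    return true;
}
```

With node 0's list locked and a slab whose nodeid says 1, the old variant reports the failed lock check while the new variant, called with node 0, frees onto node 0's list. As the later messages in the thread point out, this sidesteps the symptom but does not explain how the slab came to carry the other node's id.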

2005-09-22 21:25:56

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Andrew Morton wrote:
> Christoph Lameter <[email protected]> wrote:
>
>>On Wed, 21 Sep 2005, Christoph Lameter wrote:
>>
>> > Hmm. This likely has something to do with debugging code. I was unable to
>> > reproduce this on amd64 with your config. I get another failure with
>> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
>> > The system works fine if no debugging is configured:
>> >
>> > kernel BUG at kernel/workqueue.c:541!
>> > swapper[1]: bugcheck! 0 [1]
>>
>> I fixed the above issue (a structure became larger than the maximum
>> allowed by the slab allocator) and the kernel boots fine now on an 8 way
>> ia64. Cannot reproduce the problem.
>
>
> Petr can. I think we're still waiting for him to test the below (please):

Sorry, I missed that half of the email completely. Yes, it seems to fix the problem:
the box currently has 8 min uptime, which is 7:55 more than it survived before.
Thanks,
Petr Vandrovec

> Could you try the following patch:
>
> ---
>
> The numa slab allocator may allocate pages from foreign nodes onto the lists
> for a particular node if a node runs out of memory. Inspecting the slab->nodeid
> field will not reflect that the page is now in use for the slabs of another node.
>
> This patch fixes that issue by adding a node field to free_block so that the caller
> can indicate which node currently uses a slab.
>
> Also removes the check for the current node from kmalloc_cache_node since the
> process may shift later to another node which may lead to an allocation on another
> node than intended.
>
> Signed-off-by: Christoph Lameter <[email protected]>

2005-09-22 21:32:49

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, 22 Sep 2005, Petr Vandrovec wrote:

> Sorry, I've missed that half of email completely. Yes, it seems to fix
> problem,
> box has currently 8 min uptime, which is 7:55 more than it survived before.

I thought the box did not boot at all? The problem appears on an
otherwise idle machine after bootup?

2005-09-22 21:46:53

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Petr Vandrovec <[email protected]> wrote:
>
> Andrew Morton wrote:
> > Christoph Lameter <[email protected]> wrote:
> >
> >>On Wed, 21 Sep 2005, Christoph Lameter wrote:
> >>
> >> > Hmm. This likely has something to do with debugging code. I was unable to
> >> > reproduce this on amd64 with your config. I get another failure with
> >> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> >> > The system works fine if no debugging is configured:
> >> >
> >> > kernel BUG at kernel/workqueue.c:541!
> >> > swapper[1]: bugcheck! 0 [1]
> >>
> >> I fixed the above issue (a structure became larger than the maximum
> >> allowed by the slab allocator) and the kernel boots fine now on an 8 way
> >> ia64. Cannot reproduce the problem.
> >
> >
> > Petr can. I think we're still waiting for him to test the below (please):
>
> Sorry, I've missed that half of email completely. Yes, it seems to fix problem,
> box has currently 8 min uptime, which is 7:55 more than it survived before.

Great, thanks. Christoph, was that patch the final official version?

2005-09-22 21:54:52

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, 22 Sep 2005, Andrew Morton wrote:

> Great, thanks. Christoph, was that patch the final official version?

This should deal with the node ownership issue. So yes.

I still have an open question about how pages ended up on the wrong node.
This should only happen if a zone / node has run out of memory. If pages
ended up on the wrong node without that then there may be a different
issue still to be fixed.

Maybe Petr can give us some more details on when the problem occurs?

2005-09-23 00:25:57

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Christoph Lameter wrote:
> On Thu, 22 Sep 2005, Andrew Morton wrote:
>
>
>>Great, thanks. Christoph, was that patch the final official version?
>
>
> This should deal with the node ownership issue. So yes.
>
> I still have some open question on how pages ended up on the wrong node.
> This should only happen if a zone / node has run out of memory. If pages
> ended up on the wrong node without that then there may be a different
> issue still to be fixed.
>
> Maybe Petr can give us some more details on when the problem occurs?

The problem seems to happen immediately: the very first run of cache_reap
(2 seconds after eventd initializes, if I understand it correctly) already
finds it.

But I'm confused. I've just added code which is supposed to verify all
additions to the cache entry[] (http://platan.vc.cvut.cz/verify-all-entry-add.diff)
on top of Christoph's patch, to catch the one which later causes the problem in cache_reap,
and it logged nothing at the time the crash was happening :-( The only incident it logs is in the
"while (batchcount > 0)" loop in cache_alloc_refill, saying that

objp ffff81007ffd9430 belonging to the slab ffff81007ffd9000 which belongs
to node 1 was added to array_cache belonging to node 0 (called from
ffffffff8016e4a9) (mm/slab.c ~ line 2430)
... cache avc_node

This repeats a couple of times, for avc_node, mnt_cache, proc_inode_cache
and bdev_cache. Nothing else.

So I reverted your fix, and I still did not catch the offender, so I'm probably
missing some place which populates the array_cache entry[] :-(

Only after I added logging to free_block() was I able to find that the offender is
proc_inode_cache. But I have no idea how this object appeared in the incorrect
node's cache...
Petr

2005-09-23 19:31:43

by Alok Kataria

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Wed, 2005-09-21 at 06:33, Christoph Lameter wrote:

Hi Christoph,
I have some doubts over this...

>On Tue, 20 Sep 2005, Petr Vandrovec wrote:
>
>> slab belonging to node#1, while having acquired lock for cachep belonging
>> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
>> (check_spinlock_acquired_node(cachep, 0) would succeed).
>
>Hmmm. If a node runs out of memory then pages from another node may end up
>on the slab list of a node. But it seems that free_block cannot handle
>that properly.
>
>How are you producing the problem?
>
>Could you try the following patch:
>
>---
>
>The numa slab allocator may allocate pages from foreign nodes onto the lists
>for a particular node if a node runs out of memory. Inspecting the slab->nodeid
>field will not reflect that the page is now in use for the slabs of another node.

IMO the slab->nodeid field just tells us which node's list3 this slab
is attached to, irrespective of which node the memory
came from.


>This patch fixes that issue by adding a node field to free_block so that the caller
>can indicate which node currently uses a slab.
>
But the nodeid is already accessible through the slab-descriptor of this
object, and this nodeid is set in the cache_grow
function.

>Also removes the check for the current node from kmalloc_cache_node since the
>process may shift later to another node which may lead to an allocation on another
>node than intended.
>
Yeah, that is possible, but wouldn't it be better to put the check in
__cache_alloc_node after disabling interrupts? kmalloc_node/kmem_cache_alloc_node
can be called at runtime as well, and getting the object directly from the
slabs instead of the array caches may slow things down.
Thus I am tweaking the patch a little.


Thanks & Regards,
Alok


Attachments:
cache_alloc_node.patch (1.84 kB)

2005-09-23 23:58:08

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Sat, 24 Sep 2005, Alok Kataria wrote:

> But the nodeid is already accessible through the slab-descriptor of this
> object, and this nodeid is set in the cache_grow
> function.

Correct. We still have no explanation why the slab was later assigned to
the wrong node. The patch fixes the locking issue though because the wrong
nodeid field is now ignored. There is certainly more to fix here.

> > Also removes the check for the current node from kmalloc_cache_node since
> > the process may shift later to another node which may lead to an allocation
> > on another node than intended.
> >
> Yeah that is possible, but won't putting a check in __cache_alloc_node after
> disabling the interrupt be better, because kmalloc_node/kmem_cache_alloc_node
> can be called at runtime as well, and getting the object directly from the
> slabs, instead of the arraycaches may slow up things.
> Thus tweaking the patch a little.

Good

2005-09-24 00:05:21

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

The patch was not sent inline, so this is going to look a bit strange.
Comments on the code:

@@ -2852,6 +2860,8 @@

cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
+ if (nodeid == numa_node_id())
+ ____cache_alloc(cachep, flags);
ptr = __cache_alloc_node(cachep, flags, nodeid);

This should be

ptr = ____cache_alloc(cachep, flags)
else
ptr = __cache_alloc_node(...)

right?

local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));

2005-09-24 12:53:27

by Manfred Spraul

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Alok Kataria wrote:

>
> IMO the slab->nodeid field just lets us know to which nodes list3 is
> this slab attached, irrespective of the node from
> which node the memory was got.
>
Correct. Otherwise the code wouldn't work on ia32 NUMAQ systems: They
have the whole ZONE_NORMAL in node 0.
When a slab is allocated, it's assigned to the node that did the alloc,
regardless of the physical location of the memory.

--
Manfred
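
The ownership rule described above (a slab belongs to the node that asked for it, wherever the page physically lives) can be sketched as a toy allocator. Everything here is an assumption made for illustration: toy_cache_grow() and struct toy_slab are hypothetical, the per-node page counters replace the real page allocator, and the fallback order is a simplification of zonelists.

```c
#include <assert.h>
#include <stddef.h>

struct toy_slab {
    int nodeid;     /* node whose lists the slab is attached to */
    int page_node;  /* node the backing page physically came from */
};

/*
 * free_pages[n] is the number of pages still available on node n.
 * Try the requesting node first, then the remaining nodes in
 * wrap-around index order. Returns 0 on success, -1 if every node
 * is out of pages.
 */
int toy_cache_grow(int free_pages[], int nr_nodes, int requesting_node,
                   struct toy_slab *slabp)
{
    assert(free_pages != NULL && slabp != NULL);
    assert(requesting_node >= 0 && requesting_node < nr_nodes);

    for (int i = 0; i < nr_nodes; i++) {
        int n = (requesting_node + i) % nr_nodes;

        if (free_pages[n] == 0)
            continue;
        free_pages[n]--;
        slabp->page_node = n;
        slabp->nodeid = requesting_node; /* ownership follows the requester */
        return 0;
    }
    return -1;
}
```

With all pages on node 0 (the NUMAQ-like case) and a request from node 1, the resulting slab has nodeid 1 and page_node 0, so nodeid by itself says nothing about where the memory sits.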

2005-09-25 14:14:13

by Alok Kataria

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Sat, 2005-09-24 at 05:35, Christoph Lameter wrote:
> Comments on the code:
>
> @@ -2852,6 +2860,8 @@
>
> cache_alloc_debugcheck_before(cachep, flags);
> local_irq_save(save_flags);
> + if (nodeid == numa_node_id())
> + ____cache_alloc(cachep, flags);
> ptr = __cache_alloc_node(cachep, flags, nodeid);
>
> This should be
>
> ptr = ____cache_alloc(cachep, flags)
> else
> ptr = __cache_alloc_node(...)
>
> right?
>
> local_irq_restore(save_flags);
> ptr = cache_alloc_debugcheck_after(cachep, flags, ptr,
> __builtin_return_address(0));

Oh a major blunder !! Updated the patch

--
As pointed out by Christoph, in kmalloc_node we are checking whether the allocation is for the
same node while interrupts are "on", which may lead to an allocation on another node than intended.
This patch just shifts the check for the current node to the point in kmem_cache_alloc_node
where interrupts are disabled.

Signed-off-by: Alok N Kataria <[email protected]>
Cc : Christoph Lameter <[email protected]>

Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c 2005-09-25 18:48:16.068349500 +0530
+++ linux-2.6.13/mm/slab.c 2005-09-25 18:48:18.484500500 +0530
@@ -2508,16 +2508,12 @@
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif

-
-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *____cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
{
- unsigned long save_flags;
void* objp;
struct array_cache *ac;

- cache_alloc_debugcheck_before(cachep, flags);
-
- local_irq_save(save_flags);
+ check_irq_off();
ac = ac_data(cachep);
if (likely(ac->avail)) {
STATS_INC_ALLOCHIT(cachep);
@@ -2527,6 +2523,18 @@
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags);
}
+ return objp;
+}
+
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+ unsigned long save_flags;
+ void* objp;
+
+ cache_alloc_debugcheck_before(cachep, flags);
+
+ local_irq_save(save_flags);
+ objp = ____cache_alloc(cachep, flags);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
__builtin_return_address(0));
@@ -2844,7 +2852,7 @@
unsigned long save_flags;
void *ptr;

- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
return __cache_alloc(cachep, flags);

if (unlikely(!cachep->nodelists[nodeid])) {
@@ -2855,7 +2863,10 @@

cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
- ptr = __cache_alloc_node(cachep, flags, nodeid);
+ if (nodeid == numa_node_id())
+ ptr = ____cache_alloc(cachep, flags);
+ else
+ ptr = __cache_alloc_node(cachep, flags, nodeid);
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));
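
The check-then-act window that this reordering closes can be modelled in user space. This is a sketch under stated assumptions, not kernel code: struct toy_task, toy_try_migrate() and both toy_alloc_node_* functions are hypothetical, and "migration" is reduced to a request that only takes effect while the modelled interrupt flag is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the running task's placement. */
struct toy_task {
    int  node;      /* node the task is currently running on */
    bool irqs_off;  /* models local_irq_save() / local_irq_restore() */
};

/* A migration request only takes effect while interrupts are enabled. */
static void toy_try_migrate(struct toy_task *t, int new_node)
{
    if (!t->irqs_off)
        t->node = new_node;
}

/*
 * Old ordering: compare against the current node first, disable
 * interrupts afterwards. Returns the node that ends up serving the
 * request; migrate_to models a migration hitting the window.
 */
int toy_alloc_node_racy(struct toy_task *t, int nodeid, int migrate_to)
{
    bool local = (nodeid == t->node);   /* checked with interrupts on */
    int served_by;

    toy_try_migrate(t, migrate_to);     /* lands inside the window */
    t->irqs_off = true;
    served_by = local ? t->node : nodeid; /* "local" path uses the new node */
    t->irqs_off = false;
    return served_by;
}

/* New ordering: disable interrupts first, then compare. */
int toy_alloc_node_fixed(struct toy_task *t, int nodeid, int migrate_to)
{
    bool local;
    int served_by;

    t->irqs_off = true;
    local = (nodeid == t->node);
    toy_try_migrate(t, migrate_to);     /* has no effect now */
    served_by = local ? t->node : nodeid;
    t->irqs_off = false;
    return served_by;
}
```

Starting on node 0 and asking for node 0 while a migration to node 1 is pending, the racy variant is served from node 1 and the fixed variant from node 0. If the migration happens before interrupts are disabled in the fixed variant, the comparison simply fails and the request takes the explicit per-node path, which still yields the requested node.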

2005-09-26 18:00:29

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Sun, 25 Sep 2005, Alok Kataria wrote:

> As pointed by Christoph, In kmalloc_node we are checking if, the allocation is
> for the
> same node when interrupts are "on", this may lead to an allocation on another
> node than intended.
> This patch just shifts the check for the current node in __cache_alloc_node
> when interrupts
> are disabled.

Alokk, could you verify that this patch works?

Petr, could you check that this patch fixes your issue? I am a bit
skeptical. I do not think we have found the real problem yet. We need to
have some way to reproduce the problem if it still persists after applying
Alokk's patch.


2005-09-26 19:31:54

by Alok Kataria

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Mon, 2005-09-26 at 23:30, Christoph Lameter wrote:
> On Sun, 25 Sep 2005, Alok Kataria wrote:
>
> > As pointed by Christoph, In kmalloc_node we are checking if, the allocation is
> > for the
> > same node when interrupts are "on", this may lead to an allocation on another
> > node than intended.
> > This patch just shifts the check for the current node in __cache_alloc_node
> > when interrupts
> > are disabled.
>
> Alokk, could you verify that this patch works?

Yes, it does work at my end. I am still not able to reproduce the BUG, so
I don't know if we have really fixed it.

>
> Petr, could you check that this patch fixes your issue? I am a bit
> skeptical. I do not think we have found the real problem yet. We need to
> have some way to reproduce the problem if it still persists after applying
> Alokk's patch.

Yep, that will help. If it still BUGs, the information that you provided with verify_entry will be great.

Thanks & Regards,
Alok

2005-09-28 21:03:12

by Ravikiran G Thirumalai

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Tue, Sep 20, 2005 at 03:58:16PM +0200, Petr Vandrovec wrote:
> Andrew Morton wrote:
> >Christoph Lameter <[email protected]> wrote:
> >
> >...
> >They do. I don't believe that preemption is the source of this BUG.
> >(Petr, does CONFIG_PREEMPT=n fix it?)
>
> No, it does not. I've even added printks here and there to show node
> number,
> and everything works as it should. Maybe there are some problems with
> numa_node_id() and migrating between processors when memory gets released,
> I do not know.
>
> Only thing I know that if I'll add WARN_ON below to the free_block(), it
> triggers...
>
> @free_block
> slabp = GET_PAGE_SLAB(virt_to_page(objp));
> nodeid = slabp->nodeid;
> + WARN_ON(nodeid != numa_node_id()); <<<<<
> l3 = cachep->nodelists[nodeid];
> list_del(&slabp->list);
> objnr = (objp - slabp->s_mem) / cachep->objsize;
> check_spinlock_acquired_node(cachep, nodeid);
> check_slabp(cachep, slabp);
>
> ... saying that keventd/0 tries to operate on
> slab belonging to node#1, while having acquired lock for cachep belonging
^^^^^^^^^^^^^^^^^^^^^^^^^
> to node #0
^^^^^^^^^^

This just might be relevant here: I found a bug with the recent
x86_64 changes to 2.6.14-rc* which causes the node_to_cpumask[] to go bad for
the boot processor. This happens on both amd and em64t boxes. I guess the
kevent/0 cpus_allowed mask might be changed by the bad node_to_cpumask[]
here?

On an Opteron box (courtesy Ananth M):
# cat /sys/devices/system/node/node0/cpumap
00000000

# cat /sys/devices/system/node/node1/cpumap
00000003

On our EM64T IBM x460 NUMA box:

# cat /sys/devices/system/node/node0/cpumap
0000000e

# cat /sys/devices/system/node/node1/cpumap
000000f1
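
To make the cpumap values above easier to read, here is a minimal user-space sketch (not kernel code) that decodes such a hex mask into CPU numbers. It assumes the mask fits in 32 bits, as the quoted ones do; cpumap_to_cpus(), cpumap_has_bp() and print_cpumap() are hypothetical helper names made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Decode a cpumap hex string, as printed by
 * /sys/devices/system/node/nodeN/cpumap, into CPU numbers.
 * Bit n set in the mask means CPU n is listed for that node.
 * Assumption: the mask fits in 32 bits.
 * Stores at most 'max' CPU numbers in cpus[] and returns the
 * number of bits that were set.
 */
int cpumap_to_cpus(const char *hex, int *cpus, int max)
{
	unsigned long mask = strtoul(hex, NULL, 16);
	int n = 0;

	for (int cpu = 0; cpu < 32; cpu++) {
		if (mask & (1UL << cpu)) {
			if (n < max)
				cpus[n] = cpu;
			n++;
		}
	}
	return n;
}

/* Return 1 if the boot processor (CPU 0) is present in the mask. */
int cpumap_has_bp(const char *hex)
{
	return (strtoul(hex, NULL, 16) & 1UL) != 0;
}

/* Print one line such as "node1 (00000003): 0 1". */
void print_cpumap(const char *label, const char *hex)
{
	int cpus[32];
	int n = cpumap_to_cpus(hex, cpus, 32);

	printf("%s (%s):", label, hex);
	for (int i = 0; i < n; i++)
		printf(" %d", cpus[i]);
	printf("%s\n", n ? "" : " none");
}
```

Decoded this way, the Opteron shows no CPUs on node0 and CPUs 0-1 on node1, and the x460 shows CPUs 1-3 on node0 and CPUs 0 and 4-7 on node1. In both cases bit 0 is missing from node0 and present on node1, which is consistent with the boot processor bit going bad.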

Here is a fix for that. I have sounded out Andi Kleen on this and am waiting
for his comments. Maybe somebody can test the patch below on AMD machines?

Thanks,
Kiran

---
Patch to fix the BP node_to_cpumask. 2.6.14-rc* broke the boot cpu bit, as
cpu_to_node(0) is now not set up early enough for numa_init_array.
cpu_to_node[] is set up much later, at srat_detect_node, on ACPI SRAT based
em64t machines. This seems to be a problem on AMD machines too, but the patch
has been tested only on em64t. /sys/devices/system/node/node0/cpumap shows up
sanely after this patch.

Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>


Index: linux-2.6.14-rc1/arch/x86_64/mm/numa.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/x86_64/mm/numa.c 2005-09-19 17:58:10.000000000 -0700
+++ linux-2.6.14-rc1/arch/x86_64/mm/numa.c 2005-09-27 01:34:20.000000000 -0700
@@ -178,7 +178,6 @@
 		rr++;
 	}

-	set_bit(0, &node_to_cpumask[cpu_to_node(0)]);
 }

 #ifdef CONFIG_NUMA_EMU
@@ -266,9 +265,7 @@

 __cpuinit void numa_add_cpu(int cpu)
 {
-	/* BP is initialized elsewhere */
-	if (cpu)
-		set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
+	set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }

unsigned long __init numa_free_all_bootmem(void)
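
A rough user-space model of the sequence the patch description talks about, not the kernel code itself. It assumes a 2-CPU, 2-node box where CPU 0 really belongs on node 0 and CPU 1 on node 1, that the early guess for cpu_to_node[0] is 1, and that numa_add_cpu() runs for every CPU once the real mapping is known. replay_boot() and print_cpumaps() are hypothetical names used only for this sketch.

```c
#include <stdio.h>

#define NR_NODES 2
#define NR_CPUS  2

/* User-space stand-ins for the kernel arrays named in the patch. */
static int cpu_to_node[NR_CPUS];
static unsigned long node_to_cpumask[NR_NODES];

/*
 * Replay the boot order for the assumed 2-CPU, 2-node box.
 * early_node0: node held in cpu_to_node[0] when numa_init_array runs
 * real_node0:  node CPU 0 really belongs to (known only later)
 * patched:     0 = old code, 1 = code with the patch applied
 */
void replay_boot(int early_node0, int real_node0, int patched)
{
	node_to_cpumask[0] = node_to_cpumask[1] = 0;

	/* numa_init_array(): the BP mapping is still a guess here. */
	cpu_to_node[0] = early_node0;
	if (!patched)
		node_to_cpumask[cpu_to_node[0]] |= 1UL << 0;

	/* Later the real mapping becomes known (assumed for this model). */
	cpu_to_node[0] = real_node0;
	cpu_to_node[1] = 1;

	/* numa_add_cpu() per CPU; the old code skipped cpu == 0. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (patched || cpu)
			node_to_cpumask[cpu_to_node[cpu]] |= 1UL << cpu;
	}
}

/* Print the masks in the same format as the sysfs cpumap files. */
void print_cpumaps(void)
{
	for (int node = 0; node < NR_NODES; node++)
		printf("node%d: %08lx\n", node, node_to_cpumask[node]);
}
```

Under these assumptions replay_boot(1, 0, 0) leaves node0 at 00000000 and node1 at 00000003, the same values as the Opteron output quoted earlier, while replay_boot(1, 0, 1) gives 00000001 and 00000002. The model only covers the cpumap bits; it says nothing about what the slab allocator sees before this point.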

2005-09-28 22:51:53

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Wed, 28 Sep 2005, Ravikiran G Thirumalai wrote:

> This just might be relevant here: I found a bug in the recent
> x86_64 changes to 2.6.14-rc* which causes node_to_cpumask[] to go bad for
> the boot processor. This happens on both AMD and EM64T boxes. I guess the
> kevent/0 cpus_allowed mask might be changed by the bad node_to_cpumask[]
> here?

Andrew, could we add the following patch to the kernel to detect
conditions in the future? This code will only be compiled in if slab
debugging is enabled.
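
The invariant being checked can be modelled in user space roughly as follows. This is only a sketch: the WARN_ON() here is a stand-in macro that prints and counts rather than dumping a stack, struct slab is cut down to the one field that matters, and free_block_check() is a hypothetical name for the comparison.

```c
#include <stdio.h>

/* Number of warnings raised so far, so callers can see if any fired. */
static int warn_count;

/* User-space stand-in for the kernel's WARN_ON(): report and carry on. */
#define WARN_ON(cond)							\
	do {								\
		if (cond) {						\
			warn_count++;					\
			fprintf(stderr, "Badness at %s:%d: %s\n",	\
				__FILE__, __LINE__, #cond);		\
		}							\
	} while (0)

/* Hypothetical, stripped-down slab descriptor: only the owning node. */
struct slab {
	int nodeid;
};

/*
 * 'node' is the node whose list_lock the caller holds, so every slab
 * handled under that lock is expected to belong to the same node.
 */
void free_block_check(const struct slab *slabp, int node)
{
	WARN_ON(slabp->nodeid != node);
}
```

A slab with nodeid 1 passed in with node 1 stays silent; the same slab passed in with node 0 produces one warning, which is the situation Petr's trace describes.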

---
[SLAB] Add additional debugging to detect slabs from the wrong node

This patch adds some stack dumps if the slab logic is processing
slab blocks from the wrong node. This is necessary in order to detect
situations as encountered by Petr.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.14-rc2/mm/slab.c
===================================================================
--- linux-2.6.14-rc2.orig/mm/slab.c 2005-09-27 13:22:30.000000000 -0700
+++ linux-2.6.14-rc2/mm/slab.c 2005-09-28 15:46:31.000000000 -0700
@@ -2421,6 +2421,7 @@ retry:
 			next = slab_bufctl(slabp)[slabp->free];
 #if DEBUG
 			slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
+			WARN_ON(numa_node_id() != slabp->nodeid);
 #endif
 			slabp->free = next;
 		}
@@ -2635,8 +2636,10 @@ static void free_block(kmem_cache_t *cac
 		check_spinlock_acquired_node(cachep, node);
 		check_slabp(cachep, slabp);

-
 #if DEBUG
+		/* Verify that the slab belongs to the intended node */
+		WARN_ON(slabp->nodeid != node);
+
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);

2005-09-29 16:43:09

by Petr Vandrovec

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Ravikiran G Thirumalai wrote:

Unfortunately I must confirm that it does not fix the problem. But it pointed
out to me another thing - proc_inode_cache stuff is put into caches
BEFORE this code is executed. So if anything in mm/slab.c relies
on node_to_cpumask[] being valid (and if it relies on some other things
which are set this late), it probably won't work.
Petr


> On Tue, Sep 20, 2005 at 03:58:16PM +0200, Petr Vandrovec wrote:
> Index: linux-2.6.14-rc1/arch/x86_64/mm/numa.c
> ===================================================================
> --- linux-2.6.14-rc1.orig/arch/x86_64/mm/numa.c 2005-09-19 17:58:10.000000000 -0700
> +++ linux-2.6.14-rc1/arch/x86_64/mm/numa.c 2005-09-27 01:34:20.000000000 -0700
> @@ -178,7 +178,6 @@
> rr++;
> }
>
> - set_bit(0, &node_to_cpumask[cpu_to_node(0)]);
> }
>
> #ifdef CONFIG_NUMA_EMU
> @@ -266,9 +265,7 @@
>
> __cpuinit void numa_add_cpu(int cpu)
> {
> - /* BP is initialized elsewhere */
> - if (cpu)
> - set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> + set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> }
>
> unsigned long __init numa_free_all_bootmem(void)
>


2005-09-29 18:11:58

by Ravikiran G Thirumalai

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> Ravikiran G Thirumalai wrote:
>
> Unfortunately I must confirm that it does not fix the problem. But it pointed
> out to me another thing - proc_inode_cache stuff is put into caches
> BEFORE this code is executed. So if anything in mm/slab.c relies
> on node_to_cpumask[] being valid (and if it relies on some other things
> which are set this late), it probably won't work.

Hmmm. Another data point for this bug. Bryan, who encountered the same bug
on his box just tried 2.6.13 stock + numa slab patches from 2.6.13-mm s, and
apparently, the kernel booted up on his opteron. So I guess we should
concentrate on the x86_64 bootup part.

Thanks,
Kiran

2005-09-29 18:38:42

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, 29 Sep 2005, Ravikiran G Thirumalai wrote:

> Hmmm. Another data point for this bug. Bryan, who encountered the same bug
> on his box just tried 2.6.13 stock + numa slab patches from 2.6.13-mm s, and
> apparently, the kernel booted up on his opteron. So I guess we should
> concentrate on the x86_64 bootup part.

Careful with the patchsets. Some of them contain my fix that masks the
problem. Be sure to either have the WARN_ON statements in there that
check for valid node numbers or use a version before I added the node
parameter to free_block.

2005-09-30 05:46:43

by Ravikiran G Thirumalai

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> Ravikiran G Thirumalai wrote:
>
> Unfortunately I must confirm that it does not fix the problem. But it pointed
> out to me another thing - proc_inode_cache stuff is put into caches
> BEFORE this code is executed. So if anything in mm/slab.c relies
> on node_to_cpumask[] being valid (and if it relies on some other things
> which are set this late), it probably won't work.

The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP]
is not set up early enough for numa_init_array() due to the x86_64 changes in
2.6.14-rc*, and is unfortunately set wrongly by the workaround code in
numa_init_array(). cpu_to_node[0] gets set to 1 early and later gets set
properly to 0 during identify_cpu() when all cpus are brought up, but
this confuses the numa slab in the process.

Here is a quick fix for this. The right fix obviously is to have
cpu_to_node[bsp] set up early for numa_init_array(). The following patch
will fix the problem now, and the code can stay even when cpu_to_node[BP]
gets set up early correctly.

Thanks to Petr for access to his box.

Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Alok N Kataria <[email protected]>

Index: slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c
===================================================================
--- slab-x86_64-fix-2.6.14-rc2.orig/arch/x86_64/mm/numa.c 2005-09-29 20:39:25.000000000 -0700
+++ slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c 2005-09-29 21:38:05.000000000 -0700
@@ -167,15 +167,14 @@
 	   mapping. To avoid this fill in the mapping for all possible
 	   CPUs, as the number of CPUs is not known yet.
 	   We round robin the existing nodes. */
-	rr = 0;
+	rr = first_node(node_online_map);
 	for (i = 0; i < NR_CPUS; i++) {
 		if (cpu_to_node[i] != NUMA_NO_NODE)
 			continue;
+		cpu_to_node[i] = rr;
 		rr = next_node(rr, node_online_map);
 		if (rr == MAX_NUMNODES)
 			rr = first_node(node_online_map);
-		cpu_to_node[i] = rr;
-		rr++;
 	}

}
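
The difference between the two loops can be seen in a small user-space model. This is a sketch under assumptions: first_node() and next_node() are simplified re-implementations over a plain bitmask rather than the kernel's nodemask helpers, NR_CPUS and MAX_NUMNODES are shrunk to small values, and fill_old() and fill_new() are hypothetical names for the loop before and after the patch.

```c
#define NR_CPUS      4
#define MAX_NUMNODES 8
#define NUMA_NO_NODE 0xff

/* Lowest set bit at or above 'start', or MAX_NUMNODES if there is none. */
static int find_node_from(int start, unsigned long map)
{
	for (int n = start; n < MAX_NUMNODES; n++)
		if (map & (1UL << n))
			return n;
	return MAX_NUMNODES;
}

/* Simplified stand-ins for the kernel helpers of the same names. */
static int first_node(unsigned long map)
{
	return find_node_from(0, map);
}

static int next_node(int n, unsigned long map)
{
	return find_node_from(n + 1, map);
}

/* The loop as it was before the patch. */
void fill_old(unsigned char *cpu_to_node, unsigned long node_online_map)
{
	int rr = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		if (cpu_to_node[i] != NUMA_NO_NODE)
			continue;
		rr = next_node(rr, node_online_map);
		if (rr == MAX_NUMNODES)
			rr = first_node(node_online_map);
		cpu_to_node[i] = rr;
		rr++;
	}
}

/* The loop with the patch applied: assign first, then advance. */
void fill_new(unsigned char *cpu_to_node, unsigned long node_online_map)
{
	int rr = first_node(node_online_map);

	for (int i = 0; i < NR_CPUS; i++) {
		if (cpu_to_node[i] != NUMA_NO_NODE)
			continue;
		cpu_to_node[i] = rr;
		rr = next_node(rr, node_online_map);
		if (rr == MAX_NUMNODES)
			rr = first_node(node_online_map);
	}
}
```

With nodes 0 and 1 online (mask 0x3) and four unassigned CPUs, fill_old() yields 1, 0, 0, 0 while fill_new() yields 0, 1, 0, 1. In this model the old loop puts CPU 0 on node 1, in line with the observation above, because it advances rr before the first assignment; the extra rr++ also makes it skip nodes on later iterations.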

2005-09-30 06:07:08

by Andrew Morton

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Ravikiran G Thirumalai <[email protected]> wrote:
>
> On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> > Ravikiran G Thirumalai wrote:
> >
> > Unfortunately I must confirm that it does not fix the problem. But it pointed
> > out to me another thing - proc_inode_cache stuff is put into caches
> > BEFORE this code is executed. So if anything in mm/slab.c relies
> > on node_to_cpumask[] being valid (and if it relies on some other things
> > which are set this late), it probably won't work.
>
> The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP]
> is not set up early enough for numa_init_array() due to the x86_64 changes in
> 2.6.14-rc*, and is unfortunately set wrongly by the workaround code in
> numa_init_array(). cpu_to_node[0] gets set to 1 early and later gets set
> properly to 0 during identify_cpu() when all cpus are brought up, but
> this confuses the numa slab in the process.
>
> Here is a quick fix for this. The right fix obviously is to have
> cpu_to_node[bsp] set up early for numa_init_array(). The following patch
> will fix the problem now, and the code can stay even when cpu_to_node[BP]
> gets set up early correctly.
>
> Thanks to Petr for access to his box.
>
> Signed-off-by: Ravikiran Thirumalai <[email protected]>
> Signed-off-by: Alok N Kataria <[email protected]>
>
> Index: slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c
> ===================================================================
> --- slab-x86_64-fix-2.6.14-rc2.orig/arch/x86_64/mm/numa.c 2005-09-29 20:39:25.000000000 -0700
> +++ slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c 2005-09-29 21:38:05.000000000 -0700
> @@ -167,15 +167,14 @@
> mapping. To avoid this fill in the mapping for all possible
> CPUs, as the number of CPUs is not known yet.
> We round robin the existing nodes. */
> - rr = 0;
> + rr = first_node(node_online_map);
> for (i = 0; i < NR_CPUS; i++) {
> if (cpu_to_node[i] != NUMA_NO_NODE)
> continue;
> + cpu_to_node[i] = rr;
> rr = next_node(rr, node_online_map);
> if (rr == MAX_NUMNODES)
> rr = first_node(node_online_map);
> - cpu_to_node[i] = rr;
> - rr++;
> }
>
> }

Is this fix independent from
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm2/broken-out/x86_64-fix-the-bp-node_to_cpumask.patch
?


2005-09-30 06:29:14

by Ravikiran G Thirumalai

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, Sep 29, 2005 at 11:05:40PM -0700, Andrew Morton wrote:
> Ravikiran G Thirumalai <[email protected]> wrote:
>
> Is this fix independent from
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm2/broken-out/x86_64-fix-the-bp-node_to_cpumask.patch
> ?

Yes.

2005-09-30 15:16:30

by Bryan O'Sullivan

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:

> Yes.

Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
patch, but not without it.

Thanks for your help.

<b

2005-09-30 15:58:30

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Fri, 30 Sep 2005, Bryan O'Sullivan wrote:

> Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> patch, but not without it.

The patch is not in rc2-mm2, right? I can now reproduce it on an AMD64
single processor with numa emulation (numa=fake=2). So all x86_64 NUMA
systems will throw these same stacktraces for rc2-mm2?



Bootdata ok (command line is root=/dev/sda2 console=tty0 console=ttyS0,38400n8 numa=fake=2)
Linux version 2.6.14-rc2-mm2 ([email protected]) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #2 SMP Fri Sep 30 15:49:50 UTC 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000fefffc00 - 00000000ff000000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Faking node 0 at 0000000000000000-000000000fffffff (255MB)
Faking node 1 at 0000000010000000-000000003fff0000 (767MB)
Bootmem setup node 0 0000000000000000-000000000fffffff
Bootmem setup node 1 0000000010000000-000000003fff0000
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:12 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:bec00000)
Built 2 zonelists
Initializing CPU#0
Kernel command line: root=/dev/sda2 console=tty0 console=ttyS0,38400n8 numa=fake=2
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2210.110 MHz processor.

...

powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.50.3)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa (1300 mV)
powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
cpu_init done, current fid 0xe, vid 0x2
Badness in cache_alloc_refill at mm/slab.c:2424

Call Trace:<ffffffff80164f8a>{cache_alloc_refill+458}
<ffffffff80165858>{kmem_cache_alloc+136}
<ffffffff8019f41e>{alloc_vfsmnt+30}
<ffffffff801893d2>{do_kern_mount+82}
<ffffffff8018fd5f>{getname+31} <ffffffff801a07ef>{do_new_mount+143}
<ffffffff801a0f5b>{do_mount+363} <ffffffff8018fd5f>{getname+31}
<ffffffff8015ee60>{__get_free_pages+16}
<ffffffff801a134b>{sys_mount+139}
<ffffffff8010b4ae>{name_to_dev_t+62}
<ffffffff80514e70>{prepare_namespace+80}
<ffffffff8010b16a>{init+250} <ffffffff8010ed2e>{child_rip+8}
<ffffffff8010b070>{init+0} <ffffffff8010ed26>{child_rip+0}

Badness in cache_alloc_refill at mm/slab.c:2424

Call Trace:<ffffffff80164f8a>{cache_alloc_refill+458}
<ffffffff80165858>{kmem_cache_alloc+136}
<ffffffff8019f41e>{alloc_vfsmnt+30}
<ffffffff801893d2>{do_kern_mount+82}
<ffffffff8018fd5f>{getname+31} <ffffffff801a07ef>{do_new_mount+143}
<ffffffff801a0f5b>{do_mount+363} <ffffffff8018fd5f>{getname+31}
<ffffffff8015ee60>{__get_free_pages+16}
<ffffffff801a134b>{sys_mount+139}
<ffffffff8010b4ae>{name_to_dev_t+62}
<ffffffff80514e70>{prepare_namespace+80}
<ffffffff8010b16a>{init+250} <ffffffff8010ed2e>{child_rip+8}
<ffffffff8010b070>{init+0} <ffffffff8010ed26>{child_rip+0}

Badness in cache_alloc_refill at mm/slab.c:2424


2005-09-30 16:45:11

by Bryan O'Sullivan

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Fri, 2005-09-30 at 08:57 -0700, Christoph Lameter wrote:
> On Fri, 30 Sep 2005, Bryan O'Sullivan wrote:
>
> > Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> > patch, but not without it.
>
> The patch is not in rc2-mm2, right?

Correct.

> I can now reproduce it on an AMD64
> single processor with numa emulation (numa=fake=2).

That's helpful for reproducing the problem. Thanks.

> So all x86_64 NUMA
> systems will throw these same stacktraces for rc2-mm2?

I've only tried with HDAMA motherboards, but based on your report and
Petr's, it seems somewhat likely.

<b

2005-09-30 16:56:13

by Christoph Lameter

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Thu, 29 Sep 2005, Ravikiran G Thirumalai wrote:

> Here is a quick fix for this. The right fix obviously is to have
> cpu_to_node[bsp] set up early for numa_init_array(). The following patch
> will fix the problem now, and the code can stay even when cpu_to_node[BP]
> gets set up early correctly.

This fixes the problem that I can reproduce by booting with numa=fake=2.

2005-09-30 20:11:27

by Andi Kleen

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Friday 30 September 2005 17:16, Bryan O'Sullivan wrote:
> On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:
> > Yes.
>
> Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> patch, but not without it.
>
> Thanks for your help.

It's already on its way to Linus. Thanks Kiran.

BTW, in my defense: my NUMA boxes booted just fine with the original patchkit.

-Andi

2005-09-30 20:23:33

by Ravikiran G Thirumalai

Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Fri, Sep 30, 2005 at 10:11:15PM +0200, Andi Kleen wrote:
> On Friday 30 September 2005 17:16, Bryan O'Sullivan wrote:
> > On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:
> > > Yes.
> >
> > Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> > patch, but not without it.
> >
> > Thanks for your help.
>
> It's already on its way to Linus. Thanks Kiran.

Thanks are also due to Alok for spending long hours trying out all combinations
on Petr's box. Thanks Alok.

Kiran