2007-08-21 04:02:58

by taoyue

[permalink] [raw]
Subject: [BUG]: posix timer: slab error 'double free'

Linux version 2.6.23-rc3-g2a677896-dirty (ytao@ytao-desktop) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP PREEMPT Mon Aug 20 15:18:42 EDT 2007
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e400 (usable)
BIOS-e820: 000000000009e400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
BIOS-e820: 000000001fff0000 - 000000001ffff000 (ACPI data)
BIOS-e820: 000000001ffff000 - 0000000020000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
found SMP MP-table at 000ff780
Entering add_active_range(0, 0, 131056) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 131056
HighMem 131056 -> 131056
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 131056
On node 0 totalpages: 131056
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 991 pages used for memmap
Normal zone: 125969 pages, LIFO batch:31
HighMem zone: 0 pages used for memmap
Movable zone: 0 pages used for memmap
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: INTEL Product ID: KC861I3B.86E APIC at: 0xFEE00000
Processor #0 15:2 APIC version 20
Processor #6 15:2 APIC version 20
I/O APIC #8 Version 32 at 0xFEC00000.
I/O APIC #9 Version 32 at 0xFEC81000.
I/O APIC #10 Version 32 at 0xFEC81400.
Enabling APIC mode: Flat. Using 3 I/O APICs
Processors: 2
Allocating PCI resources starting at 30000000 (gap: 20000000:dec00000)
Built 1 zonelists in Zone order. Total pages: 130033
Kernel command line: ksdevice=eth1 console=ttyS0,9600n8 root=/dev/nfs rw ip=dhcp BOOT_IMAGE=vlm-boards/8151/kernel ip=192.168.29.97:192.168.29.11:192.168.29.1:255.255.255.0
mapped APIC to ffffb000 (fee00000)
mapped IOAPIC to ffffa000 (fec00000)
mapped IOAPIC to ffff9000 (fec81000)
mapped IOAPIC to ffff8000 (fec81400)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 1999.886 MHz processor.
Console: colour dummy device 80x25
console [ttyS0] enabled
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 515440k/524224k available (1972k kernel code, 8300k reserved, 659k data, 228k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xfff4e000 - 0xfffff000 ( 708 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xe0800000 - 0xff7fe000 ( 495 MB)
lowmem : 0xc0000000 - 0xdfff0000 ( 511 MB)
.init : 0xc0398000 - 0xc03d1000 ( 228 kB)
.data : 0xc02ed294 - 0xc0391ffc ( 659 kB)
.text : 0xc0100000 - 0xc02ed294 (1972 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 4003.61 BogoMIPS (lpj=8007232)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbf7 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 12k freed
CPU0: Intel(R) Xeon(TM) CPU 2.00GHz stepping 09
Booting processor 1/6 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3999.90 BogoMIPS (lpj=7999806)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbf7 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 09
Total of 2 processors activated (8003.51 BogoMIPS).
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=4
PCI: Using configuration type 1
Setting up standard PCI resources
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI quirk: region 0400-047f claimed by ICH4 ACPI/GPIO/TCO
PCI quirk: region 0500-053f claimed by ICH4 GPIO
PCI: Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/2480] at 0000:00:1f.0
PCI->APIC IRQ transform: 0000:00:1d.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:1f.1[A] -> IRQ 18
PCI->APIC IRQ transform: 0000:00:1f.3[B] -> IRQ 17
PCI->APIC IRQ transform: 0000:04:01.0[A] -> IRQ 48
PCI->APIC IRQ transform: 0000:04:01.1[B] -> IRQ 49
Time: tsc clocksource has been installed.
PCI: Bridge: 0000:02:1d.0
IO window: e000-efff
MEM window: ff800000-ff8fffff
PREFETCH window: ff300000-ff3fffff
PCI: Bridge: 0000:02:1f.0
IO window: disabled.
MEM window: ff700000-ff7fffff
PREFETCH window: ff200000-ff2fffff
PCI: Bridge: 0000:00:03.0
IO window: e000-efff
MEM window: ff700000-ff9fffff
PREFETCH window: ff200000-ff4fffff
PCI: Bridge: 0000:00:1e.0
IO window: disabled.
MEM window: ff600000-ff6fffff
PREFETCH window: ff100000-ff1fffff
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
TCP established hash table entries: 16384 (order: 6, 262144 bytes)
TCP bind hash table entries: 16384 (order: 5, 196608 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP reno registered
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Real Time Clock Driver v1.12ac
Serial: 8250/16550 driver $Revision: 1.90 $ 1 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
Clocksource tsc unstable (delta = 322848815 ns)
e1000: 0000:04:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:02:b3:f1:88:c4
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Time: jiffies clocksource has been installed.
e1000: 0000:04:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:02:b3:f1:88:c5
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: FUJITSU MHT2020AT, ATA DISK drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 128KiB
hda: 39070080 sectors (20003 MB) w/2048KiB Cache, CHS=38760/16/63
hda: cache flushes supported
hda: hda1 hda2 hda3 hda4
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
md: multipath personality registered for level -4
md: faulty personality registered for level -5
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: [email protected]
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Starting balanced_irq
Using IPI Shortcut mode
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Sending DHCP requests . OK
IP-Config: Got DHCP answer from 192.168.29.11, my address is 192.168.29.97
IP-Config: Complete:
device=eth0, addr=192.168.29.97, mask=255.255.255.0, gw=192.168.29.1,
host=kennicott-node4, domain=tuxlab.wrs.com, nis-domain=swamp,
bootserver=192.168.29.11, rootserver=192.168.29.11, rootpath=/export/pxeboot/vlm-boards/8151/rootfs
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Looking up port of RPC 100003/2 on 192.168.29.11
Looking up port of RPC 100005/1 on 192.168.29.11
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 228k freed
Write protecting the kernel text: 1976k
Write protecting the kernel read-only data: 496k


Attachments:
config.txt (24.64 kB)
cpuinfo.txt (2.48 kB)
posix_timer_test.c (3.13 kB)
dmesg.txt (8.38 kB)
Download all attachments

2007-08-21 18:45:17

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [BUG]: posix timer: slab error 'double free'

(maintainers cc'ed)

On 08/21, taoyue wrote:
>
> I write a simple stress test program for posix timer subsystem, to
> reproduce the problem in the lastest mainline kernel.
> My test program creates 200 threads, and each thread does the following job:
>
> Please see my test program in the attachemnt "posix_timer_test.c". You
> can compile my test program via the following command line:

Now we have a testcase. Great, thanks a lot!

Unfortunately, I can't use it because my /lib/libc-2.3.3.so doesn't implement
SIGEV_THREAD (this is a user-space thing).

However,

> [<c0103941>] show_trace_log_lvl+0x1a/0x30
> [<c0104593>] show_trace+0x12/0x14
> [<c01045ab>] dump_stack+0x16/0x18
> [<c015637b>] __slab_error+0x26/0x28
> [<c0156bbf>] cache_free_debugcheck+0x1d9/0x298
> [<c0156eaa>] kmem_cache_free+0x66/0xb5
> [<c0122f4d>] __sigqueue_free+0x2f/0x32
> [<c0123147>] __dequeue_signal+0xdc/0x174
> [<c0124960>] dequeue_signal+0xbb/0x149
> [<c012569e>] sys_rt_sigtimedwait+0x7f/0x240
> [<c01029e6>] syscall_call+0x7/0xb

This is very nice trace, which says loudly that sigqueue_free() is racy
wrt collect_signal(). Very old bug, very small window.

I'll try to send the "right" fix tomorrow after sleep, but in the meanwhile,
could you please try the patch at the end?

(I am _not_ sure this can explain all problems you were seeing, though).

> And I also apply the four patches from Oleg Nesterov from lkml:
>
> http://lkm.org/lkml/2007/8/12/193
> http://lkm.org/lkml/2007/8/12/194
> http://lkm.org/lkml/2007/8/12/195
> http://lkm.org/lkml/2007/8/12/196
^^^
(should be lkml)

Please drop them, except the first one. They have nothing to do with
these problems, and in fact you apllied the wrong versions.

Thanks!

Oleg.

--- t/kernel/signal.c~ 2007-08-21 22:29:25.000000000 +0400
+++ t/kernel/signal.c 2007-08-21 22:30:05.000000000 +0400
@@ -1305,7 +1305,7 @@ void sigqueue_free(struct sigqueue *q)
* If the signal is still pending remove it from the
* pending queue.
*/
- if (unlikely(!list_empty(&q->list))) {
+ if (1) {
spinlock_t *lock = &current->sighand->siglock;
read_lock(&tasklist_lock);
spin_lock_irqsave(lock, flags);

2007-08-24 02:06:17

by taoyue

[permalink] [raw]
Subject: Re: [BUG]: posix timer: slab error 'double free'

Oleg Nesterov wrote:
> (maintainers cc'ed)
>
> On 08/21, taoyue wrote:
>
>> I write a simple stress test program for posix timer subsystem, to
>> reproduce the problem in the lastest mainline kernel.
>> My test program creates 200 threads, and each thread does the following job:
>>
>> Please see my test program in the attachemnt "posix_timer_test.c". You
>> can compile my test program via the following command line:
>>
>
> Now we have a testcase. Great, thanks a lot!
>
> Unfortunately, I can't use it because my /lib/libc-2.3.3.so doesn't implement
> SIGEV_THREAD (this is a user-space thing).
>
> However,
>
>
>> [<c0103941>] show_trace_log_lvl+0x1a/0x30
>> [<c0104593>] show_trace+0x12/0x14
>> [<c01045ab>] dump_stack+0x16/0x18
>> [<c015637b>] __slab_error+0x26/0x28
>> [<c0156bbf>] cache_free_debugcheck+0x1d9/0x298
>> [<c0156eaa>] kmem_cache_free+0x66/0xb5
>> [<c0122f4d>] __sigqueue_free+0x2f/0x32
>> [<c0123147>] __dequeue_signal+0xdc/0x174
>> [<c0124960>] dequeue_signal+0xbb/0x149
>> [<c012569e>] sys_rt_sigtimedwait+0x7f/0x240
>> [<c01029e6>] syscall_call+0x7/0xb
>>
>
> This is very nice trace, which says loudly that sigqueue_free() is racy
> wrt collect_signal(). Very old bug, very small window.
>
> I'll try to send the "right" fix tomorrow after sleep, but in the meanwhile,
> could you please try the patch at the end?
>
> (I am _not_ sure this can explain all problems you were seeing, though).
>
>
>> And I also apply the four patches from Oleg Nesterov from lkml:
>>
>> http://lkm.org/lkml/2007/8/12/193
>> http://lkm.org/lkml/2007/8/12/194
>> http://lkm.org/lkml/2007/8/12/195
>> http://lkm.org/lkml/2007/8/12/196
>>
> ^^^
> (should be lkml)
>
> Please drop them, except the first one. They have nothing to do with
> these problems, and in fact you apllied the wrong versions.
>
> Thanks!
>
> Oleg.
>
> --- t/kernel/signal.c~ 2007-08-21 22:29:25.000000000 +0400
> +++ t/kernel/signal.c 2007-08-21 22:30:05.000000000 +0400
> @@ -1305,7 +1305,7 @@ void sigqueue_free(struct sigqueue *q)
> * If the signal is still pending remove it from the
> * pending queue.
> */
> - if (unlikely(!list_empty(&q->list))) {
> + if (1) {
> spinlock_t *lock = &current->sighand->siglock;
> read_lock(&tasklist_lock);
> spin_lock_irqsave(lock, flags);
>
>
>
Think you for your help. I reserved the first patch:

http://lkm.org/lkml/2007/8/12/193

and applied the previous patch, running the test program again.
Up to now, the test program has been running for more than one
day. It seems likely that the patches avoid the race condition.

yue.tao