2009-11-19 03:48:38

by Lucas C. Villa Real

Subject: Oops with 2.6.32-rc6

Hi,

I recently decided to test 2.6.32-rc6 and I noticed that the system
crashes whenever there is heavy disk activity. The error shown in the
traces below happened about 3 times in a week.

Do you have any suggestions?

Thanks,
Lucas


Nov 16 10:37:27 (none) kernel: BUG: unable to handle kernel paging
request at 0000b2cb
Nov 16 10:37:33 (none) kernel: IP: [<c0198266>] __rmqueue+0x98/0x36c
Nov 16 10:37:33 (none) kernel: *pdpt = 0000000031dd2001 *pde = 0000000000000000
Nov 16 10:37:33 (none) kernel: Oops: 0002 [#1] PREEMPT SMP
Nov 16 10:37:33 (none) kernel: last sysfs file:
/System/Kernel/Objects/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ADP1/online
Nov 16 10:37:33 (none) kernel: Modules linked in: ipv6 acpi_cpufreq
snd_pcm_oss snd_mixer_oss hfsplus ndiswrapper fuse
snd_hda_codec_realtek joydev isight_firmware snd_hda_intel sky2
uvcvideo snd_hda_codec videodev firewire_ohci video output snd_hwdep
firewire_core v4l1_compat ac battery appletouch snd_pcm i2c_i801
applesmc led_class rtc_cmos thermal snd_timer processor shpchp
i2c_core button ohci1394 intel_agp iTCO_wdt rtc_core rtc_lib
input_polldev pcspkr snd snd_page_alloc iTCO_vendor_support
pci_hotplug
Nov 16 10:37:33 (none) kernel:
Nov 16 10:37:33 (none) kernel: Pid: 1724, comm: tar Tainted: P
(2.6.32-rc6-Gobo #3) MacBook3,1
Nov 16 10:37:33 (none) kernel: EIP: 0060:[<c0198266>] EFLAGS: 00010086 CPU: 0
Nov 16 10:37:33 (none) kernel: EIP is at __rmqueue+0x98/0x36c
Nov 16 10:37:33 (none) kernel: EAX: 000001b8 EBX: c1ad1000 ECX:
0000000a EDX: 0000b2c7
Nov 16 10:37:33 (none) kernel: ESI: c0b2cf40 EDI: c0b2d22c EBP:
f0c8fc50 ESP: f0c8fc18
Nov 16 10:37:33 (none) kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 16 10:37:33 (none) kernel: Process tar (pid: 1724, ti=f0c8f000
task=f307a610 task.ti=f0c8f000)
Nov 16 10:37:33 (none) kernel: Stack:
Nov 16 10:37:33 (none) kernel: c01c909e f24c7250 00000000 00000000
00000010 00000000 c0b2d218 c0b2d21c
Nov 16 10:37:33 (none) kernel: <0> 00000002 c1ad1018 00000010 c0b2cf40
c1ae0ff8 00000000 f0c8fca4 c0199335
Nov 16 10:37:33 (none) kernel: <0> 00000000 ffffffff 0000001f 0000003c
00000000 c0b2d744 c0b2cf7c c0bd4134
Nov 16 10:37:33 (none) kernel: Call Trace:
Nov 16 10:37:33 (none) kernel: [<c01c909e>] ? inode_get_bytes+0x48/0x54
Nov 16 10:37:33 (none) kernel: [<c0199335>] ?
get_page_from_freelist+0x147/0x3ec
Nov 16 10:37:33 (none) kernel: [<c01996a0>] ? __alloc_pages_nodemask+0xc6/0x480
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c01946fe>] ? find_get_page+0x2d/0x9f
Nov 16 10:37:33 (none) kernel: [<c0194cdc>] ?
grab_cache_page_write_begin+0x54/0x8e
Nov 16 10:37:33 (none) kernel: [<c0217b8b>] ? reiserfs_write_begin+0x81/0x1b2
Nov 16 10:37:33 (none) kernel: [<c0195588>] ?
generic_file_buffered_write+0xd9/0x22f
Nov 16 10:37:33 (none) kernel: [<c0195c2e>] ?
__generic_file_aio_write+0x3a2/0x3e3
Nov 16 10:37:33 (none) kernel: [<c019635a>] ? generic_file_aio_read+0x4cf/0x509
Nov 16 10:37:33 (none) kernel: [<c0195cd3>] ? generic_file_aio_write+0x64/0xab
Nov 16 10:37:33 (none) kernel: [<c01c5eea>] ? do_sync_write+0xb0/0xeb
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c01d81e9>] ? expand_files+0xe/0x201
Nov 16 10:37:33 (none) kernel: [<c01ebbce>] ? fsnotify+0xe/0xdb
Nov 16 10:37:33 (none) kernel: [<c021be86>] ? reiserfs_file_write+0x6e/0x77
Nov 16 10:37:33 (none) kernel: [<c01c68f3>] ? vfs_write+0x99/0x14c
Nov 16 10:37:33 (none) kernel: [<c021be18>] ? reiserfs_file_write+0x0/0x77
Nov 16 10:37:33 (none) kernel: [<c01c6a62>] ? sys_write+0x48/0x75
Nov 16 10:37:33 (none) kernel: [<c0103553>] ? sysenter_do_call+0x12/0x28
Nov 16 10:37:33 (none) kernel: Code: 39 5d f0 75 06 41 e9 a0 00 00 00
8b 55 e8 c1 e2 03 89 55 f0 01 c2 8b 94 16 24 01 00 00 89 d3 83 eb 18
89 55 ec 8b 7b 1c 8b 53 18 <89> 7a 04 89 17 c7 43 1c 00 02 20 00 c7 43
18 00 01 10 00 8b 7d
Nov 16 10:37:33 (none) kernel: EIP: [<c0198266>] __rmqueue+0x98/0x36c
SS:ESP 0068:f0c8fc18
Nov 16 10:37:33 (none) kernel: CR2: 000000000000b2cb
Nov 16 10:37:33 (none) kernel: ---[ end trace 611dcee22abb0dec ]---


Nov 16 10:37:33 (none) kernel: note: tar[1724] exited with preempt_count 2
Nov 16 10:37:33 (none) kernel: BUG: scheduling while atomic: tar/1724/0x10000003
Nov 16 10:37:33 (none) kernel: Modules linked in: ipv6 acpi_cpufreq
snd_pcm_oss snd_mixer_oss hfsplus ndiswrapper fuse
snd_hda_codec_realtek joydev isight_firmware snd_hda_intel sky2
uvcvideo snd_hda_codec videodev firewire_ohci video output snd_hwdep
firewire_core v4l1_compat ac battery appletouch snd_pcm i2c_i801
applesmc led_class rtc_cmos thermal snd_timer processor shpchp
i2c_core button ohci1394 intel_agp iTCO_wdt rtc_core rtc_lib
input_polldev pcspkr snd snd_page_alloc iTCO_vendor_support
pci_hotplug
Nov 16 10:37:33 (none) kernel: Pid: 1724, comm: tar Tainted: P D
2.6.32-rc6-Gobo #3
Nov 16 10:37:33 (none) kernel: Call Trace:
Nov 16 10:37:33 (none) kernel: [<c012da92>] __schedule_bug+0x51/0x56
Nov 16 10:37:33 (none) kernel: [<c07f0fc9>] schedule+0x9f/0x993
Nov 16 10:37:33 (none) kernel: [<c019b750>] ? release_pages+0xe/0x165
Nov 16 10:37:33 (none) kernel: [<c01c4463>] ? lookup_page_cgroup+0x9/0x32
Nov 16 10:37:33 (none) kernel: [<c01c4463>] ? lookup_page_cgroup+0x9/0x32
Nov 16 10:37:33 (none) kernel: [<c07f1a47>] ? preempt_schedule+0x8/0x49
Nov 16 10:37:33 (none) kernel: [<c019bd73>] ? lru_add_drain+0x95/0x9b
Nov 16 10:37:33 (none) kernel: [<c01a569f>] ? __dec_zone_state+0xe/0x87
Nov 16 10:37:33 (none) kernel: [<c012e61e>] __cond_resched+0x1b/0x2b
Nov 16 10:37:33 (none) kernel: [<c07f19c1>] _cond_resched+0x20/0x2b
Nov 16 10:37:33 (none) kernel: [<c01a972c>] unmap_vmas+0x55f/0x6af
Nov 16 10:37:33 (none) kernel: [<c019b750>] ? release_pages+0xe/0x165
Nov 16 10:37:33 (none) kernel: [<c01ad4e4>] exit_mmap+0xaf/0x13e
Nov 16 10:37:33 (none) kernel: [<c0135b76>] mmput+0x3a/0xb0
Nov 16 10:37:33 (none) kernel: [<c01395bb>] exit_mm+0xea/0xf2
Nov 16 10:37:33 (none) kernel: [<c013ae64>] do_exit+0x1b3/0x5f6
Nov 16 10:37:33 (none) kernel: [<c07f0b7a>] ? printk+0x14/0x16
Nov 16 10:37:33 (none) kernel: [<c07f4039>] oops_end+0xa2/0xaa
Nov 16 10:37:33 (none) kernel: [<c011e1c0>] no_context+0x13b/0x145
Nov 16 10:37:33 (none) kernel: [<c011e2b6>] __bad_area_nosemaphore+0xec/0xf4
Nov 16 10:37:33 (none) kernel: [<c011e2d0>] bad_area_nosemaphore+0x12/0x15
Nov 16 10:37:33 (none) kernel: [<c07f52a7>] do_page_fault+0x200/0x34e
Nov 16 10:37:33 (none) kernel: [<c07f50a7>] ? do_page_fault+0x0/0x34e
Nov 16 10:37:33 (none) kernel: [<c07f36c3>] error_code+0x73/0x78
Nov 16 10:37:33 (none) kernel: [<c02200d8>] ? find_hash_out+0xc1/0x1dd
Nov 16 10:37:33 (none) kernel: [<c0198266>] ? __rmqueue+0x98/0x36c
Nov 16 10:37:33 (none) kernel: [<c01c909e>] ? inode_get_bytes+0x48/0x54
Nov 16 10:37:33 (none) kernel: [<c0199335>] get_page_from_freelist+0x147/0x3ec
Nov 16 10:37:33 (none) kernel: [<c01996a0>] __alloc_pages_nodemask+0xc6/0x480
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c01946fe>] ? find_get_page+0x2d/0x9f
Nov 16 10:37:33 (none) kernel: [<c0194cdc>]
grab_cache_page_write_begin+0x54/0x8e
Nov 16 10:37:33 (none) kernel: [<c0217b8b>] reiserfs_write_begin+0x81/0x1b2
Nov 16 10:37:33 (none) kernel: [<c0195588>]
generic_file_buffered_write+0xd9/0x22f
Nov 16 10:37:33 (none) kernel: [<c0195c2e>]
__generic_file_aio_write+0x3a2/0x3e3
Nov 16 10:37:33 (none) kernel: [<c019635a>] ? generic_file_aio_read+0x4cf/0x509
Nov 16 10:37:33 (none) kernel: [<c0195cd3>] generic_file_aio_write+0x64/0xab
Nov 16 10:37:33 (none) kernel: [<c01c5eea>] do_sync_write+0xb0/0xeb
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:33 (none) kernel: [<c01d81e9>] ? expand_files+0xe/0x201
Nov 16 10:37:33 (none) kernel: [<c01ebbce>] ? fsnotify+0xe/0xdb
Nov 16 10:37:33 (none) kernel: [<c021be86>] reiserfs_file_write+0x6e/0x77
Nov 16 10:37:33 (none) kernel: [<c01c68f3>] vfs_write+0x99/0x14c
Nov 16 10:37:33 (none) kernel: [<c021be18>] ? reiserfs_file_write+0x0/0x77
Nov 16 10:37:33 (none) kernel: [<c01c6a62>] sys_write+0x48/0x75
Nov 16 10:37:33 (none) kernel: [<c0103553>] sysenter_do_call+0x12/0x28
Nov 16 10:37:33 (none) kernel: BUG: scheduling while atomic: tar/1724/0x00000003
Nov 16 10:37:33 (none) kernel: Modules linked in: ipv6 acpi_cpufreq
snd_pcm_oss snd_mixer_oss hfsplus ndiswrapper fuse
snd_hda_codec_realtek joydev isight_firmware snd_hda_intel sky2
uvcvideo snd_hda_codec videodev firewire_ohci video output snd_hwdep
firewire_core v4l1_compat ac battery appletouch snd_pcm i2c_i801
applesmc led_class rtc_cmos thermal snd_timer processor shpchp
i2c_core button ohci1394 intel_agp iTCO_wdt rtc_core rtc_lib
input_polldev pcspkr snd snd_page_alloc iTCO_vendor_support
pci_hotplug
Nov 16 10:37:33 (none) kernel: Pid: 1724, comm: tar Tainted: P D
2.6.32-rc6-Gobo #3
Nov 16 10:37:33 (none) kernel: Call Trace:
Nov 16 10:37:33 (none) kernel: [<c012da92>] __schedule_bug+0x51/0x56
Nov 16 10:37:34 (none) kernel: [<c07f0fc9>] schedule+0x9f/0x993
Nov 16 10:37:34 (none) kernel: [<c012422b>] ? idle_cpu+0x8/0x2a
Nov 16 10:37:34 (none) kernel: [<c013d77d>] ? irq_exit+0x3e/0x6b
Nov 16 10:37:34 (none) kernel: [<c0124c0e>] ? mutex_spin_on_owner+0x56/0x65
Nov 16 10:37:34 (none) kernel: [<c07f2066>] __mutex_lock_slowpath+0xc8/0x122
Nov 16 10:37:34 (none) kernel: [<c07f1f14>] mutex_lock+0x18/0x26
Nov 16 10:37:34 (none) kernel: [<c021bbe7>] reiserfs_file_release+0x116/0x303
Nov 16 10:37:34 (none) kernel: [<c01f39aa>] ? locks_remove_posix+0xc/0x8c
Nov 16 10:37:34 (none) kernel: [<c01ebbce>] ? fsnotify+0xe/0xdb
Nov 16 10:37:34 (none) kernel: [<c01c724d>] ? __fput+0x85/0x177
Nov 16 10:37:34 (none) kernel: [<c01c7297>] __fput+0xcf/0x177
Nov 16 10:37:34 (none) kernel: [<c01c7359>] fput+0x1a/0x1c
Nov 16 10:37:34 (none) kernel: [<c01c4811>] filp_close+0x56/0x60
Nov 16 10:37:34 (none) kernel: [<c0139747>] put_files_struct+0x5d/0xa1
Nov 16 10:37:34 (none) kernel: [<c01397c7>] exit_files+0x3c/0x41
Nov 16 10:37:34 (none) kernel: [<c013aebb>] do_exit+0x20a/0x5f6
Nov 16 10:37:34 (none) kernel: [<c07f0b7a>] ? printk+0x14/0x16
Nov 16 10:37:34 (none) kernel: [<c07f4039>] oops_end+0xa2/0xaa
Nov 16 10:37:34 (none) kernel: [<c011e1c0>] no_context+0x13b/0x145
Nov 16 10:37:34 (none) kernel: [<c011e2b6>] __bad_area_nosemaphore+0xec/0xf4
Nov 16 10:37:34 (none) kernel: [<c011e2d0>] bad_area_nosemaphore+0x12/0x15
Nov 16 10:37:34 (none) kernel: [<c07f52a7>] do_page_fault+0x200/0x34e
Nov 16 10:37:34 (none) kernel: [<c07f50a7>] ? do_page_fault+0x0/0x34e
Nov 16 10:37:34 (none) kernel: [<c07f36c3>] error_code+0x73/0x78
Nov 16 10:37:34 (none) kernel: [<c02200d8>] ? find_hash_out+0xc1/0x1dd
Nov 16 10:37:34 (none) kernel: [<c0198266>] ? __rmqueue+0x98/0x36c
Nov 16 10:37:34 (none) kernel: [<c01c909e>] ? inode_get_bytes+0x48/0x54
Nov 16 10:37:34 (none) kernel: [<c0199335>] get_page_from_freelist+0x147/0x3ec
Nov 16 10:37:34 (none) kernel: [<c01996a0>] __alloc_pages_nodemask+0xc6/0x480
Nov 16 10:37:34 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:34 (none) kernel: [<c01946fe>] ? find_get_page+0x2d/0x9f
Nov 16 10:37:34 (none) kernel: [<c0194cdc>]
grab_cache_page_write_begin+0x54/0x8e
Nov 16 10:37:34 (none) kernel: [<c0217b8b>] reiserfs_write_begin+0x81/0x1b2
Nov 16 10:37:34 (none) kernel: [<c0195588>]
generic_file_buffered_write+0xd9/0x22f
Nov 16 10:37:34 (none) kernel: [<c0195c2e>]
__generic_file_aio_write+0x3a2/0x3e3
Nov 16 10:37:34 (none) kernel: [<c019635a>] ? generic_file_aio_read+0x4cf/0x509
Nov 16 10:37:34 (none) kernel: [<c0195cd3>] generic_file_aio_write+0x64/0xab
Nov 16 10:37:34 (none) kernel: [<c01c5eea>] do_sync_write+0xb0/0xeb
Nov 16 10:37:34 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:34 (none) kernel: [<c014e22f>] ? autoremove_wake_function+0x0/0x34
Nov 16 10:37:34 (none) kernel: [<c01d81e9>] ? expand_files+0xe/0x201
Nov 16 10:37:34 (none) kernel: [<c01ebbce>] ? fsnotify+0xe/0xdb
Nov 16 10:37:34 (none) kernel: [<c021be86>] reiserfs_file_write+0x6e/0x77
Nov 16 10:37:34 (none) kernel: [<c01c68f3>] vfs_write+0x99/0x14c
Nov 16 10:37:34 (none) kernel: [<c021be18>] ? reiserfs_file_write+0x0/0x77
Nov 16 10:37:34 (none) kernel: [<c01c6a62>] sys_write+0x48/0x75
Nov 16 10:37:34 (none) kernel: [<c0103553>] sysenter_do_call+0x12/0x28


2010-01-19 04:50:34

by Lucas C. Villa Real

Subject: Re: Oops with 2.6.32-rc6

On Thu, Nov 19, 2009 at 1:48 AM, Lucas C. Villa Real wrote:
> Hi,
>
> I recently decided to test 2.6.32-rc6 and I noticed that, whenever too
> many disk activity happens, the system crashes. The error shown in the
> traces below happened about 3 times in a week.
>
> Do you have any suggestions?
>
> Thanks,
> Lucas
>

I just reproduced the kernel oops with 2.6.33-rc4. The original report
can be seen at
http://bugzilla.kernel.org/show_bug.cgi?id=14656.

I'm seeing this problem while stressing a FUSE file system which sits
on top of ReiserFS 3. However, since some write operations in this
test case also go to the root filesystem, I cannot tell whether FUSE
has anything to do with this. Based on the stack trace I would say
no.

I have two messages: a complete oops with the full stack trace, found
below, and a partial one which includes some debugging messages from
CONFIG_DEBUG_LIST=y. The line which is causing the problem is the
list_del() in __rmqueue:

(gdb) list *__rmqueue+0x98
0x963 is in __rmqueue (mm/page_alloc.c:730).
725			continue;
726
727		page = list_entry(area->free_list[migratetype].next,
728							struct page, lru);
729		list_del(&page->lru);
730		rmv_page_order(page);

"page" is a valid pointer, but it looks like the members of lru are
corrupted, as seen in the first trace below:

Jan 19 02:01:46 (none) kernel: ------------[ cut here ]------------
Jan 19 02:01:47 (none) kernel: WARNING: at lib/list_debug.c:51
list_del+0x41/0x60()
Jan 19 02:01:47 (none) kernel: Hardware name: MacBook3,1
Jan 19 02:01:47 (none) kernel: list_del corruption. next->prev should
be c1b71018, but was 00005095
Jan 19 02:01:47 (none) kernel: Modules linked in: tun ipv6
acpi_cpufreq snd_pcm_oss snd_mixer_oss hfsplus ndiswrapper fuse
snd_hda_codec_realtek snd_hda_intel snd_hda_codec joydev snd_hwdep
sky2 applesmc led_class uvcvideo firewire_ohci rtc_cmos snd_pcm
videodev firewire_core input_polldev rtc_core video output snd_timer
v4l1_compat shpchp battery rtc_lib ac appletouch pcspkr snd thermal
button processor ohci1394 pci_hotplug intel_agp snd_page_alloc
iTCO_wdt i2c_i801 iTCO_vendor_support i2c_core
Jan 19 02:01:47 (none) kernel: Pid: 30559, comm: lt-ltfs Tainted: P
M 2.6.33-rc4-Gobo #3
Jan 19 02:01:47 (none) kernel: Call Trace:
Jan 19 02:01:47 (none) kernel: [<c0137f28>] warn_slowpath_common+0x6a/0x81
Jan 19 02:01:47 (none) kernel: [<c0400811>] ? list_del+0x41/0x60
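
To make the "list_del corruption" warning easier to read, here is a
minimal user-space sketch of the kind of sanity check that a debugging
list_del() performs before unlinking an entry. It is not the kernel's
lib/list_debug.c; checked_list_del() and its return convention are
hypothetical names made up for this illustration, and only the
struct list_head layout mirrors the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same shape as the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	assert(item != NULL && head != NULL);
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/*
 * Hypothetical checked delete (illustration only): unlink entry only
 * if both neighbours still point back at it. Returns 0 on success and
 * -1 if the list looks corrupted, in which case nothing is modified.
 */
static int checked_list_del(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr, "corruption: prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return -1;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "corruption: next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return -1;
	}
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return 0;
}
```

In the warning above, the entry being removed is the lru member at
c1b71018, and the prev pointer of its successor held 00005095 instead
of pointing back at it, which corresponds to the second check in this
sketch.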


For reference, this is the complete stack trace which I got yesterday:

Jan 18 00:58:30 (none) kernel: BUG: unable to handle kernel NULL
pointer dereference at 00000006
Jan 18 00:58:30 (none) kernel: IP: [<c019b505>] __rmqueue+0x98/0x36c
Jan 18 00:58:30 (none) kernel: *pdpt = 00000000298e7001 *pde = 0000000000000000
Jan 18 00:58:30 (none) kernel: Oops: 0002 [#1] PREEMPT SMP
Jan 18 00:58:30 (none) kernel: last sysfs file:
/System/Kernel/Objects/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ADP1/online
Jan 18 00:58:30 (none) kernel: Modules linked in: cdc_ether usbnet mii
cdc_acm tun kqemu ndiswrapper dvb_usb_dib0700 dib7000p dib0090
dib7000m dib0070 dvb_usb dib8000 dvb_core dib3000mc dibx000_common
ipv6 acpi_cpufreq snd_pcm_oss snd_mixer_oss hfsplus fuse joydev
snd_hda_codec_realtek applesmc led_class snd_hda_intel uvcvideo
input_polldev snd_hda_codec videodev firewire_ohci video
firewire_core output snd_hwdep v4l1_compat ac sky2 battery snd_pcm
i2c_i801 ohci1394 appletouch button thermal processor snd_timer snd
i2c_core intel_agp snd_page_alloc iTCO_wdt iTCO_vendor_support
rtc_cmos pcspkr rtc_core rtc_lib shpchp pci_hotplug
Jan 18 00:58:30 (none) kernel:
Jan 18 00:58:30 (none) kernel: Pid: 10381, comm: lt-ltfs Tainted: P
2.6.33-rc4-Gobo #1 Mac-F22788C8/MacBook3,1
Jan 18 00:58:30 (none) kernel: EIP: 0060:[<c019b505>] EFLAGS: 00010086 CPU: 0
Jan 18 00:58:30 (none) kernel: EIP is at __rmqueue+0x98/0x36c
Jan 18 00:58:30 (none) kernel: EAX: 000001b8 EBX: c1b69000 ECX:
0000000a EDX: 00000002
Jan 18 00:58:30 (none) kernel: ESI: c0bb69c0 EDI: c0bb6ccc EBP:
f011ec64 ESP: f011ec2c
Jan 18 00:58:30 (none) kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jan 18 00:58:30 (none) kernel: Process lt-ltfs (pid: 10381,
ti=f011e000 task=f004a610 task.ti=f011e000)
Jan 18 00:58:30 (none) kernel: Stack:
Jan 18 00:58:30 (none) kernel: c01cc35e e9130990 00000000 00000000
00000010 00000000 c0bb6cb8 c0bb6cbc
Jan 18 00:58:30 (none) kernel: <0> 00000002 c1b69018 00000010 c0bb69c0
c1b78ff8 00000000 f011ecbc c019cb28
Jan 18 00:58:30 (none) kernel: <0> 00000000 00000040 00000002 ffffffff
0000001f 00000020 00000000 c0bb7244
Jan 18 00:58:30 (none) kernel: Call Trace:
Jan 18 00:58:30 (none) kernel: [<c01cc35e>] ? inode_get_bytes+0x48/0x54
Jan 18 00:58:31 (none) kernel: [<c019cb28>] ?
get_page_from_freelist+0x14c/0x3ea
Jan 18 00:58:31 (none) kernel: [<c019ce8c>] ? __alloc_pages_nodemask+0xc6/0x49a
Jan 18 00:58:31 (none) kernel: [<c01980ac>] ? find_get_page+0x2d/0xaf
Jan 18 00:58:31 (none) kernel: [<c01986af>] ?
grab_cache_page_write_begin+0x54/0x8e
Jan 18 00:58:31 (none) kernel: [<c021b54b>] ? reiserfs_write_begin+0x7b/0x1cf
Jan 18 00:58:31 (none) kernel: [<c0197a2d>] ?
generic_file_buffered_write+0xd2/0x1d2
Jan 18 00:58:31 (none) kernel: [<c019939d>] ?
__generic_file_aio_write+0x39f/0x3e0
Jan 18 00:58:31 (none) kernel: [<c01d9380>] ? wake_up_inode+0x1c/0x1e
Jan 18 00:58:31 (none) kernel: [<c023531d>] ? reiserfs_write_unlock+0x37/0x39
Jan 18 00:58:31 (none) kernel: [<c0851fcf>] ? _raw_spin_unlock+0xd/0x25
Jan 18 00:58:31 (none) kernel: [<c0199442>] ? generic_file_aio_write+0x64/0xab
Jan 18 00:58:31 (none) kernel: [<c01c9179>] ? do_sync_write+0x8e/0xc9
Jan 18 00:58:31 (none) kernel: [<c01d3906>] ? do_filp_open+0x564/0xa44
Jan 18 00:58:31 (none) kernel: [<c021f466>] ? reiserfs_file_write+0x6e/0x77
Jan 18 00:58:31 (none) kernel: [<c01c9b3e>] ? vfs_write+0x99/0x14c
Jan 18 00:58:31 (none) kernel: [<c021f3f8>] ? reiserfs_file_write+0x0/0x77
Jan 18 00:58:31 (none) kernel: [<c01c9cad>] ? sys_write+0x48/0x75
Jan 18 00:58:31 (none) kernel: [<c010345f>] ? sysenter_do_call+0x12/0x28
Jan 18 00:58:31 (none) kernel: Code: 39 5d f0 75 06 41 e9 a0 00 00 00
8b 55 e8 c1 e2 03 89 55 f0 01 c2 8b 94 16 44 01 00 00 89 d3 83 eb 18
89 55 ec 8b 7b
1c 8b 53 18 <89> 7a 04 89 17 c7 43 1c 00 02 20 00 c7 43 18 00 01 10 00 8b 7d
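
As a reading aid for the oops above, the following sketch shows how
the faulting address and the "Oops: 0002" code can be interpreted. It
assumes that the marked bytes "<89> 7a 04" decode to a 32-bit store
to offset 4 from EDX (the "next->prev = prev" step of an inlined
list_del()), and that the error code uses the usual x86 page-fault
bits. The names list_head32, list_del_fault_addr(), pf_error and
decode_pf_error() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit layout of a list node: two 4-byte pointers. */
struct list_head32 {
	uint32_t next;	/* offset 0 */
	uint32_t prev;	/* offset 4 */
};

/*
 * Address touched by "next->prev = prev" for a given value of the
 * next pointer (EDX in the register dump).
 */
static uint32_t list_del_fault_addr(uint32_t next)
{
	assert(offsetof(struct list_head32, prev) == 4);
	return next + (uint32_t)offsetof(struct list_head32, prev);
}

/* Low three bits of the x86 page-fault error code. */
struct pf_error {
	int present;	/* bit 0: fault on a present page (protection) */
	int write;	/* bit 1: access was a write */
	int user;	/* bit 2: access came from user mode */
};

static struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e;

	e.present = (code & 0x1) != 0;
	e.write = (code & 0x2) != 0;
	e.user = (code & 0x4) != 0;
	return e;
}
```

Under these assumptions, list_del_fault_addr(0x00000002) gives
0x00000006, which matches EDX and the faulting address in this trace,
and list_del_fault_addr(0x0000b2c7) gives 0x0000b2cb, which matches
EDX and CR2 in the November report. decode_pf_error(0x0002) describes
a kernel-mode write to a page that is not present. In both oopses the
next pointer of page->lru itself appears to hold a small bogus value.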


Do you have any suggestions on things that I should try? The last
kernel version I used that worked fine is 2.6.27.4, which is too far
back to make hunting for the regression easy.

Thanks,
Lucas

2010-02-02 17:04:12

by Lucas C. Villa Real

Subject: Re: Oops with 2.6.32-rc6

On Tue, Jan 19, 2010 at 2:50 AM, Lucas C. Villa Real wrote:
>
> On Thu, Nov 19, 2009 at 1:48 AM, Lucas C. Villa Real wrote:
> > Hi,
> >
> > I recently decided to test 2.6.32-rc6 and I noticed that, whenever too
> > many disk activity happens, the system crashes. The error shown in the
> > traces below happened about 3 times in a week.
> >
> > Do you have any suggestions?
> >
> > Thanks,
> > Lucas
> >
>
> I just got a reproduction of the kernel oops with 2.6.33-rc4, whose
> original report can be seen at
> http://bugzilla.kernel.org/show_bug.cgi?id=14656.
>
> I'm seeing this problem while I'm stressing a FUSE file system which
> is sitting on top of ReiserFS 3. However, since some write operations
> in this test-case also operate in the root filesystem I cannot tell if
> FUSE has anything to do with this. Based on the stack trace I would
> say no.
>
> I have one complete message which shows the complete stack trace,
> found below, and another partial one which includes some debugging
> messages from CONFIG_DEBUG_LIST=y. The very line which is causing the
> problem is a list_del() in __rmqueue:
>
> (gdb) list *__rmqueue+0x98
> 0x963 is in __rmqueue (mm/page_alloc.c:730).
> 725			continue;
> 726
> 727		page = list_entry(area->free_list[migratetype].next,
> 728							struct page, lru);
> 729		list_del(&page->lru);
> 730		rmv_page_order(page);
>
> "page" is a valid pointer, but it looks like the members of lru are
> corrupted, as seen in the first trace below:
>
> Jan 19 02:01:46 (none) kernel: ------------[ cut here ]------------
> Jan 19 02:01:47 (none) kernel: WARNING: at lib/list_debug.c:51
> list_del+0x41/0x60()
> Jan 19 02:01:47 (none) kernel: Hardware name: MacBook3,1
> Jan 19 02:01:47 (none) kernel: list_del corruption. next->prev should
> be c1b71018, but was 00005095
> Jan 19 02:01:47 (none) kernel: Modules linked in: tun ipv6
> acpi_cpufreq snd_pcm_oss snd_mixer_oss hfsplus ndiswrapper fuse
> snd_hda_codec_realtek snd_hda_intel snd_hda_codec joydev snd_hwdep
> sky2 applesmc led_class uvcvideo firewire_ohci rtc_cmos snd_pcm
> videodev firewire_core input_polldev rtc_core video output snd_timer
> v4l1_compat shpchp battery rtc_lib ac appletouch pcspkr snd thermal
> button processor ohci1394 pci_hotplug intel_agp snd_page_alloc
> iTCO_wdt i2c_i801 iTCO_vendor_support i2c_core
> Jan 19 02:01:47 (none) kernel: Pid: 30559, comm: lt-ltfs Tainted: P
> M 2.6.33-rc4-Gobo #3
> Jan 19 02:01:47 (none) kernel: Call Trace:
> Jan 19 02:01:47 (none) kernel: [<c0137f28>] warn_slowpath_common+0x6a/0x81
> Jan 19 02:01:47 (none) kernel: [<c0400811>] ? list_del+0x41/0x60
>
>
> For reference, this is the complete stack trace which I got yesterday:
>
> Jan 18 00:58:30 (none) kernel: BUG: unable to handle kernel NULL
> pointer dereference at 00000006
> Jan 18 00:58:30 (none) kernel: IP: [<c019b505>] __rmqueue+0x98/0x36c
> Jan 18 00:58:30 (none) kernel: *pdpt = 00000000298e7001 *pde = 0000000000000000
> Jan 18 00:58:30 (none) kernel: Oops: 0002 [#1] PREEMPT SMP
> Jan 18 00:58:30 (none) kernel: last sysfs file:
> /System/Kernel/Objects/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ADP1/online
> Jan 18 00:58:30 (none) kernel: Modules linked in: cdc_ether usbnet mii
> cdc_acm tun kqemu ndiswrapper dvb_usb_dib0700 dib7000p dib0090
> dib7000m dib0070 dvb_usb dib8000 dvb_core dib3000mc dibx000_common
> ipv6 acpi_cpufreq snd_pcm_oss snd_mixer_oss hfsplus fuse joydev
> snd_hda_codec_realtek applesmc led_class snd_hda_intel uvcvideo
> input_polldev snd_hda_codec videodev firewire_ohci video
> firewire_core output snd_hwdep v4l1_compat ac sky2 battery snd_pcm
> i2c_i801 ohci1394 appletouch button thermal processor snd_timer snd
> i2c_core intel_agp snd_page_alloc iTCO_wdt iTCO_vendor_support
> rtc_cmos pcspkr rtc_core rtc_lib shpchp pci_hotplug
> Jan 18 00:58:30 (none) kernel:
> Jan 18 00:58:30 (none) kernel: Pid: 10381, comm: lt-ltfs Tainted: P
> 2.6.33-rc4-Gobo #1 Mac-F22788C8/MacBook3,1
> Jan 18 00:58:30 (none) kernel: EIP: 0060:[<c019b505>] EFLAGS: 00010086 CPU: 0
> Jan 18 00:58:30 (none) kernel: EIP is at __rmqueue+0x98/0x36c
> Jan 18 00:58:30 (none) kernel: EAX: 000001b8 EBX: c1b69000 ECX:
> 0000000a EDX: 00000002
> Jan 18 00:58:30 (none) kernel: ESI: c0bb69c0 EDI: c0bb6ccc EBP:
> f011ec64 ESP: f011ec2c
> Jan 18 00:58:30 (none) kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Jan 18 00:58:30 (none) kernel: Process lt-ltfs (pid: 10381,
> ti=f011e000 task=f004a610 task.ti=f011e000)
> Jan 18 00:58:30 (none) kernel: Stack:
> Jan 18 00:58:30 (none) kernel: c01cc35e e9130990 00000000 00000000
> 00000010 00000000 c0bb6cb8 c0bb6cbc
> Jan 18 00:58:30 (none) kernel: <0> 00000002 c1b69018 00000010 c0bb69c0
> c1b78ff8 00000000 f011ecbc c019cb28
> Jan 18 00:58:30 (none) kernel: <0> 00000000 00000040 00000002 ffffffff
> 0000001f 00000020 00000000 c0bb7244
> Jan 18 00:58:30 (none) kernel: Call Trace:
> Jan 18 00:58:30 (none) kernel: [<c01cc35e>] ? inode_get_bytes+0x48/0x54
> Jan 18 00:58:31 (none) kernel: [<c019cb28>] ?
> get_page_from_freelist+0x14c/0x3ea
> Jan 18 00:58:31 (none) kernel: [<c019ce8c>] ? __alloc_pages_nodemask+0xc6/0x49a
> Jan 18 00:58:31 (none) kernel: [<c01980ac>] ? find_get_page+0x2d/0xaf
> Jan 18 00:58:31 (none) kernel: [<c01986af>] ?
> grab_cache_page_write_begin+0x54/0x8e
> Jan 18 00:58:31 (none) kernel: [<c021b54b>] ? reiserfs_write_begin+0x7b/0x1cf
> Jan 18 00:58:31 (none) kernel: [<c0197a2d>] ?
> generic_file_buffered_write+0xd2/0x1d2
> Jan 18 00:58:31 (none) kernel: [<c019939d>] ?
> __generic_file_aio_write+0x39f/0x3e0
> Jan 18 00:58:31 (none) kernel: [<c01d9380>] ? wake_up_inode+0x1c/0x1e
> Jan 18 00:58:31 (none) kernel: [<c023531d>] ? reiserfs_write_unlock+0x37/0x39
> Jan 18 00:58:31 (none) kernel: [<c0851fcf>] ? _raw_spin_unlock+0xd/0x25
> Jan 18 00:58:31 (none) kernel: [<c0199442>] ? generic_file_aio_write+0x64/0xab
> Jan 18 00:58:31 (none) kernel: [<c01c9179>] ? do_sync_write+0x8e/0xc9
> Jan 18 00:58:31 (none) kernel: [<c01d3906>] ? do_filp_open+0x564/0xa44
> Jan 18 00:58:31 (none) kernel: [<c021f466>] ? reiserfs_file_write+0x6e/0x77
> Jan 18 00:58:31 (none) kernel: [<c01c9b3e>] ? vfs_write+0x99/0x14c
> Jan 18 00:58:31 (none) kernel: [<c021f3f8>] ? reiserfs_file_write+0x0/0x77
> Jan 18 00:58:31 (none) kernel: [<c01c9cad>] ? sys_write+0x48/0x75
> Jan 18 00:58:31 (none) kernel: [<c010345f>] ? sysenter_do_call+0x12/0x28
> Jan 18 00:58:31 (none) kernel: Code: 39 5d f0 75 06 41 e9 a0 00 00 00
> 8b 55 e8 c1 e2 03 89 55 f0 01 c2 8b 94 16 44 01 00 00 89 d3 83 eb 18
> 89 55 ec 8b 7b
> 1c 8b 53 18 <89> 7a 04 89 17 c7 43 1c 00 02 20 00 c7 43 18 00 01 10 00 8b 7d
>
>
> Do you have any suggestions on things that I should try? The last
> kernel version that I used which works just fine is 2.6.27.4, which is
> a bit old to look for possible regressions.

Hi, folks,

I compiled linux-2.6-stable from Git last night and have just
reproduced this oops.

A few days ago I took a diff from 2.6.27.4, which was the latest
stable version I had installed, to 2.6.33-rc4. All the significant
changes involve locking operations, such as the removal of the BKL and
lock contention fixes.

I'm about to roll back a few of these, starting with the BKL ones, in
an attempt to find the culprit. However, I'd really like to hear
comments from some of you, as I'm not familiar with the ReiserFS code.

The new trace is found below.

Thanks,
Lucas


Feb 2 14:40:32 (none) kernel: ------------[ cut here ]------------
Feb 2 14:40:32 (none) kernel: WARNING: at lib/list_debug.c:51
list_del+0x41/0x60()
Feb 2 14:40:32 (none) kernel: Hardware name: MacBook3,1
Feb 2 14:40:32 (none) kernel: list_del corruption. next->prev should
be c1b71018, but was 000056d5
Feb 2 14:40:32 (none) kernel: Modules linked in: ndiswrapper tun fuse
ipv6 acpi_cpufreq snd_pcm_oss snd_mixer_oss hfsplus
snd_hda_codec_realtek snd_hda_intel joydev sky2 snd_hda_codec
uvcvideo applesmc led_class snd_hwdep rtc_cmos videodev video snd_pcm
firewire_ohci firewire_core snd_timer input_polldev output
v4l1_compat rtc_core battery snd ac shpchp appletouch thermal
processor button rtc_lib ohci1394 intel_agp snd_page_alloc
pci_hotplug pcspkr iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core
[last unloaded: fuse]
Feb 2 14:40:32 (none) kernel: Pid: 24395, comm: lnotes Tainted: P M
2.6.33-rc6-Gobo-00072-gab65832-dirty #1
Feb 2 14:40:32 (none) kernel: Call Trace:
Feb 2 14:40:32 (none) kernel: [<c0137f50>] warn_slowpath_common+0x6a/0x81
Feb 2 14:40:32 (none) kernel: [<c0400581>] ? list_del+0x41/0x60
Feb 2 14:40:32 (none) kernel: [<c0137fa5>] warn_slowpath_fmt+0x29/0x2c
Feb 2 14:40:32 (none) kernel: [<c0400581>] list_del+0x41/0x60
Feb 2 14:40:32 (none) kernel: [<c019c1ca>] __rmqueue+0x9f/0x38f
Feb 2 14:40:32 (none) kernel: [<c019d7a5>] get_page_from_freelist+0x151/0x3ea
Feb 2 14:40:32 (none) kernel: [<c019db04>] __alloc_pages_nodemask+0xc6/0x49a
Feb 2 14:40:32 (none) kernel: [<c01c61ad>] ?
mem_cgroup_charge_statistics+0xad/0xc5
Feb 2 14:40:32 (none) kernel: [<c01c638d>] ?
__mem_cgroup_commit_charge+0xc1/0xd8
Feb 2 14:40:32 (none) kernel: [<c0852ff1>] ? sub_preempt_count+0x8/0x74
Feb 2 14:40:32 (none) kernel: [<c019ff65>] ? __lru_cache_add+0x71/0x89
Feb 2 14:40:32 (none) kernel: [<c01aa9c7>] ? page_address+0xe/0xb5
Feb 2 14:40:32 (none) kernel: [<c019ffa7>] ? lru_cache_add_lru+0x2a/0x2c
Feb 2 14:40:32 (none) kernel: [<c01ad5dc>] handle_mm_fault+0x1ff/0x897
Feb 2 14:40:32 (none) kernel: [<c01d9249>] ? __d_lookup+0xf1/0x10d
Feb 2 14:40:32 (none) kernel: [<c0852fd3>] do_page_fault+0x350/0x366
Feb 2 14:40:32 (none) kernel: [<c0852c83>] ? do_page_fault+0x0/0x366
Feb 2 14:40:32 (none) kernel: [<c0850d53>] error_code+0x73/0x78
Feb 2 14:40:32 (none) kernel: [<c085007b>] ? _raw_spin_unlock+0x2b/0x2c
Feb 2 14:40:32 (none) kernel: [<c019894d>] ? file_read_actor+0x42/0xc6
Feb 2 14:40:32 (none) kernel: [<c019a5a3>] generic_file_aio_read+0x327/0x50c
Feb 2 14:40:32 (none) kernel: [<c01c9d86>] do_sync_read+0x8e/0xc9
Feb 2 14:40:32 (none) kernel: [<c019ffa7>] ? lru_cache_add_lru+0x2a/0x2c
Feb 2 14:40:32 (none) kernel: [<c011da3b>] ? native_set_pte_at+0xc/0x19
Feb 2 14:40:32 (none) kernel: [<c0852ff1>] ? sub_preempt_count+0x8/0x74
Feb 2 14:40:32 (none) kernel: [<c01c988a>] ?
generic_file_llseek_unlocked+0xe/0x84
Feb 2 14:40:32 (none) kernel: [<c084ec93>] ? mutex_unlock+0x8/0x1b
Feb 2 14:40:32 (none) kernel: [<c01c9dd2>] ? rw_verify_area+0x11/0xa7
Feb 2 14:40:32 (none) kernel: [<c01ca8b5>] vfs_read+0x97/0x14a
Feb 2 14:40:32 (none) kernel: [<c01c9cf8>] ? do_sync_read+0x0/0xc9
Feb 2 14:40:33 (none) kernel: [<c01caa24>] sys_read+0x48/0x75
Feb 2 14:40:33 (none) kernel: [<c010345f>] sysenter_do_call+0x12/0x28
Feb 2 14:40:33 (none) kernel: ---[ end trace c8086567704fab22 ]---
Feb 2 14:40:33 (none) kernel: BUG: unable to handle kernel NULL
pointer dereference at 00000006