2003-03-17 23:45:03

by Matt C

Subject: 2.4.21-pre5 BUG: vmscan.c:359

Hi-

Got the following OOPS on 2.4.21-pre5:

kernel BUG at vmscan.c:359!
invalid operand: 0000
CPU: 3
EIP: 0010:[<c0133570>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: fdd026d3 ebx: 00000000 ecx: c112acbc edx: c2856000
esi: c112aca0 edi: 00000005 ebp: c2857f50 esp: c2857f1c
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 7, stackpage=c2857000)
Stack: f7361f00 c2856000 000001b1 0000f009 000001d0 c02bc468 c2864c00 00000001
00000001 00000001 00000020 000001d0 00000006 c2857f74 c01339b4 00000006
00000006 00000020 c02bc468 00000006 000001d0 c02bc468 c2857f8c c0133a1c
Call Trace: [<c01339b4>] [<c0133a1c>] [<c0133b4f>] [<c0133bc6>] [<c0133cff>]
[<c0133c60>] [<c0105000>] [<c0107476>] [<c0133c60>]
Code: 0f 0b 67 01 58 9c 27 c0 8b 01 31 db 8b 51 04 89 50 04 89 02

>>EIP; c0133570 <shrink_cache+e0/3c0> <=====
Trace; c01339b4 <shrink_caches+54/80>
Trace; c0133a1c <try_to_free_pages_zone+3c/60>
Trace; c0133b4f <kswapd_balance_pgdat+4f/a0>
Trace; c0133bc6 <kswapd_balance+26/40>
Trace; c0133cff <kswapd+9f/b8>
Trace; c0133c60 <kswapd+0/b8>
Trace; c0105000 <_stext+0/0>
Trace; c0107476 <kernel_thread+26/40>
Trace; c0133c60 <kswapd+0/b8>
Code; c0133570 <shrink_cache+e0/3c0>
00000000 <_EIP>:
Code; c0133570 <shrink_cache+e0/3c0> <=====
0: 0f 0b ud2a <=====
Code; c0133572 <shrink_cache+e2/3c0>
2: 67 01 58 9c addr16 add %ebx,-100(%bx,%si)
Code; c0133576 <shrink_cache+e6/3c0>
6: 27 daa
Code; c0133577 <shrink_cache+e7/3c0>
7: c0 8b 01 31 db 8b 51 rorb $0x51,0x8bdb3101(%ebx)
Code; c013357e <shrink_cache+ee/3c0>
e: 04 89 add $0x89,%al
Code; c0133580 <shrink_cache+f0/3c0>
10: 50 push %eax
Code; c0133581 <shrink_cache+f1/3c0>
11: 04 89 add $0x89,%al
Code; c0133583 <shrink_cache+f3/3c0>
13: 02 00 add (%eax),%al
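The `Code:` bytes in the dump above can be read back by hand. This assumes the 2.4 i386 `BUG()` macro emits `ud2` (`0f 0b`), then a 16-bit `__LINE__`, then a 32-bit pointer to the `__FILE__` string, which the trap handler uses to print "kernel BUG at ...!". Under that assumption, the six bytes after `ud2a` are data, not instructions. That would explain why the ksymoops disassembly right after it (`addr16 add`, `daa`, `rorb`) looks like nonsense. A minimal Python sketch follows; the helper name `decode_bug_frame` is hypothetical:

```python
import struct


def decode_bug_frame(code_hex: str) -> tuple[int, int]:
    """Decode the bytes from an oops 'Code:' line.

    Assumes the 2.4 i386 BUG() layout: ud2 (0f 0b), then a 16-bit
    little-endian __LINE__, then a 32-bit pointer to the __FILE__ string.
    """
    raw = bytes.fromhex(code_hex)  # spaces between byte pairs are ignored
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code: line does not start with ud2 (0f 0b)")
    # '<' = little-endian, no padding; 'H' = u16 line, 'I' = u32 pointer
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


code = "0f 0b 67 01 58 9c 27 c0 8b 01 31 db 8b 51 04 89 50 04 89 02"
line, file_ptr = decode_bug_frame(code)
print(f"line={line} file_ptr={file_ptr:#010x}")  # line=359 file_ptr=0xc0279c58
```

Here `67 01` comes out as 0x0167 = 359, matching `vmscan.c:359`. The value `0xc0279c58` would be the kernel address of the filename string, which is specific to this build.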

The machine is an HP LT6000R:
4x550MHz Xeon
2GB ECC RAM
megaraid RAID controller
e100 network interface

I've run memtest86 on the machine for 72 hours with zero errors, so I don't
think this is a hardware problem. The machine is also quite stable running
2.4.18. The host was under synthetic load when this happened, so it
should be reproducible, although the oops only appeared after about 2 days
of constant load. The synthetic load was:

- stress-kernel (aka cerberus)
- a simple file copy loop that runs find on an NFS mount (autofs, Linux
server) and, for each file, copies it to a local ext3 filesystem, cats
the file to /dev/null and then deletes it. Files range in size from
1 KB to 2 GB.
- 'top' and 'vmstat', of course...
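A rough Python equivalent of the copy loop in the second item is sketched below. The report does not include the actual script, so the function name `copy_cat_delete` and any paths passed to it are hypothetical. `os.walk` stands in for `find`. Reading the copy in 1 MiB chunks and discarding the data stands in for `cat file > /dev/null`.

```python
import os
import shutil


def copy_cat_delete(src_root: str, dest_dir: str) -> int:
    """One pass of the load loop (hypothetical reconstruction).

    Walks src_root like `find`; for each regular file, copies it into
    dest_dir, reads the copy back in full while discarding the data,
    then deletes the copy. Returns the number of files processed.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if not os.path.isfile(src):
                continue  # skip sockets, FIFOs, broken symlinks, etc.
            dst = os.path.join(dest_dir, name)
            shutil.copyfile(src, dst)  # NFS -> local filesystem
            with open(dst, "rb") as f:
                while f.read(1 << 20):  # 1 MiB chunks, data thrown away
                    pass
            os.remove(dst)  # delete the local copy
            count += 1
    return count
```

To approximate the reported load, you would call this repeatedly (for example inside `while True:`). Pass the autofs NFS mount as `src_root` and a directory on the local ext3 filesystem as `dest_dir`, and run stress-kernel alongside it.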

The only patches to the kernel were Red Hat's netdump code (so we can
capture the oops) and the O_STREAMING patch. Let me know what other info
would be helpful from the machine. I can easily restart the test load as
needed, as the machine is dedicated to kernel testing.

Thanks!

-matt


2003-03-24 11:12:53

by Andrew Ferguson

Subject: Re: 2.4.21-pre5 BUG: vmscan.c:359

Hi,
First off, I am not subscribed to this list and have never filed an
OOPS report before, but the original poster of this problem encouraged me
to do so. I hit the same OOPS (kernel BUG at vmscan.c:359, in kswapd)
under kernel 2.4.21-pre5. Note that the two computers share no obvious
hardware: the original poster has a quad 550MHz Xeon, while mine is an
Athlon. We are both using ext3, however.

kernel BUG at vmscan.c:359!
invalid operand: 0000
CPU: 0
EIP: 0010:[shrink_cache+192/784] Not tainted
EIP: 0010:[<c012b940>] Not tainted
EFLAGS: 00010202
eax: 010000cc ebx: c17a6000 ecx: c19c033c edx: c1c34000
esi: c19c0320 edi: 0000001e ebp: 00005041 esp: c1c35f34
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 4, stackpage=c1c35000)
Stack: 00000000 c1c34000 00000200 000001d0 c02754e0 c1c0d198 d075fcc0 c1c0dae0
00000001 00000020 000001d0 00000006 00000020 c012bce0 00000006 0000000b
c02754e0 00000006 000001d0 c02754e0 00000000 c012bd4c 00000020 c02754e0
Call Trace: [shrink_caches+80/128] [try_to_free_pages_zone+60/96]
[kswapd_balance_pgdat+79/160] [kswapd_balance+38/64] [kswapd+161/192]
[kswapd+0/192] [_stext+0/48] [kernel_thread+38/48] [kswapd+0/192]
Call Trace: [<c012bce0>] [<c012bd4c>] [<c012be5f>] [<c012bed6>] [<c012c011>]
[<c012bf70>] [<c0105000>] [<c0107116>] [<c012bf70>]

Code: 0f 0b 67 01 a6 3d 24 c0 8b 01 8b 51 04 31 db 89 50 04 89 02

>> EIP; c012b940 <shrink_cache+c0/310> <=====
Trace; c012bce0 <shrink_caches+50/80>
Trace; c012bd4c <try_to_free_pages_zone+3c/60>
Trace; c012be5e <kswapd_balance_pgdat+4e/a0>
Trace; c012bed6 <kswapd_balance+26/40>
Trace; c012c010 <kswapd+a0/c0>
Trace; c012bf70 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c0107116 <kernel_thread+26/30>
Trace; c012bf70 <kswapd+0/c0>
Code; c012b940 <shrink_cache+c0/310>
00000000 <_EIP>:
Code; c012b940 <shrink_cache+c0/310> <=====
0: 0f 0b ud2a <=====
Code; c012b942 <shrink_cache+c2/310>
2: 67 01 a6 3d 24 addr16 add %esp,9277(%bp)
Code; c012b946 <shrink_cache+c6/310>
7: c0 8b 01 8b 51 04 31 rorb $0x31,0x4518b01(%ebx)
Code; c012b94e <shrink_cache+ce/310>
e: db 89 50 04 89 02 (bad) 0x2890450(%ecx)
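The two notations for the faulting address in this report point to the same location. klogd prints `symbol+offset/size` in decimal (`shrink_cache+192/784`), while ksymoops prints it in hex (`shrink_cache+c0/310`). Below is a minimal sketch of that lookup. It assumes a `System.map`-style table sorted by address, where a symbol's size is the gap to the next symbol. The two table entries are hypothetical: they are back-computed from EIP `c012b940` and the `c0/310` offset/size, and `next_symbol` is a placeholder name.

```python
import bisect


def resolve(addr: int, symbols: list[tuple[int, str]]) -> tuple[str, int, int]:
    """Return (name, offset, size) for addr.

    symbols must be sorted by start address; a symbol's size is taken
    as the distance to the next symbol in the table.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1  # last symbol starting <= addr
    if i < 0 or i + 1 >= len(symbols):
        raise LookupError(f"{addr:#x} is outside the symbol table")
    start, name = symbols[i]
    return name, addr - start, symbols[i + 1][0] - start


def ksymoops_style(name: str, off: int, size: int) -> str:
    return f"{name}+{off:x}/{size:x}"  # hex offset and size


def klogd_style(name: str, off: int, size: int) -> str:
    return f"{name}+{off}/{size}"  # decimal offset and size


# Hypothetical System.map fragment reconstructed from the oops above:
# shrink_cache starts at EIP - 0xc0, and the next symbol is 0x310 later.
symbols = [(0xC012B880, "shrink_cache"), (0xC012BB90, "next_symbol")]
hit = resolve(0xC012B940, symbols)
print(ksymoops_style(*hit))  # shrink_cache+c0/310
print(klogd_style(*hit))  # shrink_cache+192/784
```

The same arithmetic applied to the first report (EIP `c0133570`, `shrink_cache+e0/3c0`) places that kernel's `shrink_cache` at `c0133490`. Only the build differs, not the faulting spot in the function's logic.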


The system was under minimal load at the time (xmms, StarOffice, gaim) and
does not use the netdump or O_STREAMING patches from the original post.
It has passed several hours of memtest86 without errors.

Also, when I rebooted, ext3 reported "recovering journal" on my /home
filesystem, if that's of any consequence.

Please CC me on any replies, thanks!

Oh, and the original reporter says that moving to 2.4.21-pre5aa2 seems
to have cured the problem for him.

______________________________________
Andrew Ferguson
http://www.princeton.edu/~owsla/
http://www.phstower.org/